Roundup Tracker - Issues

Issue 2550653

classification
xapian search, stemming is not working
Type: Severity: normal
Components: Versions:
process
Status: fixed Resolution: fixed
Nosy List: ThomasAH, ber, bruce, jvstein, olly, rouilj, wolever
Priority: Keywords: patch

Created on 2010-06-28 08:37 by ber, last changed 2016-06-28 02:11 by rouilj.

Messages
msg4067 Author: [hidden] (ber) Date: 2010-06-28 08:37
This issue has been split out from issue2550583 (xapian search yields 
too few results), see b) in msg4059.

A search for "silent" does not match "silently" as it should.
The proposed solution is to change the code to feed Xapian only 
lowercase words, since lowercase words are what the stemming algorithm 
is able to work with. The Xapian documentation is not as clear about 
this as it could be.
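
The effect described above can be sketched without Xapian at all. The
snippet below is only an illustration: toy_stem, index_terms and search
are hypothetical names, and the toy stemmer is a stand-in that strips a
few lowercase suffixes. It is not Xapian's stemming algorithm and not
the code from the patch. It shows why lowercasing before stemming lets
"silent" match "silently", and why an uppercase word passes through
such a stemmer unchanged.

```python
import re

# Hypothetical stand-in for a real stemmer: it only knows lowercase suffixes.
# This is an assumption for illustration, not Xapian's actual algorithm.
def toy_stem(word):
    for suffix in ("ly", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase each word before stemming, then collect the stems."""
    return {toy_stem(word.lower()) for word in re.findall(r"\w+", text)}

def search(query, documents):
    """Return the positions of documents containing every stemmed query term."""
    wanted = index_terms(query)
    return [pos for pos, doc in enumerate(documents) if wanted <= index_terms(doc)]

documents = ["The job failed silently", "A loud error message"]
print(search("silent", documents))   # first document matches
print(toy_stem("SILENTLY"))          # uppercase input is left unstemmed
```

Because both the indexed text and the query go through the same
lowercase-then-stem step, the stems agree regardless of how the words
were capitalized in the original message.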
msg4075 Author: [hidden] (wolever) Date: 2010-06-28 13:03
> I am also not quite sure about your comment in msg4065.
> You are saying that capitals will not be preserved and that this
> is correct?
Sorry — I'm saying that capitalization will *not* be preserved and that this is *incorrect*.

Because Xapian is case-sensitive (in certain circumstances), the Xapian indexer should never 
change the case of text. As far as I can tell, this should be as simple as removing the calls to 
'.upper()' from 'indexer_xapian.py'.

When I get a chance, I'll bang up the unit tests and a fix.
msg4076 Author: [hidden] (wolever) Date: 2010-06-28 13:26
(from msg4068)
> Can you be more explicit about which implementation you are suggesting?
Again, sorry — forgot that people read this through email.
The implementation in file436 is the implementation that I'm currently using, and is currently 
working.
However, it doesn't preserve capitalization, as I've noted in msg4075.
msg4941 Author: [hidden] (ThomasAH) Date: 2013-10-21 11:19
Issue2550583 (xapian search yields too few results) is now solved.
The patch from that issue (file436) to fix the stemming is not applied yet.

Stemming has another problem: It is always done with "english", but
there is already a TODO for this in the code.
msg5671 Author: [hidden] (rouilj) Date: 2016-06-28 02:11
Applied the patch and updated:

  CHANGES.txt

  doc/installation.txt - mentions limitations of the xapian indexer
    (what exactly are the implications of not preserving case?)

  doc/upgrading.txt - discusses limitations and benefits and recommends
    running roundup-admin reindex.
History
Date                 User      Action  Args
2016-06-28 02:11:23  rouilj    set     status: new -> fixed; resolution: fixed; messages: + msg5671; nosy: + rouilj
2016-06-28 01:42:59  rouilj    set     keywords: + patch
2013-10-21 11:19:19  ThomasAH  set     messages: + msg4941
2010-10-25 00:42:38  bruce     set     nosy: + bruce
2010-06-28 13:26:15  wolever   set     messages: + msg4076
2010-06-28 13:03:48  wolever   set     messages: + msg4075
2010-06-28 08:37:58  ber       set     nosy: + ThomasAH, olly, wolever, jvstein
2010-06-28 08:37:29  ber       create