Roundup Tracker - Issues

Issue 2550653

Classification
Title: xapian search, stemming is not working
Type:
Severity: normal
Components:
Versions:

Process
Status: new
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ThomasAH, ber, bruce, jvstein, olly, wolever
Priority:
Keywords:

Created on 2010-06-28 08:37 by ber, last changed 2010-10-25 00:42 by bruce.

Messages
msg4067 Author: [hidden] (ber) Date: 2010-06-28 08:37
This issue has been split out from issue2550583 (xapian search yields 
too few results), see b) in msg4059.

A search for "silent" does not match "silently", although it should.
The proposed solution is to change the code so that it feeds Xapian
only lowercase words, because lowercase words are what the stemming
algorithm can work with. Xapian's documentation is not as clear about
this as it could be.
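To show why lowercasing matters, here is a minimal sketch. It assumes a stemmer whose suffix rules are written for lowercase input. `toy_stem` and `index_terms` are hypothetical helpers: this is neither Xapian's Snowball stemmer nor Roundup's indexer code.

```python
# Toy suffix-stripping stemmer (NOT Xapian's Snowball stemmer). Its rules
# only match lowercase suffixes, as the stemmer discussed above expects.
SUFFIXES = ("ly", "ing", "ed", "s")

def toy_stem(word):
    """Strip the first matching lowercase suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text, normalize=True):
    """Split text into words and stem each one.

    normalize=True lowercases every word before stemming (the proposed fix).
    normalize=False passes words through unchanged, e.g. after .upper().
    """
    terms = set()
    for word in text.split():
        if normalize:
            word = word.lower()
        terms.add(toy_stem(word))
    return terms

doc = "the build failed silently"
print(toy_stem("silent") in index_terms(doc))                        # True
print(toy_stem("silent") in index_terms(doc.upper(), normalize=False))  # False
print(toy_stem("silent") in index_terms(doc.upper()))                # True
```

Once the text is upper-cased, "SILENTLY" no longer ends in the lowercase suffix "ly". It therefore stays unstemmed, and a search for "silent" misses it. Lowercasing each word before stemming restores the match.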
msg4075 Author: [hidden] (wolever) Date: 2010-06-28 13:03
> I am also not quite sure about your comment in msg4065.
> You are saying that capitals will not be preserved and that this
> is correct?
Sorry, I'm saying that capitalization will *not* be preserved and that this is *incorrect*.

Because Xapian is case-sensitive (in certain circumstances), the Xapian indexer should never 
change the case of text. As far as I can tell, this should be as simple as removing the calls to 
'.upper()' from 'indexer_xapian.py'.
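One place where case matters in Xapian is the QueryParser's STEM_SOME strategy, which, as documented, does not stem words that begin with a capital letter. The sketch below is a toy model of that rule only. `toy_stem` and `parse_query` are hypothetical helpers, not Xapian or Roundup APIs. The model shows that once text has been upper-cased, every word looks capitalized, so nothing gets stemmed.

```python
# Toy suffix-stripping stemmer (NOT Xapian's stemmer); lowercase rules only.
SUFFIXES = ("ly", "ing", "ed", "s")

def toy_stem(word):
    """Strip the first matching lowercase suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def parse_query(query):
    """Return (term, was_stemmed) pairs.

    Toy version of a "stem some" policy: a word starting with a capital
    letter is kept as an exact, unstemmed term; any other word is stemmed.
    """
    parsed = []
    for word in query.split():
        if word[:1].isupper():
            parsed.append((word, False))
        else:
            parsed.append((toy_stem(word), True))
    return parsed

print(parse_query("silently"))          # [('silent', True)]
print(parse_query("Silently"))          # [('Silently', False)]
print(parse_query("silently".upper()))  # [('SILENTLY', False)]
```

In this model, calling `.upper()` before the text reaches the case-sensitive rule makes stemming impossible. Leaving the original case intact preserves the distinction the rule relies on.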

When I get a chance, I'll put together the unit tests and a fix.
msg4076 Author: [hidden] (wolever) Date: 2010-06-28 13:26
(from msg4068)
> Can you be more explicit about which implementation you are suggesting?
Again, sorry; I forgot that people read this through email.
The implementation in file436 is the implementation that I'm currently using, and is currently 
working.
However, it doesn't preserve capitalization, as I've noted in msg4075.
History
Date                 User     Action  Args
2010-10-25 00:42:38  bruce    set     nosy: + bruce
2010-06-28 13:26:15  wolever  set     messages: + msg4076
2010-06-28 13:03:48  wolever  set     messages: + msg4075
2010-06-28 08:37:58  ber      set     nosy: + ThomasAH, olly, wolever, jvstein
2010-06-28 08:37:29  ber      create