Issue 2550653
Created on 2010-06-28 08:37 by ber, last changed 2016-06-28 02:11 by rouilj.
msg4067 |
Author: [hidden] (ber) |
Date: 2010-06-28 08:37 |
|
This issue has been split out from issue2550583 (xapian search yields
too few results), see b) in msg4059.
A search for "silent" does not match "silently" as it should.
The proposed solution is to change the code to feed Xapian only
lower-case words, because lower-case words are what the stemming
algorithm is able to work with. The documentation of this within
Xapian is not as clear as it could be.
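
To illustrate the problem, here is a minimal sketch that does not use Xapian at all. The names toy_stem and index_terms are hypothetical, and toy_stem is only a stand-in for a real stemmer: like the real algorithm it only recognises lower-case suffixes, so upper-cased input passes through unstemmed.

```python
import re


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer.

    It strips a few lower-case suffixes only, so upper-case input
    is returned unchanged.
    """
    for suffix in ("ly", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_terms(text, stem=toy_stem, lowercase=True):
    """Split text into words, normalise their case, and stem each word.

    lowercase=True models the proposed fix; lowercase=False models
    upper-casing the words before they reach the stemmer.
    """
    words = re.findall(r"\w+", text)
    if lowercase:
        words = [w.lower() for w in words]
    else:
        words = [w.upper() for w in words]
    return {stem(w) for w in words}


document = "The indexer fails silently"
query = "silent"

# Lower-case words: document and query share a stemmed term.
match_lower = index_terms(document) & index_terms(query)

# Upper-case words: the stemmer does nothing, so nothing is shared.
match_upper = index_terms(document, lowercase=False) & index_terms(
    query, lowercase=False
)
```

With lower-cased input both "silently" and "silent" reduce to the same term and the search matches. With upper-cased input the terms stay "SILENTLY" and "SILENT" and the search finds nothing. In a real indexer the stem argument would presumably be a Xapian stemmer object instead of toy_stem.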
|
msg4075 |
Author: [hidden] (wolever) |
Date: 2010-06-28 13:03 |
|
> I am also not quite sure about your comment in msg4065.
> You are saying that capitals will not be preserved and that this
> is correct?
Sorry — I'm saying that capitalization will *not* be preserved and that this is *incorrect*.
Because Xapian is case-sensitive (in certain circumstances), the Xapian indexer should never
change the case of text. As far as I can tell, this should be as simple as removing the calls to
'.upper()' from 'indexer_xapian.py'.
When I get a chance, I'll bang up the unit tests and a fix.
|
msg4076 |
Author: [hidden] (wolever) |
Date: 2010-06-28 13:26 |
|
(from msg4068)
> Can you be more explicit about which implementation you are suggesting?
Again, sorry — forgot that people read this through email.
The implementation in file436 is the implementation that I'm currently using, and is currently
working.
However, it doesn't preserve capitalization, as I've noted in msg4075.
|
msg4941 |
Author: [hidden] (ThomasAH) |
Date: 2013-10-21 11:19 |
|
Issue2550583 (xapian search yields too few results) is now solved.
The patch from that issue (file436) to fix the stemming is not applied yet.
Stemming has another problem: It is always done with "english", but
there is already a TODO for this in the code.
|
msg5671 |
Author: [hidden] (rouilj) |
Date: 2016-06-28 02:11 |
|
Applied patch, updated:
CHANGES.txt
doc/installation.txt mentioning limitations of xapian indexer
(what exactly are the implications of not preserving case?)
doc/upgrading.txt - discussed limitations and benefits and recommend
roundup-admin reindex.
|
Date | User | Action | Args
2016-06-28 02:11:23 | rouilj | set | status: new -> fixed, resolution: fixed, messages: + msg5671, nosy: + rouilj
2016-06-28 01:42:59 | rouilj | set | keywords: + patch
2013-10-21 11:19:19 | ThomasAH | set | messages: + msg4941
2010-10-25 00:42:38 | bruce | set | nosy: + bruce
2010-06-28 13:26:15 | wolever | set | messages: + msg4076
2010-06-28 13:03:48 | wolever | set | messages: + msg4075
2010-06-28 08:37:58 | ber | set | nosy: + ThomasAH, olly, wolever, jvstein
2010-06-28 08:37:29 | ber | create |