Issue 2550653
Created on 2010-06-28 08:37 by ber, last changed 2010-10-25 00:42 by bruce.
| msg4067 |
Author: [hidden] (ber) |
Date: 2010-06-28 08:37 |
|
This issue has been split out from issue2550583 (xapian search yields
too few results), see b) in msg4059.
A search for "silent" does not match "silently" as it should.
The proposed solution is to change the code to feed xapian only lower
case words. Lower case words are what the stemming algorithm is
able to work with. The documentation of this within Xapian is not
as clear as it could be.
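
To illustrate the proposal, here is a minimal sketch in plain Python. It does not use the Xapian bindings: `toy_stem` is a hypothetical stand-in for a real stemmer that, like the behaviour described above, only recognises lower case suffixes, and `index_terms` and `matches` are illustrative names, not functions from `indexer_xapian.py`.

```python
import re


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer.

    It strips a few lower case suffixes only, so upper case input
    passes through unchanged.
    """
    for suffix in ("ly", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_terms(text, stem=toy_stem):
    """Split text into words, lower case each word, then stem it."""
    words = re.findall(r"\w+", text)
    return {stem(w.lower()) for w in words}


def matches(query, text):
    """True if every stemmed query term also occurs in the text's terms."""
    return index_terms(query) <= index_terms(text)


# Without lower casing, the stand-in stemmer leaves the word untouched.
print(toy_stem("SILENTLY"))
# With lower casing first, "Silently" and "silent" reduce to the same term.
print(matches("silent", "It failed Silently"))
```

The point of the sketch is only the order of operations: lower case first, stem second, and apply the same steps to both the indexed text and the query.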
|
| msg4075 |
Author: [hidden] (wolever) |
Date: 2010-06-28 13:03 |
|
> I am also not quite sure about your comment in msg4065.
> You are saying that capitals will not be preserved and that this
> is correct?
Sorry — I'm saying that capitalization will *not* be preserved and that this is *incorrect*.
Because Xapian is case-sensitive (in certain circumstances), the Xapian indexer should never
change the case of text. As far as I can tell, this should be as simple as removing the calls to
'.upper()' from 'indexer_xapian.py'.
When I get a chance, I'll bang up the unit tests and a fix.
|
| msg4076 |
Author: [hidden] (wolever) |
Date: 2010-06-28 13:26 |
|
(from msg4068)
> Can you be more explicit about which implementation you are suggesting?
Again, sorry — forgot that people read this through email.
The implementation in file436 is the implementation that I'm currently using, and is currently
working.
However, it doesn't preserve capitalization, as I've noted in msg4075.
|
|
| Date | User | Action | Args |
| 2010-10-25 00:42:38 | bruce | set | nosy: + bruce |
| 2010-06-28 13:26:15 | wolever | set | messages: + msg4076 |
| 2010-06-28 13:03:48 | wolever | set | messages: + msg4075 |
| 2010-06-28 08:37:58 | ber | set | nosy: + ThomasAH, olly, wolever, jvstein |
| 2010-06-28 08:37:29 | ber | create | |
|