Message3864
In roundup-1.4.9 and current SVN, the regular expression \b\w{2,25}\b is
used to find all words to be indexed, but the comments and the find code
use 3 as the lower limit.
In addition, indexer_dbm.py and indexer_rdbms.py do not filter stopwords
from the list of words to be searched. Searching for "foo with bar"
will therefore never find anything, because WITH is in the default STOPWORDS.
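To illustrate both mismatches, here is a minimal, self-contained sketch. It does not use Roundup's indexer classes: index_words, find_naive and the tiny STOPWORDS set are hypothetical stand-ins, and upper-casing the text is an assumption. Only the regular expression is taken from the report above.

```python
import re

# Pattern quoted in the report: words of 2 to 25 characters.
WORD_RE = re.compile(r'\b\w{2,25}\b')

# Hypothetical stand-in for the default stopword list.
STOPWORDS = {'WITH', 'THE', 'AND'}


def index_words(text):
    """Return the words that would be indexed (stopwords are dropped)."""
    return {w for w in WORD_RE.findall(text.upper()) if w not in STOPWORDS}


def find_naive(index, query):
    """Require every query word to be indexed, without filtering stopwords."""
    return all(w in index for w in query.upper().split())


index = index_words("foo with bar on HP UX")

# Two-letter words are indexed, contrary to a lower limit of 3.
print('HP' in index, 'UX' in index)

# The stopword WITH was never indexed, so this query cannot match.
print(find_naive(index, "foo with bar"))
```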
The attached patch (against SVN; it works against 1.4.9, too) defines the
minimum and maximum length of words to be indexed in one source location
and uses these limits consistently when generating and when using the
index. This could easily be extended to a config option if anyone wants
(in a separate patch).
Additionally, it consistently filters stopwords when finding (the xapian
find already did this).
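The idea can be sketched as follows. This is not the attached patch, only an illustration under assumptions: MIN_WORDLEN, MAX_WORDLEN, split_words and find are hypothetical names, and the stopword set is a small placeholder. The point is that indexing and searching share one word-splitting function, so the length limits and the stopword filter cannot drift apart.

```python
import re

# Hypothetical names: the limits are defined once and reused everywhere.
MIN_WORDLEN = 2
MAX_WORDLEN = 25

# Placeholder stopword set for illustration only.
STOPWORDS = {'WITH', 'THE', 'AND'}

# The pattern is built from the two constants instead of hard-coded numbers.
WORD_RE = re.compile(r'\b\w{%d,%d}\b' % (MIN_WORDLEN, MAX_WORDLEN))


def split_words(text):
    """Split text into upper-cased words within the limits, minus stopwords."""
    return [w for w in WORD_RE.findall(text.upper()) if w not in STOPWORDS]


def find(index, query):
    """Match if all searchable query words are present in the index."""
    words = split_words(query)
    return bool(words) and all(w in index for w in words)


# The same function is used for building the index and for querying it.
index = set(split_words("foo with bar on HP UX"))
print(find(index, "foo with bar"))
print(find(index, "HP UX"))
```

With this arrangement, changing the minimum length to 3 would only require editing MIN_WORDLEN.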
I chose 2 as the minimum word length for two reasons:
1. Existing indexes will already have words of this length included.
(in one of our trackers there are about 50000 entries with two
letters, about 700000 entries with more letters and about 150000 entries
would be added when using an empty STOPWORDS set)
2. Searching for two-letter words could really be useful, e.g. for
search terms like "HP UX" or "Windows XP".

Date | User | Action | Args
2009-09-01 15:26:08 | ThomasAH | set | messageid: <1251818768.07.0.621541080617.issue2550584@psf.upfronthosting.co.za>
2009-09-01 15:26:08 | ThomasAH | set | recipients: + ThomasAH, ber
2009-09-01 15:26:07 | ThomasAH | link | issue2550584 messages
2009-09-01 15:26:06 | ThomasAH | create |