Roundup Tracker - Issues

Issue 2550584

classification
Patch to fix inconsistent behavior of Indexer (length, stopwords)
Type: behavior Severity: normal
Components: Web interface Versions: 1.4
process
Status: closed Resolution: fixed
Assigned To: ber Nosy List: ThomasAH, ber
Priority: urgent Keywords: patch

Created on 2009-09-01 15:26 by ThomasAH, last changed 2009-09-11 15:56 by ber.

Files
File name Uploaded Description
roundup-indexer-length-stopwords.patch ThomasAH, 2009-09-01 15:26
Messages
msg3864 Author: [hidden] (ThomasAH) Date: 2009-09-01 15:26
In roundup-1.4.9 and current SVN, the regular expression
\b\w{2,25}\b is used to find all words to be indexed, but the comments
and the find code use 3 as the lower limit.
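
A minimal sketch of the mismatch, for illustration only. The pattern is the one quoted above; the `searchable` helper and its parameters are hypothetical stand-ins for the find-side check, not Roundup's actual code.

```python
import re

# Pattern quoted in the report: words of 2 to 25 characters are indexed.
index_pattern = re.compile(r'\b\w{2,25}\b')

text = "HP UX runs on Windows XP"
indexed = [w.upper() for w in index_pattern.findall(text)]

# Hypothetical find-side filter that uses 3 as the lower limit,
# as the report says the comments and find code do.
def searchable(words, min_len=3, max_len=25):
    return [w for w in words if min_len <= len(w) <= max_len]

# Two-letter words end up in the index but are dropped from the query.
query = searchable(["HP", "UX"])
```

With these assumptions, "HP" and "UX" are indexed, yet a search for them is reduced to an empty word list.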

indexer_dbm.py and indexer_rdbms.py do not filter stopwords from the
list of words to be searched, therefore searching for "foo with bar"
will never find anything, because WITH is in the STOPWORDS default.
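
The effect can be reproduced with a toy in-memory index. This is only a sketch under stated assumptions: `build_index`, `find`, and the one-element `STOPWORDS` set are hypothetical and much simpler than indexer_dbm.py or indexer_rdbms.py.

```python
# Illustrative subset; the report only states that WITH is a default stopword.
STOPWORDS = {"WITH"}

def build_index(docs):
    """Map each non-stopword (upper-cased) to the ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.upper().split():
            if word not in STOPWORDS:
                index.setdefault(word, set()).add(doc_id)
    return index

def find(index, terms, filter_stopwords):
    """Return ids of documents that contain every search word."""
    words = [t.upper() for t in terms]
    if filter_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if not words:
        return set()
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits)

index = build_index({1: "foo with bar"})
```

Without filtering, the query word WITH has no index entry, so the intersection is empty; with filtering, document 1 is found.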

The attached patch (against SVN, works against 1.4.9, too) makes minimum
and maximum length of words to be indexed easily changeable in one
source location, which could easily be extended to a config option if
anyone wants (in a separate patch), and consistently uses this while
generating and using the index.
Additionally it consistently uses the stopwords when finding (the xapian
find already did this).
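
The idea of one source location for the limits could look roughly like the sketch below. It is not the attached patch: `MIN_LENGTH`, `MAX_LENGTH`, `is_indexable`, `words_to_index`, and `words_to_find` are hypothetical names, and the `STOPWORDS` set is an illustrative subset.

```python
import re

# Hypothetical single place to change the limits.
MIN_LENGTH = 2
MAX_LENGTH = 25
STOPWORDS = {"WITH", "THE", "AND"}  # illustrative subset only

# The indexing pattern is derived from the same constants.
WORD_RE = re.compile(r'\b\w{%d,%d}\b' % (MIN_LENGTH, MAX_LENGTH))

def is_indexable(word):
    """Apply the same length and stopword rules to indexing and finding."""
    return (MIN_LENGTH <= len(word) <= MAX_LENGTH
            and word.upper() not in STOPWORDS)

def words_to_index(text):
    return [w.upper() for w in WORD_RE.findall(text) if is_indexable(w)]

def words_to_find(terms):
    return [t.upper() for t in terms if is_indexable(t)]
```

Because both functions go through `is_indexable`, a word is searchable exactly when it could have been indexed.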

I chose 2 as the minimum word length for two reasons:
1. Existing indexes will already have words of this length included
   (in one of our trackers there are about 50000 entries with two
   letters, about 700000 entries with more letters, and about 150000
   entries would be added when using an empty STOPWORDS set).
2. Searching for two-letter words could really be useful, e.g. for
search terms like "HP UX" or "Windows XP".
msg3876 Author: [hidden] (ber) Date: 2009-09-11 15:56
Enhanced test_indexer.py to at least trigger each problem once,
committed with revision 4355.

Then committed an improved fix with revision 4356.
History
Date User Action Args
2009-09-11 15:56:44 ber set status: new -> closed
resolution: fixed
messages: + msg3876
2009-09-11 14:42:10 ber set priority: urgent
assignee: ber
2009-09-01 15:26:07 ThomasAH create