Roundup Tracker - Issues

Issue 2550636

classification
[PATCH] Add support for the Whoosh indexer
Type: behavior Severity: normal
Components: Infrastructure Versions: 1.3
process
Status: fixed Resolution: fixed
Assigned To: rouilj Nosy List: rouilj, thomas_ah, wolever
Priority: Keywords: patch

Created on 2010-02-05 16:38 by wolever, last changed 2016-06-26 00:19 by rouilj.

Files
File name Uploaded Description
whoosh_patch_old.diff wolever, 2010-02-05 16:38
whoosh_patch.diff wolever, 2010-02-05 16:48 Fixing a typo, making the matching fuzzy.
Messages
msg4021 Author: [hidden] (wolever) Date: 2010-02-05 16:38
I've banged together a first pass at using Whoosh for text indexing.

Currently the patch passes all unit tests, but hasn't had much battle testing.

Is this useful?
msg5617 Author: [hidden] (rouilj) Date: 2016-06-23 03:19
Warning: the patch as it exists doesn't properly import indexer_rdbms.

Will need to support that somehow. Suggestion is to use a value of
native (or no value) to load indexer_dbm or indexer_rdbms.

Then change the patch to explicitly require whoosh or xapian.
msg5620 Author: [hidden] (rouilj) Date: 2016-06-24 04:12
I have an outstanding issue with test/test_indexer.py. I have to
comment out the mysql and postgres tests and imports; otherwise the
tests don't run at all. I don't know why this is. It looks like the
skip_mysql and skip_postgresql flags that are defined in .test_postgres
and .test_mysql are not being set properly, or are interfering in some
way with the tests, if you don't have mysql or postgres to test against.

I am going to try to fix this before I check in but if I can't do that
I will document as a fixme and commit.
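
For reference, one common way to keep a missing backend from blocking the rest of a test module is to guard the import and turn the failure into a class-level skip. The sketch below is only an illustration of that pattern, not the code in test/test_indexer.py; backend_skip, the test class names, and the placeholder module name are all hypothetical.

```python
import importlib
import unittest


def backend_skip(module_name):
    """Return a decorator that skips a test class when module_name is missing."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return unittest.skip("%s backend not available" % module_name)
    # Backend is importable: leave the decorated class untouched.
    return lambda cls: cls


# "no_such_db_driver_xyz" is a placeholder standing in for a missing driver.
skip_missing = backend_skip("no_such_db_driver_xyz")
skip_dbm = backend_skip("dbm")


@skip_missing
class MissingBackendIndexerTest(unittest.TestCase):
    def test_runs(self):
        self.fail("should have been skipped")


@skip_dbm
class DbmIndexerTest(unittest.TestCase):
    def test_runs(self):
        self.assertTrue(True)
```

With this shape, the import failure is caught at module load time, so the remaining test classes are still collected and run.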

At this time the dbm, sqlite (rdbms), xapian and whoosh indexes are
all passing the indexer tests.

This patch also adds the indexer config option which takes the values:

   xapian, whoosh and native (indexer_dbm or indexer_rdbms depending
   on the db in use)

If the indexer option is not set, it will try all three of the above in
order and use the first one found.
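
The selection rule described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the committed code: choose_indexer, module_available, and the db_is_rdbms flag are hypothetical names, and treating an explicit xapian or whoosh setting as a hard requirement follows the suggestion in msg5617.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def choose_indexer(configured="", db_is_rdbms=False, available=module_available):
    """Pick an indexer module name from the value of the indexer option."""
    native = "indexer_rdbms" if db_is_rdbms else "indexer_dbm"
    if configured == "native":
        return native
    if configured in ("xapian", "whoosh"):
        # An explicit choice is treated as required: fail instead of falling back.
        if not available(configured):
            raise ImportError("indexer %r requested but not installed" % configured)
        return "indexer_" + configured
    if configured:
        raise ValueError("unknown indexer %r" % configured)
    # Option unset: try xapian, then whoosh, then the native indexer.
    for name in ("xapian", "whoosh"):
        if available(name):
            return "indexer_" + name
    return native
```

The available argument exists only so the fallback order can be exercised without installing either library.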

I applied the newest attached patch and modified it to:

1) support native back ends dbm and rdbms.
2) Developed whoosh stopfilter to not index stopwords or words outside
   the maxlength and minlength limits defined in indexer_common.py.
   Required to pass the extremewords test_indexer test.  Also I
   removed a call to .lower on the input text as the tokenizer I chose
   automatically does the lowercase.
3) Added support for max/min length to find. This was needed to pass
   extremewords test.
4) Added back a call to save_index in add_text. This allowed all but
   two tests to pass.
5) Fixed a call to:
    results = searcher.search(query.Term("identifier", identifier))
   which had an extra parameter that is an error under current whoosh.
6) Set limit=None in the search call for find(); otherwise it only
   returns 10 items. This allowed it to pass the manyresults test.
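
The word filtering in items 2 and 3 amounts to lowercasing the text, splitting it into words, and dropping stopwords and words outside the length limits, both when indexing and when searching. A dependency-free sketch of that rule follows; the limits and the stopword list are assumed values for illustration, and indexable_words is a hypothetical helper rather than anything in the patch.

```python
import re

# Assumed values for illustration only; the real limits and stopword list
# are the ones defined in Roundup's common indexer module.
MIN_LENGTH = 2
MAX_LENGTH = 25
STOPWORDS = frozenset(["a", "and", "are", "is", "of", "the", "to"])


def indexable_words(text, stopwords=STOPWORDS,
                    min_length=MIN_LENGTH, max_length=MAX_LENGTH):
    """Return the lowercased words from text that should be indexed."""
    words = re.findall(r"\w+", text.lower())
    return [
        word for word in words
        if min_length <= len(word) <= max_length and word not in stopwords
    ]
```

Applying the same function to the search terms in find() keeps queries consistent with what was indexed, which is the point of item 3.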

Also, due to changes in the roundup code, I removed the import in
indexer_whoosh of

  from roundup.anypy.sets_ import set

since we use the python builtin set.
msg5627 Author: [hidden] (rouilj) Date: 2016-06-26 00:19
Applied this patch. Revision: e74c3611b138

No luck fixing test_indexer.py. No tests will run if any backend
is missing. Opened issue2550910 to request help with fixing this.
History
Date User Action Args
2016-06-26 00:19:14 rouilj set status: open -> fixed
resolution: fixed
messages: + msg5627
2016-06-24 04:13:01 rouilj set status: new -> open
assignee: rouilj
messages: + msg5620
nosy: + thomas_ah
2016-06-23 03:19:43 rouilj set nosy: + rouilj
messages: + msg5617
2010-07-12 04:18:41 richard link issue2550508 superseder
2010-02-05 16:48:45 wolever set files: + whoosh_patch.diff
2010-02-05 16:38:07 wolever create