Issue 2550636
Created on 2010-02-05 16:38 by wolever, last changed 2016-06-26 00:19 by rouilj.
msg4021 |
Author: [hidden] (wolever) |
Date: 2010-02-05 16:38 |
|
I've banged together a first pass at using Whoosh for text indexing.
Currently the patch passes all unit tests, but hasn't had much battle testing.
Is this useful?
|
msg5617 |
Author: [hidden] (rouilj) |
Date: 2016-06-23 03:19 |
|
Warning: the patch as it exists doesn't properly import indexer_rdbms.
It will need to support that somehow. My suggestion is to use a value of
native (or no value) to load indexer_dbm or indexer_rdbms,
then change the patch to explicitly require whoosh or xapian.
|
msg5620 |
Author: [hidden] (rouilj) |
Date: 2016-06-24 04:12 |
|
I have an outstanding issue with test/test_indexer.py. I have to
comment out the mysql and postgres tests and imports; otherwise the
tests don't run at all. I don't know why this is. It looks like the
skip_mysql and skip_postgresql that are defined in .test_postgres and
.test_mysql are not being properly set, or are interfering in some way
with the tests, if you don't have mysql or postgres to test against.
I am going to try to fix this before I check in but if I can't do that
I will document as a fixme and commit.
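
For reference, here is a minimal sketch of the usual unittest pattern for skipping a backend whose driver is missing, so that the remaining tests still run. It is not the code in test/test_indexer.py and not a fix for the problem above; have_module, the test class names and the module name "no_such_backend_driver" are all made up for illustration.

```python
import importlib
import unittest


def have_module(name):
    """Return True if the named module can be imported (hypothetical helper)."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# "no_such_backend_driver" is a deliberately missing, made-up module name
# standing in for a database driver that is not installed.
skip_missing = unittest.skipUnless(
    have_module("no_such_backend_driver"), "backend driver not installed")


class NativeIndexerTest(unittest.TestCase):
    def test_runs(self):
        self.assertTrue(True)


@skip_missing
class MissingBackendIndexerTest(unittest.TestCase):
    def test_never_runs(self):
        self.fail("should have been skipped")


def run_suite():
    """Run both test classes and return the TestResult."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(NativeIndexerTest))
    suite.addTests(loader.loadTestsFromTestCase(MissingBackendIndexerTest))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this arrangement the missing backend is reported as skipped rather than preventing the other tests from running.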
At this time the dbm, sqlite (rdbms), xapian and whoosh indexes are
all passing the indexer tests.
This patch also adds the indexer config option, which takes the values
xapian, whoosh and native (indexer_dbm or indexer_rdbms depending
on the db in use).
If the indexer option is not set, it will try all three of the above in
order and use the first one found.
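
The selection rule described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: choose_indexer, its parameters and the returned strings are assumptions, and the importer argument exists only so the sketch can be exercised without xapian or whoosh installed.

```python
import importlib


def choose_indexer(option=None, is_rdbms=False,
                   importer=importlib.import_module):
    """Return the name of the indexer module to use (hypothetical helper).

    option   -- value of the indexer config option, or None if unset
    is_rdbms -- True when the db in use is an rdbms back end
    importer -- callable that raises ImportError for a missing module
    """
    native = "indexer_rdbms" if is_rdbms else "indexer_dbm"

    if option == "native":
        return native
    if option in ("xapian", "whoosh"):
        # Explicitly requested: let ImportError propagate if it is missing.
        importer(option)
        return "indexer_" + option
    if option:
        raise ValueError("unknown indexer option: %r" % (option,))

    # Option not set: try xapian, then whoosh, then fall back to native.
    for name in ("xapian", "whoosh"):
        try:
            importer(name)
        except ImportError:
            continue
        return "indexer_" + name
    return native
```

An explicit xapian or whoosh value fails loudly when the library is absent, while an unset option falls through to the native indexer.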
I applied the newest attached patch and modified it as follows:
1) Added support for the native back ends, dbm and rdbms.
2) Developed a whoosh stopfilter so that stopwords, and words outside
the maxlength and minlength limits defined in index_common.py, are
not indexed. This was required to pass the extremewords test in
test_indexer. I also removed a call to .lower on the input text, as
the tokenizer I chose lowercases automatically.
3) Added support for max/min length to find. This was needed to pass
the extremewords test.
4) Added back a call to save_index in add_text. This allowed all but
two tests to pass.
5) Fixed a call to:
results = searcher.search(query.Term("identifier", identifier))
which had an extra parameter that is an error under current whoosh.
6) Set limit=None in the search call for find(); otherwise it only
returns 10 items. This allowed it to pass the manyresults test.
Also, due to changes in the roundup code, I removed the import in
indexer_whoosh of
from roundup.anypy.sets_ import set
since we now use the Python builtin set.
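
The filtering in items 2 and 3 above amounts to roughly the following. This is a plain Python illustration of the idea and does not use the whoosh API; the stopword set and the length limits are placeholder values, not the ones defined in roundup, and the function names are made up.

```python
import re

# Placeholder values for illustration only.
STOPWORDS = {"a", "and", "are", "in", "is", "of", "the", "to"}
MINLENGTH = 2
MAXLENGTH = 25


def is_indexable(word):
    """True if the word is within the length limits and is not a stopword."""
    return MINLENGTH <= len(word) <= MAXLENGTH and word not in STOPWORDS


def tokens_for_index(text):
    """Lowercase and split the text, keeping only indexable words."""
    return [w for w in re.findall(r"\w+", text.lower()) if is_indexable(w)]


def words_for_find(wordlist):
    """Apply the same limits to search terms, so find() never looks up
    a word that could not have been indexed."""
    lowered = (w.lower() for w in wordlist)
    return [w for w in lowered if is_indexable(w)]
```

Applying the same test on both the indexing side and the search side keeps the two consistent, which is the point of item 3.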
|
msg5627 |
Author: [hidden] (rouilj) |
Date: 2016-06-26 00:19 |
|
Applied this patch. Revision: e74c3611b138
No luck fixing test_indexer.py. No tests will run if you
have any missing backend. issue2550910 has been opened to request help
on fixing this.
|
|
Date | User | Action | Args |
2016-06-26 00:19:14 | rouilj | set | status: open -> fixed, resolution: fixed, messages: + msg5627 |
2016-06-24 04:13:01 | rouilj | set | status: new -> open, assignee: rouilj, messages: + msg5620, nosy: + thomas_ah |
2016-06-23 03:19:43 | rouilj | set | nosy: + rouilj, messages: + msg5617 |
2010-07-12 04:18:41 | richard | link | issue2550508 superseder |
2010-02-05 16:48:45 | wolever | set | files: + whoosh_patch.diff |
2010-02-05 16:38:07 | wolever | create | |
|