Issue 2550771
Created on 2012-08-21 11:18 by ber, last changed 2019-11-10 14:00 by ber.
msg4617 |
Author: [hidden] (ber) |
Date: 2012-08-21 11:18 |
|
Productivity of roundup can be raised if substring search for the all
text search would be possible.
I am attaching to hacks that have been in use at Intevation a few
years ago. One is probably very inefficient, the other specific to
backends with the LIKE sql statement.
Hacks done by Frank Koormann:
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch
2008-roundup_indexer_dbm_pattern_search.diff
|
msg4622 |
Author: [hidden] (ezio.melotti) |
Date: 2012-08-22 07:01 |
|
-re.findall(r'\b\w{2,25}\b', request.search_text), klass)
+re.findall(r'[\w_%]{2,25}', request.search_text), klass)
\w already includes _, so [\w%] is enough here.
There are also a few deprecated/obsolete idioms in the patches:
+if word.find('_') == -1 and word.find('%') == -1:
These should use the "in" operator.
+if not entries.has_key(word):
This (and others) should use "not in".
+raise ValueError, 'Index is corrupted: re-generate it'
This should use "raise ValueError(...)"
The patch(es) should also include some documentation that explains how
to use this new feature.
We also had some problem[0] with case-sensitive searches, however I'm
not sure if this is related to this issue.
[0]: http://psf.upfronthosting.co.za/roundup/meta/issue298
|
msg4623 |
Author: [hidden] (ber) |
Date: 2012-08-22 07:30 |
|
Ezio,
thanks for taking a look. As I wrote, those are old patches
and we need to evaluate if they at all provide a generic solution I
guess. To me there are two parts of it:
a) how would the user interface to the substring search look like?
Using the same as the "LIKE" operator?
Here is some describtion:
"""
string LIKE pattern
If pattern does not contain percent signs or underscore, then the
pattern only represents the string itself; in that case LIKE acts like
the equals operator. An underscore (_) in pattern stands for (matches)
any single character; a percent sign (%) matches any string of zero or
more characters."""
Given that we want to use an index of some kind, I haven't looked into
solutions how to do a pattern match that is not directly supported
by the index engine (like "LIKE").
|
msg4624 |
Author: [hidden] (ber) |
Date: 2012-08-22 07:32 |
|
About http://psf.upfronthosting.co.za/roundup/meta/issue298 (realname
search should be case insensitive): To me this looks like a different
case, as it is specific to finding users.
The issue at hand would be about substring searches in the msg texts.
(And case insensitive as those searches are already.)
|
msg4886 |
Author: [hidden] (ber) |
Date: 2013-05-10 21:28 |
|
Maybe issue2550805 is related. Do you use the postgresql backend for the python
tracker or something else?
|
msg4887 |
Author: [hidden] (ezio.melotti) |
Date: 2013-05-11 00:38 |
|
bugs.python.org uses Postgres
|
msg4893 |
Author: [hidden] (ber) |
Date: 2013-05-13 09:40 |
|
Ezio, could you try the change in Issue2550805, it may or may not affect
user name search . It definately affected Postgresql. :)
|
msg5411 |
Author: [hidden] (ThomasAH) |
Date: 2016-01-06 13:01 |
|
Attached is an updated version of
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch
named roundup_indexer_rdbms_pattern_full_textsearch2.patch
which still is just a quick hack, but works for us.
(PostgreSQL backend, Xapian is disabled)
|
msg5412 |
Author: [hidden] (ThomasAH) |
Date: 2016-01-06 13:14 |
|
oops, restored the title
|
msg5462 |
Author: [hidden] (pefu) |
Date: 2016-02-01 12:41 |
|
I assume it is necessary to use 'roundup-admin reindex' after applying
the roundup_indexer_rdbms_pattern_full_textsearch2.patch?
|
msg5463 |
Author: [hidden] (ThomasAH) |
Date: 2016-02-02 15:37 |
|
roundup-admin reindex should not be needed, unless you are switching
from Xapian.
|
msg6808 |
Author: [hidden] (rouilj) |
Date: 2019-11-09 19:28 |
|
Ralf,
We have substring search in the rest interface. Could that be leveraged
to allow us to resolve this ticket as well?
Also I don't see any test cases for this. Am I missing something?
|
msg6809 |
Author: [hidden] (schlatterbeck) |
Date: 2019-11-09 19:42 |
|
On Sat, Nov 09, 2019 at 07:28:07PM +0000, John Rouillard wrote:
>
> John Rouillard added the comment:
>
> Ralf,
>
> We have substring search in the rest interface. Could that be leveraged
> to allow us to resolve this ticket as well?
>
> Also I don't see any test cases for this. Am I missing something?
As I read it it applies to fulltext search.
And doing a substring search in the fulltext index is probably always
ending up with performance problems. So I would not accept a patch that
uses this?
Ralf
--
Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16
Open Source Consulting www: http://www.runtux.com
Reichergasse 131, A-3411 Weidling email: office@runtux.com
|
msg6813 |
Author: [hidden] (ber) |
Date: 2019-11-10 14:00 |
|
> And doing a substring search in the fulltext index is probably always
> ending up with performance problems.
For PostgreSQL there is a a trigram (trigraph) index that support
fast substring searches.
https://www.postgresql.org/docs/current/pgtrgm.html
|
|
Date |
User |
Action |
Args |
2019-11-10 14:00:03 | ber | set | messages:
+ msg6813 |
2019-11-09 19:42:53 | schlatterbeck | set | messages:
+ msg6809 |
2019-11-09 19:28:07 | rouilj | set | keywords:
patch, patch nosy:
+ rouilj messages:
+ msg6808 |
2019-11-09 19:24:49 | rouilj | set | keywords:
patch, patch nosy:
+ schlatterbeck |
2016-02-02 15:37:10 | ThomasAH | set | keywords:
patch, patch messages:
+ msg5463 |
2016-02-01 12:41:11 | pefu | set | nosy:
+ pefu messages:
+ msg5462 |
2016-01-06 13:14:49 | ThomasAH | set | keywords:
patch, patch messages:
+ msg5412 title: t -> Allow to find substring matches for all text search |
2016-01-06 13:01:04 | ThomasAH | set | keywords:
patch, patch files:
+ roundup_indexer_rdbms_pattern_full_textsearch2.patch messages:
+ msg5411 nosy:
+ ThomasAH title: Allow to find substring matches for all text search -> t |
2013-05-13 09:40:54 | ber | set | keywords:
patch, patch messages:
+ msg4893 |
2013-05-11 00:38:28 | ezio.melotti | set | keywords:
patch, patch messages:
+ msg4887 |
2013-05-10 21:28:13 | ber | set | keywords:
patch, patch messages:
+ msg4886 |
2012-08-22 07:32:51 | ber | set | keywords:
patch, patch messages:
+ msg4624 |
2012-08-22 07:30:50 | ber | set | keywords:
patch, patch messages:
+ msg4623 |
2012-08-22 07:01:34 | ezio.melotti | set | keywords:
patch, patch nosy:
+ ezio.melotti messages:
+ msg4622 |
2012-08-21 11:18:52 | ber | set | keywords:
patch, patch files:
+ 2008-roundup_indexer_dbm_pattern_search.diff |
2012-08-21 11:18:27 | ber | create | |
|