Roundup Tracker - Issues

Issue 2550771

classification
Allow to find substring matches for all text search
Type: rfe Severity: major
Components: Web interface Versions: 1.4
process
Status: new
:
: : ThomasAH, ber, ezio.melotti, pefu, rouilj, schlatterbeck
Priority: normal : patch, patch

Created on 2012-08-21 11:18 by ber, last changed 2019-11-10 14:00 by ber.

Files
File name Uploaded Description Edit Remove
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch ber, 2012-08-21 11:18
2008-roundup_indexer_dbm_pattern_search.diff ber, 2012-08-21 11:18
roundup_indexer_rdbms_pattern_full_textsearch2.patch ThomasAH, 2016-01-06 13:01
Messages
msg4617 Author: [hidden] (ber) Date: 2012-08-21 11:18
Productivity of roundup can be raised if substring search for the all
text search would be possible.

I am attaching to hacks that have been in use at Intevation a few
years ago. One is probably very inefficient, the other specific to 
backends with the LIKE sql statement.

Hacks done by Frank Koormann:
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch
2008-roundup_indexer_dbm_pattern_search.diff
msg4622 Author: [hidden] (ezio.melotti) Date: 2012-08-22 07:01
-re.findall(r'\b\w{2,25}\b', request.search_text), klass)
+re.findall(r'[\w_%]{2,25}', request.search_text), klass)

\w already includes _, so [\w%] is enough here.


There are also a few deprecated/obsolete idioms in the patches:

+if word.find('_') == -1 and word.find('%') == -1:
These should use the "in" operator.

+if not entries.has_key(word):
This (and others) should use "not in".

+raise ValueError, 'Index is corrupted: re-generate it'
This should use "raise ValueError(...)"


The patch(es) should also include some documentation that explains how
to use this new feature.
We also had some problem[0] with case-sensitive searches, however I'm
not sure if this is related to this issue.

[0]: http://psf.upfronthosting.co.za/roundup/meta/issue298
msg4623 Author: [hidden] (ber) Date: 2012-08-22 07:30
Ezio,
thanks for taking a look. As I wrote, those are old patches
and we need to evaluate if they at all provide a generic solution I
guess. To me there are two parts of it:
a) how would the user interface to the substring search look like?
Using the same as the "LIKE" operator?

Here is some describtion:
"""
string LIKE pattern 

If pattern does not contain percent signs or underscore, then the
pattern only represents the string itself; in that case LIKE acts like
the equals operator. An underscore (_) in pattern stands for (matches)
any single character; a percent sign (%) matches any string of zero or
more characters."""

Given that we want to use an index of some kind, I haven't looked into
solutions how to do a pattern match that is not directly supported
by the index engine (like "LIKE").
msg4624 Author: [hidden] (ber) Date: 2012-08-22 07:32
About http://psf.upfronthosting.co.za/roundup/meta/issue298 (realname
search should be case insensitive): To me this looks like a different
case, as it is specific to finding users. 

The issue at hand would be about substring searches in the msg texts.
(And case insensitive as those searches are already.)
msg4886 Author: [hidden] (ber) Date: 2013-05-10 21:28
Maybe issue2550805 is related. Do you use the postgresql backend for the python 
tracker or something else?
msg4887 Author: [hidden] (ezio.melotti) Date: 2013-05-11 00:38
bugs.python.org uses Postgres
msg4893 Author: [hidden] (ber) Date: 2013-05-13 09:40
Ezio, could you try the change in Issue2550805, it may or may not affect
user name search . It definately affected Postgresql. :)
msg5411 Author: [hidden] (ThomasAH) Date: 2016-01-06 13:01
Attached is an updated version of
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch
named roundup_indexer_rdbms_pattern_full_textsearch2.patch
which still is just a quick hack, but works for us.
(PostgreSQL backend, Xapian is disabled)
msg5412 Author: [hidden] (ThomasAH) Date: 2016-01-06 13:14
oops, restored the title
msg5462 Author: [hidden] (pefu) Date: 2016-02-01 12:41
I assume it is necessary to use 'roundup-admin reindex' after applying 
the roundup_indexer_rdbms_pattern_full_textsearch2.patch?
msg5463 Author: [hidden] (ThomasAH) Date: 2016-02-02 15:37
roundup-admin reindex should not be needed, unless you are switching
from Xapian.
msg6808 Author: [hidden] (rouilj) Date: 2019-11-09 19:28
Ralf,

We have substring search in the rest interface. Could that be leveraged 
to allow us to resolve this ticket as well?

Also I don't see any test cases for this. Am I missing something?
msg6809 Author: [hidden] (schlatterbeck) Date: 2019-11-09 19:42
On Sat, Nov 09, 2019 at 07:28:07PM +0000, John Rouillard wrote:
> 
> John Rouillard added the comment:
> 
> Ralf,
> 
> We have substring search in the rest interface. Could that be leveraged 
> to allow us to resolve this ticket as well?
> 
> Also I don't see any test cases for this. Am I missing something?

As I read it it applies to fulltext search.
And doing a substring search in the fulltext index is probably always
ending up with performance problems. So I would not accept a patch that
uses this?

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg6813 Author: [hidden] (ber) Date: 2019-11-10 14:00
> And doing a substring search in the fulltext index is probably always
> ending up with performance problems.

For PostgreSQL there is a a trigram (trigraph) index that support
fast substring searches.
https://www.postgresql.org/docs/current/pgtrgm.html
History
Date User Action Args
2019-11-10 14:00:03bersetmessages: + msg6813
2019-11-09 19:42:53schlatterbecksetmessages: + msg6809
2019-11-09 19:28:07rouiljsetkeywords: patch, patch
nosy: + rouilj
messages: + msg6808
2019-11-09 19:24:49rouiljsetkeywords: patch, patch
nosy: + schlatterbeck
2016-02-02 15:37:10ThomasAHsetkeywords: patch, patch
messages: + msg5463
2016-02-01 12:41:11pefusetnosy: + pefu
messages: + msg5462
2016-01-06 13:14:49ThomasAHsetkeywords: patch, patch
messages: + msg5412
title: t -> Allow to find substring matches for all text search
2016-01-06 13:01:04ThomasAHsetkeywords: patch, patch
files: + roundup_indexer_rdbms_pattern_full_textsearch2.patch
messages: + msg5411
nosy: + ThomasAH
title: Allow to find substring matches for all text search -> t
2013-05-13 09:40:54bersetkeywords: patch, patch
messages: + msg4893
2013-05-11 00:38:28ezio.melottisetkeywords: patch, patch
messages: + msg4887
2013-05-10 21:28:13bersetkeywords: patch, patch
messages: + msg4886
2012-08-22 07:32:51bersetkeywords: patch, patch
messages: + msg4624
2012-08-22 07:30:50bersetkeywords: patch, patch
messages: + msg4623
2012-08-22 07:01:34ezio.melottisetkeywords: patch, patch
nosy: + ezio.melotti
messages: + msg4622
2012-08-21 11:18:52bersetkeywords: patch, patch
files: + 2008-roundup_indexer_dbm_pattern_search.diff
2012-08-21 11:18:27bercreate