Issue 2550771
 
 
 
              
              
Created on 2012-08-21 11:18 by ber, last changed 2019-11-10 14:00 by ber. 
 |
 
   | msg4617 | Author: [hidden] (ber) | Date: 2012-08-21 11:18 |  |  
   | Productivity of roundup can be raised if substring search for the all
text search would be possible.
I am attaching to hacks that have been in use at Intevation a few
years ago. One is probably very inefficient, the other specific to 
backends with the LIKE sql statement.
Hacks done by Frank Koormann:
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch
2008-roundup_indexer_dbm_pattern_search.diff |  
   | msg4622 | Author: [hidden] (ezio.melotti) | Date: 2012-08-22 07:01 |  |  
   | -re.findall(r'\b\w{2,25}\b', request.search_text), klass)
+re.findall(r'[\w_%]{2,25}', request.search_text), klass)
\w already includes _, so [\w%] is enough here.
There are also a few deprecated/obsolete idioms in the patches:
+if word.find('_') == -1 and word.find('%') == -1:
These should use the "in" operator.
+if not entries.has_key(word):
This (and others) should use "not in".
+raise ValueError, 'Index is corrupted: re-generate it'
This should use "raise ValueError(...)"
The patch(es) should also include some documentation that explains how
to use this new feature.
We also had some problem[0] with case-sensitive searches, however I'm
not sure if this is related to this issue.
[0]: http://psf.upfronthosting.co.za/roundup/meta/issue298 |  
   | msg4623 | Author: [hidden] (ber) | Date: 2012-08-22 07:30 |  |  
   | Ezio,
thanks for taking a look. As I wrote, those are old patches
and we need to evaluate if they at all provide a generic solution I
guess. To me there are two parts of it:
a) how would the user interface to the substring search look like?
Using the same as the "LIKE" operator?
Here is some describtion:
"""
string LIKE pattern 
If pattern does not contain percent signs or underscore, then the
pattern only represents the string itself; in that case LIKE acts like
the equals operator. An underscore (_) in pattern stands for (matches)
any single character; a percent sign (%) matches any string of zero or
more characters."""
Given that we want to use an index of some kind, I haven't looked into
solutions how to do a pattern match that is not directly supported
by the index engine (like "LIKE"). |  
   | msg4624 | Author: [hidden] (ber) | Date: 2012-08-22 07:32 |  |  
   | About http://psf.upfronthosting.co.za/roundup/meta/issue298 (realname
search should be case insensitive): To me this looks like a different
case, as it is specific to finding users. 
The issue at hand would be about substring searches in the msg texts.
(And case insensitive as those searches are already.) |  
   | msg4886 | Author: [hidden] (ber) | Date: 2013-05-10 21:28 |  |  
   | Maybe issue2550805 is related. Do you use the postgresql backend for the python 
tracker or something else? |  
   | msg4887 | Author: [hidden] (ezio.melotti) | Date: 2013-05-11 00:38 |  |  
   | bugs.python.org uses Postgres |  
   | msg4893 | Author: [hidden] (ber) | Date: 2013-05-13 09:40 |  |  
   | Ezio, could you try the change in Issue2550805, it may or may not affect
user name search . It definately affected Postgresql. :) |  
   | msg5411 | Author: [hidden] (ThomasAH) | Date: 2016-01-06 13:01 |  |  
   | Attached is an updated version of
2008-roundup_indexer_rdbms_pattern_full_textsearch.patch
named roundup_indexer_rdbms_pattern_full_textsearch2.patch
which still is just a quick hack, but works for us.
(PostgreSQL backend, Xapian is disabled) |  
   | msg5412 | Author: [hidden] (ThomasAH) | Date: 2016-01-06 13:14 |  |  
   | oops, restored the title |  
   | msg5462 | Author: [hidden] (pefu) | Date: 2016-02-01 12:41 |  |  
   | I assume it is necessary to use 'roundup-admin reindex' after applying 
the roundup_indexer_rdbms_pattern_full_textsearch2.patch? |  
   | msg5463 | Author: [hidden] (ThomasAH) | Date: 2016-02-02 15:37 |  |  
   | roundup-admin reindex should not be needed, unless you are switching
from Xapian. |  
   | msg6808 | Author: [hidden] (rouilj) | Date: 2019-11-09 19:28 |  |  
   | Ralf,
We have substring search in the rest interface. Could that be leveraged 
to allow us to resolve this ticket as well?
Also I don't see any test cases for this. Am I missing something? |  
   | msg6809 | Author: [hidden] (schlatterbeck) | Date: 2019-11-09 19:42 |  |  
   | On Sat, Nov 09, 2019 at 07:28:07PM +0000, John Rouillard wrote:
> 
> John Rouillard added the comment:
> 
> Ralf,
> 
> We have substring search in the rest interface. Could that be leveraged 
> to allow us to resolve this ticket as well?
> 
> Also I don't see any test cases for this. Am I missing something?
As I read it it applies to fulltext search.
And doing a substring search in the fulltext index is probably always
ending up with performance problems. So I would not accept a patch that
uses this?
Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com |  
   | msg6813 | Author: [hidden] (ber) | Date: 2019-11-10 14:00 |  |  
   | > And doing a substring search in the fulltext index is probably always
> ending up with performance problems.
For PostgreSQL there is a a trigram (trigraph) index that support
fast substring searches.
https://www.postgresql.org/docs/current/pgtrgm.html |  |
 
| Date | User | Action | Args |  | 2019-11-10 14:00:03 | ber | set | messages:
  + msg6813 |  | 2019-11-09 19:42:53 | schlatterbeck | set | messages:
  + msg6809 |  | 2019-11-09 19:28:07 | rouilj | set | keywords:
  patch, patch nosy:
  + rouilj
 messages:
  + msg6808
 |  | 2019-11-09 19:24:49 | rouilj | set | keywords:
  patch, patch nosy:
  + schlatterbeck
 |  | 2016-02-02 15:37:10 | ThomasAH | set | keywords:
  patch, patch messages:
  + msg5463
 |  | 2016-02-01 12:41:11 | pefu | set | nosy:
  + pefu messages:
  + msg5462
 |  | 2016-01-06 13:14:49 | ThomasAH | set | keywords:
  patch, patch messages:
  + msg5412
 title: t -> Allow to find substring matches for all text search
 |  | 2016-01-06 13:01:04 | ThomasAH | set | keywords:
  patch, patch files:
  + roundup_indexer_rdbms_pattern_full_textsearch2.patch
 messages:
  + msg5411
 nosy:
  + ThomasAH
 title: Allow to find substring matches for all text search -> t
 |  | 2013-05-13 09:40:54 | ber | set | keywords:
  patch, patch messages:
  + msg4893
 |  | 2013-05-11 00:38:28 | ezio.melotti | set | keywords:
  patch, patch messages:
  + msg4887
 |  | 2013-05-10 21:28:13 | ber | set | keywords:
  patch, patch messages:
  + msg4886
 |  | 2012-08-22 07:32:51 | ber | set | keywords:
  patch, patch messages:
  + msg4624
 |  | 2012-08-22 07:30:50 | ber | set | keywords:
  patch, patch messages:
  + msg4623
 |  | 2012-08-22 07:01:34 | ezio.melotti | set | keywords:
  patch, patch nosy:
  + ezio.melotti
 messages:
  + msg4622
 |  | 2012-08-21 11:18:52 | ber | set | keywords:
  patch, patch files:
  + 2008-roundup_indexer_dbm_pattern_search.diff
 |  | 2012-08-21 11:18:27 | ber | create |  | 
 |