Roundup Tracker - Issues

Message7441

Author ThomasAH
Recipients ThomasAH
Date 2022-01-25.07:41:05
Message-id <1643096466.37.0.516439572323.issue2551189@roundup.psfhosted.org>
In-reply-to
We recently had some support requests where the browser reported the
error "SSL_ERROR_HANDSHAKE_FAILURE_ALERT" (33 characters). Roundup's
search was unable to find these messages because the indexer's maximum
word length is hard-coded to 25 characters.
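
A minimal sketch of what happens to such a query, using the
query-splitting pattern from roundup/cgi/actions.py with the old limit
of 25 and the proposed limit of 50. Only the two patterns are taken from
Roundup; the variable names are made up for this illustration.

```python
import re

# Patterns that split the search text into words (from roundup/cgi/actions.py):
# the unpatched limit of 25 and the proposed limit of 50.
OLD_PATTERN = r'[\w%]{2,25}'
NEW_PATTERN = r'[\w%]{2,50}'

query = "SSL_ERROR_HANDSHAKE_FAILURE_ALERT"  # 33 characters

old_terms = re.findall(OLD_PATTERN, query)
new_terms = re.findall(NEW_PATTERN, query)

print(old_terms)  # ['SSL_ERROR_HANDSHAKE_FAILU', 'RE_ALERT']
print(new_terms)  # ['SSL_ERROR_HANDSHAKE_FAILURE_ALERT']
```

With the old pattern the query is cut into two fragments, and presumably
neither fragment exists as a word in the index, so nothing is found.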

We have had other searches fail in the past because of this low limit,
as the German language can easily produce long words. Here are some
words that have actually been used in our trackers :)
Immatrikulationsbescheinigung (29 letters)
Softwareentwicklungsdienstleistungen (36 letters)
Auftragsdatenverarbeitungsvereinbarungen (40 letters)
Betriebshaftpflichtversicherungsbedingungen (43 letters)
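
As a rough check, the following sketch tests these words against both
limits. The helper would_be_indexed() is hypothetical; it only assumes
that the indexer skips every word whose length lies outside
minlength..maxlength, as the comment in indexer_common.py suggests.

```python
WORDS = [
    "Immatrikulationsbescheinigung",
    "Softwareentwicklungsdienstleistungen",
    "Auftragsdatenverarbeitungsvereinbarungen",
    "Betriebshaftpflichtversicherungsbedingungen",
]


def would_be_indexed(word, minlength=2, maxlength=25):
    """Hypothetical stand-in for the indexer's length filter."""
    return minlength <= len(word) <= maxlength


for word in WORDS:
    # Columns: length, indexed with limit 25, indexed with limit 50, word
    print(len(word), would_be_indexed(word),
          would_be_indexed(word, maxlength=50), word)
```

None of the four words passes the filter with a limit of 25; all of
them pass with a limit of 50.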

So I guess the maximum length for the search index should be at least 45
characters, but analysis of the data in our trackers shows that even 50
characters does not increase the size of the index too much.
Most "junk" lines are base64-encoded data, which are usually between
64 and 76 characters long and therefore still exceed a limit of 50.
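
One way to estimate the effect on the index size is to count the
distinct words that would be indexed for each candidate limit. This is
only a sketch of such a count, not the analysis that was actually run on
our trackers; index_size_by_limit() and the sample texts are made up.

```python
import re


def index_size_by_limit(texts, limits=(25, 45, 50), minlength=2):
    """Count the distinct words that would be indexed for each maximum length."""
    words = set()
    for text in texts:
        # Simplified tokenization: runs of word characters, case-folded.
        words.update(w.upper() for w in re.findall(r'\w+', text))
    return {
        limit: sum(1 for w in words if minlength <= len(w) <= limit)
        for limit in limits
    }


sample_texts = [
    "Die Immatrikulationsbescheinigung fehlt",
    "SSL_ERROR_HANDSHAKE_FAILURE_ALERT im Browser",
    "QUJD" * 19,  # stands in for one 76-character base64 line
]

sizes = index_size_by_limit(sample_texts)
print(sizes)  # {25: 4, 45: 6, 50: 6}
```

In this toy sample, raising the limit adds the two long words, while the
76-character line stays out of the index at every limit.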

The problem is that some database columns need to become wider to allow
a larger maximum word length in the search index. However, a recent
change already required changes to the database structure, so this
might be a good time to do it. (Hence priority "high".)
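
For existing trackers, widening the column could look roughly like the
statements built below. This is a sketch only: widen_words_column_sql()
is a hypothetical helper, not part of Roundup's schema upgrade code; the
table and column names are the ones used by the MySQL and PostgreSQL
backends.

```python
def widen_words_column_sql(backend, width=60):
    """Return an ALTER TABLE statement that widens __words._word.

    Hypothetical helper for illustration; 'backend' is 'mysql' or
    'postgresql'.
    """
    if backend == "mysql":
        return "ALTER TABLE __words MODIFY _word VARCHAR(%d)" % width
    if backend == "postgresql":
        return ("ALTER TABLE __words ALTER COLUMN _word TYPE VARCHAR(%d)"
                % width)
    raise ValueError("unsupported backend: %r" % (backend,))


for backend in ("mysql", "postgresql"):
    print(widen_words_column_sql(backend))
```

Words that were skipped under the old limit only become searchable after
the affected items have been indexed again.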

Here is a patch I use for an older version (almost 1.6) of Roundup.
I increased the limit from 25 to 50, and from 30 to 60 in the places
that used 30.


diff -r 45ad337e0c4a roundup/backends/back_mysql.py
--- a/roundup/backends/back_mysql.py	Thu Nov 14 13:00:49 2019 +0100
+++ b/roundup/backends/back_mysql.py	Tue Nov 16 17:23:10 2021 +0100
@@ -228,7 +228,7 @@
         self.sql('''CREATE TABLE __textids (_class VARCHAR(255),
             _itemid VARCHAR(255), _prop VARCHAR(255), _textid INT)
             ENGINE=%s'''%self.mysql_backend)
-        self.sql('''CREATE TABLE __words (_word VARCHAR(30),
+        self.sql('''CREATE TABLE __words (_word VARCHAR(60),
             _textid INT) ENGINE=%s'''%self.mysql_backend)
         self.sql('CREATE INDEX words_word_ids ON __words(_word)')
         self.sql('CREATE INDEX words_by_id ON __words (_textid)')
diff -r 45ad337e0c4a roundup/backends/back_postgresql.py
--- a/roundup/backends/back_postgresql.py	Thu Nov 14 13:00:49 2019 +0100
+++ b/roundup/backends/back_postgresql.py	Tue Nov 16 17:23:10 2021 +0100
@@ -206,7 +206,7 @@
         self.sql('''CREATE TABLE __textids (
             _textid integer primary key, _class VARCHAR(255),
             _itemid VARCHAR(255), _prop VARCHAR(255))''')
-        self.sql('''CREATE TABLE __words (_word VARCHAR(30),
+        self.sql('''CREATE TABLE __words (_word VARCHAR(60),
             _textid integer)''')
         self.sql('CREATE INDEX words_word_idx ON __words(_word)')
         self.sql('CREATE INDEX words_by_id ON __words (_textid)')
diff -r 45ad337e0c4a roundup/backends/indexer_common.py
--- a/roundup/backends/indexer_common.py	Thu Nov 14 13:00:49 2019 +0100
+++ b/roundup/backends/indexer_common.py	Tue Nov 16 17:23:10 2021 +0100
@@ -19,10 +19,10 @@
         self.stopwords = set(STOPWORDS)
         for word in db.config[('main', 'indexer_stopwords')]:
             self.stopwords.add(word)
-        # Do not index anything longer than 25 characters since that'll be
+        # Do not index anything longer than 50 characters since that'll be
         # gibberish (encoded text or somesuch) or shorter than 2 characters
         self.minlength = 2
-        self.maxlength = 25
+        self.maxlength = 50
 
     def is_stopword(self, word):
         return word in self.stopwords
diff -r 45ad337e0c4a roundup/cgi/actions.py
--- a/roundup/cgi/actions.py	Thu Nov 14 13:00:49 2019 +0100
+++ b/roundup/cgi/actions.py	Tue Nov 16 17:23:10 2021 +0100
@@ -1319,7 +1319,7 @@
         # full-text search
         if request.search_text:
             matches = self.db.indexer.search(
-                re.findall(r'[\w%]{2,25}', request.search_text), klass)
+                re.findall(r'[\w%]{2,50}', request.search_text), klass)
         else:
             matches = None
 
diff -r 45ad337e0c4a roundup/cgi/templating.py
--- a/roundup/cgi/templating.py	Thu Nov 14 13:00:49 2019 +0100
+++ b/roundup/cgi/templating.py	Tue Nov 16 17:23:10 2021 +0100
@@ -2946,7 +2946,7 @@
         if self.search_text:
             matches = self.client.db.indexer.search(
                 [w.upper().encode("utf-8", "replace") for w in re.findall(
-                    r'(?u)[\w%]{2,25}',
+                    r'(?u)[\w%]{2,50}',
                     unicode(self.search_text, "utf-8", "replace")
                 )], klass)
         else:
History
Date                 User      Action  Args
2022-01-25 07:41:06  ThomasAH  set     recipients: + ThomasAH
2022-01-25 07:41:06  ThomasAH  set     messageid: <1643096466.37.0.516439572323.issue2551189@roundup.psfhosted.org>
2022-01-25 07:41:06  ThomasAH  link    issue2551189 messages
2022-01-25 07:41:05  ThomasAH  create