Roundup Tracker - Issues

Issue 1344046

Title: Search for "All text" can't find some Unicode words
Type: behavior
Severity: normal
Components: Database
Versions:
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: richard
Nosy List: richard
Priority: normal
Keywords: patch

Created on 2005-10-31 16:33 by anonymous, last changed 2016-06-26 19:29 by rouilj.

msg2046 Author: [hidden] (anonymous) Date: 2005-10-31 16:33
The full-text search implemented in
backends\ cannot find words that
contain certain Unicode characters. One such
character is the German u-umlaut ('ü'), which does
not survive the upper() call in find().

For example, if you search for 'Sprünge', wordlist first
contains 'SPR\xc3\x9cNGE', then 'SPR\xc3\x8cNGE'.

To fix this, I replaced line 82:

<         l = [word.upper() for word in wordlist if 26 > len(word) > 1]
>         l = [unicode(word, "utf-8", "replace").upper().encode("utf-8", "replace")
>             for word in wordlist if 26 > len(word) > 1]
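
The idea behind this patch can be sketched in Python 3 terms, where the Python 2 byte string corresponds to bytes and unicode() corresponds to bytes.decode(). This is only an illustration under that assumption, not the tracker's actual code; the function name upper_utf8_words is hypothetical.

```python
def upper_utf8_words(wordlist):
    """Uppercase UTF-8 encoded words, keeping only lengths 2..25 (in bytes)."""
    result = []
    for word in wordlist:
        if 26 > len(word) > 1:
            # Decode first so that upper() sees characters, not raw bytes.
            text = word.decode("utf-8", "replace")
            result.append(text.upper().encode("utf-8", "replace"))
    return result


raw = "Sprünge".encode("utf-8")          # the two bytes c3 bc encode 'ü'
byte_level = raw.upper()                 # byte-level upper(): only ASCII letters change
char_level = upper_utf8_words([raw])[0]  # character-level upper(): 'ü' becomes 'Ü'
```

In Python 3, bytes.upper() leaves the two bytes of 'ü' untouched, so the indexed form and the search term can still disagree with a properly uppercased 'Ü'. In Python 2, str.upper() on 8-bit strings was locale-dependent, which is presumably how individual bytes of the UTF-8 sequence got altered in the report above.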
msg2047 Author: [hidden] (anonymous) Date: 2007-01-29 18:25
Words containing UTF-8 characters are detected incorrectly in the indexer_ backends:
UTF-8 characters currently split words.

Original in
for match in re.finditer(r'\b\w{2,25}\b', text.upper()):
  word =

Proposed replacement:
for match in re.finditer(r'\b\w{2,25}\b', unicode(text, "utf-8", "replace").upper(), re.UNICODE):
  word ="utf-8", "replace")
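
The word-splitting effect described above can be reproduced with a short Python 3 sketch. It assumes the text arrives as UTF-8 bytes; the names BYTE_WORD_RE, WORD_RE and split_words are hypothetical, and re-encoding each match to UTF-8 is an assumption, since the original lines are cut off at that point. In Python 3, str patterns are Unicode-aware by default, so re.UNICODE is not needed there.

```python
import re

# Byte pattern: \w only matches ASCII letters, digits and underscore.
BYTE_WORD_RE = re.compile(rb"\b\w{2,25}\b")
# Text pattern: \w matches Unicode word characters such as 'Ü'.
WORD_RE = re.compile(r"\b\w{2,25}\b")


def split_words(data):
    """Decode UTF-8 bytes, uppercase, and return each word re-encoded as UTF-8."""
    text = data.decode("utf-8", "replace").upper()
    return [m.group(0).encode("utf-8", "replace") for m in WORD_RE.finditer(text)]


data = "Sprünge machen".encode("utf-8")
broken = BYTE_WORD_RE.findall(data.upper())  # the bytes of 'ü' act as a word break
fixed = split_words(data)                    # 'SPRÜNGE' stays one word
```

Matching on decoded text keeps the 2 to 25 limit counting characters rather than bytes, which differs slightly from a byte-based length check for words with many non-ASCII letters.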
Date                 User       Action  Args
2016-06-26 19:29:22  rouilj     set     keywords: + patch; type: behavior
2005-10-31 16:33:51  anonymous  create