Roundup Tracker - Issues

Message2047

Author anonymous
Recipients
Date 2007-01-29.18:25:54
Message-id
In-reply-to

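Words containing UTF-8 encoded non-ASCII characters are detected incorrectly in the indexer_ backends.
Without re.UNICODE, \w matches only ASCII word characters, so every multi-byte UTF-8 character currently splits the word it appears in.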

Original in indexer_xapian.py:
for match in re.finditer(r'\b\w{2,25}\b', text.upper()):
  word = match.group(0)

Proposed fix (decode to unicode before matching, then re-encode each word):
for match in re.finditer(r'\b\w{2,25}\b', unicode(text, "utf-8","replace").upper(), re.UNICODE):
  word = match.group(0).encode("utf-8", "replace")
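
To make the difference concrete, here is a rough standalone sketch of the two behaviours. It is written for Python 3, where a bytes pattern behaves like the original byte-string matching; the function names words_bytewise and words_unicode and the sample text are made up for illustration and are not part of indexer_xapian.py.

```python
import re

def words_bytewise(data):
    # Hypothetical helper: matches on raw UTF-8 bytes, as the original code
    # effectively does. With a bytes pattern, \w only matches ASCII word
    # characters, so any multi-byte character acts as a word boundary.
    return [m.group(0) for m in re.finditer(rb'\b\w{2,25}\b', data.upper())]

def words_unicode(data):
    # Hypothetical helper: decode first, match on text, re-encode each word,
    # mirroring the proposed fix. In Python 3, str patterns are Unicode-aware
    # by default, so re.UNICODE is not needed.
    text = data.decode("utf-8", "replace")
    return [m.group(0).encode("utf-8", "replace")
            for m in re.finditer(r'\b\w{2,25}\b', text.upper())]

sample = "na\u00efve caf\u00e9".encode("utf-8")  # assumed sample input
print(words_bytewise(sample))  # words are cut at each non-ASCII character
print(words_unicode(sample))   # whole words survive
```

With the bytewise version the two sample words come back as three ASCII fragments; with the decoded version they come back as two complete words.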
History
Date User Action Args
2009-02-03 14:21:29 admin link issue1344046 messages
2009-02-03 14:21:29 admin create