Issue 1238984
Created on 2005-07-15 15:31 by anonymous, last changed 2016-06-26 19:15 by rouilj.
msg3385 |
Author: [hidden] (anonymous) |
Date: 2005-07-15 15:31 |
|
Hello,
I've just started to use Roundup and it looks pretty
good. I was also able to translate some of the text via roundup.po.
But I'm afraid I have to report that search in Japanese
doesn't work.
Some of the scripts make assumptions like:
    wordlist = re.findall(r'\b\w{2,25}\b', text.upper())
or
    letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
    segdicts = {}  # Need batch of empty dicts
    for segment in letters:
        segdicts[segment] = {}
    for word, entry in self.words.items():  # Split into segment dicts
        initchar = word[0].upper()
        segdicts[initchar][word] = entry
However, Chinese, Japanese and Korean text fits neither
assumption: CJK has no obvious word separator like
Latin-script languages do, and of course its words do not
start with [0-9A-Z#_].
# Japanese is like this.
Readermustidentifywordboundaryfromthecontext.Wedohavesentenceseparators,though.
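To illustrate the problem described above, here is a minimal Python 3 sketch. It applies the quoted pattern to a Japanese sentence and then shows one common workaround, overlapping character bigrams. The helper names (`naive_split`, `is_cjk`, `bigram_split`) and the code point ranges are assumptions made for this example, not Roundup code, and `re.ASCII` is used only to approximate how the pattern would treat non-ASCII text when `\w` is limited to ASCII.

```python
import re
from itertools import groupby

PATTERN = r'\b\w{2,25}\b'  # the word pattern quoted in the report


def naive_split(text, flags=0):
    """Split text the way the quoted findall() call does."""
    return re.findall(PATTERN, text.upper(), flags)


def is_cjk(ch):
    """Rough, incomplete range check (assumption for this sketch):
    kana, common CJK ideographs and Hangul syllables."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF
            or 0x4E00 <= cp <= 0x9FFF
            or 0xAC00 <= cp <= 0xD7A3)


def bigram_split(text):
    """Hypothetical splitter: CJK runs become overlapping bigrams,
    other runs keep the original 2..25 character word rule."""
    tokens = []
    for run in re.findall(r'\w+', text.upper()):
        for cjk, chars in groupby(run, key=is_cjk):
            segment = ''.join(chars)
            if cjk:
                if len(segment) == 1:
                    tokens.append(segment)
                else:
                    tokens.extend(segment[i:i + 2]
                                  for i in range(len(segment) - 1))
            elif 2 <= len(segment) <= 25:
                tokens.append(segment)
    return tokens


sentence = 'これは日本語の文章です。'
print(naive_split(sentence))            # the whole sentence is one "word"
print(naive_split(sentence, re.ASCII))  # ASCII-only \w finds no words
print(bigram_split('日本語'))            # bigrams that also occur in the sentence
```

With bigrams, a query for 日本語 is split into pieces that also appear among the indexed pieces of the sentence, so a lookup can succeed without knowing the real word boundaries. A proper morphological analyser would give better results; bigrams are only the simplest option.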
I'd really appreciate it if you could separate the splitter
or indexer code into one source file and allow end
users to customise it.
I'm not a Zope expert, but I have heard that Zope has a
ZCTextIndex mechanism that lets you customise the split
behaviour without any code change in Zope.
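The kind of customisation requested here could look roughly like the following sketch, in which the indexer takes the splitter as a parameter. `PluggableIndexer`, `default_splitter` and `char_splitter` are hypothetical names invented for this example; this is neither Roundup's indexer API nor ZCTextIndex. The segment dictionaries are built with `defaultdict`, so a word whose first character is outside [0-9A-Z#_] does not raise `KeyError`.

```python
import re
from collections import defaultdict


def default_splitter(text):
    """The word rule quoted in the report."""
    return re.findall(r'\b\w{2,25}\b', text.upper())


def char_splitter(text):
    """Example replacement: index every alphanumeric character on its own."""
    return [ch for ch in text.upper() if ch.isalnum()]


class PluggableIndexer:
    """Hypothetical in-memory indexer with a user-supplied splitter."""

    def __init__(self, splitter=default_splitter):
        self.splitter = splitter
        self.words = defaultdict(set)  # word -> ids of documents containing it

    def add_text(self, doc_id, text):
        for word in self.splitter(text):
            self.words[word].add(doc_id)

    def find(self, query):
        """Return ids of documents that contain every word of the query."""
        words = self.splitter(query)
        if not words:
            return set()
        return set.intersection(*(self.words.get(w, set()) for w in words))

    def segments(self):
        """Group words by first character, creating segments on demand."""
        segdicts = defaultdict(dict)
        for word, entry in self.words.items():
            segdicts[word[0].upper()][word] = entry
        return segdicts


text = '日本語の検索。'

standard = PluggableIndexer()
standard.add_text(1, text)
print(standard.find('検索'))   # default word rule

custom = PluggableIndexer(splitter=char_splitter)
custom.add_text(1, text)
print(custom.find('検索'))     # per-character splitter
```

Because the same splitter is used for indexing and for queries, swapping it changes search behaviour without touching the rest of the indexer.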
|
msg3386 |
Author: [hidden] (richard) |
Date: 2005-07-18 01:29 |
|
Re-filing this as a feature request.
|
msg5633 |
Author: [hidden] (rouilj) |
Date: 2016-06-26 19:15 |
|
I think issue2550788 may solve this.
|
|
Date | User | Action | Args |
2016-06-26 19:15:42 | rouilj | set | superseder: Does not support non-ascii chars for All text search (with Xapian); messages: + msg5633; components: + Database, Web interface, - None; nosy: + rouilj |
2005-07-15 15:31:28 | anonymous | create | |
|