Message3385
Hello,
I've just started using Roundup and it looks pretty
good. I was also able to translate some of the text via roundup.po.
However, I'm afraid I have to report that search in Japanese doesn't
work.
The scripts make some assumptions, such as:
wordlist = re.findall(r'\b\w{2,25}\b', text.upper())
or
letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
segdicts = {}  # Need batch of empty dicts
for segment in letters:
    segdicts[segment] = {}
for word, entry in self.words.items():  # Split into segment dicts
    initchar = word[0].upper()
    segdicts[initchar][word] = entry
However, Chinese, Japanese and Korean text fits neither
assumption, because CJK text has no obvious word separator like
Latin-script languages do, and of course its words do not start with
[0-9A-Z#_].
# Japanese is like this.
Readermustidentifywordboundaryfromthecontext.Wedohavesentenceseparators,though.
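To make the problem concrete, here is a minimal sketch that runs the two quoted snippets against a Japanese sentence. The sample sentence is my own illustration, and the sketch assumes Python 3, where \w matches Unicode word characters; on an older interpreter working with byte strings the details may differ, but the search term is not found either way.

```python
import re

# Illustrative sample sentence (not from the tracker): "I live in Tokyo."
text = "私は東京に住んでいます。"

# The word-splitting pattern quoted above.
wordlist = re.findall(r'\b\w{2,25}\b', text.upper())

# Without spaces, the whole sentence is returned as one "word",
# so a search for the word "東京" (Tokyo) cannot match an indexed token.
found_tokyo = "東京" in wordlist

# The segment lookup quoted above: the first character of a CJK word
# is not one of `letters`, so segdicts[initchar] would raise KeyError.
letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
segdicts = {segment: {} for segment in letters}
initchar = "東京"[0].upper()
has_segment = initchar in segdicts

print(wordlist, found_tokyo, has_segment)
```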
I'd really appreciate it if you could separate the splitter
or indexer code into one source file and allow end
users to customise it.
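As a rough sketch of what such a customisable splitter could look like, the indexer below takes the splitter as a parameter. All names here (default_splitter, cjk_bigram_splitter, index_text) are hypothetical and not Roundup's actual API. The CJK splitter uses overlapping character bigrams, a common technique for text without word separators; I am not claiming this is what Roundup or ZCTextIndex does.

```python
import re

# Hypothetical names throughout; none of this is Roundup's actual API.
WORD = re.compile(r'\b\w{2,25}\b')
# Hiragana, katakana, CJK ideographs (incl. Extension A), Hangul syllables.
CJK_RUN = re.compile(r'[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+')


def default_splitter(text):
    """Current behaviour: words delimited by spaces or punctuation."""
    return WORD.findall(text.upper())


def cjk_bigram_splitter(text):
    """Split CJK runs into overlapping 2-character tokens;
    handle everything else with the default word pattern."""
    text = text.upper()
    tokens = []
    pos = 0
    for match in CJK_RUN.finditer(text):
        # Non-CJK text before this run goes through the word pattern.
        tokens.extend(WORD.findall(text[pos:match.start()]))
        run = match.group()
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        pos = match.end()
    tokens.extend(WORD.findall(text[pos:]))
    return tokens


def index_text(text, splitter=default_splitter):
    """Stand-in for the indexer: count occurrences of each token."""
    counts = {}
    for token in splitter(text):
        counts[token] = counts.get(token, 0) + 1
    return counts


index = index_text("私は東京に住んでいます。", splitter=cjk_bigram_splitter)
print("東京" in index)
```

A search query would need to go through the same splitter, so that a two-character query such as "東京" is looked up as a bigram. The segment dictionaries would also need a fallback segment for initial characters outside [0-9A-Z#_].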
I'm not a Zope expert, but I have heard that Zope has
a ZCTextIndex mechanism and that you can customise the splitting
behaviour without any code change in Zope.
Date | User | Action | Args
2009-02-03 14:23:59 | admin | link | issue1238984 messages
2009-02-03 14:23:59 | admin | create |