Roundup Tracker - Issues

Message3385

Author anonymous
Date 2005-07-15.15:31:28
Hello,

I've just started to use Roundup and it looks pretty
good. I was also able to translate some of the text via
roundup.po.

But I have to report that search in Japanese doesn't
work, I'm afraid.

There are some assumptions built into the scripts, such as:

  wordlist = re.findall(r'\b\w{2,25}\b', text.upper())

or

  letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
  segdicts = {}                           # Need batch of empty dicts
  for segment in letters:
    segdicts[segment] = {}
  for word, entry in self.words.items():  # Split into segment dicts
    initchar = word[0].upper()
    segdicts[initchar][word] = entry

However, none of Chinese, Japanese, or Korean fits these
assumptions, because CJK text has no obvious word separator
like Latin-script languages have, and of course its words
do not start with [0-9A-Z#_].
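
To make the two failure modes concrete, here is a minimal sketch that reuses the regular expression and the segment table from the snippets above. It assumes Python 3 string semantics (where \w matches CJK characters); the original scripts ran on an older Python where byte strings behave differently. The sample text and the helper names split_words and segment_for are made up for illustration.

```python
import re

LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"


def split_words(text):
    # Same pattern as the indexer snippet quoted above.
    return re.findall(r'\b\w{2,25}\b', text.upper())


def segment_for(word):
    # Same lookup as the segment-dict snippet quoted above.
    segdicts = {segment: {} for segment in LETTERS}
    initchar = word[0].upper()
    return segdicts[initchar]  # KeyError if initchar is not in LETTERS


# A short Japanese sentence comes back as one single "word",
# so a search for a word inside it can never match.
print(split_words("日本語の検索ができません。Roundup search fails."))
# ['日本語の検索ができません', 'ROUNDUP', 'SEARCH', 'FAILS']

# A run longer than 25 characters yields nothing at all, because
# no 2-25 character slice of it has a word boundary on both sides.
print(split_words("あ" * 30))
# []

# A CJK word has no slot in the segment table.
try:
    segment_for("検索")
except KeyError as exc:
    print("no segment for", exc)
```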

# Japanese is like this.
Readermustidentifywordboundaryfromthecontext.Wedohavesentenceseparators,though.


I'd really appreciate it if you could separate the splitter
or indexer code into one source file and allow end
users to customise it.
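
As a rough sketch of what such a separation could look like: the indexer takes the splitter as a plain function, so a site can swap in its own. Nothing here is Roundup's actual API. The Indexer class, default_splitter and cjk_bigram_splitter are hypothetical names, the character ranges are a simplification, and overlapping two-character chunks (bigrams) are just one common way to index text without word separators. Python 3 is assumed.

```python
import re

# Simplified ranges (an assumption): hiragana and katakana,
# CJK unified ideographs, hangul syllables.
_CJK_RUN = re.compile('[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]+')
_ASCII_WORD = re.compile(r'[0-9A-Za-z_]{2,25}')


def default_splitter(text):
    # The existing behaviour: runs of 2-25 word characters.
    return re.findall(r'\b\w{2,25}\b', text.upper())


def cjk_bigram_splitter(text):
    # ASCII words as before, plus overlapping bigrams for CJK runs.
    words = [w.upper() for w in _ASCII_WORD.findall(text)]
    for run in _CJK_RUN.findall(text):
        if len(run) == 1:
            words.append(run)
        else:
            words.extend(run[i:i + 2] for i in range(len(run) - 1))
    return words


class Indexer:
    """Toy inverted index whose splitter can be replaced."""

    def __init__(self, splitter=default_splitter):
        self.splitter = splitter
        self.words = {}

    def add_text(self, identifier, text):
        for word in self.splitter(text):
            self.words.setdefault(word, set()).add(identifier)

    def find(self, query):
        terms = self.splitter(query)
        if not terms:
            return set()
        # An item matches only if it contains every query term.
        return set.intersection(*(self.words.get(t, set()) for t in terms))


indexer = Indexer(splitter=cjk_bigram_splitter)
indexer.add_text("msg1", "日本語の検索ができません。")
print(indexer.find("検索"))  # {'msg1'}
```

With the default splitter the same search returns an empty set, because the whole sentence is stored as a single word.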

I'm not a Zope expert, but I have heard that Zope has the
ZCTextIndex mechanism, which lets you customise the
splitting behaviour without any code change in Zope.
History
Date                 User   Action  Args
2009-02-03 14:23:59  admin  link    issue1238984 messages
2009-02-03 14:23:59  admin  create