Roundup Tracker - Issues

Issue 1238984

classification
Title: Full text search in CJK does not work
Type: rfe  Severity: normal
Components: Database, Web interface  Versions:
process
Status: open
Superseder: Does not support non-ascii chars for All text search (with Xapian)
View: 2550788
Nosy List: richard, rouilj
Priority: normal

Created on 2005-07-15 15:31 by anonymous, last changed 2016-06-26 19:15 by rouilj.

Messages
msg3385 Author: [hidden] (anonymous) Date: 2005-07-15 15:31
Hello,

I've just started to use Roundup and it looks pretty
good. I was also able to translate some of the text via roundup.po.

But I have to report that search in Japanese doesn't
work, I'm afraid.

There are some assumptions built into the scripts, such as:

  wordlist = re.findall(r'\b\w{2,25}\b', text.upper())

or

  letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
  segdicts = {}                           # Need batch of empty dicts
  for segment in letters:
    segdicts[segment] = {}
  for word, entry in self.words.items():  # Split into segment dicts
    initchar = word[0].upper()
    segdicts[initchar][word] = entry

However, none of Chinese, Japanese or Korean fits these
assumptions: CJK text has no obvious word separator the
way Latin-script languages do, and of course its words do
not start with [0-9A-Z#_].
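Both assumptions can be reproduced with a small sketch. It assumes Python 3, where `str` patterns are Unicode-aware by default (under Python 2 without the `re.UNICODE` flag, `\w` matched only ASCII letters, digits and underscore, so CJK text was dropped outright). The helper names `split_words` and `segment_words` are hypothetical, not Roundup's own. CJK characters count as `\w`, so an unbroken Japanese run becomes one opaque token, or no token at all once it is longer than 25 characters; and a word whose first character is not in `letters` has no segment to go into.

```python
import re

# Pattern and segment letters copied from the snippets quoted above.
WORD_RE = re.compile(r'\b\w{2,25}\b')
LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"

def split_words(text):
    """Hypothetical helper: apply the quoted splitter to a string."""
    return WORD_RE.findall(text.upper())

def segment_words(words):
    """Hypothetical helper: group words by first character, as quoted."""
    segdicts = {segment: {} for segment in LETTERS}
    for word, entry in words.items():
        segdicts[word[0].upper()][word] = entry  # KeyError for CJK initials
    return segdicts

print(split_words("Full text search"))  # ['FULL', 'TEXT', 'SEARCH']
print(split_words("全文検索"))           # ['全文検索']: one opaque token
print(split_words("検索" * 13))          # []: 26 characters, too long to match

try:
    segment_words({"全文検索": ["msg3385"]})
except KeyError as exc:
    print("no segment for", exc)         # no segment for '全'
```

So a later search for 検索 cannot match 全文検索: the index holds only the whole four-character run, if it indexes the text at all.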

# Japanese looks like this:
Readermustidentifywordboundaryfromthecontext.Wedohavesentenceseparators,though.
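One common way to index text that has no word separators, offered here as a swapped-in technique rather than anything Roundup does, is overlapping character bigrams: every pair of adjacent CJK characters becomes a token, and queries are split the same way. The sketch below assumes the Unicode ranges in `CJK_RUN_RE` (hiragana, katakana, CJK unified ideographs, Hangul syllables) are a good-enough approximation of "CJK", and it ends a run at punctuation such as the sentence separator 。 so that no bigram spans two sentences. `bigram_split` is a hypothetical name.

```python
import re

# Assumed approximation of "CJK": hiragana, katakana, CJK unified
# ideographs and Hangul syllables. Punctuation such as 。 (U+3002) is
# outside these ranges, so sentence separators naturally end a run.
CJK_RUN_RE = re.compile(r'[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+')
WORD_RE = re.compile(r'\b\w{2,25}\b')  # the original pattern

def bigram_split(text):
    """Hypothetical splitter: bigrams for CJK runs, words for the rest."""
    text = text.upper()
    tokens = []
    for run in CJK_RUN_RE.findall(text):
        if len(run) == 1:
            tokens.append(run)  # keep single characters searchable
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    # Blank out CJK runs, then apply the original word pattern to the rest.
    rest = CJK_RUN_RE.sub(' ', text)
    tokens.extend(WORD_RE.findall(rest))
    return tokens

print(bigram_split("全文検索。Roundup"))  # ['全文', '文検', '検索', 'ROUNDUP']
```

A query for 検索 yields the token 検索, which is now in the index for 全文検索. Bigrams can give false positives (all of a query's bigrams may occur in a document without being adjacent), so a fuller implementation would usually also check token positions; that is left out here.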


I would really appreciate it if you could move the splitter
or indexer code into a single source file and allow end
users to customise it.

I'm not a Zope expert, but I have heard that Zope has the
ZCTextIndex mechanism, which lets you customise splitting
behaviour without changing any Zope code.
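The customisation requested above could look roughly like the following sketch: the indexer takes the splitter as a parameter and uses the same function for both indexing and querying, so a site could swap in a CJK-aware splitter without editing indexer code. This is a hypothetical design, not Roundup's API and not ZCTextIndex's; `TextIndexer`, `default_splitter` and `cjk_unigram_splitter` are illustrative names.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

Splitter = Callable[[str], List[str]]

def default_splitter(text: str) -> List[str]:
    """The pattern quoted in this issue: 2-25 word characters."""
    return re.findall(r'\b\w{2,25}\b', text.upper())

def cjk_unigram_splitter(text: str) -> List[str]:
    """Illustrative alternative: each CJK character is its own token,
    plus runs of 2-25 ASCII letters, digits or underscores."""
    return re.findall(
        r'[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]|[A-Z0-9_]{2,25}',
        text.upper())

class TextIndexer:
    """Hypothetical minimal inverted index with a pluggable splitter."""

    def __init__(self, splitter: Splitter = default_splitter) -> None:
        self.splitter = splitter
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def add_text(self, key: str, text: str) -> None:
        for word in self.splitter(text):
            self.index[word].add(key)

    def find(self, query: str) -> Set[str]:
        """Return the keys whose text contains every token of the query."""
        words = self.splitter(query)
        if not words:
            return set()
        result = set(self.index.get(words[0], set()))
        for word in words[1:]:
            result &= self.index.get(word, set())
        return result

default = TextIndexer()
default.add_text("msg3385", "全文検索が動きません")
print(default.find("検索"))  # set()

cjk = TextIndexer(splitter=cjk_unigram_splitter)
cjk.add_text("msg3385", "全文検索が動きません")
print(cjk.find("検索"))      # {'msg3385'}
```

Unigram matching ignores character order, so it trades precision for recall; it only stands in for whatever splitter a site would actually plug in.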
msg3386 Author: [hidden] (richard) Date: 2005-07-18 01:29

Re-filing this as a feature request. 
msg5633 Author: [hidden] (rouilj) Date: 2016-06-26 19:15
I think issue2550788 may solve this.
History
Date                 User       Action  Args
2016-06-26 19:15:42  rouilj     set     superseder: Does not support non-ascii chars for All text search (with Xapian)
                                        messages: + msg5633
                                        components: + Database, Web interface, - None
                                        nosy: + rouilj
2005-07-15 15:31:28  anonymous  create