Issue 1238984: Full text search in CJK does not work - Roundup tracker

classification

Title:	Full text search in CJK does not work
Type:	rfe	Severity:	normal
Components:	Database, Web interface	Versions:

process

Status:	open	Resolution:
Dependencies		Superseder:	Does not support non-ascii chars for All text search (with Xapian) View: 2550788
Assigned To:		Nosy List:	richard, rouilj
Priority:	normal	Keywords:

Created on 2005-07-15 15:31 by anonymous, last changed 2016-06-26 19:15 by rouilj.

Messages
msg3385	Author: [hidden] (anonymous)	Date: 2005-07-15 15:31
Hello,

I've just started using Roundup and it looks pretty good. I was also able to translate some of the text via roundup.po. However, I have to report that search in Japanese doesn't work, I'm afraid.

The scripts contain some assumptions, such as:

    wordlist = re.findall(r'\b\w{2,25}\b', text.upper())

or

    letters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#_"
    segdicts = {}  # Need batch of empty dicts
    for segment in letters:
        segdicts[segment] = {}
    for word, entry in self.words.items():  # Split into segment dicts
        initchar = word[0].upper()
        segdicts[initchar][word] = entry

However, none of Chinese, Japanese, or Korean suits these assumptions, because CJK text has no obvious word separator like Latin-script languages do, and of course its words do not start with [0-9A-Z#_].

# Japanese is like this: Readermustidentifywordboundaryfromthecontext.Wedohavesentenceseparators,though.

I'd greatly appreciate it if you could separate the splitter or indexer code into one source file and allow end users to customise it. I'm not a Zope expert, but I've heard that Zope has the ZCTextIndex mechanism, which lets you customise the splitting behaviour without any code changes in Zope.
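For illustration, here is a minimal sketch of the kind of replaceable splitter the report asks for. It is not Roundup's indexer code: the function name `split_words`, the `CJK_RUN` character ranges, and the choice of overlapping character bigrams for CJK runs are all assumptions made for this example. Non-CJK text still goes through the regex quoted in the report.

```python
import re

# Word pattern quoted in the report: runs of 2 to 25 "word" characters.
LATIN_WORD = re.compile(r"\b\w{2,25}\b")

# Assumed CJK ranges for this sketch only: Hiragana, Katakana,
# CJK Unified Ideographs, and Hangul syllables. Real text may need more.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+")


def split_words(text):
    """Hypothetical splitter: bigrams for CJK runs, regex words otherwise."""
    terms = []
    pos = 0
    for match in CJK_RUN.finditer(text):
        # Index the non-CJK text before this run with the original pattern.
        terms.extend(LATIN_WORD.findall(text[pos:match.start()].upper()))
        run = match.group()
        if len(run) == 1:
            # A lone CJK character is kept as a single term.
            terms.append(run)
        else:
            # Overlapping two-character terms need no word boundaries.
            terms.extend(run[i:i + 2] for i in range(len(run) - 1))
        pos = match.end()
    # Index whatever non-CJK text follows the last CJK run.
    terms.extend(LATIN_WORD.findall(text[pos:].upper()))
    return terms


print(split_words("Roundup は全文検索"))
```

With a splitter like this, a search query would have to be passed through the same function so that its bigrams line up with the indexed ones. The segment-dictionary code quoted above would also need to accept initial characters outside [0-9A-Z#_].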
msg3386	Author: [hidden] (richard)	Date: 2005-07-18 01:29
Re-filing this as a feature request.
msg5633	Author: [hidden] (rouilj)	Date: 2016-06-26 19:15
I think issue2550788 may solve this.

History
Date	User	Action	Args
2016-06-26 19:15:42	rouilj	set	superseder: Does not support non-ascii chars for All text search (with Xapian) messages: + msg5633 components: + Database, Web interface, - None nosy: + rouilj
2005-07-15 15:31:28	anonymous	create