2016-02-24
Hi all, I'm not sure exactly which "non-ascii" issue you are hitting. In
my case, I only need to handle Chinese characters (English words must
still work as well). I use mmseg 1.3.0 (from the Python Package Index) to
segment all the text, and add the resulting terms to the Xapian database.
(Note that this is not specific to mmseg: other segmenters such as
fxsjy/jieba (结巴中文分词, "Jieba Chinese word segmentation") should work
just as well, because all they do is cut a long Chinese sentence into
several Chinese words.)
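
To make the segment-then-index idea concrete, here is a rough, self-contained
sketch. It is not the patch itself and does not use mmseg, jieba, or Xapian:
a naive forward maximum-matching segmenter over a tiny made-up dictionary
stands in for the real segmenter, and a plain in-memory dict stands in for
the Xapian database. All names (TOY_DICT, segment, index_document, search)
are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary; a real segmenter (mmseg, jieba) ships its own.
TOY_DICT = {"中文", "分词", "搜索", "问题"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)


def segment(text):
    """Split text into index terms.

    ASCII words are kept whole (lowercased). Runs of Chinese characters
    are cut by forward maximum matching against TOY_DICT, falling back
    to single characters when no dictionary word matches.
    """
    terms = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[A-Za-z0-9_]+", text):
        if not ("\u4e00" <= run[0] <= "\u9fff"):
            terms.append(run.lower())
            continue
        i = 0
        while i < len(run):
            # Try the longest candidate first; size 1 always succeeds.
            for size in range(min(MAX_WORD_LEN, len(run) - i), 0, -1):
                word = run[i:i + size]
                if size == 1 or word in TOY_DICT:
                    terms.append(word)
                    i += size
                    break
    return terms


def index_document(index, doc_id, text):
    """Add every segmented term of text to the inverted index."""
    for term in segment(text):
        index[term].add(doc_id)


def search(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = segment(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = defaultdict(set)
index_document(index, 1, "Roundup 中文搜索问题")
index_document(index, 2, "English only issue")
print(search(index, "搜索"))
print(search(index, "issue"))
```

The important point is that the same segmenter runs at both index time and
query time, so a query for a Chinese word matches the terms that were stored.
With a real backend, the terms produced by the segmenter would be added to
the search database instead of to the dict used here.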

Now I can search for Chinese words correctly.

If you are interested in using mmseg, I can upload my patch against
Roundup's source code.
