Message5464
Hi guys, I'm not sure exactly which "non-ascii" issue you have. For
me, the need is just to handle Chinese characters (English words are
still needed too). I use mmseg 1.3.0 (Python Package Index,
https://pypi.python.org/pypi/mmseg/1.3.0) to parse all the text, and
add the resulting terms to the Xapian database. (Note that not only mmseg,
but also other segmenters such as fxsjy/jieba: 结巴中文分词 ("Jieba Chinese
word segmentation"), https://github.com/fxsjy/jieba, should
work OK, because they just cut a long Chinese sentence into several
Chinese words.)
Now I can search for Chinese words correctly.
If you are interested in using mmseg, I can upload the patch against
roundup's source code.
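
To make the idea concrete, here is a rough sketch of the approach (this is not the actual patch). It assumes jieba's `jieba.cut` as the segmenter, since any segmenter that yields words should do, and falls back to a naive per-character split if jieba is not installed. The helper names `text_to_terms` and `index_terms` are made up for illustration; `add_posting` is the method on `xapian.Document`.

```python
import re

try:
    import jieba  # third-party segmenter; optional for this sketch

    def cut(text):
        return jieba.cut(text)
except ImportError:
    # Illustration-only fallback (an assumption, not what mmseg/jieba do):
    # keep ASCII words whole and split CJK text into single characters.
    def cut(text):
        return re.findall(r"[A-Za-z0-9_]+|[\u4e00-\u9fff]", text)


def text_to_terms(text, segment=cut):
    """Return lowercased index terms, dropping whitespace and punctuation."""
    terms = []
    for token in segment(text):
        token = token.strip().lower()
        # Keep only tokens with at least one word character
        # (letters, digits, CJK); this drops pure punctuation.
        if token and re.search(r"\w", token):
            terms.append(token)
    return terms


def index_terms(doc, terms):
    """Add each term with its position to a xapian.Document-like object."""
    for pos, term in enumerate(terms, start=1):
        doc.add_posting(term, pos)
```

With the real bindings, `doc` would be a `xapian.Document()` that is then added to a `xapian.WritableDatabase`. The search query has to be segmented the same way, otherwise the query terms will not match the indexed ones.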
Date | User | Action | Args
2016-02-24 06:36:55 | ollydbg | set | messageid: <1456295815.83.0.289995605814.issue2550788@psf.upfronthosting.co.za>
2016-02-24 06:36:55 | ollydbg | set | recipients: + ollydbg, ber, pefu, ThomasAH, jerome, yanqian
2016-02-24 06:36:55 | ollydbg | link | issue2550788 messages
2016-02-24 06:36:54 | ollydbg | create |