Roundup Tracker - Issues

Message5140

Author smcgraw
Recipients rouilj, smcgraw
Date 2014-09-13.01:56:37
Message-id <1410573399.54.0.0690584272296.issue2550851@psf.upfronthosting.co.za>
In-reply-to
John Rouillard wrote:
> Are you in a position to try out roundup with asian
> languages without using the CJKCodecs?

Well, we can try it right now. :-)

  このメッセージは日本語で書かれました。

You should be able to paste that into Google Translate or
similar, ask for Japanese -> English, and get something
sensible back.  (I'm assuming Roundup's tracker didn't
install the cjkcodecs.  If that's wrong, I also submitted a
similar issue on a newly installed 1.5.0 tracker, with no ill
effects there either.)
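
The same kind of check can be done locally. This is only a minimal
sketch, assuming a Python 3 interpreter with no third-party codec
packages installed; the list of encodings tried is an illustrative
choice and not something taken from Roundup itself.

```python
# The sentence from the message above
# ("This message was written in Japanese.").
text = "このメッセージは日本語で書かれました。"

# A few common Japanese encodings that ship with CPython itself
# (illustrative selection; assumption, not a Roundup setting).
encodings = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp", "cp932"]

# Encode and decode the text with each codec and record whether
# the round trip returns the original string.
results = {}
for name in encodings:
    encoded = text.encode(name)
    results[name] = encoded.decode(name) == text

for name, ok in results.items():
    print(name, "round-trips" if ok else "FAILED")
```

If every encoding round-trips, the interpreter's built-in codecs are
enough for this text.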

Additionally, if you download the cjkcodecs package from
Sourceforge (http://sourceforge.net/projects/cjkpython.berlios/)
you'll see that its author is Hye-Shik Chang <perky@FreeBSD.org>
and that its last update was in 2004.

The Python revision where the cjkcodecs were added to 
cpython is:
  http://hg.python.org/cpython/rev/69dadc2ca14d
also in 2004 and also by Hye-Shik Chang <hyeshik@gmail.com>.
(That is, the cjkcodecs added to Python *are* the ones
from the cjkcodecs package.)

I compared the codecs in the cjkcodecs package with those
listed in the Python codecs module docs.  For the Python list
I copy-pasted the table from the docs, removed all rows that
weren't CJK encodings, and deleted all but the first column.
For the cjkcodecs list I took the names of the *.py files in
the package.  The result is below: the first column holds
codecs that are in the cjkcodecs package but not in Python's
codecs module, and the second column holds codecs that are
in both.

		big5
		big5hkscs
		cp932
		cp949
		cp950
		euc_jis_2004
		euc_jisx0213
		euc_jp
		euc_kr
euc_tw
		gb18030
		gb2312
		gbk
		hz
iso2022_cn
		iso2022_jp
		iso2022_jp_1
		iso2022_jp_2
		iso2022_jp_2004
		iso2022_jp_3
		iso2022_jp_ext
		iso2022_kr
		johab
		shift_jis
		shift_jis_2004
		shift_jisx0213
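
The comparison can also be approximated from the interpreter instead
of from the docs table. The following is a rough sketch, assuming a
stock CPython 3 with no cjkcodecs package installed; the name list is
copied from the two columns above, and the variable names are
illustrative.

```python
import codecs

# Codec names taken from the two-column list above.
cjk_names = [
    "big5", "big5hkscs", "cp932", "cp949", "cp950",
    "euc_jis_2004", "euc_jisx0213", "euc_jp", "euc_kr", "euc_tw",
    "gb18030", "gb2312", "gbk", "hz", "iso2022_cn",
    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",
    "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr", "johab",
    "shift_jis", "shift_jis_2004", "shift_jisx0213",
]

available = []
missing = []
for name in cjk_names:
    try:
        # codecs.lookup raises LookupError for unknown encodings.
        codecs.lookup(name)
        available.append(name)
    except LookupError:
        missing.append(name)

print("built in:", len(available))
print("not built in:", missing)
```

On an interpreter that matches those assumptions, the names reported
as not built in should correspond to the first column of the list.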

I do not know why euc-tw and iso2022-cn were left out of 
Python.  However, Wikipedia 
  (http://en.wikipedia.org/wiki/Extended_Unix_Code)
says,
 "It [euc-tw] is a rarely used encoding for traditional 
  Chinese characters as used on Taiwan. Big5 is much
  more common."

As for iso2022-cn, other projects had problems with it and 
decided to do without it, e.g.:
  https://bugzilla.mozilla.org/show_bug.cgi?id=470523

Given the many other, more popular encodings for Chinese, 
the lack of those two would not seem to present a serious 
barrier to communication.

So I see no reason why Roundup should continue to recommend 
the cjkcodecs package.
History
Date                 User     Action  Args
2014-09-13 01:56:39  smcgraw  set     messageid: <1410573399.54.0.0690584272296.issue2550851@psf.upfronthosting.co.za>
2014-09-13 01:56:39  smcgraw  set     recipients: + smcgraw, rouilj
2014-09-13 01:56:39  smcgraw  link    issue2550851 messages
2014-09-13 01:56:38  smcgraw  create