Roundup Tracker - Issues

Issue 732567

classification
Title: unicode problem
Type:
Severity: normal
Components: Mail interface
Versions:
process
Status: closed
Resolution: fixed
Assigned To: richard
Nosy List: anthonybaxter, kedder, richard
Priority: normal

Created on 2003-05-05 08:37 by anonymous, last changed 2003-05-06 21:54 by kedder.

Messages
msg796 Author: [hidden] (anonymous) Date: 2003-05-05 08:37
it looks like all mail is sent out with headers indicating
the text is utf-8, even if the incoming mail was in latin-1.
the outgoing mail contains the same bytes as the incoming
mail did, just the encoding indicated in the headers is
changed.

imho incoming and outgoing mail should be in latin-1 if
possible since most environments (on unix at least) can't
handle unicode mail.
msg797 Author: [hidden] (kedder) Date: 2003-05-05 08:50
Logged In: YES 
user_id=218539

well, actually unix perfectly CAN handle unicode (like most
environments i know). And even if some software knows
nothing about utf-8 it should display the message correctly,
if it was written using latin characters only.

What kind of problems did you actually have with utf-8
encoding? What mail software do you use?
msg798 Author: [hidden] (anthonybaxter) Date: 2003-05-05 09:22
Logged In: YES 
user_id=29957

You really shouldn't be re-encoding the header unless it's
necessary. Doing so gratuitously means more work and possible
mailer compatibility issues (the person sending the message is
guaranteed to be able to read headers in the format that they
sent them, but not necessarily in encoded format).
msg799 Author: [hidden] (anonymous) Date: 2003-05-05 09:45
Logged In: NO 

utf-8 is compatible with ascii, but not with latin-1 --
non-ascii latin-1 characters interpreted as utf-8 result in
garbage.
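
The point about ASCII versus latin-1 compatibility can be checked with a small sketch. This is only an illustration in modern Python 3 (not the Python 2.x that Roundup ran on at the time), and the sample word is an arbitrary assumption:

```python
# Illustration only: the sample text is arbitrary, not taken from the tracker.
text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes

# Pure ASCII text produces identical bytes in both encodings.
ascii_same = "cafe".encode("latin-1") == "cafe".encode("utf-8")

# Latin-1 bytes are not valid UTF-8: a strict decode raises an error.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode substitutes U+FFFD, which is the "garbage" a reader sees.
garbled = latin1_bytes.decode("utf-8", errors="replace")
```

Here `strict_ok` ends up `False`, while `ascii_same` is `True`, which matches the claim that only non-ASCII latin-1 characters are affected.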

yes, many unix platforms have some support for utf-8, but it
often does not work sufficiently well and many sites use
non-unicode capable programs. i don't think it is realistic
to require roundup users to switch to unicode.

mail programs that exhibit these problems include at least
mutt and mh.

the problem also shows up in the web interface. when mail
with latin-1 accented characters is sent to roundup, the web
interface does not show those correctly.
msg800 Author: [hidden] (kedder) Date: 2003-05-05 09:52
Logged In: YES 
user_id=218539

currently headers are re-encoded only if the header
contains symbols other than

   -a-zA-Z0-9!*+/[]., 

So latin-only users shouldn't really worry about
re-encoding. All others most probably use mailers that
support RFC 2047 header encoding (there is no other valid
way to put native characters into headers except RFC 2047
encoding).

I can't see any problems with that. Anyway, could you
provide any use case where the current behavior can cause
problems?
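
The rule described in this message can be sketched as a whitelist check. This is a hypothetical reconstruction from the character list quoted above (with the space character assumed to be allowed as well), not the actual Roundup code; `needs_encoding` is a name made up for illustration:

```python
import re

# Assumption: matches any character OUTSIDE the set quoted in the message,
# plus space. The real regexp in Roundup may differ.
UNSAFE_CHAR = re.compile(r"[^-a-zA-Z0-9!*+/\[\]., ]")

def needs_encoding(header_value: str) -> bool:
    """Return True if the whitelist rule would re-encode this header."""
    return UNSAFE_CHAR.search(header_value) is not None

plain = needs_encoding("[dev378] Create new table, poll in cs control panel")
quoted = needs_encoding('Create new "monitored events" table')
```

Under this rule `plain` is `False`, but `quoted` is `True`, because the double quote is plain ASCII yet absent from the whitelist.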
msg801 Author: [hidden] (anonymous) Date: 2003-05-05 10:08
Logged In: NO 

to clarify: i was talking about the message body, not
headers. haven't tried putting weird characters in headers
yet...
msg802 Author: [hidden] (anonymous) Date: 2003-05-05 10:52
Logged In: NO 

i stared a bit more at the weird characters appearing in the
web interface when i send in mails with latin-1 accents. it
seems to display correctly if i select "utf-8" from my
browser's encoding menu. so it would seem the incoming mail
is indeed converted to utf-8 from latin-1.
msg803 Author: [hidden] (kedder) Date: 2003-05-05 11:04
Logged In: YES 
user_id=218539

oh... Are you using cvs trunk version with tracker
previously used with roundup-0.5.x? 

Did you follow the doc/upgrading.txt instructions (there is a
section "0.6.0 Multilingual character set support")? And if
you did, which browser are you using?
msg804 Author: [hidden] (anthonybaxter) Date: 2003-05-06 05:13
Logged In: YES 
user_id=29957

Why do the perfectly valid characters " and ' trigger UTF-8
encoding? Looking through the messages I have here, these
consistently trigger the header mangling, and there's no
reason for them to do so.
msg805 Author: [hidden] (richard) Date: 2003-05-06 05:23
Logged In: YES 
user_id=6405

Are you sure they're not M$ "enhanced" characters?
msg806 Author: [hidden] (anthonybaxter) Date: 2003-05-06 05:28
Logged In: YES 
user_id=29957

Positive. For instance, here's a raw header:
Subject: [dev378]
=?utf-8?q?Create_new_=22monitored_events=22_table,
        _poll_in_cs_control_panel?=

Note that ascii 0x22 is "

kedder, in an earlier comment, noted that anything outside
-a-zA-Z0-9!*+/[].,
is encoded. The allowed set should be extended to cover
single and double quotes.
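
To confirm that the encoded header above hides nothing but ASCII, it can be decoded with the standard library. A minimal sketch in modern Python 3, assuming the encoded word is joined back onto one line (the fold in the raw header is treated here as a wrapping artifact):

```python
from email.header import decode_header

# The encoded word from the raw header above, unfolded onto a single line
# (an assumption about how the original header was laid out).
raw = ("=?utf-8?q?Create_new_=22monitored_events=22_table,"
       "_poll_in_cs_control_panel?=")

# decode_header returns a list of (bytes, charset) pairs; there is one here.
(payload, charset), = decode_header(raw)
subject = payload.decode(charset)

# True when every character is below 128, i.e. no encoding was needed.
is_plain_ascii = all(ord(ch) < 128 for ch in subject)
```

The decoded subject contains only the two double quotes as "special" characters, so `is_plain_ascii` is `True`.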
msg807 Author: [hidden] (richard) Date: 2003-05-06 05:34
Logged In: YES 
user_id=6405

I'm _far_ from knowledgeable on this (hey, I only just learnt
last week what UTF-8 actually is, and still have a _long_ way
to go in figuring out its applications ;)

We should probably extend that charset to everything <ASCII
128, shouldn't we?
msg808 Author: [hidden] (kedder) Date: 2003-05-06 07:37
Logged In: YES 
user_id=218539

yep, there is definitely room for improvement... But i've
copied this regexp from rfc2822 (IIRC) module, which comes
with python2.2, but not with python2.1. So i thought these
chars would be enough (or maybe they are defined in that
rfc (?)).

Anyway extending character set is not too difficult... Will
this resolve the issue?
msg809 Author: [hidden] (anonymous) Date: 2003-05-06 07:40
Logged In: NO 

re web interface: yes, this is an installation upgraded from
0.5 to cvs.
there were no changes to the templates though and i assumed
they would be overwritten. so the web interface problem
might be because of that. but the mail side bug is still
unexplained.

in roundup-admin i get utf-8 encoded data with "get content
msg13".
msg810 Author: [hidden] (anthonybaxter) Date: 2003-05-06 07:45
Logged In: YES 
user_id=29957

Adding those two characters will shut me up, anyway :)

msg811 Author: [hidden] (anonymous) Date: 2003-05-06 07:46
Logged In: NO 

sigh!

now that i look more carefully it seems the outgoing mails
with headers indicating charset=utf-8 are indeed correctly
utf-8 encoded and not just byte copies of the incoming
latin-1 mail.

so now the issue becomes what to do about environments that
can't handle utf-8 -- i guess there should be a configuration
option saying which character set to use. if it's not possible
to represent the data in that character set, then it would be
acceptable to fall back to utf-8. how does that sound?

  -- erno
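
The proposal in this message (a configured character set with a utf-8 fallback) could look roughly like the sketch below. It is written in modern Python 3, and `encode_body` and its `preferred_charset` parameter are hypothetical names, not an existing Roundup option:

```python
def encode_body(text: str, preferred_charset: str = "iso-8859-1"):
    """Encode text in the configured charset, falling back to utf-8.

    Returns (payload_bytes, charset_name) so the caller can set the
    Content-Type charset to match what was actually used.
    """
    try:
        return text.encode(preferred_charset), preferred_charset
    except UnicodeEncodeError:
        # Some character is not representable in the preferred charset.
        return text.encode("utf-8"), "utf-8"

latin_payload, latin_charset = encode_body("caf\u00e9")  # fits in latin-1
cyr_payload, cyr_charset = encode_body("\u0416")         # Cyrillic, does not fit
```

Returning the charset alongside the bytes keeps the header and the body consistent, which is the mismatch the original report was worried about.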
msg812 Author: [hidden] (anthonybaxter) Date: 2003-05-06 07:54
Logged In: YES 
user_id=29957

No, look - it's not a matter of "what charset to use". It's
simply a matter of _leaving_ _it_ _alone_ unless you actually
need to do something about it.

_Don't_ gratuitously use RFC 2047 encoding. There is no point.
Use the format that the user sent the message in - that's the
only guarantee you have about the user's MUA capabilities.

_IF_ the message is entered with a subject that is outside
the ASCII range, _then_ use encoding. Otherwise, Leave It
Alone.
msg813 Author: [hidden] (kedder) Date: 2003-05-06 07:58
Logged In: YES 
user_id=218539

nobody: starting from upcoming 0.6 roundup stores all data
internally in utf-8.  So now roundup-admin outputs data in
utf-8. 

However, it is possible to convert roundup-admin output to
the encoding set by the locale, ignoring conversion errors
(they may occur if a character is not defined in the current
locale, so locally undefined chars will be shown as "?"). Is
that what you want?
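
The conversion offered here might be sketched as follows in modern Python 3. `to_locale_bytes` is a hypothetical helper, not part of roundup-admin; it relies only on the standard `locale` module and the `"replace"` error handler, which substitutes "?" for unencodable characters:

```python
import locale

def to_locale_bytes(text: str, encoding=None) -> bytes:
    """Encode text for terminal output in the locale's encoding.

    Characters the target encoding cannot represent become "?".
    """
    if encoding is None:
        # Ask the OS/locale settings which encoding the user prefers.
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors="replace")

# Explicit encoding so the result does not depend on the machine's locale.
sample = to_locale_bytes("caf\u00e9 \u0416", "latin-1")
```

With latin-1 as the target, the accented letter survives and the Cyrillic letter is replaced by "?".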
msg814 Author: [hidden] (kedder) Date: 2003-05-06 08:03
Logged In: YES 
user_id=218539

richard: that's exactly what i'm talking about.
msg815 Author: [hidden] (anonymous) Date: 2003-05-06 08:18
Logged In: NO 

I agree in principle with Anthony Baxter, but doing it
that way seems like more work. the charset-in-config-file
solution would be something I could hack in right away. I'll
just keep it as a local hack then.

  -- erno
msg816 Author: [hidden] (kedder) Date: 2003-05-06 21:54
Logged In: YES 
user_id=218539

I've tweaked the header encoding code a bit; it should now
pass ASCII-only strings as-is.
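
The behaviour described in this last message (leave ASCII-only headers alone, encode everything else) can be approximated with the standard `email.header` module in modern Python 3. This is a sketch under that assumption, not the code that was committed; `encode_subject` is a hypothetical name:

```python
from email.header import Header, decode_header, make_header

def encode_subject(value: str) -> str:
    """Return value unchanged if it is pure ASCII, else RFC 2047-encode it."""
    try:
        value.encode("ascii")
        return value  # nothing outside ASCII: pass through as-is
    except UnicodeEncodeError:
        return Header(value, "utf-8").encode()

untouched = encode_subject('Create new "monitored events" table')
encoded = encode_subject("caf\u00e9")

# Decoding the encoded form should give back the original text.
round_trip = str(make_header(decode_header(encoded)))
```

With this rule the quoted subject from earlier in the thread comes back unchanged, and only genuinely non-ASCII subjects become `=?utf-8?...?=` encoded words.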
History
Date User Action Args
2003-05-05 08:37:38 anonymous create