Issue 1381559
Created on 2005-12-15 12:39 by ber, last changed 2006-01-30 16:58 by ber.
msg2070 |
Author: [hidden] (ber) |
Date: 2005-12-15 12:39 |
|
When you have a type="text/plain" file attachment,
the nosy mailer does not pick a transfer encoding
and always uses 7bit.
This is wrong for texts with umlauts.
My patch has a workaround and outlines a better
solution. It should apply to CVS
from yesterday.
|
msg2071 |
Author: [hidden] (a1s) |
Date: 2005-12-25 16:31 |
|
i am sorry, i do not think that charset guessing as in the
attached patch is the right thing. if the charset must be
specified, it should be explicitly set in the file mime type
(you can use database detectors to apply site defaults).
committed in the HEAD branch (will appear in 0.9) is a fix
that checks if a text/plain attachment can be 7bit-encoded,
and uses quoted-printable encoding if it cannot.
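
A minimal sketch of the kind of check described here, not the code that was committed: it treats content as 7bit-safe when it is pure ASCII and falls back to quoted-printable otherwise. The function name `choose_transfer_encoding` is hypothetical, and a complete 7bit check would also have to look at line lengths.

```python
import quopri
from typing import Tuple


def choose_transfer_encoding(content: bytes) -> Tuple[str, bytes]:
    """Return (Content-Transfer-Encoding, encoded body) for an attachment.

    Assumption: "can be 7bit-encoded" is approximated as "is pure ASCII".
    """
    try:
        content.decode("ascii")
    except UnicodeDecodeError:
        # At least one byte has the high bit set: 7bit would be wrong.
        return "quoted-printable", quopri.encodestring(content)
    return "7bit", content


# A latin-1 text with umlauts needs quoted-printable.
encoding, body = choose_transfer_encoding("Grüße".encode("latin-1"))
```

Note that this only makes the transport safe. The receiving client still has to know or guess the charset of the decoded bytes.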
|
msg2072 |
Author: [hidden] (ber) |
Date: 2005-12-27 10:49 |
|
Hi Alexander,
thanks for answering and helping to get this bug fixed.
I completely agree with you that charset guessing is
the wrong solution, but actually it _is_ a solution.
Unless the Charset is saved somewhere, I do not see a better
solution, though.
Guesses about the encoding will not fix the bug that you send
emails out with the wrong charset (= text encoding).
Thus this bug probably should still be open.
If you suggest saving the charset as part of the filetype
string: this could potentially be a solution, but it might break
the current assumption about what is in this string. The charset
usually is not part of the MIME-Type, but of the email Content-Type.
This is why I propose to save the charset in an extra field
if the filetype is "text/plain". This would be a larger change
to roundup, as all input and output channels would need to be
checked to save the charset or guess it instead.
Also I do not fully understand what you mean by: applying
site defaults. A good site will get utf-8 and latin-1
encoded text-attachments and must record that encoding and
possibly recode it at a few occasions.
Bernhard
|
msg2073 |
Author: [hidden] (a1s) |
Date: 2005-12-27 11:15 |
|
i do not understand what you mean by differentiating
"MIME-Type" and "Content-Type". i am not aware of any
"MIME-Type" other than content type.
charset is a perfectly acceptable parameter of the MIME content
type, so the type property of files in the classic schema is the
right place to store the name of the character set.
by "site defaults" i mean that latin1 may be a mostly correct
guess at your site, but is absolutely incorrect for my site
where text attachments most probably will be encoded with
cp866 or cp1251.
i consider this bug fixed and closed, but won't close it
again if you insist on having it open.
|
msg2074 |
Author: [hidden] (ber) |
Date: 2005-12-30 07:40 |
|
Hi Alexander,
first, thanks for not insisting on closing the bug.
Let me try to explain why I believe that the change you
have described does not solve the question.
Here is a case where things go wrong:
Let us assume your site default is cp1251, so you want
to write in cyrillic.
You get one (a) text/plain attachment which is cyrillic
with charset utf-8 and another one (b) which is cyrillic
with charset cp1251.
Now they both get sent out with nosy, both probably cannot
be "mail-encoded" with 7-bit and with your fix will most
likely get encoded as quoted-printable.
But if you do not save the charset you probably will send
out a) with a charset of cp1251 which will break the text.
This is why I think the bug is not fully solved.
I have tried to change the subject to better reflect this.
What do you think?
So we must save the charset to have a real fix.
My patch was a dirty workaround which fixed the problem
in an expensive way, but at one place.
If you replace "latin-1" with the EMAIL charset default,
it would be an even better kludge.
But of course a better solution should be implemented.
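
The scenario above can be reproduced in a few lines. This is only an illustration with a made-up sample word; `shown_with` is a hypothetical helper that stands in for whatever the receiving mail client or browser does with the declared or assumed charset.

```python
text = "привет"  # sample Cyrillic text, chosen for illustration

as_utf8 = text.encode("utf-8")      # attachment (a)
as_cp1251 = text.encode("cp1251")   # attachment (b)


def shown_with(raw: bytes, assumed_charset: str) -> str:
    """Decode the raw attachment bytes the way a client would."""
    return raw.decode(assumed_charset, errors="replace")


b_ok = shown_with(as_cp1251, "cp1251")   # site default matches: readable
a_broken = shown_with(as_utf8, "cp1251")  # site default applied to (a): garbled
a_ok = shown_with(as_utf8, "utf-8")      # readable only if the charset was kept
```

Both attachments survive quoted-printable transport unchanged, so the transfer encoding fix alone does not decide whether (a) is readable.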
|
msg2075 |
Author: [hidden] (a1s) |
Date: 2005-12-30 07:56 |
|
if the charset is not specified in file.type then the mail
attachment will not have a character set name either. the MIME
type of the mail attachment is exactly what's saved in the type
field.
contemporary email clients should be able to cope with text
attachments without character set designator.
anyway, you can add charset to the type if you want to.
please see detectors section in "customizing roundup" document.
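
A sketch of what such a detector could look like, assuming the classic schema's `file` class with a `type` property and the usual auditor signature `(db, cl, nodeid, newvalues)`. The constant `SITE_DEFAULT_CHARSET` and the function name are hypothetical, and the chosen default is only an example for one site.

```python
SITE_DEFAULT_CHARSET = "cp1251"  # assumption: a per-site choice, not a Roundup setting


def add_default_charset(db, cl, nodeid, newvalues):
    """Auditor: append a charset parameter to bare text/plain file types."""
    mime_type = newvalues.get("type")
    if not mime_type:
        return
    # Leave types alone that already carry parameters such as charset.
    if mime_type.strip().lower() == "text/plain":
        newvalues["type"] = "text/plain; charset=%s" % SITE_DEFAULT_CHARSET


def init(db):
    # Register the auditor for newly created and for edited file nodes.
    db.file.audit("create", add_default_charset)
    db.file.audit("set", add_default_charset)
```

A file placed in the tracker's `detectors` directory would be picked up through its `init(db)` function. The default is still a guess, so it only helps where one charset dominates.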
|
msg2076 |
Author: [hidden] (ber) |
Date: 2005-12-30 08:00 |
|
Hi Alexander,
and now to the question of where to save the charset,
which seems unavoidable to save somewhere. ;-)
Reading RFC2046 the MIME maintype is "text" and
the subtype is "plain". "charset" would be a critical, but
optional parameter. So where do we save this within roundup?
Two ideas:
a) added as string to the filetype
b) creating a new parameter to the file class
b.1) calling this new parameter "charset"
b.2) making this new parameter more generic
Idea a) has the potential to break code that relies on the
assumption that all filetypes are of the form MAIN/SUB.
In addition it would always need more parsing to separate
the main- and subtypes from the parameters, if they are
needed separately.
Idea b.1) would need a change in the schema and be specific
to "text" maintypes. "text" will be an important case, so it
might be fine.
Idea b.2) would be generic for all parameters that are there
to come for any attachment type, so it probably should be
implemented ideally similar to a python dictionary.
Implementation and usage in the code would be more
complicated than in b.1.
In principle I do not care which solution is implemented,
as long as one is done, though. I have a tendency for the
b.1 or b.2 solutions, as I cannot judge if a) will break
anything and I do not like parsing the string each time I
want a parameter.
Thanks again for considering this
and I hope you will have a happy rollover!
Bernhard
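
For idea a), the extra parsing can be delegated to the standard library. This is a sketch, assuming the stored type string follows Content-Type syntax; `split_content_type` is a hypothetical helper, not an existing Roundup function.

```python
from email.message import Message


def split_content_type(type_string):
    """Split 'text/plain; charset=utf-8' into (maintype, subtype, params)."""
    msg = Message()
    msg["Content-Type"] = type_string
    # get_params() returns [(type/subtype, ''), (name, value), ...];
    # everything after the first entry is a real parameter.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


with_charset = split_content_type("text/plain; charset=utf-8")
bare = split_content_type("text/plain")
```

The returned dictionary is also roughly the shape idea b.2) would store directly, so the two ideas differ mainly in whether parsing happens on write or on every read.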
|
msg2077 |
Author: [hidden] (a1s) |
Date: 2005-12-30 08:10 |
|
the assumption that no filetype has parameters is plain wrong.
parameters have been in content-type since rfc1049 - more than 15
years!
|
msg2078 |
Author: [hidden] (ber) |
Date: 2006-01-02 10:42 |
|
I did not write that "no filetype has parameters"
and I know how to operate Roundup's detectors.
Thanks for the lecture.
My first post was about roundup needing to record
and then set the charset parameter for text/plain file
attachments on all occasions. This does not seem to be done
yet. Sending out an 8bit attachment without that parameter
calls for trouble. I did not look up if latin-1 or utf-8
should be assumed in this case, but anyway, it is quite
likely to be wrong.
You have not answered my question whether your patch will
fix the scenario for the bug that I have described, btw.
Can I conclude that you agree that the charset parameter
should be saved and that this is not the case currently?
Then we only disagree if this should be done by default
or not. I say: Yes, it is a serious bug as roundup is
unusable in environments where e.g. latin-1 and utf-8 based
texts are used. Umlauts break frequently and users
rightfully think that this is the software.
My second post was about where to save the parameter.
Just because in emails this is saved in a body part header
field as a string in a parameter, roundup does not need to do
this. From your answer I conclude that you like method a)
best and do not care about the style of code when the
string is parsed. Also you want places within Roundup (not
within RFC anything) to break if they have made the
assumption that "type/subtype" is what they get as string?
|
msg2079 |
Author: [hidden] (a1s) |
Date: 2006-01-02 10:58 |
|
roundup does not need to record charset on all occasions.
if you need that, you can do that with database detectors.
sending mail attachments without charset parameter in
content-type is not a bug. 8-bit characters with 7-bit
transfer encoding was a bug. it is fixed now.
yes, i think that the best place to store charset is mime
type property. but if you want to store character set name
separately in your tracker, you are free to do that.
there should be no places in roundup breaking if file type
contains parameters. if there are such places, they must be
fixed.
|
msg2080 |
Author: [hidden] (ber) |
Date: 2006-01-02 11:03 |
|
So you do not think that the behaviour that I have outlined
is a bug? To me it clearly shows that roundup will have to
record the charset to be able to display those attachments
in a browser or per email correctly.
It is not a hard bug not having a charset in an email, but
it leads to a bug for the users, because the umlauts will
be broken. I have this on a live system.
I also have the problem with web browsers, btw:
because roundup does not know the charset, it cannot give
it to the browser, which will then display the texts wrongly.
How do you envision to fix the bad behaviour without saving
the charset? I created a case that will occur often in
non-us environments and will lead to broken behaviour (for
the users).
|
msg2081 |
Author: [hidden] (a1s) |
Date: 2006-01-02 11:23 |
|
if file type contains charset name it will be set in
content-type headings both in emails and in http displays.
no umlauts will be broken.
if there is no charset name recorded, the user agent (email
program or web browser) lets the user select the correct
charset. no umlauts are broken.
but there is no globally correct way for roundup to guess
the character set if it is not specified explicitly. (and
with incorrect guesses umlauts will be broken for sure.) if
you think there is a sitewide correct way to do that for your
site, please use database detectors.
|
msg2082 |
Author: [hidden] (ber) |
Date: 2006-01-02 13:33 |
|
If you are saying that the current version does record
the charset parameters when input comes from email or
http, then indeed the bug would be fixed to a large extent.
(My testing was done with a 0.7.x version where
charset is not recorded.)
In addition I would say that a text/plain attachment
without charset is incomplete within Roundup as Roundup by
definition should talk to several systems. So adding a
guess by default (no matter how it is done technically)
should be wise. To do the guess best, it would need the
full information of the input channel (web, http or
email). Is this available from the detectors?
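
One way such a default guess could look, as a rough sketch only: it is not the patch attached to this issue, it ignores the input channel, and `guess_charset` with its `site_default` parameter is hypothetical. It relies on the fact that non-ASCII text in a single-byte charset rarely happens to be valid UTF-8.

```python
def guess_charset(raw: bytes, site_default: str = "latin-1") -> str:
    """Heuristic: pure ASCII, else valid UTF-8, else the site default."""
    try:
        raw.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Cannot tell latin-1 from cp1251 etc.; this part is a pure guess.
        return site_default


from_utf8_client = guess_charset("Grüße".encode("utf-8"))
from_latin1_client = guess_charset("Grüße".encode("latin-1"))
```

The last branch is where the objection raised earlier in the thread applies: the fallback is only as good as the site default.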
|
msg2083 |
Author: [hidden] (ber) |
Date: 2006-01-30 16:58 |
|
Hi Richard,
does closing the bug with "works for me" mean
that you have retested this with roundup 1.0 and you are
sure that the bug is gone?
I definitely saw these problems in an environment
where people use web browsers and email clients with
different locales (iso-8859-15 and utf8) and add attachments
with umlauts.
But if you say it is gone with 1.0, this would be cool!
|
|
Date | User | Action | Args
2005-12-15 12:39:38 | ber | create | |
|