Created on 2018-10-10 01:07 by ezio.melotti, last changed 2019-06-05 22:33 by ezio.melotti.
|msg6276||Author: [hidden] (ezio.melotti)||Date: 2018-10-10 01:07|
RoundupMessage._decode_header (formerly known as Message._decode_header_to_utf8) is incorrect when the encoding is missing:

    def _decode_header(self, hdr):
        parts = []
        for part, encoding in decode_header(hdr):
            if encoding:
                part = part.decode(encoding)
            parts.append(part)
        return ''.join([u2s(p) for p in parts])

If the encoding is specified, the parts will be decoded to a list of unicode strings; if it isn't, parts will be a list of byte strings. In the latter case, u2s() will fail to encode the byte strings on Python 2 if they contain non-ASCII characters, and it will always fail on Python 3, since byte strings don't have an .encode() method.

I fixed this downstream by attempting the decoding using utf-8 first and falling back on iso-8859-1 if that fails:
* https://hg.python.org/tracker/roundup/rev/d7454b42b914
* http://psf.upfronthosting.co.za/roundup/meta/issue668

The code on 1.5 is slightly different, but the logic is the same.
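For anyone following along, here is a small sketch of the mixed return types from the standard library's email.header.decode_header on Python 3. The helper name part_types and the sample subjects are made up for illustration and are not part of Roundup. As far as I can tell, a header with no encoded words comes back as str, while a header containing encoded words comes back as bytes for every chunk, including the unencoded prefix (which has no charset).

```python
from email.header import decode_header


def part_types(hdr):
    """Return (type name, charset) for each chunk decode_header yields.

    Hypothetical helper, for illustration only.
    """
    return [(type(part).__name__, charset)
            for part, charset in decode_header(hdr)]


# No encoded words at all: a single chunk with no charset.
print(part_types('plain subject'))

# Unencoded prefix followed by an RFC 2047 encoded word.
print(part_types('Re: =?utf-8?q?caf=C3=A9?='))
```

The second call is the interesting one: the 'Re: ' chunk has charset None, so the `if encoding:` branch is skipped and the chunk reaches u2s() still as bytes.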
|msg6295||Author: [hidden] (joseph_myers)||Date: 2018-10-28 19:40|
I think this fix is appropriate to apply (a testcase would be nice to have in the testsuite, but I don't know how hard that is to write).
|msg6503||Author: [hidden] (rouilj)||Date: 2019-06-02 14:26|
Ezio, do you have a test case for this patch? -- rouilj
|msg6504||Author: [hidden] (rouilj)||Date: 2019-06-02 21:16|
Additional info from offline discussion with Ezio:

Crash is:

Traceback (most recent call last):
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 1519, in handle_Message
    return self.handle_message(message)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 1590, in handle_message
    return self._handle_message(message)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 1601, in _handle_message
    self.parsed_message = self.parsed_message_class(self, message)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 555, in __init__
    self.subject = message.getheader('subject', '')
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 285, in getheader
    return self._decode_header_to_utf8(hdr)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 273, in _decode_header_to_utf8
    return ''.join([s.encode('utf-8') for s in l])
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 33: ordinal not in range(128)

s is a non-ASCII byte string. In py2, when you try to encode a byte string, it first tries to automatically decode it using the default encoding (ascii) in order to get unicode, and then tries to encode the unicode string using the specified encoding. This fails for non-ASCII byte strings.

Failure may be provoked by setting subject to: 'ßðèé'.encode('utf-8')

Patch alternate URL: https://bitbucket.org/python/roundup/commits/d7454b42b914a69e6d1e1de99fe79fa6c8d6d982

At line 273 all the parts in l are encoded; in order to be encoded they must all be unicode. However, at line 266 the part is decoded only if the encoding is specified (i.e. if the condition of the if is true); otherwise it is left as it is. In order to be decoded at line 266, the part must be bytes, so if the condition is false, the part is appended to l as bytes, and the encoding at line 273 fails.
Also note that on Python 2 this might work as long as the part is ASCII-only, due to the implicit conversion between str and unicode, but on Python 3 line 273 will always fail if the encoding at line 265 is not specified.
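A rough way to see both failure modes from a Python 3 prompt is sketched below. py2_style_encode is a hypothetical helper that only imitates what Python 2 did implicitly when .encode() was called on a byte string (decode with the default ascii codec, then encode); it is not Roundup code.

```python
# The subject suggested above, as UTF-8 bytes.
raw = 'ßðèé'.encode('utf-8')


def py2_style_encode(data, encoding='utf-8'):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode.

    Hypothetical helper, for illustration only.
    """
    return data.decode('ascii').encode(encoding)


# Python 3: bytes objects simply have no .encode() method.
print(hasattr(raw, 'encode'))

# ASCII-only bytes survive the implicit round trip...
print(py2_style_encode(b'plain subject'))

# ...but non-ASCII bytes fail in the implicit decode step.
try:
    py2_style_encode(raw)
except UnicodeDecodeError as exc:
    print('failed:', exc)
```

This is why the old code appeared to work for plain ASCII subjects on Python 2 and only crashed on mail with raw non-ASCII bytes in an unencoded header.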
|msg6528||Author: [hidden] (ezio.melotti)||Date: 2019-06-05 22:33|
Fixed in 9cc3257d0f. decode_header sometimes returns bytes, sometimes unicode, so the patch checks if the part is bytes and, if so, tries to decode it with the given encoding, then utf-8, then iso-8859-1 if utf-8 fails. The patch works on both Python 2 and 3.
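For readers without the changeset at hand, the approach described could look roughly like the sketch below. This is not the committed patch: decode_header_to_text is a made-up standalone function written for Python 3, and falling through to utf-8 when the declared charset is unknown or fails to decode is an assumption of this sketch.

```python
from email.header import decode_header


def decode_header_to_text(hdr):
    """Decode a raw header into text, tolerating missing or wrong charsets.

    Illustrative sketch only, not the code from the actual fix.
    """
    parts = []
    for part, encoding in decode_header(hdr):
        if isinstance(part, bytes):
            # Try the declared charset (if any), then utf-8.
            for candidate in (encoding, 'utf-8'):
                if not candidate:
                    continue
                try:
                    part = part.decode(candidate)
                    break
                except (UnicodeDecodeError, LookupError):
                    # Wrong or unknown charset: try the next candidate.
                    continue
            else:
                # iso-8859-1 maps every byte, so this cannot fail.
                part = part.decode('iso-8859-1')
        parts.append(part)
    return ''.join(parts)


print(decode_header_to_text('Re: =?utf-8?q?caf=C3=A9?='))
```

The final iso-8859-1 step never raises because every byte value is a valid character in that charset, so the function always returns text, even if the result is mojibake for mail that used some other encoding.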
|2019-06-05 22:33:49||ezio.melotti||set||status: new -> closed|
messages: + msg6528
|2019-06-02 21:16:23||rouilj||set||messages: + msg6504|
messages: + msg6503
|2019-06-02 02:58:38||rouilj||set||keywords: + patch, Blocker|
messages: + msg6295