Roundup Tracker - Issues

Issue 2551008

classification
Wrong header encoding handling in mailgw.py
Type: behavior Severity: normal
Components: Mail interface Versions: devel, 1.6, 1.5
process
Status: closed fixed
Assignee: ezio.melotti Nosy: ezio.melotti, joseph_myers, rouilj
Priority: normal Keywords: Blocker, Effort-Low, patch

Created on 2018-10-10 01:07 by ezio.melotti, last changed 2019-06-05 22:33 by ezio.melotti.

Messages
msg6276 Author: [hidden] (ezio.melotti) Date: 2018-10-10 01:07
RoundupMessage._decode_header (formerly known as
Message._decode_header_to_utf8) is incorrect when the encoding is missing:

    def _decode_header(self, hdr):
        parts = []
        for part, encoding in decode_header(hdr):
            if encoding:
                part = part.decode(encoding)
            parts.append(part)
        return ''.join([u2s(p) for p in parts])

If the encoding is specified, the parts will be decoded to a list of
unicode strings; if it isn't, parts will be a list of byte strings.  In
the latter case, u2s() will fail to encode the byte strings on Python 2
if they contain non-ascii characters, and it will always fail on Python
3 since byte strings don't have an .encode() method.
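
For illustration, here is a small sketch of what the standard library's decode_header hands back on Python 3. The header values are made up for the example and nothing here is Roundup code; it only shows that the type of each part depends on whether the header contains an encoded word:

```python
from email.header import Header, decode_header

# Build an RFC 2047 encoded word for a non-ASCII subject (example value).
encoded_hdr = Header('ßðèé', 'utf-8').encode()
# A header with no encoded words at all (example value).
plain_hdr = 'plain subject'

encoded_parts = decode_header(encoded_hdr)
plain_parts = decode_header(plain_hdr)

# Each entry is a (part, charset) pair; print the type of each part.
for part, charset in encoded_parts + plain_parts:
    print(type(part).__name__, charset)
```

On Python 3 the encoded word should come back as bytes with its charset, and the plain header as str with no charset, so any code that joins the parts has to normalize the types first.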

I fixed this downstream by attempting the decoding using utf-8 first and
falling back on iso-8859-1 if that fails:
* https://hg.python.org/tracker/roundup/rev/d7454b42b914
* http://psf.upfronthosting.co.za/roundup/meta/issue668

The code on 1.5 is slightly different, but the logic is the same.
msg6295 Author: [hidden] (joseph_myers) Date: 2018-10-28 19:40
I think this fix is appropriate to apply (a testcase would be nice to have 
in the testsuite, but I don't know how hard that is to write).
msg6503 Author: [hidden] (rouilj) Date: 2019-06-02 14:26
Ezio, do you have a test case for this patch?

-- rouilj
msg6504 Author: [hidden] (rouilj) Date: 2019-06-02 21:16
Additional info from offline discussion with Ezio:

Crash is:

Traceback (most recent call last):
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 1519, in handle_Message
    return self.handle_message(message)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 1590, in handle_message
    return self._handle_message(message)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 1601, in _handle_message
    self.parsed_message = self.parsed_message_class(self, message)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 555, in __init__
    self.subject = message.getheader('subject', '')
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 285, in getheader
    return self._decode_header_to_utf8(hdr)
  File "/home/roundup/lib/python2.6/site-packages/roundup/mailgw.py", line 273, in _decode_header_to_utf8
    return ''.join([s.encode('utf-8') for s in l])
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 33: ordinal not in range(128)

s is a non-ascii byte string. In py2, when you try to encode a byte
string, it first tries to automatically decode it using the default
encoding (ascii) in order to get unicode, and then tries to encode the
unicode string using the specified encoding, but this fails for
non-ascii byte strings.

Failure may be provoked by setting subject to:

  'ßðèé'.encode('utf-8')
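
The implicit decode step can be imitated on Python 3 to see where the error comes from. This is only a sketch: py2_style_encode is a hypothetical helper written for this example (it is not part of Roundup or the standard library), and it spells out by hand the ascii decode that Python 2 performs silently:

```python
# The reproduction value from above: UTF-8 bytes with no charset attached.
s = 'ßðèé'.encode('utf-8')


def py2_style_encode(byte_string, encoding):
    """Hypothetical helper: mimic Python 2's str.encode() by decoding
    as ascii first, then encoding to the requested encoding."""
    return byte_string.decode('ascii').encode(encoding)


error = None
try:
    py2_style_encode(s, 'utf-8')
except UnicodeDecodeError as exc:
    # The ascii decode step fails before any encoding happens.
    error = exc

print(error)

# An ascii-only byte string survives the round trip unchanged.
ascii_result = py2_style_encode(b'plain subject', 'utf-8')
```

The exception is a UnicodeDecodeError even though the caller asked for an encode, which matches the traceback above.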

Patch alternate URL:

  
  https://bitbucket.org/python/roundup/commits/d7454b42b914a69e6d1e1de99fe79fa6c8d6d982


At line 273 all the parts in l are encoded; in order to be encoded
they must all be unicode. However, at line 266 the part is decoded
only if the encoding is specified (i.e. if the condition of the if
is true); otherwise it is left as it is. In order to be decoded at
line 266, the part must be bytes, so if the condition is false, the
part is appended to l as bytes, and the encoding at line 273 fails.

Also note that on python 2 this might work as long as the part is
ascii-only, due to the implicit conversion between str and unicode, but
on python 3, line 273 will always fail if the encoding at line 265 is
not specified.
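
A quick way to check the Python 3 half of this, as a sketch with a made-up value: bytes objects have no .encode() method, so the call fails with AttributeError regardless of the content, even for ascii-only data.

```python
# An ascii-only byte string, standing in for an undecoded header part.
part = b'plain subject'

# On Python 3, bytes objects do not provide .encode() at all.
has_encode = hasattr(part, 'encode')

error = None
try:
    part.encode('utf-8')
except AttributeError as exc:
    error = exc

print(has_encode, type(error).__name__)
```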
msg6528 Author: [hidden] (ezio.melotti) Date: 2019-06-05 22:33
Fixed in 9cc3257d0f.
decode_header sometimes returns bytes, sometimes unicode, so the patch
checks if the type is bytes and tries to decode with either the given
encoding, utf-8, or iso-8859-1 if utf-8 fails.
The patch works on both Python 2 and 3.
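
For readers who don't want to open the changeset, the approach can be sketched roughly as follows. This is not the committed code: to_text and decode_header_sketch are hypothetical names, the sketch targets Python 3 only, and it assumes the given encoding is used when present, with utf-8 then iso-8859-1 tried otherwise. Unknown charset names are not handled.

```python
from email.header import Header, decode_header


def to_text(part, charset):
    """Hypothetical helper: turn one decode_header part into str."""
    if not isinstance(part, bytes):
        # Already text (header without encoded words on Python 3).
        return part
    if charset:
        return part.decode(charset)
    try:
        return part.decode('utf-8')
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte to a character, so this cannot fail.
        return part.decode('iso-8859-1')


def decode_header_sketch(hdr):
    """Decode a raw header value to a single str (illustrative only)."""
    return ''.join(to_text(part, charset)
                   for part, charset in decode_header(hdr))


subject = decode_header_sketch(Header('ßðèé', 'utf-8').encode())
print(subject)
```

Ending the fallback chain with iso-8859-1 means the function always returns something, at the cost of possibly showing the wrong characters when the sender used some other 8-bit encoding.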
History
Date User Action Args
2019-06-05 22:33:49 ezio.melotti set status: new -> closed
assignee: ezio.melotti
resolution: fixed
messages: + msg6528
2019-06-02 21:16:23 rouilj set messages: + msg6504
2019-06-02 14:26:46 rouilj set nosy: + rouilj
messages: + msg6503
2019-06-02 02:58:38 rouilj set keywords: + patch, Blocker
2018-10-28 19:40:44 joseph_myers set nosy: + joseph_myers
messages: + msg6295
2018-10-10 01:07:54 ezio.melotti create