Roundup Tracker - Issues

Issue 2551092

classification
TypeError: can only concatenate str (not "bytes") to str
Type: crash Severity: major
Components: Mail interface Versions: 2.0.0
process
Status: fixed Resolution: fixed
Assigned To: rouilj Nosy List: ced, rouilj
Priority: high Keywords: patch

Created on 2020-10-08 14:51 by ced, last changed 2020-10-27 00:59 by rouilj.

Files
File name          Uploaded               Description
anypy_email.patch  ced, 2020-10-08 14:51
Messages
msg6967 Author: [hidden] (ced) Date: 2020-10-08 14:51
I got this traceback on an email whose subject used the Q encoding:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/roundup/mailgw.py", line 1511, in handle_Message
    return self.handle_message(message)
  File "/usr/lib/python3.7/site-packages/roundup/mailgw.py", line 1583, in handle_message
    return self._handle_message(message)
  File "/usr/lib/python3.7/site-packages/roundup/mailgw.py", line 1594, in _handle_message
    self.parsed_message = self.parsed_message_class(self, message)
  File "/usr/lib/python3.7/site-packages/roundup/mailgw.py", line 509, in __init__
    self.subject = message.get_header('subject', '')
  File "/usr/lib/python3.7/site-packages/roundup/mailgw.py", line 252, in get_header
    return self._decode_header(value.replace('\n', ''))
  File "/usr/lib/python3.7/site-packages/roundup/mailgw.py", line 219, in _decode_header
    for part, encoding in decode_header(hdr):
  File "/usr/lib/python3.7/site-packages/roundup/anypy/email_.py", line 124, in decode_header
    last_word += word
TypeError: can only concatenate str (not "bytes") to str


I found that roundup.anypy.email_.decode_header, which is a copy of the stdlib email.header.decode_header, is missing a line that converts word into bytes if it is a str (on Python 3).
Attached is a patch that applies the change only for Python 3.
I also think the fix for issue2551008 may be removed, as decode_header will now always return bytes.
It would be even better to remove roundup.anypy.email_ completely once Python 2 is no longer supported (issue2550879).
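For reference, the "collapse adjacent words" step of the stdlib function can be sketched as below. This is modelled on CPython's email.header.decode_header, not on Roundup's actual file, and collapse_words is a hypothetical helper name. The decoded words it receives may be str (unencoded or Q-decoded text) or bytes (B-decoded text); the isinstance check is the line described above as missing.

```python
# Sketch of the word-collapsing loop from CPython's email.header.decode_header.
# collapse_words is a hypothetical name used only for illustration.
BSPACE = b' '


def collapse_words(decoded_words):
    """Merge adjacent (word, charset) pairs that share the same charset."""
    collapsed = []
    last_word = last_charset = None
    for word, charset in decoded_words:
        if isinstance(word, str):
            # Normalize str words to bytes. raw-unicode-escape maps each code
            # point below 256 to the same byte value, which undoes the
            # byte-to-character mapping done by the Q decoder.
            word = bytes(word, 'raw-unicode-escape')
        if last_word is None:
            last_word = word
            last_charset = charset
        elif charset != last_charset:
            collapsed.append((last_word, last_charset))
            last_word = word
            last_charset = charset
        elif last_charset is None:
            # Two unencoded runs are joined with a single space.
            last_word += BSPACE + word
        else:
            # Without the normalization above, a str word followed by a bytes
            # word in the same charset raises TypeError here.
            last_word += word
    collapsed.append((last_word, last_charset))
    return collapsed


# A Q-decoded word (str) followed by a B-decoded word (bytes), both utf-8.
print(collapse_words([('test encodi', 'utf-8'), (b'ng \xc3\xa9', 'utf-8')]))
# → [(b'test encoding \xc3\xa9', 'utf-8')]
```

Because every word is converted before it is merged, each chunk in the result is bytes, which is why the function can be said to always return bytes once the line is restored.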
msg6968 Author: [hidden] (rouilj) Date: 2020-10-10 04:36
Hi Cédric:

Do you have a test for this? It looks like a testcase added to:

  class HeaderRoundupMessageTests(TestCase):

of test/test_mailgw_roundupmessage.py is the right place to put it.

I spent about half an hour trying to put a test together from
your description, but I couldn't get a quoted-printable subject
encoding to fail.

Re rolling back the patch for issue2551008: it doesn't look like
we have a valid test (or tests) for that case, so I am uncomfortable
with rolling that patch back. IIUC that patch and this one are
compatible, so nothing bad would happen if it is not rolled back.

-- rouilj
msg6975 Author: [hidden] (rouilj) Date: 2020-10-22 16:01
Ping Cédric: do you have an example subject line (in python format)
I can use in testing?
msg6977 Author: [hidden] (ced) Date: 2020-10-22 16:37
I no longer have the email that was breaking it.
But from memory it was a subject with some non-ASCII character like "é".
msg6978 Author: [hidden] (rouilj) Date: 2020-10-22 18:24
Hi =?utf-8?q?C=C3=A9dric_Krier?=: (hope the encoding works correctly.)

In message <1603384621.38.0.248728051864.issue2551092@roundup.psfhosted.org>,
=?utf-8?q?C=C3=A9dric_Krier?= writes:
>I do not have anymore the email that was breaking it.
>But from memory it was a subject with some non-ascii char  like "é".

Do you remember if it was a literal utf-8 character or an encoded
entity in the subject. For example in the tests I have:

  Subject: [issue] Test (=?utf-8?b?w4TDlsOc?=) umlauts

and that passes fine: isinstance(word, str) is False because, under
Python 3, word is a bytes object (b'...'), not a normal str.
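Why a B-encoded subject like the one above never reaches the str branch becomes clearer by comparing what the two stdlib decoders return. This sketch calls email.quoprimime.header_decode and email.base64mime.decode, the helpers CPython's decode_header uses; that Roundup's copy calls the same helpers is an assumption. The two sample words are taken from the subject ced quotes later in the thread. A B-encoded word decodes to bytes while a Q-encoded word decodes to str, so in a copy without the str-to-bytes conversion the crash appears to need a Q-encoded word followed by a B-encoded word in the same charset.

```python
import email.base64mime
import email.quoprimime

# Q-encoded word: "_" becomes a space and each "=XX" escape becomes chr(0xXX),
# so the result is a str.
q_word = email.quoprimime.header_decode('test_encodi')

# B-encoded word: base64 decoding produces bytes.
b_word = email.base64mime.decode('bmcgw6k=')

print(type(q_word).__name__, repr(q_word))  # → str 'test encodi'
print(type(b_word).__name__, repr(b_word))  # → bytes b'ng \xc3\xa9'

# Joining them without first converting q_word to bytes reproduces the
# TypeError from the traceback.
try:
    q_word += b_word
except TypeError as exc:
    print(exc)  # → can only concatenate str (not "bytes") to str
```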
msg6980 Author: [hidden] (ced) Date: 2020-10-22 19:23
On 2020-10-22 18:24, John Rouillard wrote:
> Hi =?utf-8?q?C=C3=A9dric_Krier?=: (hope the encoding works correcly.)
> 
> In message <1603384621.38.0.248728051864.issue2551092@roundup.psfhosted.org>,
> =?utf-8?q?C=C3=A9dric_Krier?= writes:
> >I do not have anymore the email that was breaking it.
> >But from memory it was a subject with some non-ascii char  like "é".
> 
> Do you remember if it was a literal utf-8 character or an encoded
> entity in the subject.

I do not remember, but I know it was sent via mutt (like this email).
msg6981 Author: [hidden] (ced) Date: 2020-10-22 19:26
On 2020-10-22 18:24, John Rouillard wrote:
> In message <1603384621.38.0.248728051864.issue2551092@roundup.psfhosted.org>,
> =?utf-8?q?C=C3=A9dric_Krier?= writes:
> >I do not have anymore the email that was breaking it.
> >But from memory it was a subject with some non-ascii char  like "é".
> 
> Do you remember if it was a literal utf-8 character or an encoded
> entity in the subject. For example in the tests I have:
> 
>   Subject: [issue] Test (=?utf-8?b?w4TDlsOc?=) umlauts

It looked more like:

test_encodi?==?utf-8?B?bmcgw6k=?=
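The remembered fragment starts mid-word, so the subject used below is a hypothetical reconstruction: it assumes the header began with a Q-encoded word, "=?utf-8?q?test_encodi?=", followed by the B-encoded word, which together would decode to the "test encoding é" that briefly appeared in the issue title. The sketch tests the stdlib email.header.decode_header as a reference; a Roundup regression test would presumably import decode_header from roundup.anypy.email_ (the module in the traceback) instead. The class and method names are illustrative, and the sketch does not try to fit the existing HeaderRoundupMessageTests API.

```python
import unittest
from email.header import decode_header  # stdlib reference implementation


class MixedEncodedWordSubjectTest(unittest.TestCase):
    # Hypothetical reconstruction of the subject from msg6981: a Q-encoded
    # word followed by a B-encoded word, both in utf-8.
    SUBJECT = '=?utf-8?q?test_encodi?= =?utf-8?B?bmcgw6k=?='

    def test_adjacent_q_and_b_words_collapse_to_bytes(self):
        parts = decode_header(self.SUBJECT)
        # The whitespace between the two encoded words is dropped, and since
        # both words share a charset they collapse into one bytes chunk.
        self.assertEqual(parts, [(b'test encoding \xc3\xa9', 'utf-8')])
        word, charset = parts[0]
        self.assertEqual(word.decode(charset), 'test encoding é')
```

Saved to a test module, it can be run with `python -m unittest`.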
msg7001 Author: [hidden] (rouilj) Date: 2020-10-27 00:46
Applied the suggested patch in rev 6278:f21ec1414591.

I wasn't able to come up with a test case that exercised it.
History
Date                 User    Action  Args
2020-10-27 00:59:48  rouilj  set     resolution: duplicate -> fixed
2020-10-27 00:46:12  rouilj  set     status: open -> fixed
                                     resolution: duplicate
                                     messages: + msg7001
2020-10-22 19:26:02  ced     set     messages: + msg6981
                                     title: TypeError: can only concatenate str (not "bytes") to str test encoding é -> TypeError: can only concatenate str (not "bytes") to str
2020-10-22 19:23:31  ced     set     messages: + msg6980
                                     title: TypeError: can only concatenate str (not "bytes") to str -> TypeError: can only concatenate str (not "bytes") to str test encoding é
2020-10-22 18:24:26  rouilj  set     messages: + msg6978
2020-10-22 16:37:01  ced     set     messages: + msg6977
2020-10-22 16:01:33  rouilj  set     messages: + msg6975
2020-10-10 04:36:17  rouilj  set     priority: high
                                     assignee: rouilj
                                     status: new -> open
                                     messages: + msg6968
                                     nosy: + rouilj
2020-10-08 14:51:36  ced     create