Issue 2551262: Make mail gateway issue matching more fault tolerant - Roundup tracker

classification

Title:	Make mail gateway issue matching more fault tolerant
Type:	behavior	Severity:	major
Components:	Mail interface	Versions:	2.2.0

process

Status:	fixed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	rouilj	Nosy List:	Heiko, rouilj, schlatterbeck
Priority:		Keywords:	Blocker

Created on 2023-02-22 08:13 by Heiko, last changed 2023-04-04 15:58 by rouilj.

Messages
msg7724	Author: [hidden] (Heiko)	Date: 2023-02-22 08:13
Roundup's mail gateway allows email content to be added to an issue by specifying the issue designator in square brackets at the beginning of the email subject line. However, the syntax must exactly follow the scheme [issue<num>]: it must be placed at the beginning of the subject line, no blank spaces are allowed, only square brackets work, and I think it is also case sensitive. Users frequently fail to follow this syntax, mostly unintentionally. In these cases Roundup does not find the matching issue and creates a new one instead. In our tracker, which has hundreds of users, this leads to a large number of useless issues that must be sorted out at great expense. My request is to make the mail gateway's issue matching more fault tolerant. It should not matter where in the subject line the issue designator is placed. Blank spaces inside the brackets and brackets other than square ones should be allowed. Matching should also be case insensitive.
msg7725	Author: [hidden] (rouilj)	Date: 2023-02-22 17:20
Hello Heiko:

I'm sorry you are getting misplaced issues that you have to sort out. Let's see if some of Roundup's current features will help. Some of the changes you request could be new features.

What version of Roundup are you using? I think my suggestions have been there since the 1.x series, but...

I see you included Ralf. Ralf, a couple of questions for you too.

Sorry for the length of the email, but you have opened a more complex part of Roundup.

In message <1677053611.22.0.478622424686.issue2551262@roundup.psfhosted.org>, Heiko Stegmann writes:

> [...] frequently happens that users do not follow this
> syntax [...] In these cases Roundup will
> not find the matching issue and create a new one instead

Roundup has two additional mechanisms to try to assign emails to issues if prefix parsing fails.

1. It looks for an in-reply-to header in the email and tries to match it against an existing message with the same message id. It then puts the new message on the issue that holds the matching message.

2. If message id matching fails, it tries to match on the subject. If the new message's subject differs sufficiently from the title of an existing issue, this fails as well.

I would expect #1 to catch most of what you describe.

Heiko, can you find one of your misrouted messages and dump it using:

    roundup-admin -i <tracker home> get inreplyto msg<num>

Does it have a value (not None) for inreplyto? If so, does that value match an existing message? If you have a newer Roundup that has the filter command, you can use:

    roundup-admin -i <tracker home> filter msg messageid="<1675055043.0895576.IZTXE7H4VMTIDGQA.issue226@localhost>"

to find the existing message.

Ralf, let's assume Heiko's experiment indicates that inreplyto matching should work. Does this sound like a false positive match in prefix parsing? IIUC that would prevent inreplyto and subject matching. It could also be a bug in inreplyto parsing, but I don't know of any. Do you?

Parsing the subject line is tricky. We don't want false positives, where a match happens but shouldn't. To guard against this we use:

1. a known location for the prefix
2. delimiters around the prefix
3. a strict format inside the delimiters

We also don't want false negatives, where a valid match is missed. Relaxing #3 could be useful in preventing false negatives.

> Roundup's mail gateway allows to set add email content to
> an issue by specifying the issue designator in square
> brackets at the beginning of the email subject
> line. However, the syntax must exactly follow the scheme
> [issue<num>].

It is more complex than that. You can have:

    Subject: [user3] [realname=Fred]

to modify the realname of user3. You can also create a new item in the device class by emailing:

    Subject: [device] new device [location=32; category+=computing]

which creates a device and sets some of its properties. You are correct that it must be:

    [classname<number can be optional>]

> It must be placed at the beginning of the subject line,

You can strip prefixes from the subject line using refwd_re in config.ini. It is meant to remove the prefixes added by mailers when replying to or forwarding a message. If your users have a mail client with a different prefix, you can add it there.

> no blank spaces are allowed, only square brackets will work,

The brackets can be replaced with another pair of characters using 'subject_suffix_delimiters' in config.ini, which works for both the class/item prefix and the suffix modifiers. But only one pair of delimiter characters is allowed. So you can allow (issue23) or <issue32> but not both.

> and I think it is also case sensitive.

That's correct since class names are case sensitive.

> However, it frequently happens that users do not follow
> this syntax

Some user education can go a long way there. However, given the variety of mail clients in use, education can be an impossible task.

What is the setting for 'subject_prefix_parsing' in your tracker's config.ini? There are three settings:

  strict - returns emails with unparsable prefixes to the user, describing the problem and how to fix it.

  loose - if it can't parse a prefix, passes the message through to create a new object in the default_class as specified in config.ini.

  none - does not try to find a prefix and falls back to inreplyto or subject matching. If those fail, it creates a new object.

If you aren't using strict, it might help.

> My request is to make the mail gateway based issue matching
> more fault tolerant. It should not matter where in the
> subject line the issue designator is placed.

Are you actually seeing subject lines where the user takes the time to edit the subject line on a reply to move the prefix into or after the subject like:

    computer failure [issue24]
    computer [issue24] failure

or change the delimiters:

    computer failure <issue24>
    {issue24} computer failure

rather than leaving it alone and generating something like

    Re: [issue24] computer failure

> Blank spaces inside the brackets

This could be a useful option. Are people keeping the designator together? So you see:

    [ issue24 ]

or do they split the designator like:

    [ issue 24 ]

These look simpler to implement. Hopefully you aren't seeing

    [ iss ue 24 ]

8-).

> and brackets other than square should be allowed.

I assume here you want to support both '[]' and '<>' (for example) as delimiters? If you just want to change the delimiters, you can do that already.

> It should also be case insensitive.

You can name classes 'issue' or 'Issue' or even 'ISSUE'. So it must be case sensitive by default. If it weren't, you couldn't tell the difference between the three issue types.

Are you actually seeing a prefix of [issue24] being replied to with a subject like:

    [ISSUE24] or [Issue24]

If you are seeing that, is the rest of the subject line modified as well? E.g.

    [issue24] computer broken

is replied to as:

    [ISSUE24] COMPUTER BROKEN

If so, you would also probably want to set 'subject_updates_title = no' in config.ini so that people using mail readers that sort/thread by subject don't get broken threads.

However, adding a config.ini option to make class name matching for the prefix case insensitive could be possible.
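The fallback order described in the message above (designator prefix first, then in-reply-to, then subject matching) can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Roundup's actual code: the `route` function, the two lookup dictionaries, and both regular expressions are hypothetical, and the sketch ignores the `subject_prefix_parsing` modes and the case of a class name without an id.

```python
import re

# Hypothetical, simplified stand-ins for the tracker database.
MESSAGE_IDS = {"<abc@localhost>": "issue24"}    # message-id -> issue
ISSUE_TITLES = {"computer failure": "issue24"}  # lower-cased title -> issue

# Assumed patterns for this sketch only; Roundup builds its own.
REFWD_RE = re.compile(r"^(\s*(re|fwd?|aw)\s*:\s*)+", re.IGNORECASE)
PREFIX_RE = re.compile(r"^\[(?P<classname>[a-z]+)(?P<nodeid>\d+)?\]\s*")


def route(subject, in_reply_to=None):
    """Return (target, how); a target of None means a new issue is created."""
    # Strip reply/forward prefixes such as "Re:" before looking for a designator.
    stripped = REFWD_RE.sub("", subject)

    # 1. Strict designator prefix, e.g. "[issue24]".
    m = PREFIX_RE.match(stripped)
    if m and m.group("nodeid"):
        return m.group("classname") + m.group("nodeid"), "prefix"

    # 2. In-reply-to header matching the id of a known message.
    if in_reply_to in MESSAGE_IDS:
        return MESSAGE_IDS[in_reply_to], "in-reply-to"

    # 3. Subject matching the title of an existing issue.
    title = stripped.strip().lower()
    if title in ISSUE_TITLES:
        return ISSUE_TITLES[title], "subject"

    return None, "new"


print(route("Re: [issue24] computer failure"))
print(route("Re: [issue 24] computer failure", "<abc@localhost>"))
print(route("[issue 24] something else"))
```

In this sketch the second call is rescued by the in-reply-to header, while the third has neither a valid prefix, a usable header, nor a matching title, so it ends up as a new issue. That is the situation Heiko reports.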
msg7727	Author: [hidden] (schlatterbeck)	Date: 2023-02-23 09:55
On Wed, Feb 22, 2023 at 05:20:45PM +0000, John Rouillard wrote:
>
> Roundup has two additional mechanisms to try to assign
> emails to issues if prefix parsing fails.
>
> 1. It looks for an in-reply-to header in the email and
>    tries to match that up against an existing message with
>    the same message id. Then it puts the new message on
>    issue with the matching message.

Are you sure this mechanism works in all cases? I have lots of cases where a direct reply does not work. I've never considered that matching happens by in-reply-to header.

> 2. If message id matching fails, it tries to match on the subject.
>    If the new message's subject is changed sufficiently from the
>    title of an existing issue, this will fail as well.

Yes, the particular use-case Heiko is referring to is something like

    [issue 4711]

i.e. a space after issue is the most common thing people get wrong. Now if someone is editing the subject by hand, chances are that the whole mail is not composed as an answer to an existing incoming mail...

So the idea would be to make matching more tolerant, like in a message body. This allows different case and/or a space between class name and node-id.

(I know because this bites me regularly: I'm a ham-radio operator and I'm using roundup for tracking my successful two-way calls; the term for these is 'QSO'. The table in asterisk is named 'qso'. I put messages another ham sends me with a confirmation as a message into asterisk (from electronic services like eqsl.cc or lotw.arrl.org). These often read something like 'tnx fr qso 73 <callsign>', 73 meaning something like 'best regards'. I can't count the number of messages that link to qso73 :-)

So if we change this it probably should be configurable. And while at it we might want to make the message-body matching configurable, too, for the use-case above.

> Ralf, let's assume Heiko's experiment indicates that
> inreplyto matching should work. Does this sound like a false
> positive match in prefix parsing? IIUC that would prevent
> inreplyto and subject matching. It could also be a bug in
> inreplyto parsing, but I don't know of any do you?

I don't think it matches anything, see use-case above. But it may well be that once it has started matching (message begins with '[' after stripping reply/fwd prefixes) no other alternatives are tried. See above for my experience on subject matching.

> Parsing the subject line is tricky. We don't want false
> positives where a match happens, but shouldn't be. To guard
> against this we use:
>
> 1. known location for the prefix

Yes, I would only consider a prefix, i.e. directly after a 'Re:' or similar prefix in the subject (this is already a configurable regex).

> 2. delimiters around the prefix

Yes, I would not depart from []. We do have the occasional [issue4711} or so but I would not try to parse these.

> 3. strict format inside the delimiters

See above, for a first change I'd only allow a space between classname and node-id. This is the most common case as far as my observations go.

> It is more complex than that, you can have:
>
>   Subject: [user3] [realname=Fred]

Yes, but the second thing must currently be last in the subject.

> What is the setting for 'subject_prefix_parsing' in your
> tracker's config.ini?
> There are three settings:
>
>   strict - which returns emails with unparsable prefixes to
>      the user describing the problem and how to fix it.
>
>   loose - which if it can't parse a prefix will pass the
>      message through to create a new object in the
>      default_class as specified in config.ini.
>
>   none - which will not try to find a prefix and fall back
>      to inreplyto or subject matching. If those fail, it
>      will create a new object.
>
> If you aren't using strict, it might help.

Yes, maybe that would indeed solve that particular issue.

> >My request is to make the mail gateway based issue matching
> >more fault tolerant. It should not matter where in the
> >subject line the issue designator is placed.

I'm very reluctant to allow the designator anywhere in the message. This simply could create too much ambiguity, especially together with the second bracketed expression where you can set properties.

> >Blank spaces inside the brackets
>
> This could be a useful option. Are people keeping the
> designator together? So you see:
>
>   [ issue24 ]
>
> or do they split the designator like:
>
>   [ issue 24 ]

I think this is a good use-case and most of the cases I've seen match that pattern.

> These look simpler to implement. Hopefully you aren't seeing
>
>   [ iss ue 24 ]

I wouldn't allow these.

> >It should also be case insensitive.

I also would not allow case-insensitive matches. But note that the parsing inside the message body seems to allow capitalization. So I'm seeing something like

    Issue 23

highlighted as a link to issue23. An all-uppercase

    ISSUE 23

is not linked, though.

Ralf
--
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
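The 'qso 73' anecdote above shows why delimiters matter once whitespace between class name and id is tolerated. The following is a minimal sketch with assumed patterns and hypothetical function names; it is not the matching code Roundup uses for message bodies or subjects.

```python
import re

CLASSES = ["issue", "qso"]  # assumed class names, taken from the discussion
ALTERNATIVES = "|".join(CLASSES)

# Permissive pattern: class name, optional whitespace, id, no delimiters.
BODY_RE = re.compile(r"\b(?P<classname>%s)\s*(?P<nodeid>\d+)\b" % ALTERNATIVES)

# Same tolerance for whitespace, but only inside square brackets.
BRACKET_RE = re.compile(
    r"\[\s*(?P<classname>%s)\s*(?P<nodeid>\d+)\s*\]" % ALTERNATIVES
)


def find_body_designators(text):
    """Designators found without requiring delimiters."""
    return [m.group("classname") + m.group("nodeid")
            for m in BODY_RE.finditer(text)]


def find_bracketed_designators(text):
    """Designators found only when enclosed in square brackets."""
    return [m.group("classname") + m.group("nodeid")
            for m in BRACKET_RE.finditer(text)]


greeting = "tnx fr qso 73 DL1ABC"
print(find_body_designators(greeting))       # the greeting is read as qso73
print(find_bracketed_designators(greeting))  # no brackets, so nothing matches
```

Under these assumptions the undelimited pattern turns a ham-radio greeting into a link to qso73, while the bracketed pattern still accepts "[qso 73]" but ignores the greeting. This supports keeping the delimiters and relaxing only the format inside them.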
msg7728	Author: [hidden] (rouilj)	Date: 2023-02-23 18:44
Hi Ralf:

In message <20230223095547.p4hypcsuawoqwcl3@runtux.com>, Ralf Schlatterbeck writes:
>Ralf Schlatterbeck added the comment:
>
>On Wed, Feb 22, 2023 at 05:20:45PM +0000, John Rouillard wrote:
>>
>> Roundup has two additional mechanisms to try to assign
>> emails to issues if prefix parsing fails.
>>
>> 1. It looks for an in-reply-to header in the email and
>>    tries to match that up against an existing message with
>>    the same message id. Then it puts the new message on
>>    issue with the matching message.
>
>Are you sure this mechanism works in all cases? I have lots of cases
>where a direct reply does not work, I've never considered that matching
>happens by in-reply-to header.

Well, I don't know about all cases. But the test testReplytoMultiMatch in test_mailgw.py works. And the coverage data in Codecov shows that the branches are being executed.

Here is a funny one though: there is a test testIssueidLast which does find the [issue1] at the end of the subject line. (Note: if I modify that test to remove the "[issue1]" at the end, it does properly match the new message by message-id/in-reply-to to issue 1.)

However, if I keep the "prefix" at the end, it gets matched by extracting it via this (unanchored) regexp:

    '\\[(?P<classname>(file|issue|keyword|msg|priority|query|status|user))(?P<nodeid>\\d+)?\\]'

even though it correctly records a missing prefix. The comment with this is that mailing list software could add an identifier as well. This would mess up prefix detection, but designator detection would still happen.

So it would appear that '[classNN]' is matched anywhere in the subject line, case insensitively, via:

    m = re.search(class_re, tmpsubject, re.IGNORECASE)

8-/. Only the first one found is used. This is with strict parsing too.

In this case, there is a side effect of erasing the new subject as it moves past, so the issue preserves the original title.

>> 2. If message id matching fails, it tries to match on the subject.
>>    If the new message's subject is changed sufficiently from the
>>    title of an existing issue, this will fail as well.
>
>Yes, the particular use-case Heiko is referring to is something like
>
>[issue 4711]
>i.e. an space after issue is the most common thing people get wrong.
>Now if someone is editing the subject by hand, chances are that the
>whole mail is not composed as an answer to an existing incoming mail...

True.

>So the idea would be to make matching more tolerant like in a message
>body. This allows different case and/or a space between class name and
>node-id.
>So if we change this it probably should be configurable.

Agreed. Should it be added to the definition of loose? Should there be a separate setting (or settings in an already long file)? Should the values of:

    subject_prefix_parsing

be extended to <required> [optional]:

    <strict|loose|none> [internal_whitespace] [delim_whitespace] [prefix_only]

(Better naming is needed.)

 * internal_whitespace is "[issue 24]"
 * delim_whitespace (if we want to support it) is "[ issue24 ]".
 * using both allows: "[ issue 24 ]".

Implementing prefix_only might be tricky, as noted above, with junk before it.

>And when at it we might want to make the message-body matching
>configurable, too for the use-case above.

This is for a different ticket. BTW it's your fault for not naming the class QSOcontacts rather than qso 8-). 73!

>> Ralf, let's assume Heiko's experiment indicates that
>> inreplyto matching should work. [...]
>
>I don't think it matches anything, see use-case above.

Agreed. The space between the class and id would prevent any match.

>But it may well be that when it started matching (message begins with
>'[' after stripping reply/fwd prefixes) that no other alternatives are
>tried. See above for my experience on subject matching.

The in-reply-to should be applied, but as you noted there may not be an in-reply-to on the inbound email. Also, if Roundup doesn't receive a copy of every email in an external discussion, the chain will break. Hence my asking Heiko to fire up roundup-admin for a look.

>> Parsing the subject line is tricky. We don't want false
>> positives where a match happens, but shouldn't be. To guard
>> against this we use:
>>
>> 1. known location for the prefix
>Yes, I would only consider a prefix, i.e. directly after a 'Re:' or
>similar prefix in the subject (this is already a configurable regex).

I wonder if we should anchor the identifier prefix and handle any other mailing list junk as part of the Re: removal stuff.

>> 2. delimiters around the prefix
>Yes I would not depart from []. We do have the occasional [issue4711}
>or so but I would not try to parse these.

Ok.

>> 3. strict format inside the delimiters
>See above, for a first change I'd only allow a space between classname
>and node-id. This is the most common case as far as my observations go.

With some format variations (spacing) allowed by configuration.

>> It is more complex that that, you can have:
>>
>>   Subject: [user3] [realname=Fred]
>
>Yes, but the second thing must currently be last in the subject.

It would appear not. Also, if you do:

    Subject: [realname=Fred] [user3]

I think the suffix parsing will not happen. When suffix parsing happens, the subject line consists of what is after '[user3]'.

>> What is the setting for 'subject_prefix_parsing' in your
>> tracker's config.ini?
>> [...]
>> If you aren't using strict, it might help.
>
>Yes, maybe that would indeed solve that particular issue.

From the code it looks like it should, if the designator is found as a prefix.

>> >My request is to make the mail gateway based issue matching
>> >more fault tolerant. It should not matter where in the
>> >subject line the issue designator is placed.
>
>I'm very reluctant to allow the designator anywhere in the message. This
>simply could create too much ambiguity, especially together with the
>second bracketed expression where you can set properties.

Well, we have that already, so... Heiko, consider this implemented 8-).

>> >Blank spaces inside the brackets
>> [...]
>> or do they split the designator like:
>>
>>   [ issue 24 ]
>
>I think this is a good use-case and most of the cases I've seen match
>that pattern.

Ok.

>>   [ iss ue 24 ]
>
>I wouldn't allow these.

Ok.

>> >It should also be case insensitive.
>
>I also would not allow case-insensitive matches.

Apparently it is already. Although nothing I found in the code indicates that classes are supposed to be case sensitive.

>But note that the parsing inside the message body seems to allow
>capitalization. So I'm seeing something like Issue 23 highlighted as a
>link to issue23. An all-uppercase ISSUE 23 is not linked, though.

I wonder if a class named QSO would be matched by qso 73?
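The behaviour observed in the message above (an unanchored, case-insensitive `re.search` that uses only the first designator found) can be reproduced in isolation. This sketch reuses the pattern and the `re.search(..., re.IGNORECASE)` call quoted in the message; the helper functions and the anchored variant are illustrative assumptions, not code from the mail gateway.

```python
import re

# Class list and pattern as quoted in the message above.
classes = ["file", "issue", "keyword", "msg", "priority", "query", "status", "user"]
class_re = r"\[(?P<classname>(%s))(?P<nodeid>\d+)?\]" % "|".join(classes)


def find_designator(subject):
    """Unanchored, case-insensitive search: first designator anywhere wins."""
    m = re.search(class_re, subject, re.IGNORECASE)
    if m is None:
        return None
    return m.group("classname"), m.group("nodeid")


def find_anchored(subject):
    """Hypothetical anchored variant: designator must start the subject."""
    m = re.match(class_re, subject, re.IGNORECASE)
    if m is None:
        return None
    return m.group("classname"), m.group("nodeid")


print(find_designator("computer failure [issue24]"))     # found at the end
print(find_designator("Re: [ISSUE24] COMPUTER BROKEN"))  # case is ignored
print(find_designator("[issue 24] computer failure"))    # inner space: no match
print(find_anchored("computer failure [issue24]"))       # anchored: no match
```

The third call shows the one request that this pattern does not already cover: a space between the class name and the id defeats the match entirely.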
msg7744	Author: [hidden] (rouilj)	Date: 2023-03-22 00:02
Changeset 7234:86862ed039fa makes the mail gateway treat a prefix of "[ issue 24 ]" the same as "[issue24]". Zero or more spaces are allowed after the prefix starter '[' and before the prefix end ']'. Zero or more spaces are also allowed between the classname and the id number (the id number is optional). This is not configurable. Heiko, will this help your use case?
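The relaxed rule described in this message can be approximated with optional whitespace at the three positions it names. The pattern and the `normalize` helper below are an assumed sketch for illustration; the actual expression in the changeset may differ.

```python
import re

CLASSES = ["issue", "user"]  # assumed class names for this sketch

# Optional whitespace after '[', between class name and id, and before ']'.
TOLERANT_RE = re.compile(
    r"\[\s*(?P<classname>%s)\s*(?P<nodeid>\d+)?\s*\]" % "|".join(CLASSES)
)


def normalize(subject):
    """Return the canonical designator (e.g. 'issue24') or None."""
    m = TOLERANT_RE.search(subject)
    if m is None:
        return None
    # The id is optional, so fall back to the bare class name.
    return m.group("classname") + (m.group("nodeid") or "")


for subject in ("[issue24] broken", "[ issue 24 ] broken", "[ iss ue 24 ] broken"):
    print(repr(subject), "->", normalize(subject))
```

Whitespace inside the class name itself, as in "[ iss ue 24 ]", is still rejected, which matches the limit both maintainers agreed on earlier in the thread.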
msg7747	Author: [hidden] (rouilj)	Date: 2023-03-31 02:11
Hi Heiko,

You said:

> In our tracker that has hundreds of users, this leads to a large number of
> useless issues that must be sorted out at great expense.

In addition to the change I made to make designator parsing more forgiving, it sounds like your tracker could use a better way of handling misplaced updates/issues, reducing the work to merge them into the main issue.

A method to do so is discussed at https://wiki.roundup-tracker.org/MergeIssues.

In theory, you would find the misplaced issue, enter the number of the (target) issue that was supposed to be updated, and click merge. This would retire the misplaced issue and move the messages into the correct issue.

You could even have it automatically send an email to the person who created the misplaced issue. This email could tell them how to properly reply to the issue.

The wiki implementation is a bit dated, but there are updates at the end of it describing how to make it work with Roundup 1.x, and they look like they should work for 2.x as well.

I would be happy to work with you to update the wiki page and get a working recipe that could be implemented. If it works well enough we can merge it into the distributed templates.

If you are interested, open a new ticket and we can work on it.
msg7751	Author: [hidden] (Heiko)	Date: 2023-04-04 02:02
> Heiko will this help your use case?

Yes, definitely. That will already avoid the majority of the 'useless issues'!
msg7752	Author: [hidden] (Heiko)	Date: 2023-04-04 02:15
> Reducing the work to merge them into the main issue.

We already have that merge functionality in the tracker, based on the example you mentioned.

> You could even have it automatically send an email to the person who created the misplaced issue. This email could tell them how to properly reply to the issue.

Indeed, the underlying problem is that users are not aware of misplaced issues. Most will notice only at some later point that their email has not been filed into the intended issue, or they will not notice at all that something went wrong. An automatic notification mail about the misplaced issue would be helpful.

Actually, I am not doing the technical support for that tracker anymore - Ralf is doing that. Ralf, can you support that? I guess such automatic warning mails would considerably reduce the workload of the tracker maintainers.

History
Date	User	Action	Args
2023-04-04 15:58:35	rouilj	set	status: open -> fixed resolution: remind -> fixed
2023-04-04 02:15:02	Heiko	set	messages: + msg7752
2023-04-04 02:02:05	Heiko	set	messages: + msg7751
2023-03-31 02:11:17	rouilj	set	messages: + msg7747
2023-03-22 00:02:51	rouilj	set	status: new -> open assignee: rouilj resolution: remind messages: + msg7744
2023-03-20 15:45:46	rouilj	set	keywords: + Blocker
2023-02-23 18:44:59	rouilj	set	messages: + msg7728
2023-02-23 09:55:51	schlatterbeck	set	messages: + msg7727
2023-02-22 17:20:45	rouilj	set	nosy: + rouilj messages: + msg7725
2023-02-22 08:13:31	Heiko	create