Roundup Tracker - Issues

Issue 2551262

classification
Make mail gateway issue matching more fault tolerant
Type: behavior Severity: major
Components: Mail interface Versions: 2.2.0
process
Status: open Resolution: remind
Assigned To: rouilj Nosy List: Heiko, rouilj, schlatterbeck
Priority: Keywords: Blocker

Created on 2023-02-22 08:13 by Heiko, last changed 2023-03-31 02:11 by rouilj.

Messages
msg7724 Author: [hidden] (Heiko) Date: 2023-02-22 08:13
Roundup's mail gateway allows adding email content to an issue by specifying the issue designator in square
brackets at the beginning of the email subject line. However, the syntax must exactly follow the scheme [issue<num>]. 
It must be placed at the beginning of the subject line, no blank spaces are allowed, only square brackets will work, 
and I think it is also case sensitive.

However, it frequently happens that users do not follow this syntax, mostly unintentionally. In these cases Roundup 
will not find the matching issue and create a new one instead. In our tracker that has hundreds of users, this leads 
to a large number of useless issues that must be sorted out at great expense.

My request is to make the mail gateway based issue matching more fault tolerant. It should not matter where in the 
subject line the issue designator is placed. Blank spaces inside the brackets and brackets other than square should be 
allowed. It should also be case insensitive.
msg7725 Author: [hidden] (rouilj) Date: 2023-02-22 17:20
Hello Heiko:

I'm sorry you are getting misplaced issues that you have to
sort out.  Let's see if some of Roundup's current features
will help. Some of the changes you request could be a new
feature.

What version of Roundup are you using? I think my
suggestions have been there since the 1.x series but..

I see you included Ralf. Ralf, a couple of questions for you
too.

Sorry for the length of the email, but you have opened a
more complex part of Roundup.

In message <1677053611.22.0.478622424686.issue2551262@roundup.psfhosted.org>,
Heiko Stegmann writes:

> [...] frequently happens that users do not follow this
> syntax [...] In these cases Roundup will
> not find the matching issue and create a new one instead

Roundup has two additional mechanisms to try to assign
emails to issues if prefix parsing fails.

 1. It looks for an in-reply-to header in the email and
    tries to match that up against an existing message with
    the same message id. Then it puts the new message on
    issue with the matching message.
 2. If message id matching fails, it tries to match on the subject.
    If the new message's subject is changed sufficiently from the
    title of an existing issue, this will fail as well.

I would expect #1 to catch most of what you describe.
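
A minimal sketch of the fallback order described above. It is not
Roundup's actual code: the lookup tables (messages_by_id,
issues_by_title) are hypothetical stand-ins for the tracker database,
and Re:/Fwd: stripping is ignored.

```python
import re

# Strict prefix at the very start of the subject, e.g. "[issue24] ..."
PREFIX_RE = re.compile(r"^\[(?P<classname>[a-z]+)(?P<nodeid>\d+)?\]")


def route_message(subject, in_reply_to, messages_by_id, issues_by_title):
    """Return (method, issue_id) describing how the email was matched.

    messages_by_id and issues_by_title are hypothetical dicts mapping a
    message id / an issue title to an issue id.
    """
    # 1. Strict prefix parsing.
    m = PREFIX_RE.match(subject)
    if m and m.group("nodeid"):
        return "prefix", m.group("nodeid")
    # 2. In-Reply-To header matching a stored message id.
    if in_reply_to in messages_by_id:
        return "in-reply-to", messages_by_id[in_reply_to]
    # 3. Subject identical to the title of an existing issue.
    title = subject.strip()
    if title in issues_by_title:
        return "subject", issues_by_title[title]
    # Nothing matched: a new issue would be created.
    return "new", None
```

With this sketch, "[issue 24] computer failure" fails step 1 because
of the space, so it is only routed correctly if the mail carries a
usable In-Reply-To header.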

Heiko, can you find one of your misrouted messages and dump
it using:

   roundup-admin -i <tracker home> get inreplyto msg<num>

Does it have a value (not None) for inreplyto? If so, does
that value match an existing message? If you have a newer
Roundup, and it has the filter command you can use:

  roundup-admin -i <tracker home> filter msg messageid="<1675055043.0895576.IZTXE7H4VMTIDGQA.issue226@localhost>"

to find the existing message.

Ralf, let's assume Heiko's experiment indicates that
inreplyto matching should work. Does this sound like a false
positive match in prefix parsing?  IIUC that would prevent
inreplyto and subject matching. It could also be a bug in
inreplyto parsing, but I don't know of any, do you?

Parsing the subject line is tricky. We don't want false
positives where a match happens, but shouldn't be. To guard
against this we use:

  1. known location for the prefix
  2. delimiters around the prefix
  3. strict format inside the delimiters

We also don't want false negatives missing a valid match.

Relaxing #3 could be useful in preventing false negatives.

>Roundup's mail gateway allows to set add email content to
>an issue by specifying the issue designator in square
>brackets at the beginning of the email subject
>line. However, the syntax must exactly follow the scheme
>[issue<num>].

It is more complex than that; you can have:

  Subject: [user3] [realname=Fred]

to modify the realname of user3.

Also you can create a new item in the device class by
emailing:

  Subject: [device] new device [location=32; category+=computing]

which creates a device and sets some of its properties.
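
As a rough illustration of the suffix syntax shown in these two
examples, here is a sketch that splits a bracketed suffix into
(property, operator, value) tuples. It only handles the '=' and '+='
forms that appear above; the function name and regex are hypothetical
and are not taken from Roundup's mail gateway.

```python
import re

# One "name=value" or "name+=value" assignment, surrounding spaces ignored.
ASSIGN_RE = re.compile(r"^\s*(?P<name>\w+)\s*(?P<op>\+=|=)\s*(?P<value>.*?)\s*$")


def parse_suffix(suffix):
    """Parse '[location=32; category+=computing]' into a list of tuples."""
    result = []
    # Drop the surrounding brackets, then split the assignments on ';'.
    for part in suffix.strip("[]").split(";"):
        m = ASSIGN_RE.match(part)
        if m is None:
            raise ValueError("cannot parse %r" % part)
        result.append((m.group("name"), m.group("op"), m.group("value")))
    return result
```

For example, parse_suffix("[realname=Fred]") yields a single
("realname", "=", "Fred") tuple.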

You are correct it must be:

 [classname<number can be optional>]

>It must be placed at the beginning of the subject line,

You can strip prefixes from the subject line using refwd_re
in config.ini. It is meant to remove the prefixes added by
mailers when replying to or forwarding a message. If your users
have a mail client with a different prefix, you can add it
there.
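
To show the idea, here is a small sketch of stripping reply/forward
markers with a regular expression. The pattern below is a
hypothetical example written for this illustration; it is not the
default value of refwd_re, so check your own config.ini for the real
one.

```python
import re

# Hypothetical pattern: one or more leading "Re:", "Fw:", "Fwd:" or "AW:"
# markers. A tracker's real refwd_re may differ.
REFWD_RE = re.compile(r"^\s*((re|fwd?|aw)\s*:\s*)+", re.IGNORECASE)


def strip_reply_prefixes(subject):
    """Remove leading reply/forward markers from a subject line."""
    return REFWD_RE.sub("", subject)
```

After stripping, "Re: [issue24] computer failure" starts with the
designator again, which is what prefix parsing expects.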

>no blank spaces are allowed, only square brackets will work,

The brackets can be replaced with another pair of characters
using 'subject_suffix_delimiters' in config.ini, which works for
both the class/item prefix and the suffix modifiers. *But*
only one pair of delimiters/characters is allowed. So you
can allow

  (issue23)

or

 <issue32>

but not both.
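
The "one pair only" restriction can be pictured with the sketch
below, which builds a prefix regex from a two-character delimiter
string. This assumes the setting is a string holding one opening and
one closing character; build_prefix_re is a hypothetical helper, not
a Roundup function.

```python
import re


def build_prefix_re(delimiters="[]"):
    """Build a prefix regex for exactly one pair of delimiter characters."""
    if len(delimiters) != 2:
        raise ValueError("expected one opening and one closing character")
    # Escape both characters so that '[' or '(' are taken literally.
    open_d, close_d = (re.escape(c) for c in delimiters)
    return re.compile(
        r"^%s(?P<classname>[a-z]+)(?P<nodeid>\d+)?%s" % (open_d, close_d)
    )
```

A regex built for "()" accepts "(issue23)" but rejects "<issue32>",
and the other way round for "<>".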

>and I think it is also case sensitive.

That's correct since class names are case sensitive.

>However, it frequently happens that users do not follow
>this syntax

Some user education can go a long way there. However given
the variety of mail clients in use, education can be an
impossible task.

What is the setting for 'subject_prefix_parsing' in your
tracker's config.ini?

There are three settings:

  strict - which returns emails with unparsible prefixes to
           the user describing the problem and how to fix it.

  loose - which if it can't parse a prefix will pass the
          message through to create a new object in the
          default_class as specified in config.ini.

  none - which will not try to find a prefix and fall back
         to inreplyto or subject matching. If those fail, it
         will create a new object.

If you aren't using strict, it might help.
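
The three settings can be summarized as a small decision function.
This is only a paraphrase of the descriptions above in code form; the
function name, its arguments and the returned strings are
hypothetical, and it assumes the subject contains something that
looks like a prefix.

```python
def handle_prefix_result(mode, prefix_ok):
    """Describe the outcome for a subject_prefix_parsing mode.

    mode:      "strict", "loose" or "none"
    prefix_ok: True if the prefix found in the subject could be parsed
    """
    if mode == "none":
        # No prefix lookup at all; rely on the other mechanisms.
        return "match by in-reply-to or subject, else create a new item"
    if mode not in ("strict", "loose"):
        raise ValueError("unknown mode: %r" % mode)
    if prefix_ok:
        return "use the item named by the prefix"
    if mode == "strict":
        return "return the email to the sender with an explanation"
    # loose: an unparsable prefix is passed through.
    return "create a new item in default_class"
```

Under this reading, "[issue 24]" with strict parsing bounces back to
the sender instead of silently creating a new issue.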

>My request is to make the mail gateway based issue matching
>more fault tolerant. It should not matter where in the
>subject line the issue designator is placed.

Are you actually seeing subject lines where the user takes
the time to edit the subject line on a reply to move the
prefix into or after the subject like:

  computer failure [issue24]

  computer [issue24] failure

or change the delimiters:

  computer failure <issue24>

  {issue24} computer failure

rather than leaving it alone and generating something like

  Re: [issue24] computer failure

>Blank spaces inside the brackets

This could be a useful option. Are people keeping the
designator together? So you see:

  [ issue24   ] 

or do they split the designator like:

  [  issue 24 ]

These look simpler to implement. Hopefully you aren't seeing

 [ iss ue 24 ]

8-).

>and brackets other than square should be allowed.

I assume here you want to support both '[]' and '<>' (for
example) as delimiters?  If you just want to change the
delimiters, you can do that already.

>It should also be case insensitive.

You can name classes 'issue' or 'Issue' or even 'ISSUE'. So
it must be case sensitive by default. If it wasn't you
couldn't tell the difference between the three issue types.

Are you actually seeing a prefix of [issue24] being replied
to with a subject like:

  [ISSUE24]

or 

  [Issue24]

If you are seeing that, is the rest of the subject line
modified as well?  e.g.

  [issue24] computer broken

is replied to as:

  [ISSUE24] COMPUTER BROKEN

If so you would also probably want to set
'subject_updates_title = no' in config.ini so that people
using mail readers that sort/thread by subject don't get
broken.

However adding a config.ini option to make class name
matching for the prefix case insensitive could be possible.
msg7727 Author: [hidden] (schlatterbeck) Date: 2023-02-23 09:55
On Wed, Feb 22, 2023 at 05:20:45PM +0000, John Rouillard wrote:
> 
> Roundup has two additional mechanisms to try to assign
> emails to issues if prefix parsing fails.
> 
>  1. It looks for an in-reply-to header in the email and
>     tries to match that up against an existing message with
>     the same message id. Then it puts the new message on
>     issue with the matching message.

Are you sure this mechanism works in all cases? I have lots of cases
where a direct reply does not work, I've never considered that matching
happens by in-reply-to header.

>  2. If message id matching fails, it tries to match on the subject.
>     If the new message's subject is changed sufficiently from the
>     title of an existing issue, this will fail as well.

Yes, the particular use-case Heiko is referring to is something like

[issue 4711]
i.e. a space after issue is the most common thing people get wrong.
Now if someone is editing the subject by hand, chances are that the
whole mail is not composed as an answer to an existing incoming mail...

So the idea would be to make matching more tolerant like in a message
body. This allows different case and/or a space between class name and
node-id. (I know because this bites me regularly: I'm a ham-radio operator
and I'm using roundup for tracking my successful two-way calls, the term
for these is 'QSO'. The table in asterisk is named 'qso'. I put messages
another ham sends me with a confirmation as a message into asterisk
(from electronic services like eqsl.cc or lotw.arrl.org). These often
read something like 'tnx fr qso 73 <callsign>', 73 meaning something like
'best regards'. I can't count the number of messages that link to qso73 :-)

So if we change this it probably should be configurable. And when at it
we might want to make the message-body matching configurable, too for
the use-case above.
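
The 'qso 73' false positive is easy to reproduce with a toy matcher
that tolerates a space between class name and node-id. The sketch
below is an assumption-laden illustration, not Roundup's body-linking
code; in particular it makes no attempt to reproduce Roundup's
handling of capitalization.

```python
import re


def body_designators(text, classnames):
    """Return designators such as 'qso73' found in free text.

    Allows optional whitespace between class name and number, which is
    exactly what produces the false positive.
    """
    alternatives = "|".join(re.escape(name) for name in classnames)
    pattern = re.compile(r"\b(?P<classname>%s)\s*(?P<nodeid>\d+)\b" % alternatives)
    return [m.group("classname") + m.group("nodeid") for m in pattern.finditer(text)]
```

Given the class names "issue" and "qso", the text
'tnx fr qso 73 <callsign>' produces a link to qso73 even though 73 is
just a greeting.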

> Ralf, let's assume Heiko's experiment indicates that
> inreplyto matching should work. Does this sound like a false
> positive match in prefix parsing?  IIUC that would prevent
> inreplyto and subject matching. It could also be a bug in
> inreplyto parsing, bu I don't know of any do you?

I don't think it matches anything, see use-case above.
But it may well be that when it started matching (message begins with
'[' after stripping reply/fwd prefixes) that no other alternatives are
tried. See above for my experience on subject matching.

> Parsing the subject line is tricky. We don't want false
> positives where a match happens, but shouldn't be. To guard
> against this we use:
> 
>   1. known location for the prefix
Yes, I would only consider a *prefix*, i.e. directly after a 'Re:' or
similar prefix in the subject (this is already a configurable regex).

>   2. delimiters around the prefix
Yes I would not depart from []. We *do* have the occasional [issue4711}
or so but I would not try to parse these.

>   3. strict format inside the delimiters
See above, for a first change I'd only allow a space between classname
and node-id. This is the most common case as far as my observations go.

> It is more complex that that, you can have:
> 
>   Subject: [user3] [realname=Fred]

Yes, but the second thing must currently be last in the subject.

> What is the setting for 'subject_prefix_parsing' in your
> tracker's config.ini?

> There are three settings:
> 
>   strict - which returns emails with unparsible prefixes to
>            the user describing the problem and how to fix it.
> 
>   loose - which if it can't parse a prefix will pass the
>           message through to create a new object in the
>           default_class as specified in config.ini.
> 
>   none - which will not try to find a prefix and fall back
>          to inreplyto or subject matching. If those fail, it
>          will create a new object.
> 
> If you aren't using strict, it might help.

Yes, maybe that would indeed solve that particular issue.

> >My request is to make the mail gateway based issue matching
> >more fault tolerant. It should not matter where in the
> >subject line the issue designator is placed.

I'm very reluctant to allow the designator anywhere in the message. This
simply could create too much ambiguity, especially together with the
second bracketed expression where you can set properties.

> >Blank spaces inside the brackets
> 
> This could be a useful option. Are people keeping the
> designator together? So you see:
> 
>   [ issue24   ] 
> 
> or do they split the designator like:
> 
>   [  issue 24 ]

I think this is a good use-case and most of the cases I've seen match
that pattern.

> These look simpler to implement. Hopefully you aren't seeing
> 
>  [ iss ue 24 ]

I wouldn't allow these.

> >It should also be case insensitive.

I also would not allow case-insensitive matches.
But note that the parsing inside the message body seems to allow
capitalization. So I'm seeing something like Issue 23 highlighted as a
link to issue23. An all-uppercase ISSUE 23 is *not* linked, though.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg7728 Author: [hidden] (rouilj) Date: 2023-02-23 18:44
Hi Ralf:

In message <20230223095547.p4hypcsuawoqwcl3@runtux.com>,
Ralf Schlatterbeck writes:
>Ralf Schlatterbeck added the comment:
>
>On Wed, Feb 22, 2023 at 05:20:45PM +0000, John Rouillard wrote:
>> 
>> Roundup has two additional mechanisms to try to assign
>> emails to issues if prefix parsing fails.
>> 
>>  1. It looks for an in-reply-to header in the email and
>>     tries to match that up against an existing message with
>>     the same message id. Then it puts the new message on
>>     issue with the matching message.
>
>Are you sure this mechanism works in all cases? I have lots of cases
>where a direct reply does not work, I've never considered that matching
>happens by in-reply-to header.

Well I don't know about all cases. But the test

  testReplytoMultiMatch

in test_mailgw.py works. And the code in Codecov shows that the
branches are being executed.

Here is a funny one though, there is a test testIssueidLast which does
find the [issue1] at the end of the subject line.

(note, if I modify that test to remove the "[issue1]" at the end it does
properly match the new message by message-id/in-reply-to to issue
1.)

However if I keep the "prefix" at the end, it gets matched by
extracting it via this (unanchored) regexp:

  \[(?P<classname>(file|issue|keyword|msg|priority|query|status|user))(?P<nodeid>\d+)?\]

even though it correctly records a missing prefix. The comment with
this is that mailing list software could add an identifier as
well. This would mess up prefix detection, but designator detection would
still happen.

So it would appear that '[classNN]' is matched anywhere in the subject
line case insensitively via:

  m = re.search(class_re, tmpsubject, re.IGNORECASE)

8-/. Only the first one found is used. This is with strict parsing
too. In this case, there is a side effect of erasing the new subject
as it moves past, so the issue preserves the original title.
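
The effect is easy to check in isolation. The sketch below uses the
pattern quoted above with the same re.search(..., re.IGNORECASE)
call; the wrapper function find_designator is hypothetical and exists
only for this demonstration.

```python
import re

# Pattern as quoted in this message (class list taken from the test tracker).
class_re = (r"\[(?P<classname>(file|issue|keyword|msg|priority|query|status|user))"
            r"(?P<nodeid>\d+)?\]")


def find_designator(subject):
    """Return (classname, nodeid) for the first designator anywhere in subject."""
    m = re.search(class_re, subject, re.IGNORECASE)
    if m is None:
        return None
    return m.group("classname"), m.group("nodeid")
```

Because the search is unanchored and case insensitive, a designator
at the end of the subject, or one written as "[ISSUE24]", still
matches, while "[issue 24]" does not.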

>>  2. If message id matching fails, it tries to match on the subject.
>>     If the new message's subject is changed sufficiently from the
>>     title of an existing issue, this will fail as well.
>
>Yes, the particular use-case Heiko is referring to is something like
>
>[issue 4711]
>i.e. an space after issue is the most common thing people get wrong.
>Now if someone is editing the subject by hand, chances are that the
>whole mail is not composed as an answer to an existing incoming mail...

True.

>So the idea would be to make matching more tolerant like in a message
>body. This allows different case and/or a space between class name and
>node-id.
>So if we change this it probably should be configurable.

Agreed. Should it be added to the definition of loose? Should there be
a separate setting (or settings in an already long file)? Should the
values of:

   subject_prefix_parsing

be extended to <required> [optional]:

   <strict|loose|none> [internal_whitespace] [delim_whitespace]
     [prefix_only]

(Better naming is needed.)

  * internal_whitespace is "[issue 24]"
  * delim_whitespace (if we want to support it) is "[ issue24 ]".
  * using both allows: "[ issue 24 ]".

Implementing prefix_only might be tricky as noted above with junk
before it.
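
To make the proposal concrete, here is a sketch of how such an
extended value could be parsed. Everything in it is hypothetical: the
option format is only the suggestion made above, and the flag names
are the placeholder names from this message.

```python
MODES = {"strict", "loose", "none"}
FLAGS = {"internal_whitespace", "delim_whitespace", "prefix_only"}


def parse_prefix_parsing(value):
    """Split '<mode> [flag ...]' into (mode, set_of_flags)."""
    words = value.split()
    if not words or words[0] not in MODES:
        raise ValueError("first word must be one of %s" % sorted(MODES))
    flags = set(words[1:])
    unknown = flags - FLAGS
    if unknown:
        raise ValueError("unknown flags: %s" % sorted(unknown))
    return words[0], flags
```

A value of "loose internal_whitespace delim_whitespace" would then
select loose parsing and accept "[ issue 24 ]".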

>And when at it we might want to make the message-body matching
>configurable, too for the use-case above.

This is for a different ticket. BTW it's your fault for not naming the
class QSOcontacts rather than qso 8-). 73!

>> Ralf, let's assume Heiko's experiment indicates that
>> inreplyto matching should work. [...]
>
>I don't think it matches anything, see use-case above.

Agreed. The space between the class and id would prevent any match.

>But it may well be that when it started matching (message begins with
>'[' after stripping reply/fwd prefixes) that no other alternatives are
>tried. See above for my experience on subject matching.

The in-reply-to should be applied, but as you noted there may not be
an in-reply-to on the inbound email. Also if Roundup doesn't receive a
copy of every email in an external discussion, it will break the
chain.

Hence my asking Heiko to fire up roundup-admin for a look.

>> Parsing the subject line is tricky. We don't want false
>> positives where a match happens, but shouldn't be. To guard
>> against this we use:
>> 
>>   1. known location for the prefix
>Yes, I would only consider a *prefix*, i.e. directly after a 'Re:' or
>similar prefix in the subject (this is already a configurable regex).

I wonder if we should anchor the identifier prefix and handle any
other mailing list junk as part of the Re: removal stuff.

>>   2. delimiters around the prefix
>Yes I would not depart from []. We *do* have the occasional [issue4711}
>or so but I would not try to parse these.

ok.

>>   3. strict format inside the delimiters
>See above, for a first change I'd only allow a space between classname
>and node-id. This is the most common case as far as my observations go.

with some format variations (spacing) allowed by configuration.

>> It is more complex that that, you can have:
>> 
>>   Subject: [user3] [realname=Fred]
>
>Yes, but the second thing must currently be last in the subject.

It would appear not. Also if you do:

   Subject: [realname=Fred] [user3]

I think the suffix parsing will not happen. When suffix parsing
happens, the subject line consists of what is after '[user3]'.

>> What is the setting for 'subject_prefix_parsing' in your
>> tracker's config.ini?
>> [...]
>> If you aren't using strict, it might help.
>
>Yes, maybe that would indeed solve that particular issue.

From the code it looks like it should if the designator is found as a
prefix.

>> >My request is to make the mail gateway based issue matching
>> >more fault tolerant. It should not matter where in the
>> >subject line the issue designator is placed.
>
>I'm very reluctant to allow the designator anywhere in the message. This
>simply could create too much ambiguity, especially together with the
>second bracketed expression where you can set properties.

Well we have that already so... Heiko consider this implemented 8-).

>> >Blank spaces inside the brackets
>> [...]
>> or do they split the designator like:
>> 
>>   [  issue 24 ]
>
>I think this is a good use-case and most of the cases I've seen match
>that pattern.

Ok.

>>  [ iss ue 24 ]
>
>I wouldn't allow these.

Ok.

>> >It should also be case insensitive.
>
>I also would not allow case-insensitive matches.

Apparently it is already. Although nothing I found in the code
indicates that classes are supposed to be case sensitive.

>But note that the parsing inside the message body seems to allow
>capitalization. So I'm seeing something like Issue 23 highlighted as a
>link to issue23. An all-uppercase ISSUE 23 is *not* linked, though.

I wonder if a class named QSO would be matched by qso 73?
msg7744 Author: [hidden] (rouilj) Date: 2023-03-22 00:02
changeset:   7234:86862ed039fa

makes the mail gateway treat a prefix of "[  issue 24 ]" the same as "[issue24]".
Zero or more spaces are allowed after the prefix starter '[' and before the
prefix end ']'. Also zero or more spaces are allowed between the classname and the
id number (the id number is optional).

This is not configurable.
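
The whitespace rules described here can be approximated with the
regex below. It is a sketch written from the description in this
message, not the code from the changeset; the class name pattern
([a-z]+) and the helper normalize_prefix are assumptions.

```python
import re

# Optional spaces after '[', between class name and id, and before ']'.
# The id number itself is optional.
TOLERANT_PREFIX_RE = re.compile(
    r"\[\s*(?P<classname>[a-z]+)\s*(?P<nodeid>\d+)?\s*\]"
)


def normalize_prefix(subject):
    """Return the designator without spaces, or None if there is no match."""
    m = TOLERANT_PREFIX_RE.search(subject)
    if m is None:
        return None
    return m.group("classname") + (m.group("nodeid") or "")
```

With this pattern "[  issue 24 ]" and "[issue24]" both normalize to
issue24, while "[ iss ue 24 ]" is still rejected.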

Heiko will this help your use case?
msg7747 Author: [hidden] (rouilj) Date: 2023-03-31 02:11
Hi Heiko,

You said:

> In our tracker that has hundreds of users, this leads to a large number of
> useless issues that must be sorted out at great expense.

In addition to the change I made to make designator parsing more forgiving, it sounds
like your tracker could use a better way of handling misplaced updates/issues,
reducing the work needed to merge them into the main issue.

A method to do so is discussed at https://wiki.roundup-tracker.org/MergeIssues.
In theory, you would find the misplaced issue, enter the number of the (target) issue
that was supposed to be updated and click merge.

This would retire the misplaced issue and move the messages into the correct issue.
You could even have it automatically send an email to the person who created the
misplaced issue. This email could tell them how to properly reply to the issue.

The wiki implementation is a bit dated, but there are updates at the end of it
describing how to make it work with Roundup 1.x, and they look like they should work
for 2.x as well.

I would be happy to work with you to update the wiki page and get a working recipe
that could be implemented. If it works well enough we can merge it into the distributed
templates. If you are interested, open a new ticket and we can work on it.
History
Date                 User           Action  Args
2023-03-31 02:11:17  rouilj         set     messages: + msg7747
2023-03-22 00:02:51  rouilj         set     status: new -> open; assignee: rouilj; resolution: remind; messages: + msg7744
2023-03-20 15:45:46  rouilj         set     keywords: + Blocker
2023-02-23 18:44:59  rouilj         set     messages: + msg7728
2023-02-23 09:55:51  schlatterbeck  set     messages: + msg7727
2023-02-22 17:20:45  rouilj         set     nosy: + rouilj; messages: + msg7725
2023-02-22 08:13:31  Heiko          create