Message 7727 - Roundup tracker

Author	schlatterbeck
Recipients	Heiko, rouilj, schlatterbeck
Date	2023-02-23.09:55:51
Message-id	<20230223095547.p4hypcsuawoqwcl3@runtux.com>
In-reply-to	<20230222172039.8DFAA6A0010@pe15.cs.umb.edu>
On Wed, Feb 22, 2023 at 05:20:45PM +0000, John Rouillard wrote:
> 
> Roundup has two additional mechanisms to try to assign
> emails to issues if prefix parsing fails.
> 
>  1. It looks for an in-reply-to header in the email and
>     tries to match that up against an existing message with
>     the same message id. Then it puts the new message on
>     issue with the matching message.

Are you sure this mechanism works in all cases? I have lots of cases
where a direct reply does not work; I had never considered that matching
happens via the in-reply-to header.

>  2. If message id matching fails, it tries to match on the subject.
>     If the new message's subject is changed sufficiently from the
>     title of an existing issue, this will fail as well.

Yes, the particular use-case Heiko is referring to is something like

[issue 4711]
i.e. a space after 'issue' is the most common thing people get wrong.
Now if someone is editing the subject by hand, chances are that the
whole mail is not composed as an answer to an existing incoming mail...

So the idea would be to make subject matching more tolerant, like the
matching in a message body, which allows different case and/or a space
between class name and node-id. (I know because this bites me regularly:
I'm a ham-radio operator and I'm using roundup for tracking my successful
two-way calls; the term for these is 'QSO'. The table in asterisk is
named 'qso'. When another ham sends me a confirmation (via electronic
services like eqsl.cc or lotw.arrl.org), I put it into asterisk as a
message. These often read something like 'tnx fr qso 73 <callsign>',
with 73 meaning something like 'best regards'. I can't count the number
of messages that link to qso73 :-) )

So if we change this, it should probably be configurable. And while we
are at it, we might want to make the message-body matching configurable
too, for the use-case above.
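
To illustrate the 'qso 73' problem, here is a minimal sketch of a
body matcher with a configurable switch. It is not Roundup's actual
code: the function name find_designators, the allow_space flag and the
KNOWN_CLASSES set are all hypothetical and only serve the example.

```python
import re

# Hypothetical class names; a real tracker would take these from its schema.
KNOWN_CLASSES = {"issue", "qso"}


def find_designators(text, allow_space=True):
    """Return designators such as 'qso73' found in free text.

    allow_space is a hypothetical option: when True, whitespace between
    class name and node-id is tolerated ('qso 73' counts as 'qso73').
    """
    sep = r"\s*" if allow_space else ""
    pattern = re.compile(r"\b([A-Za-z_]+)" + sep + r"(\d+)\b")
    return [
        m.group(1) + m.group(2)
        for m in pattern.finditer(text)
        if m.group(1) in KNOWN_CLASSES
    ]
```

With the tolerant setting, find_designators("tnx fr qso 73") returns
['qso73'], which is exactly the unwanted link. With allow_space=False
the same text yields no designator, while "see qso73" still matches.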

> Ralf, let's assume Heiko's experiment indicates that
> inreplyto matching should work. Does this sound like a false
> positive match in prefix parsing?  IIUC that would prevent
> inreplyto and subject matching. It could also be a bug in
> inreplyto parsing, but I don't know of any, do you?

I don't think it matches anything, see use-case above.
But it may well be that once it has started matching (message begins
with '[' after stripping reply/fwd prefixes), no other alternatives are
tried. See above for my experience with subject matching.
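
As a rough sketch of the fallback order described above, and of the
suspected early exit, consider the following. This is an assumption
about the behaviour, not Roundup's implementation: assign_issue,
strip_reply_prefix, the stop_on_bad_prefix flag and the two lookup
dicts are hypothetical, and title matching is reduced to exact equality.

```python
import re


def strip_reply_prefix(subject):
    # Assumed stand-in for the configurable reply/forward prefix regex.
    return re.sub(r"^\s*((re|fwd?|aw)\s*:\s*)+", "", subject,
                  count=1, flags=re.IGNORECASE)


def assign_issue(headers, issue_by_msgid, issue_by_title,
                 stop_on_bad_prefix=True):
    """Try the subject prefix, then in-reply-to, then the issue title."""
    subject = strip_reply_prefix(headers.get("subject", ""))

    # 1. Strict prefix such as '[issue4711]' at the start of the subject.
    m = re.match(r"\[([a-z_]+\d+)\]", subject)
    if m:
        return m.group(1)

    # Suspected behaviour: a subject that starts with '[' but does not
    # parse ends the search, so the fallbacks below are never reached.
    if stop_on_bad_prefix and subject.startswith("["):
        return None

    # 2. Match the in-reply-to header against known message ids.
    issue = issue_by_msgid.get(headers.get("in-reply-to"))
    if issue is not None:
        return issue

    # 3. Fall back to matching the subject against issue titles.
    return issue_by_title.get(subject.strip())
```

If this guess is right, a reply with the subject 'Re: [issue 4711] crash'
is lost even though its in-reply-to header would have identified the
issue; with stop_on_bad_prefix=False the same mail is assigned correctly.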

> Parsing the subject line is tricky. We don't want false
> positives where a match happens, but shouldn't be. To guard
> against this we use:
> 
>   1. known location for the prefix
Yes, I would only consider a *prefix*, i.e. directly after a 'Re:' or
similar prefix in the subject (this is already a configurable regex).

>   2. delimiters around the prefix
Yes I would not depart from []. We *do* have the occasional [issue4711}
or so but I would not try to parse these.

>   3. strict format inside the delimiters
See above, for a first change I'd only allow a space between classname
and node-id. This is the most common case as far as my observations go.
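
A minimal sketch of such a first change could look like the following.
It assumes a simplified reply/forward regex (REFWD_RE) in place of the
configured one, and the function parse_subject_prefix and its
allow_space option are hypothetical names, not existing Roundup API.

```python
import re

# Assumed stand-in for the configurable reply/forward prefix regex.
REFWD_RE = re.compile(r"^\s*((re|fwd?|aw)\s*:\s*)*", re.IGNORECASE)


def parse_subject_prefix(subject, allow_space=True):
    """Return (classname, nodeid, rest), or None without a valid prefix.

    Only a prefix directly after 'Re:' and friends is considered, the
    delimiters must be [], and the only tolerated deviation is a single
    whitespace character between class name and node-id.
    """
    rest = REFWD_RE.sub("", subject, count=1)
    sep = r"\s?" if allow_space else ""
    m = re.match(r"\[([a-z_]+)" + sep + r"(\d+)\]\s*(.*)", rest)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)
```

Here 'Re: [issue 4711] foo' and 'Re: [issue4711] foo' both parse to
('issue', '4711', 'foo'), while '[issue4711} foo', '[Issue 4711] foo',
'[ iss ue 24 ]' and 'foo [issue4711]' are all rejected.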

> It is more complex than that, you can have:
> 
>   Subject: [user3] [realname=Fred]

Yes, but the second bracketed expression must currently be last in the
subject.

> What is the setting for 'subject_prefix_parsing' in your
> tracker's config.ini?

> There are three settings:
> 
>   strict - which returns emails with unparsible prefixes to
>            the user describing the problem and how to fix it.
> 
>   loose - which if it can't parse a prefix will pass the
>           message through to create a new object in the
>           default_class as specified in config.ini.
> 
>   none - which will not try to find a prefix and fall back
>          to inreplyto or subject matching. If those fail, it
>          will create a new object.
> 
> If you aren't using strict, it might help.

Yes, maybe that would indeed solve that particular issue.
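
For checking which mode a tracker currently uses, a small sketch like
the following may help. It assumes the option sits in the [mailgw]
section of config.ini and that 'strict' applies when it is unset; the
helper name prefix_parsing_mode and the SAMPLE text are made up for
the example.

```python
import configparser

# Made-up excerpt; a real config.ini contains many more sections.
SAMPLE = """
[mailgw]
subject_prefix_parsing = loose
"""


def prefix_parsing_mode(ini_text):
    """Return the subject_prefix_parsing mode found in ini_text."""
    # Interpolation is disabled so that '%' in other values is harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Assumption: 'strict' is what applies when the option is missing.
    mode = parser.get("mailgw", "subject_prefix_parsing", fallback="strict")
    if mode not in ("strict", "loose", "none"):
        raise ValueError("unknown subject_prefix_parsing value: " + mode)
    return mode
```

For the SAMPLE text above, prefix_parsing_mode(SAMPLE) returns 'loose'.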

> >My request is to make the mail gateway based issue matching
> >more fault tolerant. It should not matter where in the
> >subject line the issue designator is placed.

I'm very reluctant to allow the designator anywhere in the message. This
simply could create too much ambiguity, especially together with the
second bracketed expression where you can set properties.

> >Blank spaces inside the brackets
> 
> This could be a useful option. Are people keeping the
> designator together? So you see:
> 
>   [ issue24   ] 
> 
> or do they split the designator like:
> 
>   [  issue 24 ]

I think this is a good use-case and most of the cases I've seen match
that pattern.

> These look simpler to implement. Hopefully you aren't seeing
> 
>  [ iss ue 24 ]

I wouldn't allow these.

> >It should also be case insensitive.

I also would not allow case-insensitive matches.
But note that the parsing inside the message body seems to allow
capitalization. So I'm seeing something like Issue 23 highlighted as a
link to issue23. An all-uppercase ISSUE 23 is *not* linked, though.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
History
Date	User	Action	Args
2023-02-23 09:55:51	schlatterbeck	set	recipients: + schlatterbeck, rouilj, Heiko
2023-02-23 09:55:51	schlatterbeck	link	issue2551262 messages
2023-02-23 09:55:51	schlatterbeck	create