Roundup Tracker - Issues

Issue 2551119

classification
Title: Bug in Query-Generator for filter (rdbms backend)
Type: behavior Severity: normal
Components: Database, Test Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: schlatterbeck Nosy List: ber, rouilj, schlatterbeck
Priority: Keywords: Blocker, patch

Created on 2021-03-18 12:13 by schlatterbeck, last changed 2021-06-10 01:57 by rouilj.

Files
File name Uploaded Description Edit Remove
empty_multilink_expr.diff ber, 2021-03-18 17:12
roundup-keyword-expr-20210414-1.png ber, 2021-04-14 13:35
Messages
msg7124 Author: [hidden] (schlatterbeck) Date: 2021-03-18 12:13
There is a feature in the query generator that can combine ID search with boolean expressions AND, OR, NOT, see commit f1fe6fd0aa61. This change came without a regression test and breaks for certain cases. Without a test of the *intended* feature I dare not touch that code. As far as I understand this uses *negative* IDs to implement a prefix or suffix expression parser.

My use-case that breaks:

We have two tables, time_project and time_wp with the following data structures in roundup (excerpt):

time_project

  wps = Multilink('time_wp')
  status = Link('project_status')
  name = String()

time_wp

  bookers = Multilink(user)

Now a user enters a query with a work-package name that doesn't exist in the web-interface. The resulting call to filter has the following arguments:

sort = [('+', 'name')]
filterspec = {'status': ['2', '1'], 'wps.bookers': ['2428'], 'wps': []}

This produces a traceback:
<class 'psycopg2.errors.UndefinedTable'>: missing FROM-clause entry for table "_time_wp3" LINE 1: ...t from _time_wp where _time_wp.__retired__=0) and _time_wp3.... ^

...

  return render(ob, econtext.vars)
  File "/usr/local/lib/python2.7/dist-packages/roundup/cgi/PageTemplates/Expressions.py", line 99, in render
    ob = ob()
  File "/usr/local/lib/python2.7/dist-packages/roundup/cgi/templating.py", line 3257, in batch
    l = [id for id in klass.filter(matches, filterspec, sort, group)
  File "/usr/local/lib/python2.7/dist-packages/roundup/backends/rdbms_common.py", line 2858, in filter
    sq = self._filter_sql (search_matches, filterspec, sort, group,
  File "/usr/local/lib/python2.7/dist-packages/roundup/backends/rdbms_common.py", line 233, in sql
    cursor.execute(sql, args)
UndefinedTable: missing FROM-clause entry for table "_time_wp3"
LINE 1: ...t from _time_wp where _time_wp.__retired__=0) and _time_wp3....
                                                             ^

It looks like the mentioned buggy code is triggered by the empty wps multilink in the search.
Note how it tries to materialize the whole time_wp table (which contains 43665 entries in our case!) in a subquery.
And it doesn't seem to correctly plug into the query generator: the from clause is missing an entry which is later referenced.

I'd appreciate if someone using this extended query feature can turn on SQL statement logging (either in roundup or in the backend database) and provide me with pairs of filter calls and the generated SQL. Then I'd write an regression test for the intended behaviour.

@Bernhard Reiter: Do you still use roundup with that feature? Examples of usage would be welcome!
msg7125 Author: [hidden] (ber) Date: 2021-03-18 14:59
Hi Ralf,
you seem to use the postgresl backen.

AFAIK we still use this feature (though I don't know how many searches are done with it). We haven't looked at the code for a long while, the idea is to be able to compose complex searches.

What is the fastest way to get a reproducable problem setup?
msg7126 Author: [hidden] (ber) Date: 2021-03-18 17:12
Ralf,
here is an into-the-blue-patch by one of our devs,
more like a guess, but maybe it helps you anyway.
(Sorry for not being more solid here, this certainly
could have been documented better.)

Best,
Bernhard
msg7127 Author: [hidden] (schlatterbeck) Date: 2021-03-18 19:30
Hi Bernhard, thanks for the quick reaction.

On Thu, Mar 18, 2021 at 02:59:46PM +0000, Bernhard Reiter wrote:
> 
> Hi Ralf,
> you seem to use the postgresl backen.

Yes. But I guess the problem is also reproduceable with mysql, the
problem is in the query generator not backend specific.

> AFAIK we still use this feature (though I don't know how many searches
> are done with it). We haven't looked at the code for a long while, the
> idea is to be able to compose complex searches.

Cool.

> What is the fastest way to get a reproducable problem setup?

It would be nice if you could come up with examples of all the usual
queries involving AND, OR, NOT and send me
- The parameters to the filter function for a specific query
- Optionally the generated sql (I can infer this myself).

So this probable involves specifying some common queries via the web
interface and looking what the filter method in backends/rdbms_common.py
is called with. Unfortunately there is no logging for this, so you have
to either instrument filter or set a breakpoint there in the python
debugger.
Accompanying the examples with the expression specified in the web
interface would be nice, so I won't have to reverse-engineer that :-)

The whole idea of all this is to come up with a test that covers
functionality people (your customers among them :-) rely on before
proceeding.

On Thu, Mar 18, 2021 at 05:12:36PM +0000, Bernhard Reiter wrote:
> 
> Ralf,
> here is an into-the-blue-patch by one of our devs,
> more like a guess, but maybe it helps you anyway.
> (Sorry for not being more solid here, this certainly
> could have been documented better.)

Hmm, such a patch without a proper test is just what I want to avoid at
this time :-) Once the behaviour is nailed down it should be easy to
come up with a proper patch. Note that I have other systems where this
misbehaves. I ideally would find out what goes wrong there and come up
with additional tests (which would fail at this time) and proceed from
there. I already have some analyses on what I *think* is wrong. But
first I have to find out what it does when it's right :-)

Thanks
Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg7128 Author: [hidden] (ber) Date: 2021-03-19 08:35
Hi Ralf,
just tried to be helpful as fast as I could yesterday. :)

The original idea was to generally be able to have queries, where
e.g. one status or two statuses, or priorities 
are excluded while others are allowed. One of the
standard templates has the javascript code to form the queries.
So you should be able to find test cases as well.

The implementation idea had to find a few to express this via the url,
and thus the notation was invented.

If I have more time, I'll look it up here.

Best,
Bernhard
msg7129 Author: [hidden] (rouilj) Date: 2021-03-22 00:56
Including link to issue2550648 to see all tickets related to this.
There are 3 or 4 opened tickets.
msg7195 Author: [hidden] (ber) Date: 2021-04-14 13:35
== Use Case (at Intevation)

We want to search for a combination of keywords, for example
all issues that have both the keywords "roundup" and "admin"

=== How it is used

The query build from our template allows this, here is an url shortened to the
significant parts, which will only list issues that have both keyword 1 and keyword 4.

URL/issue?&keyword=1,4,-3&@action=search

For this the expression editor is opened, the expression is constructed
and "apply"ed to it is used as value for the keyword to search.
If "OR" was selected, the URL would be

URL/issue?&keyword=1,4,-4&@action=search

So we have a postfix notation.


roundup-keyword-expr-20210414-1.png shows an editor with "(a and b) or c"
after pressing apply it will be entered in the current search page.

=== Template
It is in the default classic roundup.

testing with
changeset:   6377:a7e7314fb7d9
tag:         tip
user:        John Rouillard <rouilj@ieee.org>
date:        Sat Apr 10 22:10:07 2021 -0400


python demo.py nuke
* login as demo, demo
* create keywords a, b, c
* create issues with a combination of the keywords.
* use issue search page and expression editor to form expressions.

A template set used by some of our still running elder trackers has the similar code
https://hg.intevation.de/roundup/fast-decomposed/file/tip/html
or hg clone https://hg.intevation.de/roundup/fast-decomposed fast-decomposed-hg


Ralf, is that helping you? Thanks for your interest in this feature. :)
(Sorry for having taken a while to respond here.)
msg7196 Author: [hidden] (schlatterbeck) Date: 2021-04-14 14:20
On Wed, Apr 14, 2021 at 01:35:01PM +0000, Bernhard Reiter wrote:

[...]
> Ralf, is that helping you? Thanks for your interest in this feature. :)
> (Sorry for having taken a while to respond here.)

Yes! Thanks a lot, this is helping me to write tests for the indended
semantics. I'm currently busy with another project (also roundup) so it
will take me one or two weeks to come back to this.

Thanks again for taking the time!

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg7212 Author: [hidden] (rouilj) Date: 2021-05-06 19:10
Hi Ralf,

I am starting to plan the next roundup release. I would like to get a 
beta out end of may (no alpha this time). That way I can get the 20
year anniversary edition on in July. I'll probably release it on the 
13th again a few days later then the first release: 0.1.0 2001-07-11.

This is one of the few active bugs, do you think you can get it
sorted (with or without docs) by end May or July?

-- rouilj
msg7217 Author: [hidden] (schlatterbeck) Date: 2021-05-07 12:24
On Thu, May 06, 2021 at 07:10:58PM +0000, John Rouillard wrote:
> 
> I am starting to plan the next roundup release. I would like to get a 
> beta out end of may (no alpha this time). That way I can get the 20
> year anniversary edition on in July. I'll probably release it on the 
> 13th again a few days later then the first release: 0.1.0 2001-07-11.

Cool!

> This is one of the few active bugs, do you think you can get it
> sorted (with or without docs) by end May or July?

I'll try my best. Unfortunately the multilink expression feature is in
worse shape than I anticipated: The very first test (the one suggested
by Bernhard) fails for mysql.
When a '-1' id (meaning "empty multilink") is involved, at least the
tests I came up with fail for *all* backends.

I'll commit what I have later today (when I'm reasonably sure I didn't
break other tests, I've introduced a new class "keyword" in the tests)

Once I have this in place the idea is
- to get mysql working
- to get the feature working with '-1' in a query
- to allow '-1' mixed with other ids as an implicit "or" (my memory says
  that this used to work before the multilink-expression feature was
  introduced but I may be wrong and it may well be that this never
  worked for multilinks)
- To ensure that if only '-1' (and no other boolean operator with lower
  negative id) is part of the query we do not materialize the whole
  multilink table in a subquery (that would be fine for 'keyword' but I
  have some tables where this would materialize *large* datasets)
- Finally try to reproduce the bug for which the issue was originally
  created by writing a test for it
- and fix it :-)

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg7218 Author: [hidden] (schlatterbeck) Date: 2021-05-07 13:31
The generated sql for mysql reads:

'select distinct(_issue.id) from issue_keywords,_issue   where (1=0) and _issue.__retired__=0 order by _issue.id'

not very likely this will ever return the correct set of items :-)
msg7219 Author: [hidden] (rouilj) Date: 2021-05-07 15:53
Hi Ralf, it looks like anydbm and sqlite are working now
and you got mysql running. Really just a single line change?

I am surprised this bug didn't cause other
issues. If I entered -1 for unset in a search form would
that line be hit? If so would it work both before and after
your change?

I am pushing to CI now and will see how many times that line
is covered. I wonder if we have a test for -1 (unset) somewhere
in the test suite.

Nice find.
msg7220 Author: [hidden] (schlatterbeck) Date: 2021-05-07 19:18
On Fri, May 07, 2021 at 03:53:57PM +0000, John Rouillard wrote:
> Hi Ralf, it looks like anydbm and sqlite are working now
> and you got mysql running. Really just a single line change?
> 
> I am surprised this bug didn't cause other
> issues. If I entered -1 for unset in a search form would
> that line be hit? If so would it work both before and after
> your change?

No this is the code with the expression parser for multilinks,
e.g. the list '1', '2', '-3' would specify '1' AND '2' ('-3' is the
"opcode" for AND). I think this has never worked for mysql. The constants
in the expression representation are integers and IDs in roundup always
were strings. So this never matched.
A single '-1' would have worked. And now (latest commit some minutes
ago) a list like '-1', '1', '2' would also work, this is implicit OR
so it would mean a multilink containing '1' or '2' or being empty.
I think this case has *never* worked. It *did* work for Link properties
but not for multilinks.

> I am pushing to CI now and will see how many times that line
> is covered. I wonder if we have a test for -1 (unset) somewhere
> in the test suite.

Yes, be sure to include my latest commit.

Note that I'm still planning further changes. The original bug (the
reason I created this issue in the first place) is still not found.
See my todo-list in msg7217, I've finished point 3 in that list :-)
And I think point 4 is impossible, we always need a subquery for
multilinks. And I'd like to make the expression parser work with:
- '-1' being part of an expression, e.g., '1', '2', '-3', '-1', '-4'
  meaning "1 and 2 or empty"
- reverse multilinks, I'm not sure this would work with anydbm, I'm
  fairly sure it should work with the sql backends. But this needs a
  test.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg7222 Author: [hidden] (schlatterbeck) Date: 2021-05-10 14:36
OK, I've finally found + fixed the problem.

During the search I discovered more cases where the query generator did not produce correct results. And the search method for anydbm (Proptree in hyperdb) was originally built on the assumption that an empty list [] would not produce any results when searching for links and multilinks. This has been changed without considering the semantics and also needed fixing (originally searching for empty Link/Multilink was done with the value '-1'). Due to similar reasons the expression parser interacted with the search method of Proptree when in a subtree results need to be combined.

These two cases involve fairly rarely occurring use-cases like searching for two different attributes (say name and order) of the same transitive property, so this is probably the reason it was not found earlier.

Another area littered with bugs was the interaction with newer features like reverse multilinks: this also was never tested with the multilink expression feature.

And finally I'm allowing '-1' in expressions. E.g. we can now express "not empty" with ['-1', '-2'] ('-1' standing for empty and '-2' being the postfix operator for negation) searching e.g. for all issues that have any keyword (no matter which keyword).

Note that the multilink expression feature probably has never worked all those years with mysql. It was comparing strings (IDs are strings in the database) with integers and did never find anything.

I'm quite happy that the technical dept in that part of the code was finally cleaned up.

All existing tests pass and I've created a bunch of new tests that also pass for me. Lets see if the CI test also passes those tests.
msg7223 Author: [hidden] (rouilj) Date: 2021-05-10 17:48
In message <1620657365.75.0.788253625663.issue2551119@roundup.psfhosted.org>,
Ralf Schlatterbeck writes:
>Ralf Schlatterbeck added the comment:
>
>OK, I've finally found + fixed the problem.
> [...]
>I'm quite happy that the technical dept in that part of the code was finally cleaned up.
>
>All existing tests pass and I've created a bunch of new tests that
>also pass for me. Lets see if the CI test also passes those tests.

  https://www.travis-ci.com/github/roundup-tracker/roundup/builds/225467127

looks good from here. All is green. Python 3.9 runs reports:

  1581 passed, 2 skipped, 1 warning

 skipped: TrackerConfig::testInvalidIndexerLanguage_w_empty_no_xapian
          HTMLClassTestCase::test_string_stext

 warning: gpgme is still using imp

Can you do a first draft on writing up the expression parser? I am not
sure where in the docs it should go.

I think the user guide is a good place, but there are a few issues
with entering the expression (javascript editor dissapears after use
so there is no way to edit the expression, I think there was also an
issues with saving a search with an expression). So maybe putting it
there is not ideal.

Except for documentation is there anything else that has to be done?

Thanks for your efforts. I have been lost in the templating code.

Have a great day.
msg7226 Author: [hidden] (schlatterbeck) Date: 2021-05-11 08:57
On Mon, May 10, 2021 at 05:48:17PM +0000, John Rouillard wrote:
> Can you do a first draft on writing up the expression parser? I am not
> sure where in the docs it should go.

I can provide some documentation on the filter method -- should probably
go into the relevant chapter of doc/customizing.txt.
I have no experience with the javascript code for editing expressions.
I've never used it myself and I'm not using it in any of my trackers.
So I can't contribute anything meaningful to the user-guide part.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
msg7227 Author: [hidden] (schlatterbeck) Date: 2021-05-11 09:35
I've pushed some documentation in both customizing.txt and in hyperdb.filter (almost the same wording). I notice that we have almost no documentation on the hyperdb-level API (There is some documentation of the HTML wrapper but not on the underlying hyperdb API)
History
Date User Action Args
2021-06-10 01:57:05rouiljsetstatus: pending -> open
2021-06-10 01:54:19rouiljsetstatus: open -> pending
2021-05-11 09:35:22schlatterbecksetmessages: + msg7227
2021-05-11 08:57:52schlatterbecksetmessages: + msg7226
2021-05-10 17:48:17rouiljsetmessages: + msg7223
2021-05-10 14:36:05schlatterbecksetmessages: + msg7222
2021-05-07 19:18:19schlatterbecksetmessages: + msg7220
2021-05-07 15:53:57rouiljsetmessages: + msg7219
2021-05-07 13:31:18schlatterbecksetmessages: + msg7218
2021-05-07 12:24:59schlatterbecksetmessages: + msg7217
2021-05-06 19:16:27rouiljsetkeywords: + Blocker
2021-05-06 19:10:58rouiljsetstatus: new -> open
type: behavior
messages: + msg7212
2021-04-14 14:20:50schlatterbecksetmessages: + msg7196
2021-04-14 13:35:01bersetfiles: + roundup-keyword-expr-20210414-1.png
messages: + msg7195
2021-03-22 00:56:19rouiljsetnosy: + rouilj
messages: + msg7129
2021-03-19 08:35:49bersetmessages: + msg7128
2021-03-18 19:30:33schlatterbecksetmessages: + msg7127
2021-03-18 17:12:36bersetfiles: + empty_multilink_expr.diff
keywords: + patch
messages: + msg7126
2021-03-18 14:59:46bersetmessages: + msg7125
2021-03-18 12:13:36schlatterbeckcreate