Issue 2551119: Bug in Query-Generator for filter (rdbms backend)

classification

Title:	Bug in Query-Generator for filter (rdbms backend)
Type:	behavior	Severity:	normal
Components:	Database, Test	Versions:

process

Status:	fixed	Resolution:	fixed
Dependencies		Superseder:
Assigned To:	schlatterbeck	Nosy List:	ber, rouilj, schlatterbeck
Priority:	normal	Keywords:	patch

Created on 2021-03-18 12:13 by schlatterbeck, last changed 2021-06-14 01:55 by rouilj.

Files
File name	Uploaded	Description	Edit	Remove
empty_multilink_expr.diff	ber, 2021-03-18 17:12
roundup-keyword-expr-20210414-1.png	ber, 2021-04-14 13:35

Messages
msg7124	Author: [hidden] (schlatterbeck)	Date: 2021-03-18 12:13
There is a feature in the query generator that can combine ID search with boolean expressions AND, OR, NOT, see commit f1fe6fd0aa61. This change came without a regression test and breaks for certain cases. Without a test of the intended feature I dare not touch that code. As far as I understand this uses negative IDs to implement a prefix or suffix expression parser. My use-case that breaks: We have two tables, time_project and time_wp with the following data structures in roundup (excerpt): time_project wps = Multilink('time_wp') status = Link('project_status') name = String() time_wp bookers = Multilink(user) Now a user enters a query with a work-package name that doesn't exist in the web-interface. The resulting call to filter has the following arguments: sort = [('+', 'name')] filterspec = {'status': ['2', '1'], 'wps.bookers': ['2428'], 'wps': []} This produces a traceback: <class 'psycopg2.errors.UndefinedTable'>: missing FROM-clause entry for table "_time_wp3" LINE 1: ...t from _time_wp where _time_wp.__retired__=0) and _time_wp3.... ^ ... return render(ob, econtext.vars) File "/usr/local/lib/python2.7/dist-packages/roundup/cgi/PageTemplates/Expressions.py", line 99, in render ob = ob() File "/usr/local/lib/python2.7/dist-packages/roundup/cgi/templating.py", line 3257, in batch l = [id for id in klass.filter(matches, filterspec, sort, group) File "/usr/local/lib/python2.7/dist-packages/roundup/backends/rdbms_common.py", line 2858, in filter sq = self._filter_sql (search_matches, filterspec, sort, group, File "/usr/local/lib/python2.7/dist-packages/roundup/backends/rdbms_common.py", line 233, in sql cursor.execute(sql, args) UndefinedTable: missing FROM-clause entry for table "_time_wp3" LINE 1: ...t from _time_wp where _time_wp.__retired__=0) and _time_wp3.... ^ It looks like the mentioned buggy code is triggered by the empty wps multilink in the search. Note how it tries to materialize the whole time_wp table (which contains 43665 entries in our case!) in a subquery. And it doesn't seem to correctly plug into the query generator: the from clause is missing an entry which is later referenced. I'd appreciate if someone using this extended query feature can turn on SQL statement logging (either in roundup or in the backend database) and provide me with pairs of filter calls and the generated SQL. Then I'd write an regression test for the intended behaviour. @Bernhard Reiter: Do you still use roundup with that feature? Examples of usage would be welcome!
msg7125	Author: [hidden] (ber)	Date: 2021-03-18 14:59
Hi Ralf, you seem to use the postgresl backen. AFAIK we still use this feature (though I don't know how many searches are done with it). We haven't looked at the code for a long while, the idea is to be able to compose complex searches. What is the fastest way to get a reproducable problem setup?
msg7126	Author: [hidden] (ber)	Date: 2021-03-18 17:12
Ralf, here is an into-the-blue-patch by one of our devs, more like a guess, but maybe it helps you anyway. (Sorry for not being more solid here, this certainly could have been documented better.) Best, Bernhard
msg7127	Author: [hidden] (schlatterbeck)	Date: 2021-03-18 19:30
Hi Bernhard, thanks for the quick reaction. On Thu, Mar 18, 2021 at 02:59:46PM +0000, Bernhard Reiter wrote: > > Hi Ralf, > you seem to use the postgresl backen. Yes. But I guess the problem is also reproduceable with mysql, the problem is in the query generator not backend specific. > AFAIK we still use this feature (though I don't know how many searches > are done with it). We haven't looked at the code for a long while, the > idea is to be able to compose complex searches. Cool. > What is the fastest way to get a reproducable problem setup? It would be nice if you could come up with examples of all the usual queries involving AND, OR, NOT and send me - The parameters to the filter function for a specific query - Optionally the generated sql (I can infer this myself). So this probable involves specifying some common queries via the web interface and looking what the filter method in backends/rdbms_common.py is called with. Unfortunately there is no logging for this, so you have to either instrument filter or set a breakpoint there in the python debugger. Accompanying the examples with the expression specified in the web interface would be nice, so I won't have to reverse-engineer that :-) The whole idea of all this is to come up with a test that covers functionality people (your customers among them :-) rely on before proceeding. On Thu, Mar 18, 2021 at 05:12:36PM +0000, Bernhard Reiter wrote: > > Ralf, > here is an into-the-blue-patch by one of our devs, > more like a guess, but maybe it helps you anyway. > (Sorry for not being more solid here, this certainly > could have been documented better.) Hmm, such a patch without a proper test is just what I want to avoid at this time :-) Once the behaviour is nailed down it should be easy to come up with a proper patch. Note that I have other systems where this misbehaves. I ideally would find out what goes wrong there and come up with additional tests (which would fail at this time) and proceed from there. I already have some analyses on what I think is wrong. But first I have to find out what it does when it's right :-) Thanks Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7128	Author: [hidden] (ber)	Date: 2021-03-19 08:35
Hi Ralf, just tried to be helpful as fast as I could yesterday. :) The original idea was to generally be able to have queries, where e.g. one status or two statuses, or priorities are excluded while others are allowed. One of the standard templates has the javascript code to form the queries. So you should be able to find test cases as well. The implementation idea had to find a few to express this via the url, and thus the notation was invented. If I have more time, I'll look it up here. Best, Bernhard
msg7129	Author: [hidden] (rouilj)	Date: 2021-03-22 00:56
Including link to issue2550648 to see all tickets related to this. There are 3 or 4 opened tickets.
msg7195	Author: [hidden] (ber)	Date: 2021-04-14 13:35
== Use Case (at Intevation) We want to search for a combination of keywords, for example all issues that have both the keywords "roundup" and "admin" === How it is used The query build from our template allows this, here is an url shortened to the significant parts, which will only list issues that have both keyword 1 and keyword 4. URL/issue?&keyword=1,4,-3&@action=search For this the expression editor is opened, the expression is constructed and "apply"ed to it is used as value for the keyword to search. If "OR" was selected, the URL would be URL/issue?&keyword=1,4,-4&@action=search So we have a postfix notation. roundup-keyword-expr-20210414-1.png shows an editor with "(a and b) or c" after pressing apply it will be entered in the current search page. === Template It is in the default classic roundup. testing with changeset: 6377:a7e7314fb7d9 tag: tip user: John Rouillard <rouilj@ieee.org> date: Sat Apr 10 22:10:07 2021 -0400 python demo.py nuke * login as demo, demo * create keywords a, b, c * create issues with a combination of the keywords. * use issue search page and expression editor to form expressions. A template set used by some of our still running elder trackers has the similar code https://hg.intevation.de/roundup/fast-decomposed/file/tip/html or hg clone https://hg.intevation.de/roundup/fast-decomposed fast-decomposed-hg Ralf, is that helping you? Thanks for your interest in this feature. :) (Sorry for having taken a while to respond here.)
msg7196	Author: [hidden] (schlatterbeck)	Date: 2021-04-14 14:20
On Wed, Apr 14, 2021 at 01:35:01PM +0000, Bernhard Reiter wrote: [...] > Ralf, is that helping you? Thanks for your interest in this feature. :) > (Sorry for having taken a while to respond here.) Yes! Thanks a lot, this is helping me to write tests for the indended semantics. I'm currently busy with another project (also roundup) so it will take me one or two weeks to come back to this. Thanks again for taking the time! Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7212	Author: [hidden] (rouilj)	Date: 2021-05-06 19:10
Hi Ralf, I am starting to plan the next roundup release. I would like to get a beta out end of may (no alpha this time). That way I can get the 20 year anniversary edition on in July. I'll probably release it on the 13th again a few days later then the first release: 0.1.0 2001-07-11. This is one of the few active bugs, do you think you can get it sorted (with or without docs) by end May or July? -- rouilj
msg7217	Author: [hidden] (schlatterbeck)	Date: 2021-05-07 12:24
On Thu, May 06, 2021 at 07:10:58PM +0000, John Rouillard wrote: > > I am starting to plan the next roundup release. I would like to get a > beta out end of may (no alpha this time). That way I can get the 20 > year anniversary edition on in July. I'll probably release it on the > 13th again a few days later then the first release: 0.1.0 2001-07-11. Cool! > This is one of the few active bugs, do you think you can get it > sorted (with or without docs) by end May or July? I'll try my best. Unfortunately the multilink expression feature is in worse shape than I anticipated: The very first test (the one suggested by Bernhard) fails for mysql. When a '-1' id (meaning "empty multilink") is involved, at least the tests I came up with fail for all backends. I'll commit what I have later today (when I'm reasonably sure I didn't break other tests, I've introduced a new class "keyword" in the tests) Once I have this in place the idea is - to get mysql working - to get the feature working with '-1' in a query - to allow '-1' mixed with other ids as an implicit "or" (my memory says that this used to work before the multilink-expression feature was introduced but I may be wrong and it may well be that this never worked for multilinks) - To ensure that if only '-1' (and no other boolean operator with lower negative id) is part of the query we do not materialize the whole multilink table in a subquery (that would be fine for 'keyword' but I have some tables where this would materialize large datasets) - Finally try to reproduce the bug for which the issue was originally created by writing a test for it - and fix it :-) Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7218	Author: [hidden] (schlatterbeck)	Date: 2021-05-07 13:31
The generated sql for mysql reads: 'select distinct(_issue.id) from issue_keywords,_issue where (1=0) and _issue.__retired__=0 order by _issue.id' not very likely this will ever return the correct set of items :-)
msg7219	Author: [hidden] (rouilj)	Date: 2021-05-07 15:53
Hi Ralf, it looks like anydbm and sqlite are working now and you got mysql running. Really just a single line change? I am surprised this bug didn't cause other issues. If I entered -1 for unset in a search form would that line be hit? If so would it work both before and after your change? I am pushing to CI now and will see how many times that line is covered. I wonder if we have a test for -1 (unset) somewhere in the test suite. Nice find.
msg7220	Author: [hidden] (schlatterbeck)	Date: 2021-05-07 19:18
On Fri, May 07, 2021 at 03:53:57PM +0000, John Rouillard wrote: > Hi Ralf, it looks like anydbm and sqlite are working now > and you got mysql running. Really just a single line change? > > I am surprised this bug didn't cause other > issues. If I entered -1 for unset in a search form would > that line be hit? If so would it work both before and after > your change? No this is the code with the expression parser for multilinks, e.g. the list '1', '2', '-3' would specify '1' AND '2' ('-3' is the "opcode" for AND). I think this has never worked for mysql. The constants in the expression representation are integers and IDs in roundup always were strings. So this never matched. A single '-1' would have worked. And now (latest commit some minutes ago) a list like '-1', '1', '2' would also work, this is implicit OR so it would mean a multilink containing '1' or '2' or being empty. I think this case has never worked. It did work for Link properties but not for multilinks. > I am pushing to CI now and will see how many times that line > is covered. I wonder if we have a test for -1 (unset) somewhere > in the test suite. Yes, be sure to include my latest commit. Note that I'm still planning further changes. The original bug (the reason I created this issue in the first place) is still not found. See my todo-list in msg7217, I've finished point 3 in that list :-) And I think point 4 is impossible, we always need a subquery for multilinks. And I'd like to make the expression parser work with: - '-1' being part of an expression, e.g., '1', '2', '-3', '-1', '-4' meaning "1 and 2 or empty" - reverse multilinks, I'm not sure this would work with anydbm, I'm fairly sure it should work with the sql backends. But this needs a test. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7222	Author: [hidden] (schlatterbeck)	Date: 2021-05-10 14:36
OK, I've finally found + fixed the problem. During the search I discovered more cases where the query generator did not produce correct results. And the search method for anydbm (Proptree in hyperdb) was originally built on the assumption that an empty list [] would not produce any results when searching for links and multilinks. This has been changed without considering the semantics and also needed fixing (originally searching for empty Link/Multilink was done with the value '-1'). Due to similar reasons the expression parser interacted with the search method of Proptree when in a subtree results need to be combined. These two cases involve fairly rarely occurring use-cases like searching for two different attributes (say name and order) of the same transitive property, so this is probably the reason it was not found earlier. Another area littered with bugs was the interaction with newer features like reverse multilinks: this also was never tested with the multilink expression feature. And finally I'm allowing '-1' in expressions. E.g. we can now express "not empty" with ['-1', '-2'] ('-1' standing for empty and '-2' being the postfix operator for negation) searching e.g. for all issues that have any keyword (no matter which keyword). Note that the multilink expression feature probably has never worked all those years with mysql. It was comparing strings (IDs are strings in the database) with integers and did never find anything. I'm quite happy that the technical dept in that part of the code was finally cleaned up. All existing tests pass and I've created a bunch of new tests that also pass for me. Lets see if the CI test also passes those tests.
msg7223	Author: [hidden] (rouilj)	Date: 2021-05-10 17:48
In message <1620657365.75.0.788253625663.issue2551119@roundup.psfhosted.org>, Ralf Schlatterbeck writes: >Ralf Schlatterbeck added the comment: > >OK, I've finally found + fixed the problem. > [...] >I'm quite happy that the technical dept in that part of the code was finally cleaned up. > >All existing tests pass and I've created a bunch of new tests that >also pass for me. Lets see if the CI test also passes those tests. https://www.travis-ci.com/github/roundup-tracker/roundup/builds/225467127 looks good from here. All is green. Python 3.9 runs reports: 1581 passed, 2 skipped, 1 warning skipped: TrackerConfig::testInvalidIndexerLanguage_w_empty_no_xapian HTMLClassTestCase::test_string_stext warning: gpgme is still using imp Can you do a first draft on writing up the expression parser? I am not sure where in the docs it should go. I think the user guide is a good place, but there are a few issues with entering the expression (javascript editor dissapears after use so there is no way to edit the expression, I think there was also an issues with saving a search with an expression). So maybe putting it there is not ideal. Except for documentation is there anything else that has to be done? Thanks for your efforts. I have been lost in the templating code. Have a great day.
msg7226	Author: [hidden] (schlatterbeck)	Date: 2021-05-11 08:57
On Mon, May 10, 2021 at 05:48:17PM +0000, John Rouillard wrote: > Can you do a first draft on writing up the expression parser? I am not > sure where in the docs it should go. I can provide some documentation on the filter method -- should probably go into the relevant chapter of doc/customizing.txt. I have no experience with the javascript code for editing expressions. I've never used it myself and I'm not using it in any of my trackers. So I can't contribute anything meaningful to the user-guide part. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com
msg7227	Author: [hidden] (schlatterbeck)	Date: 2021-05-11 09:35
I've pushed some documentation in both customizing.txt and in hyperdb.filter (almost the same wording). I notice that we have almost no documentation on the hyperdb-level API (There is some documentation of the HTML wrapper but not on the underlying hyperdb API)
msg7285	Author: [hidden] (rouilj)	Date: 2021-06-14 01:55
Closing this as fixed. Doc done and Ralf I assume you agree it is complete. Reopen if you disagree. Another blocker for 2.1.0 gone.

History
Date	User	Action	Args
2021-06-14 01:55:52	rouilj	set	keywords: - Blocker priority: normal messages: + msg7285 status: open -> fixed resolution: fixed
2021-06-10 01:57:05	rouilj	set	status: pending -> open
2021-06-10 01:54:19	rouilj	set	status: open -> pending
2021-05-11 09:35:22	schlatterbeck	set	messages: + msg7227
2021-05-11 08:57:52	schlatterbeck	set	messages: + msg7226
2021-05-10 17:48:17	rouilj	set	messages: + msg7223
2021-05-10 14:36:05	schlatterbeck	set	messages: + msg7222
2021-05-07 19:18:19	schlatterbeck	set	messages: + msg7220
2021-05-07 15:53:57	rouilj	set	messages: + msg7219
2021-05-07 13:31:18	schlatterbeck	set	messages: + msg7218
2021-05-07 12:24:59	schlatterbeck	set	messages: + msg7217
2021-05-06 19:16:27	rouilj	set	keywords: + Blocker
2021-05-06 19:10:58	rouilj	set	status: new -> open type: behavior messages: + msg7212
2021-04-14 14:20:50	schlatterbeck	set	messages: + msg7196
2021-04-14 13:35:01	ber	set	files: + roundup-keyword-expr-20210414-1.png messages: + msg7195
2021-03-22 00:56:19	rouilj	set	nosy: + rouilj messages: + msg7129
2021-03-19 08:35:49	ber	set	messages: + msg7128
2021-03-18 19:30:33	schlatterbeck	set	messages: + msg7127
2021-03-18 17:12:36	ber	set	files: + empty_multilink_expr.diff keywords: + patch messages: + msg7126
2021-03-18 14:59:46	ber	set	messages: + msg7125
2021-03-18 12:13:36	schlatterbeck	create

Roundup Tracker - Issues

Issue 2551119