Roundup Tracker - Issues

Issue 2550834

classification
postgres and mysql get error with Date attribute in groupby sort
Type: crash Severity: major
Components: Infrastructure Versions: 1.4
process
Status: closed fixed
:
: : ber, r.david.murray, schlatterbeck, tekberg
Priority: :

Created on 2014-03-10 16:17 by tekberg, last changed 2014-04-04 07:17 by ber.

Messages
msg5008 Author: [hidden] (tekberg) Date: 2014-03-10 16:17
I'm beginning to think that nobody uses postgres as back end since I have 
already found 3 (2550805, 2550829, and this) things that work differently 
between the sqlite3 and postgres. This one is the handling of Date() 
attributes in a class like issue. The 2 Date attributes in the issue class 
(creation, activity) always have non-None values. However if one adds a 
Date() (call it target_date) attribute most of the time it works fine. If 
one does a search where the 'group by' uses target_date that also works  
fine if all of the target_dates have Date values. However if one of the 
target_dates is None (NULL in the DB), one gets an error. The following is 
the last line of the stack trace and the error.

 File "/usr/local/lib/python2.7/site-packages/roundup/hyperdb.py", line 
552, in _sort
    sortattr = sorted (sortattr, key = lambda x:x[i:idx], reverse = rev)
TypeError: can't compare datetime.datetime to NoneType

If I look at postgres, target_date is defined as 'timestamp without time 
zone'. The target_date is defined by sqlite3 as a VARCHAR(30). If I use 
sorted with None and strings it works:

>>> a = ['a','b',None,'c']
>>> sorted(a)
[None, 'a', 'b', 'c']

In an earlier run I dumped sortattr to a file and found that 
datetime.datetime was the data type being used. Using sorted with None and 
datetime.datetime objects one gets an error.

>>> import datetime
>>> b = [datetime.datetime(2014,1,1), datetime.datetime(2013,1,1), None, 
datetime.datetime(2012,1,1)]
>>> sorted(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare datetime.datetime to NoneType

This is the same error that roundup gets.

I don't have a good fix for this yet.
msg5009 Author: [hidden] (r.david.murray) Date: 2014-03-10 16:26
We use postgres as the backend on bugs.python.org, and I use it for my
other roundup based trackers.  So it's not true that nobody uses it :)

On the other hand, we *started* with postres, so that could be why we
haven't run into the troubles you have.
msg5010 Author: [hidden] (schlatterbeck) Date: 2014-03-10 17:06
I'm also using roundup only with postgreSQL backends in several
installations. We're using our own custom date attributes.
I guess I've never tried to *sort* by date attributes though.

Interesting is the error you get. Roundup should sort this in SQL not in
python. Can you try to come up with a piece of code in roundup API that
reproduces this? Would be very valuable as a regression test item.

Ralf
msg5011 Author: [hidden] (tekberg) Date: 2014-03-10 19:23
I was rather surprised that the sorting is being done with python code. 
It would certainly have avoided this problem.

I'm not sure how to create a test case for this with python code. It 
involves creating a tracker, adding a Date attribute, populating it with 
>=2 issues, with NULL and non-NULL date values, then running the query 
that does a groupby on that attribute.

Are there other test cases similar to this?
msg5012 Author: [hidden] (r.david.murray) Date: 2014-03-10 20:20
This reminds me of an issue we had on the python tracker with the server
getting slammed, that turned out to be google spidering certain
searches, and roundup pulling everything into Python in order to execute
the search/sort...it involved multilink properties, I think.  Whatever
it was, we disabled that feature somehow to solve it.
msg5013 Author: [hidden] (ber) Date: 2014-03-11 08:09
(We are using PostgreSQL as well for a number of production roundup 
installations.)
msg5061 Author: [hidden] (schlatterbeck) Date: 2014-04-02 15:52
I've added a simple regression test for sorting dates, one of which is a
NULL value. Works. So this is *not* in the roundup API,

Please look for testDateSort in test/db_test_base.py

This contains code you can use to check if this behaves differently in
your installation with your particular date attribute and your version
of PostgreSQL. 

My current guess is that your template is doing some weird things and
tries to sort date values in the template. Can you give us the template
that reproduces the problem?

@David Murray: You mention Multilinks: Yes these *are* sorted in python
because it isn't possible to do this in SQL. You should not allow
sorting on multilink values via the web-interface if you can avoid it.
But I don't think (at least from the description I've read) that this is
the problem in this case.
msg5062 Author: [hidden] (schlatterbeck) Date: 2014-04-02 16:02
Oops, I see the code in the test is too optimized to cut&paste for a test.
To test against your database with a date value containing multilinks
you want to do something along the lines of
 
import sys
import os
from roundup import date, instance

dir     = os.getcwd ()
tracker = instance.open (dir)
db      = tracker.open ('admin')

print db.issue.filter (None, {}, ('+','id'), ('+', 'deadline'))

This groups by the date attribute "deadline" which you want to change if
your date attribute is named differently.
msg5063 Author: [hidden] (schlatterbeck) Date: 2014-04-02 16:05
On Wed, Apr 02, 2014 at 04:02:41PM +0000, Ralf Schlatterbeck wrote:
> To test against your database with a date value containing multilinks

Umm, forget the 'containing multilinks' part. Time to quit for today :-)

Ralf
msg5064 Author: [hidden] (tekberg) Date: 2014-04-02 17:05
This a fragment of page.html that had the error:

<b i18n:translate="">Monthly Maintenance</b><br>

    <a href="#"
       tal:attributes="href python:request.indexargs_url('issue', {
      '@sort': 'assignedto_multi,-activity',
      '@group': 'target_date',
      '@filter': 'keyword,status',
      '@columns': 'id,assignedto_multi,activity,title,status,priority,source',
      '@search_text': '',
      'keyword': '5',
      'status':'-1,1,2,3,4,9',
      '@pagesize':'300',
      '@startwith':'0',
      '@dispname': i18n.gettext('Monthly Maintenance - current'),
     })"
       i18n:translate="">Current maintenance</a>
---------------
This is what I changed it to which works fine:

<b i18n:translate="">Monthly Maintenance</b><br>

    <a href="#"
       tal:attributes="href python:request.indexargs_url('issue', {
      '@sort': '-activity',
      '@group': 'target_date',
      '@filter': 'keyword,status',
      '@columns': 'id,assignedto_multi,activity,title,status,priority,source',
      '@search_text': '',
      'keyword': '5',
      'status':'-1,1,2,3,4,9',
      '@pagesize':'300',
      '@startwith':'0',
      '@dispname': i18n.gettext('Monthly Maintenance - current'),
     })"
       i18n:translate="">Current maintenance</a>

The only difference is @sort now only has one key. Previously it had assignedto, 
which was converted to assignedto_multi which a MultiLink version of assignedto. 
I'm not sure why having only one sort key works, and two fails.

If you need the page.html file instead of these snippets let me know. That file 
contains usernames that I'll have to redact. You probably need the schema.py file 
too.
msg5065 Author: [hidden] (schlatterbeck) Date: 2014-04-03 14:16
On Wed, Apr 02, 2014 at 05:05:36PM +0000, Tom Ekberg wrote:
> 
> Tom Ekberg added the comment:
>       '@sort': 'assignedto_multi,-activity',
>       '@group': 'target_date',
> ---------------
> This is what I changed it to which works fine:
> 
>       '@sort': '-activity',
>       '@group': 'target_date',
> 
> The only difference is @sort now only has one key. Previously it had
> assignedto, which was converted to assignedto_multi which a MultiLink
> version of assignedto.  I'm not sure why having only one sort key
> works, and two fails.

That clarifies it for me: In the non-working version you're sorting by a
Multilink (assignedto_multi) while for the second version no Multilink
is involved. As soon as multilinks are involved, sorting is done (at
least partially)* in python not in SQL.

The Reason: Multilink Sorting first sorts the individual multilinks for
each item by their order property (which is the labelprop if no
orderprop is explicitly defined or an 'order' attribute exists). And
*then* sorts the individual items by this list. AFAIK there is no way to
implement this in SQL although I'm willing to learn :-)

So in short: The python sorting for date properties should be fixed to
allow NULL values.

On the other hand: You really don't want to sort by Multilinks as the
sorting isn't done in SQL.

*) We sort *first* in SQL by all the properties right of the Multilink
and then we sort again in python on the multilink and all the properties
left of it. For the example (with multilink) above we would get the
following sort order:

'target_date', 'assignedto_multi', '-activity'

So we would sort (descending) by activity in SQL and then by target_date
and assignedto_multi in python (in one go). The sort order in Python is
stable** so we don't need to include the sorting by activity in python.

**) This means for items in the sort-list that compare equal we don't
change the order that they were in before sorting.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
msg5066 Author: [hidden] (r.david.murray) Date: 2014-04-03 14:40
Note that in Python3 there are a lot more datatypes where sorts
including None will fail, so long term the issue is about more than just
datetimes.
msg5067 Author: [hidden] (schlatterbeck) Date: 2014-04-04 06:17
On Thu, Apr 03, 2014 at 02:40:25PM +0000, R David Murray wrote:
> 
> Note that in Python3 there are a lot more datatypes where sorts
> including None will fail, so long term the issue is about more than just
> datetimes.

I know. There currently is a mechanism for mangling to-be-sorted items
before sorting. It just wasn't called when sorting in python for rdbms
backends. I've fixed this now.
I don't think this already works for python3 but should be easy to fix
by fixing the "sort_repr" methods of the various hyperdb types (e.g.
String, Multilink, Date, etc.)

Fixed in ra116de39e38c

Concerning the original bug-report, just for the records: This also
failed for mysql not just PostgreSQL.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
History
Date User Action Args
2014-04-04 07:17:54bersettitle: postgres gets error with Date attribute in groupby sort -> postgres and mysql get error with Date attribute in groupby sort
2014-04-04 06:17:19schlatterbecksetstatus: new -> closed
resolution: fixed
messages: + msg5067
2014-04-03 14:40:25r.david.murraysetmessages: + msg5066
2014-04-03 14:16:50schlatterbecksetmessages: + msg5065
2014-04-02 17:05:36tekbergsetmessages: + msg5064
2014-04-02 16:05:10schlatterbecksetmessages: + msg5063
2014-04-02 16:02:41schlatterbecksetmessages: + msg5062
2014-04-02 15:52:13schlatterbecksetmessages: + msg5061
2014-03-11 08:09:54bersetnosy: + ber
messages: + msg5013
2014-03-10 20:20:30r.david.murraysetmessages: + msg5012
2014-03-10 19:23:48tekbergsetmessages: + msg5011
2014-03-10 17:06:19schlatterbecksetnosy: + schlatterbeck
messages: + msg5010
2014-03-10 16:26:27r.david.murraysetnosy: + r.david.murray
messages: + msg5009
2014-03-10 16:17:30tekbergcreate