Roundup Tracker - Issues

Message5069

Author schlatterbeck
Recipients ThomasAH, ber, rouilj, schlatterbeck
Date 2014-04-04.23:13:24
Message-id <20140404231316.GA19617@runtux.com>
In-reply-to <1396432461.46.0.925798996112.issue2550806@psf.upfronthosting.co.za>
On Wed, Apr 02, 2014 at 09:54:21AM +0000, Bernhard Reiter wrote:
> 
> Hi Ralf,
> as I saw a number of improvements you did to roundup in the last days,
> what about this issue? I still fear that it is a significant problem losing
> data in some race conditions.

Thanks for the reminder.

I've now checked -- and reproduced -- the behaviour. Thanks John for
providing a detailed instruction and appropriate auditors to test this.

I did run two concurrent command-line updates (I didn't care about the
txSource, the nonce was fine for me).

First command:
roundup-admin -i . set issue1 priority=urgent
tx_SourceCheckAudit(1) pre db.tx_Source: cli, nonce: 6aa1acdbfe6b49078484
tx_SourceCheckAudit(1) post db.tx_Source: cli, nonce 6aa1acdbfe6b49078484
tx_SourceCheckReact(1) pre db.tx_Source: cli, recorded: arrived via: cli 6aa1acdbfe6b49078484
tx_SourceCheckReact(1) post db.tx_Source: cli, recorded: arrived via: cli 6aa1acdbfe6b49078484

Second command:
roundup-admin -i . set issue1 status=unread
tx_SourceCheckAudit(1) pre db.tx_Source: cli, nonce: 23e9adb6fc638761eb4f
tx_SourceCheckAudit(1) post db.tx_Source: cli, nonce 23e9adb6fc638761eb4f
tx_SourceCheckReact(1) pre db.tx_Source: cli, recorded: arrived via: cli 23e9adb6fc638761eb4f
tx_SourceCheckReact(1) post db.tx_Source: cli, recorded: arrived via: cli 23e9adb6fc638761eb4f

Displayed history (cut&paste from web-interface):
2014-04-04 21:12:07     admin   set     tx_Source: arrived via: cli 84ef0e4322500b890fb0 -> arrived via: cli 23e9adb6fc638761eb4f
2014-04-04 21:11:56     admin   set     priority: bug -> urgent
tx_Source: arrived via: cli 84ef0e4322500b890fb0 -> arrived via: cli 84ef0e4322500b890fb0
2014-04-04 21:08:34     admin   set     status: deferred -> unread
tx_Source: arrived via: web c05d48aea9f81a9ae254 -> arrived via: cli 84ef0e4322500b890fb0
2013-01-04 00:02:21     admin   set     priority: urgent -> bug
tx_Source: arrived via: web d597645edc0bc58d7943 -> arrived via: web c05d48aea9f81a9ae254
2013-01-04 00:01:35     admin   create

We clearly see that the first two lines in the history display
corresponding to the latest two changes (at 21:11:56 and 21:12:07) start
from the same nonce value that was set in the "set" at 21:08:34

Now it's nice to know what is actually stored in the journal, I've
reformatted the journal entries to make this a little clearer:

>>> print db.getjournal('issue', '1')
[
('1', <Date 2013-01-04.00:01:35.750>, '1', 'create', {}),
('1', <Date 2013-01-04.00:02:21.859>, '1', 'set',
     {'priority': '2', 'tx_Source': 'arrived via: web d597645edc0bc58d7943'}),
('1', <Date 2014-04-04.21:08:34.814>, '1', 'set',
     {'status': '2', 'tx_Source': 'arrived via: web c05d48aea9f81a9ae254'}),
('1', <Date 2014-04-04.21:11:56.923>, '1', 'set',
     {'priority': '3', 'tx_Source': 'arrived via: cli 84ef0e4322500b890fb0'}),
('1', <Date 2014-04-04.21:12:07.040>, '1', 'set',
     {'tx_Source': 'arrived via: cli 84ef0e4322500b890fb0'})
]

The current value of tx_Source is:
>>> db.issue.get('1','tx_Source')
'arrived via: cli 23e9adb6fc638761eb4f'

We see that the journal entry only stores the *old* value *before* the
database transaction. The interpolation in the web-interface simply
walks the list and remembers the old value from the previous journal
entry.

In this case my second (nonce 23e9adb6fc638761eb4f) transaction "won"
and the tx_Source update from my first transaction (nonce
6aa1acdbfe6b49078484) wasn't seen by any transaction and might have
never been written.

This is normal serialisation behaviour: The database if free to not
perform updates that are subsequently overwritten by a later
transaction.

So both my transactions seem to have started with the same old tx_Source
value with the nonce 84ef0e4322500b890fb0 and both put this old value
into their journal entry (thats the old tx_Source when both of my two
concurrent transactions were started).

The history printing mechanism creates from the list of nonces in the
journal (see output of getjournal call above):
d597645edc0bc58d7943
c05d48aea9f81a9ae254
84ef0e4322500b890fb0
84ef0e4322500b890fb0

the following (wrong) update list:
d597645edc0bc58d7943->c05d48aea9f81a9ae254
c05d48aea9f81a9ae254->84ef0e4322500b890fb0
84ef0e4322500b890fb0->84ef0e4322500b890fb0
84ef0e4322500b890fb0->23e9adb6fc638761eb4f

Which looks like the value 84ef0e4322500b890fb0 was there twice
but results from two concurrent updates both seeing the same old value.

So there is nothing wrong with roundups serialisation.

Currently roundup is using the lowest isolation level supported by
PostgreSQL, "Read committed".
See also:
http://www.postgresql.org/docs/9.1/static/transaction-iso.html

Setting a higher isolation level, at least "repeatable read" would work
for this case, at least if the auditor would read the tx_Source property
before updating it (but this is done by the detector framework anyway,
so no need to modify the detectors). In that case one of the
transactions would be aborted with a serialization error.

Currently the isolation level isn't settable in roundup, maybe we should
make this configurable?

Only Postgres seems to support the concept of isolation levels although
defined in the SQL standards...

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
History
Date User Action Args
2014-04-04 23:13:25schlatterbecksetrecipients: + schlatterbeck, ber, rouilj, ThomasAH
2014-04-04 23:13:25schlatterbecklinkissue2550806 messages
2014-04-04 23:13:24schlatterbeckcreate