Issue 2550806: Parallel change requests to properties from auditors may lead to data loss

classification

Title:	Parallel change requests to properties from auditors may lead to data loss
Type:		Severity:	urgent
Components:	Database	Versions:	1.4

process

Status:	closed	Resolution:	fixed
Dependencies		Superseder:
Assigned To:	schlatterbeck	Nosy List:	ThomasAH, ber, rouilj, schlatterbeck
Priority:	high	Keywords:

Created on 2013-04-23 18:34 by ber, last changed 2014-04-14 16:11 by schlatterbeck.

Messages
msg4846	Author: [hidden] (ber)	Date: 2013-04-23 18:34
This issue is split out from issue2550731 (Add mechanism for the detectors to be able to tell the source of the data changes) where it was detected. The symptoms are: When using a database that allow parallel access. Using an auditor that sets an attribute to different values (and delays some time) to show the effect with a higher chance. If then two requests come in quickly, the attribute value of the first request is lost from the history record. Thus it is lost at all, which is data loss of possible user input and this not happen. Expected behaviour: The requests are serialised somehow and all attribute values make it into the history. First observations: Ralf expected the database transaction mechanism to work here and prevent the trouble. Issue2550731 seems to prove that this does not work, at least not with Postgresql. Next steps: Ralf said, he wanted to peek into this (msg4739) as he has some experience with database transactions. (So I took the liberty to assign it to Ralf.) It probably makes sense to: a) Simply the test case to be reproduced without the source detection with current roundup mainline. And to be triggered with as little code as possible. Best would be a test function that we can use for automatic testing as well. b) Try the problem with other databases that have parallel access like MariaDB, to rule out a PostgreSQL specific problem.
msg5058	Author: [hidden] (ber)	Date: 2014-04-02 09:54
Hi Ralf, as I saw a number of improvements you did to roundup in the last days, what about this issue? I still fear that it is a significant problem losing data in some race conditions.
msg5068	Author: [hidden] (schlatterbeck)	Date: 2014-04-04 07:27
The "old-nabble" links to roundup-devel threads in issue2550731 no longer work, here are the correct ones: Wierd bug in roundup 1.4.20 with postgres back end http://sourceforge.net/p/roundup/mailman/message/29359236/ Need testing help for issue2550731 allow detectors to reject insecure changes http://sourceforge.net/p/roundup/mailman/message/29141150/
msg5069	Author: [hidden] (schlatterbeck)	Date: 2014-04-04 23:13
On Wed, Apr 02, 2014 at 09:54:21AM +0000, Bernhard Reiter wrote: > > Hi Ralf, > as I saw a number of improvements you did to roundup in the last days, > what about this issue? I still fear that it is a significant problem losing > data in some race conditions. Thanks for the reminder. I've now checked -- and reproduced -- the behaviour. Thanks John for providing a detailed instruction and appropriate auditors to test this. I did run two concurrent command-line updates (I didn't care about the txSource, the nonce was fine for me). First command: roundup-admin -i . set issue1 priority=urgent tx_SourceCheckAudit(1) pre db.tx_Source: cli, nonce: 6aa1acdbfe6b49078484 tx_SourceCheckAudit(1) post db.tx_Source: cli, nonce 6aa1acdbfe6b49078484 tx_SourceCheckReact(1) pre db.tx_Source: cli, recorded: arrived via: cli 6aa1acdbfe6b49078484 tx_SourceCheckReact(1) post db.tx_Source: cli, recorded: arrived via: cli 6aa1acdbfe6b49078484 Second command: roundup-admin -i . set issue1 status=unread tx_SourceCheckAudit(1) pre db.tx_Source: cli, nonce: 23e9adb6fc638761eb4f tx_SourceCheckAudit(1) post db.tx_Source: cli, nonce 23e9adb6fc638761eb4f tx_SourceCheckReact(1) pre db.tx_Source: cli, recorded: arrived via: cli 23e9adb6fc638761eb4f tx_SourceCheckReact(1) post db.tx_Source: cli, recorded: arrived via: cli 23e9adb6fc638761eb4f Displayed history (cut&paste from web-interface): 2014-04-04 21:12:07 admin set tx_Source: arrived via: cli 84ef0e4322500b890fb0 -> arrived via: cli 23e9adb6fc638761eb4f 2014-04-04 21:11:56 admin set priority: bug -> urgent tx_Source: arrived via: cli 84ef0e4322500b890fb0 -> arrived via: cli 84ef0e4322500b890fb0 2014-04-04 21:08:34 admin set status: deferred -> unread tx_Source: arrived via: web c05d48aea9f81a9ae254 -> arrived via: cli 84ef0e4322500b890fb0 2013-01-04 00:02:21 admin set priority: urgent -> bug tx_Source: arrived via: web d597645edc0bc58d7943 -> arrived via: web c05d48aea9f81a9ae254 2013-01-04 00:01:35 admin create We clearly see that the first two lines in the history display corresponding to the latest two changes (at 21:11:56 and 21:12:07) start from the same nonce value that was set in the "set" at 21:08:34 Now it's nice to know what is actually stored in the journal, I've reformatted the journal entries to make this a little clearer: >>> print db.getjournal('issue', '1') [ ('1', <Date 2013-01-04.00:01:35.750>, '1', 'create', {}), ('1', <Date 2013-01-04.00:02:21.859>, '1', 'set', {'priority': '2', 'tx_Source': 'arrived via: web d597645edc0bc58d7943'}), ('1', <Date 2014-04-04.21:08:34.814>, '1', 'set', {'status': '2', 'tx_Source': 'arrived via: web c05d48aea9f81a9ae254'}), ('1', <Date 2014-04-04.21:11:56.923>, '1', 'set', {'priority': '3', 'tx_Source': 'arrived via: cli 84ef0e4322500b890fb0'}), ('1', <Date 2014-04-04.21:12:07.040>, '1', 'set', {'tx_Source': 'arrived via: cli 84ef0e4322500b890fb0'}) ] The current value of tx_Source is: >>> db.issue.get('1','tx_Source') 'arrived via: cli 23e9adb6fc638761eb4f' We see that the journal entry only stores the old value before the database transaction. The interpolation in the web-interface simply walks the list and remembers the old value from the previous journal entry. In this case my second (nonce 23e9adb6fc638761eb4f) transaction "won" and the tx_Source update from my first transaction (nonce 6aa1acdbfe6b49078484) wasn't seen by any transaction and might have never been written. This is normal serialisation behaviour: The database if free to not perform updates that are subsequently overwritten by a later transaction. So both my transactions seem to have started with the same old tx_Source value with the nonce 84ef0e4322500b890fb0 and both put this old value into their journal entry (thats the old tx_Source when both of my two concurrent transactions were started). The history printing mechanism creates from the list of nonces in the journal (see output of getjournal call above): d597645edc0bc58d7943 c05d48aea9f81a9ae254 84ef0e4322500b890fb0 84ef0e4322500b890fb0 the following (wrong) update list: d597645edc0bc58d7943->c05d48aea9f81a9ae254 c05d48aea9f81a9ae254->84ef0e4322500b890fb0 84ef0e4322500b890fb0->84ef0e4322500b890fb0 84ef0e4322500b890fb0->23e9adb6fc638761eb4f Which looks like the value 84ef0e4322500b890fb0 was there twice but results from two concurrent updates both seeing the same old value. So there is nothing wrong with roundups serialisation. Currently roundup is using the lowest isolation level supported by PostgreSQL, "Read committed". See also: http://www.postgresql.org/docs/9.1/static/transaction-iso.html Setting a higher isolation level, at least "repeatable read" would work for this case, at least if the auditor would read the tx_Source property before updating it (but this is done by the detector framework anyway, so no need to modify the detectors). In that case one of the transactions would be aborted with a serialization error. Currently the isolation level isn't settable in roundup, maybe we should make this configurable? Only Postgres seems to support the concept of isolation levels although defined in the SQL standards... Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: http://www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com allmenda.com member email: rsc@allmenda.com
msg5070	Author: [hidden] (rouilj)	Date: 2014-04-04 23:53
In message <20140404231316.GA19617@runtux.com> <20140404231316.GA19617@runtux.c om>, Ralf Schlatterbeck writes: >This is normal serialisation behaviour: The database if free to not >perform updates that are subsequently overwritten by a later >transaction. > >So both my transactions seem to have started with the same old tx_Source >value with the nonce 84ef0e4322500b890fb0 and both put this old value >into their journal entry (thats the old tx_Source when both of my two >concurrent transactions were started). > [...] >So there is nothing wrong with roundups serialisation. > >Currently roundup is using the lowest isolation level supported by >PostgreSQL, "Read committed". >See also: >http://www.postgresql.org/docs/9.1/static/transaction-iso.html > >Setting a higher isolation level, at least "repeatable read" would work >for this case, at least if the auditor would read the tx_Source property >before updating it (but this is done by the detector framework anyway, >so no need to modify the detectors). In that case one of the >transactions would be aborted with a serialization error. > >Currently the isolation level isn't settable in roundup, maybe we should >make this configurable? > >Only Postgres seems to support the concept of isolation levels although >defined in the SQL standards... Well from http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html mysql with innodb has: SET [GLOBAL \| SESSION] TRANSACTION ISOLATION LEVEL { REPEATABLE READ \| READ COMMITTED \| READ UNCOMMITTED \| SERIALIZABLE } This statement sets the transaction isolation level, used for operations on InnoDB tables. Scope of the Isolation Level You can set the isolation level globally, for the current session, or for the next transaction: With the GLOBAL keyword, the statement sets the default transaction level globally for all subsequent sessions. Existing sessions are unaffected. With the SESSION keyword, the statement sets the default transaction level for all subsequent transactions performed within the current session. Without any SESSION or GLOBAL keyword, the statement sets the isolation level for the next (not started) transaction performed within the current session. Also https://www.sqlite.org/sharedcache.html seems to indicate that sqlite with shared cache can support read_uncommitted mode.
msg5071	Author: [hidden] (schlatterbeck)	Date: 2014-04-06 11:14
Implement new config option 'isolation_level' in rdbms section. This is only relevant for postgres and mysql (sqlite does locking and implements a very high isolation level by default :-) Thanks John for the hint that mysql also implements this. Add a test for Postgres that reproduces the observed history behaviour with isolation level 'read committed' and throws a serialization error with level 'repeatable read'. The isolation level 'read committed' is the new default (which was the case for postgres before but mysql by default seems to have repeatable read by default)
msg5072	Author: [hidden] (ber)	Date: 2014-04-07 09:18
Ralf, thanks for looking into it, the analysis and the fix. I still have to fully read and understand it! Just to recap: my two initial ideas were a) it possibly be reproduced without a new attribte, just with title or something. b) the use case is to be able to construct all historic values, even if they are there very briefly. So a change to title from "a" to "b" to "c" would all be in the log and history. Now with the defect "b" could have been lost. I hope this is not the case with the fix anymore.
msg5073	Author: [hidden] (ber)	Date: 2014-04-07 09:21
Shouldn't REPEATABLE READ be the default then?
msg5074	Author: [hidden] (schlatterbeck)	Date: 2014-04-07 14:11
On Mon, Apr 07, 2014 at 09:18:22AM +0000, Bernhard Reiter wrote: > > Bernhard Reiter added the comment: > > Ralf, thanks for looking into it, the analysis and the fix. > I still have to fully read and understand it! > > Just to recap: my two initial ideas were > a) it possibly be reproduced without a new attribte, just with title or > something. This is in the regression test, see the tests testConcurrentReadCommitted testConcurrentRepeatableRead in the class postgresqlJournalTest in test/test_postgresql.py The first reproduces the problem (with default transaction isolation 'read committed') the second shows that setting isolation to 'repeatable read' makes it fail with a serialization error. > b) the use case is to be able to construct all historic values, even if they > are there very briefly. They may not be there at all when using 'read committed'. Both transactions start at the same time and see the same old value. This is later reflected in the history, we have the same old value twice. > So a change to title from "a" to "b" to "c" would all be in the log and > history. Now with the defect "b" could have been lost. I hope this is not > the case with the fix anymore. If the changes to title occur at the same time you have two changes, one from "a" to "b", one from "a" to "c". The two transactions are not done one-after-the-other. So the behaviour is still the same with the fix if you leave the default isolation level at 'read committed'. Thats how SQL works. If you change the isolation level to something higher ('repeatable read'), you will also not get two changes "a"->"b" "b"->"c" but instead one of the two simultaneous transactions is aborted with a serialization error. So you end up with either "a"->"b" or "a"->"c" depending which transaction wins the race. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: http://www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com allmenda.com member email: rsc@allmenda.com
msg5075	Author: [hidden] (schlatterbeck)	Date: 2014-04-07 14:19
On Mon, Apr 07, 2014 at 09:21:29AM +0000, Bernhard Reiter wrote: > > Shouldn't REPEATABLE READ be the default then? This depends on what your goal is. For me it's worse that users get a serialization error (in our current web interface the users sees "An error has occurred") and possibly lose their submitted data than the occasional wrong history entry. But this depends on your application and what the history is used for. 'read commited' could also have other inconsistencies (e.g., if an auditor computes values over a table and stores the result in the database). But I think we don't have any applications with that behaviour in the default tracker schemas shipped. Note that setting this higher for mysql is probably less problematic because mysql seems to use locking and the two transactions would both go through, one after the other. I've not tested this, though. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: http://www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com allmenda.com member email: rsc@allmenda.com
msg5076	Author: [hidden] (ber)	Date: 2014-04-09 11:20
Ralf, thanks for the explanations! Still I am not sure what the expected behaviour from users/admins of roundup is. Some thoughts: There is some logic in the current behaviour in that a change often depends on the values it was aware of. So if two changes (or better change requests) are coming in, they have no chance to be aware of each other. So how to resolve this conflict? There certainly are use cases where I would want both change requests to be serialized and thus resulting in a full history. These are the cases where the "old" values do not have to be known to do the change. For the other behaviour I'll find it naturally that one is rejected as it could not have known the "old" values. Given that emails could be written well apart, but still come in at the same time, I also believe that the behaviour should be the same no matter when they arrive. If a change-request comes in, we currently have no way to know on which old values the decision for the change was made. We could try to introduce a way and it would allow to decline a change if it was not based on the latest value. Right now I cannot say if this is a real use case or not. What I also don't like is that the behaviour is different for the backends. In my view a roundup-based application should basically behave the same on all backends. In anyway, we certainly have to describe the behaviour in the documentation. Bernhard
msg5079	Author: [hidden] (rouilj)	Date: 2014-04-14 15:49
In msg 5075 Ralf Schlatterbeck said: >On Mon, Apr 07, 2014 at 09:21:29AM +0000, Bernhard Reiter wrote: > > > > Shouldn't REPEATABLE READ be the default then? > > This depends on what your goal is. For me it's worse that users > get a serialization error (in our current web interface the > users sees "An error has occurred") and possibly lose their > submitted data than the occasional wrong history entry. But > this depends on your application and what the history is used > for. In msg5076 Bernhard Reiter said: > If a change-request comes in, we currently have no way to know > on which old values the decision for the change was made. We > could try to introduce a way and it would allow to decline a > change if it was not based on the latest value. Right now I > cannot say if this is a real use case or not. Don't we already have conflict handling in the web interface? > What I also don't like is that the behaviour is different for > the backends. In my view a roundup-based application should > basically behave the same on all backends. > > In anyway, we certainly have to describe the behaviour in the > documentation. Would this be reasonable if we use serializable or repeatable read: If the interface is interactive (web) based, throw the same error that is raised if you try to commit data that was changed underneath you. (0.8.0 release of - better edit conflict handling (sf bug 1118790)). That way the user/application can decide. If the interface is non-interactive (e.g. email) based such that there is no good way to have the user reconscile the change, retry the commit to see if it can be done (and maybe send a notice on the ticket that this was a conflicted change). This should result in: 1) no loss of data (worst case, the user has to verify and resubmit, or can choose to discard the update) 2) allows a way for a human to determine the suitability of the change when possible. If we want the system to be consistant across all database backends, isn't serializable the target we should aim for? -- rouilj
msg5080	Author: [hidden] (schlatterbeck)	Date: 2014-04-14 16:11
On Mon, Apr 14, 2014 at 03:50:00PM +0000, John Rouillard wrote: > Don't we already have conflict handling in the web interface? > > > What I also don't like is that the behaviour is different for > > the backends. In my view a roundup-based application should > > basically behave the same on all backends. Yes but the conflict handling currently only compares the previous values and refuses to update if any have changed. It currently does not handle serialization errors. > Would this be reasonable if we use serializable or repeatable > read: > > If the interface is interactive (web) based, throw the same > error that is raised if you try to commit data that was changed > underneath you. (0.8.0 release of - better edit conflict > handling (sf bug 1118790)). That way the user/application can > decide. Yes, good idea. (Patches welcome I currently don't have much time to look into this although this is definitely high prio) > If the interface is non-interactive (e.g. email) based such > that there is no good way to have the user reconscile the > change, retry the commit to see if it can be done (and maybe > send a notice on the ticket that this was a conflicted change). > > This should result in: > > 1) no loss of data (worst case, the user has to verify and > resubmit, or can choose to discard the update) > 2) allows a way for a human to determine the suitability of > the change when possible. This is not quite true, the change can overwrite a previous change via web interface. But: This is the case anyway if the change in the web-interface happens a little earlier so that both transactions don't conflict. We currently don't have a precondition on changes via email that looks for modifications... > If we want the system to be consistant across all database > backends, isn't serializable the target we should aim for? This is a question of performance and -- on a busy tracker -- failed transactions. afaik at least mysql does a lot of locking in case you want serializable so I don't think this should be the default. Note that there is another bug concerned with serialization: The session cookies are updated in the same transaction as the normal updates. This often fails in a busy tracker because browsers often update session cookies simulateneously when restarting with several open pages to a roundup tracker (!) I've found issue2550671 but I think I remember there being an even older issues with the same problem. Ralf -- Dr. Ralf Schlatterbeck Tel: +43/2243/26465-16 Open Source Consulting www: http://www.runtux.com Reichergasse 131, A-3411 Weidling email: office@runtux.com allmenda.com member email: rsc@allmenda.com

History
Date	User	Action	Args
2014-04-14 16:11:25	schlatterbeck	set	messages: + msg5080
2014-04-14 15:50:00	rouilj	set	messages: + msg5079
2014-04-09 11:20:20	ber	set	messages: + msg5076
2014-04-07 14:19:02	schlatterbeck	set	messages: + msg5075
2014-04-07 14:11:54	schlatterbeck	set	messages: + msg5074
2014-04-07 09:21:29	ber	set	messages: + msg5073
2014-04-07 09:18:22	ber	set	messages: + msg5072
2014-04-06 11:14:37	schlatterbeck	set	status: new -> closed resolution: fixed messages: + msg5071
2014-04-04 23:53:18	rouilj	set	messages: + msg5070
2014-04-04 23:13:25	schlatterbeck	set	messages: + msg5069
2014-04-04 07:27:32	schlatterbeck	set	messages: + msg5068
2014-04-02 09:54:21	ber	set	messages: + msg5058
2013-04-24 06:34:38	ThomasAH	set	nosy: + ThomasAH
2013-04-23 18:34:30	ber	create

Roundup Tracker - Issues

Issue 2550806