Roundup Tracker - Issues

Issue 2550806

classification
Title: Parallel change requests to properties from auditors may lead to data loss
Type: Severity: urgent
Components: Database Versions: 1.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: schlatterbeck Nosy List: ThomasAH, ber, rouilj, schlatterbeck
Priority: high Keywords:

Created on 2013-04-23 18:34 by ber, last changed 2014-04-14 16:11 by schlatterbeck.

Messages
msg4846 Author: [hidden] (ber) Date: 2013-04-23 18:34
This issue is split out from issue2550731 (Add mechanism for the detectors to be 
able to tell the source of the data changes) where it was detected.

The symptoms are:
When using a database that allow parallel access. Using an auditor that sets an 
attribute to different values (and delays some time) to show the effect with a 
higher  chance. If then two requests come in quickly, the attribute value of the first 
request is lost from the history record. Thus it is lost at all, which is data loss of 
possible user input and this not happen.

Expected behaviour:
The requests are serialised somehow and all attribute values make it into the 
history.

First observations:
Ralf expected the database transaction mechanism to work here and prevent the 
trouble. Issue2550731 seems to prove that this does not work, at least not with 
Postgresql. 

Next steps:
Ralf said, he wanted to peek into this (msg4739) as he has some experience with 
database transactions. (So I took the liberty to assign it to Ralf.)

It probably makes sense to:
a) Simply the test case to be reproduced without the source detection with 
current roundup mainline. And to be triggered with as little code as possible. Best 
would be a test function that we can use for automatic testing as well.
b) Try the problem with other databases that have parallel access like MariaDB, 
to rule out a PostgreSQL specific problem.
msg5058 Author: [hidden] (ber) Date: 2014-04-02 09:54
Hi Ralf,
as I saw a number of improvements you did to roundup in the last days,
what about this issue? I still fear that it is a significant problem losing
data in some race conditions.
msg5068 Author: [hidden] (schlatterbeck) Date: 2014-04-04 07:27
The "old-nabble" links to roundup-devel threads in issue2550731 no
longer work, here are the correct ones:

Wierd bug in roundup 1.4.20 with postgres back end
http://sourceforge.net/p/roundup/mailman/message/29359236/

Need testing help for issue2550731 allow detectors to reject insecure
changes
http://sourceforge.net/p/roundup/mailman/message/29141150/
msg5069 Author: [hidden] (schlatterbeck) Date: 2014-04-04 23:13
On Wed, Apr 02, 2014 at 09:54:21AM +0000, Bernhard Reiter wrote:
> 
> Hi Ralf,
> as I saw a number of improvements you did to roundup in the last days,
> what about this issue? I still fear that it is a significant problem losing
> data in some race conditions.

Thanks for the reminder.

I've now checked -- and reproduced -- the behaviour. Thanks John for
providing a detailed instruction and appropriate auditors to test this.

I did run two concurrent command-line updates (I didn't care about the
txSource, the nonce was fine for me).

First command:
roundup-admin -i . set issue1 priority=urgent
tx_SourceCheckAudit(1) pre db.tx_Source: cli, nonce: 6aa1acdbfe6b49078484
tx_SourceCheckAudit(1) post db.tx_Source: cli, nonce 6aa1acdbfe6b49078484
tx_SourceCheckReact(1) pre db.tx_Source: cli, recorded: arrived via: cli 6aa1acdbfe6b49078484
tx_SourceCheckReact(1) post db.tx_Source: cli, recorded: arrived via: cli 6aa1acdbfe6b49078484

Second command:
roundup-admin -i . set issue1 status=unread
tx_SourceCheckAudit(1) pre db.tx_Source: cli, nonce: 23e9adb6fc638761eb4f
tx_SourceCheckAudit(1) post db.tx_Source: cli, nonce 23e9adb6fc638761eb4f
tx_SourceCheckReact(1) pre db.tx_Source: cli, recorded: arrived via: cli 23e9adb6fc638761eb4f
tx_SourceCheckReact(1) post db.tx_Source: cli, recorded: arrived via: cli 23e9adb6fc638761eb4f

Displayed history (cut&paste from web-interface):
2014-04-04 21:12:07     admin   set     tx_Source: arrived via: cli 84ef0e4322500b890fb0 -> arrived via: cli 23e9adb6fc638761eb4f
2014-04-04 21:11:56     admin   set     priority: bug -> urgent
tx_Source: arrived via: cli 84ef0e4322500b890fb0 -> arrived via: cli 84ef0e4322500b890fb0
2014-04-04 21:08:34     admin   set     status: deferred -> unread
tx_Source: arrived via: web c05d48aea9f81a9ae254 -> arrived via: cli 84ef0e4322500b890fb0
2013-01-04 00:02:21     admin   set     priority: urgent -> bug
tx_Source: arrived via: web d597645edc0bc58d7943 -> arrived via: web c05d48aea9f81a9ae254
2013-01-04 00:01:35     admin   create

We clearly see that the first two lines in the history display
corresponding to the latest two changes (at 21:11:56 and 21:12:07) start
from the same nonce value that was set in the "set" at 21:08:34

Now it's nice to know what is actually stored in the journal, I've
reformatted the journal entries to make this a little clearer:

>>> print db.getjournal('issue', '1')
[
('1', <Date 2013-01-04.00:01:35.750>, '1', 'create', {}),
('1', <Date 2013-01-04.00:02:21.859>, '1', 'set',
     {'priority': '2', 'tx_Source': 'arrived via: web d597645edc0bc58d7943'}),
('1', <Date 2014-04-04.21:08:34.814>, '1', 'set',
     {'status': '2', 'tx_Source': 'arrived via: web c05d48aea9f81a9ae254'}),
('1', <Date 2014-04-04.21:11:56.923>, '1', 'set',
     {'priority': '3', 'tx_Source': 'arrived via: cli 84ef0e4322500b890fb0'}),
('1', <Date 2014-04-04.21:12:07.040>, '1', 'set',
     {'tx_Source': 'arrived via: cli 84ef0e4322500b890fb0'})
]

The current value of tx_Source is:
>>> db.issue.get('1','tx_Source')
'arrived via: cli 23e9adb6fc638761eb4f'

We see that the journal entry only stores the *old* value *before* the
database transaction. The interpolation in the web-interface simply
walks the list and remembers the old value from the previous journal
entry.

In this case my second (nonce 23e9adb6fc638761eb4f) transaction "won"
and the tx_Source update from my first transaction (nonce
6aa1acdbfe6b49078484) wasn't seen by any transaction and might have
never been written.

This is normal serialisation behaviour: The database if free to not
perform updates that are subsequently overwritten by a later
transaction.

So both my transactions seem to have started with the same old tx_Source
value with the nonce 84ef0e4322500b890fb0 and both put this old value
into their journal entry (thats the old tx_Source when both of my two
concurrent transactions were started).

The history printing mechanism creates from the list of nonces in the
journal (see output of getjournal call above):
d597645edc0bc58d7943
c05d48aea9f81a9ae254
84ef0e4322500b890fb0
84ef0e4322500b890fb0

the following (wrong) update list:
d597645edc0bc58d7943->c05d48aea9f81a9ae254
c05d48aea9f81a9ae254->84ef0e4322500b890fb0
84ef0e4322500b890fb0->84ef0e4322500b890fb0
84ef0e4322500b890fb0->23e9adb6fc638761eb4f

Which looks like the value 84ef0e4322500b890fb0 was there twice
but results from two concurrent updates both seeing the same old value.

So there is nothing wrong with roundups serialisation.

Currently roundup is using the lowest isolation level supported by
PostgreSQL, "Read committed".
See also:
http://www.postgresql.org/docs/9.1/static/transaction-iso.html

Setting a higher isolation level, at least "repeatable read" would work
for this case, at least if the auditor would read the tx_Source property
before updating it (but this is done by the detector framework anyway,
so no need to modify the detectors). In that case one of the
transactions would be aborted with a serialization error.

Currently the isolation level isn't settable in roundup, maybe we should
make this configurable?

Only Postgres seems to support the concept of isolation levels although
defined in the SQL standards...

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
msg5070 Author: [hidden] (rouilj) Date: 2014-04-04 23:53
In message <20140404231316.GA19617@runtux.com> <20140404231316.GA19617@runtux.c
om>,
Ralf Schlatterbeck writes:
>This is normal serialisation behaviour: The database if free to not
>perform updates that are subsequently overwritten by a later
>transaction.
>
>So both my transactions seem to have started with the same old tx_Source
>value with the nonce 84ef0e4322500b890fb0 and both put this old value
>into their journal entry (thats the old tx_Source when both of my two
>concurrent transactions were started).
> [...]
>So there is nothing wrong with roundups serialisation.
>
>Currently roundup is using the lowest isolation level supported by
>PostgreSQL, "Read committed".
>See also:
>http://www.postgresql.org/docs/9.1/static/transaction-iso.html
>
>Setting a higher isolation level, at least "repeatable read" would work
>for this case, at least if the auditor would read the tx_Source property
>before updating it (but this is done by the detector framework anyway,
>so no need to modify the detectors). In that case one of the
>transactions would be aborted with a serialization error.
>
>Currently the isolation level isn't settable in roundup, maybe we should
>make this configurable?
>
>Only Postgres seems to support the concept of isolation levels although
>defined in the SQL standards...

Well from 

  http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html

mysql with innodb has:

  SET [GLOBAL | SESSION] TRANSACTION ISOLATION LEVEL
    {
	 REPEATABLE READ
       | READ COMMITTED
       | READ UNCOMMITTED
       | SERIALIZABLE
     }

  This statement sets the transaction isolation level, used for
  operations on InnoDB tables.

  Scope of the Isolation Level

  You can set the isolation level globally, for the current session, or
  for the next transaction:

      With the GLOBAL keyword, the statement sets the default
      transaction level globally for all subsequent sessions. Existing
      sessions are unaffected.

      With the SESSION keyword, the statement sets the default
      transaction level for all subsequent transactions performed within
      the current session.

      Without any SESSION or GLOBAL keyword, the statement sets the
      isolation level for the next (not started) transaction performed
      within the current session.

Also https://www.sqlite.org/sharedcache.html seems to indicate that
sqlite with shared cache can support read_uncommitted mode.
msg5071 Author: [hidden] (schlatterbeck) Date: 2014-04-06 11:14
Implement new config option 'isolation_level' in rdbms section. This is
only relevant for postgres and mysql (sqlite does locking and implements
a very high isolation level by default :-)

Thanks John for the hint that mysql also implements this.

Add a test for Postgres that reproduces the observed history behaviour
with isolation level 'read committed' and throws a serialization error
with level 'repeatable read'.

The isolation level 'read committed' is the new default (which was the
case for postgres before but mysql by default seems to have repeatable
read by default)
msg5072 Author: [hidden] (ber) Date: 2014-04-07 09:18
Ralf, thanks for looking into it, the analysis and the fix.
I still have to fully read and understand it!

Just to recap: my two initial ideas were
a) it possibly be reproduced without a new attribte, just with title or 
something.
b) the use case is to be able to construct all historic values, even if they 
are there very briefly. 

So a change to title from "a" to "b" to "c" would all be in the log and 
history. Now with the defect "b" could have been lost. I hope this is not 
the case with the fix anymore.
msg5073 Author: [hidden] (ber) Date: 2014-04-07 09:21
Shouldn't REPEATABLE READ be the default then?
msg5074 Author: [hidden] (schlatterbeck) Date: 2014-04-07 14:11
On Mon, Apr 07, 2014 at 09:18:22AM +0000, Bernhard Reiter wrote:
> 
> Bernhard Reiter added the comment:
> 
> Ralf, thanks for looking into it, the analysis and the fix.
> I still have to fully read and understand it!
> 
> Just to recap: my two initial ideas were
> a) it possibly be reproduced without a new attribte, just with title or 
> something.
This is in the regression test, see the tests
testConcurrentReadCommitted
testConcurrentRepeatableRead
in the class postgresqlJournalTest in test/test_postgresql.py
The first reproduces the problem (with default transaction isolation
'read committed') the second shows that setting isolation to
'repeatable read' makes it fail with a serialization error.

> b) the use case is to be able to construct all historic values, even if they 
> are there very briefly. 
They may not be there at all when using 'read committed'. Both
transactions start at the same time and see the same old value.
This is later reflected in the history, we have the same old value
twice.

> So a change to title from "a" to "b" to "c" would all be in the log and 
> history. Now with the defect "b" could have been lost. I hope this is not 
> the case with the fix anymore.
If the changes to title occur at the same time you have two changes, one
from "a" to "b", one from "a" to "c". The two transactions are *not*
done one-after-the-other.
So the behaviour is still the same with the fix if you leave the default
isolation level at 'read committed'. Thats how SQL works.

If you change the isolation level to something higher ('repeatable
read'), you will also not get two changes "a"->"b" "b"->"c" but instead
one of the two simultaneous transactions is aborted with a serialization
error. So you end up with either "a"->"b" or "a"->"c" depending which
transaction wins the race.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
msg5075 Author: [hidden] (schlatterbeck) Date: 2014-04-07 14:19
On Mon, Apr 07, 2014 at 09:21:29AM +0000, Bernhard Reiter wrote:
> 
> Shouldn't REPEATABLE READ be the default then?

This depends on what your goal is. For me it's worse that users get a
serialization error (in our current web interface the users sees "An
error has occurred") and possibly lose their submitted data than the
occasional wrong history entry. But this depends on your application and
what the history is used for.

'read commited' could also have other inconsistencies (e.g., if an
auditor computes values over a table and stores the result in the
database). But I think we don't have any applications with that
behaviour in the default tracker schemas shipped.

Note that setting this higher for mysql is probably less problematic
because mysql seems to use locking and the two transactions would both
go through, one after the other. I've not tested this, though.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
msg5076 Author: [hidden] (ber) Date: 2014-04-09 11:20
Ralf, 
thanks for the explanations!

Still I am not sure what the expected behaviour from users/admins of 
roundup is. Some thoughts:

There is some logic in the current behaviour in that a change
often depends on the values it was aware of.
So if two changes (or better change requests) are coming in, 
they have no chance to be aware of each other. 
So how to resolve this conflict?

There certainly are use cases where I would want both change requests 
to be serialized and thus resulting in a full history. These are the cases 
where the "old" values do not have to be known to do the change.
For the other behaviour I'll find it naturally that one is rejected as it could 
not have known the "old" values. Given that emails could be written well 
apart, but still come in at the same time, I also believe that the behaviour 
should be the same no matter when they arrive.

If a change-request comes in, we currently have no way to know
on which old values the decision for the change was made.  We could try 
to introduce a way and it would allow to decline a change if it was
not based on the latest value. Right now I cannot say if this is a real
use case or not. 

What I also don't like is that the behaviour is different for the backends.
In my view a roundup-based application should basically behave the same 
on all backends.

In anyway, we certainly have to describe 
the behaviour in the documentation.

Bernhard
msg5079 Author: [hidden] (rouilj) Date: 2014-04-14 15:49
In msg 5075 Ralf Schlatterbeck said:
>On Mon, Apr 07, 2014 at 09:21:29AM +0000, Bernhard Reiter wrote:
> > 
> > Shouldn't REPEATABLE READ be the default then?
>
> This depends on what your goal is. For me it's worse that users
> get a serialization error (in our current web interface the
> users sees "An error has occurred") and possibly lose their
> submitted data than the occasional wrong history entry. But
> this depends on your application and what the history is used
> for.

In msg5076 Bernhard Reiter said:
> If a change-request comes in, we currently have no way to know
> on which old values the decision for the change was made.  We
> could try to introduce a way and it would allow to decline a
> change if it was not based on the latest value. Right now I
> cannot say if this is a real use case or not.

Don't we already have conflict handling in the web interface?
 
> What I also don't like is that the behaviour is different for
> the backends.  In my view a roundup-based application should
> basically behave the same on all backends.
> 
> In anyway, we certainly have to describe the behaviour in the
> documentation.

Would this be reasonable if we use serializable or repeatable
read:

  If the interface is interactive (web) based, throw the same
  error that is raised if you try to commit data that was changed
  underneath you. (0.8.0 release of - better edit conflict
  handling (sf bug 1118790)). That way the user/application can
  decide.

  If the interface is non-interactive (e.g. email) based such
  that there is no good way to have the user reconscile the
  change, retry the commit to see if it can be done (and maybe
  send a notice on the ticket that this was a conflicted change).

This should result in:

  1) no loss of data (worst case, the user has to verify and
     resubmit, or can choose to discard the update)
  2) allows a way for a human to determine the suitability of
     the change when possible.

If we want the system to be consistant across all database
backends, isn't serializable the target we should aim for?

  -- rouilj
msg5080 Author: [hidden] (schlatterbeck) Date: 2014-04-14 16:11
On Mon, Apr 14, 2014 at 03:50:00PM +0000, John Rouillard wrote:
> Don't we already have conflict handling in the web interface?
>  
> > What I also don't like is that the behaviour is different for
> > the backends.  In my view a roundup-based application should
> > basically behave the same on all backends.

Yes but the conflict handling currently only compares the previous
values and refuses to update if any have changed. It currently does
*not* handle serialization errors.

> Would this be reasonable if we use serializable or repeatable
> read:
> 
>   If the interface is interactive (web) based, throw the same
>   error that is raised if you try to commit data that was changed
>   underneath you. (0.8.0 release of - better edit conflict
>   handling (sf bug 1118790)). That way the user/application can
>   decide.
Yes, good idea.
(Patches welcome I currently don't have much time to look into this
although this is definitely high prio)

>   If the interface is non-interactive (e.g. email) based such
>   that there is no good way to have the user reconscile the
>   change, retry the commit to see if it can be done (and maybe
>   send a notice on the ticket that this was a conflicted change).
> 
> This should result in:
> 
>   1) no loss of data (worst case, the user has to verify and
>      resubmit, or can choose to discard the update)
>   2) allows a way for a human to determine the suitability of
>      the change when possible.

This is not quite true, the change can overwrite a previous change via
web interface. But: This is the case anyway if the change in the
web-interface happens a *little* earlier so that both transactions don't
conflict. We currently don't have a precondition on changes via email
that looks for modifications...

> If we want the system to be consistant across all database
> backends, isn't serializable the target we should aim for?

This is a question of performance and -- on a busy tracker -- failed
transactions. afaik at least mysql does a lot of locking in case you
want serializable so I don't think this should be the default.

Note that there is another bug concerned with serialization: The session
cookies are updated in the same transaction as the normal updates. This
often fails in a busy tracker because browsers often update session
cookies simulateneously when restarting with several open pages to a
roundup tracker (!)
I've found issue2550671 but I think I remember there being an even older
issues with the same problem.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
allmenda.com member                     email: rsc@allmenda.com
History
Date User Action Args
2014-04-14 16:11:25schlatterbecksetmessages: + msg5080
2014-04-14 15:50:00rouiljsetmessages: + msg5079
2014-04-09 11:20:20bersetmessages: + msg5076
2014-04-07 14:19:02schlatterbecksetmessages: + msg5075
2014-04-07 14:11:54schlatterbecksetmessages: + msg5074
2014-04-07 09:21:29bersetmessages: + msg5073
2014-04-07 09:18:22bersetmessages: + msg5072
2014-04-06 11:14:37schlatterbecksetstatus: new -> closed
resolution: fixed
messages: + msg5071
2014-04-04 23:53:18rouiljsetmessages: + msg5070
2014-04-04 23:13:25schlatterbecksetmessages: + msg5069
2014-04-04 07:27:32schlatterbecksetmessages: + msg5068
2014-04-02 09:54:21bersetmessages: + msg5058
2013-04-24 06:34:38ThomasAHsetnosy: + ThomasAH
2013-04-23 18:34:30bercreate