Roundup Tracker - Issues

Issue 2551142

classification
Import of retired node with username after active node fails with unique constraint failure.
Type: behavior Severity: major
Components: Database Versions:
process
Status: fixed fixed
:
: rouilj : rouilj, schlatterbeck
Priority: high :

Created on 2021-06-07 13:36 by rouilj, last changed 2022-01-29 16:36 by rouilj.

Messages
msg7261 Author: [hidden] (rouilj) Date: 2021-06-07 13:36
User upgrading and changing database is importing an export file.
Trying to import an export they get:

   IntegrityError: UNIQUE constraint failed: _user.__retired__, 
_user._username

They have multiple retired users with the same username.

As a workaround, sorting the input user.csv file by (username, retired)
so that all retired=true values are first for a given username works.

The import has two steps in rdbms based systems (this issue
doesn't happen in anydbm). The key is username.

  1. create a new node that sets the unique composite index (key,
     __retired__) where __retired__ has the default value of 0.
  2. retire it by updating the unique composite index (key
     __retired__) setting __retired__ to the id.

If I have an export file ordered like:

  (2021, 6, 5, 22, 17, 28.615, 0, 0, 
0):'1':'dupe@example.com':...'22':...=
:'duplicate':True

  (2021, 6, 5, 22, 20, 5.85, 0, 0, 
0):'1':'dupl@example.com':...:'24':...:=
'duplicate':False

it will import correctly. As the unique index will see:

   duplicate, 0  (id 22)
   duplicate, 22 (id 22)
   duplicate, 0  (id 24)

However if the active entry is imported first:

  (2021, 6, 5, 22, 20, 5.85, 0, 0, 
0):'1':'dupl@example.com':...:'24':...:=
'duplicate':False

  (2021, 6, 5, 22, 17, 28.615, 0, 0, 
0):'1':'dupe@example.com':...'22':...=
:'duplicate':True

the unique index sees:

  duplicate,0   (id 24)
  duplicate,0   (id 22)  # conflict

and we get the error. But how do we fix thing for the future? I
think reusing a username is an edge case (and confusing), but we
should handle this better.

I can change the export to sort by (id, retired). Sorting by id
on the assumption that the active entry is the newest entry seems
a dangerous assumption, so sort on the tuple. That should fix it
for the future.

But this doesn't allow importing an unsorted/missorted export.
To read these:

1. Import could read an entire csv and sort properly (taking
possibly a large amount of memory). Not a great idea IMO.

2. Handle a retry when the exception is triggered. On exception,
   changing the non-retired index entry from
   (key1, 0) to (key1, -1). Then retry the failing insert.

When the retry succeeds, update the index for key1 back to 0. If
-1 doesn't work for some reason use 10000 or some other sentinel
number (that we hope is not a valid value for a retired user).

Or we could leave the -1 (sentry) value until all entries are
fully imported and do one update of the index changing -1 to
0. That is probably performs better.

3. The code could be rewritten to set the __retired__ property on
initial node creation, but that looks to need pretty invasive
changes.


Full thread at: https://sourceforge.net/p/roundup/mailman/roundup-
devel/thread/20210606192127.CF6986A0020%40pe15.cs.umb.edu/#msg37297018

Initial report/triage in irc.
msg7262 Author: [hidden] (rouilj) Date: 2021-06-07 14:04
Initial fix in: changeset:   6431:ada1edcc9132

Export is sorted to prevent the issue.

Import uses method #2 with value -1 to fixup/move aside the active
record. The -1 fixup is reset after the fixup succeeds or fails rather 
than batch fixup at the end. I think the code is more readable this way.

Also the fixup is logged at info level when it runs and when it 
succeeds.
If it fails an exception is propagated.
msg7263 Author: [hidden] (rouilj) Date: 2021-06-07 15:11
Well CI isn't being nice to me.

memorydb failed. It behaves like anydbm, so I have a fix deployed.
changeset:   6432:97a45bfa62a8

mysql and postgres both failed. For mysql it looks like it's not
seeing a unique constraint error. I have no logged messages.

For postgres, it looks like entering the duplicate row messes
up the db handle, so my attempt to perform a query against the db
to find the active entry throws a:

    psycopg2.errors.InFailedSqlTransaction: current transaction is 
aborted, commands ignored until end of transaction block

on the cursor. Ralf can you take a look at this one? Is it related
to your work in using a cursor rather than a non-cursor?

See https://www.travis-ci.com/github/roundup-
tracker/roundup/jobs/511673688 for the error tracebacks for error
output for all cases. Postgres is at the end.
msg7264 Author: [hidden] (rouilj) Date: 2021-06-08 01:41
Testing with mysql now works. Turns out the unique index
(__retired__, keyvalue) wasn't being created.

I added the call to add_class_key_required_unique_constraint to 
create_class_table_indexes defined in in back_mysql.py.
This now matches the function of create_class_table_indexes
in roundup/backends/rdbms_common.py.

Not sure of the effect of this change on existing trackers.
Still need to investigate.
msg7265 Author: [hidden] (rouilj) Date: 2021-06-08 02:48
Steffan, the fix for issue1876683 (that you suggested) seems to have
been rolled back or missed for mysql.

The upshot is that a unique index on the __retired__ and key value is 
missing for the mysql backend.

Do you have any idea what would happen if I were to make the change 
specified in this ticket to back_mysql? Would it crash people's
databases? Do you remember how you recovered from the duplicate
entries with the same key in your database?

Thanks.
msg7266 Author: [hidden] (stefan) Date: 2021-06-08 02:55
Sorry, you are referring to a conversation (and a proposal) from > 12 years ago. That's a bit too much to remember enough details to be able to give advice on this.
msg7277 Author: [hidden] (rouilj) Date: 2021-06-10 22:02
Postgres is now fixed.  When an exception is raised, the database
handle is invalid and select/update etc. raise an exception.
Performing a fixup requires the ability to select from the
database. Now the database is committed after each row and on
exception, the database is rolled back.

For other rdbms backends, the methods that commit/rollback are
no-ops.

Mysql is fixed so that the index that makes sure a class's key is
unique is added.

The the roundup database version for rdbms backends is now
version 6.  Admins need to do a 'roundup-admin migrate' as part
of this deployment.  Only mysql gets a schema change to
add the index that makes sure that the class's key is unique.

memorydb skips the test similar to anydbm since it never had the issue.

changeset:   6433:c1d3fbcdbfbd - version upgrade 5->6,
             constraint added to mysql, postgresql fixes.
changeset:   6434:269f39e28d5c - changes to testing


see upgrading.txt doc and checkin notes for more info.
msg7278 Author: [hidden] (rouilj) Date: 2021-06-10 22:03
Postgres is now fixed.  When an exception is raised, the database
handle is invalid and select/update etc. raise an exception.
Performing a fixup requires the ability to select from the
database. Now the database is committed after each row and on
exception, the database is rolled back.

For other rdbms backends, the methods that commit/rollback are
no-ops.

Mysql is fixed so that the index that makes sure a class's key is
unique is added.

The the roundup database version for rdbms backends is now
version 6.  Admins need to do a 'roundup-admin migrate' as part
of this deployment.  Only mysql gets a schema change to
add the index that makes sure that the class's key is unique.

memorydb skips the test similar to anydbm since it never had the issue.

changeset:   6433:c1d3fbcdbfbd - version upgrade 5->6,
             constraint added to mysql, postgresql fixes.
changeset:   6434:269f39e28d5c - changes to testing


see upgrading.txt doc and checkin notes for more info.
msg7446 Author: [hidden] (rouilj) Date: 2022-01-29 16:36
In changeset:   6610:db3f0ba75b4a, I reworked the postgresql code a bit.
Rather than using commit() and rollback() it uses savepoints. This should be
faster as it doesn't close/write and reopen a full transaction.
History
Date User Action Args
2022-01-29 16:36:39rouiljsetmessages: + msg7446
2021-06-10 22:03:48rouiljsetkeywords: - Blocker
messages: + msg7278
2021-06-10 22:02:19rouiljsetstatus: open -> fixed
nosy: - stefan
resolution: fixed
messages: + msg7277
components: + Database
2021-06-10 01:35:13rouiljsetkeywords: + Blocker
2021-06-08 02:55:48stefansetmessages: + msg7266
2021-06-08 02:48:28rouiljsetnosy: + stefan
messages: + msg7265
2021-06-08 01:41:48rouiljsetmessages: + msg7264
2021-06-07 15:11:05rouiljsetmessages: + msg7263
2021-06-07 14:04:37rouiljsetstatus: new -> open
nosy: + schlatterbeck
messages: + msg7262
2021-06-07 13:36:18rouiljcreate