Issue 2551142
Created on 2021-06-07 13:36 by rouilj, last changed 2022-01-29 16:36 by rouilj.
Messages | |||
---|---|---|---|
msg7261 | Author: [hidden] (rouilj) | Date: 2021-06-07 13:36 | |
User upgrading and changing database is importing an export file. Trying to import an export they get: IntegrityError: UNIQUE constraint failed: _user.__retired__, _user._username They have multiple retired users with the same username. As a workaround, sorting the input user.csv file by (username, retired) so that all retired=true values are first for a given username works. The import has two steps in rdbms based systems (this issue doesn't happen in anydbm). The key is username. 1. create a new node that sets the unique composite index (key, __retired__) where __retired__ has the default value of 0. 2. retire it by updating the unique composite index (key __retired__) setting __retired__ to the id. If I have an export file ordered like: (2021, 6, 5, 22, 17, 28.615, 0, 0, 0):'1':'dupe@example.com':...'22':...= :'duplicate':True (2021, 6, 5, 22, 20, 5.85, 0, 0, 0):'1':'dupl@example.com':...:'24':...:= 'duplicate':False it will import correctly. As the unique index will see: duplicate, 0 (id 22) duplicate, 22 (id 22) duplicate, 0 (id 24) However if the active entry is imported first: (2021, 6, 5, 22, 20, 5.85, 0, 0, 0):'1':'dupl@example.com':...:'24':...:= 'duplicate':False (2021, 6, 5, 22, 17, 28.615, 0, 0, 0):'1':'dupe@example.com':...'22':...= :'duplicate':True the unique index sees: duplicate,0 (id 24) duplicate,0 (id 22) # conflict and we get the error. But how do we fix thing for the future? I think reusing a username is an edge case (and confusing), but we should handle this better. I can change the export to sort by (id, retired). Sorting by id on the assumption that the active entry is the newest entry seems a dangerous assumption, so sort on the tuple. That should fix it for the future. But this doesn't allow importing an unsorted/missorted export. To read these: 1. Import could read an entire csv and sort properly (taking possibly a large amount of memory). Not a great idea IMO. 2. Handle a retry when the exception is triggered. On exception, changing the non-retired index entry from (key1, 0) to (key1, -1). Then retry the failing insert. When the retry succeeds, update the index for key1 back to 0. If -1 doesn't work for some reason use 10000 or some other sentinel number (that we hope is not a valid value for a retired user). Or we could leave the -1 (sentry) value until all entries are fully imported and do one update of the index changing -1 to 0. That is probably performs better. 3. The code could be rewritten to set the __retired__ property on initial node creation, but that looks to need pretty invasive changes. Full thread at: https://sourceforge.net/p/roundup/mailman/roundup- devel/thread/20210606192127.CF6986A0020%40pe15.cs.umb.edu/#msg37297018 Initial report/triage in irc. |
|||
msg7262 | Author: [hidden] (rouilj) | Date: 2021-06-07 14:04 | |
Initial fix in: changeset: 6431:ada1edcc9132 Export is sorted to prevent the issue. Import uses method #2 with value -1 to fixup/move aside the active record. The -1 fixup is reset after the fixup succeeds or fails rather than batch fixup at the end. I think the code is more readable this way. Also the fixup is logged at info level when it runs and when it succeeds. If it fails an exception is propagated. |
|||
msg7263 | Author: [hidden] (rouilj) | Date: 2021-06-07 15:11 | |
Well CI isn't being nice to me. memorydb failed. It behaves like anydbm, so I have a fix deployed. changeset: 6432:97a45bfa62a8 mysql and postgres both failed. For mysql it looks like it's not seeing a unique constraint error. I have no logged messages. For postgres, it looks like entering the duplicate row messes up the db handle, so my attempt to perform a query against the db to find the active entry throws a: psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block on the cursor. Ralf can you take a look at this one? Is it related to your work in using a cursor rather than a non-cursor? See https://www.travis-ci.com/github/roundup- tracker/roundup/jobs/511673688 for the error tracebacks for error output for all cases. Postgres is at the end. |
|||
msg7264 | Author: [hidden] (rouilj) | Date: 2021-06-08 01:41 | |
Testing with mysql now works. Turns out the unique index (__retired__, keyvalue) wasn't being created. I added the call to add_class_key_required_unique_constraint to create_class_table_indexes defined in in back_mysql.py. This now matches the function of create_class_table_indexes in roundup/backends/rdbms_common.py. Not sure of the effect of this change on existing trackers. Still need to investigate. |
|||
msg7265 | Author: [hidden] (rouilj) | Date: 2021-06-08 02:48 | |
Steffan, the fix for issue1876683 (that you suggested) seems to have been rolled back or missed for mysql. The upshot is that a unique index on the __retired__ and key value is missing for the mysql backend. Do you have any idea what would happen if I were to make the change specified in this ticket to back_mysql? Would it crash people's databases? Do you remember how you recovered from the duplicate entries with the same key in your database? Thanks. |
|||
msg7266 | Author: [hidden] (stefan) | Date: 2021-06-08 02:55 | |
Sorry, you are referring to a conversation (and a proposal) from > 12 years ago. That's a bit too much to remember enough details to be able to give advice on this. |
|||
msg7277 | Author: [hidden] (rouilj) | Date: 2021-06-10 22:02 | |
Postgres is now fixed. When an exception is raised, the database handle is invalid and select/update etc. raise an exception. Performing a fixup requires the ability to select from the database. Now the database is committed after each row and on exception, the database is rolled back. For other rdbms backends, the methods that commit/rollback are no-ops. Mysql is fixed so that the index that makes sure a class's key is unique is added. The the roundup database version for rdbms backends is now version 6. Admins need to do a 'roundup-admin migrate' as part of this deployment. Only mysql gets a schema change to add the index that makes sure that the class's key is unique. memorydb skips the test similar to anydbm since it never had the issue. changeset: 6433:c1d3fbcdbfbd - version upgrade 5->6, constraint added to mysql, postgresql fixes. changeset: 6434:269f39e28d5c - changes to testing see upgrading.txt doc and checkin notes for more info. |
|||
msg7278 | Author: [hidden] (rouilj) | Date: 2021-06-10 22:03 | |
Postgres is now fixed. When an exception is raised, the database handle is invalid and select/update etc. raise an exception. Performing a fixup requires the ability to select from the database. Now the database is committed after each row and on exception, the database is rolled back. For other rdbms backends, the methods that commit/rollback are no-ops. Mysql is fixed so that the index that makes sure a class's key is unique is added. The the roundup database version for rdbms backends is now version 6. Admins need to do a 'roundup-admin migrate' as part of this deployment. Only mysql gets a schema change to add the index that makes sure that the class's key is unique. memorydb skips the test similar to anydbm since it never had the issue. changeset: 6433:c1d3fbcdbfbd - version upgrade 5->6, constraint added to mysql, postgresql fixes. changeset: 6434:269f39e28d5c - changes to testing see upgrading.txt doc and checkin notes for more info. |
|||
msg7446 | Author: [hidden] (rouilj) | Date: 2022-01-29 16:36 | |
In changeset: 6610:db3f0ba75b4a, I reworked the postgresql code a bit. Rather than using commit() and rollback() it uses savepoints. This should be faster as it doesn't close/write and reopen a full transaction. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-01-29 16:36:39 | rouilj | set | messages: + msg7446 |
2021-06-10 22:03:48 | rouilj | set | keywords:
- Blocker messages: + msg7278 |
2021-06-10 22:02:19 | rouilj | set | status: open -> fixed nosy: - stefan resolution: fixed messages: + msg7277 components: + Database |
2021-06-10 01:35:13 | rouilj | set | keywords: + Blocker |
2021-06-08 02:55:48 | stefan | set | messages: + msg7266 |
2021-06-08 02:48:28 | rouilj | set | nosy:
+ stefan messages: + msg7265 |
2021-06-08 01:41:48 | rouilj | set | messages: + msg7264 |
2021-06-07 15:11:05 | rouilj | set | messages: + msg7263 |
2021-06-07 14:04:37 | rouilj | set | status: new -> open nosy: + schlatterbeck messages: + msg7262 |
2021-06-07 13:36:18 | rouilj | create |