Message7261
A user who is upgrading and changing databases is importing an
export file. When they try to import the export, they get:
IntegrityError: UNIQUE constraint failed: _user.__retired__, _user._username
They have multiple retired users with the same username.
As a workaround, sorting the input user.csv file by (username, retired)
so that all retired=true values are first for a given username works.
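
A minimal sketch of that reordering, for illustration only. It assumes
the rows have already been parsed into dicts; the field names "id",
"username" and "retired" are hypothetical stand-ins and are not taken
from the real export header:

```python
def sort_retired_first(rows):
    """Order rows by username, placing retired rows before the active one."""
    # "not retired" is False (sorts first) for retired rows and
    # True (sorts last) for the active row.
    return sorted(rows, key=lambda r: (r["username"], not r["retired"]))


rows = [
    {"id": "24", "username": "duplicate", "retired": False},
    {"id": "22", "username": "duplicate", "retired": True},
    {"id": "3", "username": "alice", "retired": False},
]
ordered = sort_retired_first(rows)
print([r["id"] for r in ordered])  # ['3', '22', '24']
```

With this ordering every retired entry for a username is inserted and
retired before the active entry for that username is created.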
The import has two steps in RDBMS-based backends (this issue
doesn't happen in anydbm). The key here is username.
1. Create a new node, which adds an entry to the unique composite
index (key, __retired__) with __retired__ at its default value of 0.
2. Retire the node by updating the unique composite index entry
(key, __retired__), setting __retired__ to the node's id.
If I have an export file ordered like:
(2021, 6, 5, 22, 17, 28.615, 0, 0, 0):'1':'dupe@example.com':...'22':...:'duplicate':True
(2021, 6, 5, 22, 20, 5.85, 0, 0, 0):'1':'dupl@example.com':...:'24':...:'duplicate':False
it will import correctly, because the unique index will see:
duplicate, 0 (id 22)
duplicate, 22 (id 22)
duplicate, 0 (id 24)
However if the active entry is imported first:
(2021, 6, 5, 22, 20, 5.85, 0, 0, 0):'1':'dupl@example.com':...:'24':...:'duplicate':False
(2021, 6, 5, 22, 17, 28.615, 0, 0, 0):'1':'dupe@example.com':...'22':...:'duplicate':True
the unique index sees:
duplicate,0 (id 24)
duplicate,0 (id 22) # conflict
and we get the error. But how do we fix things for the future? I
think reusing a username is an edge case (and confusing), but we
should handle this better.
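
The ordering dependence can be reproduced with a small sqlite3 sketch.
The table layout below is a simplified assumption (only the columns
named in the error message plus an id), the index name is made up, and
import_users is a hypothetical helper, not the actual import code:

```python
import sqlite3


def import_users(entries):
    """Mimic the two-step import: insert with __retired__ = 0, then retire."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE _user (id INTEGER PRIMARY KEY, "
               "_username TEXT, __retired__ INTEGER DEFAULT 0)")
    db.execute("CREATE UNIQUE INDEX _user_key_retired_idx "
               "ON _user (__retired__, _username)")
    for node_id, username, retired in entries:
        # Step 1: every new node starts out as (username, 0).
        db.execute("INSERT INTO _user (id, _username) VALUES (?, ?)",
                   (node_id, username))
        # Step 2: retiring sets __retired__ to the node's own id.
        if retired:
            db.execute("UPDATE _user SET __retired__ = id WHERE id = ?",
                       (node_id,))
    return db.execute(
        "SELECT id, __retired__ FROM _user ORDER BY id").fetchall()


# Retired entry first: both rows import.
print(import_users([(22, "duplicate", True), (24, "duplicate", False)]))

# Active entry first: the second insert collides on (0, 'duplicate').
try:
    import_users([(24, "duplicate", False), (22, "duplicate", True)])
except sqlite3.IntegrityError as exc:
    print("IntegrityError:", exc)
```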
I can change the export to sort by (id, retired). Sorting by id
alone, on the assumption that the active entry is the newest entry,
seems dangerous, so sort on the tuple. That should fix it for the
future.
But this doesn't allow importing an unsorted/missorted export.
To read these:
1. Import could read the entire csv and sort it properly (possibly
taking a large amount of memory). Not a great idea IMO.
2. Handle a retry when the exception is triggered. On exception,
change the non-retired index entry from
(key1, 0) to (key1, -1). Then retry the failing insert.
When the retry succeeds, update the index for key1 back to 0. If
-1 doesn't work for some reason use 10000 or some other sentinel
number (that we hope is not a valid value for a retired user).
Or we could leave the -1 (sentinel) value until all entries are
fully imported and do one update of the index changing -1 to
0. That probably performs better.
3. The code could be rewritten to set the __retired__ property on
initial node creation, but that looks to need pretty invasive
changes.
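
Option 2, in its "restore once at the end" variant, could look roughly
like the sketch below. This is an untested idea against a simplified,
assumed schema in sqlite3; make_db, import_with_retry, SENTINEL and the
index name are all hypothetical and do not exist in the code base:

```python
import sqlite3

SENTINEL = -1  # assumption: never a valid id for a retired node


def make_db():
    """Create a simplified stand-in for the _user table and its index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE _user (id INTEGER PRIMARY KEY, "
               "_username TEXT, __retired__ INTEGER DEFAULT 0)")
    db.execute("CREATE UNIQUE INDEX _user_key_retired_idx "
               "ON _user (__retired__, _username)")
    return db


def import_with_retry(db, entries):
    insert = "INSERT INTO _user (id, _username) VALUES (?, ?)"
    for node_id, username, retired in entries:
        try:
            db.execute(insert, (node_id, username))
        except sqlite3.IntegrityError:
            # Park the active entry: (key, 0) -> (key, SENTINEL), then retry.
            db.execute("UPDATE _user SET __retired__ = ? "
                       "WHERE _username = ? AND __retired__ = 0",
                       (SENTINEL, username))
            db.execute(insert, (node_id, username))
        if retired:
            db.execute("UPDATE _user SET __retired__ = id WHERE id = ?",
                       (node_id,))
    # One update at the end moves every parked entry back to 0.
    db.execute("UPDATE _user SET __retired__ = 0 WHERE __retired__ = ?",
               (SENTINEL,))
    return db.execute(
        "SELECT id, __retired__ FROM _user ORDER BY id").fetchall()


# Active entry first, which fails without the retry.
print(import_with_retry(
    make_db(), [(24, "duplicate", False), (22, "duplicate", True)]))
```

Two genuinely active entries with the same username would still fail in
this sketch, but only at the final update, so a real duplicate key is
not silently accepted.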
Full thread at: https://sourceforge.net/p/roundup/mailman/roundup-devel/thread/20210606192127.CF6986A0020%40pe15.cs.umb.edu/#msg37297018
Initial report/triage in IRC.

Date | User | Action | Args
2021-06-07 13:36:18 | rouilj | set | recipients: + rouilj
2021-06-07 13:36:18 | rouilj | set | messageid: <1623072978.21.0.923795997756.issue2551142@roundup.psfhosted.org>
2021-06-07 13:36:18 | rouilj | link | issue2551142 messages
2021-06-07 13:36:17 | rouilj | create |