Roundup Tracker - Issues

Issue 2551282

classification
MySQL utf8mb4 issues
Type: crash Severity: normal
Components: Database Versions: devel
process
Status: fixed fixed
:
: rouilj : ngaba, rouilj
Priority: : Blocker, patch

Created on 2023-06-30 01:23 by rouilj, last changed 2024-07-13 22:40 by rouilj.

Messages
msg7792 Author: [hidden] (rouilj) Date: 2023-06-30 01:23
Gabor Nagy reported:

 OperationalError: (1253, "COLLATION 'utf8mb3_bin' is not valid for CHARACTER SET 'utf8mb4'")

when using mysql. This appears to be a version issue with mysql and is a hint
that changes made in 2.2.0 and 2.0.0 to support other UTF8 charsets in mysql
is incomplete.

See email chain: https://sourceforge.net/p/roundup/mailman/roundup-users/thread/20230616213123.150b935f%40Dell/

TL;DR is to add a modification of this patch to use configurable
values:

diff --git a/back_mysql.py.bak b/back_mysql.py
index b0ec1c6..7b128a4 100644
--- a/back_mysql.py.bak
+++ b/back_mysql.py
@@ -92,9 +92,9 @@ def db_create(config):
     kwargs = connection_dict(config)
     conn = MySQLdb.connect(**kwargs)
     cursor = conn.cursor()
-    command = "CREATE DATABASE %s"%config.RDBMS_NAME
+    command = "CREATE DATABASE %s COLLATE utf8mb4_unicode_ci"%config.RDBMS_NAME
     if sys.version_info[0] > 2:
-        command += ' CHARACTER SET utf8'
+        command += ' CHARACTER SET utf8mb4'
     logging.info(command)
     cursor.execute(command)
     conn.commit()
@@ -625,7 +625,7 @@ class Database(rdbms_common.Database):
                 raise
 
 class MysqlClass:
-    case_sensitive_equal = 'COLLATE utf8_bin ='
+    case_sensitive_equal = 'COLLATE utf8mb4_bin ='
 
     # TODO: AFAIK its version dependent for MySQL
     supports_subselects = False

This will be deferred to 2.4.0 because 2.3.0 is already in beta and changing the
DB level at this point would delay 2.3.0. I will have more time to look at this in
August and hopefully more people will be able to test it before the 2.4.0 release.

Check the email thread for other issues with that patch and the possible requirement
to have a charset binary collation setting as it may not be possible to derive a suitable
collation from the charset.
msg7793 Author: [hidden] (ngaba) Date: 2023-06-30 09:13
The initial patch is just a worksforme proof (with my MariaDB, etc.).

I think it is not a good idea to hardwire these "utf8mb4" settings into back_mysql.py, because all users upgrading from an older Roundup version (with utf8mb3 MySQL database) will face with the cryptic exception 

OperationalError: (1253, "COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8mb3'")

if they want to use exact_match_spec; moreover, their mysql database will contain both utf8mb3 and utf8mb4 tables after a while (utf8mb4 tables will be created when user adds new classes to config.ini).

I think these utf8mb4 collations etc. should be read from config.ini.
The simplest way is to add config.ini options like
mysql_bin_collation = utf8mb4_bin
etc., or compute implicitly these from the value of mysql_charset. (The problem is that now the default mysql_charset is utf8mb4, so even this latter implementation is not backward compatible. So I would choose the new config.ini settings, that is more failsafe and user configurable.)

In addition, in Roundup 2.4.0 documentation, mysql users should be advised to convert their utf8mb3 tables to utf8mb4 and modify config.ini accordingly. (In my opinion, mysql should provide a clean documentation and a script or mysql command for doing the utfbm3->utf8mb4 conversion...)
msg7794 Author: [hidden] (ngaba) Date: 2023-06-30 09:27
Edit:
"utf8mb4 tables will be created when user adds new classes to config.ini"
I meant schema.py
msg7795 Author: [hidden] (rouilj) Date: 2023-07-01 05:50
Hi Gabor:

In message <1688116426.59.0.762927186513.issue2551282@roundup.psfhosted.org>,
Nagy Gabor writes:
>I think these utf8mb4 collations etc. should be read from config.ini.
>The simplest way is to add config.ini options like
>mysql_bin_collation = utf8mb4_bin
>etc., or compute implicitly these from the value of mysql_charset.

Agreed. I mentioned checking the email thread for specific collation
settings as well as the existing charset setting.

>In addition, in Roundup 2.4.0 documentation, mysql users should be
>advised to convert their utf8mb3 tables to utf8mb4 and modify
>config.ini accordingly. (In my opinion, mysql should provide a clean
>documentation and a script or mysql command for doing the
>utfbm3->utf8mb4 conversion...)

Agreed. This could be:

 1 an automatic migration similar to other schema upgrades run using
   the roundup-admin migrate command

 2 a separate roundup-admin comand to run SQL without having to
   install/resort to using the mysql(1) command.

I am leaning toward the second option because this isn't a schema
migration as much as an underlying change in the encoding of the same
schema. But I haven't given it a lot of thought.
msg7979 Author: [hidden] (rouilj) Date: 2024-04-07 02:49
Hi Gabor:

Can you please look at the latest commit: changeset:   7860:8b31893f5930

https://sourceforge.net/p/roundup/code/ci/8b31893f5930cbea37031d49bd08a631d1b6b654/

The commit message:
====
Fix issues with utf8 support in Roundup. By default using:

 utf8mb4 charset
 utf8mb4_unicode_ci collation (case insensitive)
 utf8mb4_0900_ci collation (case sensitive)

which are settable from config.ini. Sadly I couldn't come up with a way
to mange these from one parameter. Doing a compatibility lookup table
would have increased the maintenance burden and have me chasing MySQL
changes. So I opted for the easy path and have the admins (with more
MySQL experience) make the choices.

Conversion directions added to upgrading.txt. I don't have any good
testing for this. I was able to generate utf8/utf8mb3 tables and load
a little data and convert. However this is a poor substitute for a
conversion on a working tracker 8-(.
====

Any thoughts would be welcome.
msg8098 Author: [hidden] (rouilj) Date: 2024-07-13 22:40
2.4.0 is released with change directions for utf8mb4. Hopefully they work.
History
Date User Action Args
2024-07-13 22:40:50rouiljsetstatus: pending -> fixed
resolution: remind -> fixed
messages: + msg8098
2024-04-07 02:50:00rouiljsetstatus: new -> pending
assignee: rouilj
messages: + msg7979
2024-03-02 07:59:44rouiljsetkeywords: + Blocker
2023-07-01 05:50:21rouiljsetmessages: + msg7795
2023-06-30 09:27:16ngabasetmessages: + msg7794
2023-06-30 09:13:46ngabasetmessages: + msg7793
2023-06-30 01:23:24rouiljsetresolution: remind
2023-06-30 01:23:15rouiljcreate