Roundup Tracker - Issues

Issue 2551282

classification
MySQL utf8mb4 issues
Type: crash Severity: normal
Components: Database Versions: devel
process
Status: new remind
:
: : ngaba, rouilj
Priority: : patch

Created on 2023-06-30 01:23 by rouilj, last changed 2023-07-01 05:50 by rouilj.

Messages
msg7792 Author: [hidden] (rouilj) Date: 2023-06-30 01:23
Gabor Nagy reported:

 OperationalError: (1253, "COLLATION 'utf8mb3_bin' is not valid for CHARACTER SET 'utf8mb4'")

when using mysql. This appears to be a version issue with mysql and is a hint
that changes made in 2.2.0 and 2.0.0 to support other UTF8 charsets in mysql
is incomplete.

See email chain: https://sourceforge.net/p/roundup/mailman/roundup-users/thread/20230616213123.150b935f%40Dell/

TL;DR is to add a modification of this patch to use configurable
values:

diff --git a/back_mysql.py.bak b/back_mysql.py
index b0ec1c6..7b128a4 100644
--- a/back_mysql.py.bak
+++ b/back_mysql.py
@@ -92,9 +92,9 @@ def db_create(config):
     kwargs = connection_dict(config)
     conn = MySQLdb.connect(**kwargs)
     cursor = conn.cursor()
-    command = "CREATE DATABASE %s"%config.RDBMS_NAME
+    command = "CREATE DATABASE %s COLLATE utf8mb4_unicode_ci"%config.RDBMS_NAME
     if sys.version_info[0] > 2:
-        command += ' CHARACTER SET utf8'
+        command += ' CHARACTER SET utf8mb4'
     logging.info(command)
     cursor.execute(command)
     conn.commit()
@@ -625,7 +625,7 @@ class Database(rdbms_common.Database):
                 raise
 
 class MysqlClass:
-    case_sensitive_equal = 'COLLATE utf8_bin ='
+    case_sensitive_equal = 'COLLATE utf8mb4_bin ='
 
     # TODO: AFAIK its version dependent for MySQL
     supports_subselects = False

This will be deferred to 2.4.0 because 2.3.0 is already in beta and changing the
DB level at this point would delay 2.3.0. I will have more time to look at this in
August and hopefully more people will be able to test it before the 2.4.0 release.

Check the email thread for other issues with that patch and the possible requirement
to have a charset binary collation setting as it may not be possible to derive a suitable
collation from the charset.
msg7793 Author: [hidden] (ngaba) Date: 2023-06-30 09:13
The initial patch is just a worksforme proof (with my MariaDB, etc.).

I think it is not a good idea to hardwire these "utf8mb4" settings into back_mysql.py, because all users upgrading from an older Roundup version (with utf8mb3 MySQL database) will face with the cryptic exception 

OperationalError: (1253, "COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8mb3'")

if they want to use exact_match_spec; moreover, their mysql database will contain both utf8mb3 and utf8mb4 tables after a while (utf8mb4 tables will be created when user adds new classes to config.ini).

I think these utf8mb4 collations etc. should be read from config.ini.
The simplest way is to add config.ini options like
mysql_bin_collation = utf8mb4_bin
etc., or compute implicitly these from the value of mysql_charset. (The problem is that now the default mysql_charset is utf8mb4, so even this latter implementation is not backward compatible. So I would choose the new config.ini settings, that is more failsafe and user configurable.)

In addition, in Roundup 2.4.0 documentation, mysql users should be advised to convert their utf8mb3 tables to utf8mb4 and modify config.ini accordingly. (In my opinion, mysql should provide a clean documentation and a script or mysql command for doing the utfbm3->utf8mb4 conversion...)
msg7794 Author: [hidden] (ngaba) Date: 2023-06-30 09:27
Edit:
"utf8mb4 tables will be created when user adds new classes to config.ini"
I meant schema.py
msg7795 Author: [hidden] (rouilj) Date: 2023-07-01 05:50
Hi Gabor:

In message <1688116426.59.0.762927186513.issue2551282@roundup.psfhosted.org>,
Nagy Gabor writes:
>I think these utf8mb4 collations etc. should be read from config.ini.
>The simplest way is to add config.ini options like
>mysql_bin_collation = utf8mb4_bin
>etc., or compute implicitly these from the value of mysql_charset.

Agreed. I mentioned checking the email thread for specific collation
settings as well as the existing charset setting.

>In addition, in Roundup 2.4.0 documentation, mysql users should be
>advised to convert their utf8mb3 tables to utf8mb4 and modify
>config.ini accordingly. (In my opinion, mysql should provide a clean
>documentation and a script or mysql command for doing the
>utfbm3->utf8mb4 conversion...)

Agreed. This could be:

 1 an automatic migration similar to other schema upgrades run using
   the roundup-admin migrate command

 2 a separate roundup-admin comand to run SQL without having to
   install/resort to using the mysql(1) command.

I am leaning toward the second option because this isn't a schema
migration as much as an underlying change in the encoding of the same
schema. But I haven't given it a lot of thought.
History
Date User Action Args
2023-07-01 05:50:21rouiljsetmessages: + msg7795
2023-06-30 09:27:16ngabasetmessages: + msg7794
2023-06-30 09:13:46ngabasetmessages: + msg7793
2023-06-30 01:23:24rouiljsetresolution: remind
2023-06-30 01:23:15rouiljcreate