Roundup Tracker - Issues

Issue 2551115

classification
Use utf8mb4 as a default for MySQL instead of utf8
Type: rfe Severity: normal
Components: Documentation, Database Versions: devel
process
Status: pending
:
: rouilj : rouilj, schlatterbeck, whunger
Priority: :

Created on 2021-03-10 14:51 by whunger, last changed 2021-03-23 07:21 by schlatterbeck.

Messages
msg7090 Author: [hidden] (whunger) Date: 2021-03-10 14:51
I just stumbled over the upgrading instructions to 2.x for MySQL users (https://www.roundup-tracker.org/docs/upgrading.html#python-2-mysql-users-must-read), to which i would like to propose an improvement.

I would not advise to set the MySQL database encoding to utf, but to utf8mb4 instead. Utf8 in MySQL is limited to use only up to 3 bytes per character, and thus cannot store all characters that you would expect maybe. Especially Emojis will not work, which is a pity, as these have a high probability of arriving through the Email interface :-)

https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql
msg7094 Author: [hidden] (rouilj) Date: 2021-03-10 15:36
Hi Werner:

In message <1615387874.91.0.944369845492.issue2551115@roundup.psfhosted.org>,
Werner Hunger writes:
>I just stumbled over the upgrading instructions to 2.x for MySQL
>users
>(https://www.roundup-tracker.org/docs/upgrading.html#python-2-mysql-users-must-read),

Have you successfully upgraded? Those docs were written with minimal if
any testing. Nobody on the core team runs mysql. Any feedback on those
docs would be appreciated.

>to which i would like to propose an improvement.
>
>I would not advise to set the MySQL database encoding to utf, but to
>utf8mb4 instead. Utf8 in MySQL is limited to use only up to 3 bytes
>per character, and thus cannot store all characters that you would
>expect maybe. Especially Emojis will not work, which is a pity, as
>these have a high probability of arriving through the Email interface

This makes sense. Nobody on the primary team runs mysql so we weren't
up on the nuances. I have updated the docs and changed the config
variable to use utf8mb4. The diffs are below:

diff -r 6a6b4651be1f doc/upgrading.txt
--- a/doc/upgrading.txt Thu Mar 04 12:55:21 2021 +0100
+++ b/doc/upgrading.txt Wed Mar 10 10:30:03 2021 -0500
@@ -77,10 +77,10 @@
 python3. See `Python 3 support`_.

 In an upgraded ``config.ini`` (see next section) the ``[rdbms]``
-section has a key ``mysql_charset`` set by default to ``utf-8``.
-
-It should be possible to change utf-8 to any mysql charset. So if you
-know what charset is enabled (e.g. via a setting in ~roundup/.my.cnf,
+section has a key ``mysql_charset`` set by default to ``utf8mb4``.
+
+It should be possible to change ``utf8mb4`` to any mysql charset. So
+if you know what charset is enabled (e.g. via a setting in ~roundup/.my.cnf,
 or the default charset for the database) you can set it in
 ``config.ini`` and not need to covert the database. However the
 underlying issues with misconverted data and bad searches will still
diff -r 6a6b4651be1f roundup/configuration.py
--- a/roundup/configuration.py  Thu Mar 04 12:55:21 2021 +0100
+++ b/roundup/configuration.py  Wed Mar 10 10:30:03 2021 -0500
@@ -1048,7 +1048,7 @@
         (NullableOption, 'read_default_group', 'roundup',
             "Name of the group to use in the MySQL defaults file (.my.cnf).\n"
             "Only used in MySQL connections."),
-        (Option, 'mysql_charset', 'utf8',
+        (Option, 'mysql_charset', 'utf8mb4',
             "Charset to use for mysql connection,\n"
             "use 'default' for the mysql default, no charset option\n"
             "is used when creating the connection in that case.\n"

committed in rev bd84f43e1d13.

Does this fix your request?
msg7121 Author: [hidden] (rouilj) Date: 2021-03-18 01:15
Hi Werner, does my change address your issue?

-- rouilj
msg7130 Author: [hidden] (whunger) Date: 2021-03-22 19:38
Sorry for answering so late.

I think your changes are correct as such, but i'm not sure about
back_mysql in total, and i did not do any testing yet. db_create() for
example still contains "CHARACTER SET utf8" for "CREATE DATABASE", and
i'm not sure whether a proper collation definition should also be part
of database creation/configuration.

The problem is, our version of Roundup is a heavily patched 1.4.16 for
which the MySQL-dependent code differs substantially from the upstream
code, so i can't compare their behaviour without some effort. Our
version of the backend has been changed to work around the n+1 problem
for tree-like and other strongly linked data structures that we have
in our tracker, and that is now up with data from 2004 on (about 1.5
Mio messages). The original code (in combination with our page
templates) generates way too much database queries for one web request
- MySQL is not able to handle this load with decent performance.
Therefor we added support for bulk node prefetching with node id lists
anticipated from preceeding query results, among other things.

The reason i was looking into the current docs was to start estimating
whether it would be feasible to migrate and merge our backend changes
to Roundup 2.x - and thus gain Python 3 compatibility. If you'd like
to, i can keep you posted once we decide to upgrade and the merge has
been done.

Best regards - Werner.
--
PGP-Schl├╝ssel mit "Betreff: Send public key" abrufbar.
msg7131 Author: [hidden] (rouilj) Date: 2021-03-22 20:30
Hi Werner:

In message <20210322193839.Horde.X35YQqE4kZiE_wnnKiMRRjO@www.rxd.de>,
Werner Hunger writes:
>I think your changes are correct as such, but i'm not sure about
>back_mysql in total, and i did not do any testing yet. db_create() for
>example still contains "CHARACTER SET utf8" for "CREATE DATABASE", and
>i'm not sure whether a proper collation definition should also be part
>of database creation/configuration.

Understood. Would you recommend using the character set from the
config (mysql_charset) in the CREATE DATABASE?

>The problem is, our version of Roundup is a heavily patched 1.4.16 for
>which the MySQL-dependent code differs substantially from the upstream
>code, so i can't compare their behaviour without some effort. Our
>version of the backend has been changed to work around the n+1 problem
>for tree-like and other strongly linked data structures that we have
>in our tracker, and that is now up with data from 2004 on (about 1.5
>Mio messages). The original code (in combination with our page
>templates) generates way too much database queries for one web request
>- MySQL is not able to handle this load with decent performance.
>Therefor we added support for bulk node prefetching with node id lists
>anticipated from preceeding query results, among other things.

Hmm that sounds interesting. Might some of the ideas in:
https://issues.roundup-tracker.org/issue2550514 help reduce the need
for your changes?

>The reason i was looking into the current docs was to start estimating
>whether it would be feasible to migrate and merge our backend changes
>to Roundup 2.x - and thus gain Python 3 compatibility. If you'd like
>to, i can keep you posted once we decide to upgrade and the merge has
>been done.

I would very much like that. Also please consider joining and keeping
roundup-devel at lists.sourceforge.net in the loop.
msg7133 Author: [hidden] (schlatterbeck) Date: 2021-03-23 07:21
On Mon, Mar 22, 2021 at 07:38:41PM +0000, Werner Hunger wrote:
> Mio messages). The original code (in combination with our page
> templates) generates way too much database queries for one web request
> - MySQL is not able to handle this load with decent performance.
> Therefor we added support for bulk node prefetching with node id lists
> anticipated from preceeding query results, among other things.

There is a filter_iter instead of filter in back_anydbm (that also works
for mysql, at least it's tested :-)

For the typical loop

for id in db.someclass.filter(...):
    n = db.someclass.getnode(id)

This will not produce a database query for each getnode call when using
filter_iter instead of filter because filter_iter fetches the whole row
(not just the id) and pre-populates the cache in roundup. So the getnode
call already fetches from the cache.

This is not (yet?) used in the templates but it may speed up things for
some workloads.

> The reason i was looking into the current docs was to start estimating
> whether it would be feasible to migrate and merge our backend changes
> to Roundup 2.x - and thus gain Python 3 compatibility. If you'd like
> to, i can keep you posted once we decide to upgrade and the merge has
> been done.

Yes, I'd also like to know what you decide.

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
History
Date User Action Args
2021-03-23 07:21:56schlatterbecksetmessages: + msg7133
2021-03-22 20:30:44rouiljsetmessages: + msg7131
2021-03-22 19:38:41whungersetmessages: + msg7130
2021-03-18 01:15:11rouiljsetmessages: + msg7121
2021-03-12 22:42:22rouiljsetstatus: open -> pending
2021-03-11 07:56:19schlatterbecksetnosy: + schlatterbeck
2021-03-10 15:39:53rouiljsetstatus: new -> open
assignee: rouilj
2021-03-10 15:36:59rouiljsetnosy: + rouilj
messages: + msg7094
2021-03-10 14:51:14whungercreate