Roundup Tracker - Issues

Message7084

Author schlatterbeck
Recipients rouilj, schlatterbeck
Date 2021-03-03.11:12:26
Message-id <1614769946.78.0.635031689931.issue2551114@roundup.psfhosted.org>
In-reply-to
I'm experiencing memory problems with a PostgreSQL backend, especially 
when using filter_iter instead of filter.
This is now traced to psycopg by default using client-side cursors: 

https://www.psycopg.org/docs/usage.html#server-side-cursors

These consume the whole query-result at the client resulting in huge 
roundup client processes (in our case WSGI but this can be reproduced 
with the built-in standalone roundup server).

We typically get a memory consumption of several GB with only 15 WSGI processes. In single cases with filter_iter (where a *large* result-set was obtained) we got a single WSGI process of several GB or even an out-of-memory condition. The roundup server in that case already has 32GB of memory.

For some time we have turned off that WSGI processes are restarted every 
30 requests. Since that change our WSGI processes have grown *a lot*. 
This is because these do not return memory once consumed to the 
operating system. So once a single large query is executed the WSGI 
process stays big.

The fix would be simple: psycopg implements server-side cursors as named cursors, so every call to connection.cursor() would need a name argument. Since filter calls are not executed in parallel (even a threading implementation in a web-server which is not recommended for python anyway uses separate connections for different threads) this change can be implemented with a fixed name per filter or filter_iter call.

What do you think: Since this only affects postgres (mysql seems to have non-buffered cursors by default where the result is never fully materialized at the client) this should be straightforward to implement.

Should we make this configurable (for postgres backend only)? If yes, should the server-side cursor be the default?

My opinion is that it should be configurable and that the server-side (new implementation) should be the default.
History
Date User Action Args
2021-03-03 11:12:26schlatterbecksetrecipients: + schlatterbeck, rouilj
2021-03-03 11:12:26schlatterbecksetmessageid: <1614769946.78.0.635031689931.issue2551114@roundup.psfhosted.org>
2021-03-03 11:12:26schlatterbecklinkissue2551114 messages
2021-03-03 11:12:26schlatterbeckcreate