Roundup Tracker - Issues

Message7134

Author schlatterbeck
Recipients rawler, richard, rouilj, schlatterbeck, stefan
Date 2021-03-23.10:34:13
Message-id <20210323103411.z4l6lop3tqihfcp3@runtux.com>
In-reply-to <1616445437.42.0.697053476829.issue2550514@roundup.psfhosted.org>
On Mon, Mar 22, 2021 at 08:37:17PM +0000, John Rouillard wrote:
> 
> Ping was anything done with these patches? If not is anybody able to 
> update them to the current 2.0 code base and python3ize them with
> tests?

This was implemented as filter_iter a long time ago.

But it isn't used in the html framework. The idea of filter_iter is to
do the same as filter but as an iterator. So in each iteration it will
return a single ID (instead of a full list of IDs). In addition, it
internally fetches all attributes of a node, not just the ID, and
populates the node cache. In this way we will have a single query that
does the same as the proposed getnodes() in the issue: Fetch the whole
row. But we're doing it in an iterator which can conserve a lot of
memory when the query is large (especially since we're not doing a
fetchall of the resulting query data).

So instead of

for id in db.someclass.filter(...):
    n = db.someclass.getnode(id)
    # do something with node n

(which will perform an additional SQL query for each getnode)

using

for id in db.someclass.filter_iter(...):
    n = db.someclass.getnode(id)
    # do something with node n

will perform only a single query at the start. The getnode call will hit
the cache. In addition, it will not slurp the whole query result into
memory like the proposed getnodes would (I had proposed to make getnodes
an iterator in msg3956 in that issue).

Note that the original analysis that this will perform 2*n+1 SQL queries
is wrong, because the first access to a node will put the node in the
cache (which still leaves n + 1 SQL queries :-)
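To make the query counts concrete, here is a minimal sketch that uses
sqlite3 and a toy class instead of Roundup itself. ToyClass, make_db,
count_queries and the issue table are hypothetical names made up for
this illustration; only the names filter, filter_iter and getnode are
borrowed from the discussion, and the bodies are not Roundup's code.

```python
import sqlite3


class ToyClass:
    """Hypothetical stand-in for a Roundup class; NOT Roundup's real code."""

    COLUMNS = ("id", "title", "status")

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # node cache: id -> dict of attributes
        self.queries = 0   # number of SQL statements issued so far

    def _execute(self, sql, args=()):
        self.queries += 1
        return self.conn.execute(sql, args)

    def filter(self):
        # One query that returns the full list of IDs and nothing else.
        cursor = self._execute("SELECT id FROM issue ORDER BY id")
        return [row[0] for row in cursor.fetchall()]

    def filter_iter(self):
        # One query that fetches whole rows. Rows are consumed lazily
        # (no fetchall) and each one is cached before its ID is yielded.
        cursor = self._execute("SELECT id, title, status FROM issue ORDER BY id")
        for row in cursor:
            self.cache[row[0]] = dict(zip(self.COLUMNS, row))
            yield row[0]

    def getnode(self, nodeid):
        # A cache miss costs one extra query; a cache hit costs none.
        if nodeid not in self.cache:
            row = self._execute(
                "SELECT id, title, status FROM issue WHERE id = ?", (nodeid,)
            ).fetchone()
            self.cache[nodeid] = dict(zip(self.COLUMNS, row))
        return self.cache[nodeid]


def make_db(n):
    """Create an in-memory database with n rows in a toy issue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO issue VALUES (?, ?, ?)",
        [(i, "issue %d" % i, "open") for i in range(1, n + 1)],
    )
    return conn


def count_queries(use_iter, n=5):
    """Loop over all nodes and return (number of queries, list of titles)."""
    cls = ToyClass(make_db(n))
    ids = cls.filter_iter() if use_iter else cls.filter()
    titles = [cls.getnode(i)["title"] for i in ids]
    return cls.queries, titles
```

With n = 5 the filter variant issues 6 statements in this toy (n + 1)
and the filter_iter variant issues 1, while both loops see the same
nodes.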

So the only thing that remains to be done is to implement this in
LinkHTMLProperty.menu and probably in more methods of the HTML wrapper
classes. Note that most use-cases of filter fit a pattern where filter
can be replaced with filter_iter.

The only exception is sorting by Multilinks: This cannot be done in SQL
alone and is therefore not implemented in filter_iter. Sorting by
Multilinks is something that should be deprecated anyway.

And: If the code in the loop uses Multilinks, these *are* fetched in
additional SQL queries (Multilinks are fetched lazily).
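The effect of lazily fetched Multilinks on the query count can be
sketched in the same toy style. Everything here (LazyNode, iter_nodes,
run, the issue and issue_nosy tables and their columns) is a
hypothetical illustration under the assumption that a Multilink lives
in a separate table; it is not Roundup's schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE issue_nosy (issue_id INTEGER, user_id INTEGER);
    INSERT INTO issue VALUES (1, 'first'), (2, 'second');
    INSERT INTO issue_nosy VALUES (1, 10), (1, 11), (2, 10);
""")

query_log = []  # every SQL statement issued through run()


def run(sql, args=()):
    query_log.append(sql)
    return conn.execute(sql, args)


class LazyNode:
    """Toy node: plain attributes come with the row, the Multilink does not."""

    def __init__(self, nodeid, title):
        self.id = nodeid
        self.title = title
        self._nosy = None  # not fetched yet

    @property
    def nosy(self):
        # First access triggers one extra query for this node only.
        if self._nosy is None:
            rows = run(
                "SELECT user_id FROM issue_nosy WHERE issue_id = ? ORDER BY user_id",
                (self.id,),
            )
            self._nosy = [r[0] for r in rows]
        return self._nosy


def iter_nodes():
    # A single query delivers all plain attributes of every node.
    for nodeid, title in run("SELECT id, title FROM issue ORDER BY id"):
        yield LazyNode(nodeid, title)


# Loop body that touches only plain attributes: one query in total.
titles = [node.title for node in iter_nodes()]
queries_without_multilink = len(query_log)

# Loop body that touches the Multilink: one extra query per node.
query_log.clear()
nosy_lists = [node.nosy for node in iter_nodes()]
queries_with_multilink = len(query_log)
```

In this toy the second loop issues one statement for the rows plus one
per node for the Multilink, so the single-query benefit only holds as
long as the loop body stays away from Multilinks.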

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
History
Date                 User           Action  Args
2021-03-23 10:34:14  schlatterbeck  set     recipients: + schlatterbeck, richard, stefan, rouilj, rawler
2021-03-23 10:34:14  schlatterbeck  link    issue2550514 messages
2021-03-23 10:34:13  schlatterbeck  create