Roundup Tracker - Issues

Message5794

Author rouilj
Recipients rouilj, telsch
Date 2016-07-09.18:26:32
Message-id <1468088793.17.0.568721565788.issue2550743@psf.upfronthosting.co.za>
In-reply-to
Sorry this has taken so long to address.

Do you have large attached files that could be indexed?

Looking through the code (specifically
backends/indexer_rdbms.py::add_text), I can see how you would
get this error if you are indexing attachment contents.

It looks like it tries to insert all the words in a document
with a single call:

        # for each word, add an entry in the db
        sql = 'insert into __words (_word, _textid) values (%s, %s)'%(a, a)
        words = [(word, id) for word in words]
        self.db.cursor.executemany(sql, words)

While that could be split up into single execute calls or
smaller batches of X words, performance would suffer: the
Python MySQL driver turns executemany into one large multi-row
"insert ... values" statement sent to the server, if I
understand:

 https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-executemany.html

correctly.

We may accept a patch to make the number of inserts
configurable using a setting in the [rdbms] section of the
tracker config file. So if you develop one, please feel free
to open an issue and attach the patch. The default should
be an unlimited number of inserts. 
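
For anyone who wants to try this, here is a rough sketch of what batching
the inserts could look like. It is not the Roundup code and not a tested
patch: the names iter_batches, insert_in_batches and batch_size are made up
for illustration, and sqlite3 stands in for MySQL so the example runs on its
own (hence the '?' placeholders instead of '%s'). A batch_size of 0 keeps
today's behaviour of one executemany call for all words.

```python
import sqlite3
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows.

    A falsy batch_size (0 or None) means "unlimited": all rows are
    yielded as one list, matching the current single-call behaviour.
    """
    it = iter(rows)
    if not batch_size:
        batch = list(it)
        if batch:
            yield batch
        return
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(cursor, sql, rows, batch_size=0):
    """Call cursor.executemany once per batch; return the number of calls."""
    calls = 0
    for batch in iter_batches(rows, batch_size):
        cursor.executemany(sql, batch)
        calls += 1
    return calls


def demo(batch_size):
    """Insert ten words for one text id and report (calls, rows stored)."""
    # Hypothetical stand-in schema; the real table is created by Roundup.
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    cursor.execute('create table __words (_word varchar(30), _textid integer)')
    words = ['WORD%d' % i for i in range(10)]
    rows = [(word, 1) for word in words]
    sql = 'insert into __words (_word, _textid) values (?, ?)'
    calls = insert_in_batches(cursor, sql, rows, batch_size)
    cursor.execute('select count(*) from __words')
    stored = cursor.fetchone()[0]
    conn.close()
    return calls, stored
```

With ten words, demo(0) makes a single executemany call and demo(4) makes
three; both store all ten rows. In a real patch the batch size would come
from the [rdbms] setting rather than a function argument, and smaller
batches trade some speed for smaller statements sent to the server.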

If you can't increase the packet size, maybe some other search
engine (in 1.6.0 we support whoosh as well as xapian) would work.

I have added this problem to the documentation in mysql.txt under the
heading "Other Configuration" so others can benefit from your knowledge.

Thanks for the report and sorry this took so long.

-- rouilj
History
Date                 User    Action  Args
2016-07-09 18:26:33  rouilj  set     messageid: <1468088793.17.0.568721565788.issue2550743@psf.upfronthosting.co.za>
2016-07-09 18:26:33  rouilj  set     recipients: + rouilj, telsch
2016-07-09 18:26:33  rouilj  link    issue2550743 messages
2016-07-09 18:26:32  rouilj  create