Roundup Tracker - Issues

Issue 2551190

classification
Title: Allow roundup-admin reindex to work in batches
Type: rfe Severity: normal
Components: Database, Command-line interface Versions: 2.3.0
process
Status: fixed Resolution: fixed
Assigned To: rouilj Nosy List: rouilj
Priority:

Created on 2022-01-28 21:59 by rouilj, last changed 2023-05-24 17:24 by rouilj.

Messages
msg7445 Author: [hidden] (rouilj) Date: 2022-01-28 21:59
Reindexing the database can take a while. It can require taking
the tracker offline.

Provide some way to incrementally reindex classes/items in the database. Consider
allowing the roundup-admin reindex command to take a class name and a range of numbers.

roundup-admin> reindex issues 1-10000
......
roundup-admin> reindex msgs 1-10000
....
roundup-admin> reindex files 1-10000

This will reindex the fields in the first 10000 issues and the content in the first 10000
messages and files. These can be run overnight or during a long weekend. By limiting
the size, the reindexing can be scheduled in windows rather than started and
left to run for as long as it takes.
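
The range argument proposed above could be parsed and split into scheduling windows roughly as follows. This is a minimal sketch, not Roundup's code: `parse_range` and `windows` are hypothetical helper names, and the inclusive `start-end` syntax is assumed from the example commands.

```python
def parse_range(spec):
    """Parse an inclusive 'start-end' id range such as '1-10000'."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError("expected 'start-end', got %r" % spec)
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError("invalid range %r" % spec)
    return start, end


def windows(start, end, size):
    """Yield inclusive (first, last) sub-ranges covering at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    first = start
    while first <= end:
        last = min(first + size - 1, end)
        yield first, last
        first = last + 1


if __name__ == "__main__":
    # Split ids 1-25000 into windows of 10000 ids, one per maintenance slot.
    for first, last in windows(*parse_range("1-25000"), size=10000):
        print("reindex issues %d-%d" % (first, last))
```

Each printed line is one command in the proposed syntax that could be run in its own maintenance window.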

Concerns:

Should there be a way to eliminate classes from reindex?

  "reindex" reindexes everything, but if I need to do incremental reindexing,
  I would never use a bare reindex (as it would also index the classes I want to index
  incrementally). So I need to scour the schema, identify every class in the full
  text index, and reindex each one using "reindex class" or "reindex class 1-<index
  of last element in class>". Alternatively, can reindex accept a list of classes to skip:

     reindex -files -msgs -issues

  to reindex all classes except these three? These three would then be indexed incrementally.
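
The skip-list idea amounts to a set difference between the schema's classes and the `-name` arguments. A minimal sketch, assuming the leading `-` syntax shown above; `classes_to_reindex` is a hypothetical helper, and the class list in the usage example is illustrative rather than taken from a real schema.

```python
def classes_to_reindex(all_classes, args):
    """Return the classes left after removing every '-name' argument.

    Arguments that do not start with '-' are ignored in this sketch.
    """
    skipped = {arg[1:] for arg in args if arg.startswith("-")}
    unknown = skipped - set(all_classes)
    if unknown:
        # Refuse to silently ignore a mistyped class name.
        raise ValueError("unknown class(es): %s" % ", ".join(sorted(unknown)))
    # Preserve the schema's ordering for the remaining classes.
    return [name for name in all_classes if name not in skipped]


if __name__ == "__main__":
    schema = ["files", "issues", "keywords", "msgs", "users"]  # illustrative only
    print(classes_to_reindex(schema, ["-files", "-msgs", "-issues"]))
```

Rejecting unknown names is a design choice in this sketch: a typo in a skipped class would otherwise trigger a full reindex of the class the operator meant to exclude.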

Should it be possible to index using a different indexer from the one in config.ini?

   One use case for reindexing is to change the indexer. It would be good to keep
   the current indexing active while building/loading the new indexer.

   Setting the indexing backend (and options like stoplist, indexer_language) could be
   another use for an options/pragma setting in roundup-admin; see issue2551103.
msg7452 Author: [hidden] (rouilj) Date: 2022-02-08 04:03
Also, reindexing in general may be better optimized in some backends.

https://whoosh.readthedocs.io/en/latest/batch.html

https://xapian.org/docs/bindings/python/xapian.html#xapian.WritableDatabase.commit

See these for tuning commits, which should probably be tuned for a reindex.
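
The general idea of tuning commits for a bulk reindex is to commit once per batch instead of once per item. The sketch below is backend-neutral and does not use the Whoosh or Xapian APIs: `CountingIndexer`, its `add` and `commit` methods, and `reindex_batched` are hypothetical names that only illustrate the batching pattern.

```python
class CountingIndexer:
    """Stand-in for a real backend; records commits instead of indexing."""

    def __init__(self):
        self.uncommitted = 0
        self.commits = []  # number of documents flushed by each commit

    def add(self, doc_id, text):
        self.uncommitted += 1

    def commit(self):
        self.commits.append(self.uncommitted)
        self.uncommitted = 0


def reindex_batched(indexer, documents, commit_every=1000):
    """Add each (doc_id, text) pair, committing once per `commit_every` adds."""
    if commit_every < 1:
        raise ValueError("commit_every must be at least 1")
    pending = 0
    for doc_id, text in documents:
        indexer.add(doc_id, text)
        pending += 1
        if pending >= commit_every:
            indexer.commit()
            pending = 0
    if pending:
        indexer.commit()  # flush the final partial batch


if __name__ == "__main__":
    indexer = CountingIndexer()
    docs = ((str(n), "text %d" % n) for n in range(2500))
    reindex_batched(indexer, docs, commit_every=1000)
    print(indexer.commits)
```

A larger `commit_every` means fewer commits but more work lost if the run is interrupted, so the right value depends on the backend and the maintenance window.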
msg7757 Author: [hidden] (rouilj) Date: 2023-04-13 01:15
changeset:   7252:9c067ed4568b
added pragma command to roundup-admin with basic settings.
Fixing this issue using pragma (formerly options) is TBD.
msg7770 Author: [hidden] (rouilj) Date: 2023-05-24 17:24
Implemented in:

changeset:   7395:312d52305583

with a change in:

changeset:   7393:9e612a39547a

to make missing items in the indexing range get reported when using anydbm.
History
Date                 User    Action  Args
2023-05-24 17:24:42  rouilj  set     status: new -> fixed
                                     assignee: rouilj
                                     resolution: remind -> fixed
                                     messages: + msg7770
                                     versions: + 2.3.0
2023-04-13 01:15:03  rouilj  set     messages: + msg7757
2022-09-12 22:32:08  rouilj  set     resolution: remind
2022-07-09 22:25:00  rouilj  set     title: Allow reindex to work in batches -> Allow roundup-admin reindex to work in batches
2022-02-08 04:03:35  rouilj  set     messages: + msg7452
2022-01-28 21:59:35  rouilj  create