Message7821
A GET operation on the REST API endpoints /rest/data/<class>/<id> and
/rest/data/<class>/<id>/<property> returns an ETag header for the object
referenced by <id>.
However, /rest/data/<class>, /rest/summary and /rest/data do not return any cache headers
at all. This makes them unfriendly to caching servers as well as to local browser caches.
For the /rest/data endpoint, only a schema change will invalidate cached
data.
Supplementary classes (status, user, keyword) change rarely compared to
the primary classes (issue, msg, file, query), so supplementary
classes would benefit from a longer cache time.
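A minimal sketch of per-class cache times for GET /rest/data/<class>, written in Python since Roundup is. The class groupings come from the paragraph above; the max-age values and the function name cache_control_for_class are hypothetical, and hooking this into Roundup's REST code is not shown.

```python
# Hypothetical max-age values in seconds; illustrative assumptions,
# not Roundup defaults.
SUPPLEMENTARY_MAX_AGE = 24 * 60 * 60  # status, user, keyword: one day
PRIMARY_MAX_AGE = 60                  # issue, msg, file, query: one minute

SUPPLEMENTARY_CLASSES = {"status", "user", "keyword"}
PRIMARY_CLASSES = {"issue", "msg", "file", "query"}


def cache_control_for_class(classname: str) -> str:
    """Return a Cache-Control value for GET /rest/data/<classname>."""
    if classname in SUPPLEMENTARY_CLASSES:
        return f"max-age={SUPPLEMENTARY_MAX_AGE}"
    if classname in PRIMARY_CLASSES:
        return f"max-age={PRIMARY_MAX_AGE}"
    # Classes not listed above: caches must revalidate before every reuse.
    return "no-cache"


print(cache_control_for_class("status"))  # max-age=86400
print(cache_control_for_class("issue"))   # max-age=60
```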
It makes sense that /rest/summary might not have a cache time, as it reports the status of
the latest issues. But even here a 5-minute window, or a time based on how long it has been since the
last change to any issue, would be reasonable to reduce hits on the database.
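One hypothetical way to combine both ideas for /rest/summary: cache for half the time since the last change to any issue, capped at the 5-minute window. The factor of 1/2, the cap and the function name are assumptions; looking up the last issue change time is not shown.

```python
import time
from typing import Optional

SUMMARY_MAX_AGE_CAP = 300  # the 5-minute window, in seconds


def summary_max_age(last_issue_change: float, now: Optional[float] = None) -> int:
    """Hypothetical max-age for /rest/summary.

    Half the time since the last change to any issue, capped at five
    minutes: a busy tracker gets a short cache time, a quiet one the
    full window.
    """
    if now is None:
        now = time.time()
    idle = max(0.0, now - last_issue_change)  # clock skew never goes negative
    return int(min(SUMMARY_MAX_AGE_CAP, idle / 2))


print(summary_max_age(900, now=1000))   # 50
print(summary_max_age(0, now=10000))    # 300
```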
Because PUT/POST is not allowed on these endpoints, a long cache time will
not result in a lost update problem. However, for primary classes it could
result in an incomplete picture of the available data.
Would adding caching directives for these endpoints be useful for reducing database load?
I don't know how often they are queried, but I would expect an index page for issues to be
queried often.
Will caching these cause more issues with cache invalidation? If so, should we use a
must-revalidate or a no-cache/no-store directive on these endpoints?
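To illustrate the options in the question above, a hedged sketch of how a server could answer a conditional GET using an ETag. With no-cache, the client revalidates on every use and gets a small 304 when nothing changed; with max-age=N, must-revalidate, it skips the server for N seconds and then revalidates; no-store disables caching entirely. Whether this reduces database load depends on how cheaply the ETag can be computed. The function conditional_get is hypothetical, not part of Roundup.

```python
from typing import Dict, Optional, Tuple


def _opaque(tag: str) -> str:
    """Strip whitespace and any W/ prefix (If-None-Match uses weak comparison)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(etag: str, if_none_match: Optional[str],
                    cache_control: str = "no-cache") -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET of a collection endpoint.

    cache_control examples:
      "no-cache"                      store, but revalidate before every reuse
      "max-age=300, must-revalidate"  reuse for 300 s, then revalidate
      "no-store"                      never store the response
    """
    headers = {"ETag": etag, "Cache-Control": cache_control}
    if if_none_match:
        # Simplification: split on commas (an ETag may itself contain one).
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            # 304 Not Modified: the client reuses its stored copy.
            return 304, headers
    return 200, headers


print(conditional_get('"v42"', 'W/"v41", "v42"')[0])  # 304
print(conditional_get('"v42"', None)[0])              # 200
```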
Assuming the data can be cached, how do we specify/determine a max-age time per
collection/special endpoint? Because Roundup isn't always a long-lived
process, we need to store dynamic cache info somewhere.
Would it work to (ab)use the session database to store cache time data like:
API-CACHE-/rest/msg = [302400, 604800, 604800, 604800, 604800, 604800, 1691008630]
[ current max-age,
interval (in sec) for 5th last change,
4th last change,
...,
last change,
timestamp in sec of last change ]
where the middle N (N <= 5) numbers are the intervals in seconds between successive changes to the
underlying class (i.e. periods with no change). In this example, messages were added exactly one week apart.
The last number is the timestamp in seconds (UTC) of the last change.
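To make the layout concrete, a small sketch that parses the example entry into named fields and back. ApiCacheRecord is a hypothetical helper; how the list is actually stored in the session database is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ApiCacheRecord:
    """Hypothetical view of one API-CACHE-<path> entry."""
    max_age: int          # current max-age in seconds
    intervals: List[int]  # 0-5 gaps (seconds) between changes, oldest first
    last_change: int      # UTC timestamp in seconds of the last change

    @classmethod
    def from_list(cls, values: List[int]) -> "ApiCacheRecord":
        if not 2 <= len(values) <= 7:
            raise ValueError("expected max-age, 0-5 intervals, timestamp")
        return cls(values[0], list(values[1:-1]), values[-1])

    def to_list(self) -> List[int]:
        return [self.max_age, *self.intervals, self.last_change]


rec = ApiCacheRecord.from_list(
    [302400, 604800, 604800, 604800, 604800, 604800, 1691008630])
print(rec.max_age, len(rec.intervals), rec.last_change)
# 302400 5 1691008630
```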
The current max-age is calculated from this list using, for example, one of these (1/2 is an
arbitrary magic number):
* 1/2 the smallest interval
* 1/2 the median interval
Because large outliers skew the average, I don't think 1/2 the average
is a good metric. I store the max-age when the update is done so it's not calculated while
the client is waiting. I assume this will be read more often than it is written.
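A sketch of the write-side update, assuming the list layout above: when the class changes, append the new interval, keep at most five, and recompute max-age as half the median (or half the smallest) interval. The function names are hypothetical, and returning 0 (do not cache) when there is no history yet is an assumption.

```python
import statistics
from typing import List, Optional

MAX_INTERVALS = 5


def compute_max_age(intervals: List[int], use_median: bool = True) -> int:
    """Half the median (or half the smallest) interval between changes."""
    if not intervals:
        return 0  # assumption: no history yet, so do not cache
    base = statistics.median(intervals) if use_median else min(intervals)
    return int(base // 2)


def record_change(record: Optional[List[int]], now: int) -> List[int]:
    """Update [max-age, intervals..., last-change] when the class changes.

    Runs on the write path, so readers only need to look up max-age.
    """
    if record is None:
        intervals: List[int] = []
    else:
        intervals = list(record[1:-1])
        intervals.append(now - record[-1])
        intervals = intervals[-MAX_INTERVALS:]  # keep only the last 5 gaps
    return [compute_max_age(intervals), *intervals, now]


old = [302400, 604800, 604800, 604800, 604800, 604800, 1691008630]
print(record_change(old, 1691008630 + 604800))
# [302400, 604800, 604800, 604800, 604800, 604800, 1691613430]
print(compute_max_age([60, 60, 60, 60, 604800]))  # 30
```

The last line shows why the median is preferred: with intervals of 60, 60, 60, 60 and 604800 seconds the mean is 121008, but the median stays at 60.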
For /rest/data, looking at the schema file's date might work, but if the schema is built
from imported files, this will fail to capture changes to the imported files that
would change the schema. Also, checking the file date on every request will be expensive.
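One alternative, not from the original text: derive the /rest/data validator from the schema as loaded, rather than from file dates, so changes coming from imported files are included and no file has to be checked. This sketch assumes the loaded schema can be described as a mapping of class name to {property: type}; producing that mapping from Roundup is not shown, and in a short-lived process the hash would still be computed per request.

```python
import hashlib
import json
from typing import Dict


def schema_etag(schema: Dict[str, Dict[str, str]]) -> str:
    """Return a strong ETag derived from a schema description.

    sort_keys makes the result independent of dict ordering, so the same
    schema always yields the same tag, and any change to a class or
    property yields a different one (barring hash collisions).
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return '"%s"' % digest[:32]


# Hypothetical schema descriptions, before and after adding a property.
before = {"issue": {"title": "String"}, "status": {"name": "String"}}
after = {"issue": {"title": "String", "priority": "Link"},
         "status": {"name": "String"}}
print(schema_etag(before) != schema_etag(after))  # True
```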
For those using the REST interface, do you have suggestions on how important this is?
How does your client code cache this info?
Date | User | Action | Args
2023-08-02 21:22:46 | rouilj | set | recipients: + rouilj
2023-08-02 21:22:46 | rouilj | set | messageid: <1691011366.79.0.106289169259.issue2551288@roundup.psfhosted.org>
2023-08-02 21:22:46 | rouilj | link | issue2551288 messages
2023-08-02 21:22:46 | rouilj | create |