Created on 2023-08-02 21:22 by rouilj, last changed 2023-08-02 21:22 by rouilj.
|Author: [hidden] (rouilj)
|Date: 2023-08-02 21:22
A GET operation on the REST API endpoints /rest/data/<class>/<id> and /rest/data/<class>/<id>/<property> returns an ETag header for the object referenced by <id>. However, /rest/data/<class>, /rest/summary and /rest/data do not send any cache headers at all. This makes them unfriendly to caching servers as well as local browser caches.

For the /rest/data endpoint, only a schema change will invalidate cached data. The supplementary classes (status, user, keyword) are rarely changed compared to the primary classes (issue, msg, file, query), so supplementary classes would benefit from a longer cache time.

It makes sense that /rest/summary may not have a cache time, as it reports the status of the latest issues. But even here a 5 minute window, or something based on the time since the last change of any issue, would be reasonable to prevent hits on the database.

Because PUT/POST is not allowed to these endpoints, a long cache time will not result in a lost update problem. However, for primary classes it could result in an incomplete picture of the available data.

Would adding caching directives for these endpoints be useful for reducing database load? I don't know how often they are queried, but I would expect that an index page for issues would be queried often. Will caching these cause more issues with cache invalidation? If so, should we use a must-revalidate or a no-cache/no-store directive on these endpoints?

Assuming the data can be cached, how do we specify/determine a max-age time per collection/special endpoint? Because roundup isn't always a long-lived process, we need to store dynamic cache info somewhere.
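To make the directive question concrete, here is a minimal sketch of how a Cache-Control value could be chosen per endpoint. The function name cache_control_for, its parameters, and the default of one day for the schema endpoint are hypothetical and not part of roundup; the 5 minute summary window comes from the text above. Falling back to no-cache when no max-age is known is just one possible policy.

```python
def cache_control_for(path, class_max_age, schema_max_age=86400,
                      summary_max_age=300):
    """Return a Cache-Control value for a REST path, or None.

    class_max_age maps a class name to a max-age in seconds (hypothetical
    storage; where this lives is the open question in this issue).
    """
    parts = [p for p in path.split("/") if p]
    if parts == ["rest", "summary"]:
        max_age = summary_max_age          # 5 minute window
    elif parts == ["rest", "data"]:
        max_age = schema_max_age           # only changes with the schema
    elif len(parts) == 3 and parts[:2] == ["rest", "data"]:
        max_age = class_max_age.get(parts[2], 0)
    else:
        return None                        # item/property endpoints use ETag
    if max_age <= 0:
        return "no-cache"                  # always revalidate with the server
    return f"max-age={max_age}, must-revalidate"


print(cache_control_for("/rest/data/status", {"status": 604800}))
print(cache_control_for("/rest/data/issue", {}))
```

With this policy a rarely changed class such as status gets a long max-age, while a class with no recorded history is always revalidated.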
Would (ab)using the session database to store cache time data work? For example:

  API-CACHE-/rest/msg = [302400, 604800, 604800, 604800, 604800, 604800, 1691008630]

with the layout:

  [ current max-age, interval (in sec) for 5th last change, 4th last change, ..., last change, timestamp in sec of last change ]

where the middle N (N<=5) numbers are intervals in seconds during which there was no change in the underlying class. In this case, one message was added each time, exactly one week apart. The last number is the timestamp in seconds (UTC timezone) of the last change.

The current max-age is calculated from this list using, for example (1/2 is some random magic number):

 * 1/2 the smallest value
 * 1/2 the median value

Because of the effect of large outliers on average values, I don't think 1/2 the average is a good metric. I included the max-age in the stored list so that it is calculated when the update is done, not while the client is waiting. I assume this will be read more often than it is written.

For /rest/data, looking at the schema file might work, but if the schema is built from imported files, this will fail to capture changes to the imported files that would change the schema. Also, checking the file date on every request will be expensive.

For those using the REST interface, do you have suggestions on how important this is? How does your client code cache this info?
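The list update described above can be sketched as follows. This is only an illustration: the names compute_max_age and record_change are hypothetical, the entry is treated as a plain Python list, and how it would be read from or written to the session database is left out.

```python
from statistics import median

MAX_INTERVALS = 5   # keep at most the last 5 change intervals
FACTOR = 0.5        # the "1/2" magic number


def compute_max_age(intervals, use_median=False):
    """Return half the smallest (or median) interval, in seconds."""
    if not intervals:
        return 0
    base = median(intervals) if use_median else min(intervals)
    return int(base * FACTOR)


def record_change(entry, now):
    """Return the updated entry after a change at time `now` (sec, UTC).

    Entry layout: [max_age, interval_1, ..., interval_N, last_change_ts]
    """
    if entry is None:
        return [0, now]                     # first change: no interval yet
    intervals = entry[1:-1] + [now - entry[-1]]
    intervals = intervals[-MAX_INTERVALS:]  # drop the oldest interval
    return [compute_max_age(intervals)] + intervals + [now]


entry = [302400] + [604800] * 5 + [1691008630]
# A new message arrives one day after the last change.
print(record_change(entry, 1691008630 + 86400))
```

Using the smallest interval, a single quick change cuts the max-age sharply (here to half a day). Passing use_median=True to compute_max_age would keep it at 302400 for the same data.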