Roundup Tracker - Issues

Issue 2551288

Missing cache headers for REST collection and special endpoints
Type: behavior Severity: normal
Components: API Versions:
Status: new
: : rouilj
Priority: : rest

Created on 2023-08-02 21:22 by rouilj, last changed 2023-08-02 21:22 by rouilj.

msg7821 Author: [hidden] (rouilj) Date: 2023-08-02 21:22
A GET operation on the REST API endpoints /rest/data/<class>/<id> and
/rest/data/<class>/<id>/<property> returns an ETag header for the object
referenced by the <id>.

However /rest/data/<class>, /rest/summary and /rest/data do not have any cache headers
at all. This makes them unfriendly to caching servers as well as local browser caches.

For the /rest/data endpoint, only a schema change will invalidate cached data.

Supplementary classes (status, user, keyword) are rarely changed
compared to the primary classes (issue, msg, file, query). So supplementary
classes would benefit from a longer cache time.

It makes sense that /rest/summary may not have a cache time as it reports the status of
latest issues. But even here a 5 minute window or something based on the time since the 
last change of any issue would be reasonable to prevent hits on the database.
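
One way to read "a 5 minute window or something based on the time since the last change" is sketched below. It is only an illustration: the function name, the cap constant and the rule itself (cache for as long as the tracker has already been quiet, but never longer than 5 minutes) are assumptions, not existing Roundup code.

```python
import time

# The 5 minute window mentioned above, in seconds (assumed cap).
SUMMARY_MAX_AGE_CAP = 300


def summary_cache_control(last_change_ts, now=None, cap=SUMMARY_MAX_AGE_CAP):
    """Return a Cache-Control value for a /rest/summary response.

    Assumed heuristic: the cache lifetime equals the number of seconds
    since the last change to any issue, clamped to the range 0..cap.
    A tracker that changed a few seconds ago gets a short lifetime;
    a quiet tracker gets the full window.
    """
    if now is None:
        now = time.time()
    quiet_for = max(0, int(now - last_change_ts))
    return "max-age=%d" % min(cap, quiet_for)
```

With this rule a burst of activity shortens the cache lifetime automatically, while an idle tracker is queried at most once every 5 minutes per cache.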

Because PUT/POST is not allowed to these endpoints, a long cache time will
not result in a lost update problem. However for primary classes it could
result in an incomplete picture of the available data.

Would adding caching directives for these endpoints be useful for reducing database load?
I don't know how often they are queried but I would expect an index page for issues would be 
queried often.

Will caching these cause more issues with cache invalidation? If so, should we use a
must-revalidate or a no-cache/no-store directive on these endpoints?
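
To make the options concrete, here is a small sketch of the header values the three choices would produce. The policy names and the per-endpoint table are hypothetical and the lifetimes are placeholders; only the directive strings themselves are standard Cache-Control syntax.

```python
def cache_control(policy, max_age=0):
    """Build a Cache-Control header value for one of three policies.

    "cacheable":         reuse for max_age seconds, then the cache must
                         revalidate before using the stale copy.
    "revalidate-always": the response may be stored, but must be
                         revalidated with the server before every reuse.
    "never-store":       caches must not keep a copy at all.
    """
    if policy == "cacheable":
        return "max-age=%d, must-revalidate" % max_age
    if policy == "revalidate-always":
        return "no-cache"
    if policy == "never-store":
        return "no-store"
    raise ValueError("unknown cache policy: %r" % (policy,))


# Hypothetical per-endpoint choices, for illustration only.
ENDPOINT_POLICY = {
    "/rest/data": ("cacheable", 86400),
    "/rest/data/status": ("cacheable", 3600),
    "/rest/data/issue": ("revalidate-always", 0),
    "/rest/summary": ("cacheable", 300),
}

for endpoint, (policy, max_age) in sorted(ENDPOINT_POLICY.items()):
    print(endpoint, "->", cache_control(policy, max_age))
```

Note that revalidation only saves work if the response carries a validator such as an ETag or Last-Modified header. Without one, a cache cannot issue a conditional request and has to fetch the full response again.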

Assuming the data can be cached, how should a max-age time be specified or determined
per collection/special endpoint? Because Roundup isn't always a long-lived
process, we need to store dynamic cache info somewhere.

Would (ab)using the session database to store cache time data work? For example:

  API-CACHE-/rest/msg = [302400, 604800, 604800, 604800, 604800, 604800, 1691008630]

    [ current max-age,
      interval (in sec) for 5th last change,
                            4th last change,
                            ...
                            last change,
      timestamp in sec of last change ]

where the middle N (N<=5) numbers are intervals in seconds during which there was no change
in the underlying class. In this case, messages were added exactly one week apart.
The last number is the timestamp in seconds (UTC timezone) of the last change.
The current max-age is calculated from this list using, for example, one of the
following (1/2 is an arbitrary magic number):

  * 1/2 the smallest value
  * 1/2 the median value
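
The two candidate rules, and the update that stores the result in the entry, can be sketched as follows. The function names and the HISTORY constant are made up for illustration; the entry layout is the one shown above.

```python
from statistics import median

HISTORY = 5  # keep at most this many intervals, as in the layout above


def compute_max_age(intervals, method="median"):
    """Return half the smallest or half the median interval, in seconds."""
    if not intervals:
        return 0  # no history yet: do not cache (an assumption)
    base = min(intervals) if method == "min" else median(intervals)
    return int(base // 2)


def record_change(entry, now, method="median"):
    """Return a new entry after a change to the class at time `now`.

    entry is [max_age, interval, ..., interval, last_change_timestamp].
    The max-age is recomputed here, at write time, so that reads only
    have to look up entry[0].
    """
    intervals = entry[1:-1]
    last_ts = entry[-1]
    # Append the newest quiet interval and drop the oldest if needed.
    intervals = (intervals + [now - last_ts])[-HISTORY:]
    return [compute_max_age(intervals, method)] + intervals + [now]


# The example entry from above: changes exactly one week apart.
entry = [302400, 604800, 604800, 604800, 604800, 604800, 1691008630]
print(compute_max_age(entry[1:-1], "min"))
print(compute_max_age(entry[1:-1], "median"))

# A change one hour after the last one.
print(record_change(entry, entry[-1] + 3600))
```

For the example entry both rules give 302400, which matches the stored max-age. After a change arriving one hour later, the median rule still gives 302400, while the minimum rule would drop to 1800, so the minimum reacts much faster to a single burst of activity.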

Because of the effect of large outliers on average values, I don't think 1/2 the average
is a good metric. I included the max-age in the entry so that it is calculated when the
update is done, not while the client is waiting. I assume this will be read more than it
is written.

For /rest/data, looking at the schema file might work, but if the schema is built
from imported files, this will fail to capture changes to the imported files that
would change the schema. Also, checking the file date on every request will be expensive.
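
A rough sketch of a throttled file-date check follows. The class name, the list of paths and the 60 second interval are all hypothetical. Passing every file the schema imports is one possible way around the imported-files problem, but it assumes that list is known. The in-memory throttle also only helps in a long-lived process; otherwise the last check time would have to be stored somewhere persistent, like the cache entries above.

```python
import os
import time


class SchemaStamp:
    """Track the newest modification time of a set of schema files.

    The files are only stat()ed when at least min_interval seconds have
    passed since the previous check, so most requests pay no file
    system cost.
    """

    def __init__(self, paths, min_interval=60):
        self.paths = list(paths)
        self.min_interval = min_interval
        self._checked_at = None
        self._stamp = None

    def stamp(self, now=None):
        """Return the newest mtime seen, re-checking only when due."""
        if now is None:
            now = time.time()
        due = (self._checked_at is None
               or now - self._checked_at >= self.min_interval)
        if due:
            self._stamp = max(os.stat(p).st_mtime for p in self.paths)
            self._checked_at = now
        return self._stamp
```

The returned value could serve as the basis for an ETag or Last-Modified header on /rest/data. The cost is that a schema change may go unnoticed for up to min_interval seconds.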

For those using the rest interface, do you have suggestions on how important this is?
How does your client code cache this info?
Date                 User    Action  Args
2023-08-02 21:22:46  rouilj  create