Issue 2551036: Support rate limiting in REST interface - Roundup tracker

classification

Title:	Support rate limiting in REST interface
Type:	resource usage	Severity:	normal
Components:	Web interface, API	Versions:	devel

process

Status:	fixed	Resolution:	fixed
Dependencies		Superseder:	Support rate limiting in XMLRPC interface. View: 2551361
Assigned To:	rouilj	Nosy List:	rouilj, schlatterbeck
Priority:	normal	Keywords:	rest

Created on 2019-03-24 01:54 by rouilj, last changed 2024-07-17 23:42 by rouilj.

Messages
msg6422	Author: [hidden] (rouilj)	Date: 2019-03-24 01:54
At some point we should add a mechanism to limit REST and xmlrpc api use. This has no impact on the API returned from the call. The additional headers will simply be ignored by the client. Suggest implementing the following headers: X-Rate-Limit-Limit: number of allowed requests in the current period (integer) X-Rate-Limit-Remaining: number of remaining requests in the current period (integer) X-Rate-Limit-Reset: number of seconds left in the current period (integer) See: https://developer.twitter.com/en/docs/basics/rate-limiting.html for examples. If more than X-Rate-Limit-Limit requests are done in the period, return HTTP 429 “Too Many Requests” and a json/xml error document like: { "error": { "msg": "Rate limit exceeded. More than X requests in Y seconds.", "status": 429 } } The data elements associated with this function are: period: say 10 minutes request limit: 100 requests current-request count: must be less than request limit period start: now() - this value > period, reset current-request count These should be associated with a user in roundup. Initial thought, period and request limit are values added to the user object/schema. The user should not be able to edit them. current-request count and period start could be associated with the session database (I think it has timeouts on session which would be ideal for this). Or it could be stored in fields in the user object.
msg6472	Author: [hidden] (rouilj)	Date: 2019-05-12 21:20
I have a POC implementation to rate limit the rest interface using the GCRA implementation used for issue2550949. It returns the standard headers: X-RateLimit-Remaining: 55 X-RateLimit-Reset: 3580.955768 X-RateLimit-Limit: 60 plus a non-standard header that seems to be useful: X-RateLimit-Limit-Period: 3600 to provide the period for the X-RateLimit-Limit. If the user has X-RateLimit-Remaining of 0 (i.e. they tripped the rate limiter and their request was discarded with a 429) it also returns a: Retry-After: 60 header which is the number of seconds to wait before an api call will be available to used. Not waiting that long will continue to result in a 429 "Too Many Requests" response. Also in the 429 case, it returns the payload: { "error": { "source": "ApiRateLimiter", "status": 429, "msg": "Api rate limits exceeded. Please wait: 60 seconds." } } There are two new config settings: api_calls_per_interval default 3600 api_interval_in_sec default 3600 (1 hour) The examples above were done with api_calls_per_interval set to 60. The code is written so that a tracker admin can replace the method that generates the rate limiter. Rather rather than using the config parameters, values could be stored in the user object in the database for a modified method to use as the source of the data. This allows a commercial user of roundup to sell different tiers of service differing in rate limits. The POC is not checked in. There are performance issues under high load when storing the data needed to implement the GCRA.
msg6485	Author: [hidden] (rouilj)	Date: 2019-05-25 04:05
I have done some work on this but it still has some caveats. I am going to commit this, disabled by default, to get it wider circulation and others to eval it. Changes: backends/sessions_dbm.py: The open method will try to open the session database 15 times with a .01 second delay between trials. If it runs out of trials, it gives up and returns an error to the client. This mitigates the exception: _gdbm.error: [Errno 11] Resource temporarily unavailable that was raised under high load. rest.py: added getRateLimit method to RestfulInstance which returns a RateLimit object or None. It uses two config parameters: WEB_API_CALLS_PER_INTERVAL and WEB_API_INTERVAL_IN_SEC. If either is set to 0 rate limiting is disabled. It was added as a method to allow a tracker admin to override the simple same rate limit for each user and create a replacement that can retreive rate limits on a per user basis from the user object. Added rate limiting code to the top of dispatch method. Data that must persist for the user using the rest interface is saved to the session dbm backed database. configuraton.py: Gets the new config vars and explanations. Default for api_calls_per_interval in the web section is 0, so this disables rate limiting. doc/rest.txt: describes the settings and how they affect rate limiting burst and recovery rates. Also discusses that there are issues with the current implementation. I have run 50 parallel curl operations against a roundup-server instance. In 5 trials I got a db error in 12 cases. So the dbm retry code cut that from 50% failure rate. However for the rate limit I was testing with, I should have gotten more 429 (rate limit exceeded) errors. It looks like I lost about 5-10% of the updates from the database. Hence it took more than the expected number of successful connections before the rate limits kicked in. I am not quite sure why this was happening. Locking failure on the session db could explain it, but I can't prove it. Logging did report that I was retreiving the same timestamp from the session databases for multiple connections, but that could be some other issue (failure to invalidate a cache) in the code as well. Even in this state, I think it will limit bad actors although not with the precision I had hoped.
msg6486	Author: [hidden] (rouilj)	Date: 2019-05-26 17:43
Committed as: rev5732:0e6ed3d72f92 Fixes to testing for the rest code are in rev5733:62bdcb874433 and rev5734:5cd9ac3daed7. These patches attempt to test the retry code that wraps the opening of an existing database. All of the tests for otk/session db create a new database during the test. The session/otk db class is never reloaded. So the code path that runs is always the one that creates and opens the database and never the one that opens an existing database. I haven't figured out how to cause the open call to fail to trigger the exception handling routines. I could use a pointer from somebody with better python fu than I have.
msg7631	Author: [hidden] (rouilj)	Date: 2022-08-04 22:08
Implemented using redis as session/otks store. msg7630 on issue2551140. summary: failure rates in the 0.6% range or less.
msg7869	Author: [hidden] (rouilj)	Date: 2023-12-11 00:29
note XMLRPC is not implemented yet. REST is complete.
msg8117	Author: [hidden] (rouilj)	Date: 2024-07-17 23:42
I'm changing the scope to REST only on this and opening a new one for XMLRPC (issue 2551361). I'm also closing this one as it's complete.

History
Date	User	Action	Args
2024-07-17 23:42:08	rouilj	set	status: open -> fixed superseder: Support rate limiting in XMLRPC interface. resolution: fixed messages: + msg8117 title: Support rate limiting in REST (and xmlrpc) interfaces -> Support rate limiting in REST interface
2023-12-11 00:29:55	rouilj	set	messages: + msg7869 components: + API
2022-08-04 22:08:39	rouilj	set	messages: + msg7631
2019-05-26 17:43:13	rouilj	set	messages: + msg6486
2019-05-25 04:05:18	rouilj	set	status: new -> open assignee: rouilj messages: + msg6485
2019-05-12 21:20:07	rouilj	set	messages: + msg6472
2019-03-24 01:54:30	rouilj	create