Roundup Tracker - Issues


Author rouilj
Recipients rouilj, schlatterbeck
Date 2019-05-25.04:05:18
Message-id <>
I have done some work on this but it still has some caveats.

I am going to commit this, disabled by default, to give it wider
circulation and let others evaluate it.


 backends/: The open method will try to open the
    session database 15 times with a .01 second delay between trials.
    If it runs out of trials, it gives up and returns an error to the
    client. This mitigates the exception:
       _gdbm.error: [Errno 11] Resource temporarily unavailable
    that was raised under high load.

    Added a getRateLimit method to RestfulInstance which returns a
    RateLimit object or None. It uses two config parameters; if
    api_calls_per_interval is set to 0, rate limiting is disabled.
    getRateLimit was added as a method to allow a tracker admin to
    override the simple same-rate-limit-for-every-user behavior with
    a replacement that can retrieve rate limits on a per-user basis
    from the user object.
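A rough sketch of the open retry described above; open_session_db is a
hypothetical stand-in for the real backend open call (note that
_gdbm.error is a subclass of OSError in Python 3):

```python
import time

def open_with_retry(open_session_db, tries=15, delay=0.01):
    """Try to open the session database, retrying on contention.

    Sketch only: open_session_db stands in for the real backend open
    call, which can raise _gdbm.error ([Errno 11] Resource temporarily
    unavailable) under load; _gdbm.error subclasses OSError.
    """
    last_exc = None
    for _ in range(tries):
        try:
            return open_session_db()
        except OSError as exc:
            last_exc = exc
            time.sleep(delay)  # back off briefly before retrying
    # Out of trials: give up and let the caller report the error
    # to the client.
    raise last_exc
```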

    Added rate limiting code to the top of the dispatch method.
    Data that must persist for the user of the rest interface is
    saved to the session dbm-backed database.

    Added the new config vars and their explanations. The default
    for api_calls_per_interval in the web section is 0, which
    disables rate limiting.
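The dispatch-time check might look roughly like the following
fixed-window sketch. This is an assumption-laden illustration: the
actual limiter algorithm, the RateLimit object's API, and the
session-db schema may differ, and session_db here is just a dict
standing in for the dbm-backed session store:

```python
import time

def check_rate_limit(session_db, user, calls_per_interval, interval_sec,
                     now=None):
    """Return True if the call is allowed, False if it should get a 429.

    Sketch of a fixed-window limiter persisting per-user state
    (window start, call count) in the session store.
    """
    if calls_per_interval == 0:       # 0 disables rate limiting
        return True
    now = time.time() if now is None else now
    start, count = session_db.get(user, (now, 0))
    if now - start >= interval_sec:   # window expired: start a new one
        start, count = now, 0
    if count >= calls_per_interval:
        return False                  # over the limit -> 429
    session_db[user] = (start, count + 1)
    return True
```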

 doc/rest.txt: describes the settings and how they affect rate
    limiting burst and recovery rates. It also discusses issues with
    the current implementation.

I have run 50 parallel curl operations against a roundup-server
instance. Over 5 trials I got a db error in 12 cases, so the dbm
retry code cut the failure rate down from the roughly 50% seen
before.

However, for the rate limit I was testing with, I should have gotten
more 429 (rate limit exceeded) errors. It looks like I lost about
5-10% of the updates to the database, so it took more than the
expected number of successful connections before the rate limits
kicked in. I am not quite sure why this was happening. A locking
failure on the session db could explain it, but I can't prove it.

Logging did report that I was retrieving the same timestamp from the
session database for multiple connections, but that could be some
other issue (a failure to invalidate a cache) in the code as well.
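The kind of lost update suspected here is the classic
read-modify-write race: two connections read the same stored value
before either writes back, so one update vanishes. A deterministic toy
illustration (a plain dict standing in for the session dbm; the
interleaving is simulated sequentially, not with real threads):

```python
def interleaved_updates(db, key):
    """Simulate two connections racing on a read-modify-write.

    Both read the counter before either writes back, so one of the
    two increments is lost -- the final value is 1, not 2.
    """
    a = db.get(key, 0)   # connection A reads
    b = db.get(key, 0)   # connection B reads the same (stale) value
    db[key] = a + 1      # A writes back
    db[key] = b + 1      # B overwrites A's update
    return db[key]
```

Under this failure mode the stored call count lags the true number of
requests, which would explain rate limits kicking in later than
expected.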

Even in this state, I think it will limit bad actors, although not
with the precision I had hoped.
Date                 User    Action  Args
2019-05-25 04:05:18  rouilj  set     messageid: <>
2019-05-25 04:05:18  rouilj  set     recipients: + rouilj, schlatterbeck
2019-05-25 04:05:18  rouilj  link    issue2551036 messages
2019-05-25 04:05:18  rouilj  create