Roundup Tracker - Issues

Issue 2550750

classification
Title: Handling of socket timeouts
Type: behavior Severity: normal
Components: Web interface Versions: devel
process
Status: new Resolution:
Dependencies: Superseder:
Assigned To: rouilj Nosy List: ber, jerrykan, joseph_myers, rouilj
Priority: normal Keywords: patch

Created on 2012-03-18 21:30 by joseph_myers, last changed 2018-08-01 10:13 by joseph_myers.

Files
File name Uploaded Description Edit Remove
no-timeout-mail-patch joseph_myers, 2012-03-18 21:30 patch to socket error handling
Messages
msg4518 Author: [hidden] (joseph_myers) Date: 2012-03-18 21:30
Roundup tries to detect certain network errors and not report them to
the tracker admin (roundup/cgi/client.py, IGNORE_NET_ERRORS).  This does
not work reliably for timeouts, which produce the socket.timeout
exception rather than socket.error with ETIMEDOUT.

The attached patch deals with this by checking for socket.timeout before
socket.error.
msg4519 Author: [hidden] (ber) Date: 2012-03-19 08:28
Joseph,
thanks for the report and the patch.
Next step would be to get another person to confirm both.

Do you know an easy way to reproduce the issue for testing?
Bernhard
msg4520 Author: [hidden] (joseph_myers) Date: 2012-03-19 09:10
I don't have a way of reproducing timeouts on demand.  I generally 
observed the timeout exceptions being reported for large static images 
served through Roundup (and empirically observed that the exception emails 
did not appear for a month of having the patch installed, having been 
quite frequent before then).
msg5596 Author: [hidden] (rouilj) Date: 2016-06-12 00:36
I tried a few ways to get the timeout to trigger (suspending wget in the
middle of an attachment or page download etc.) with no luck.

Per:

    https://docs.python.org/2/library/socket.html

=====
 exception socket.timeout
    This exception is raised when a timeout occurs on a socket which has
had timeouts enabled via a prior call to settimeout(). The accompanying
value is a string whose value is currently always “timed out”.
[...]
Some notes on socket blocking and timeouts: A socket object can be in
one of three modes: blocking, non-blocking, or timeout. Sockets are
always created in blocking mode. In blocking mode, operations block
until complete or the system returns an error (such as connection timed
out). In non-blocking mode, 
====

I can't find a call to settimeout anywhere in the roundup code. The fact
that blocking mode can return a timeout seems well wrong.

However checking out the code in hg as of: 2012-08-27 also shows no
settimeout, so .... However settimeout was new in python 2.3. I don't
know if something changed in python that would make this patch obsolete.
msg5597 Author: [hidden] (joseph_myers) Date: 2016-06-12 00:47
The timeout is from the socket.setdefaulttimeout(60) call in 
roundup_server.py.
msg5599 Author: [hidden] (rouilj) Date: 2016-06-12 01:43
Thanks to Joseph's pointer I changed the default timeout to 0.05 and
ran demo.py. I also added print statements to all the exception code
near the patch.

When downloading a 25MB file (loaded prior to changing the timeout)

I now see:

  In handler socket.error
  timed out
  raising exception

indicating that the timeout exception is re-raised by the existing
code. The "timed out" is from printing the exception (print err).

I applied patch (with a mod to define err) and I get:

  In handler socket.timeout
  timed out

Note that I am not seeing any attempt to send email in either case
(the tracker is configured with MAIL_DEBUG=log.email.txt). So I am not
sure if something else is broken. But if raising the socket.error does
generate email, this patch does seem to stop it.

Bernhard is this a good enough investigation to let the patch proceed?

-- rouilj
msg5600 Author: [hidden] (ber) Date: 2016-06-13 08:23
Hi John,
good work on succeeding to get this reproduced at your side.

My knowledge on how this networking should work regarding
timeouts is limited. If you are saying that ignoring this timeout
is fine, please commit. My suggestion is to add a comment 
with a bit of explanation why it is the wanted behaviour.

Thanks,
Bernhard
msg6151 Author: [hidden] (joseph_myers) Date: 2018-07-31 23:29
There was some inconclusive discussion of this patch two years ago.  Is
it OK to push with an appropriate CHANGES.txt entry?
msg6152 Author: [hidden] (rouilj) Date: 2018-08-01 00:13
Hi Joseph:

In message <1533079791.45.0.56676864532.issue2550750@psf.upfronthosting.co.za>,
Joseph Myers writes:
>Joseph Myers added the comment:
>
>There was some inconclusive discussion of this patch two years ago.  Is
>it OK to push with an appropriate CHANGES.txt entry?

The problem as I see it is client had a failure to download
something. The client may never be able to download it because it
takes longer than the defaultimeout to send it. With your patch, this
failure is invisible to the admin.

My memory is hazy, but I think timeout was while sending the response
to a request. So it was large files, long tickets etc. especially on
slow links, low power devices (e.g. cell phones). 

Ideally when the error is raised the server should restart/continue
the send using a larger timeout value up to some maximum. But I am
not sure that is possible.

IIRC from my socket programming decades ago what is probably happening
under the hood is the socket is in non-blocking mode and EAGAIN or
EWOULDBLOCK is being returned. On each return, some timer is checked.
If the timer is exceeded it raises timeout, if not more data is sent.

I think we would need to operate at the EAGAIN or EWOULDBLOCK
level to continue the transfer. But my guess is the socket is dead
by the time we see the timeout.

I think setting a larger timeout (using settimeout()) before the reply
is sent to the client is what is needed. So we have different timeouts
for the accept, request, response and close parts of the client
communication.

If we can't do the ideal thing, would making the defaultimeout value
settable in config.ini be a better way to handle this. That way the
admin can try to avoid the error by increasing the timeout. This has
the downside of making a DOS easier. With this the admin will still
get a notice if the timeout is exceeded.

To go with the settable defaultimeout value, (now that random() works
8-)) only send email occasionally (say 10% of the time) but log every
failure to the log file.

One thing that would be useful, if it is not already in the code, is
to report the url that caused the timeout. I am not sure if it's
available though.

In any case that's why I have your patch shelved in my development
repo and never pushed it.

Does that clarify my concerns and possibly better ways to handle this
issue?
msg6153 Author: [hidden] (joseph_myers) Date: 2018-08-01 00:26
My underlying expectation here is that the admin should get email when
there is a bug in Roundup or the instance code, but not simply for
attempts to access the instance by a client on a bad connection, or by a
misbehaving client.  (There are lots of robots and other clients out
there doing things that a proper browser implementation of HTTP, on a
good connection, following valid links and making valid form submissions
only, would never do, and so there can be unhelpfully many such emails
when a robot does the same problematic thing on many pages.  I have a
few other Roundup backtraces to investigate that may also relate to
clients doing dubious things.)

I do happen to have an unsubmitted local patch to increase the default
timeout from a hardcoded 60 seconds to a hardcoded 300 seconds
(essentially an attempt to reduce the frequency of the errors in the
present issue from before I wrote the patch attached to this issue, I
think).
msg6154 Author: [hidden] (rouilj) Date: 2018-08-01 01:23
Hi Joseph:

In message <1533083205.99.0.56676864532.issue2550750@psf.upfronthosting.co.za>,
Joseph Myers writes:
>Joseph Myers added the comment:
>My underlying expectation here is that the admin should get email when
>there is a bug in Roundup or the instance code, but not simply for
>attempts to access the instance by a client on a bad connection, or by a
>misbehaving client.  (There are lots of robots and other clients out
>there doing things that a proper browser implementation of HTTP, on a
>good connection, following valid links and making valid form submissions
>only, would never do, and so there can be unhelpfully many such emails
>when a robot does the same problematic thing on many pages.  I have a
>few other Roundup backtraces to investigate that may also relate to
>clients doing dubious things.)

Ah ok, I understand your point. My expectation is obviously different.

How about amending your patch with a check to see if the option
"IgnoreSocketTimeout" is set. If it is set to true, block the
exception from being propigated (as your patch does). If the setting
is false, re-assert the exception letting the normal roundup error
handler take effect.

Also I would make sure that there is a warning level log message
emitted if the email is suppressed. That way no email is sent but a
trace is left to give the admin a clue when somebody complains.

>I do happen to have an unsubmitted local patch to increase the default
>timeout from a hardcoded 60 seconds to a hardcoded 300 seconds
>(essentially an attempt to reduce the frequency of the errors in the
>present issue from before I wrote the patch attached to this issue, I
>think).

I think a minute timeout should be good in the majority of cases. It
would be nice to make that configurable rather than just raising the
timeout as that makes it easier to run the server out of resources.
msg6156 Author: [hidden] (joseph_myers) Date: 2018-08-01 10:13
If you have such an option to control timeout handling, it raises more
questions.  The current code has:

    # list of network error codes that shouldn't be reported to tracker
admin
    # (error descriptions from FreeBSD intro(2))
    IGNORE_NET_ERRORS = (
        # A write on a pipe, socket or FIFO for which there is
        # no process to read the data.
        errno.EPIPE,
        # A connection was forcibly closed by a peer.
        # This normally results from a loss of the connection
        # on the remote socket due to a timeout or a reboot.
        errno.ECONNRESET,
        # Software caused connection abort.  A connection abort
        # was caused internal to your host machine.
        errno.ECONNABORTED,
        # A connect or send request failed because the connected party
        # did not properly respond after a period of time.
        errno.ETIMEDOUT,
    )

[...]

        try:
            call(*args, **kwargs)
        except socket.error as err:
            err_errno = getattr (err, 'errno', None)
            if err_errno is None:
                try:
                    err_errno = err[0]
                except TypeError:
                    pass
            if err_errno not in self.IGNORE_NET_ERRORS:
                raise
        except IOError:
            # Apache's mod_python will raise IOError -- without an
            # accompanying errno -- when a write to the client fails.
            # A common case is that the client has closed the
            # connection.  There's no way to be certain that this is
            # the situation that has occurred here, but that is the
            # most likely case.
            pass

So the existing expectation is that all those errors (including a
timeout if it somehow ends up as socket.error with ETIMEDOUT) are
ignored.  Does such an option also result in ETIMEDOUT being handled
differently?  Does it affect the other forms of socket errors or should
those have another option to affect them?  In the mod_python case, is it
possible to distinguish timeouts anyway or will they just go through the
generic IOError case?
History
Date User Action Args
2018-08-01 10:13:50joseph_myerssetmessages: + msg6156
2018-08-01 01:23:51rouiljsetmessages: + msg6154
2018-08-01 00:26:45joseph_myerssetmessages: + msg6153
2018-08-01 00:13:17rouiljsetmessages: + msg6152
2018-07-31 23:29:51joseph_myerssetmessages: + msg6151
2016-06-19 03:25:39rouiljsetassignee: rouilj
2016-06-13 08:23:48bersetmessages: + msg5600
2016-06-12 01:43:11rouiljsetmessages: + msg5599
2016-06-12 00:47:38joseph_myerssetmessages: + msg5597
2016-06-12 00:36:42rouiljsetnosy: + rouilj
messages: + msg5596
2012-08-28 03:48:08jerrykansetnosy: + jerrykan
2012-03-19 09:10:28joseph_myerssetmessages: + msg4520
2012-03-19 08:28:54bersetpriority: normal
nosy: + ber
messages: + msg4519
keywords: + patch
2012-03-18 21:30:57joseph_myerscreate