Created on 2012-03-18 21:30 by joseph_myers, last changed 2023-06-07 03:24 by rouilj.
|no-timeout-mail-patch||joseph_myers, 2012-03-18 21:30||patch to socket error handling|
|msg4518||Author: [hidden] (joseph_myers)||Date: 2012-03-18 21:30|
Roundup tries to detect certain network errors and not report them to the tracker admin (roundup/cgi/client.py, IGNORE_NET_ERRORS). This does not work reliably for timeouts, which produce the socket.timeout exception rather than socket.error with ETIMEDOUT. The attached patch deals with this by checking for socket.timeout before socket.error.
|msg4519||Author: [hidden] (ber)||Date: 2012-03-19 08:28|
Joseph, thanks for the report and the patch. Next step would be to get another person to confirm both. Do you know an easy way to reproduce the issue for testing? Bernhard
|msg4520||Author: [hidden] (joseph_myers)||Date: 2012-03-19 09:10|
I don't have a way of reproducing timeouts on demand. I generally observed the timeout exceptions being reported for large static images served through Roundup (and empirically observed that the exception emails did not appear for a month of having the patch installed, having been quite frequent before then).
|msg5596||Author: [hidden] (rouilj)||Date: 2016-06-12 00:36|
I tried a few ways to get the timeout to trigger (suspending wget in the middle of an attachment or page download etc.) with no luck. Per: https://docs.python.org/2/library/socket.html ===== exception socket.timeout This exception is raised when a timeout occurs on a socket which has had timeouts enabled via a prior call to settimeout(). The accompanying value is a string whose value is currently always “timed out”. [...] Some notes on socket blocking and timeouts: A socket object can be in one of three modes: blocking, non-blocking, or timeout. Sockets are always created in blocking mode. In blocking mode, operations block until complete or the system returns an error (such as connection timed out). In non-blocking mode, ==== I can't find a call to settimeout anywhere in the roundup code. The fact that blocking mode can return a timeout seems well wrong. However checking out the code in hg as of: 2012-08-27 also shows no settimeout, so .... However settimeout was new in python 2.3. I don't know if something changed in python that would make this patch obsolete.
|msg5597||Author: [hidden] (joseph_myers)||Date: 2016-06-12 00:47|
The timeout is from the socket.setdefaulttimeout(60) call in roundup_server.py.
|msg5599||Author: [hidden] (rouilj)||Date: 2016-06-12 01:43|
Thanks to Joseph's pointer I changed the default timeout to 0.05 and ran demo.py. I also added print statements to all the exception code near the patch. When downloading a 25MB file (loaded prior to changing the timeout) I now see: In handler socket.error timed out raising exception indicating that the timeout exception is re-raised by the existing code. The "timed out" is from printing the exception (print err). I applied patch (with a mod to define err) and I get: In handler socket.timeout timed out Note that I am not seeing any attempt to send email in either case (the tracker is configured with MAIL_DEBUG=log.email.txt). So I am not sure if something else is broken. But if raising the socket.error does generate email, this patch does seem to stop it. Bernhard is this a good enough investigation to let the patch proceed? -- rouilj
|msg5600||Author: [hidden] (ber)||Date: 2016-06-13 08:23|
Hi John, good work on succeeding to get this reproduced at your side. My knowledge on how this networking should work regarding timeouts is limited. If you are saying that ignoring this timeout is fine, please commit. My suggestion is to add a comment with a bit of explanation why it is the wanted behaviour. Thanks, Bernhard
|msg6151||Author: [hidden] (joseph_myers)||Date: 2018-07-31 23:29|
There was some inconclusive discussion of this patch two years ago. Is it OK to push with an appropriate CHANGES.txt entry?
|msg6152||Author: [hidden] (rouilj)||Date: 2018-08-01 00:13|
Hi Joseph: In message <firstname.lastname@example.org>, Joseph Myers writes: >Joseph Myers added the comment: > >There was some inconclusive discussion of this patch two years ago. Is >it OK to push with an appropriate CHANGES.txt entry? The problem as I see it is client had a failure to download something. The client may never be able to download it because it takes longer than the defaultimeout to send it. With your patch, this failure is invisible to the admin. My memory is hazy, but I think timeout was while sending the response to a request. So it was large files, long tickets etc. especially on slow links, low power devices (e.g. cell phones). Ideally when the error is raised the server should restart/continue the send using a larger timeout value up to some maximum. But I am not sure that is possible. IIRC from my socket programming decades ago what is probably happening under the hood is the socket is in non-blocking mode and EAGAIN or EWOULDBLOCK is being returned. On each return, some timer is checked. If the timer is exceeded it raises timeout, if not more data is sent. I think we would need to operate at the EAGAIN or EWOULDBLOCK level to continue the transfer. But my guess is the socket is dead by the time we see the timeout. I think setting a larger timeout (using settimeout()) before the reply is sent to the client is what is needed. So we have different timeouts for the accept, request, response and close parts of the client communication. If we can't do the ideal thing, would making the defaultimeout value settable in config.ini be a better way to handle this. That way the admin can try to avoid the error by increasing the timeout. This has the downside of making a DOS easier. With this the admin will still get a notice if the timeout is exceeded. To go with the settable defaultimeout value, (now that random() works 8-)) only send email occasionally (say 10% of the time) but log every failure to the log file. One thing that would be useful, if it is not already in the code, is to report the url that caused the timeout. I am not sure if it's available though. In any case that's why I have your patch shelved in my development repo and never pushed it. Does that clarify my concerns and possibly better ways to handle this issue?
|msg6153||Author: [hidden] (joseph_myers)||Date: 2018-08-01 00:26|
My underlying expectation here is that the admin should get email when there is a bug in Roundup or the instance code, but not simply for attempts to access the instance by a client on a bad connection, or by a misbehaving client. (There are lots of robots and other clients out there doing things that a proper browser implementation of HTTP, on a good connection, following valid links and making valid form submissions only, would never do, and so there can be unhelpfully many such emails when a robot does the same problematic thing on many pages. I have a few other Roundup backtraces to investigate that may also relate to clients doing dubious things.) I do happen to have an unsubmitted local patch to increase the default timeout from a hardcoded 60 seconds to a hardcoded 300 seconds (essentially an attempt to reduce the frequency of the errors in the present issue from before I wrote the patch attached to this issue, I think).
|msg6154||Author: [hidden] (rouilj)||Date: 2018-08-01 01:23|
Hi Joseph: In message <email@example.com>, Joseph Myers writes: >Joseph Myers added the comment: >My underlying expectation here is that the admin should get email when >there is a bug in Roundup or the instance code, but not simply for >attempts to access the instance by a client on a bad connection, or by a >misbehaving client. (There are lots of robots and other clients out >there doing things that a proper browser implementation of HTTP, on a >good connection, following valid links and making valid form submissions >only, would never do, and so there can be unhelpfully many such emails >when a robot does the same problematic thing on many pages. I have a >few other Roundup backtraces to investigate that may also relate to >clients doing dubious things.) Ah ok, I understand your point. My expectation is obviously different. How about amending your patch with a check to see if the option "IgnoreSocketTimeout" is set. If it is set to true, block the exception from being propigated (as your patch does). If the setting is false, re-assert the exception letting the normal roundup error handler take effect. Also I would make sure that there is a warning level log message emitted if the email is suppressed. That way no email is sent but a trace is left to give the admin a clue when somebody complains. >I do happen to have an unsubmitted local patch to increase the default >timeout from a hardcoded 60 seconds to a hardcoded 300 seconds >(essentially an attempt to reduce the frequency of the errors in the >present issue from before I wrote the patch attached to this issue, I >think). I think a minute timeout should be good in the majority of cases. It would be nice to make that configurable rather than just raising the timeout as that makes it easier to run the server out of resources.
|msg6156||Author: [hidden] (joseph_myers)||Date: 2018-08-01 10:13|
If you have such an option to control timeout handling, it raises more questions. The current code has: # list of network error codes that shouldn't be reported to tracker admin # (error descriptions from FreeBSD intro(2)) IGNORE_NET_ERRORS = ( # A write on a pipe, socket or FIFO for which there is # no process to read the data. errno.EPIPE, # A connection was forcibly closed by a peer. # This normally results from a loss of the connection # on the remote socket due to a timeout or a reboot. errno.ECONNRESET, # Software caused connection abort. A connection abort # was caused internal to your host machine. errno.ECONNABORTED, # A connect or send request failed because the connected party # did not properly respond after a period of time. errno.ETIMEDOUT, ) [...] try: call(*args, **kwargs) except socket.error as err: err_errno = getattr (err, 'errno', None) if err_errno is None: try: err_errno = err except TypeError: pass if err_errno not in self.IGNORE_NET_ERRORS: raise except IOError: # Apache's mod_python will raise IOError -- without an # accompanying errno -- when a write to the client fails. # A common case is that the client has closed the # connection. There's no way to be certain that this is # the situation that has occurred here, but that is the # most likely case. pass So the existing expectation is that all those errors (including a timeout if it somehow ends up as socket.error with ETIMEDOUT) are ignored. Does such an option also result in ETIMEDOUT being handled differently? Does it affect the other forms of socket errors or should those have another option to affect them? In the mod_python case, is it possible to distinguish timeouts anyway or will they just go through the generic IOError case?
|2023-06-07 03:24:04||rouilj||set||resolution: remind|
|2018-09-27 14:00:06||schlatterbeck||set||nosy: + schlatterbeck|
|2018-08-01 10:13:50||joseph_myers||set||messages: + msg6156|
|2018-08-01 01:23:51||rouilj||set||messages: + msg6154|
|2018-08-01 00:26:45||joseph_myers||set||messages: + msg6153|
|2018-08-01 00:13:17||rouilj||set||messages: + msg6152|
|2018-07-31 23:29:51||joseph_myers||set||messages: + msg6151|
|2016-06-19 03:25:39||rouilj||set||assignee: rouilj|
|2016-06-13 08:23:48||ber||set||messages: + msg5600|
|2016-06-12 01:43:11||rouilj||set||messages: + msg5599|
|2016-06-12 00:47:38||joseph_myers||set||messages: + msg5597|
messages: + msg5596
|2012-08-28 03:48:08||jerrykan||set||nosy: + jerrykan|
|2012-03-19 09:10:28||joseph_myers||set||messages: + msg4520|
|2012-03-19 08:28:54||ber||set||priority: normal|
nosy: + ber
messages: + msg4519
keywords: + patch