Roundup Tracker - Issues

Issue 2550651

classification
Title: HEAD requests are very slow. (fix to take 1/4 of original time in comments)
Type: resource usage
Severity: major
Components: Web interface
Versions: 1.4

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: rouilj
Nosy List: ber, richard, rouilj, tonimueller
Priority:
Keywords:

Created on 2010-05-10 10:22 by tonimueller, last changed 2017-10-02 23:17 by rouilj.

Messages
msg4055 Author: [hidden] (tonimueller) Date: 2010-05-10 10:22
Copied from debian:540626 :

Experimental evidence suggests that a HEAD request results in
Roundup generating the entire page, discarding it, and then sending only
the headers to the client.  This is very wasteful, particularly on large pages.

Making requests to bugs.darcs.net *from* bugs.darcs.net (to avoid
network latency), we can see that HEAD and GET take about the same
amount of time:

    $ time nc bugs.darcs.net www <<<$'HEAD /status1 HTTP/1.1\nHost: bugs.darcs.net\n\n' | wc -l
    7

    real    0m18.549s
    user    0m0.004s
    sys     0m0.004s
    $ time nc bugs.darcs.net www <<<$'GET /status1 HTTP/1.1\nHost: bugs.darcs.net\n\n' | wc -l
    3117

    real    0m18.324s
    user    0m0.004s
    sys     0m0.004s

This issue has practical implications for me.  I maintain a script to
interact with roundup's mailgw, and I wanted to validate status IDs
before sending emails:

    if ! curl -fsIo/dev/null http://bugs.example.net/status$N
    then error "$N is not a valid status ID!"
    fi

Currently this request can take deciseconds, and so is far too slow to
use.
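
For readers who script this check in Python rather than curl, a minimal
sketch of the same test follows. The function name status_exists and its
parameters are made up for illustration; it only relies on the standard
urllib module, and it is just as slow as the curl version because the
server still renders the page.

```python
import urllib.error
import urllib.request


def status_exists(base_url, status_id, timeout=10.0):
    """Return True if a HEAD request for <base_url>/status<ID> succeeds.

    Hypothetical helper: mirrors `curl -fsI`, which fails on HTTP errors.
    """
    url = f"{base_url.rstrip('/')}/status{status_id}"
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # The tracker answered, but not with a success status (e.g. 404).
        return False
    # Connection problems (urllib.error.URLError) are deliberately not
    # caught, so an unreachable tracker is not mistaken for a bad ID.
```

Calling status_exists("http://bugs.example.net", 3) would then stand in
for the curl test in the snippet above.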




I've talked to both Eric Kow of darcs.net to get 1.4.13 installed there,
and to the user, who has confirmed that the problem is still present in
the latest version of roundup.
msg4056 Author: [hidden] (tonimueller) Date: 2010-05-10 10:24
Ouch, someone has apparently disabled the Debian-BTS linking. The
original bug can be viewed here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=540626
msg4094 Author: [hidden] (richard) Date: 2010-07-12 04:16
I'm not sure how this situation could be improved. It would be quite difficult in most cases to 
determine what HEAD should be testing for a "last modified" date for pages.
msg4098 Author: [hidden] (ber) Date: 2010-07-20 09:14
I'd say that the original use case should be solved differently.
Or what are other use cases for HEAD that we should support?
msg5509 Author: [hidden] (rouilj) Date: 2016-04-09 04:51
Is there a way in TAL to tell whether a HEAD has been requested and abort
the generation of the TAL page? Something like:

<tal:dontprocess tal:condition="python:request.client.env['REQUEST_METHOD'] != 'HEAD'">

as the very first line of the page.html file
should prevent the page generation/TAL processing, right?

Will that have any effect on the expires time?
msg5823 Author: [hidden] (rouilj) Date: 2016-07-10 23:22
Probably too late to make any difference, but
another possibility occurred to me.

Rather than using:


  curl -fsIo/dev/null http://bugs.example.net/status$N

how about creating an empty template? Touch
tracker/html/issue.empty.html, then use:

  curl -fsIo/dev/null 'http://localhost:9017/demo/issue10400?@template=empty'; echo $?
  22

  curl -fsIo/dev/null 'http://localhost:9017/demo/issue1?@template=empty'; echo $?
  0

where issue1 exists and the other one doesn't.

This is probably as close as you can get at the moment. It does go into
the template code, but does very little work.

Alternatively you could use the xmlrpc interface.
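
A rough sketch of the XML-RPC route, for illustration only. It assumes
the tracker exposes Roundup's XML-RPC endpoint and that its list method
accepts a class name and a property name; the helper name
status_id_exists, the URL and the credentials are placeholders, so check
them against your own tracker before relying on this.

```python
import xmlrpc.client


def status_id_exists(server, status_id):
    """Return True if status_id is among the tracker's status ids.

    `server` is expected to behave like xmlrpc.client.ServerProxy.
    Assumption: the tracker offers list(classname, propname).
    """
    known_ids = server.list("status", "id")
    # Ids may come back as strings; compare as strings to be safe.
    return str(status_id) in {str(item) for item in known_ids}


# Usage, not executed here (URL and credentials are placeholders):
#   server = xmlrpc.client.ServerProxy(
#       "http://user:password@localhost:9017/demo/xmlrpc", allow_none=True)
#   status_id_exists(server, 3)
```

This avoids the template code entirely, at the cost of needing a login
that is allowed to use the XML-RPC interface.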
msg6026 Author: [hidden] (rouilj) Date: 2017-10-02 23:17
Since roundup is no longer distributed as a debian package, this sort
of takes on less importance.

However I did two experiments:

Change page.html. I put:

  <tal:dontprocess tal:condition="python:request.client.env['REQUEST_METHOD'] != 'HEAD'">

right before the <body> tag in page.html.  Then I put:

 </tal:dontprocess>

right after the </body> tag.

When curling I used the following framework:

   time sh -c 'for i in `seq 1 100`; \
        do curl -sk -o/dev/null -w"%{time_total}\n" \
            https://rouilj.dynamic-dns.net/demo/issue3 ; done'

(wordwrapped manually). Running this takes total time of: 0m45.816s

Using the same as above but with --head added to the curl args,
it takes total time of: 0m26.533s

For anybody who can divide by 100, the following will
not be shocking.

In the GET case, curl reports average total time of 0.446s
with median of .447 and a range of 0.422, 0.456

The HEAD case returns average: 0.253s median: .253 and range
0.236, 0.259.

So we can see that there is a significant decrease in time, roughly 43%
(0.253s versus 0.446s).
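
The average, median and range figures above can be derived from the
loop's output with a few lines of Python. This is only a sketch: it
assumes the per-request time_total values printed by curl were captured
one per line, and the function name summarize is made up for
illustration.

```python
import statistics


def summarize(lines):
    """Summarize curl time_total values given as one number per line."""
    # Skip blank lines; float() tolerates the trailing newline.
    times = [float(line) for line in lines if line.strip()]
    return {
        "avg": statistics.mean(times),
        "median": statistics.median(times),
        "range": (min(times), max(times)),
    }


# Usage, assuming the loop output was saved to a file named times.txt:
#   with open("times.txt") as handle:
#       print(summarize(handle))
```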

The test env is running via a proxy web server (hiawatha 10.6) against
a daemonized roundup with a sqlite back end on an ASUS EB1036-B0534
Desktop with xubuntu 14.04.5. YMMV if you are using it as a cgi.

Given the original report that deciseconds was too slow this doesn't
help much as best case is still 2.5 deciseconds.

Creating an empty issue.empty.html file in the html subdir and running
the above loop with a url of:

  "https://rouilj.dynamic-dns.net/demo/issue3?@template=empty"

(note the quotes to protect the ?)

returns get case: avg: 0.15966 med: 0.160 range: 0.153, 0.171

returns head case: avg: 0.15968 med: 0.160 range: 0.154, 0.169

which are pretty much indistinguishable.

However, that is still in the 1.6 decisecond range. I think what we are
seeing is just the work of accessing the db (to validate that the issue
exists) and setting up an html response.  But this does get it down to
roughly 36% of the original request time (0.16s versus 0.446s).
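
As a quick check of the arithmetic, using only the averages quoted in
this message (the constant and function names are illustrative):

```python
# Average seconds per request, copied from the measurements above.
GET_FULL = 0.446      # GET of issue3 with the full page template
HEAD_WRAPPED = 0.253  # HEAD with the tal:condition wrapper in page.html
GET_EMPTY = 0.15966   # GET with @template=empty


def fraction_of(baseline, measured):
    """Return measured time as a fraction of the baseline time."""
    return measured / baseline


for label, seconds in (("HEAD, wrapped page.html", HEAD_WRAPPED),
                       ("GET, empty template", GET_EMPTY)):
    share = fraction_of(GET_FULL, seconds)
    print(f"{label}: {share:.0%} of the original GET time")
# prints:
# HEAD, wrapped page.html: 57% of the original GET time
# GET, empty template: 36% of the original GET time
```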

Note that using get on issue999 which generates a 404 error from the
server runs in 45ish seconds. Probably because it is rendering
_generic.404.html with the page border and all the rest. Running just
a head runs in 26 seconds which seems to support that supposition
since it looks similar to what was obtained with the original
modified page.html. So again slower than requested.

Putting the tal to detect a HEAD request at the top and bottom of the
_generic.404.html page will probably cut that back to something similar
to (but higher than) the empty issue.empty.html template.

This is left as an exercise for the reader.

Also note there is no last-modified header generated for issues, only
attached files. So Richard's note that HEAD would not include a last
modified date is moot since I don't think any issue page has a last
modified date.

-- rouilj
History
Date                 User        Action  Args
2017-10-02 23:17:45  rouilj      set     title: HEAD requests are very slow. (fix in comments) -> HEAD requests are very slow. (fix to take 1/4 of original time in comments)
2017-10-02 23:17:17  rouilj      set     status: new -> closed
                                         assignee: rouilj
                                         resolution: wont fix
                                         messages: + msg6026
                                         title: HEAD requests arer very slow. -> HEAD requests are very slow. (fix in comments)
2016-07-10 23:22:26  rouilj      set     messages: + msg5823
2016-04-09 04:51:23  rouilj      set     nosy: + rouilj
                                         messages: + msg5509
2010-07-20 09:14:34  ber         set     nosy: + ber
                                         messages: + msg4098
2010-07-20 09:13:18  ber         set     title: HEAD is ridiculously slow. -> HEAD requests arer very slow.
2010-07-12 04:16:57  richard     set     nosy: + richard
                                         messages: + msg4094
2010-05-10 10:24:11  tonimueller set     messages: + msg4056
2010-05-10 10:22:45  tonimueller create