Roundup Tracker - Issues

Message6353

Author ThomasAH
Recipients ThomasAH, ced, rouilj, schlatterbeck, tekberg
Date 2019-02-18.08:16:28
Message-id <20190218083840.565620063.thomas@intevation.de>
In-reply-to <20190217155013.90D034C028A@itserver6.localdomain>
* John P. Rouillard <rouilj@cs.umb.edu> [20190217 16:50]:
> In message <1550402207.92.0.371676542223.issue2551023@roundup.psfhosted.org>,
> =?utf-8?q?C=C3=A9dric_Krier?= writes:
> >I use the wsgi_handler instead of roundup-server. And I can not get
> >HTTP_X-REQUESTED-WITH set in environment because WSGI server convert all
> >'-' into '_'.
> 
> What server are you using? That translation of '-' to '_' is not
> right/supported according to mod_wsgi or wsgi docs. It also seriously
> breaks the compatibility with CGI.

On the other hand I get variables like HTTP_USER_AGENT or
HTTP_X_FORWARDED_FOR in a non-python CGI environment, too.

From https://www.ietf.org/rfc/rfc3875:
(The Common Gateway Interface (CGI) Version 1.1)

| 4.1.18.  Protocol-Specific Meta-Variables
|
|    Meta-variables with names beginning with "HTTP_" contain values read
|    from the client request header fields, if the protocol used is HTTP.
|    The HTTP header field name is converted to upper case, has all
|    occurrences of "-" replaced with "_" and has "HTTP_" prepended to
|    give the meta-variable name.
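
As a minimal sketch of the rule quoted above (the function name
cgi_meta_variable is made up for illustration, and the special cases
Content-Type and Content-Length, which get no "HTTP_" prefix, are
ignored here), the conversion could look like this in Python:

```python
def cgi_meta_variable(header_name: str) -> str:
    """Map an HTTP header field name to its CGI meta-variable name.

    Follows the RFC 3875 section 4.1.18 wording: upper-case the name,
    replace every "-" with "_", and prepend "HTTP_".
    """
    return "HTTP_" + header_name.upper().replace("-", "_")


print(cgi_meta_variable("X-Requested-With"))  # HTTP_X_REQUESTED_WITH
print(cgi_meta_variable("User-Agent"))        # HTTP_USER_AGENT
```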

> >So the X-Requested-With header becomes HTTP_X_REQUESTED_WITH. But
> >roundup-server use HTTP_X-REQUESTED-WITH key.
> 
> HTTP_X-REQUESTED-WITH is the correct form as the header is: X-REQUESTED-WITH
> headers?

And while it is not a PEP or RFC, this talk at EuroPython 2010 explicitly
states that hyphens are replaced with underscores:
https://gustavonarea.net/files/talks/europython2010/wsgi-cheatsheet.pdf

| HTTP_* variables: Those present in the HTTP request, in upper case
| and with hyphens replaced with underscores. For example, User-Agent
| becomes HTTP_USER_AGENT.

> >I think roundup-server behavior should be normalized with other WSGI
> >server and not use '-' in HTTP_*.
> >I could not find in PEP3333 that '-' should be converted into '_' for
> >HTTP_. There is only a reference to server-defined variables.
> >
> >https://www.python.org/dev/peps/pep-3333/#environ-variables
> 
> Wow, it seems that wsgi is seriously broken. It is not what is documented.
> From mod_wsgi:
> 
>   https://modwsgi.readthedocs.io/en/develop/release-notes/version-4.3.0.html
> 
> it says in part:
> 
>   bugs fixed:
> 
>     Under Apache 2.4, when creating the environ dictionary for passing
>     into access/authentication/authorisation handlers, the behaviour of
>     Apache 2.4 as it pertained to the WSGI application, whereby it blocked
>     the passing of any HTTP headers with a name which did not contain just
>     alphanumerics or ‘-‘, was not being mirrored. This created the
>     possibility of HTTP header spoofing in certain circumstances. Such
>     headers are now being ignored.
> 
> and under features:
> 
>     In Apache 2.4, any headers with a name which does not include only
>     alphanumerics or ‘-‘ are blocked from being passed into a WSGI
>     application when the CGI like WSGI environ dictionary is created. This
>     is a mechanism to prevent header spoofing when there are multiple
>     headers where the only difference is the use of non alphanumerics in a
>     specific character position.
> 
>     This protection mechanism from Apache 2.4 is now being
>     retrospectively applied even when Apache 2.2 is being used and even
>     though Apache itself doesn’t do it. This may technically result in
>     headers that were previously being passed, no longer being passed. The
>     change is also technically against what the HTTP RFC says is allowed
>     for HTTP header names, but such blocking would occur in Apache 2.4
>     anyway due to changes in Apache. It is also understood that other web
>     servers such as nginx also perform the same type of blocking. Reliance
>     on HTTP headers which use characters other than alphanumerics and ‘-‘
>     is therefore dubious as many servers will now discard them when
>     needing to be passed into a system which requires the headers to be
>     passed as CGI like variables such as is the case for WSGI.
> 
> While it does not say that the - is preserved, not saying it's converted to _ would seem a massive oversight.

It is an oversight, because everyone (including those writing
PEP 333/PEP 3333) just assumes that it behaves like CGI.

> Also:
> 
>   https://wsgi.readthedocs.io/en/latest/definitions.html
> 
> says:
> 
>   HTTP_ Variables
> 
>     Variables corresponding to the client-supplied HTTP request
>     headers (i.e., variables whose names begin with HTTP_). The
>     presence or absence of these variables should correspond with the
>     presence or absence of the appropriate HTTP header in the request.
> 
> which is the same as in your cite of:
>    https://www.python.org/dev/peps/pep-3333/#environ-variables
> 
> The names for the http request headers are most definitely '-'
> separated. In cgi mode, the vars are precisely:
> 
>   HTTP_<uppercase version of http header field name>
> 
> and that is exactly how the vars will be presented to the application.

Uppercase of "-" is "_", because I have to press the shift key to
get it (on a US keyboard) :-)

And apart from all this, we're still talking about environment
variables here:
http://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap08.html

| Environment variable names used by the utilities in the Shell and
| Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase
| letters, digits, and the '_' (underscore) from the characters
| defined in Portable Character Set and do not begin with a digit.
| Other characters may be permitted by an implementation

This rule only covers the context of "utilities in the Shell and
Utilities volume". In other contexts you can use lowercase letters,
or even any character except = and \0, but you can't even set an
environment variable containing a "-" in bash without some dirty tricks.
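
To illustrate the naming rule quoted above, here is a small Python
sketch (the names PORTABLE_NAME and is_portable_env_name are made up
for this example) that checks whether a name uses only uppercase
letters, digits and '_' and does not begin with a digit:

```python
import re

# Portable names per the quoted rule: uppercase letters, digits and '_',
# not beginning with a digit.
PORTABLE_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")


def is_portable_env_name(name: str) -> bool:
    """Return True if the whole name matches the portable pattern."""
    return PORTABLE_NAME.fullmatch(name) is not None


print(is_portable_env_name("HTTP_X_REQUESTED_WITH"))  # True
print(is_portable_env_name("HTTP_X-REQUESTED-WITH"))  # False
```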

So in short, most people just assume that all "-" are converted to
"_", even those writing the specs/PEPs, because it has always been
this way and it is the only way it consistently works. Now, with the
security updates for web servers in place, there is no longer any
ambiguity: an underscore in the environment variable name always
corresponds to a dash in the header, because servers now discard
headers whose names contain underscores.
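
A rough sketch of how that plays out, assuming a server that filters
header names the way the quoted mod_wsgi notes describe (only
alphanumerics and '-' are let through). The names SAFE_HEADER_NAME
and headers_to_environ are hypothetical and not taken from any real
server, and repeated headers simply overwrite each other here:

```python
import re

# Header names made up of alphanumerics and '-' only; anything else is
# dropped, mirroring the blocking described in the mod_wsgi notes.
SAFE_HEADER_NAME = re.compile(r"[A-Za-z0-9-]+")


def headers_to_environ(headers):
    """Build CGI-style HTTP_* variables from (name, value) pairs."""
    environ = {}
    for name, value in headers:
        if SAFE_HEADER_NAME.fullmatch(name) is None:
            continue  # e.g. "X_Requested_With" never reaches the app
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    return environ


env = headers_to_environ([
    ("X-Requested-With", "XMLHttpRequest"),
    ("X_Requested_With", "spoofed"),
])
print(env)  # {'HTTP_X_REQUESTED_WITH': 'XMLHttpRequest'}
```

Without the filter both headers would end up under the same key,
which is the spoofing problem the server changes address.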

Regards,
Thomas
History
Date                 User      Action  Args
2019-02-18 08:16:29  ThomasAH  set     recipients: + ThomasAH, schlatterbeck, ced
2019-02-18 08:16:29  ThomasAH  link    issue2551023 messages
2019-02-18 08:16:28  ThomasAH  create