[squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

Alex Rousskov rousskov at measurement-factory.com
Thu Jun 25 19:55:53 UTC 2015


On 06/25/2015 08:13 AM, Amos Jeffries wrote:
> Which is why I want to go the route of HTTP/0.9 handling. It's
> clear when products encounter it and cause themselves problems.

Sigh. You are repeating essentially the same argument as before. Any
"let's create problems for something that appears to work without Squid"
approach is likely to result in *us* wasting time on workarounds for
those problems.

You may be convinced that creating short-term problems for others will
make the world a better place long-term, but there is no way to prove or
disprove that assumption, and there is no objective way to even compare
the possible long-term gain with the likely short-term pain (especially
when it is somebody else who is hurting).

History is full of examples where the "future happiness justifies
short-term suffering" approach either works great or fails miserably. I
doubt we can come up with a formula that predicts the outcome.


> Was the '|' character
> non-compliance bug reported against the product?

Unknown to me, but I suspect folks suffering from this do not report
bugs to companies they have no relationship with (and may not even know
what the "product" is!).


> Anyhow. I've thrown the URI you mentioned through my test copy of trunk
> and it's not hanging looking for the end of mime headers like you said.
> It's waiting for the end of the URI:

> parseHttpRequest: Incomplete request, waiting for end of request line

Please forgive me for misinterpreting the Squid log line that lies about
an "Incomplete request" and "waiting for end of request line".

If you prefer the "waiting for the end of the URI" description of the
broken state (where the complete URI, the complete request line, and the
complete request are all available), I am not going to argue that it is
also inaccurate.


> Attached is a patch that replaces the hang behaviour with an
> invalid-request error page on standards-compliant builds. But, when
> --enable-http-violations is used it accepts the invalid characters we
> know are in use and passes the request through as HTTP/0.9.


... which does not work because the request does not actually use the
HTTP/0.9 format. You end up with:


> Forwarding client request ... url=http://localhost:8080/path|with|unwise|charactersHTTP/1.1

> GET /path|with|unwise|charactersHTTP/1.1 HTTP/1.1
> Via: 0.9 protofroot (squid/4.0.0-BZR)


The trailing " HTTP/1.1" part of the request line is not part of the
URI, but your changes make it so.

Tokenizer cannot handle URIs with embedded whitespace directly, as your
patch attempts to do: Tokenizer alone cannot handle ambiguous grammars.
To handle such URIs well, you have two options IMO:

A. The old "do it by hand" way (a code sketch follows the list):

  0. Trim request line to remove trailing whitespace and CR*LF.
  1. Find the *last* whitespace on the trimmed request line.
  2. To get the request method and URI, apply the tokenizer to the
     request line prefix before that last whitespace.
  3. To get the protocol parts, apply the tokenizer to the
     request line suffix after that last whitespace.
  4. Make the code even messier to handle HTTP/0.9 cases if needed.
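
For illustration, here is a minimal sketch of option A in plain C++. It
uses std::string instead of Squid's SBuf/Tokenizer classes, and the
splitRequestLine() helper and RequestLineParts type are hypothetical
names invented for this sketch, not Squid code:

    #include <optional>
    #include <string>

    struct RequestLineParts {
        std::string beforeLastWs; // method + URI (input to step 2)
        std::string afterLastWs;  // protocol info (input to step 3)
    };

    static std::optional<RequestLineParts>
    splitRequestLine(std::string line)
    {
        // step 0: trim trailing whitespace and CR*LF
        const auto lastKept = line.find_last_not_of(" \t\r\n");
        if (lastKept == std::string::npos)
            return std::nullopt; // whitespace-only line
        line.erase(lastKept + 1);

        // step 1: find the *last* whitespace on the trimmed line
        const auto lastWs = line.find_last_of(" \t");
        if (lastWs == std::string::npos)
            return std::nullopt; // no protocol part; HTTP/0.9 territory (step 4)

        // steps 2 and 3: split into the prefix to tokenize for the method
        // and URI, and the suffix to tokenize for the protocol parts
        return RequestLineParts{line.substr(0, lastWs), line.substr(lastWs + 1)};
    }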


B. The new optimized and simplified "optimistic Tokenizer" way:

Here is a sketch:

    SBuf uri; // accumulates request URI characters
    if (!tok.prefix(uri, StrictUriChars))
        return false; // URI does not start with a valid character

    // in the order of correctness (and popularity!):
    const bool parsedSuffix =
        // canonical HTTP/1 format
        parseCanonicalReqLineTail(uri, tok) ||
        // HTTP/1 format but with bad characters in URI
        parseBadUriReqLineTail(uri, tok) ||
        // HTTP/1 format but with bad characters and spaces in URI
        parseSpacedReqLineTail(uri, tok) ||
        // HTTP/0 format (without the protocol part at all)
        parseHttpZeroReqLineTail(uri, tok);

    Assert(parsedSuffix); // parseHttpZeroReqLineTail() cannot fail
    ...

where, for example, parseCanonicalReqLineTail() expects whitespace
followed by protocol info while parseBadUriReqLineTail() expects a URI
suffix (using RelaxedUriChars, which do not include whitespace) followed
by a call to parseCanonicalReqLineTail().

If any of the helper functions returns false, it should not change its
uri and tok parameters, of course.
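
To make that "no side effects on failure" contract concrete, here is a
sketch of what parseBadUriReqLineTail() might look like. It reuses the
hypothetical names from the sketch above (RelaxedUriChars, a simplified
Tokenizer interface), so treat it as an assumption rather than actual
Squid code. The helper parses into local copies and commits them only
on success:

    static bool
    parseBadUriReqLineTail(SBuf &uri, Tokenizer &tok)
    {
        SBuf uriCopy(uri);      // parse into copies so that a failed
        Tokenizer tokCopy(tok); // parse leaves uri and tok unchanged

        // consume the rest of the URI, tolerating bad (but spaceless) chars
        SBuf badSuffix;
        if (!tokCopy.prefix(badSuffix, RelaxedUriChars))
            return false;
        uriCopy.append(badSuffix);

        // the remainder must still be a canonical " HTTP/x.y" tail
        if (!parseCanonicalReqLineTail(uriCopy, tokCopy))
            return false;

        uri = uriCopy; // success: commit both "out" parameters
        tok = tokCopy;
        return true;
    }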


HTH,

Alex.


