[squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

Amos Jeffries squid3 at treenet.co.nz
Thu Jun 25 14:13:21 UTC 2015


On 25/06/2015 4:05 p.m., Alex Rousskov wrote:
> On 06/24/2015 02:24 PM, Kinkie wrote:
>> My 2c: I vote for reality; possibly with a shaming announce message;
> 
> I would recommend against shaming any particular company unless somebody
> can personally vouch that that specific company is the one to blame. I
> cannot (without a lot more triage work which I have no time for).
> 
> The benefits of shaming in this case are tiny, but the danger of hurting
> an innocent person (or even company) is always there.
> 

Indeed. Which is why I want to go the route of HTTP/0.9 handling. It's
clear when products encounter it and cause themselves problems, but we
don't need to name and shame any particular source.

PS: in my testing the URIs you presented as the reason for the patch were
accepted by Squid and handled as HTTP/0.9.
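
For context, an HTTP/0.9 request is just the bare method and URI, with
no version token and no headers, and the reply is the raw entity body
with no status line. So the downgrade looks roughly like this on the
wire, reusing the URI from the patch description:

  GET /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

with the server expected to send back the body alone and close the
connection afterwards.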


> 
>> I wouldn't even recommend logging the violation: there is nothing the
>> average admin can do about it.
> 
> 
> An admin has reminded me privately that Squid already has a mechanism
> that can be used here if somebody wants to enforce strict parsing [and
> wants to see violations]:
> 
> 
>> NAME: relaxed_header_parser
>>     In the default "on" setting Squid accepts certain forms
>>     of non-compliant HTTP messages where it is unambiguous
>>     what the sending application intended
> 
>>     If set to "warn" then a warning will be emitted in cache.log
>>     each time such HTTP error is encountered.
> 
>>     If set to "off" then such HTTP errors will cause the request
>>     or response to be rejected.
> 
> The temporary fix can be enhanced to include all "unwise" URI characters
> if (and only if) relaxed_header_parser is "on", and also to warn about
> those characters when relaxed_header_parser is "warn".
> 

That would be better than always doing it in the strict parsing pathway.
It is also a useful temporary workaround for people hitting the issue
you found, since relaxed_header_parser is on by default IIRC.
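
For reference, the relevant squid.conf settings (using the values
documented above) would look like:

  # default: accept unambiguous non-compliant messages
  relaxed_header_parser on

  # additionally emit a cache.log warning for each such error
  relaxed_header_parser warn

  # reject requests/responses containing such HTTP errors
  relaxed_header_parser off

The "warn" value is probably the most useful one while tracking down
which client is emitting these URIs.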


>> On Wed, Jun 24, 2015 at 10:12 PM, Alex Rousskov wrote:
>>> On 06/24/2015 05:26 AM, Amos Jeffries wrote:
>>>
>>>> On 24/06/2015 5:55 p.m., Alex Rousskov wrote:
>>>>>     This temporary trunk fix adds support for request URIs containing
>>>>> '|' characters. Such URIs are used by popular Amazon product (and
>>>>> probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js
>>>>>
>>>>> Without this fix, all requests for affected URIs timeout while Squid
>>>>> waits for the end of request headers it has already received(*).
>>>
>>>
>>>> This is not right. Squid should be identifying the message as
>>>> non-HTTP/1.x (which it isn't due to the URI syntax violation) and
>>>> treating it as such.
>>>
>>> I agree that Amazon violates URI syntax. On the other hand, the message
>>> can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
>>> want to implement a different fix, please do so. Meanwhile, folks
>>> suffering from this serious regression can try the temporary fix I posted.

SMTP messages can be interpreted as HTTP/1.1 for all intents and
purposes as well. That does not make it a good idea to do so.

I don't care who is sending the broken messages (AIUI, it's not the
company anyway, it's the particular UA software you found). They need to
stop getting away with it, and the best incentive is bad behaviour
leading to bad performance. Losing the benefits of HTTP/1.1 does that
cleanly.


NP: We have a track record of several vendors who fixed their products
after this type of bug was reported to them.  Was the '|' character
non-compliance bug reported against the product? They are probably able
to roll out the proper fix faster than we can anyway.


Anyhow, I've thrown the URI you mentioned through my test copy of trunk
and it's not hanging looking for the end of the MIME headers like you
said; it's waiting for the end of the URI:
"
RequestParser.cc(390) parse: request-line: method: GET
RequestParser.cc(391) parse: request-line: url:
RequestParser.cc(392) parse: request-line: proto: NONE/0.0
RequestParser.cc(393) parse: Parser: bytes processed=4
SBuf.cc(149) assign: assigning SBuf1845 from SBuf1850
parseHttpRequest: Incomplete request, waiting for end of request line
"

Attached is a patch that replaces the hang behaviour with an
invalid-request error page on standards-compliant builds. When
--enable-http-violations is used, it instead accepts the invalid
characters we know are in use and passes the request through as HTTP/0.9.
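
The accept/reject decision amounts to roughly the following standalone
sketch. It is simplified and hypothetical: std::string stands in for
SBuf and CharacterSet, a plain bool for the USE_HTTP_VIOLATIONS build
flag, and it only looks at the request-target after the method has been
consumed:

  #include <iostream>
  #include <string>

  // Characters known to appear un-encoded in real request-targets, per
  // the patch: NUL (Java), SP (IIS), '"' (Bing), '\' (MSIE, Firefox),
  // '|' (Amazon).
  static const std::string knownViolations("\0 \"\\|", 5);

  // Returns 1 when the URI is accepted (downgraded to HTTP/0.9 if it
  // contains a known-invalid character), -1 when it must be rejected.
  int parseLooseUri(const std::string &uri, bool httpViolations)
  {
      const size_t bad = uri.find_first_of(knownViolations);
      if (bad == std::string::npos)
          return 1; // nothing invalid; normal HTTP/1.x parsing applies

      if (!httpViolations)
          return -1; // strict build: send the invalid-request error page

      // violations build: tolerate the character, downgrade to HTTP/0.9
      std::cout << "HTTP/0.9 downgrade at offset " << bad << "\n";
      return 1;
  }

  int main()
  {
      const std::string uri("/images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js");
      std::cout << parseLooseUri(uri, true) << "\n";  // 1 (accept)
      std::cout << parseLooseUri(uri, false) << "\n"; // -1 (reject)
  }

The real patch does it the other way around: it widens the set of
characters the tokenizer will accept before the line-terminating LF, so
the whole line is captured in one pass and handed over as an HTTP/0.9
URI.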


>>>
>>>>> The proper long-term fix is to allow any character in URI as long as we
>>>>> can reliably parse the request line (and, later, URI components). There
>>>>> is no point in hurting users by rejecting requests while slowly
>>>>> accumulating the list of benign characters used by web sites but
>>>>> prohibited by some RFC.
>>>
>>>> The *proper* long term fix is to obey the standards in regard to message
>>>> syntax so applications stop using these invalid (when un-encoded)
>>>> characters and claiming HTTP/1.1 support.
>>>
>>> We had "standards vs reality" and "policing traffic" discussions several
>>> times in the past, with no signs of convergence towards a single
>>> approach, so I am not going to revisit that discussion now. We continue
>>> to disagree [while Squid users continue to suffer].

Yes. And things like <http://bugs.squid-cache.org/show_bug.cgi?id=4222>
are what happens if we accept that particular "reality". (We get screwed
both ways.)

Amos

-------------- next part --------------
=== modified file 'src/http/one/RequestParser.cc'
--- src/http/one/RequestParser.cc	2015-04-10 11:02:44 +0000
+++ src/http/one/RequestParser.cc	2015-06-25 13:01:39 +0000
@@ -244,40 +244,41 @@
  * \retval  0  more data is needed to complete the parse
  */
 int
 Http::One::RequestParser::parseRequestFirstLine()
 {
     Http1::Tokenizer tok(buf_);
 
     debugs(74, 5, "parsing possible request: buf.length=" << buf_.length());
     debugs(74, DBG_DATA, buf_);
 
     // NP: would be static, except it need to change with reconfigure
     CharacterSet WspDelim = CharacterSet::SP; // strict parse only accepts SP
 
     if (Config.onoff.relaxed_header_parser) {
         // RFC 7230 section 3.5
         // tolerant parser MAY accept any of SP, HTAB, VT (%x0B), FF (%x0C), or bare CR
         // as whitespace between request-line fields
         WspDelim += CharacterSet::HTAB
                     + CharacterSet("VT,FF","\x0B\x0C")
                     + CharacterSet::CR;
+        debugs(74, 5, "using Parser relaxed WSP characters");
     }
 
     // only search for method if we have not yet found one
     if (method_ == Http::METHOD_NONE) {
         const int res = parseMethodField(tok, WspDelim);
         if (res < 1)
             return res;
         // else keep going...
     }
 
     // tolerant parser allows multiple whitespace characters between request-line fields
     if (Config.onoff.relaxed_header_parser) {
         const size_t garbage = tok.skipAll(WspDelim);
         if (garbage > 0) {
             firstLineGarbage_ += garbage;
             buf_ = tok.remaining(); // re-checkpoint after garbage
         }
     }
     if (tok.atEnd()) {
         debugs(74, 5, "Parser needs more data");
@@ -313,40 +314,72 @@
                 parseStatusCode = Http::scOkay;
                 buf_ = tok.remaining(); // incremental parse checkpoint
                 return 1;
 
             } else if (method_ == Http::METHOD_GET) {
                 // RFC 1945 - for GET the line terminator may follow URL instead of a delimiter
                 debugs(33, 5, "HTTP/0.9 syntax request-line detected");
                 msgProtocol_ = Http::ProtocolVersion(0,9);
                 static const SBuf cr("\r",1);
                 uri_ = line.trim(cr,false,true);
                 parseStatusCode = Http::scOkay;
                 buf_ = tok.remaining(); // incremental parse checkpoint
                 return 1;
             }
 
             debugs(33, 5, "invalid request-line. not HTTP");
             parseStatusCode = Http::scBadRequest;
             return -1;
         }
 
+        if (!tok.atEnd()) {
+#if USE_HTTP_VIOLATIONS
+            // invalid character somewhere in the line.
+            // As long as we can find the LF, accept the characters
+            // which we know are invalid in any URI, but actively used
+            LfDelim.add('\0'); // Java
+            LfDelim.add(' ');  // IIS
+            LfDelim.add('\"'); // Bing
+            LfDelim.add('\\'); // MSIE, Firefox
+            LfDelim.add('|');  // Amazon
+
+            // reset the tokenizer from anything the above did, then seek the LF character.
+            tok.reset(buf_);
+            if (tok.prefix(line, LfDelim) && tok.skip('\n')) {
+                msgProtocol_ = Http::ProtocolVersion(0,9);
+                static const SBuf cr("\r",1);
+                uri_ = line.trim(cr,false,true);
+                parseStatusCode = Http::scOkay;
+                buf_ = tok.remaining(); // incremental parse checkpoint
+                return 1;
+
+            } else if (tok.atEnd()) {
+                debugs(74, 5, "Parser needs more data");
+                return 0;
+            }
+            // else, drop back to invalid request-line handling
+#endif
+            const SBuf t = tok.remaining();
+            debugs(33, 5, "invalid request-line characters." << Raw("data", t.rawContent(), t.length()));
+            parseStatusCode = Http::scBadRequest;
+            return -1;
+        }
         debugs(74, 5, "Parser needs more data");
         return 0;
     }
     // else strict non-whitespace tolerant parse
 
     // only search for request-target (URL) if we have not yet found one
     if (uri_.isEmpty()) {
         const int res = parseUriField(tok);
         if (res < 1 || msgProtocol_.protocol == AnyP::PROTO_HTTP)
             return res;
         // else keep going...
     }
 
     if (tok.atEnd()) {
         debugs(74, 5, "Parser needs more data");
         return 0;
     }
 
     // HTTP/1 version suffix (protocol magic) followed by CR*LF
     if (msgProtocol_.protocol == AnyP::PROTO_NONE) {


