[squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

Amos Jeffries squid3 at treenet.co.nz
Fri Jun 26 12:44:49 UTC 2015


On 26/06/2015 7:55 a.m., Alex Rousskov wrote:
> On 06/25/2015 08:13 AM, Amos Jeffries wrote:
>> Which is why I want to go the route of HTTP/0.9 handling. It's
>> clear when products encounter it and cause themselves problems.
> 
> Sigh. You are repeating essentially the same argument as before. Any
> "let's create problems for something that appears to work without Squid"
> approach often results in *us* wasting time on workarounds for those
> problems.
> 
> You may be convinced that creating short-term problems for others will
> make the world a better place long-term, but there is no way to prove or
> disprove that assumption, and there is no objective way to even compare
> the possible long-term gain with the likely short-term pain (especially
> when it is somebody else who is hurting).
> 
> History is full of examples where the "future happiness justifies
> short-term suffering" approach either works great or fails miserably. I
> doubt we can come up with a formula that predicts the outcome.
> 
> 
>> Was the '|' character
>> non-compliance bug reported against the product?
> 
> Unknown to me, but I suspect folks suffering from this do not report
> bugs to companies they have no relationship with (and may not even know
> what the "product" is!).
> 
> 
>> Anyhow. I've thrown the URI you mentioned through my test copy of trunk
>> and it's not hanging looking for the end of the mime headers like you said.
>> It's waiting for the end of the URI:
> 
>> parseHttpRequest: Incomplete request, waiting for end of request line
> 
> Please forgive me for misinterpreting the Squid log line that lies about
> an "Incomplete request" and "waiting for end of request line".

No lie there. It has not yet got as far as searching for the URL. When
the relaxed parser is enabled (i.e. by default) it has to find the end
of the request-line first and work backwards. It found characters that
are invalid anywhere in the HTTP first-line / request-line before it
found the LF. Those characters just happen to be in the URI portion of
the request-line.
 It would report the same thing on receiving "GET / HTTP/1.0|\r\n"
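
For reference, the relaxed path in RequestParser.cc (see the patch
below) does roughly the following; a condensed sketch, not the literal
code:

    // LfDelim holds every octet valid anywhere in a relaxed request-line
    CharacterSet LfDelim = uriValidCharacters() + WspDelim;
    SBuf line;
    if (tok.prefix(line, LfDelim) && tok.skip('\n')) {
        // found the LF; now tokenize the request-line in reverse
    }
    // With "GET /a|b HTTP/1.1\r\n" the prefix() stops at the '|' (not
    // in LfDelim), skip('\n') then fails, and the code falls through
    // to "Parser needs more data" - the apparent hang.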

> 
> If you prefer the "waiting for the end of the URI" description of the
> broken state (where the complete URI, the complete request line, and the
> complete request are all available), I am not going to argue that it is
> also inaccurate.
> 
> 
>> Attached is a patch that replaces the hang behaviour with an
>> invalid-request error page on standards compliant builds. But, when
>> --enable-http-violations is used it accepts the invalid characters we
>> know are in use and passes the request through as HTTP/0.9.
> 
> 
> ... which does not work because the request does not actually use the
> HTTP/0.9 format. You end up with:
> 
> 
>> Forwarding client request ... url=http://localhost:8080/path|with|unwise|charactersHTTP/1.1
> 
>> GET /path|with|unwise|charactersHTTP/1.1 HTTP/1.1
>> Via: 0.9 protofroot (squid/4.0.0-BZR)
> 
> 
> The trailing " HTTP/1.1" part of the request line is not a part of the
> URI but your changes make it so.

Sorry. Attached is a patch which fixes that and resolves accel/intercept
issues with the mime block as well.
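
For the record: when the invalid line still ends with an HTTP-version
label, the client evidently believes it is speaking HTTP/1.x and has
probably sent a mime block, so the parser now flags that and
grabMimeBlock() consumes the headers even though the message gets
downgraded to HTTP/0.9. The relevant condition from the patch:

    // Parser.cc (patched): also expect a mime block when the
    // invalid-as-HTTP/0.9 hack saw a trailing HTTP-version label
    const bool expectMime =
        (msgProtocol_.protocol == AnyP::PROTO_HTTP && msgProtocol_.major == 1) ||
        msgProtocol_.protocol == AnyP::PROTO_ICY ||
        hackExpectsMime_;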



> 
> Tokenizer cannot handle URIs with whitespaces directly, like your patch
> attempts to do: Tokenizer alone cannot handle ambiguous grammars. To
> handle such URIs well, you have two options IMO:
> 
> A. The old "do it by hand" way:
> 
>   0. Trim request line to remove trailing whitespace and CR*LF.
>   1. Find the *last* whitespace on the trimmed request line.
>   2. To get the request method and URI, apply the tokenizer to the
>      request line prefix before that last whitespace.
>   3. To get the protocol parts, apply the tokenizer to the
>      request line suffix after that last whitespace.
>   4. Make the code even messier to handle HTTP/0.9 cases if needed.
> 
> 
> B. The new optimized and simplified "optimistic Tokenizer" way:
> 
> Here is a sketch:
> 
>     SBuf uri; // accumulates request URI characters
>     if (!tok.prefix(uri, StrictUriChars))
>         return false; // URI does not start with a valid character
> 
>     // in the order of correctness (and popularity!):
>     const bool parsedSuffix =
>         // canonical HTTP/1 format
>         parseCanonicalReqLineTail(uri, tok) ||
>         // HTTP/1 format but with bad characters in URI
>         parseBadUriReqLineTail(uri, tok) ||
>         // HTTP/1 format but with bad characters and spaces in URI
>         parseSpacedReqLineTail(uri, tok) ||
>         // HTTP/0 format (without the protocol part at all)
>         parseHttpZeroReqLineTail(uri, tok);
> 
>     Assert(parsedSuffix); // parseHttpZeroReqLineTail() cannot fail
>     ...

C. Locate the LF character and parse the line in reverse: HTTP-version
label, then WSP, then the remainder is the URI.
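
In the current code that looks roughly like this (condensed from the
relaxed path in the patch below, not the literal code):

    SBuf line;
    if (tok.prefix(line, LfDelim) && tok.skip('\n')) { // seek the LF
        Http1::Tokenizer rTok(line);
        SBuf nil, digit;
        (void)rTok.suffix(nil, CharacterSet::CR);      // optional CR in terminator
        if (rTok.suffix(digit, CharacterSet::DIGIT) && // HTTP-version minor digit
                rTok.skipSuffix(Http1magic) &&         // "HTTP/1." magic octets
                rTok.suffix(nil, WspDelim)) {          // WSP before the version label
            uri_ = rTok.remaining();                   // whatever is left is the URI
            msgProtocol_ = Http::ProtocolVersion(1, (*digit.rawContent() - '0'));
        }
    }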


(B) is an interesting alternative to what we have now (C). Although you
are missing the relaxed vs strict character sets which the RFC mandates
(for both the URI and the whitespace, permuting separately). That
complicates things a bit.

If I can find some time I'll give this (B) style a bit of testing and
see how it matches up against the current code. It could prove faster on
very long URIs.
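
Off the top of my head the permutation would look something like this
(hypothetical sketch; StrictUriChars is from your sketch, and
RelaxedUriChars is an invented counterpart - neither exists in the
current code):

    // charsets permute independently with squid.conf relaxed_header_parser
    const bool relaxed = Config.onoff.relaxed_header_parser;
    CharacterSet WspDelim = CharacterSet::SP;
    if (relaxed) // RFC 7230 section 3.5 whitespace tolerance
        WspDelim += CharacterSet::HTAB
                    + CharacterSet("VT,FF","\x0B\x0C")
                    + CharacterSet::CR;
    const CharacterSet &uriChars = relaxed ? RelaxedUriChars : StrictUriChars;

    SBuf uri;
    if (!tok.prefix(uri, uriChars))
        return false; // URI does not start with a valid character
    // ... then try the tail parsers in your order, passing WspDelim down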

Amos
-------------- next part --------------
=== modified file 'src/http/one/Parser.cc'
--- src/http/one/Parser.cc	2015-04-10 11:02:44 +0000
+++ src/http/one/Parser.cc	2015-06-26 12:15:52 +0000
@@ -26,41 +26,42 @@
 }
 
 bool
 Http::One::Parser::skipLineTerminator(Http1::Tokenizer &tok) const
 {
     static const SBuf crlf("\r\n");
     if (tok.skip(crlf))
         return true;
 
     if (Config.onoff.relaxed_header_parser && tok.skipOne(CharacterSet::LF))
         return true;
 
     return false;
 }
 
 bool
 Http::One::Parser::grabMimeBlock(const char *which, const size_t limit)
 {
     // MIME header blocks exist in (only) HTTP/1.x and ICY
     const bool expectMime = (msgProtocol_.protocol == AnyP::PROTO_HTTP && msgProtocol_.major == 1) ||
-                            msgProtocol_.protocol == AnyP::PROTO_ICY;
+                            msgProtocol_.protocol == AnyP::PROTO_ICY ||
+                            hackExpectsMime_;
 
     if (expectMime) {
         /* NOTE: HTTP/0.9 messages do not have a mime header block.
          *       So the rest of the code will need to deal with '0'-byte headers
          *       (ie, none, so don't try parsing em)
          */
         // XXX: c_str() reallocates. performance regression.
         if (SBuf::size_type mimeHeaderBytes = headersEnd(buf_.c_str(), buf_.length())) {
 
             // Squid could handle these headers, but admin does not want to
             if (firstLineSize() + mimeHeaderBytes >= limit) {
                 debugs(33, 5, "Too large " << which);
                 parseStatusCode = Http::scHeaderTooLarge;
                 buf_.consume(mimeHeaderBytes);
                 parsingStage_ = HTTP_PARSE_DONE;
                 return false;
             }
 
             mimeHeaderBlock_ = buf_.consume(mimeHeaderBytes);
             debugs(74, 5, "mime header (0-" << mimeHeaderBytes << ") {" << mimeHeaderBlock_ << "}");

=== modified file 'src/http/one/Parser.h'
--- src/http/one/Parser.h	2015-06-09 01:59:58 +0000
+++ src/http/one/Parser.h	2015-06-26 12:02:35 +0000
@@ -24,41 +24,41 @@
     HTTP_PARSE_CHUNK_SZ,  ///< HTTP/1.1 chunked encoding chunk-size
     HTTP_PARSE_CHUNK_EXT, ///< HTTP/1.1 chunked encoding chunk-ext
     HTTP_PARSE_CHUNK,     ///< HTTP/1.1 chunked encoding chunk-data
     HTTP_PARSE_MIME,      ///< HTTP/1 mime-header block
     HTTP_PARSE_DONE       ///< parsed a message header, or reached a terminal syntax error
 };
 
 /** HTTP/1.x protocol parser
  *
  * Works on a raw character I/O buffer and tokenizes the content into
 * the major CRLF delimited segments of an HTTP/1 protocol message:
  *
  * \item first-line (request-line / simple-request / status-line)
  * \item mime-header 0*( header-name ':' SP field-value CRLF)
  */
 class Parser : public RefCountable
 {
 public:
     typedef SBuf::size_type size_type;
 
-    Parser() : parseStatusCode(Http::scNone), parsingStage_(HTTP_PARSE_NONE) {}
+    Parser() : parseStatusCode(Http::scNone), parsingStage_(HTTP_PARSE_NONE), hackExpectsMime_(false) {}
     virtual ~Parser() {}
 
     /// Set this parser back to a default state.
     /// Will DROP any reference to a buffer (does not free).
     virtual void clear() = 0;
 
     /// attempt to parse a message from the buffer
     /// \retval true if a full message was found and parsed
     /// \retval false if incomplete, invalid or no message was found
     virtual bool parse(const SBuf &aBuf) = 0;
 
     /** Whether the parser is waiting on more data to complete parsing a message.
      * Use to distinguish between incomplete data and error results
      * when parse() returns false.
      */
     bool needsMoreData() const {return parsingStage_!=HTTP_PARSE_DONE;}
 
     /// size in bytes of the first line including CRLF terminator
     virtual size_type firstLineSize() const = 0;
 
@@ -114,27 +114,30 @@
      *                mimeHeaderBlock_ has been updated and buf_ consumed.
      *
      * \retval false  An error occurred, or no mime terminator found within limit.
      */
     bool grabMimeBlock(const char *which, const size_t limit);
 
     /// RFC 7230 section 2.6 - 7 magic octets
     static const SBuf Http1magic;
 
     /// bytes remaining to be parsed
     SBuf buf_;
 
     /// what stage the parser is currently up to
     ParseState parsingStage_;
 
     /// what protocol label has been found in the first line (if any)
     AnyP::ProtocolVersion msgProtocol_;
 
     /// buffer holding the mime headers (if any)
     SBuf mimeHeaderBlock_;
+
+    /// Whether the invalid HTTP as HTTP/0.9 hack expects a mime header block
+    bool hackExpectsMime_;
 };
 
 } // namespace One
 } // namespace Http
 
 #endif /*  _SQUID_SRC_HTTP_ONE_PARSER_H */
 

=== modified file 'src/http/one/RequestParser.cc'
--- src/http/one/RequestParser.cc	2015-04-10 11:02:44 +0000
+++ src/http/one/RequestParser.cc	2015-06-26 12:02:28 +0000
@@ -244,109 +244,166 @@
  * \retval  0  more data is needed to complete the parse
  */
 int
 Http::One::RequestParser::parseRequestFirstLine()
 {
     Http1::Tokenizer tok(buf_);
 
     debugs(74, 5, "parsing possible request: buf.length=" << buf_.length());
     debugs(74, DBG_DATA, buf_);
 
     // NP: would be static, except it needs to change with reconfigure
     CharacterSet WspDelim = CharacterSet::SP; // strict parse only accepts SP
 
     if (Config.onoff.relaxed_header_parser) {
         // RFC 7230 section 3.5
         // tolerant parser MAY accept any of SP, HTAB, VT (%x0B), FF (%x0C), or bare CR
         // as whitespace between request-line fields
         WspDelim += CharacterSet::HTAB
                     + CharacterSet("VT,FF","\x0B\x0C")
                     + CharacterSet::CR;
+        debugs(74, 5, "using Parser relaxed WSP characters");
     }
 
     // only search for method if we have not yet found one
     if (method_ == Http::METHOD_NONE) {
         const int res = parseMethodField(tok, WspDelim);
         if (res < 1)
             return res;
         // else keep going...
     }
 
     // tolerant parser allows multiple whitespace characters between request-line fields
     if (Config.onoff.relaxed_header_parser) {
         const size_t garbage = tok.skipAll(WspDelim);
         if (garbage > 0) {
             firstLineGarbage_ += garbage;
             buf_ = tok.remaining(); // re-checkpoint after garbage
         }
     }
     if (tok.atEnd()) {
         debugs(74, 5, "Parser needs more data");
         return 0;
     }
 
     // from here on, we have two possible parse paths: whitespace tolerant, and strict
     if (Config.onoff.relaxed_header_parser) {
         // whitespace tolerant
 
+        int warnOnError = (Config.onoff.relaxed_header_parser <= 0 ? DBG_IMPORTANT : 2);
+
         // NOTES:
         // * this would be static, except WspDelim changes with reconfigure
         // * HTTP-version charset is included by uriValidCharacters()
         // * terminal CR is included by WspDelim here in relaxed parsing
         CharacterSet LfDelim = uriValidCharacters() + WspDelim;
 
         // seek the LF character, then tokenize the line in reverse
         SBuf line;
         if (tok.prefix(line, LfDelim) && tok.skip('\n')) {
             Http1::Tokenizer rTok(line);
             SBuf nil;
             (void)rTok.suffix(nil,CharacterSet::CR); // optional CR in terminator
             SBuf digit;
             if (rTok.suffix(digit,CharacterSet::DIGIT) && rTok.skipSuffix(Http1magic) && rTok.suffix(nil,WspDelim)) {
                 uri_ = rTok.remaining();
                 msgProtocol_ = Http::ProtocolVersion(1, (*digit.rawContent() - '0'));
                 if (uri_.isEmpty()) {
-                    debugs(33, 5, "invalid request-line. missing URL");
+                    debugs(33, warnOnError, "invalid request-line. missing URL");
                     parseStatusCode = Http::scBadRequest;
                     return -1;
                 }
 
                 parseStatusCode = Http::scOkay;
                 buf_ = tok.remaining(); // incremental parse checkpoint
                 return 1;
 
             } else if (method_ == Http::METHOD_GET) {
                 // RFC 1945 - for GET the line terminator may follow URL instead of a delimiter
                 debugs(33, 5, "HTTP/0.9 syntax request-line detected");
                 msgProtocol_ = Http::ProtocolVersion(0,9);
                 static const SBuf cr("\r",1);
                 uri_ = line.trim(cr,false,true);
                 parseStatusCode = Http::scOkay;
                 buf_ = tok.remaining(); // incremental parse checkpoint
                 return 1;
             }
 
-            debugs(33, 5, "invalid request-line. not HTTP");
+            debugs(33, warnOnError, "invalid request-line. not HTTP");
             parseStatusCode = Http::scBadRequest;
             return -1;
         }
 
+        if (!tok.atEnd()) {
+
+#if USE_HTTP_VIOLATIONS
+            // invalid character somewhere in the line.
+            // As long as we can find the LF, accept the characters
+            // which we know are invalid in any URI, but actively used
+            LfDelim.add('\0'); // Java
+            LfDelim.add(' ');  // IIS
+            LfDelim.add('\"'); // Bing
+            LfDelim.add('\\'); // MSIE, Firefox
+            LfDelim.add('|');  // Amazon
+
+            // reset the tokenizer from anything the above did, then seek the LF character.
+            tok.reset(buf_);
+
+            if (tok.prefix(line, LfDelim) && tok.skip('\n')) {
+
+                Http1::Tokenizer rTok(line);
+
+                // strip terminating CR (if any)
+                SBuf nil;
+                (void)rTok.suffix(nil,CharacterSet::CR); // optional CR in terminator
+                line = rTok.remaining();
+
+                // strip terminating 'WSP HTTP-version' (if any)
+                if (rTok.suffix(nil,CharacterSet::DIGIT) && rTok.skipSuffix(Http1magic) && rTok.suffix(nil,WspDelim)) {
+                    hackExpectsMime_ = true; // client thinks it's speaking HTTP, so probably sent a mime block.
+                    uri_ = rTok.remaining();
+                } else
+                    uri_ = line; // no HTTP/1.x label found. Use the whole line.
+
+                if (uri_.isEmpty()) {
+                    debugs(33, warnOnError, "invalid request-line. missing URL");
+                    parseStatusCode = Http::scBadRequest;
+                    return -1;
+                }
+
+                debugs(33, warnOnError, "invalid request-line. treating as HTTP/0.9" << (hackExpectsMime_?" (with mime)":""));
+                msgProtocol_ = Http::ProtocolVersion(0,9);
+                parseStatusCode = Http::scOkay;
+                buf_ = tok.remaining(); // incremental parse checkpoint
+                return 1;
+
+            } else if (tok.atEnd()) {
+                debugs(74, 5, "Parser needs more data");
+                return 0;
+            }
+            // else, drop back to invalid request-line handling
+#endif
+            const SBuf t = tok.remaining();
+            debugs(33, warnOnError, "invalid request-line characters." << Raw("data", t.rawContent(), t.length()));
+            parseStatusCode = Http::scBadRequest;
+            return -1;
+        }
         debugs(74, 5, "Parser needs more data");
         return 0;
     }
     // else strict non-whitespace tolerant parse
 
     // only search for request-target (URL) if we have not yet found one
     if (uri_.isEmpty()) {
         const int res = parseUriField(tok);
         if (res < 1 || msgProtocol_.protocol == AnyP::PROTO_HTTP)
             return res;
         // else keep going...
     }
 
     if (tok.atEnd()) {
         debugs(74, 5, "Parser needs more data");
         return 0;
     }
 
     // HTTP/1 version suffix (protocol magic) followed by CR*LF
     if (msgProtocol_.protocol == AnyP::PROTO_NONE) {


