[squid-dev] [PATCH] mime unfolding

Amos Jeffries squid3 at treenet.co.nz
Thu May 19 13:29:35 UTC 2016


On 19/05/2016 1:59 p.m., Alex Rousskov wrote:
> On 05/14/2016 06:42 AM, Amos Jeffries wrote:
> 
> 
>> One of the parsers you will find attached (MIME_UNFOLD_SLOW) is that
>> loop expanded into functions so each if statement gets its own named
>> function and so non-genius people can avoid reading the description of
>> unfoldMime.
> 
> I am afraid you misunderstand the problem I am trying to solve. IMO, the
> parsing code, as it was written before, was impossible for a human with
> average abilities to fully comprehend. The number of if statements is
> not the problem on its own. Moving an if statement into a function does
> not solve the problem. There is just too much persistent state and state
> change/interaction complexity. We have seen many examples of similar
> code leading to bugs because nobody could fully comprehend the complex
> interactions happening in loops like that.
> 
> With very few exceptions that neither of us, sadly, qualify for, humans
> are incapable of writing low-level non-trivial parsing code correctly. I
> have seen so many bugs in such code (in Squid and elsewhere) that I am
> quite convinced that this is a nearly "universal truth".
> 
> Until we have better parsing tools at our disposal, we should use
> tokenizers and similar safety helpers when writing parsing code, and we
> should keep it simple. This will not fully protect us from bugs (nothing
> will!), but it will reduce the number of bugs and will protect us from
> most CVEs.
> 
> During this particular project, as you know, it took several private
> review iterations to replace unsafe (and immediately buggy!) low-level
> code with tokenizers. Reducing that loop complexity is the final step. I
> was hoping that providing a sketch of how to do that would allow for a
> quick and painless finale, but I was obviously wrong.
> 
> I tried again below, and I hope you will like the final result more. I
> am leaving intermediate steps in case you would want to stop sooner than
> I did. They are short.
> 
> The simplest implementation I can think of would look like this:
> 
>     while (!tk.atEnd()) {
>         if (skipObsFolds(tk))
>             result += ' '; // replace obs-folds with SP
>         else
>             result += tk.chars(1); // advance one character
>     }
> 
> but we do not want to append one character at a time, so we optimize to
> append more characters (while watching for the obs-fold start). That
> "more characters" optimization is where the complexity and bugs come in.
> 
> I would start with appending all characters until CR or LF because
> obs-fold has to start with CR or LF. This algorithm is not going to
> catch all longest sequences, but it will be reasonably efficient while
> remaining simple:
> 
>     while (!tk.atEnd()) {
>         if (skipObsFolds(tk))
>             result += ' '; // replaced all obs-folds with one SP
>         else
>             result += tokenUntilCrLf(tk); // advance one or more chars
>     }
> 
>     /// always extracts the first character;
>     /// extracts more characters if they are not CR or LF
>     SBuf
>     tokenUntilCrLf(Tokenizer &tk) {
>         const SBuf buf = tk.remaining();
>         const auto skipped = tk.skipOne(ANY) + tk.skipAll(notCRLF);
>         return buf.substr(0, skipped);
>     }
> 
> 
> Furthermore, we can observe that skipping the leading CR?LF? is always
> safe _after_ our skipObsFolds() check. Thus, we can replace that
> "skipped = ..." line above with the one that may skip even more:
> 
>   // this will skip at least one character (CR, LF, and/or notCRLF)
>   const auto skipped =
>       tk.skipOne(CR) + tk.skipOne(LF) + tk.skipAll(notCRLF);
> 
> 
> Now you may notice that skipObsFolds() has to _start_ with a similar
> CR?LF skipping sequence, so we can extract that common code from both
> helper functions to arrive at something like this:
> 
>     while (!tk.atEnd()) {
>         const SBuf all = tk.remaining();
>         const auto crLen = tk.skipOne(CR); // may not be there
>         const auto lfLen = tk.skipOne(LF); // may not be there
>         if (lfLen && tk.skipAll(WSP)) // obs-fold!
>             result += ' '; // replace one obs-fold with one SP
>         else
>             result += all.substr(0, crLen+lfLen + tk.skipAll(notCRLF));
>     }
> 
> The above is a little too complex for my taste but is probably close to
> optimal performance, given the tools that we have to use. "A little too
> complex" because it is not _obvious_ that the loop is always making
> progress. The somewhat hidden fact here is that if crLen+lfLen is zero
> (i.e., no progress yet), then tk.skipAll(notCRLF) has to succeed
> (progress!) since the next character has to be CR, LF, or none of those.
> 
> The usual "this is a sketch" disclaimer applies.

Agreed. You have followed almost exactly the same construction steps
I used to reach the loop I started with. However, as you point out, it
looks complex, so I took some extra steps to simplify its appearance. Or
so I thought.


4) In most cases the initial blob of characters would be non-CRLF. It is
not immediately obvious how the loop is actually handling them.

We can make that much clearer by shifting those characters into result
at the start of the loop. It is just as fast, but easier to see what's
going on.

The most complicated bit now only deals with the 'end of line'
things: bare-CR, CRLF, or obs-fold.


 while (!tk.atEnd()) {
   SBuf line;
   if (tk.prefix(line, nonCRLF)) // the common case: ordinary characters
     result += line;

   const SBuf all = tk.remaining();

   const auto crLen = tk.skipOne(CR); // may not be there
   const auto lfLen = tk.skipOne(LF); // may not be there
   if (lfLen && tk.skipAll(WSP)) // obs-fold!
     result += ' '; // replace one obs-fold with one SP
   else // bare-CR or CRLF
     result += all.substr(0, crLen+lfLen);
 }


 *** If we both agree on the construction steps up to this point, then
the version above would be a good one to apply. The attached patch does the above.
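
For readers without a Squid tree at hand, here is a minimal stand-alone
sketch of the step (4) loop. It is only a model: std::string replaces SBuf,
and the Cursor type with its skipOne()/skipAll() members is a hypothetical
stand-in for Http1::Tokenizer, not the real API. The assert() at the bottom
of the loop spells out the progress argument: the next character is either
ordinary, CR, or LF, and each of those is consumed somewhere in the body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Http1::Tokenizer: a read cursor over a string.
struct Cursor {
    const std::string &buf;
    std::size_t pos;

    explicit Cursor(const std::string &b) : buf(b), pos(0) {}

    bool atEnd() const { return pos >= buf.size(); }

    // consume at most one character equal to c; returns 0 or 1
    std::size_t skipOne(const char c) {
        if (atEnd() || buf[pos] != c)
            return 0;
        ++pos;
        return 1;
    }

    // consume the longest run of characters accepted by pred; returns its length
    template <typename Pred>
    std::size_t skipAll(Pred pred) {
        const std::size_t start = pos;
        while (!atEnd() && pred(buf[pos]))
            ++pos;
        return pos - start;
    }
};

// Model of the step (4) loop: replace each obs-fold with one SP.
std::string unfoldSketch(const std::string &in)
{
    const auto isWsp = [](char c) { return c == ' ' || c == '\t'; };
    const auto notCrLf = [](char c) { return c != '\r' && c != '\n'; };

    Cursor tk(in);
    std::string result;
    while (!tk.atEnd()) {
        const std::size_t start = tk.pos;

        // the common case: copy everything up to the next CR or LF
        const std::size_t lineLen = tk.skipAll(notCrLf);
        result.append(in, start, lineLen);

        const std::size_t eol = tk.pos;
        const std::size_t crLen = tk.skipOne('\r'); // may not be there
        const std::size_t lfLen = tk.skipOne('\n'); // may not be there
        if (lfLen && tk.skipAll(isWsp)) // obs-fold!
            result += ' ';
        else // bare-CR, bare-LF or CRLF: copy exactly what was seen
            result.append(in, eol, crLen + lfLen);

        assert(tk.pos > start); // every iteration consumes input
    }
    return result;
}
```

With this model, a folded field such as "A: b" CRLF SP "c" CRLF comes out as
"A: b c" CRLF, while a bare CR in the middle of a line is passed through
unchanged.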


Since I'm explaining how our loops are the same, I will continue with
the extra (over?) optimization steps I took earlier. Just for the record:


5) Now that the nonCRLF characters are no longer included in 'all', we
can use lfLen to determine whether this is a bare-CR sequence:

 while (!tk.atEnd()) {
   SBuf line;
   if (tk.prefix(line, nonCRLF))
     result += line;

   const SBuf all = tk.remaining();

   const auto crLen = tk.skipOne(CR); // may not be there
   const auto lfLen = tk.skipOne(LF); // may not be there
   if (lfLen) { // it's a CRLF or obs-fold
     if (tk.skipAll(WSP)) // obs-fold!
       result += ' '; // replace one obs-fold with one SP
     else // CRLF !
       result += all.substr(0, crLen+lfLen);

   } else if (crLen) { // bare-CR !
     result += all.substr(0, crLen);
   }
 }


6) Now we can use the available CRLF SBuf instead of all.substr() in the
nested if statement.


 while (!tk.atEnd()) {
   SBuf line;
   if (tk.prefix(line, nonCRLF))
     result += line;

   const SBuf all = tk.remaining();

   const auto crLen = tk.skipOne(CR); // may not be there
   if (tk.skipOne(LF)) {
     if (tk.skipAll(WSP)) // obs-fold!
       result += ' '; // replace one obs-fold with one SP
     else // CRLF
       result += Http1::CrLf();

   } else if (crLen) {
     result += all.substr(0, crLen);
   }
 }


And that (6) is now almost verbatim the initial patch proposal.
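
One way to check how far steps (4) and (6) really are the same loop is to
model both end-of-line branches side by side. The sketch below is again a
stand-alone model under assumptions: std::string replaces SBuf, and
skipOne(), skipWsp() and unfoldModel() are hypothetical names, not Squid
code. As far as this model goes, the two variants agree on CRLF-terminated
input, on obs-folds, and on bare CR. They appear to differ on a bare LF that
is not followed by whitespace: step (4) copies the LF as received, while
step (6) emits a full CRLF in its place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: consume one character c at pos, if present.
static std::size_t skipOne(const std::string &s, std::size_t &pos, const char c)
{
    if (pos < s.size() && s[pos] == c) {
        ++pos;
        return 1;
    }
    return 0;
}

// Hypothetical helper: consume a run of SP / HTAB at pos.
static std::size_t skipWsp(const std::string &s, std::size_t &pos)
{
    const std::size_t start = pos;
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t'))
        ++pos;
    return pos - start;
}

// useStep6 == false models the end-of-line handling of step (4);
// useStep6 == true models that of step (6).
std::string unfoldModel(const std::string &in, const bool useStep6)
{
    std::string result;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t start = pos;

        // ordinary characters are copied as-is in both variants
        while (pos < in.size() && in[pos] != '\r' && in[pos] != '\n')
            result += in[pos++];

        const std::size_t eol = pos;
        const std::size_t crLen = skipOne(in, pos, '\r');
        const std::size_t lfLen = skipOne(in, pos, '\n');

        if (lfLen && skipWsp(in, pos))
            result += ' ';                         // obs-fold, both variants
        else if (!useStep6)
            result.append(in, eol, crLen + lfLen); // step (4): copy what was seen
        else if (lfLen)
            result += "\r\n";                      // step (6): fixed CRLF
        else if (crLen)
            result += '\r';                        // step (6): bare CR

        assert(pos > start); // both variants always make progress
    }
    return result;
}
```

Whether turning a bare LF into CRLF is wanted is a separate question; the
sketch only makes the difference visible so it can be decided on purpose.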

Amos

-------------- next part --------------
=== modified file 'src/http/one/Parser.cc'
--- src/http/one/Parser.cc	2016-04-12 18:12:15 +0000
+++ src/http/one/Parser.cc	2016-05-19 11:11:47 +0000
@@ -1,143 +1,248 @@
 /*
  * Copyright (C) 1996-2016 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #include "squid.h"
 #include "Debug.h"
 #include "http/one/Parser.h"
 #include "http/one/Tokenizer.h"
 #include "mime_header.h"
 #include "SquidConfig.h"
 
 /// RFC 7230 section 2.6 - 7 magic octets
 const SBuf Http::One::Parser::Http1magic("HTTP/1.");
 
+const SBuf &Http::One::CrLf()
+{
+    static const SBuf crlf("\r\n");
+    return crlf;
+}
+
 void
 Http::One::Parser::clear()
 {
     parsingStage_ = HTTP_PARSE_NONE;
     buf_ = NULL;
     msgProtocol_ = AnyP::ProtocolVersion();
     mimeHeaderBlock_.clear();
 }
 
+/// characters HTTP permits tolerant parsers to accept as delimiters
+static const CharacterSet &
+RelaxedDelimiterCharacters()
+{
+    // RFC 7230 section 3.5
+    // tolerant parser MAY accept any of SP, HTAB, VT (%x0B), FF (%x0C),
+    // or bare CR as whitespace between request-line fields
+    static const CharacterSet RelaxedDels =
+        (CharacterSet::SP +
+         CharacterSet::HTAB +
+         CharacterSet("VT,FF","\x0B\x0C") +
+         CharacterSet::CR).rename("relaxed-WSP");
+
+    return RelaxedDels;
+}
+
+/// characters used to separate HTTP fields
+const CharacterSet &
+Http::One::Parser::DelimiterCharacters()
+{
+    return Config.onoff.relaxed_header_parser ?
+           RelaxedDelimiterCharacters() : CharacterSet::SP;
+}
+
 bool
 Http::One::Parser::skipLineTerminator(Http1::Tokenizer &tok) const
 {
-    static const SBuf crlf("\r\n");
-    if (tok.skip(crlf))
+    if (tok.skip(Http1::CrLf()))
         return true;
 
     if (Config.onoff.relaxed_header_parser && tok.skipOne(CharacterSet::LF))
         return true;
 
     return false;
 }
 
+/// all characters except the LF line terminator
+static const CharacterSet &
+LineCharacters()
+{
+    static const CharacterSet line = CharacterSet::LF.complement("non-LF");
+    return line;
+}
+
+/**
+ * Replace each obs-fold with a single SP.
+ *
+ * RFC 7230 section 3.2.4
+ * "A server that receives an obs-fold in a request message that is not
+ *  within a message/http container MUST ... replace
+ *  each received obs-fold with one or more SP octets prior to
+ *  interpreting the field value or forwarding the message downstream."
+ *
+ * "A proxy or gateway that receives an obs-fold in a response message
+ *  that is not within a message/http container MUST ... replace each
+ *  received obs-fold with one or more SP octets prior to interpreting
+ *  the field value or forwarding the message downstream."
+ */
+void
+Http::One::Parser::unfoldMime()
+{
+    Http1::Tokenizer tok(mimeHeaderBlock_);
+    mimeHeaderBlock_.clear();
+
+    static const CharacterSet nonCRLF = (CharacterSet::CR + CharacterSet::LF).complement().rename("non-CRLF");
+
+    while (!tok.atEnd()) {
+        SBuf line;
+        // preserve all characters not part of a mime line terminator
+        if (tok.prefix(line, nonCRLF))
+            mimeHeaderBlock_.append(line);
+
+        const SBuf savePoint(tok.remaining());
+
+        const auto crLen = tok.skipAll(CharacterSet::CR); // may not be there
+        const auto lfLen = tok.skipOne(CharacterSet::LF); // may not be there
+
+        if (lfLen && tok.skipAll(CharacterSet::WSP)) // obs-fold!
+            mimeHeaderBlock_.append(' '); // replace one obs-fold with one SP
+        else // bare-CR or CRLF
+            mimeHeaderBlock_.append(savePoint.substr(0, crLen+lfLen));
+    }
+}
+
 bool
 Http::One::Parser::grabMimeBlock(const char *which, const size_t limit)
 {
     // MIME headers block exist in (only) HTTP/1.x and ICY
     const bool expectMime = (msgProtocol_.protocol == AnyP::PROTO_HTTP && msgProtocol_.major == 1) ||
                             msgProtocol_.protocol == AnyP::PROTO_ICY ||
                             hackExpectsMime_;
 
     if (expectMime) {
         /* NOTE: HTTP/0.9 messages do not have a mime header block.
          *       So the rest of the code will need to deal with '0'-byte headers
          *       (ie, none, so don't try parsing em)
          */
-        // XXX: c_str() reallocates. performance regression.
-        if (SBuf::size_type mimeHeaderBytes = headersEnd(buf_.c_str(), buf_.length())) {
+        bool containsObsFold = false;
+        if (SBuf::size_type mimeHeaderBytes = headersEnd(buf_, containsObsFold)) {
 
             // Squid could handle these headers, but admin does not want to
             if (firstLineSize() + mimeHeaderBytes >= limit) {
                 debugs(33, 5, "Too large " << which);
                 parseStatusCode = Http::scHeaderTooLarge;
                 buf_.consume(mimeHeaderBytes);
                 parsingStage_ = HTTP_PARSE_DONE;
                 return false;
             }
 
             mimeHeaderBlock_ = buf_.consume(mimeHeaderBytes);
+
+            /* RFC 7230 section 3:
+             * "A recipient that receives whitespace between the start-line and
+             * the first header field MUST ... consume each whitespace-preceded
+             * line without further processing of it."
+             *
+             * We need to always use the relaxed delimiters here to prevent
+             * line smuggling through strict parsers.
+             * Note that 'whitespace' in RFC 7230 includes CR. So that means
+             * sequences of CRLF will be pruned, but not sequences of bare-LF.
+             */
+            Http1::Tokenizer tok(mimeHeaderBlock_);
+            while (tok.skipOne(RelaxedDelimiterCharacters())) {
+                (void)tok.skipAll(LineCharacters()); // optional line content
+                // LF terminator is required.
+                // trust headersEnd() to ensure that we have at least one LF
+                (void)tok.skipOne(CharacterSet::LF);
+            }
+            // If mimeHeaderBlock_ had just whitespace line(s) followed by CRLF,
+            // then we skipped everything, including that terminating LF.
+            // Restore the terminating CRLF if needed.
+            if (tok.atEnd())
+                mimeHeaderBlock_ = Http1::CrLf();
+            else
+                mimeHeaderBlock_ = tok.remaining();
+            // now mimeHeaderBlock_ has 0+ fields followed by the LF terminator
+
+            if (containsObsFold)
+                unfoldMime();
+
             debugs(74, 5, "mime header (0-" << mimeHeaderBytes << ") {" << mimeHeaderBlock_ << "}");
 
         } else { // headersEnd() == 0
             if (buf_.length()+firstLineSize() >= limit) {
                 debugs(33, 5, "Too large " << which);
                 parseStatusCode = Http::scHeaderTooLarge;
                 parsingStage_ = HTTP_PARSE_DONE;
             } else
                 debugs(33, 5, "Incomplete " << which << ", waiting for end of headers");
             return false;
         }
 
     } else
         debugs(33, 3, "Missing HTTP/1.x identifier");
 
     // NP: we do not do any further stages here yet so go straight to DONE
     parsingStage_ = HTTP_PARSE_DONE;
 
     return true;
 }
 
 // arbitrary maximum-length for headers which can be found by Http1Parser::getHeaderField()
 #define GET_HDR_SZ  1024
 
 // BUG: returns only the first header line with given name,
 //      ignores multi-line headers and obs-fold headers
 char *
 Http::One::Parser::getHeaderField(const char *name)
 {
     if (!headerBlockSize() || !name)
         return NULL;
 
     LOCAL_ARRAY(char, header, GET_HDR_SZ);
     const int namelen = strlen(name);
 
     debugs(25, 5, "looking for " << name);
 
     // while we can find more LF in the SBuf
-    static CharacterSet iso8859Line = CharacterSet("non-LF",'\0','\n'-1) + CharacterSet(NULL, '\n'+1, (unsigned char)0xFF);
     Http1::Tokenizer tok(mimeHeaderBlock_);
     SBuf p;
-    static const SBuf crlf("\r\n");
 
-    while (tok.prefix(p, iso8859Line)) {
+    while (tok.prefix(p, LineCharacters())) {
         if (!tok.skipOne(CharacterSet::LF)) // move tokenizer past the LF
             break; // error. reached invalid octet or end of buffer insted of an LF ??
 
         // header lines must start with the name (case insensitive)
         if (p.substr(0, namelen).caseCmp(name, namelen))
             continue;
 
         // then a COLON
         if (p[namelen] != ':')
             continue;
 
         // drop any trailing *CR sequence
-        p.trim(crlf, false, true);
+        p.trim(Http1::CrLf(), false, true);
 
         debugs(25, 5, "checking " << p);
         p.consume(namelen + 1);
 
         // TODO: optimize SBuf::trim to take CharacterSet directly
         Http1::Tokenizer t(p);
         t.skipAll(CharacterSet::WSP);
         p = t.remaining();
 
         // prevent buffer overrun on char header[];
         p.chop(0, sizeof(header)-1);
 
         // return the header field-value
         SBufToCstring(header, p);
         debugs(25, 5, "returning " << header);
         return header;
     }
 
     return NULL;
 }

=== modified file 'src/http/one/Parser.h'
--- src/http/one/Parser.h	2016-04-12 15:07:13 +0000
+++ src/http/one/Parser.h	2016-05-06 13:45:26 +0000
@@ -94,55 +94,62 @@
 #if USE_HTTP_VIOLATIONS
     /// the right debugs() level for parsing HTTP violation messages
     int violationLevel() const;
 #endif
 
     /**
      * HTTP status code resulting from the parse process.
      * to be used on the invalid message handling.
      *
      * Http::scNone indicates incomplete parse,
      * Http::scOkay indicates no error,
      * other codes represent a parse error.
      */
     Http::StatusCode parseStatusCode;
 
 protected:
     /// detect and skip the CRLF or (if tolerant) LF line terminator
     /// consume from the tokenizer and return true only if found
     bool skipLineTerminator(Http1::Tokenizer &tok) const;
 
+    /// the characters which are to be considered valid whitespace
+    /// (WSP / BSP / OWS)
+    static const CharacterSet &DelimiterCharacters();
+
     /**
      * Scan to find the mime headers block for current message.
      *
      * \retval true   If mime block (or a blocks non-existence) has been
      *                identified accurately within limit characters.
      *                mimeHeaderBlock_ has been updated and buf_ consumed.
      *
      * \retval false  An error occured, or no mime terminator found within limit.
      */
     bool grabMimeBlock(const char *which, const size_t limit);
 
     /// RFC 7230 section 2.6 - 7 magic octets
     static const SBuf Http1magic;
 
     /// bytes remaining to be parsed
     SBuf buf_;
 
     /// what stage the parser is currently up to
     ParseState parsingStage_;
 
     /// what protocol label has been found in the first line (if any)
     AnyP::ProtocolVersion msgProtocol_;
 
     /// buffer holding the mime headers (if any)
     SBuf mimeHeaderBlock_;
 
     /// Whether the invalid HTTP as HTTP/0.9 hack expects a mime header block
     bool hackExpectsMime_;
+
+private:
+    void unfoldMime();
 };
 
 } // namespace One
 } // namespace Http
 
 #endif /*  _SQUID_SRC_HTTP_ONE_PARSER_H */
 

=== modified file 'src/http/one/RequestParser.cc'
--- src/http/one/RequestParser.cc	2016-01-01 00:12:18 +0000
+++ src/http/one/RequestParser.cc	2016-05-06 13:45:26 +0000
@@ -97,64 +97,40 @@
      *   A URI is composed from a limited set of characters consisting of
      *   digits, letters, and a few graphic symbols.
      * "
      */
     static const CharacterSet UriChars =
         CharacterSet("URI-Chars","") +
         // RFC 3986 section 2.2 - reserved characters
         CharacterSet("gen-delims", ":/?#[]@") +
         CharacterSet("sub-delims", "!$&'()*+,;=") +
         // RFC 3986 section 2.3 - unreserved characters
         CharacterSet::ALPHA +
         CharacterSet::DIGIT +
         CharacterSet("unreserved", "-._~") +
         // RFC 3986 section 2.1 - percent encoding "%" HEXDIG
         CharacterSet("pct-encoded", "%") +
         CharacterSet::HEXDIG;
 
     return UriChars;
 }
 
-/// characters HTTP permits tolerant parsers to accept as delimiters
-static const CharacterSet &
-RelaxedDelimiterCharacters()
-{
-    // RFC 7230 section 3.5
-    // tolerant parser MAY accept any of SP, HTAB, VT (%x0B), FF (%x0C),
-    // or bare CR as whitespace between request-line fields
-    static const CharacterSet RelaxedDels =
-        CharacterSet::SP +
-        CharacterSet::HTAB +
-        CharacterSet("VT,FF","\x0B\x0C") +
-        CharacterSet::CR;
-
-    return RelaxedDels;
-}
-
-/// characters used to separate HTTP fields
-const CharacterSet &
-Http::One::RequestParser::DelimiterCharacters()
-{
-    return Config.onoff.relaxed_header_parser ?
-           RelaxedDelimiterCharacters() : CharacterSet::SP;
-}
-
 /// characters which Squid will accept in the HTTP request-target (URI)
 const CharacterSet &
 Http::One::RequestParser::RequestTargetCharacters()
 {
     if (Config.onoff.relaxed_header_parser) {
 #if USE_HTTP_VIOLATIONS
         static const CharacterSet RelaxedExtended =
             UriValidCharacters() +
             // accept whitespace (extended), it will be dealt with later
             DelimiterCharacters() +
             // RFC 2396 unwise character set which must never be transmitted
             // in un-escaped form. But many web services do anyway.
             CharacterSet("RFC2396-unwise","\"\\|^<>`{}") +
             // UTF-8 because we want to be future-proof
             CharacterSet("UTF-8", 128, 255);
 
         return RelaxedExtended;
 #else
         static const CharacterSet RelaxedCompliant =
             UriValidCharacters() +

=== modified file 'src/http/one/RequestParser.h'
--- src/http/one/RequestParser.h	2016-01-01 00:12:18 +0000
+++ src/http/one/RequestParser.h	2016-05-06 13:45:26 +0000
@@ -39,35 +39,34 @@
     virtual bool parse(const SBuf &aBuf);
 
     /// the HTTP method if this is a request message
     const HttpRequestMethod & method() const {return method_;}
 
     /// the request-line URI if this is a request message, or an empty string.
     const SBuf &requestUri() const {return uri_;}
 
 private:
     void skipGarbageLines();
     int parseRequestFirstLine();
 
     /* all these return false and set parseStatusCode on parsing failures */
     bool parseMethodField(Http1::Tokenizer &);
     bool parseUriField(Http1::Tokenizer &);
     bool parseHttpVersionField(Http1::Tokenizer &);
     bool skipDelimiter(const size_t count);
     bool skipTrailingCrs(Http1::Tokenizer &tok);
 
     bool http0() const {return !msgProtocol_.major;}
-    static const CharacterSet &DelimiterCharacters();
     static const CharacterSet &RequestTargetCharacters();
 
     /// what request method has been found on the first line
     HttpRequestMethod method_;
 
     /// raw copy of the original client request-line URI field
     SBuf uri_;
 };
 
 } // namespace One
 } // namespace Http
 
 #endif /*  _SQUID_SRC_HTTP_ONE_REQUESTPARSER_H */
 

=== modified file 'src/http/one/forward.h'
--- src/http/one/forward.h	2016-01-01 00:12:18 +0000
+++ src/http/one/forward.h	2016-05-06 13:45:26 +0000
@@ -1,36 +1,40 @@
 /*
  * Copyright (C) 1996-2016 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #ifndef SQUID_SRC_HTTP_ONE_FORWARD_H
 #define SQUID_SRC_HTTP_ONE_FORWARD_H
 
 #include "base/RefCount.h"
+#include "sbuf/forward.h"
 
 namespace Http {
 namespace One {
 
 class Tokenizer;
 
 class Parser;
 typedef RefCount<Http::One::Parser> ParserPointer;
 
 class TeChunkedParser;
 
 class RequestParser;
 typedef RefCount<Http::One::RequestParser> RequestParserPointer;
 
 class ResponseParser;
 typedef RefCount<Http::One::ResponseParser> ResponseParserPointer;
 
+/// CRLF textual representation
+const SBuf &CrLf();
+
 } // namespace One
 } // namespace Http
 
 namespace Http1 = Http::One;
 
 #endif /* SQUID_SRC_HTTP_ONE_FORWARD_H */
 

=== modified file 'src/mime_header.cc'
--- src/mime_header.cc	2016-01-01 00:12:18 +0000
+++ src/mime_header.cc	2016-05-08 10:00:42 +0000
@@ -1,58 +1,61 @@
 /*
  * Copyright (C) 1996-2016 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 /* DEBUG: section 25    MiME Header Parsing */
 
 #include "squid.h"
 #include "Debug.h"
 #include "profiler/Profiler.h"
 
 size_t
-headersEnd(const char *mime, size_t l)
+headersEnd(const char *mime, size_t l, bool &containsObsFold)
 {
     size_t e = 0;
     int state = 1;
 
     PROF_start(headersEnd);
 
     while (e < l && state < 3) {
         switch (state) {
 
         case 0:
 
             if ('\n' == mime[e])
                 state = 1;
 
             break;
 
         case 1:
             if ('\r' == mime[e])
                 state = 2;
             else if ('\n' == mime[e])
                 state = 3;
-            else
+            else if (' ' == mime[e] || '\t' == mime[e]) {
+                containsObsFold = true;
+                state = 0;
+            } else
                 state = 0;
 
             break;
 
         case 2:
             if ('\n' == mime[e])
                 state = 3;
             else
                 state = 0;
 
             break;
 
         default:
             break;
         }
 
         ++e;
     }
     PROF_stop(headersEnd);
 

=== modified file 'src/mime_header.h'
--- src/mime_header.h	2016-01-01 00:12:18 +0000
+++ src/mime_header.h	2016-05-10 12:20:08 +0000
@@ -1,17 +1,45 @@
 /*
  * Copyright (C) 1996-2016 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 /* DEBUG: section 25    MiME Header Parsing */
 
 #ifndef SQUID_MIME_HEADER_H_
 #define SQUID_MIME_HEADER_H_
 
-size_t headersEnd(const char *, size_t);
+/**
+ * Scan for the end of mime header block.
+ *
+ * Which is one of the following octet patterns:
+ * - CRLF CRLF, or
+ * - CRLF LF, or
+ * - LF CRLF, or
+ * - LF LF
+ *
+ * Also detects whether an obs-fold pattern exists within the mime block
+ * - CR*LF (SP / HTAB)
+ *
+ * \param containsObsFold will be set to true if obs-fold pattern is found. Otherwise not changed.
+ */
+size_t headersEnd(const char *, size_t, bool &containsObsFold);
+
+inline size_t
+headersEnd(const SBuf &buf, bool &containsObsFold)
+{
+    return headersEnd(buf.rawContent(), buf.length(), containsObsFold);
+}
+
+/// \deprecated caller needs to be fixed to handle obs-fold
+inline size_t
+headersEnd(const char *buf, size_t sz)
+{
+    bool ignored;
+    return headersEnd(buf, sz, ignored);
+}
 
 #endif /* SQUID_MIME_HEADER_H_ */
 

=== modified file 'src/tests/stub_mime.cc'
--- src/tests/stub_mime.cc	2016-01-01 00:12:18 +0000
+++ src/tests/stub_mime.cc	2016-05-06 13:45:26 +0000
@@ -1,15 +1,15 @@
 /*
  * Copyright (C) 1996-2016 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #include "squid.h"
 
 #define STUB_API "mime.cc"
 #include "tests/STUB.h"
 
-size_t headersEnd(const char *mime, size_t l) STUB_RETVAL(0)
+size_t headersEnd(const char *, size_t, bool &) STUB_RETVAL(0)
 


