[squid-dev] [PATCH] Simpler and more robust request line parser

Thu Jul 23 04:05:56 UTC 2015

Hello,

    The attached patch implements a simpler and more robust HTTP request
line parser. After getting a series of complaints and seeing admins
adding more and more exceptional URL characters to the existing trunk
parser, I took the plunge and rewrote the existing code using the
approach recently discussed on this list.

The primary changes are: Removed incremental parsing and revised parsing
sequence to accept virtually any URI (by default and also configurable
as before).

Known side effects:

* Drastically simpler code.
* Some unit test case adjustments.
* The new parser no longer treats some request lines ending with
  "HTTP/1.1" as HTTP/0.9 requests for URIs that end with "HTTP/1.1".
* The new parser no longer re-allocates character sets while parsing
  each request.

Also removed hard-coded 16-character method length limit because I did
not know why that limit was needed _and_ there was a TODO to make it
configurable. Please let me know if there was a good reason for that
limit, and I will restore it.

No changes to parsing HTTP header fields (a.k.a. the MIME block) were
intended.

This patch requires "suffix parsing and skipping" patch from my previous
email to the list. Like that patch, this one is based on trunk r14083. I
will update to the current trunk if the patches are approved. Please let
me know if you want to review the updated versions as well.

The patch changes are further detailed below.

* Removal of incremental request line parsing.

Squid parsed the request line incrementally. That optimization was
unnecessary:
  - most request lines are short enough to fit into one network I/O,
  - the long lines contain only a single long field (the URI), and
  - the user code must not use incomplete parsing results anyway.

Incremental parsing made code much more complex and possibly slower than
necessary.

The only place where incremental parsing of request lines potentially
makes sense is the URI field itself, and only if we want to accept URIs
exceeding request buffer capacity. Neither the old code, nor the
simplified one do that right now.

* Accept virtually any URI (when allowed).

1. USE_HTTP_VIOLATIONS now allow any URI characters except spaces.
2. relaxed_header_parser allows spaces in URIs.

The combination of #1 and #2 allows virtually any URI that Squid can
isolate by stripping method (prefix) and HTTP/version (suffix) off the
request line. This approach allows Squid to forward slightly malformed
(in numerous ways) URIs instead of misplacing on the Squid admin the
burden of explaining why something does not work going through Squid but
works fine when going directly or through another popular proxy (or
through an older version of Squid!).

URIs in what Squid considers an HTTP/0.9 request obey the same rules.
Whether the rules should differ for HTTP/0 is debatable, but the current
implementation is the simplest possible one, and the code makes it easy
to add complex rules.

* Code simplification.

RequestParser::parseRequestFirstLine() is now a simple sequence of
sequential if statements. There is no longer a path dedicated for the
[primarily academic?] strict parser. The decisions about parsing
individual fields and delimiters are mostly isolated to the
corresponding methods.

* Unit test cases adjustments.

Removal of incremental request line parsing means that we should not
check parsed fields when parsing fails or has not completed yet.

Some test cases made arguably weird decisions apparently to accommodate
the old parser. The expectations of those test cases are [more] natural now.

I did not attempt to fix rampant test code duplication, but I did add
optional (and disabled by default) debugging, to help pin-point failures
to test sub-cases that CPPUNIT cannot see.

Changing request methods to "none" in test sub-cases with invalid input
was not technically necessary because the new code ignores the method
when parsing fails, but it may help whoever would decide to reduce test
code duplication (by replacing hand-written expected outcomes for failed
test cases with a constant assignment or function call).

Thank you,

Alex.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: robust-reqline-parser-t2.patch
Type: text/x-diff
Size: 55681 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20150722/260da2a4/attachment-0001.patch>