[squid-dev] [PATCH] HTTP Parser upgrade
Amos Jeffries
squid3 at treenet.co.nz
Wed Oct 15 09:34:25 UTC 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 15/10/2014 10:19 a.m., Alex Rousskov wrote:
> On 10/11/2014 02:17 PM, Amos Jeffries wrote:
>> On 7/06/2014 4:55 a.m., Amos Jeffries wrote:
>>> Latest version of the Parser-NG project branch. Since last
>>> message I have dropped the copy-constructor related pieces.
>>
>>> Polygraph testing on the branch showed almost no performance
>>> impact (+0.04ms/req, -0.03% request rate). Nevertheless a half
>>> dozen points of added memory reallocate or data copy are
>>> marked for future removal as the Parser-NG and SBuf
>>> improvements continue.
>
> Just to double check: Is that really 0.04ms or 40ms? The former is
> beyond Polygraph precision (i.e. it is the same as "zero"). The
> latter is a significant overhead. I hope it is the former!
Yes the former (insignificant).
ParserNG (Oct 11th):
Mean response number: 1925.64
Mean response time: 103.36
trunk (Oct 11th):
Mean response number: 1926.03
Mean response time: 103.34
trunk (Oct )
Mean response number: 1926.19
Mean response time: 103.33
>
>> + * \deprecated use SBuf constructor instead */
>> -HttpRequestMethod::HttpRequestMethod(char const *begin, char
>> const *end) : theMethod(Http::METHOD_NONE)
>> +HttpRequestMethod::HttpRequestMethod(char const *begin) :
>> theMethod(Http::METHOD_NONE)
>
> Please rework this constructor into a HttpRequestMethodXXX()
> function call. Otherwise, it is very difficult to spot places where
> it is still being used, including in future patches. Since you are
> also changing the constructor profile, it should be possible to do
> that replacement directly in your patch, avoiding re-editing many
> files by hand.
>
> This is especially important if this old HttpRequestMethod(char*)
> has a somewhat different semantics/compliance than the new
> HttpRequestMethod(SBuf).
>
Done. This is not quite as simple as suggested, the renaming causes it
to become a method instead of constructor, so code needs adjusting to
allocate HttpRequestMethod objects separately before calling it.
>
>> - method = HttpRequestMethod(start, t); + SBuf m(start,
>> start-t); // XXX: SBuf ctor allocates and data-copies.
>> performance regression. + method = HttpRequestMethod(m);
>
>
> What is the purpose of introducing this SBuf performance
> regression?
>
To avoid repeated strcspn(X, w_space) over the input buffer.
> * If the deprecated HttpRequestMethod constructor works fine, use
> that and avoid this regression while waiting for callers to pass in
> an SBuf.
>
> * If the deprecated HttpRequestMethod constructor is not as good
> as the new HttpRequestMethod constructor for some reason (other
> than SBuf parameter itself), then should we re-implement the
> deprecated HttpRequestMethod constructor to create an SBuf and then
> call the new HttpRequestMethod constructor with that SBuf instead?
>
It works fine. Done.
>
>> + * or from a range of chars such as "GET" from "GETFOOBARBAZ" +
>> * + * Assumes the s parameter contains only the method string
>
> This description is not clear, especially because GETFOOBARBAZ
> does not "contain only the method string" GET. The
> HttpRequestMethod() code does not seem to match the documented
> "GETFOOBARBAZ" example because it uses caseCmp(s) without any
> prefix size limits.
>
s is expected to contain an SBuf whose contents is *only* the chars
forming the method (buf="GET", length=3) not the "FOOBARBAZ" chars
which follow in the I/O buffer.
This is assumed/needs to be true whether or not the chars which
preceed or follow are the expected whitespace delimiters or some
"FOOBARBAZ" fluff.
How else would you write that?
>
>> + // XXX: still check for missing method name?
>
> Please rephrase. It is not clear what you are trying to ask here.
> FWIW, the new code does handle the case of an empty method name as
> well as a method name not matching any known header.
>
That was it. Dropped.
>
>> const char *t = start + strcspn(start, w_space);
> ...
>> - start = t + strspn(t, w_space); + start = t + strspn(t,
>> w_space); // XXX: breaks if whitespace exists in URL
>
> Wrong XXX? The code appears to skip the space found at t
> declaration time. That is, after the declaration, "t" should point
> to either a space character or, if there was not one in a URL, to a
> terminating zero character. The XXX seems to be wrong in either
> case.
Er yes. mistakes strspn() for strcspn(). corrected.
>
>> + SBuf url = hp->requestUri(); // use full provided URI if we
>> abort + do { // use a loop so we can break out of it +
>> ::Parser::Tokenizer tok(url); + if (tok.remaining()[0] ==
>> '/') + break;
>
> Are we guaranteed to have a non-empty URL at this point? If not, it
> is unsafe to dereference its first character.
Right. Using skip('/') instead with current Tokenizer guarantees.
>
>> + // reject URI which are not well-formed even after the
>> processing above + if (url[0] != '/') {
>
> Similar concern here.
>
Thats an old bug in dead code, but adding isEmpty() check just for
completeness when someone gets around to makeing it non-dead.
>
>> - char ipbuf[MAX_IPSTRLEN]; + static char
>> ipbuf[MAX_IPSTRLEN];
>
> If you are changing this, please move the declaration down, where
> it is actually needed.
>
Done.
>
>> + static const CharacterSet authority =
>> CharacterSet("authority","-._~%:@[]!$&'()*+,;=") + +
>> CharacterSet::HEXDIG + CharacterSet::ALPHA +
>> CharacterSet::DIGIT;
>
> Syntax error (missing "+" before CharacterSet::HEXDIG)? If yes,
> does that imply that the above code has not been tested?
It is on the end of the preceeding line.
>
>> + const char *url = SBuf(hp->requestUri()).c_str();
>
> Semantically, this returns a pointer inside the now-gone object
> (the temporary anonymous SBuf). Do not do that even if it usually
> "works" with the current SBuf implementation.
Okay, using a pair of local-scope variables instead.
>
>> - const bool isFtp = !hp; + const bool isFtp = (hp ==
>> NULL);
>
> Why this change? The existing code was correct and arguably more
> "future proof" if we start using some kind of smart pointers for
> "hp".
>
Just consistency between X==NULL and X!=NULL checks.
hp is already a smart pointer in the branch and being non-C++11 it
does not provide operator true(), just operator !(). So we end up with
a mix of if(!X) and if(X != NULL).
Reverting since it was the other way to start with.
>
>> + const HttpRequestMethod &method = !isFtp ? hp->method() :
>> Http::METHOD_NONE; // XXX: or should this be GET ?
>
> The answer is in the caller you modified, I think:
>
>> - clientProcessRequest(this, NULL /*parser*/,
>> context.getRaw(), - request->method,
>> request->http_ver); + clientProcessRequest(this,
>> Http1::RequestParserPointer(), context.getRaw());
>
>
> So it should be something like http->request->method or
> equivalent. For FTP, we already have a parsed request at this
> point... However, please double check that
>
> a) you are actually using that new "method" variable -- it is
> difficult to say by looking from the patch alone, but I see many
> cases where the old "method" is getting replaced with
> "hp->method()". Perhaps all of the uses are gone now?
>
> b) the new "method" declaration is as close to the uses in (a) as
> possible.
>
Confirmed. It looks like the HTTP-only paths should all be using
hp->method(), the shared and FTP-only paths using request->method.
Removing method local.
>
>> Http::ProtocolVersion http_ver; - if (ClientSocketContext
>> *context = parseOneRequest(http_ver)) { + if
>> (ClientSocketContext *context = parseOneRequest()) {
>
> Do you still need the http_ver variable after removing the
> corresponding parameter?
>
No. Dropped.
>
> * clientProcessRequest() no longer consumes parsed input, right?
> Where does that consumption happen now?
>
Inside parseOneRequest().
* HttpServer syncs the I/O buffer to the Tokenizer in
parseHttpRequest() after each parse attempt.
* FtpServer uses byte counting by the Tokenizer to manage them
separately with consumeInput().
>
>> + * BUG: limits output to just 1KB when Squid accepts up to
>> 64KB line length. * - * \retval -1 an error occurred.
>> request_parse_status indicates HTTP status result. - *
>> \retval 1 successful parse. member fields contain the
>> request-line items - * \retval 0 more data is needed to
>> complete the parse + * \return A pointer to a field-value of
>> the first matching field-name, or NULL.
>
> In what sense does the Parser limit the output? It is documented
> to return a pointer to the field-value. Are we truncating that
> field-value?
>
Copying it into a static array of size 1KB.
...
// arbitrary maximum-length for headers which can be found by
Http1Parser::getHeaderField()
#define GET_HDR_SZ 1024
...
LOCAL_ARRAY(char, header, GET_HDR_SZ);
For this patching I have left that old limit in place. It appears to
be only used for a few things like Host: header value fetching in the
prepare*URL() functions, so not a problem for most users.
>
> * Http1::Parser class has virtual methods but is missing a virtual
> destructor.
>
Gah. C++11'ism. Default d'tor can be assumed virtual and the children
provide the virtuals.
>
>> + /// raw copy of the origina client reqeust-line URI field
>
> "origina" typo
>
Fixed.
>
>> + Parser() : parsingStage_(HTTP_PARSE_NONE) {}
>
>> + RequestParser() : Parser() {clear();} + virtual void
>> clear();
>
>
>> +Http::One::Parser::clear() { - clear(); // empty the state.
>> - state = HTTP_PARSE_NEW; - buf = aBuf; - bufsiz = len; -
>> debugs(74, 5, HERE << "Request buffer is " << buf); +
>> parsingStage_ = HTTP_PARSE_NONE; + buf_ = NULL; + msgProtocol_
>> = AnyP::ProtocolVersion(); + mimeHeaderBlock_.clear(); }
>
>
>> +void +Http::One::RequestParser::clear() +{ +
>> Http1::Parser::clear(); + + request_parse_status =
>> Http::scNone; + req.start = req.end = -1; + req.m_start =
>> req.m_end = -1; + req.u_start = req.u_end = -1; +
>> req.v_start = req.v_end = -1; + method_ = HttpRequestMethod();
>> +}
>
>
> The above is OK. If you want to polish it further to avoid calling
> virtual methods from constructors (which is bad style), make these
> changes:
>
> 1. Initialize all members in each constructor as needed, without
> calling clear().
>
> 2. Make Parser::clear() pure, without a body.
>
> 3. In RequestParser::clear(), just do:
>
> *this = RequestParser();
>
I'm never sure during this sequence if complex types like SBuf need an
explicit call to their constructor at step (1) to free old
smart-pointer values or not.
Trying it this way.
>
> For the proposed commit message, please add an brief explanation
> for "why we are doing this", in addition to the already provided
> "what we are doing" summary.
>
> And please make sure the commit message explicitly says that we
> are adding a few potential performance regressions with this
> change.
Is that a +1 considering these are mostly cosmetic?
Amos
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)
iQEcBAEBAgAGBQJUPj+hAAoJELJo5wb/XPRjG+8H/06I959LgxkJtMCq08npL/p6
+W8fsz7SWh5C6NwFzTCyonF6CCwQF7RBWxbXMaDq/RknMSyYGy7pwn+7JhNQz9TM
16I3HnP8cSheL0TDub8DlVHVIqiKW0gE12Z4F86j21YO/EqkGJuK7sZoXoePPu7w
2eMW4xg1NjsS0dbD1HJp1BRw3QUkZg30oenWhWRQbpWn3lTlEXPKOXZJxf/eILEN
q8yX9BYmTVM7Zt9ZNlU0lIfzhXTwZkaibZ4+MNpqfjyjLhVYZ77leY5Sn4xCVlr1
ERW+u4ap8HeB4oDMoP/GRchuF9GqTlSjx6hvxmWMtbxjaISUO0W7Twz5XVTfJLs=
=uro/
-----END PGP SIGNATURE-----
More information about the squid-dev
mailing list