[squid-dev] [PATCH] HTTP Response Parser upgrade
Alex Rousskov
rousskov at measurement-factory.com
Thu Jan 22 17:47:47 UTC 2015
On 01/20/2015 06:58 AM, Amos Jeffries wrote:
> Updated patch attached resolving the issues found by coadvisor in the
> original.
> +size_t
> +Client::needBufferSpace(const SBuf &readBuf, const size_t minSpace) const
> +{
> + size_t space = readBuf.spaceSize(); // available space w/o heroic measures
If the above line is the only readBuf usage in this method, pass
readBuf.spaceSize() as a parameter instead of the whole buffer.
> + if (space < minSpace) {
> + const size_t maxSpace = SBuf::maxSize; // absolute best
> + space = min(minSpace, maxSpace); // do not promise more than asked
> + }
Here and elsewhere in Client::needBufferSpace(), the returned space size
may be lower than minSpace. s/minSpace/wantedSpace/ or something like that.
> + * If the pipe is totally full, don't register the read handler.
> + * The BodyPipe will call our noteMoreBodySpaceAvailable() method
> + * when it has free space again.
The above comment does not match the code. The code does not deal with
read handler registration, only size calculations.
> + size_t adaptation_space =
> + virginBodyDestination->buf().potentialSpaceSize();
Make const if possible.
> + /// determine how much space the buffer needs to reserve
> + size_t needBufferSpace(const SBuf &readBuf, const size_t minSpace) const;
need+something implies a notification or state changing method, which is
not what this method is about. Based on the caller, I suggest to
s/needBufferSpace/calcReadSize/ or
s/needBufferSpace/readSize/
> void
> HttpStateData::readReply(const CommIoCbParams &io)
> {
...
> + assert(!flags.do_next_read); // XXX: should have been set false by mayReadVirginBody()
...
> + assert(Comm::IsConnOpen(serverConnection));
> + assert(io.conn->fd == serverConnection->fd);
These are all new asserts in an async job callback. Should we use
Must()s instead to avoid crashing the whole Squid process when dealing
with what seems to be a transaction-specific problem?
> + // how much we want to read
> + const int read_size = needBufferSpace(inBuf, (limitBuffer - inBuf.length()));
Unsigned->signed conversion. Can you use size_t for read_size?
> + if (read_size < 1) {
... and then convert this to !read_size or (read_size == 0) for clarity?
> + /// amount of message payload/body received so far.
> + int64_t payloadSeen;
> + /// positive when we read more than we wanted
> + int64_t payloadTruncated;
Have you considered making these unsigned?
> + /// parse scan to find the mime headers block for current message
> + bool findMimeBlock(const char *which, size_t limit);
Find is a usually a const method. This method does a lot more than just
finding MIME headers. To fix that and to supply information that the
method name does not already reveal, I would do:
/// returns whether the header was successfully parsed
bool parseMimeHeader(const char *headerKind, const size_t sizeLimit);
Note that the limit parameter is const.
> + int64_t mimeHeaderBytes = 0;
> + // XXX: c_str() reallocates. performance regression.
> + if ((mimeHeaderBytes = headersEnd(buf_.c_str(), buf_.length())) == 0) {
> + if (buf_.length()+firstLineSize() >= limit) {
> + debugs(33, 5, "Too large " << which);
> + parseStatusCode = Http::scHeaderTooLarge;
> + parsingStage_ = HTTP_PARSE_DONE;
> + } else
> + debugs(33, 5, "Incomplete " << which << ", waiting for end of headers");
> + return false;
> + }
> + mimeHeaderBlock_ = buf_.consume(mimeHeaderBytes);
> + debugs(74, 5, "mime header (0-" << mimeHeaderBytes << ") {" << mimeHeaderBlock_ << "}");
The above is better written as
// XXX: c_str() reallocates. performance regression.
if (const int64_t mimeHeaderBytes = ...) {
mimeHeaderBlock_ = buf_.consume(mimeHeaderBytes);
debugs(...);
} else {
...
return false;
}
but there is more that needs to be done in the above code:
> + // Squid could handle these headers, but admin does not want to
> + if (messageHeaderSize() >= limit) {
> + debugs(33, 5, "Too large " << which);
> + parseStatusCode = Http::scHeaderTooLarge;
> + return false;
> + }
This should be rewritten to go before we initialize mimeHeaderBlock_
IMO. Otherwise, the damage of keeping the admin-prohibited header is
already done.
> +const int
> +Http::One::ResponseParser::parseResponseStatusAndReason()
> +{
> + if (buf_.isEmpty())
> + return 0;
> +
Remove this check (and fix the tokenizer if it cannot handle empty buffers).
> + // NP: search space is >3 to get terminator character)
> + if(!tok.prefix(status, CharacterSet::DIGIT, 4))
> + return -1; // invalid status
> + // NOTE: multiple SP or non-SP bytes between version and status code are invalid.
> + if (tok.atEnd())
> + return 0; // need more to be sure we have it all
> + if(!tok.skip(' '))
> + return -1; // invalid status, a single SP terminator required
> +
> + debugs(74, 6, "found string status-code=" << status);
> +
> + // get the actual numeric value of the 0-3 digits we found
> + ::Parser::Tokenizer t2(status);
> + int64_t statusValue;
> + if (!t2.int64(statusValue))
> + return -1; // ouch. digits not forming a valid number?
> + debugs(74, 6, "found int64 status-code=" << statusValue);
> + if (statusValue < 0 || statusValue > 999)
> + return -1; // ouch. digits not within valid status code range.
This sequence does not make sense to me: Incomplete status should be
treated the same as early atEnd(), but the former is treated as an
invalid status. 4-digit status should be considered as invalid but it is
not. Numeric value of 0 digits?! Etc., etc. It feels like you are still
parsing using low-level C function calls, not a Tokenizer that we worked
so hard to build!
The correct sequence should be similar to these lines:
if (tok.prefix(status, CharacterSet::DIGIT, 3) && tok.skip(' '))
debugs(74, 6, "found string status-code=" << status);
else
if (tok.atEnd())
return 0; // good so far but need more to finish
else
return -1; // invalid status
// get the actual numeric value of the digits we found
::Parser::Tokenizer t2(status);
int64_t statusValue;
if (!t2.int64(statusValue))
return -1; // ouch. digits not forming a valid number?
if (status < 0 || status > 999)
return -1; // ouch. digits not within valid status code range.
However, if Tokenizer integer parsing interface were written correctly,
the above would have actually looked close to this:
uint64_t status;
if (tok.uint64(status, base10) && tok.skip(' '))
debugs(74, 6, "found status-code=" << status);
else
if (tok.atEnd())
return 0; // good so far but need more to finish
else
return -1; // invalid status
if (status > 999)
return -1; // status exceeds valid status code range
or even
uint64_t status;
if (tok.uint64(status, 3, base10) && tok.skip(' '))
debugs(74, 6, "found status-code=" << status);
else
if (tok.atEnd())
return 0; // good so far but need more to finish
else
return -1; // invalid status
The last form (with a size limit for uint64) allows the Tokenizer caller
to terminate parsing earlier, without waiting for the end of all-digit
input, but I am not sure that is a common need.
I urge you to fix Tokenizer integer parsing and rewrite the parsing code.
That is all I had time for today. I would have to come back to this
later. Sorry.
> * This has now had several hours of regular web browsing by myself
> (YouTube, Facebook, some news sites, and the Squid sites) with nothing
> noticably going wrong.
Any ICAP tests? The HTTP-ICAP interaction has been a source of many bugs
before and your patch changes the relevant code. If you have not tested
it, please do (or ask others to do it).
> * Polygraph shows a 7 RPS performance loss (0.36%), with trunk having
> +/-2 RPS (0.1%) normal fluctuation. I believe this is from the
> transitional regression to data-copy into MemBuf for chunked coding.
Did Polygraph actually use chunked responses in these tests?
Thank you,
Alex.
More information about the squid-dev
mailing list