[squid-dev] [PATCH] Simpler and more robust request line parser

Alex Rousskov rousskov at measurement-factory.com
Tue Nov 3 16:27:35 UTC 2015


On 11/03/2015 02:12 AM, Amos Jeffries wrote:
>> On 24/07/2015 11:13 a.m., Alex Rousskov wrote:
>>> "I am happy to restrict URIs to what older Squids accepted or more.
>>> Recent parser changes broke Squid. There is no need to go beyond what we
>>> were doing before those changes. I will gladly make a simple patch
>>> change to restrict URIs to either that old subset or to the current
>>> trunk subset, whichever is larger. Exactly what subset do you want me to
>>> use?"


>> Current trunk already accepts all ASCII printable characters, and a
>> bunch of whitespace / control codes.
>>
>> "What I would like" is:
>>
>> // RFC 3986 URI component character set(s)
>> // RFC 7230 request-line delimiter
>> // RFC 7230 HTTP-version character set
>> auto permit = uriValidCharacters() + CharacterSet::SP;
>>
>> if (relaxed_header_parser) {
>>
>>  // RFC7230 section 3.5 alternative control-code whitespace
>>  permit += CharacterSet::HTAB
>>    + CharacterSet("VT,FF","\x0B\x0C")
>>    + CharacterSet::CR;
>>
>> #if USE_HTTP_VIOLATIONS
>>   // RFC 2396 unwise character set
>>   // the "transport agents are known to sometimes modify or use as
>> delimiter" set
>>   // which must never be transmitted in un-escaped form
>>   permit.add('\0');
>>   permit += CharacterSet("RFC2396-unwise","\"\\|^<>`{}");
>> #endif
>> }
>>
>>
>> Which leaves some 27 CTL characters (0x01-0x1F) still forbidden. Things
>> like delete, backslash, bell, escape, end-of-transmission. I hope you
>> are not going to argue for re-enabling those inside this parser.
>>
>> Of course the bare UTF-8 extended unicode characters are omitted too.
>> Those is prohibited anywhere in HTTP, and a sure sign that its non-HTTP
>> being processed. So definitely exit this parser.
>>
>> IFF you have no objections to the above charsets then the only thing
>> this patch is waiting on in the technical sense is polygraph test results.


> Ping. Do we have agreement on those character sets ?

Maybe. I asked for the exact "relaxed parser" "HTTP violating" set (the
only important case in the real world). You gave me code constructing
various sets under various conditions. Until I start porting the
proposed patch to trunk, I would not know whether your code is
compatible with what I think we should do, and what sets it constructs.

You also expressed hope that we will not re-enable support for CTL
characters. I do not know whether the old parser allowed [some of] them,
which is one of the primary acceptance conditions for me (by default, we
want to accept everything old Squid accepted and possibly more; any new
restrictions should be discussed).

You also talked about Unicode which sounds completely irrelevant to me
because we are (or should be) dealing with octets and their 0-255
integer values, not any smart encoding on top of that.

In summary, I could not (and still cannot) respond with a quick
"agreed/disagreed". You did not give me enough information in an
digestible enough form to do that. You are certainly _not_ obligated to
provide the info I ask in a form I can quickly grok! I am just
explaining why I could not respond promptly.


Moreover, there is still some merging work required because you have
changed the parser between my review of your parser and my suggested
replacement patch. I could no longer simply adjust the sets and apply my
patch back then. That merging work is still required. If you have the
time to do that, please do it.


> I'm willing to skip the polygraph request and let the regular trunk
> testing handle that part now.

Noted.


Thank you,

Alex.



More information about the squid-dev mailing list