[squid-dev] [PATCH] Simpler and more robust request line parser
Amos Jeffries
squid3 at treenet.co.nz
Tue Nov 3 09:12:55 UTC 2015
On 5/08/2015 7:37 p.m., Amos Jeffries wrote:
> On 24/07/2015 11:13 a.m., Alex Rousskov wrote:
>> On 07/23/2015 08:16 AM, Amos Jeffries wrote:
>>> Please note: most of my comments in this long email are informational.
>>> Things which need to be taken into consideration when touching the parser.
>>
>> Noted. I will do my best to treat them as such.
>>
>>
>>> I request performance testing of the final version of this change vs
>>> lastest trunk at the time.
>>
>> Noted.
>>
>> The next patch will address comments not discussed in this email. Thank
>> you for those!
>>
>> The rest of this email is the usual waste of time discussing
>> non-existent bugs, arguing about mostly irrelevant issues, burning straw
>> men, and talking past each other. The only important bit of information
>> is probably the following paragraph that I will quote here for those who
>> do not want to waste their time reading everything:
>>
>> "I am happy to restrict URIs to what older Squids accepted or more.
>> Recent parser changes broke Squid. There is no need to go beyond what we
>> were doing before those changes. I will gladly make a simple patch
>> change to restrict URIs to either that old subset or to the current
>> trunk subset, whichever is larger. Exactly what subset do you want me to
>> use?"
>
>
> Current trunk already accepts all ASCII printable characters, and a
> bunch of whitespace / control codes.
>
> "What I would like" is:
>
> // RFC 3986 URI component character set(s)
> // RFC 7230 request-line delimiter
> // RFC 7230 HTTP-version character set
> auto permit = uriValidCharacters() + CharacterSet::SP;
>
> if (relaxed_header_parser) {
>
> // RFC7230 section 3.5 alternative control-code whitespace
> permit += CharacterSet::HTAB
> + CharacterSet("VT,FF","\x0B\x0C")
> + CharacterSet::CR;
>
> #if USE_HTTP_VIOLATIONS
> // RFC 2396 unwise character set
> // the "transport agents are known to sometimes modify or use as
> delimiter" set
> // which must never be transmitted in un-escaped form
> permit.add('\0');
> permit += CharacterSet("RFC2396-unwise","\"\\|^<>`{}");
> #endif
> }
>
>
> Which leaves some 27 CTL characters (0x01-0x1F) still forbidden. Things
> like delete, backslash, bell, escape, end-of-transmission. I hope you
> are not going to argue for re-enabling those inside this parser.
>
> Of course the bare UTF-8 extended unicode characters are omitted too.
> Those is prohibited anywhere in HTTP, and a sure sign that its non-HTTP
> being processed. So definitely exit this parser.
>
> IFF you have no objections to the above charsets then the only thing
> this patch is waiting on in the technical sense is polygraph test results.
>
Ping. Do we have agreement on those character sets ?
I'm willing to skip the polygraph request and let the regular trunk
testing handle that part now.
Amos
More information about the squid-dev
mailing list