[squid-dev] [PATCH] Support for regex with \-escaped characters

Thu Dec 4 14:33:38 UTC 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 4/12/2014 12:04 p.m., Alex Rousskov wrote:
> On 12/03/2014 07:46 AM, Amos Jeffries wrote:
>> On 21/11/2014 5:38 p.m., Alex Rousskov wrote:
>>> On 11/17/2014 05:56 PM, Amos Jeffries wrote:
> 
>>>> +    while (ConfigParser::RecognizeQuotedPair_ &&
>>>> *(nextToken-1) == '\\') {
>> 
>>> Are you sure that nextToken never points at the start of a
>>> buffer, especially when parsing parameters loaded from external
>>> files? If you are absolutely sure, this guarantee should be
>>> documented where ConfigParser::TokenParse() is defined.
>>> Otherwise, the code has to be revised (but still support regex
>>> parameter files that start with a backslash).
>> 
>> I dont believe that is possible. nextToken always points at the 
>> current buffer,
> 
> Yes, but I am worried that nextToken can point to the very
> beginning of the current buffer, rendering *(nextToken-1) invalid.
> 
> 
>> There is an un-conditional strcspn() call prior to the
>> while-loop which will move the nextToken pointer down into the
>> buffer at least one character in distance.
> 
> Why do you expect that strcspn() to always return a positive
> value?

Return value is a size_t (unsigned).

> 
>> If the buffer starts with a '\' character the nextToken should
>> be pointing at its escaped value character.
> 
> Where would nextToken be increased in that case, prior to reaching
> the loop? AFAICT, strcspn() would return zero when the buffer
> starts with a backslash and, hence, nextToken will remain pointing
> at the beginning of the buffer. It is possible that I have
> overlooked some other code that makes that impossible, but even if
> I did, the change looks very fragile at best.
> 
> 
>>> The second problem with this code is that it incorrectly
>>> handles "\\X" AFAICT: X should not be escaped, even though it
>>> is preceded by a backslash (because that backslash is itself
>>> escaped).
> 
>> This code does not un-escape anything. All it does is notice '\' 
>> escapes and ensures that both the '\' and character following
>> are included in the token
> 
> Unfortunately, the same code also includes characters that were
> _not_ protected/escaped/whatever by '\'. Details below.
> 
> 
>> The regex pattern parser is responsible for any un-escaping and
>> processing of the token itself.
> 
> Agreed. I am only talking about tokenizing, which is this code 
> responsibility.
> 
> 
>> while (... *(nextToken-1) == '\\') { ++nextToken; // skip the
>> quoted-pair (\-escaped) character
> 
> 
> I am talking about the "skip" code above. It is incorrectly
> skipping X in "\\X". For example, it will "skip" (i.e., include in
> token) a space character in
> 
> foo\\ bar
> 
> The above text has two space-separated tokens: "foo\\" and "bar",
> but the loop will treat that text as one token, because it sees the
> space character as "escaped". That is wrong AFAICT.
> 

Ah, I see. The !ConfigParser::RecognizeQuotedValues test was
preventing the correct delimiter set being used until it had already
reached the loop.

Fixed now.

Amos
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUgHC0AAoJELJo5wb/XPRjmtgIALHRlXeSgKo0+m1SfzCdO8VC
7N1iRtM1cj/9xbMaWixhA06ZVEQ9H7LbKWJi3cbVd+wxB34Y/1piEAqCezUqd3jG
LzA6SFvJmetmuwLnGB96tj7k23AEtrn0ae9MviyP1CEXIeq5ddfb45xEp90dsZR2
2j/qBqJluY3pP5bil3jIDuE6YrPfJ4y+4WgY+JV/wLX2fqqveTj3pQGk21zrU/yq
NbHA/7aNw9sxapU6s4zVASpgvXovf/TNS2E2QgIWBb0qRiEaJIMFBxC2ZQV0FEkt
61WsTe0qerG1oUIdfCXq2oPdW2coMVdlp1QR1h1dNbGFMB3yK50OxtQ/jonKYWI=
=X5Z7
-----END PGP SIGNATURE-----