[squid-dev] [PATCH] Support for regex with \-escaped characters
Alex Rousskov
rousskov at measurement-factory.com
Thu Dec 4 15:32:04 UTC 2014
On 12/04/2014 07:33 AM, Amos Jeffries wrote:
> On 4/12/2014 12:04 p.m., Alex Rousskov wrote:
>> On 12/03/2014 07:46 AM, Amos Jeffries wrote:
>>> On 21/11/2014 5:38 p.m., Alex Rousskov wrote:
>>>> On 11/17/2014 05:56 PM, Amos Jeffries wrote:
>>>>> + while (ConfigParser::RecognizeQuotedPair_ &&
>>>>> *(nextToken-1) == '\\') {
>>>> Are you sure that nextToken never points at the start of a
>>>> buffer, especially when parsing parameters loaded from external
>>>> files? If you are absolutely sure, this guarantee should be
>>>> documented where ConfigParser::TokenParse() is defined.
>>>> Otherwise, the code has to be revised (but still support regex
>>>> parameter files that start with a backslash).
>>> I dont believe that is possible. nextToken always points at the
>>> current buffer,
>> Yes, but I am worried that nextToken can point to the very
>> beginning of the current buffer, rendering *(nextToken-1) invalid.
>>> There is an un-conditional strcspn() call prior to the
>>> while-loop which will move the nextToken pointer down into the
>>> buffer at least one character in distance.
>> Why do you expect that strcspn() to always return a positive
>> value?
> Return value is a size_t (unsigned).
That fact does not answer my question. Why do you expect strcspn() to
always return a value greater than zero? A positive returned value is
required to satisfy your assertion that a "strcspn() call prior to the
while-loop will move the nextToken pointer down into the buffer at least
one character in distance".
>>> If the buffer starts with a '\' character the nextToken should
>>> be pointing at its escaped value character.
>
>> Where would nextToken be increased in that case, prior to reaching
>> the loop? AFAICT, strcspn() would return zero when the buffer
>> starts with a backslash and, hence, nextToken will remain pointing
>> at the beginning of the buffer. It is possible that I have
>> overlooked some other code that makes that impossible, but even if
>> I did, the change looks very fragile at best.
The above paragraph details the earlier concern with a specific example.
>>>> The second problem with this code is that it incorrectly
>>>> handles "\\X" AFAICT: X should not be escaped, even though it
>>>> is preceded by a backslash (because that backslash is itself
>>>> escaped).
>
>>> This code does not un-escape anything. All it does is notice '\'
>>> escapes and ensures that both the '\' and character following
>>> are included in the token
>
>> Unfortunately, the same code also includes characters that were
>> _not_ protected/escaped/whatever by '\'. Details below.
>
>
>>> The regex pattern parser is responsible for any un-escaping and
>>> processing of the token itself.
>
>> Agreed. I am only talking about tokenizing, which is this code
>> responsibility.
>
>
>>> while (... *(nextToken-1) == '\\') { ++nextToken; // skip the
>>> quoted-pair (\-escaped) character
>
>
>> I am talking about the "skip" code above. It is incorrectly
>> skipping X in "\\X". For example, it will "skip" (i.e., include in
>> token) a space character in
>
>> foo\\ bar
>
>> The above text has two space-separated tokens: "foo\\" and "bar",
>> but the loop will treat that text as one token, because it sees the
>> space character as "escaped". That is wrong AFAICT.
>
>
> Ah, I see. The !ConfigParser::RecognizeQuotedValues test was
> preventing the correct delimiter set being used until it had already
> reached the loop.
>
> Fixed now.
FWIW, I do not see the fix yet (trunk r13732).
Alex.
More information about the squid-dev
mailing list