[squid-users] Squid regex grammar

Yuri yvoinov at gmail.com
Fri Oct 27 15:43:09 UTC 2017



27.10.2017 21:33, Antony Stone wrote:
> On Friday 27 October 2017 at 17:26:18, Yuri wrote:
>
>> 27.10.2017 21:17, Antony Stone wrote:
>>> On Friday 27 October 2017 at 17:06:01, Yuri wrote:
>>>> 27.10.2017 20:55, Alex Rousskov wrote:
>>>>> When a regular expression is using extended features, the basic regular
>>>>> expression compiler often (or even always?!) does not fail because it
>>>>> views the extended features as ordinary plain characters. Thus, Squid
>>>>> cannot tell that something went wrong.
>>>>>
>>>>>> $ echo "foobar" | grep --basic-regexp    'foo|bar'
>>>>>> $ echo "foobar" | grep --extended-regexp 'foo|bar'
>>>>>> foobar
>>>>> As you can see, the basic compiler is silent about the "|" character
>>>>> that it does not support.
>>>>>
>>>>> Here is a similar example where a malformed extended regular
>>>>> expression is silently accepted by the basic compiler:
>>>>>> $ echo "foobar" | grep --basic-regexp 'foo(bar'
>>>>>> $ echo "foobar" | grep --extended-regexp 'foo(bar'
>>>>>> grep: Unmatched ( or \(
>>>> I would like either clear documentation
>>> That sounds entirely reasonable - a statement something like "Squid is
>>> guaranteed to use basic POSIX grammar, but extended grammar may be
>>> available on different systems; the sysadmin should check"?
>>>
>>>> or some tool for checking whether the regular expression is correct from
>>>> the point of view of the current library used by Squid or not.
>>> What does "correct" mean?
>> "Correct" means "this will work correctly in Squid, not be silently
>> ignored". This is simple and obvious, isn't it?
> No.
>
> Suppose I write a | character (as per Alex's first example above) in my regex.
>
> Basic POSIX will match that literally.
>
> Extended grep will not.
>
> Judging purely from what is written in my regex, did I mean the character to 
> be matched literally, or not?
>
> Squid cannot tell.
Yes. Now you understand the root cause. If we say "Squid uses POSIX
Basic unless the _admin_ specifies 'POSIX Extended' in a config option",
we can expect POSIX Basic behaviour and only that. Agreed? But the point
is: we don't know, and can't know, what library functionality exists and
what will or will not work.

So, in each separate case we have to write a test case for EACH regex in
an acl to find out whether it will work or not.

Generally speaking, with thousands of regular expressions and thousands
of sites, that sounds pretty dumb, right? Many-to-many relations,
thousands of tests, etc.
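
Part of that per-regex checking could perhaps be automated. What follows
is only a minimal sketch in Python, not anything Squid ships: the names
find_ambiguous, skip_bracket, ERE_ONLY_OPERATORS and
BRE_ESCAPED_OPERATORS are hypothetical. It assumes the POSIX rules that
an unescaped | + ? ( ) { } is an ordinary character in basic syntax but
an operator in extended syntax, and that \( \) \{ \} are operators in
basic syntax. It cannot tell which syntax a given Squid build uses; it
only lists the places where the two readings would diverge.

```python
# Heuristic sketch, not a full regex parser: report constructs whose
# meaning differs between POSIX basic (BRE) and extended (ERE) syntax.
ERE_ONLY_OPERATORS = set("|+?(){}")   # ordinary in BRE, operators in ERE
BRE_ESCAPED_OPERATORS = set("(){}")   # \( \) \{ \} are operators in BRE


def skip_bracket(pattern, i):
    """Return the index just past the bracket expression starting at i."""
    n = len(pattern)
    j = i + 1
    if j < n and pattern[j] == "^":
        j += 1
    if j < n and pattern[j] == "]":
        j += 1  # a leading ']' is an ordinary member of the set
    while j < n and pattern[j] != "]":
        # Step over [:class:], [.coll.] and [=equiv=] elements.
        if pattern[j] == "[" and j + 1 < n and pattern[j + 1] in ":.=":
            end = pattern.find(pattern[j + 1] + "]", j + 2)
            if end == -1:
                return n  # unterminated element; give up on the rest
            j = end + 2
        else:
            j += 1
    return j + 1


def find_ambiguous(pattern):
    """Return (index, text) pairs for constructs that BRE and ERE
    would interpret differently, ignoring bracket expressions."""
    hits = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            # An escaped ( ) { } is an operator in BRE, a literal in ERE.
            if i + 1 < n and pattern[i + 1] in BRE_ESCAPED_OPERATORS:
                hits.append((i, pattern[i:i + 2]))
            i += 2
        elif ch == "[":
            i = skip_bracket(pattern, i)
        else:
            if ch in ERE_ONLY_OPERATORS:
                hits.append((i, ch))
            i += 1
    return hits


if __name__ == "__main__":
    # Hypothetical acl patterns, one per line as they might be in a file.
    for acl_regex in ["foo|bar", "foo(bar", r"^www\.example\.com$"]:
        print(acl_regex, find_ambiguous(acl_regex))
```

An empty list means only that this heuristic found nothing. It does not
cover every difference between the two syntaxes, for example the
position-dependent handling of *, ^ and $.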
>
>> Adherence to standards provides interoperability - a familiar word?
> Indeed.
>
>> I asked a simple question. And wanted a simple answer.
> Maybe there isn't one.
Noooooooo.

What could be simpler than clearly documenting the following: "Never use
anything other than POSIX Basic in regular expressions, because we do not
and cannot guarantee that it will work"?
>
>> And not reasoning about what can be and what cannot.
> Then I apologise for trying to explain.
Yes, I understand everything, Antony. It's easier to just reply with
"Test every regular expression yourself."
>
>> Interoperability is a simple thing.
> Er, no, it isn't.
Simple. You just have to follow standards and standard *documented*
behavior. As soon as the rabid dancing around home-made interpretations
of the standard begins, so do the problems.
>
>
> Antony.
>

-- 
**************************
* C++: Bug to the future *
**************************
