[squid-dev] RFC: Adding a new line to a regex
Amos Jeffries
squid3 at treenet.co.nz
Sat Jan 22 01:59:34 UTC 2022
On 22/01/22 08:36, Alex Rousskov wrote:
> TLDR: I am adding solution #6 into the mix based on Amos email (#5 was
> taken by Eduard). Amos needs to clarify why he thinks that Squid master
> branch cannot accept STL-based regexes "now". After that, we can decide
> whether #6 remains a viable candidate. Details below.
>
>
> On 1/21/22 12:42 PM, Amos Jeffries wrote:
>> On 20/01/22 10:32, Alex Rousskov wrote:
>>> We have a use case where a regex in squid.conf should contain/match
>>> a new line [...] This email discusses the problem and proposes how
>>> to add a new line (and other special characters) to regexes found
>>> in squid.conf and such.
>
>
>> With the current mix of squid.conf parsers this RFC seems irrelevant to me.
>
> I do not understand the relationship between "the current mix of
> squid.conf parsers" and this RFC relevance. This RFC is relevant because
> it is about a practical solution to a real problem facing real Squid admins.
>
Sentence #2 of the RFC explicitly states that admin needs are not
relevant: "I do not know whether there are similar use
cases with the existing squid.conf regex directives".
The same sentence delimits RFC scope as: "adding a _new_ directive that
will need such support."
That means the syntax defining how the regex pattern is configured does
not yet exist. It is not necessary for the developer to design their
_new_ UI syntax in a way that exposes admin to this problem in the first
place. Simply design the
> Whether Squid has one parser or ten, good ones or bad ones, is relevant
> to how the solution is implemented/integrated with Squid, of course, but
> that is already a part of the analysis on this thread.
>
Very relevant. RFC cites "squid.conf preprocessor and parameter parser
use/strip all new lines" as a problem.
I point out that this behaviour depends on *which* config parser is
chosen to be used by the (again _new_) directive. It should be an
implementation detail for the dev, not design consideration for this RFC.
>
>> The developer designing a new directive also writes the parse_*()
>> function that processes the config file line. All they have to do is
>> avoid using the parser functions which implicitly do the problematic
>> behaviour.
>
> Concerns regarding the overall quality of Squid configuration syntax and
> upgrade paths expand the reach of this problem far beyond a single new
> directive, but let's assume, for the sake of the argument, that all we
> care about is a new parsing function. Now we need to decide what syntax
> that parsing function will use. This RFC is about that decision.
>
Nod.
I must state that I do not see much in the way of squid.conf syntax
discussion in the RFC text. It seems to focus a lot on syntax inside the
regex pattern.
IMO regex syntax is already so complicated that we should avoid layering
special things inside or on top of it. That is a recipe for
admin pain.
...
>> There was a plan from 2014 (re-attempted by Christos 2016) to migrate
>> Squid from the GNURegex dependency to more flexible C++11 regex library
>> which supports many regex languages. With that plan the UI would only
>> need an option flag or pattern prefix to specify which language a
>> pattern uses.
>
> I agree that one of the solutions worth considering is to use a regex
> library that supports different regex syntax. So here is the
> corresponding entry for solution based on C++ STL regex:
>
> 6. Use STL regex features that support \n and similar escape sequences
> Pros: Supports much more than just advanced escape sequences!
> Pros: The new syntax is easy to document by referencing library docs.
Pro: we do not have to write any part of the pattern matching ourselves,
and the config parser stays simpler.
Pro: we do not have to maintain custom code supporting special
behaviours in regex pattern configuration.
Pro: we do not have to provide additional user support for non-standard
squid.conf patterns.
Pro: we do not have to waste brain cycles designing how to integrate
special syntax into regex patterns cleanly.
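To illustrate the first Pro: with std::regex's default ECMAScript grammar, a squid.conf token can spell a new line as the two characters backslash and 'n', and the library itself turns that escape into LF when compiling the pattern. A minimal sketch; configPatternMatches is a hypothetical helper, not existing Squid code:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compiles a pattern exactly as it might appear in squid.conf, i.e. with
// a backslash followed by 'n' rather than an actual new line character.
// The ECMAScript grammar translates that escape (and \r, \t, ...) itself,
// so the config parser never has to store or unescape a raw new line.
bool configPatternMatches(const std::string &configText, const std::string &subject)
{
    // The token handed over by the config parser contains no raw new lines.
    assert(configText.find('\n') == std::string::npos);
    const std::regex re(configText, std::regex::ECMAScript);
    return std::regex_search(subject, re);
}
```

For example, configPatternMatches(R"(foo\nbar)", "foo\nbar") matches, while the same pattern does not match "foonbar".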
> Cons: Requires serious changes to the internal regex support in Squid.
IIRC, the changes are not as serious as they may seem. The largest part is
the squid.conf parser alteration to accept the proposed flag/prefix and
patterns cleanly. Beyond that, it is just a switch of container, which is
easy (not trivial, just easy).
> Cons: Miserable STL regex performance in some environments[1,2]?
IMO this is balanced by Squid's existing regex support being well known
to have similar performance issues.
> Cons: Converting old regexes requires (complex) automation.
I disagree that this is a problem.
GNU regex is the predecessor syntax behind all modern regex variants. We
can retain GNU regex as the default pattern syntax and require a language
flag/prefix only for patterns needing modern features.
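A rough sketch of the flag approach, assuming hypothetical option names (Squid has no "-ecmascript" flag today) and using std::regex's POSIX extended grammar as a stand-in for the current GNU regex default. Unknown flags are rejected at configuration time instead of being guessed:

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Maps an optional (hypothetical) language flag from squid.conf to a
// std::regex grammar. No flag keeps the traditional POSIX ERE behaviour;
// patterns needing modern features must opt in explicitly.
std::regex compileConfiguredPattern(const std::string &languageFlag, const std::string &pattern)
{
    std::regex::flag_type grammar;
    if (languageFlag.empty())
        grammar = std::regex::extended;   // stand-in for today's default syntax
    else if (languageFlag == "-ecmascript")
        grammar = std::regex::ECMAScript; // \n, \t, \d, lazy quantifiers, ...
    else
        throw std::invalid_argument("unknown regex language flag: " + languageFlag);
    return std::regex(pattern, grammar);
}
```

Existing patterns keep compiling without any conversion, which is the point made above about not needing automated rewriting of old regexes.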
> Cons: Requires dropping GCC v4.8 support.
> Cons: Amos thinks Squid cannot support STL regex until 2024.
I am honoured that you consider my opinion to be of such importance.
But, seriously, the technical part of my earlier statement is already
covered by the GCC 4.8 line.
> [2] STL does not allow us to define a custom allocator for its regexes.
> Various STL implementations have various hidden workarounds, but we will
> be at their (varying) mercy.
>
That is an interesting point, and it probably should be a Con in its own right.
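The limitation is visible in the templates themselves: std::match_results accepts an allocator argument, but std::basic_regex<CharT, Traits> has none, so the compiled pattern's memory always comes from the library's own allocation. A small sketch; CountingAllocator is a hypothetical stand-in for a Squid memory pool:

```cpp
#include <cstddef>
#include <new>
#include <regex>
#include <string>

// Number of allocations routed through CountingAllocator (C++17 inline variable).
inline std::size_t countedAllocations = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() noexcept = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U> &) noexcept {}

    T *allocate(std::size_t n) {
        ++countedAllocations;
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }
    void deallocate(T *p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept { return false; }

using Iter = std::string::const_iterator;

// Match results can be pooled: match_results takes an allocator parameter...
using CountedMatch = std::match_results<Iter, CountingAllocator<std::sub_match<Iter>>>;

// ...but the std::regex argument cannot: basic_regex has no allocator
// parameter, so its compiled automaton bypasses CountingAllocator entirely.
bool searchWithCountedMatches(const std::string &subject, const std::regex &re, CountedMatch &m)
{
    return std::regex_search(subject.cbegin(), subject.cend(), m, re);
}
```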
>
>> That plan was put on hold due to feature-incomplete GCC 4.8 versions
>> being distributed by CentOS 7 and RHEL needing to build Squid.
>
> ... and serious/substantiated performance concerns[1]. They may have
> been addressed by STL implementations since then, but my quick check and
> the impossibility of solving [2] without breaking ABI suggest that at
> least some of these issues still remain.
>
>
>> One Core Developer (you Alex) has repeatedly expressed a strong opinion
>> veto'ing the addition/removal of features to Squid-6 while they are
>> still officially supported by a small set of "officially supported"
>> Vendors. RHEL and CentOS being in that set.
>
> Sorry, I have no idea what you are talking about.
>
Your latest voicing of it was in
<http://lists.squid-cache.org/pipermail/squid-dev/2021-December/009743.html>
> "
> Any
> known Squid regression affecting the "main" environment should block the
> PR introducing that regression IMO. I see no need to limit this to
> "build and unit tests" regressions
> "
The definition of "main" under discussion in that thread never reached
consensus on changing it away from the existing OSes represented by the
Jenkins 5-pr-test nodes. So (for now) it still includes the LTS versions
of RHEL / CentOS 7 that ship the broken GCC 4.8.x std::regex.
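For reference, a build check cannot rely on <regex> merely compiling, because GCC 4.8's libstdc++ ships the header without a working implementation (full <regex> support arrived in GCC 4.9). A run-time probe along these lines is one option; the function name is hypothetical, and a real configure test would wrap it in main() and fail the check on a non-zero exit:

```cpp
#include <regex>

// Returns true only if std::regex actually matches correctly at run time.
// A broken implementation may throw std::regex_error or give wrong answers,
// so both outcomes are treated as "not usable".
bool stdRegexWorks()
{
    try {
        return std::regex_match("aaa", std::regex("a+")) &&
               !std::regex_match("aab", std::regex("a+")) &&
               std::regex_search("x\ny", std::regex("x\\ny"));
    } catch (const std::regex_error &) {
        return false;
    }
}
```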
Amos