[squid-dev] RFC: Adding a new line to a regex

Amos Jeffries squid3 at treenet.co.nz
Sat Jan 22 01:59:34 UTC 2022


On 22/01/22 08:36, Alex Rousskov wrote:
> TLDR: I am adding solution #6 into the mix based on Amos email (#5 was
> taken by Eduard). Amos needs to clarify why he thinks that Squid master
> branch cannot accept STL-based regexes "now". After that, we can decide
> whether #6 remains a viable candidate. Details below.
> 
> 
> On 1/21/22 12:42 PM, Amos Jeffries wrote:
>> On 20/01/22 10:32, Alex Rousskov wrote:
>>> We have a use case where a regex in squid.conf should contain/match
>>> a new line [...] This email discusses the problem and proposes how
>>> to add a new line (and other special characters) to regexes found
>>> in squid.conf and such.
> 
> 
>> With the current mix of squid.conf parsers this RFC seems irrelevant to me.
> 
> I do not understand the relationship between "the current mix of
> squid.conf parsers" and this RFC relevance. This RFC is relevant because
> it is about a practical solution to a real problem facing real Squid admins.
> 

Sentence #2 of the RFC explicitly states that admin needs are not
relevant: "I do not know whether there are similar use cases with the
existing squid.conf regex directives".

The same sentence delimits RFC scope as: "adding a _new_ directive that 
will need such support."

That means the syntax defining how the regex pattern is configured does 
not yet exist. It is not necessary for the developer to design their 
_new_ UI syntax in a way that exposes admin to this problem in the first 
place. Simply design the



> Whether Squid has one parser or ten, good ones or bad ones, is relevant
> to how the solution is implemented/integrated with Squid, of course, but
> that is already a part of the analysis on this thread.
> 

Very relevant. RFC cites "squid.conf preprocessor and parameter parser 
use/strip all new lines" as a problem.

I point out that this behaviour depends on *which* config parser is 
chosen to be used by the (again _new_) directive. It should be an 
implementation detail for the dev, not design consideration for this RFC.


> 
>> The developer designing a new directive also writes the parse_*()
>> function that processes the config file line. All they have to do is
>> avoid using the parser functions which implicitly do the problematic
>> behaviour.
> 
> Concerns regarding the overall quality of Squid configuration syntax and
> upgrade paths expand the reach of this problem far beyond a single new
> directive, but let's assume, for the sake of the argument, that all we
> care about is a new parsing function. Now we need to decide what syntax
> that parsing function will use. This RFC is about that decision.
> 

Nod.

I must state that I do not see much in the way of squid.conf syntax
discussion in the RFC text. It seems to focus a lot on syntax inside the 
regex pattern.

IMO regex is such a complicated situation that we should avoid having 
special things inside or on top of its syntax. That is a recipe for 
admin pain.


...
>> There was a plan from 2014 (re-attempted by Christos 2016) to migrate
>> Squid from the GNURegex dependency to more flexible C++11 regex library
>> which supports many regex languages. With that plan the UI would only
>> need an option flag or pattern prefix to specify which language a
>> pattern uses.
> 
> I agree that one of the solutions worth considering is to use a regex
> library that supports different regex syntax. So here is the
> corresponding entry for solution based on C++ STL regex:
> 
> 6. Use STL regex features that support \n and similar escape sequences
> Pros: Supports much more than just advanced escape sequences!
> Pros: The new syntax is easy to document by referencing library docs.

Pro: we do not have to write any part of pattern matching ourselves. 
Simpler config parser.

Pro: we do not have to maintain custom code supporting special 
behaviours in regex pattern configuration.

Pro: we do not have to provide additional user support for non-standard 
squid.conf patterns.

Pro: we do not have to waste brain cycles designing how to integrate 
syntax into regex patterns cleanly.
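
To illustrate the escape-sequence point of solution #6: under the default
(ECMAScript) grammar of C++11 std::regex, the two-character sequence
backslash-n inside a pattern matches a line feed, so the pattern text can
stay on one squid.conf line. This is only a minimal sketch; the helper name
patternMatches is hypothetical and none of this is existing Squid code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not Squid code: compiles the pattern text exactly as
// it might appear on a single squid.conf line and searches the input for it.
bool patternMatches(const std::string &patternText, const std::string &input)
{
    // ECMAScript is the default std::regex grammar; it interprets the
    // two-character escape \n in the pattern as a line feed.
    const std::regex re(patternText, std::regex::ECMAScript);
    return std::regex_search(input, re);
}
```

For example, patternMatches("foo\\nbar", "foo\nbar") passes the pattern
characters f-o-o-backslash-n-b-a-r and checks them against an input that
contains a real line feed.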


> Cons: Requires serious changes to the internal regex support in Squid.

IIRC, the changes are not as serious as they may seem. The largest part is
the squid.conf parser alteration to accept the proposal's flag/prefix and
patterns cleanly. Beyond that it is just a switch of container, which is
easy (not trivial, just easy).


> Cons: Miserable STL regex performance in some environments[1,2]?

IMO this is balanced by Squid's existing regex being well known to have
similar performance issues.


> Cons: Converting old regexes requires (complex) automation.

I disagree that this is a problem.

GNU regex is the predecessor syntax behind all modern regex variants. We can
retain GNU regex as the default pattern language and require a language
flag/prefix for patterns needing modern features.
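
A rough sketch of what such a prefix could look like follows. Everything in
it is an assumption made for illustration: the "ecma:" prefix, the function
name compileConfiguredPattern, and the use of the std::regex POSIX extended
grammar as a stand-in for the GNU regex default (a real implementation might
keep the GNU regex library for unprefixed patterns instead).

```cpp
#include <regex>
#include <string>

// Hypothetical prefix syntax, invented for illustration: "ecma:" selects the
// ECMAScript grammar; any other pattern keeps the legacy default. POSIX
// extended stands in for the GNU regex default here, which is an assumption.
std::regex compileConfiguredPattern(const std::string &configured)
{
    const std::string prefix = "ecma:";
    if (configured.compare(0, prefix.size(), prefix) == 0)
        return std::regex(configured.substr(prefix.size()), std::regex::ECMAScript);
    // Unprefixed patterns are compiled unchanged with the default grammar,
    // so existing configurations would need no conversion.
    return std::regex(configured, std::regex::extended);
}
```

With this shape, old patterns keep their current meaning and only patterns
that opt in through the prefix get the newer escape sequences.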


> Cons: Requires dropping GCC v4.8 support.
> Cons: Amos thinks Squid cannot support STL regex until 2024.

I am honoured that you consider my opinion to be of such importance.

But, seriously, the technical part of my earlier statement is already 
covered by the GCC 4.8 line.


> [2] STL does not allow us to define a custom allocator for its regexes.
> Various STL implementations have various hidden workarounds, but we will
> be at their (varying) mercy.
> 

That is an interesting point. And probably should be a Con in its own right.


> 
>> That plan was put on hold due to feature-incomplete GCC 4.8 versions
>> being distributed by CentOS 7 and RHEL needing to build Squid.
> 
> ... and serious/substantiated performance concerns[1]. They may have
> been addressed by STL implementations since then, but my quick check and
> the impossibility of solving [2] without breaking ABI suggest that at
> least some of these issues still remain.
> 
> 
>> One Core Developer (you Alex) has repeatedly expressed a strong opinion
>> veto'ing the addition/removal of features to Squid-6 while they are
>> still officially supported by a small set of "officially supported"
>> Vendors. RHEL and CentOS being in that set.
> 
> Sorry, I have no idea what you are talking about.
> 

Your latest voicing of it was in 
<http://lists.squid-cache.org/pipermail/squid-dev/2021-December/009743.html>

 > "
 > Any
 > known Squid regression affecting the "main" environment should block the
 > PR introducing that regression IMO. I see no need to limit this to
 > "build and unit tests" regressions
 > "

The definition of "main" under discussion in that thread never reached 
consensus to change away from the existing OS represented by the Jenkins 
5-pr-test nodes. So (for now) it still includes LTS versions of RHEL / 
CentOS 7 shipping the broken GCC 4.8.x std::regex.
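
For reference, a build could detect an unusable std::regex with a small
runtime probe along these lines. This is a sketch under assumptions: the
function name stdRegexWorks is hypothetical, and the chosen pattern is only
a guess at what an incomplete implementation would fail on, not a verified
reproduction of the GCC 4.8.x behaviour.

```cpp
#include <regex>
#include <string>

// Hypothetical probe, not part of Squid: returns true only if std::regex can
// compile a simple anchored bracket-expression pattern and apply it correctly.
bool stdRegexWorks()
{
    try {
        const std::regex re("^[a-z]+[0-9]$");
        // The pattern must accept letters followed by one digit and must
        // reject input that lacks the trailing digit.
        return std::regex_match(std::string("squid6"), re) &&
               !std::regex_match(std::string("squid"), re);
    } catch (const std::regex_error &) {
        // An implementation that cannot parse the pattern counts as broken.
        return false;
    }
}
```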



Amos

