[squid-dev] RFC: Adding a new line to a regex
Alex Rousskov
rousskov at measurement-factory.com
Fri Jan 21 19:36:08 UTC 2022
TLDR: I am adding solution #6 into the mix based on Amos' email (#5 was
taken by Eduard). Amos needs to clarify why he thinks that Squid master
branch cannot accept STL-based regexes "now". After that, we can decide
whether #6 remains a viable candidate. Details below.
On 1/21/22 12:42 PM, Amos Jeffries wrote:
> On 20/01/22 10:32, Alex Rousskov wrote:
>> We have a use case where a regex in squid.conf should contain/match
>> a new line [...] This email discusses the problem and proposes how
>> to add a new line (and other special characters) to regexes found
>> in squid.conf and such.
> With the current mix of squid.conf parsers this RFC seems irrelevant to me.
I do not understand the relationship between "the current mix of
squid.conf parsers" and this RFC's relevance. This RFC is relevant because
it is about a practical solution to a real problem facing real Squid admins.
Whether Squid has one parser or ten, good ones or bad ones, is relevant
to how the solution is implemented/integrated with Squid, of course, but
that is already a part of the analysis on this thread.
> The developer designing a new directive also writes the parse_*()
> function that processes the config file line. All they have to do is
> avoid using the parser functions which implicitly do the problematic
> behaviour.
Concerns regarding the overall quality of Squid configuration syntax and
upgrade paths expand the reach of this problem far beyond a single new
directive, but let's assume, for the sake of the argument, that all we
care about is a new parsing function. Now we need to decide what syntax
that parsing function will use. This RFC is about that decision.
> The fact that there is logic imposing this problem at all is a bug to
> be resolved. But that is something for a different RFC.
FWIW, I do not know which logic/bug you are talking about here.
> There was a plan from 2014 (re-attempted by Christos 2016) to migrate
> Squid from the GNURegex dependency to more flexible C++11 regex library
> which supports many regex languages. With that plan the UI would only
> need an option flag or pattern prefix to specify which language a
> pattern uses.
I agree that one of the solutions worth considering is to use a regex
library that supports different regex syntaxes. So here is the
corresponding entry for a solution based on C++ STL regex:
6. Use STL regex features that support \n and similar escape sequences
   Pros: Supports much more than just advanced escape sequences!
   Pros: The new syntax is easy to document by referencing library docs.
   Cons: Requires serious changes to the internal regex support in Squid.
   Cons: Miserable STL regex performance in some environments[1,2]?
   Cons: Converting old regexes requires (complex) automation.
   Cons: Requires dropping GCC v4.8 support.
   Cons: Amos thinks Squid cannot support STL regex until 2024.
[1] See, for example, the following Reddit thread, ignoring comments
about GCC v4.8 and similar noise. The table in the second link is
representative of these performance concerns and there are similar
instability claims:
https://www.reddit.com/r/cpp/comments/e16s1m/what_is_wrong_with_stdregex
https://www.reddit.com/r/cpp/comments/e16s1m/what_is_wrong_with_stdregex/f94g2ny/
[2] STL does not allow us to define a custom allocator for its regexes.
Various STL implementations have various hidden workarounds, but we will
be at their (varying) mercy.
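To illustrate what solution #6 relies on, here is a minimal sketch (not Squid code) showing that std::regex with its default ECMAScript grammar treats the two-character sequence backslash + n as a line feed. The helper names configPatternMatches() and demoEscapedNewline() are hypothetical, and the raw string literal merely stands in for pattern text read from a configuration file:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compiles a pattern exactly as it would arrive from a
// configuration file (no C++ string-literal escape processing) and searches
// the given input with it.
bool configPatternMatches(const std::string &rawPattern, const std::string &input)
{
    // The ECMAScript grammar (the std::regex default) interprets the
    // backslash + 'n' sequence inside the pattern as a line feed.
    const std::regex re(rawPattern, std::regex::ECMAScript);
    return std::regex_search(input, re);
}

void demoEscapedNewline()
{
    // A raw string literal keeps the backslash and 'n' as two separate
    // characters, mimicking text read from squid.conf.
    const std::string rawPattern = R"(foo\nbar)";
    assert(rawPattern.size() == 8); // f, o, o, backslash, n, b, a, r

    assert(configPatternMatches(rawPattern, "foo\nbar")); // input has a real LF
    assert(!configPatternMatches(rawPattern, "foo bar")); // a space is not an LF
}
```

This only demonstrates the escape-sequence handling; it says nothing about the performance and allocator concerns in [1] and [2].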
> That plan was put on hold due to feature-incomplete GCC 4.8 versions
> being distributed by CentOS 7 and RHEL needing to build Squid.
... and serious/substantiated performance concerns[1]. They may have
been addressed by STL implementations since then, but my quick check and
the impossibility of solving [2] without breaking ABI suggest that at
least some of these issues still remain.
> One Core Developer (you Alex) has repeatedly expressed a strong opinion
> veto'ing the addition/removal of features to Squid-6 while they are
> still officially supported by a small set of "officially supported"
> Vendors. RHEL and CentOS being in that set.
Sorry, I have no idea what you are talking about.
> When combined, those two design limitations mean the C++11 regex library
> cannot be implemented in a Squid released prior to June 2024.
Until you clarify what relevant additions/removals I have repeatedly
vetoed in this context, I cannot validate this assertion. Given how many
times you have completely misrepresented and misinterpreted my
statements (usually without providing usable references to them), it is
likely that this is just one of those pointless time-wasting attacks.
Please stop doing that.
>> Unfortunately, squid.conf syntax lacks a similar general mechanism.
> This is not a property of squid.conf design choices. It is an artifact
> of the GNURegex language.
It is the other way around. C/C++ programs can supply LF characters to
the GNURegex library. Squid configuration files cannot. The library is
the same -- it _does_ support LF characters in regexes (AFAIK). But the
library cannot consume the regex from the program or configuration file
directly -- the library needs Squid, a compiler, or an equivalent parser
to give it the regex. If that parser/supplier does not support adding new
lines, the regex library cannot receive them. That is exactly what
happens in Squid's case. We screwed up. A very, very long time ago.
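A rough sketch of the "C/C++ programs can supply LF" half of that argument, using the system POSIX <regex.h> API as a stand-in for GNURegex (an assumption; the bundled library may differ in details). The helper names posixPatternMatches() and demoLiteralLineFeed() are hypothetical:

```cpp
#include <cassert>
#include <regex.h>
#include <string>

// Hypothetical helper: compiles a POSIX extended regex and reports whether
// it matches anywhere in the input. Returns false if compilation fails.
bool posixPatternMatches(const std::string &pattern, const std::string &input)
{
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    const bool matched = regexec(&re, input.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return matched;
}

void demoLiteralLineFeed()
{
    // The C++ compiler, not the regex library, converts "\n" into an LF
    // byte. The library receives that LF as an ordinary pattern character
    // (REG_NEWLINE is not set, so LF gets no special treatment).
    assert(posixPatternMatches("foo\nbar", "foo\nbar"));
    assert(!posixPatternMatches("foo\nbar", "foo bar"));

    // A configuration parser that cannot produce an LF byte has no way to
    // hand the library the first pattern above. What a backslash followed
    // by 'n' means to a POSIX regex is deliberately not asserted here.
}
```

The point is that the LF reaches the library only because something upstream (here, the compiler) produced it.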
> Until Squid gets a major upgrade to support other regex languages. We
> are stuck with these pattern limitations.
Obviously, we are not -- solutions 1-5 all allow inclusion of LF
characters into GNURegex patterns.
Thank you,
Alex.