[squid-dev] RFC: Adding a new line to a regex

Eduard Bagdasaryan eduard.bagdasaryan at measurement-factory.com
Thu Jan 20 18:27:25 UTC 2022


I would concur with Alex that (4) is preferable: it does not break old
configurations, it reuses existing mechanisms, and it can be applied only
when and where required. I have one more option for your consideration:
escaping with a backtick (e.g., `n) instead of a backslash. This
approach is used, e.g., in PowerShell.

5a. Recognize just `n escape sequence in squid.conf regexes.

5b. Recognize all '`'-based escape sequences in squid.conf regexes.

Pros:  Easier upgrade: backticks are rare in regular expressions (compared
to '%' or '/'), so old regexes probably need no conversion at all.
Pros:  Simplicity: no double-escaping is required (as in (1b)).
Cons: Though it should be straightforward to specify common escape
sequences, such as `n, `r or `t, we would still need to devise a way of
specifying an arbitrary character (i.e., by its code) with this syntax.
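To make the 5a/5b difference concrete, here is a minimal C++ sketch of a
pre-processing step that expands backtick escapes before a squid.conf regex
is compiled. The function name expandBacktickEscapes() and the `` sequence
for a literal backtick (following PowerShell) are assumptions for
illustration, not existing Squid code; the sketch also leaves the
arbitrary-character question open.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical pre-processing step (not existing Squid code) that expands
// backtick escapes in a squid.conf regex before the regex is compiled.
//   strict == false: option 5a; only `n is recognized and any other backtick
//                    is copied literally, so old regexes keep their meaning.
//   strict == true:  option 5b; `n, `r, `t and `` (assumed literal backtick)
//                    are recognized and any other backtick sequence is an error.
std::string expandBacktickEscapes(const std::string &raw, const bool strict)
{
    std::string out;
    out.reserve(raw.size());
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c != '`') {
            out += c;
            continue;
        }
        if (i + 1 == raw.size()) { // backtick is the last character
            if (strict)
                throw std::runtime_error("dangling backtick at the end of a regex");
            out += c;
            continue;
        }
        const char next = raw[i + 1];
        char decoded = 0;
        bool known = false;
        if (next == 'n') {
            decoded = '\n';
            known = true;
        } else if (strict) {
            switch (next) {
            case 'r': decoded = '\r'; known = true; break;
            case 't': decoded = '\t'; known = true; break;
            case '`': decoded = '`'; known = true; break;
            default: break;
            }
        }
        if (known) {
            out += decoded;
            ++i; // consume the escaped character as well
        } else if (strict) {
            throw std::runtime_error(std::string("unsupported escape sequence `") + next);
        } else {
            out += c; // 5a: an unrecognized backtick stays literal
        }
    }
    // Each escape turns two characters into one, so output never grows.
    assert(out.size() <= raw.size());
    return out;
}
```

With strict == false, "a`xb" passes through unchanged, which is what makes
5a upgrade-friendly; with strict == true the same input is rejected, which
is how 5b would catch typos.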


HTH,

Eduard.


On 20.01.2022 00:32, Alex Rousskov wrote:
> Here is a fairly representative sample:
>
> 1a. Recognize just \n escape sequence in squid.conf regexes
>     Pros: Simple.
>     Cons: Converting old regexes[1] requires careful checking[2].
>     Cons: Cannot detect typos in escape sequences. \r is accepted.
>     Cons: Cannot address other, similar use cases (e.g., ASCII CR).
>
> 1b. Recognize all C escape sequences in squid.conf regexes
>     Pros: Can detect typos -- unsupported escape sequences.
>     Cons: Poor readability: Double-escaping of all for-regex backslashes!
>     Cons: Converting old regexes requires non-trivial automation.
>
>
> 2a. Recognize %byte{n} logformat-like sequence in squid.conf regexes
>     Pros: Simple.
>     Cons: Converting old regexes[1] requires careful checking[3].
>     Cons: Cannot detect typos in logformat-like sequences.
>     Cons: Does not support other advanced use cases (e.g., %tr).
>
> 2b. Recognize %byte{n} and logformat sequences in squid.conf regexes
>     Pros: Can detect typos -- unsupported logformat sequences.
>     Cons: The need to escape % in regexes will surprise admins.
>     Cons: Converting old regexes requires (simple) automation.
>
>
> 3. Use composition to combine regexes and some special strings:
>     regex1 + "\n" + regex2
>     or
>     regex1 + %byte{10} + regex2
>     Pros: Old regexes can be safely used without any conversions.
>     Cons: Requires new, complex composition expressions/syntax.
>     Cons: A bit difficult to read.
>     Cons: Requires a lot of development.
>
>
> 4. Use 2b but only when regex is given to a special function:
>     substitute_logformat_codes(regex)
>     Pros: Old regexes can be safely used without any conversions.
>     Pros: New regexes do not need to escape % (by default).
>     Pros: Extendable to old regex configuration contexts.
>     Pros: Extendable to non-regex configuration contexts.
>     Pros: Reusing the existing parameters(...)-like call syntax.
>     Cons: A bit more difficult to read than 1a or 2a.
>     Cons: Duplicates "quoted string" approach in some directives[4].
>     Cons: Requires arguing about the new function name:-).
>
>
> Given all the pros and cons, I think we should use option 4 above.
>
> Do you see any better options?
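For comparison, option 4 from the quoted message might look roughly like the
following C++ sketch. substituteLogformatCodes() is a hypothetical stand-in
for the proposed substitute_logformat_codes(regex) function; it understands
only the proposed %byte{n} code and %% (assumed to mean a literal percent
sign, as in logformat) and rejects any other % sequence, as 2b would. Because
it runs only on regexes explicitly wrapped in the function, old regexes are
never touched.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the proposed substitute_logformat_codes(regex).
// Only two codes are handled in this sketch:
//   %byte{n}  -> the single byte with decimal value n (0..255)
//   %%        -> a literal '%'
// Any other '%' sequence is rejected, mimicking the typo detection of 2b.
std::string substituteLogformatCodes(const std::string &raw)
{
    static const std::string bytePrefix = "%byte{";
    std::string out;
    std::string::size_type i = 0;
    while (i < raw.size()) {
        if (raw[i] != '%') {
            out += raw[i++]; // ordinary regex character: copy as-is
            continue;
        }
        if (raw.compare(i, 2, "%%") == 0) {
            out += '%';
            i += 2;
            continue;
        }
        if (raw.compare(i, bytePrefix.size(), bytePrefix) == 0) {
            const std::string::size_type digitsStart = i + bytePrefix.size();
            const std::string::size_type close = raw.find('}', digitsStart);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated %byte{...} code");
            const std::string digits = raw.substr(digitsStart, close - digitsStart);
            bool valid = !digits.empty() && digits.size() <= 3;
            for (const char d : digits)
                valid = valid && std::isdigit(static_cast<unsigned char>(d));
            if (!valid || std::stoi(digits) > 255)
                throw std::runtime_error("bad %byte{} value: '" + digits + "'");
            out += static_cast<char>(std::stoi(digits));
            i = close + 1;
            continue;
        }
        throw std::runtime_error("unsupported % code at offset " + std::to_string(i));
    }
    // Each code consumes more characters than it produces, so output never grows.
    assert(out.size() <= raw.size());
    return out;
}
```

For example, "a%byte{10}b" becomes "a", a newline, and "b", while "%tr" is
rejected because this sketch implements no real logformat codes.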

