[squid-users] Regex optimization

Wed Apr 27 14:25:04 UTC 2016

Furthermore, the more specific a regular expression is, the faster it usually
runs.

27.04.16 20:01, Amos Jeffries wrote:
> On 27/04/2016 11:32 p.m., Alfredo Rezinovsky wrote:
>> I saw in debug log that when an ACL has many regexes each one is compared
>> sequentially.
>>
>> If I have
>>
>> www.facebook.com
>> facebook.com
>> www.google.com
>> google.com
>>
>> Will it be faster to check just ONE optimized regex like
>> (www\.)?(facebook|google)\.com than the previous four?
>>
>> What I'm really talking about is combining about 3000 URL regexes into one
>> huge regex, because comparing each and every URL against 3000 regexes is
>> too slow.
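
A minimal Python sketch of the two approaches being compared. It assumes Python's `re` module, which is a backtracking engine closer to PCRE than to GNURegex, and it is not Squid's code; the pattern list and function names are illustrative. It checks that trying the patterns one after another and trying one combined alternation give the same answer, then times both on the local machine:

```python
import re
import timeit

# Illustrative stand-in for a url_regex pattern file (dots escaped on purpose).
PATTERNS = [r"www\.facebook\.com", r"facebook\.com", r"www\.google\.com", r"google\.com"]

# Approach 1: compile each pattern separately and try them in order,
# which is the sequential behaviour seen in the debug log.
SEPARATE = [re.compile(p) for p in PATTERNS]

def match_sequential(url: str) -> bool:
    # any() stops at the first pattern that matches.
    return any(rx.search(url) for rx in SEPARATE)

# Approach 2: one compiled alternation; (?:...) keeps each pattern's own "|" local.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))

def match_combined(url: str) -> bool:
    return COMBINED.search(url) is not None

URLS = ["http://www.facebook.com/", "http://mail.google.com/inbox", "http://example.org/"]

for url in URLS:
    # Both approaches must give the same answer for every URL.
    assert match_sequential(url) == match_combined(url)
    print(url, match_combined(url))

# Relative speed depends on the engine, the patterns and the machine,
# so measure it rather than assume the combined form wins.
for name, fn in (("sequential", match_sequential), ("combined", match_combined)):
    seconds = timeit.timeit(lambda: [fn(u) for u in URLS], number=10_000)
    print(f"{name}: {seconds:.3f}s for 10,000 rounds")
```

With only four short patterns the difference is small either way; the point is to run the same measurement against the real 3000-pattern list before committing to one form.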
>
> As Yuri was trying to point out (I think), simply using one bigger regex
> pattern does not always mean faster.
>
>
>>
>> I know using
>> (www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com) with
>> PCRE will produce the same optimized result as
>> (www\.)?(facebook|google)\.com. Squid uses GNURegex. Does the GNURegex lib
>> optimize this as well?
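
The claim that the long alternation and the factored pattern are equivalent can at least be checked on sample host names. This is a minimal sketch using Python's `re`, not GNURegex or PCRE. It only tests which names each form accepts and says nothing about how any library optimizes the patterns internally. The host list is made up for illustration. It also shows why the dot before "com" needs escaping:

```python
import re

# The long alternation from the question and the hand-factored version.
LONG_FORM = re.compile(r"(www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com)")
FACTORED = re.compile(r"(www\.)?(facebook|google)\.com")
# Same factored pattern with the "." before "com" left unescaped.
UNESCAPED = re.compile(r"(www\.)?(facebook|google).com")

HOSTS = ["www.facebook.com", "facebook.com", "www.google.com", "google.com",
         "www.example.com", "facebook.co", "googleXcom"]

for host in HOSTS:
    # fullmatch compares whole host names, so both forms should accept the same set.
    in_long = LONG_FORM.fullmatch(host) is not None
    in_factored = FACTORED.fullmatch(host) is not None
    assert in_long == in_factored, host
    print(f"{host:18} {in_factored}")

# An unescaped "." matches any character, so that version is looser.
assert UNESCAPED.fullmatch("googleXcom") is not None
assert FACTORED.fullmatch("googleXcom") is None
```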
>
> If you actually pass GNURegex that *single* pattern, yes, it will do
> some optimization, though I'm not sure exactly how much compared to
> PCRE.
>
>  * Also, GNURegex is only the built-in backup regex engine bundled with
> Squid, for systems like Windows which don't provide a regex engine. The
> stdlib regex library is always used if available. On some OSes that
> stdlib engine is GNU, on others PCRE or something even better.
>
>
> What you see in the log reflects the fact that Squid is actually *not*
> configured with a single compound "optimized" pattern. You are actually
> using a file with ~3000 patterns in it, so there are 3000 regex patterns
> to be checked against the URL.
>
> Whether Squid runs 3000 tests or some smaller number depends on which
> Squid version you are using. Recent versions do some trivial pattern
> aggregation and strip away prefix/suffix ".*" garbage to help the
> library optimize better. But as Yuri showed, a bigger pattern does not
> necessarily mean fewer *steps* per test. The gains are mostly in
> reduced CPU time in Squid's own code and lower RAM overhead.
> Regex is still the slowest of the ACLs in terms of raw CPU consumed.
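
The kind of preprocessing described above (stripping prefix/suffix ".*" and aggregating patterns) can be sketched in Python. This is a hypothetical illustration, not Squid's implementation: `strip_wildcards` and `aggregate` are made-up names. It assumes unanchored, search-style matching, where a leading or trailing `.*` cannot change whether a pattern matches. It deliberately leaves the lazy `.*?`, the possessive `.*+` and the escaped `\.*` alone:

```python
import re

def strip_wildcards(pattern: str) -> str:
    """Drop redundant leading/trailing ".*" from a pattern used with search().

    Hypothetical helper for illustration; conservative on purpose.
    """
    # Leading ".*": skip the lazy ".*?" and possessive ".*+" forms.
    while pattern.startswith(".*") and pattern[2:3] not in ("?", "+"):
        pattern = pattern[2:]
    # Trailing ".*": skip "\.*", which means "zero or more literal dots".
    while pattern.endswith(".*") and not pattern.endswith("\\.*"):
        pattern = pattern[:-2]
    return pattern

def aggregate(patterns: list[str]) -> re.Pattern:
    """Combine cleaned patterns into one alternation (hypothetical helper)."""
    # (?:...) keeps a pattern's own "|" from leaking into its neighbours.
    return re.compile("|".join(f"(?:{strip_wildcards(p)})" for p in patterns))

RAW = [r".*facebook\.com.*", r".*\.google\.com/", r"ads\.example\.net.*", r"dots\.*"]
COMBINED = aggregate(RAW)
print(COMBINED.pattern)

URLS = ["http://www.facebook.com/x", "http://mail.google.com/", "http://ads.example.net/a",
        "http://example.org/dots..", "http://example.org/"]
for url in URLS:
    # Stripping and combining must not change which URLs match.
    expected = any(re.search(p, url) for p in RAW)
    assert (COMBINED.search(url) is not None) == expected, url
```

A real implementation would also have to respect per-pattern options such as `-i` and probably cap the size of each combined pattern; those details are left out here.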
>
>
> The biggest problem with using regex for domain name lists is that regex
> is optimized for left-to-right comparisons, while domain name labels are
> built right-to-left (the most significant label comes last). dstdomain is
> optimized for right-to-left comparison, with an early abort on mismatch
> and sub-domain wildcards, which gives it a huge advantage in CPU cycles
> over regex.
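
To illustrate the right-to-left argument, here is a toy matcher in Python. It stores entries by reversed labels and stops at the first label that has no matching entry. It treats an entry with a leading dot as "this domain and all of its sub-domains", which is assumed here as the way a sub-domain wildcard is written. This is a sketch of the idea only: `DomainMatcher` is a hypothetical name, and this is not how Squid implements dstdomain:

```python
from dataclasses import dataclass, field

@dataclass
class _Node:
    children: dict = field(default_factory=dict)
    exact: bool = False      # entry "example.com": this name only
    wildcard: bool = False   # entry ".example.com": this name and all sub-domains

class DomainMatcher:
    """Toy right-to-left domain matcher (hypothetical; not Squid's dstdomain code)."""

    def __init__(self, entries: list[str]) -> None:
        self.root = _Node()
        for entry in entries:
            self.add(entry)

    def add(self, entry: str) -> None:
        node = self.root
        # Walk from the top-level label towards the host label.
        for label in reversed(entry.lstrip(".").lower().split(".")):
            node = node.children.setdefault(label, _Node())
        if entry.startswith("."):
            node.wildcard = True
        else:
            node.exact = True

    def matches(self, host: str) -> bool:
        node = self.root
        for label in reversed(host.lower().rstrip(".").split(".")):
            node = node.children.get(label)
            if node is None:
                return False     # early abort: no entry ends with this suffix
            if node.wildcard:
                return True      # wildcard covers this domain and everything below it
        return node.exact

acl = DomainMatcher([".facebook.com", "google.com"])
for host in ["www.facebook.com", "facebook.com", "google.com", "www.google.com",
             "notfacebook.com", "example.org"]:
    print(f"{host:18} {acl.matches(host)}")
```

An unanchored regex such as `facebook\.com` would also match `notfacebook.com`, which this label-by-label comparison rejects. A host under an unrelated top-level domain, like `example.org`, is rejected after looking at a single label.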
>
> Amos
>
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users
