[squid-users] Regex optimization

Amos Jeffries squid3 at treenet.co.nz
Fri Jun 17 02:57:01 UTC 2016


On 27/04/2016 11:32 p.m., Alfredo Rezinovsky wrote:
> I saw in debug log that when an ACL has many regexes each one is compared
> sequentially.
> 
> If I have
> 
> www.facebook.com
> facebook.com
> www.google.com
> google.com
> 
> Will it be faster to check just ONE optimized regex like
> (www\.)?(facebook|google)\.com than the previous four?
> 
> I'm really talking about combining about 3000 URL regexes into one huge
> regex, because comparing each and every URL against 3000 regexes is too slow.
> 
> I know using
> (www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com) with
> PCRE will produce the same optimized result as
> (www\.)?(facebook|google)\.com. Squid uses GnuRegex. Does GNURegex lib
> optimizes this as well ?

1) It is kind of a myth that Squid uses GNURegex.

Squid *bundles* a copy of GNURegex, for use on systems which do not
provide their own regex library.

Most of the time Squid uses the system -lregex, which on some systems is
an updated version of GNURegex, on others a Perl-based library, and on
others the PCRE library. On the latest OSes the C++11 standard library
itself includes regex components, so those may be used as well.

AFAICT, the only system where the Squid bundled GNURegex library is
still actually used is Windows.


2) Sort of. You are thinking it compacts down to a different textual
form. It does not.

All regex libraries compact the initial pattern down to a binary format
which is equivalent to (but not the same as) the second pattern you
wrote. Squid uses the library API which does that compaction.

Current Squid versions will also load multiple sequential lines from
your config file and explicitly append them together. No need to do it
manually for regex ACLs.

Amos


