[squid-users] How to create a simple whitelist using regexes?
Amos Jeffries
squid3 at treenet.co.nz
Wed Oct 17 02:46:04 UTC 2018
In addition to what Matus and Alex have already said about your problem,
you do not appear to understand regex patterns properly.
On 16/10/18 4:11 AM, RB wrote:
> Hi Matus,
>
> Thanks for responding so quickly. I uploaded my configurations here if
> that is more helpful: https://bit.ly/2NF4zNb
>
> The config that I previously shared is called squid_corp.conf. I also
> noticed that if I don't use regular expressions and instead use domains,
> it works correctly:
>
> # acl whitelist url_regex "/vagrant/squid_sites.txt"
> acl whitelist url_regex .squid-cache.org
This is still a regex. The ACL type is "url_regex" which makes the
string a regex - no matter what it looks like to your human eyes. To
Squid it is a regex.
It will match things like http://example.com/sZsquid-cacheXorg just as
easily as any sub-domain of squid-cache.org, because each unescaped dot
matches any single character and the pattern is not anchored. For
example, any traffic injecting our squid-cache.org domain into its path
or query-string will match.
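If what you actually want is "this domain and all its sub-domains",
the dstdomain ACL type does that without any regex surprises. And if
you really do need a regex, escape the dots so they only match literal
dots. A minimal sketch (keeping your ACL name):

  # domain-based: the leading dot also covers every sub-domain
  acl whitelist dstdomain .squid-cache.org

  # regex-based: \. is a literal dot, ^ anchors the start of the URL
  acl whitelist url_regex ^https?://([^/]+\.)?squid-cache\.org/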
>
> Every time my squid.conf or my squid_sites.txt is modified, I restart
> the squid service
>
> sudo service squid3 restart
>
If Squid does not accept the config file, the service will not
necessarily come back up after a restart.
You should always run "squid -k parse" or "squid3 -k parse" to check the
config before attempting a restart.
The old Debian sysV init scripts had some checks that protected you
from this kind of problem, but the newer systemd "service" machinery is
not able to do that in a nice way. The habit is a good one to get into
anyway.
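For example, chaining the check so the restart only happens when the
config parses cleanly (using the squid3 names from your transcript):

  sudo squid3 -k parse && sudo service squid3 restart

When only squid.conf or an included ACL file has changed, "squid3 -k
reconfigure" is usually enough and avoids a full restart.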
>
> Then I use curl to test and now the url works.
>
> $ curl -sSL --proxy localhost:3128 -D -
> https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
> HTTP/1.1 200 Connection established
>
> HTTP/1.1 200 OK
> Date: Mon, 15 Oct 2018 14:47:33 GMT
> Server: Apache/2.4.7 (Ubuntu)
> Vary: Cookie,User-Agent,Accept-Encoding
> Content-Length: 101912
> Cache-Control: max-age=3600
> Expires: Mon, 15 Oct 2018 15:47:33 GMT
> Content-Type: text/html; charset=utf-8
>
>
> But this does not allow me to get more granular. I can only allow all
> subdomains and paths for the domain squid-cache.org but I'm unable to
> only allow the regular expressions if I put them inline or put them in
> squid_sites.txt.
>
> # acl whitelist url_regex "/vagrant/squid_sites.txt"
> acl whitelist url_regex
> ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
> acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*
Any regex pattern that lacks the beginning (^) and ending ($) anchor
symbols is always matched against *anywhere* in the input string.
So starting it with an optional prefix (.* or .?) or ending it with an
optional suffix (.* or .?) is pointless and confusing.
Notice how the pattern Squid is actually using lacks these prefix/suffix
parts of your patterns:
> aclRegexData::match: looking for
> '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
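For comparison, anchored versions with the dots escaped would look
something like this (a sketch; adjust the paths to what you actually
want to allow):

  acl whitelist url_regex ^https://wiki\.squid-cache\.org/SquidFaq/SquidAcl
  acl whitelist url_regex ^[^:]+://([^/]+\.)?squid-cache\.org/SquidFaq/SquidAcl

The ^ ties the match to the start of the URL and \. stops the dots
matching arbitrary characters. Though, as explained below, patterns
like these will still never see an https:// URL unless you bump the
traffic.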
>
>> are you aware that you can only see CONNECT in https requests, unless using
>> ssl_bump?
>
> Ah interesting. Are you saying that my https connections will always
> fail
They will always fail to match your current regexes, because those
regexes contain characters which only ever exist in the path portion
of a URL (note the *L*). A CONNECT message URI (note the *I*) does not
contain any path portion.
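To make that concrete: for your curl test above, all an un-bumped
Squid ever sees is the tunnel request, whose URI is just host:port

  CONNECT wiki.squid-cache.org:443 HTTP/1.1

so the only regex that can match it is one written against that form,
for example:

  acl whitelist url_regex ^wiki\.squid-cache\.org:443$

which of course gives up all the per-path granularity you were after.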
> unless I use ssl_bump to decrypt https to http connections? How
> would this work correctly in production? Does squid proxy only block
> urls if it detects http? How do you configure ssl_bump to work in this
> case? and is that viable in production?
SSL-Bump takes the CONNECT tunnel data/payload portion and _attempts_
to decrypt any TLS inside. *If* the tunnel contains HTTPS traffic (not
guaranteed), that is where the full https:// ... URLs are found.
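For orientation only, such a setup usually has the shape sketched
below (assuming a Squid built with SSL-Bump support and a locally
trusted CA; the /etc/squid/ca.pem path is just an illustration):

  # terminate and re-originate TLS so https:// URLs become visible
  http_port 3128 ssl-bump cert=/etc/squid/ca.pem generate-host-certificates=on
  acl step1 at_step SslBump1
  ssl_bump peek step1
  ssl_bump bump all

Every client must then trust that CA, which is exactly where the
issues the others mentioned come in.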
Matus and Alex have already mentioned the issues with that, so I won't
cover them again.
Amos