[squid-users] How to create a simple whitelist using regexes?
Amos Jeffries
squid3 at treenet.co.nz
Wed Oct 17 02:46:04 UTC 2018
In addition to what Matus and Alex have already said about your problem,
you do not appear to understand regex patterns properly.
On 16/10/18 4:11 AM, RB wrote:
> Hi Matus,
>
> Thanks for responding so quickly. I uploaded my configurations here if
> that is more helpful: https://bit.ly/2NF4zNb
>
> The config that I previously shared is called squid_corp.conf. I also
> noticed that if I don't use regular expressions and instead use domains,
> it works correctly:
>
> # acl whitelist url_regex "/vagrant/squid_sites.txt"
> acl whitelist url_regex .squid-cache.org
This is still a regex. The ACL type is "url_regex" which makes the
string a regex - no matter what it looks like to your human eyes. To
Squid it is a regex.
It will match things like http://example.com/sZsquid-cacheXorg just as
easily as any sub-domain of squid-cache.org, because each unescaped dot
matches any single character and the pattern is not anchored. For
example, any traffic injecting our squid-cache.org domain into its path
or query-string will match.
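If what you actually want is "this domain and all its sub-domains",
the dstdomain ACL type does that without any regex surprises. And if
you really do need a regex, escape the dots so they only match literal
dots. A minimal sketch (keeping your ACL name):

  # domain-based: the leading dot also covers every sub-domain
  acl whitelist dstdomain .squid-cache.org

  # regex-based: \. is a literal dot, ^ anchors the start of the URL
  acl whitelist url_regex ^https?://([^/]+\.)?squid-cache\.org/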
>
> Every time my squid.conf or my squid_sites.txt is modified, I restart
> the squid service
>
> sudo service squid3 restart
>
If Squid does not accept the config file, the service will not
necessarily come back up after a restart.
You should always run "squid -k parse" or "squid3 -k parse" to check the
config before attempting a restart.
The old Debian sysV init scripts had some checks that protected you
from this kind of problem, but the newer systemd "service" machinery is
not able to do that in a nice way. The habit is a good one to get into
anyway.
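For example, chaining the check so the restart only happens when the
config parses cleanly (using the squid3 names from your transcript):

  sudo squid3 -k parse && sudo service squid3 restart

When only squid.conf or an included ACL file has changed, "squid3 -k
reconfigure" is usually enough and avoids a full restart.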
>
> Then I use curl to test and now the url works.
>
> $ curl -sSL --proxy localhost:3128 -D -
> https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
> HTTP/1.1 200 Connection established
>
> HTTP/1.1 200 OK
> Date: Mon, 15 Oct 2018 14:47:33 GMT
> Server: Apache/2.4.7 (Ubuntu)
> Vary: Cookie,User-Agent,Accept-Encoding
> Content-Length: 101912
> Cache-Control: max-age=3600
> Expires: Mon, 15 Oct 2018 15:47:33 GMT
> Content-Type: text/html; charset=utf-8
>
>
> But this does not allow me to get more granular. I can only allow all
> subdomains and paths for the domain squid-cache.org but I'm unable to
> only allow the regular expressions if I put them inline or put them in
> squid_sites.txt.
>
> # acl whitelist url_regex "/vagrant/squid_sites.txt"
> acl whitelist url_regex
> ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
> acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*
Any regex pattern that lacks the beginning (^) and ending ($) anchor
symbols is always matched against *anywhere* in the input string.
So starting it with an optional prefix (.* or .?) or ending it with an
optional suffix (.* or .?) is pointless and confusing.
Notice how the pattern Squid is actually using lacks these prefix/suffix
parts of your patterns:
> aclRegexData::match: looking for
> '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
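For comparison, anchored versions with the dots escaped would look
something like this (a sketch; adjust the paths to what you actually
want to allow):

  acl whitelist url_regex ^https://wiki\.squid-cache\.org/SquidFaq/SquidAcl
  acl whitelist url_regex ^[^:]+://([^/]+\.)?squid-cache\.org/SquidFaq/SquidAcl

The ^ ties the match to the start of the URL and \. stops the dots
matching arbitrary characters. Though, as explained below, patterns
like these will still never see an https:// URL unless you bump the
traffic.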
>
>> are you aware that you can only see CONNECT in https requests, unless using
>> ssl_bump?
>
> Ah interesting. Are you saying that my https connections will always
> fail
They will always fail to match your current regexes, because those
regexes contain characters which only ever exist in the path portion
of a URL (note the *L*). A CONNECT message URI (note the *I*) does not
contain any path portion.
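To make that concrete: for your curl test above, all an un-bumped
Squid ever sees is the tunnel request, whose URI is just host:port

  CONNECT wiki.squid-cache.org:443 HTTP/1.1

so the only regex that can match it is one written against that form,
for example:

  acl whitelist url_regex ^wiki\.squid-cache\.org:443$

which of course gives up all the per-path granularity you were after.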
> unless I use ssl_bump to decrypt https to http connections? How
> would this work correctly in production? Does squid proxy only block
> urls if it detects http? How do you configure ssl_bump to work in this
> case? and is that viable in production?
SSL-Bump takes the CONNECT tunnel data/payload portion and _attempts_
to decrypt any TLS inside. *If* the tunnel contains HTTPS traffic (not
guaranteed), that is where the full https:// ... URLs are found.
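For orientation only, such a setup usually has the shape sketched
below (assuming a Squid built with SSL-Bump support and a locally
trusted CA; the /etc/squid/ca.pem path is just an illustration):

  # terminate and re-originate TLS so https:// URLs become visible
  http_port 3128 ssl-bump cert=/etc/squid/ca.pem generate-host-certificates=on
  acl step1 at_step SslBump1
  ssl_bump peek step1
  ssl_bump bump all

Every client must then trust that CA, which is exactly where the
issues the others mentioned come in.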
Matus and Alex have already mentioned the issues with that, so I won't
cover them again.
Amos