[squid-users] How to create a simple whitelist using regexes?

Alex Rousskov rousskov at measurement-factory.com
Mon Oct 15 17:08:05 UTC 2018


On 10/15/2018 10:48 AM, RB wrote:

> After some more research it looks like squid only has access to the url
> domain if it's HTTPS and the only way to get the url path and query
> string is to use ssl_bump to decrypt https so squid can see url path and
> query arguments.

Replace "url domain" with "service name". In many cases, they are about
the same today, but there is a trend for SNI values to migrate from
identifying specific sites (e.g., foo.example.com) to identifying broad
services (e.g., everything.example.com), making SNIs increasingly imprecise.

Please note that you cannot bump sites that pin their certificates or
use other measures that prevent bumping. Long-term, most sites will
probably fall into that category by switching to TLS v1.3 and hiding
their true names behind essentially fake/generic SNIs.
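
As an aside, when service-name granularity is good enough for some
destinations, you can act on a (still meaningful) SNI at the TLS
handshake without bumping anything. A rough, untested sketch in Squid
v3.5 syntax -- the ACL names are placeholders, and the http_port still
needs the ssl-bump option and a CA certificate even though spliced
connections are never decrypted:

    acl step1 at_step SslBump1
    acl allowed_sni ssl::server_name .squid-cache.org
    ssl_bump peek step1
    ssl_bump splice allowed_sni
    ssl_bump terminate all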


> To use ssl_bump, I have to compile the code from source with
> --enable-ssl, create a certificate, and add it to the chain of certs to
> every other vm that proxies through squid, then squid can decrypt the
> https urls to see paths and query args and finally apply the regex to
> those urls in order to only allow explicit regex urls.
> 
> Is this correct?

Replace "add it to the chain of certs" with "add it to the set of
trusted CA certificates". CA certificates are not chained... And, yes,
every client (every "vm" in your case?) that proxies through Squid would
have to trust your CA certificate.
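
For reference, here is a minimal sketch of that setup. File names,
paths, and the port are placeholders, and details vary between Squid
builds, so treat it as a starting point rather than a tested recipe:

    # create a self-signed CA certificate plus private key in one PEM file
    openssl req -new -newkey rsa:2048 -sha256 -days 365 -nodes -x509 \
        -extensions v3_ca -keyout myCA.pem -out myCA.pem

    # export just the certificate part for distribution to clients
    openssl x509 -in myCA.pem -out myCA.crt

    # on each Debian/Ubuntu-style client (VM) that proxies through Squid
    sudo cp myCA.crt /usr/local/share/ca-certificates/
    sudo update-ca-certificates

    # squid.conf fragment (Squid built with OpenSSL support); depending on
    # the build you may also need an sslcrtd_program line
    http_port 3128 ssl-bump cert=/etc/squid/myCA.pem \
        generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
    acl step1 at_step SslBump1
    ssl_bump peek step1
    ssl_bump bump all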

The above sounds correct (and will be painful) if your clients cannot
send unencrypted requests for https:... URLs to Squid. On the other
hand, if your clients can send unencrypted requests for https:... URLs
to Squid, then no bumping is necessary at all. Please note that those
unencrypted requests may be inside an encrypted TLS connection -- they
are not necessarily insecure or unsafe. Unfortunately, popular browsers
do _not_ support sending unencrypted requests for https:... URLs to proxies.
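
Such an unencrypted request is simply an absolute-form request written
in the clear on the proxy connection, instead of a CONNECT tunnel. A
throwaway illustration against the proxy port 3128 from the quoted
config (your nc may need a -q/-w option so it waits for the reply):

    printf 'GET https://wiki.squid-cache.org/SquidFaq/SquidAcl HTTP/1.1\r\nHost: wiki.squid-cache.org\r\nConnection: close\r\n\r\n' \
        | nc localhost 3128

With a request like that, url_regex gets to check the full
'https://wiki.squid-cache.org/SquidFaq/SquidAcl' URL rather than just
the 'wiki.squid-cache.org:443' seen for CONNECT.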


HTH,

Alex.


> On Mon, Oct 15, 2018 at 11:56 AM RB wrote:
> 
>     I think I know what the issue is, which may give us a clue about what
>     is going on.
> 
>         2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match:
>         aclRegexData::match: checking 'wiki.squid-cache.org:443'
>         2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match:
>         aclRegexData::match: looking for
>         '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
>         2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches:
>         ACL::ChecklistMatches: result for 'whitelist' is 0
> 
>     The above seems to be applying the regex to
>     "wiki.squid-cache.org:443 <http://wiki.squid-cache.org:443>" instead
>     of to "https://wiki.squid-cache.org/SquidFaq/SquidAcl". I added the
>     regex ".*squid-cache.org.*" to my list of regular expressions and
>     now I see this.
> 
>         2018/10/15 15:16:03.641 kid1| RegexData.cc(71) match:
>         aclRegexData::match: checking 'wiki.squid-cache.org:443'
>         2018/10/15 15:16:03.641 kid1| RegexData.cc(82) match:
>         aclRegexData::match: looking for
>         '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)'
>         2018/10/15 15:16:03.641 kid1| RegexData.cc(93) match:
>         aclRegexData::match: match
>         '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)'
>         found in 'wiki.squid-cache.org:443'
>         2018/10/15 15:16:03.641 kid1| Acl.cc(321) checklistMatches:
>         ACL::ChecklistMatches: result for 'whitelist' is 1
> 
> 
>     Any idea why url_regex doesn't try to match the full URL and instead
>     only matches the subdomain, host domain, and port?
> 
>     The Squid FAQ <https://wiki.squid-cache.org/SquidFaq/SquidAcl> says
>     the following:
> 
>         *url_regex*: URL regular expression pattern matching
>         *urlpath_regex*: URL-path regular expression pattern matching,
>         leaves out the protocol and hostname
> 
> 
>     with this example given
> 
>         acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
> 
> 
>     This seems to be the case with both 3.3.8 (default on Ubuntu 14.04)
>     and 3.5.12 (default on Ubuntu 16.04).
> 
>     Is there another configuration that forces url_regex to match the
>     entire URL? Or should I use a different acl type?
> 
>     Best,
> 
>     On Mon, Oct 15, 2018 at 11:11 AM RB <ronthecon at gmail.com> wrote:
> 
>         Hi Matus,
> 
>         Thanks for responding so quickly. I uploaded my configurations
>         here if that is more helpful: https://bit.ly/2NF4zNb
> 
>         The config that I previously shared is called squid_corp.conf. I
>         also noticed that if I don't use regular expressions and instead
>         use domains, it works correctly:
> 
>             # acl whitelist url_regex "/vagrant/squid_sites.txt"
>             acl whitelist url_regex .squid-cache.org
> 
> 
>         Every time my squid.conf or my squid_sites.txt is modified, I
>         restart the squid service
> 
>             sudo service squid3 restart
> 
> 
>         Then I use curl to test, and now the URL works.
> 
>             $ curl -sSL --proxy localhost:3128 -D -
>             https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
>             HTTP/1.1 200 Connection established
> 
>             HTTP/1.1 200 OK
>             Date: Mon, 15 Oct 2018 14:47:33 GMT
>             Server: Apache/2.4.7 (Ubuntu)
>             Vary: Cookie,User-Agent,Accept-Encoding
>             Content-Length: 101912
>             Cache-Control: max-age=3600
>             Expires: Mon, 15 Oct 2018 15:47:33 GMT
>             Content-Type: text/html; charset=utf-8
> 
> 
>         But this does not let me get more granular: I can only allow all
>         subdomains and paths for the domain squid-cache.org, and I'm unable
>         to allow only specific regular expressions, whether I put them
>         inline or in squid_sites.txt.
> 
>             # acl whitelist url_regex "/vagrant/squid_sites.txt"
>             acl whitelist url_regex ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
>             acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*
> 
> 
>         If I put them inline like I have above, when I restart squid it
>         says the following:
> 
>             2018/10/15 14:54:48 kid1| strtokFile:
>             .*squid-cache.org/SquidFaq/SquidAcl.* not found
> 
> 
>         If I put the expressions in squid_sites.txt, the above "not found"
>         message isn't shown, and this is the debug output in
>         /var/log/squid3/cache.log (full output: https://pastebin.com/NVwRxVmQ).
> 
>             2018/10/15 15:05:45.083 kid1| Checklist.cc(275) matchNode:
>             0x7fb0068da2b8 matched=1 async=0 finished=0
>             2018/10/15 15:05:45.083 kid1| Acl.cc(336) matches:
>             ACLList::matches: checking whitelist
>             2018/10/15 15:05:45.083 kid1| Acl.cc(319) checklistMatches:
>             ACL::checklistMatches: checking 'whitelist'
>             2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match:
>             aclRegexData::match: checking 'wiki.squid-cache.org:443'
>             2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match:
>             aclRegexData::match: looking for
>             '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
>             2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches:
>             ACL::ChecklistMatches: result for 'whitelist' is 0
>             2018/10/15 15:05:45.084 kid1| Acl.cc(349) matches: whitelist
>             mismatched.
>             2018/10/15 15:05:45.084 kid1| Acl.cc(354) matches: whitelist
>             result is false
> 
> 
>         So it's failing the regular expression check. If I use grep to
>         verify that the regex works, it does.
> 
>             $ echo https://wiki.squid-cache.org/SquidFaq/SquidAcl | grep
>             "^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*"
>             https://wiki.squid-cache.org/SquidFaq/SquidAcl
> 
> 
>         > are you aware that you can only see CONNECT in https requests,
>         > unless using ssl_bump?
> 
>         Ah, interesting. Are you saying that my https connections will
>         always fail unless I use ssl_bump to decrypt the https
>         connections? How would this work correctly in production? Does
>         Squid only block URLs if it detects http? How do you configure
>         ssl_bump to work in this case, and is that viable in production?
> 
>         > of course it matches all, everything should match "all".
>         > I wonder more why it doesn't match "http_access allow localhost"
> 
>         > have you reloaded squid config after changing it?
>         > Did squid confirm it?
> 
>         Would you have an example of one complete config file that would
>         work to whitelist an http/https URL using a regular expression?
> 
>         Best,
> 
> 
>         On Mon, Oct 15, 2018 at 4:49 AM Matus UHLAR - fantomas
>         <uhlar at fantomas.sk> wrote:
> 
>             On 15.10.18 01:04, RB wrote:
>             >I'm trying to deny all urls except for only whitelisted regular
>             >expressions. I have only this regular expression in my file
>             >"squid_sites.txt"
>             >
>             >^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
> 
>             are you aware that you can only see CONNECT in https
>             requests, unless using
>             ssl_bump?
> 
> 
>             >acl bastion src 10.5.0.0/1
>             >acl whitelist url_regex "/vagrant/squid_sites.txt"
>             [...]
>             >http_access allow manager localhost
>             >http_access deny manager
>             >http_access deny !Safe_ports
>             >http_access allow localhost
>             >http_access allow purge localhost
>             >http_access deny purge
>             >http_access deny CONNECT !SSL_ports
>             >
>             >http_access allow bastion whitelist
>             >http_access deny bastion all
> 
>             >I tried enabling debugging and tailing /var/log/squid3/cache.log
>             >but my curl statement keeps matching "all".
> 
>             of course it matches all, everything should match "all".
> 
>             I wonder more why it doesn't match "http_access allow localhost"
> 
>             >$ curl -sSL --proxy localhost:3128 -D - "https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o /dev/null 2>&1 | grep Squid
>             >X-Squid-Error: ERR_ACCESS_DENIED 0
> 
>             >Any ideas what I'm doing wrong?
> 
>             have you reloaded squid config after changing it?
>             Did squid confirm it?
> 
>             -- 
>             Matus UHLAR - fantomas, uhlar at fantomas.sk ; http://www.fantomas.sk/
>             Warning: I wish NOT to receive e-mail advertising to this
>             address.
>             Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek
>             reklamnu postu.
>             It's now safe to throw off your computer.
>             _______________________________________________
>             squid-users mailing list
>             squid-users at lists.squid-cache.org
>             http://lists.squid-cache.org/listinfo/squid-users
> 
> 
> 
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users
> 


