[squid-users] help me optimising caching or increasing hit ratio

Amos Jeffries squid3 at treenet.co.nz
Tue Feb 21 21:57:47 UTC 2017


On 22/02/2017 3:42 a.m., --Ahmad-- wrote:
> I’m using squid 3.5.2

Please upgrade to at least 3.5.19 (current release is 3.5.24). There
have been quite a few security issues fixed, and 3.5 does caching a
*lot* better than it did in those early releases.


> and I’m browsing the same website many times but no “HIT” in logs !!!
> 
> already enabled https and cert imported .
> 
> plz help me why i don’t see HITs in my access.log ?
> 


3.5 supports HTTP/1.1 caching nowadays. The days of determining
performance from "HIT" appearing in the logs are long past - the
majority of cached-data transactions involve "REFRESH" actions these days.

If you want to see what your caching performance is like you need to
use a log analysis tool that understands the REFRESH codes, or use the
cachemgr 'info' report summary of HIT ratios.
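
For example, something like this should print that summary (a sketch -
assuming the squidclient tool shipped with Squid is installed and the
proxy answers on its default port; the exact report wording varies a
little between versions):

  squidclient mgr:info | grep -i "hits as %"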


> there are some sites I’m very interested with like ==> https://www.ruzivodigitallearning.co.zw.com
> 
> plz have a look on my config below and advise me with best options to optimise caching and hit ratio increase
> 
> cheers 
> 
> ==============
> here is my config 
> root at portablecloud-3011:~# cat /etc/squid/squid.conf
> acl wu dstdom_regex \.download\.windowsupdate\.com$
> acl wu-rejects dstdom_regex stats
> acl GET method GET
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query no-netdb-exchange name=ms1
> cache_peer_access ms1 allow GET wu !wu-rejects
> cache_peer_access ms1 deny all
> never_direct allow GET wu !wu-rejects
> never_direct deny all
> 
> ########################################
> visible_hostname pcloud
> acl ip1 myip 10.1.0.1
> acl ip2 myip 192.168.10.210
> tcp_outgoing_address 192.168.10.210 ip1
> tcp_outgoing_address 192.168.10.210 ip2
> #
> # Recommended minimum configuration:
> #
> 
> # Example rule allowing access from your local networks.
> # Adapt to list your (internal) IP networks from where browsing
> # should be allowed
> acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> acl localnet src fc00::/7       # RFC 4193 local private network range
> acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
> 
> acl SSL_ports port 443
> acl Safe_ports port 80          # http
> acl Safe_ports port 21          # ftp
> acl Safe_ports port 443         # https
> acl Safe_ports port 70          # gopher
> acl Safe_ports port 210         # wais
> acl Safe_ports port 1025-65535  # unregistered ports
> acl Safe_ports port 280         # http-mgmt
> acl Safe_ports port 488         # gss-http
> acl Safe_ports port 591         # filemaker
> acl Safe_ports port 777         # multiling http
> acl CONNECT method CONNECT
> 
> #
> # Recommended minimum Access Permission configuration:
> #
> # Deny requests to certain unsafe ports
> http_access deny !Safe_ports
> 
> # Deny CONNECT to other than secure SSL ports
> http_access deny CONNECT !SSL_ports
> http_access allow  CONNECT 
> # Only allow cachemgr access from localhost
> http_access allow localhost manager
> http_access deny manager
> 
> # We strongly recommend the following be uncommented to protect innocent
> # web applications running on the proxy server who think the only
> # one who can access services on "localhost" is a local user
> #http_access deny to_localhost
> 
> #
> # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
> #
> 
> # Example rule allowing access from your local networks.
> # Adapt localnet in the ACL section to list your (internal) IP networks
> # from where browsing should be allowed
> http_access allow localnet
> http_access allow localhost
> 
> # And finally deny all other access to this proxy
> http_access deny all
> 
> # Squid normally listens to port 3128
> http_port 3128
> 
> # Uncomment and adjust the following to add a disk cache directory.
> #cache_dir ufs /var/cache/squid 100 16 256
> 
> # Leave coredumps in the first cache dir
> #coredump_dir /var/cache/squid
> 
> #
> # Add any of your own refresh_pattern entries above these.
> #

Please note what that line says: "above". It is the default config
comment for the default refresh_pattern lines way, way down below.
 You can either erase it entirely, or move it back to its proper
position under your custom patterns.


Also, a note on the % values in all the refresh_pattern lines:

 One of the reasons to upgrade is that they actually work better when
used with values >100%. Some releases before 3.5.12 complained about
the % incorrectly. That bug has been fixed in the current Squid versions.

 For example: an object which is 2 hours old and has a 100% pct value
configured gets an LM-factor calculation indicating it is cacheable for
2hrs from the time it was received. This sounds great, but a lot of
traffic is quite young, ie. milliseconds old, and 100% of a few
milliseconds is not very much time before the object needs updating
again. Values upwards of 1000% may be useful in a busy proxy cache.
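
As a concrete sketch (a hypothetical pattern, not one from your config):

  refresh_pattern -i \.html 120 1000% 10080

With that line an object whose Last-Modified was 30 minutes before
receipt is considered fresh for 300 minutes, instead of the 30 minutes
a 100% value would give - still bounded by the 120-minute minimum and
10080-minute maximum on that line.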



> #
> refresh_pattern -i \.htm 120 50% 10080 reload-into-ims
> refresh_pattern -i \.html 120 50% 10080 reload-into-ims
> refresh_pattern ^http://*.facebook.com/* 720 100% 4320

Where did you copy-and-paste these patterns from? They are all very,
very broken. The above will not match what you probably think it does.

1) The "/*." near the start means 0 or more _slash_ ('/') characters
followed by _only one_ character, whose existence is mandatory but
which can have any value.


 How many URLs have you seen with "http:/blah" or "http:///////blah" ?

 If this "works" at all it is probably because the regex library
interprets the facebook URL as having 0 '/' characters for the '/*'
part and one character (a '/') for the mandatory position.

AFAIK a regex library that does that is buggy though. So I think this
line will never match unless you are being passed phishing-like attack
URLs where the domain "facebook.com" is prefixed with some obscure
character, like "1facebook.com", to fool people into clicking links
thinking it's Facebook.

The '*' at the end also means 0 or more '/' characters. This is
pointless in terms of being a wildcard. There is an implicit '.*'
pattern at the end unless you use the special anchor code '$' to mark
the URL ending.
 But this pattern does not need that either. So you might as well erase
the whole '/*' part at the end.

Worst case, you might be thinking this pattern matches only the
"facebook.com" domain name because the domain is followed by a '/'.
  However, since the final '*' allows omitting that '/' delimiter, this
pattern _actually_ matches any URL with a *subdomain* similar to
".facebook.com".

Such as "http://_facebook_com.example.com.bwaahahaha/"


==> If that was the behaviour you actually wanted, fair enough. But I
suggest in that case adding a comment to say it is there to catch some
attack URLs, not actual facebook.com traffic.

To retain the current behaviour the correct pattern should be:
  ^http://*.facebook.com

To fix the behaviour to only match facebook.com and sub-domains, the
correct pattern would be:
  ^http://.*\.facebook\.com/

Also, I suggest adding 's?' after the '^http' bit (as demonstrated
below), so the one pattern matches both HTTP and HTTPS URLs.

Also, I suggest using the -i flag. Scheme and domain are
case-insensitive URL segments. Squid should be normalizing them to
lower case, but may not.
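
Putting those pieces together, and keeping your existing 720 100% 4320
values, the facebook line would look something like:

  refresh_pattern -i ^https?://.*\.facebook\.com/ 720 100% 4320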


> refresh_pattern ^https://*.ruzivodigitallearning.co.zw.com/* 720 100% 4320
> refresh_pattern ^http://*.ruzivodigitallearning.co.zw.com/* 720 100% 4320

You can merge the above two lines into one by using the regex pattern:

  ^https?://.*\.ruzivodigitallearning\.co\.zw\.com/

Note that I have corrected for the same issues the facebook pattern had.
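
As a complete line, keeping your 720 100% 4320 values and adding the -i
flag suggested earlier, that would be something like:

  refresh_pattern -i ^https?://.*\.ruzivodigitallearning\.co\.zw\.com/ 720 100% 4320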

BTW: is that .com supposed to be there? I looked up the URL in redbot
to check cacheability and the .com domain was not found, but there is a
.co.zw ccTLD.


> refresh_pattern ^http://mail.yahoo.com/.* 720 100% 4320
> refresh_pattern ^http://*.yahoo.*/.* 720 100% 4320
> refresh_pattern ^http://*.yimg.*/.* 720 100% 4320
> refresh_pattern ^http://*.gmail.*/.* 720 100% 4320
> refresh_pattern ^http://*.google.*/.* 720 100% 4320
> refresh_pattern ^http://*.kaskus.*/.* 720 100% 4320
> refresh_pattern ^http://*.googlesyndication.*/.* 720 100% 4320
> refresh_pattern ^http://*.plasa.*/.* 720 100% 4320
> refresh_pattern ^http://*.telkom.*/.* 720 100% 4320

A useful rule of thumb is that fewer refresh_pattern lines lead to
better performance.

So reducing the above to a single regex pattern will be faster. Do it
with (a|b|c) compounding like the "file extension" patterns below.
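
For example, one possible compound form (a sketch only - it assumes you
want any sub-domain of those names, and keeps your 720 100% 4320 values):

  refresh_pattern -i ^https?://([^/]*\.)?(yahoo|yimg|gmail|google|kaskus|googlesyndication|plasa|telkom)\. 720 100% 4320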



If you want to improve performance, remove all the override-lastmod
options. Last-Modified is part of HTTP/1.1 and is what lets Squid
perform fast revalidation - without it some (most?) things can only
MISS, and overriding it breaks the reload-into-ims operations.

Simplify your config by removing all the 'ignore-no-cache' options.
They have had no effect since Squid-3.2.

Also, I recommend removing the ignore-private. Squid-3.5 can store that
data relatively safely, but if the revalidation does not work well it
can also lead to users manually forcing reloads.
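
Once those options are gone, one of the file-extension lines below
would reduce to something like:

  refresh_pattern -i \.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v)) 10800 80% 10800 override-expire reload-into-ims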



> ##################################################
> refresh_pattern -i \.fbcdn.net.*\.(jpg|gif|png|swf|mp3)                  10800 80% 10800 ignore-reload  override-expire ignore-no-cache
> refresh_pattern  static\.ak\.fbcdn\.net*\.(jpg|gif|png)                  10800 80% 10800 ignore-reload  override-expire ignore-no-cache
> refresh_pattern ^http:\/\/profile\.ak\.fbcdn.net*\.(jpg|gif|png)      10800 80% 10800 ignore-reload  override-expire ignore-no-cache

Er, these last two lines are sub-sets of the top line. I think you can
erase the 'static' and 'profile' lines.


Also, once you remove the overrides mentioned above, you are left with
"reload-into-ims" as the only difference between the parameters of the
patterns above and the patterns below which match those same
file-extensions. So you can probably improve performance a bit more by
just erasing the above lines.

However, before they went all-HTTPS facebook were becoming one of the
better sites in terms of HTTP cacheability. I do not think that has
changed, it's just the HTTPS/TLS wrapper preventing most of their
traffic going to caches now.
 So override-expire is probably making things *worse* for all that
fbcdn traffic nowadays.


> ##############
> refresh_pattern -i \.(3gp|7z|ace|asx|avi|bin|cab|dat|deb|divx|dvr-ms)      10800 80% 10800 ignore-no-cache  ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v))          10800 80% 10800 ignore-no-cache  ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(jp(e?g|e|2)|gif|pn[pg]|bm?|tiff?|ico|swf|css|js)     10800 80% 10800 ignore-no-cache  ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(mp(e?g|a|e|1|2|3|4)|mk(a|v)|ms(i|u|p)|og(x|v|a|g)|rar|rm|r(a|p)m|snd|vob|wav) 10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(pp(s|t)|wax|wm(a|v)|wmx|wpl|zip|cb(r|z|t))     10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims

Something to beware of:
 The above patterns will match *anywhere* within the whole URL.

So the start of a domain name (e.g. "www.raring.com" -> \.rar ) will be
cached using these refresh parameters.
 As will any URL that happens to have a similar match in the
query-string portion. That is kind of useful if there is a filename in
the query parameters, but also dangerous since you cannot know when or
where that matching will happen.
 Note that you are caching *private* responses whenever one of these
matches. The risk you are taking is large.

If that is a problem, you can work around it by adjusting the patterns
like this:
 ^https?://[^/]+/[^?]+\.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v))

NP: If you would like them to still match inside the query portion of
URLs, replace the "[^?]" with a dot "."


PS. 'rar' is listed in the 2nd and 4th lines. One of them is redundant.



>  #############################
> refresh_pattern (cgi-bin|\?)       0      0%      0

This above regex is broken. It requires the -i flag and the '/' path
delimiters, as seen in the refresh_pattern lines below which contain
the correct CGI regex.
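
That is, it should read:

  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0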


> refresh_pattern ^gopher:    1440    0%    1440
> refresh_pattern ^ftp:         10080     95%     10800 override-lastmod reload-into-ims
> refresh_pattern         .     180     95% 10800 override-lastmod reload-into-ims
> #################
> minimum_object_size 0 bytes
> maximum_object_size_in_memory 500 MB

You have placed alternative ftp, gopher and '.' refresh_patterns above.
Remove the default refresh_pattern lines below.

> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
> refresh_pattern .               0       20%     4320
> 
> http_port 3126
> #http_port 3128
> #######################################
> cache_swap_low 90
> cache_swap_high 95
> ############################
> cache_effective_user squid
> cache_effective_group squid
> memory_replacement_policy lru
> cache_replacement_policy heap LFUDA
> ########################
> maximum_object_size 10000 MB
> cache_mem 5000 MB
> maximum_object_size_in_memory 10 MB
> #########################
> logfile_rotate 2
> max_filedescriptors 131072
> ###############################
> #cache_dir ufs /root/cache3 600000 64 128
> ############
> cache_dir aufs /var/cache/squid 600000 64 128


Amos


