[squid-users] help me optimising caching or increasing hit ratio
Amos Jeffries
squid3 at treenet.co.nz
Tue Feb 21 21:57:47 UTC 2017
On 22/02/2017 3:42 a.m., --Ahmad-- wrote:
> I’m using squid 3.5.2
Please upgrade to at least 3.5.19 (current release is 3.5.24). There have
been quite a few security issues fixed, and 3.5 does caching a *lot*
better than it did in those early releases.
> and I’m browsing the same website many times but no “HIT” in logs !!!
>
> already enabled https and cert imported .
>
> plz help me why i don’t see HITs in my access.log ?
>
3.5 supports HTTP/1.1 caching nowadays. The days of determining
performance from "HIT" appearing in the logs are long past - the
majority of cached-data transactions involve "REFRESH" actions these days.
If you want to see what your caching performance is like, you need to use
a log analysis tool that understands the REFRESH codes, or use the
cachemgr 'info' report summary of HIT ratios.
> there are some sites I’m very interested with like ==> https://www.ruzivodigitallearning.co.zw.com
>
> plz have a look on my config below and advise me with best options to optimise caching and hit ratio increase
>
> cheers
>
> ==============
> here is my config
> root at portablecloud-3011:~# cat /etc/squid/squid.conf
> acl wu dstdom_regex \.download\.windowsupdate\.com$
> acl wu-rejects dstdom_regex stats
> acl GET method GET
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query no-netdb-exchange name=ms1
> cache_peer_access ms1 allow GET wu !wu-rejects
> cache_peer_access ms1 deny all
> never_direct allow GET wu !wu-rejects
> never_direct deny all
>
> ########################################
> visible_hostname pcloud
> acl ip1 myip 10.1.0.1
> acl ip2 myip 192.168.10.210
> tcp_outgoing_address 192.168.10.210 ip1
> tcp_outgoing_address 192.168.10.210 ip2
> #
> # Recommended minimum configuration:
> #
>
> # Example rule allowing access from your local networks.
> # Adapt to list your (internal) IP networks from where browsing
> # should be allowed
> acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> acl localnet src fc00::/7 # RFC 4193 local private network range
> acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
>
> acl SSL_ports port 443
> acl Safe_ports port 80 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 # https
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT
>
> #
> # Recommended minimum Access Permission configuration:
> #
> # Deny requests to certain unsafe ports
> http_access deny !Safe_ports
>
> # Deny CONNECT to other than secure SSL ports
> http_access deny CONNECT !SSL_ports
> http_access allow CONNECT
> # Only allow cachemgr access from localhost
> http_access allow localhost manager
> http_access deny manager
>
> # We strongly recommend the following be uncommented to protect innocent
> # web applications running on the proxy server who think the only
> # one who can access services on "localhost" is a local user
> #http_access deny to_localhost
>
> #
> # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
> #
>
> # Example rule allowing access from your local networks.
> # Adapt localnet in the ACL section to list your (internal) IP networks
> # from where browsing should be allowed
> http_access allow localnet
> http_access allow localhost
>
> # And finally deny all other access to this proxy
> http_access deny all
>
> # Squid normally listens to port 3128
> http_port 3128
>
> # Uncomment and adjust the following to add a disk cache directory.
> #cache_dir ufs /var/cache/squid 100 16 256
>
> # Leave coredumps in the first cache dir
> #coredump_dir /var/cache/squid
>
> #
> # Add any of your own refresh_pattern entries above these.
> #
Please note what that line says "above". It is the default config
comment for the final three refresh_pattern lines way, way down below.
You can either erase it entirely, or move it back to its proper
position under your custom patterns.
Also, a note on the % values in all the refresh_pattern lines;
One of the reasons to upgrade is that they actually work better when
used with values >100%. Some releases before 3.5.12 complained about the
% incorrectly. That bug has been fixed in the current Squid versions.
For example: an object which is 2 hours old, with a 100% pct value
configured, has an LM-factor calculation indicating it is cacheable for
2hrs from the time it was received. This sounds great, but a lot of
traffic is quite young, ie. milliseconds, and 100% of a few milliseconds
is not very much time before the object needs updating again. Values up
to 1000% may be useful in a busy proxy cache.
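The effect of the % value can be sketched with a simplified model of the
LM-factor heuristic (real Squid applies more rules; this only illustrates
why 100% of a very young object's age is almost no freshness time):

```python
# Simplified sketch of refresh_pattern's percent-based freshness:
# an object is fresh while its age is below
#   clamp(min, (received - last_modified) * pct/100, max)
# All times in seconds here; real refresh_pattern uses minutes.

def freshness_lifetime(lm_age_s, pct, min_s, max_s):
    """Heuristic lifetime for an object whose Last-Modified was
    lm_age_s seconds before it was received."""
    return max(min_s, min(lm_age_s * pct / 100.0, max_s))

# A 2-hour-old object at 100% stays fresh for about 2 hours...
print(freshness_lifetime(7200, 100, 0, 86400))   # 7200.0
# ...but a 5-second-old object gets only 5 seconds:
print(freshness_lifetime(5, 100, 0, 86400))      # 5.0
# At 1000% the young object is fresh for 50 seconds:
print(freshness_lifetime(5, 1000, 0, 86400))     # 50.0
```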
> #
> refresh_pattern -i \.htm 120 50% 10080 reload-into-ims
> refresh_pattern -i \.html 120 50% 10080 reload-into-ims
> refresh_pattern ^http://*.facebook.com/* 720 100% 4320
Where did you copy-and-paste these patterns from? They are all very,
very broken. The above will not match what you probably think it does.
1) The "/*." near the start means 0 or more _slash_ ('/') characters
followed by _only one_ character whose existence is mandatory but which
can have any value.
How many URLs have you seen with "http:/blah" or "http:///////blah"?
If this "works" at all it is probably because the regex library
interprets the facebook URL as having 0 '/' characters for the '/*' and
one character (a '/') for the mandatory position.
AFAIK a regex library that does that is buggy though. So I think this
line will never match unless you are being passed phishing-like attack
URLs where the domain "facebook.com" is prefixed with some obscure
character, like "1facebook.com", to fool people into clicking links
thinking it's Facebook.
The '*' at the end also means 0 or more '/' characters. This is
pointless in terms of being a wildcard. There is an implicit '.*'
pattern at the end unless you use the special anchor code '$' to mark
the URL ending.
But this pattern does not need that either. So you might as well erase
the whole '/*' part on the end.
Worst case you might be thinking this pattern matches only
"facebook.com" domain name because the domain is followed by a '/'.
However since the final '*' allows omitting that '/' delimiter this
pattern _actually_ matches any URL with a *subdomain* similar to
".facebook.com"
Such as "http://_facebook_com.example.com.bwaahahaha/"
==> If that was the behaviour you actually wanted, fair enough. But I
suggest in that case adding a comment to say it is there to catch some
attack URLs, not actual facebook.com traffic.
To retain the current behaviour the correct pattern should be:
^http://*.facebook.com
To fix the behaviour to only match facebook.com and sub-domains, the
correct pattern would be:
^http://.*\.facebook\.com/
Also, I suggest adding 's?' after the '^http' bit (as demo'd below), so
the one pattern matches both HTTP and HTTPS URLs.
Also, I suggest using the -i flag. Scheme and domain are
case-insensitive URL segments. Squid should be normalizing them to lower
case, but may not.
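To make the difference concrete, here is a small demo using Python's 're'
module (for illustration only; Squid compiles these with a POSIX regex
library, which should behave the same for these constructs, and re.I
stands in for the -i flag):

```python
import re

# The pattern as written in the config, and the corrected form suggested above.
broken = re.compile(r"^http://*.facebook.com/*")
fixed  = re.compile(r"^https?://.*\.facebook\.com/", re.I)

# The broken pattern misses the real site behind "www."...
print(bool(broken.search("http://www.facebook.com/")))            # False
# ...but matches a phishing-style lookalike subdomain:
print(bool(broken.search("http://_facebook_com.evil.example/")))  # True

# The corrected pattern does what was intended:
print(bool(fixed.search("https://www.facebook.com/profile")))     # True
print(bool(fixed.search("http://_facebook_com.evil.example/")))   # False
```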
> refresh_pattern ^https://*.ruzivodigitallearning.co.zw.com/* 720 100% 4320
> refresh_pattern ^http://*.ruzivodigitallearning.co.zw.com/* 720 100% 4320
You can merge the above two lines into one by using the regex pattern:
^https?://.*\.ruzivodigitallearning\.co\.zw\.com/
Note that I have corrected for the same issues the facebook pattern had.
BTW: is that .com supposed to be there? I looked up the URL in redbot to
check cachability and the .com was not found, but there is a .co.zw ccTLD.
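A quick check of the merged pattern, again using Python 're' for
illustration (keeping the .com spelling from your config, pending the
question above about whether it should be just .co.zw):

```python
import re

# One case-insensitive pattern replacing the separate http:// and https:// lines.
merged = re.compile(r"^https?://.*\.ruzivodigitallearning\.co\.zw\.com/", re.I)

for url in ("http://www.ruzivodigitallearning.co.zw.com/a",
            "https://cdn.ruzivodigitallearning.co.zw.com/b"):
    print(bool(merged.search(url)))   # True, True
```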
> refresh_pattern ^http://mail.yahoo.com/.* 720 100% 4320
> refresh_pattern ^http://*.yahoo.*/.* 720 100% 4320
> refresh_pattern ^http://*.yimg.*/.* 720 100% 4320
> refresh_pattern ^http://*.gmail.*/.* 720 100% 4320
> refresh_pattern ^http://*.google.*/.* 720 100% 4320
> refresh_pattern ^http://*.kaskus.*/.* 720 100% 4320
> refresh_pattern ^http://*.googlesyndication.*/.* 720 100% 4320
> refresh_pattern ^http://*.plasa.*/.* 720 100% 4320
> refresh_pattern ^http://*.telkom.*/.* 720 100% 4320
A useful rule of thumb is that fewer refresh_pattern lines leads to
better performance.
So a redux of the above into a single regex pattern will be faster. Do
it with (a|b|c) compounding like the "file extension" patterns below.
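For illustration, such a compounded pattern might look like the one
below (Python 're' used as a stand-in; the domain list is just a sample
from your config, adjust to taste):

```python
import re

# Sketch of folding the per-site lines into one (a|b|c) compound pattern.
compound = re.compile(
    r"^https?://[^/]*\.(yahoo|yimg|gmail|google|googlesyndication)\.",
    re.I)

print(bool(compound.search("http://mail.yahoo.com/inbox")))   # True
print(bool(compound.search("http://s.yimg.com/img.gif")))     # True
print(bool(compound.search("http://example.com/")))           # False
```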
If you want to improve performance remove all the override-lastmod.
Last-Modified is part of HTTP/1.1 which lets Squid perform fast
revalidation - without it some (most?) things can only MISS, and it
breaks the reload-into-ims operations.
Simplify your config by removing all the 'ignore-no-cache'. It has no
effect since Squid-3.2
Also, I recommend removing the ignore-private. Squid-3.5 can store the
data relatively safely, but if the revalidation does not work well it
can also lead to users manually forcing reloads.
> ##################################################
> refresh_pattern -i \.fbcdn.net.*\.(jpg|gif|png|swf|mp3) 10800 80% 10800 ignore-reload override-expire ignore-no-cache
> refresh_pattern static\.ak\.fbcdn\.net*\.(jpg|gif|png) 10800 80% 10800 ignore-reload override-expire ignore-no-cache
> refresh_pattern ^http:\/\/profile\.ak\.fbcdn.net*\.(jpg|gif|png) 10800 80% 10800 ignore-reload override-expire ignore-no-cache
Er, these last two lines are sub-sets of the top line. I think you can
erase those 'static' and 'profile' lines.
Also, once you remove the overrides as mentioned above, you are left
with "reload-into-ims" as the only difference between the parameters of
the patterns above and the patterns below which match those same
file-extensions. So you can probably improve performance a bit more by
just erasing the above lines.
However, before they went all-HTTPS, Facebook were becoming one of the
better sites in terms of HTTP cacheability. I do not think that has
changed; it's just the HTTPS/TLS wrapper preventing most of their
traffic going to caches now.
So override-expire is probably making things *worse* for all that
fbcdn traffic nowadays.
> ##############
> refresh_pattern -i \.(3gp|7z|ace|asx|avi|bin|cab|dat|deb|divx|dvr-ms) 10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v)) 10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(jp(e?g|e|2)|gif|pn[pg]|bm?|tiff?|ico|swf|css|js) 10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(mp(e?g|a|e|1|2|3|4)|mk(a|v)|ms(i|u|p)|og(x|v|a|g)|rar|rm|r(a|p)m|snd|vob|wav) 10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims
> refresh_pattern -i \.(pp(s|t)|wax|wm(a|v)|wmx|wpl|zip|cb(r|z|t)) 10800 80% 10800 ignore-no-cache ignore-private override-expire override-lastmod reload-into-ims
Something to beware of:
The above patterns will match *anywhere* within the whole URL.
So the start of domain names (e.g. "www.raring.com" matching \.rar)
will be cached using these refresh parameters.
As will any URL that happens to have a similar match in the
query-string portion. That is kind of useful if there is a filename in
the query parameters, but also dangerous since you cannot know when or
where that matching will happen.
Note that you are caching *private* responses whenever one of these
matches. The risk you are taking is large.
If that is a problem, you can work around it by adjusting the patterns
like this:
^https?://[^/]+/[^?]+\.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v))
NP: If you like them to still match inside the query portion of URLs
replace the "[^?]" with a dot "."
PS. 'rar' is listed in the 2nd and 4th lines. One of them is redundant.
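The difference between the bare extension match and the anchored form
can be shown with Python 're' (illustration only; Squid's POSIX regex
should behave the same here):

```python
import re

# Bare extension match as in the config, vs the anchored form suggested above.
bare     = re.compile(r"\.(rar|zip|iso)", re.I)
anchored = re.compile(r"^https?://[^/]+/[^?]+\.(rar|zip|iso)", re.I)

# The bare pattern fires on a domain name containing ".rar"...
print(bool(bare.search("http://www.raring.com/index.html")))      # True
print(bool(anchored.search("http://www.raring.com/index.html")))  # False

# ...while the anchored form still matches a real file path:
print(bool(anchored.search("http://example.com/files/a.zip")))    # True
```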
> #############################
> refresh_pattern (cgi-bin|\?) 0 0% 0
The above regex is broken. It is missing the -i flag and the '/' path
delimiters; see the later refresh_pattern lines for the correct
CGI regex.
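A demo of why both fixes matter, with Python 're' standing in for
Squid's regex (re.I stands in for the -i flag):

```python
import re

# The loose pattern from the config vs the conventional CGI pattern.
loose   = re.compile(r"(cgi-bin|\?)")
correct = re.compile(r"(/cgi-bin/|\?)", re.I)

# The loose form fires on a domain merely containing "cgi-bin"...
print(bool(loose.search("http://cgi-bin-tutorials.example.com/page.html")))    # True
print(bool(correct.search("http://cgi-bin-tutorials.example.com/page.html")))  # False
# ...and misses an upper-case path that the -i form catches:
print(bool(loose.search("http://example.com/CGI-BIN/run")))    # False
print(bool(correct.search("http://example.com/CGI-BIN/run")))  # True
```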
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern ^ftp: 10080 95% 10800 override-lastmod reload-into-ims
> refresh_pattern . 180 95% 10800 override-lastmod reload-into-ims
> #################
> minimum_object_size 0 bytes
> maximum_object_size_in_memory 500 MB
You have placed alternative ftp, gopher and '.' refresh_patterns above.
Remove the below refresh_pattern lines.
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
>
> http_port 3126
> #http_port 3128
> #######################################
> cache_swap_low 90
> cache_swap_high 95
> ############################
> cache_effective_user squid
> cache_effective_group squid
> memory_replacement_policy lru
> cache_replacement_policy heap LFUDA
> ########################
> maximum_object_size 10000 MB
> cache_mem 5000 MB
> maximum_object_size_in_memory 10 MB
> #########################
> logfile_rotate 2
> max_filedescriptors 131072
> ###############################
> #cache_dir ufs /root/cache3 600000 64 128
> ############
> cache_dir aufs /var/cache/squid 600000 64 128
Amos