[squid-users] cache hit rate isn't what I'd expect

Thu Sep 28 22:47:29 UTC 2017

On 29/09/17 11:29, Aaron Turner wrote:
> So this grep through my access logs for this single URL does a good
> job illustrating a rather interesting problem:
> 
> $ grep -h 'https://static.licdn.com/sc/h/ddzuq7qeny6qn0ysh3hj6pzmr
> text/css ip_index=0,client=m0078269' access.*.log | sort
> 
> 
...
>
> At first I thought this was because I have a bunch of
> clients, each of which behaves exactly the same except for one thing:
> the client includes a unique request header that squid strips off
> before forwarding to the server (you can see it logged as
> client=mXXXXX_XXXX).  But in this case I've controlled for that and
> only grep'd for a single client's request.  I've even tried setting
> "vary_ignore_expire on", but that doesn't seem to be a complete fix.
> 
> I can't for the life of me understand why the low hit rate though.
> 

The duration and size fields are quite useful for detecting reasons for 
HIT/MISS.

Request headers should not affect the response caching, unless they are 
listed in the server's Vary header.

In this case the server is delivering broken Vary responses. redbot.org 
reports that it sends Vary: Accept-Encoding only some of the time, so both 
the Vary response header and the Accept-Encoding request header would be 
useful info to log.
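
As a rough sketch of how that logging could look in squid.conf: the 
line below takes the default "squid" format and appends the two headers. 
The format name "vary_debug" and the log path are placeholders, not 
anything from your setup, and since your logs already carry custom fields 
you would more likely append the last two codes to your existing 
logformat line instead.

```
# Hypothetical format name; the first part is the stock "squid" format.
# %{Accept-Encoding}>h logs the client's request header,
# %{Vary}<h logs the Vary header from the reply.
logformat vary_debug %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %[un %Sh/%<a %mt ae="%{Accept-Encoding}>h" vary="%{Vary}<h"

# Assumed log location; adjust to match your installation.
access_log daemon:/var/log/squid/access.log vary_debug
```

With those two fields next to the HIT/MISS status, it should be visible 
whether the MISSes line up with replies where Vary is absent or different.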

I expect it is the usual problem of clients fighting over whose variant 
gets cached when this type of server breakage happens - when the Vary 
header changes or disappears, old variants become unfindable until it 
changes back.

Amos