[squid-users] Time for cache synchronization between siblings

Wed Dec 16 20:35:01 UTC 2015

On 17/12/2015 3:10 a.m., Sreenath BH wrote:
> Hi,
> 
> Thanks for the tips. After disabling digest I believe performance improved.
> However, I found that randomly requests were being routed to parent
> even when siblings had the data cached.
> 
> From access.log I found TIMEOUT_CARP. I assumed this meant HTCP timed
> out and squid was forced to go to fetch the data. So I increased
> icp_query_timeout to 4000 milliseconds, and the hit rate increased
> further.
> 
> But I still find that sometimes, even after getting a HIT response
> from a sibling, squid, for some reason still decides to go to the
> parent for requested object.
> 
> Are there any other reasons why squid will decide to go to parent servers?

Just quirks of timing I think. Squid tracks response latency and prefers
the fastest source. If the parent is responding faster than the sibling
for many requests over a short period, then Squid might switch to using
the parent as first choice for a while.

Some traffic is also classified as "non-hierarchical", meaning that it
makes no sense to send it to a sibling unless all parents are down. This
covers things such as CONNECT, OPTIONS, POST etc., where the response
cannot be cached at the sibling.
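
To make that classification concrete, here is a minimal Python sketch.
The method set and the names `is_hierarchical` and `candidate_peers` are
assumptions for illustration only, not Squid's actual code; the real
checks look at more than the request method.

```python
# Hypothetical sketch, not Squid's real logic.
# Methods whose responses a sibling cannot usefully have cached.
NON_HIERARCHICAL_METHODS = {"CONNECT", "OPTIONS", "POST"}


def is_hierarchical(method):
    """Return True if it makes sense to ask a sibling for this request."""
    return method.upper() not in NON_HIERARCHICAL_METHODS


def candidate_peers(method, siblings, live_parents):
    """Return the peers worth trying, in order of preference.

    Hierarchical requests may go to siblings first, then parents.
    Non-hierarchical requests go to parents only, and fall back to
    siblings only when no parent is alive.
    """
    if is_hierarchical(method):
        return list(siblings) + list(live_parents)
    if live_parents:
        return list(live_parents)
    return list(siblings)
```

With one sibling and one live parent, a GET would consider both, while a
POST would only consider the parent.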

> 
> And another question: When the hash key is computed for storing cache
> objects, does Squid use the hostname(or IP address) also as part of
> URL, or just the part that appears after the hostname/IP:port numbers?

No. The primary Store ID/key is the absolute URL alone, unless you are
using the Store-ID feature of Squid to change it to some other explicit
string value.

If the URL produces a reply object with a Vary header, then the expansion
of the Vary header format is appended to the primary Store ID/key.
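
A rough Python sketch of that idea follows. The function name
`store_key`, the separator and the choice of SHA-256 are all assumptions
made for the example; they are not Squid's internal key format. The point
is only that the inputs are the absolute URL plus any Vary expansion, and
that the address of the proxy handling the request is not one of them.

```python
import hashlib


def store_key(absolute_url, vary_expansion=""):
    """Illustrative cache key built from the URL and optional Vary expansion.

    Hypothetical helper: note there is no parameter for the proxy's own
    IP address, so two proxies compute the same key for the same URL.
    """
    material = absolute_url
    if vary_expansion:
        # Variant objects get the expanded Vary values appended.
        material += "\n" + vary_expansion
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


url = "http://example.com/video/1.mp4"
key_plain = store_key(url)
key_gzip = store_key(url, 'accept-encoding="gzip"')
```

Here `key_plain` is the same no matter which Squid instance computes it,
and `key_gzip` differs from it because of the appended Vary expansion.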

> 
> For example: if the IP addresses of the squid servers are 10.135.85.2 and
> 10.135.85.3, and a request made to 1st server would have had the IP
> address as part of the URL. However, next time same request is made to
> server2, a different IP address would be used. Does this affect cache
> hit at the sibling server?
> 
> I think it should not, but is this the case?

Correct, the Squid IP has nothing to do with the cache storage.

> 
> We will have a load balancer that sends requests to each squid server,
> and we want cache peering to work correctly in this case.

FYI, the digest and HTCP algorithms you are dealing with are already
load balancing algorithms. They are just designed for use in a flat
1-layer hierarchy.

If you intend to have a 2-layer hierarchy (frontend LB and backend
caches) I suggest you might want to look into Squid as the frontend LB
using the CARP algorithm. CARP maps each URL deterministically to one
backend cache, so there is no need for sibling communication: every
backend receives its own unique set of URLs.
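
The deterministic mapping can be sketched in Python with
highest-random-weight (rendezvous) hashing, which is the general family
CARP belongs to. This is not Squid's CARP hash function; the scoring via
SHA-256, the names `pick_backend` and `_score`, and the backend addresses
are assumptions chosen for illustration.

```python
import hashlib


def _score(url, backend):
    """Combine the URL and backend name into one comparable number."""
    digest = hashlib.sha256(f"{backend}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_backend(url, backends):
    """Return the backend with the highest score for this URL.

    The same URL always maps to the same backend for a given backend
    list, so each backend ends up caching a disjoint set of URLs.
    """
    if not backends:
        raise ValueError("no backends configured")
    return max(backends, key=lambda b: _score(url, b))


# Hypothetical backend caches behind the frontend.
backends = ["10.135.85.2", "10.135.85.3", "10.135.85.4"]
chosen = pick_backend("http://example.com/video/1.mp4", backends)
```

A useful property of this scheme is that removing a backend which was not
chosen for a URL leaves that URL's mapping unchanged, so only the objects
on a failed backend need to be refetched.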

 * <http://wiki.squid-cache.org/ConfigExamples/SmpCarpCluster> has
details of how to split the frontend and backend config. The specific
example is for doing it using SMP workers within a single proxy
instance. But the split can even more easily be done across different
machines.

 * <http://wiki.squid-cache.org/ConfigExamples/ExtremeCarpFrontend> has
some details on how to add iptables port splitting on top of CARP to get
ridiculously high performance out of a proxy hierarchy. The last numbers
I heard from these setups were pushing just under the Gbps mark.

Amos