[squid-users] Cache digest vs ICP

Wed Sep 27 15:06:07 UTC 2017

On 09/27/2017 03:46 AM, Veiko Kukk wrote:

> Siblings are configured with no-proxy keyword to achieve that they don't
> cache what other siblings already have in their cache. 

I assume that by "no-proxy" you meant "proxy-only".

> This is to minimize data usage costs from origin servers. 

The proxy-only option does not minimize the amount of data transmitted
between a proxy and the origin server. It reduces cache duplication
among cache peers.
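
For reference, a minimal squid.conf sketch of a proxy-only sibling. The
address is the one from the log line you quote further down; the ports
3128 and 3130 are assumed defaults, not values I know from your setup:

```
# Hypothetical sibling entry; adjust address and ports to your setup.
# proxy-only: do not store locally what is fetched from this peer.
cache_peer 192.168.1.52 sibling 3128 3130 proxy-only
```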

> So far digest_generation has been set to off and only ICP has been used
> between siblings. Mostly because digest stats had shown many rejects
> (not containing 100% of cache objects) and documentation about digests
> is confusing up to statements that while rebuilding digest, squid will
> stop serving requests.

Please point me to the location of that statement. IMHO, it is not
confusing but incorrect. Non-SMP Squid stops servicing requests while
rebuilding a cache digest _chunk_, not the entire digest (unless the
digest is configured to have only one chunk, of course). The size of the
chunk is controlled by digest_rebuild_chunk_percentage.

Please note that non-SMP Squid stops servicing other requests when doing
virtually anything -- Squid is not threaded. Cache digests are somewhat
"special" in this context because rebuilding the entire digest may take
a long time for large caches. Squid combats that by
splitting the digest rebuild process into chunks (a misleading term!),
digesting at most digest_rebuild_chunk_percentage of cached objects at a
time.
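
A sketch of the relevant directives, with illustrative values that I
believe match the documented defaults (please check them against your
Squid version):

```
# Illustrative values only; verify against your squid.conf.documented.
digest_generation on
digest_rebuild_period 1 hour
# Digest at most 10% of the cache index per rebuild step.
digest_rebuild_chunk_percentage 10
```

A smaller percentage should shorten each blocking step at the cost of
more steps per rebuild.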

Cache Digests are not SMP-aware (but should be). You may be able to work
around that limitation using SMP macros, but I have not tested that. I
do not remember whether a worker that is not configured to generate a
digest will still look it up in the cache when a peer asks for it.
Hopefully, the worker will do that lookup.
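
An untested sketch of what I mean by SMP macros, assuming two workers
and assuming that digest_generation may be set inside a conditional
block (I have not verified either assumption):

```
# Untested: let only worker 1 generate the digest.
workers 2
if ${process_number} = 1
digest_generation on
else
digest_generation off
endif
```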

> Digest
> documentation states that it's including based on refresh_pattern. It's
> a problem because to get squid working as we want, we had to use
> offline_mode on.

If Cache Digests do not honor offline_mode, it is a (staleness
estimation code) bug that should be reported and fixed.

Meanwhile, does refresh_pattern stop working when offline_mode is on? If
not, then can you use refresh_pattern to emulate offline_mode effects
while still using offline_mode?
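
Something along these lines, as an untested illustration only. The
catch-all pattern and the one-year lifetime are my assumptions, and I do
not know whether this matches offline_mode behavior closely enough for
your use case:

```
# Hypothetical catch-all rule: treat everything as fresh for up to a
# year (min and max are in minutes). Untested with Cache Digests.
refresh_pattern . 525600 100% 525600 override-expire override-lastmod ignore-reload
```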

> * What is the relationship between cache digests and ICP?

IIRC, none, except the former is checked before the latter.

> If they are active together, how are they used together?

I have not tested this, but Cache Digests ought to be checked first, and
if they miss, then Squid should proceed to ICP/HTCP/etc. AFAICT, a Cache
Digest miss has no effect on other peer selection algorithms.
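
Which mechanisms are active for a given peer is controlled per
cache_peer line. A sketch, with the address and ports assumed as an
example rather than taken from your configuration:

```
# Digest and ICP both enabled (digests are fetched by default when
# Squid is built with --enable-cache-digests):
cache_peer 192.168.1.52 sibling 3128 3130 proxy-only

# Digest only, no ICP queries to this peer:
# cache_peer 192.168.1.52 sibling 3128 0 proxy-only no-query

# ICP only, no digest fetched from this peer:
# cache_peer 192.168.1.52 sibling 3128 3130 proxy-only no-digest
```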

> * How are objects added to digest when rebuilding? Does this include lot
> of disk i/o like scanning all cache_dir files or is it based on
> swap.state contents?

Objects are digested based on the in-RAM cache index. There is no disk
I/O involved until the built digest needs to be stored on disk.

> * How can i see which objects are listed in cache digest?

A Cache Digest does not list/store object URLs -- it cannot produce a
list of previously digested objects. The only way to find out whether
object X was digested (with some degree of certainty) is to query the
digest for that object X.

I am not aware of any command-line interface for interrogating digests,
but it is certainly possible to build one. Please note that Squid
includes both the URL and the method in the object cache key (which is
what ends up being hashed by the Cache Digests code).

> * Why does sibling false positive result in sending client 504 and not
> trying next sibling or parent? CD_SIBLING_HIT/192.168.1.52
> TCP_MISS/504. How to achieve proceeding with next cache_peer?

Sounds like bug #4223 to me:
http://bugs.squid-cache.org/show_bug.cgi?id=4223

HTH,

Alex.