[squid-users] Cache digest vs ICP

Mon Oct 2 16:36:58 UTC 2017

On 10/02/2017 08:28 AM, Veiko Kukk wrote:

> I found it in the book by Duane Wessels
> Quoting: During each invocation of the rebuild function, Squid adds some
> percentage of the cache to the digest. Squid doesn't process user
> requests while this function runs.

The quoted statement is correct: digesting a (configurable) percentage
of the cache index is a blocking action -- Squid does not process
anything else while that action runs. As we discussed earlier, digesting
the whole cache index is not blocking, because Squid gets to handle
other events between those chunks. This is similar to how one network
read is blocking but receiving the entire response body is not.
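
To illustrate the distinction, here is a toy Python model (not Squid
code). Each step digests one chunk without interruption, but control
returns to the caller between steps. The names rebuild_digest and run
are made up for this sketch, and a plain set stands in for the real
digest; the chunk percentage plays the role of the configurable
percentage mentioned above.

```python
import math


def rebuild_digest(index, chunk_percentage=10):
    """Generator: digest one chunk per step, then hand control back."""
    digest = set()  # stand-in for the real digest structure
    chunk_size = max(1, math.ceil(len(index) * chunk_percentage / 100))
    for start in range(0, len(index), chunk_size):
        # Blocking part: nothing else runs while this slice is digested.
        digest.update(index[start:start + chunk_size])
        # Non-blocking overall: the caller regains control here.
        yield len(digest)


def run(index, chunk_percentage=10):
    """Interleave rebuild steps with (simulated) request handling."""
    steps = 0
    digested = 0
    for digested in rebuild_digest(index, chunk_percentage):
        steps += 1  # a real event loop would serve user requests here
    return steps, digested


if __name__ == "__main__":
    print(run([f"key-{i}" for i in range(1000)]))
```

With 1000 keys and 10% chunks, the rebuild takes ten short blocking
steps instead of one long one.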

>     Cache Digests are not SMP aware (but should be). You may be able to work
>     around that limitation using SMP macros, but I have not tested that. I
>     do not remember whether a worker that is not configured to generate a
>     digest will still look it up in the cache when a peer asks for it.
>     Hopefully, the worker will do that lookup.
> 
> That sounds very interesting. Could you point me to sample configuration?

I am not aware of any sample configurations that restrict digest
generation to a single worker, but that does not mean they do not exist.
SMP macros in general are described at the beginning of
squid.conf.documented.

> How frequently are cache digests refreshed from
> siblings? 

The short answer is "a new peer digest is fetched PeerDigestReqMinGap
seconds after its earlier cached version has expired". I believe the
details are covered by the discussion below and the following FAQ entry:
https://wiki.squid-cache.org/SquidFaq/CacheDigests#How_are_Cache_Digests_transferred_between_peers.3F

> It seems to me that it takes quite a lot of time and I have not
> found anything in the documentation that could help enforce digest
> refreshing. 

digest_rebuild_period controls how often the local digest is refreshed.
Bugs notwithstanding, the local digest expiration (and the Expires field
in digest HTTP response) should be set accordingly.

> In the test system, I've set 'digest_rebuild_period 60 second'.

Squid has several hard-coded rate limits for digest fetches:

* refresh a given peer digest no more than once in 5 minutes:

  /* min interval for requesting digests from a given peer */
  static const time_t PeerDigestReqMinGap = 5 * 60;   /* seconds */

* and request a digest (from any peer) no more frequently than once per
minute:

  /* min interval for requesting digests (cumulative request stream) */
  static const time_t GlobDigestReqMinGap = 1 * 60;   /* seconds */
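
A rough Python sketch of how these two limits could combine is below.
It assumes that each limit simply pushes the planned fetch time
forward; I have not checked that against the actual scheduling code.
The function name and its parameters are hypothetical; only the two
gap values come from the constants quoted above.

```python
PEER_DIGEST_REQ_MIN_GAP = 5 * 60  # seconds, per-peer limit
GLOB_DIGEST_REQ_MIN_GAP = 1 * 60  # seconds, limit across all peers


def earliest_fetch_time(wanted, last_received_from_peer, last_request_any_peer):
    """Return the earliest time a digest fetch may start.

    Assumption: each limit only delays the request, never advances it.
    All arguments are timestamps in seconds.
    """
    t = wanted
    # Do not refetch the same peer's digest too soon.
    t = max(t, last_received_from_peer + PEER_DIGEST_REQ_MIN_GAP)
    # Do not issue digest requests (to any peer) too close together.
    t = max(t, last_request_any_peer + GLOB_DIGEST_REQ_MIN_GAP)
    return t


if __name__ == "__main__":
    # A digest that expires 60 seconds after it was received at t=0
    # still cannot be refetched before t=300 under these assumptions.
    print(earliest_fetch_time(wanted=60, last_received_from_peer=0,
                              last_request_any_peer=0))
```

Under these assumptions, a 60-second expiration cannot make a peer
refetch the digest faster than once in 5 minutes.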

Notes for your future tests, if any:

* If you are running an SMP Squid, then please repeat the test without
SMP. Make sure the non-SMP configuration works before you try to
configure SMP Squid (which will probably require lowering
digest_rewrite_period as well so that all workers can see the newly
generated digest on disk).

* A 60-second refresh feels too aggressive to me. Any
digest_rebuild_period longer than digest generation should work in
theory, but I would be worried about various hard-coded hacks
interfering with such a small value as 60 seconds. I recommend starting
with periods of 5 minutes or longer. A longer regeneration period would
also go nicely with PeerDigestReqMinGap discussed above.

> With clean cache and running test downloads sibling1 very quickly
> updates its cache digest:
> 
> Local Digest:
> store digest: size: 10492 bytes
> entries: count: 415 capacity: 16787 util: 2%
> deletion attempts: 0
> bits: per entry: 5 on: 1648 capacity: 83936 util: 2%
> bit-seq: count: 3224 avg.len: 26.03
> added: 415 rejected: 0 ( 0.00 %) del-ed: 0
> collisions: on add: 0.00 % on rej: -1.00 %
> 
> I've waited at least 20 minutes, several times ran downloads against
> sibling2 (clean cache too) and sibling2 (192.168.1.52) still shows old,
> almost empty cache digest for sibling1 (192.168.1.51):

Please note that if the old digest1 was generated before you changed
digest_rebuild_period for sibling1, then its old cached version will
still have that old expiration date. I am _not_ saying that is what
happens in your specific test, but please keep this caveat in mind.

Also, the 192.168.1.51 digest shown below is not "almost empty" -- the
stats below show that it has 55% of its bits turned on, with all 51
expected entries digested. AFAICT, that digest is full.
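
The figures quoted in this thread can be cross-checked with a short
Python sketch. It assumes that the digest size is capacity times bits
per entry, rounded up to whole bytes (which matches both reported
sizes), and that each entry sets 4 bits in the digest. The count of 4
is my assumption and is not taken from the stats. The function names
are made up for illustration.

```python
import math


def digest_size_bytes(capacity, bits_per_entry=5):
    """Digest size: capacity * bits per entry, rounded up to bytes."""
    return math.ceil(capacity * bits_per_entry / 8)


def bit_utilization(bits_on, capacity_bits):
    """Percentage of digest bits that are turned on."""
    return 100.0 * bits_on / capacity_bits


def expected_on_fraction(entries, capacity_bits, hashes=4):
    """Expected share of set bits in a Bloom filter.

    Assumption: each entry sets `hashes` independently chosen bits.
    """
    return 1.0 - (1.0 - 1.0 / capacity_bits) ** (hashes * entries)


if __name__ == "__main__":
    # sibling1 digest as cached by sibling2: 51 entries, 256 bits
    print(digest_size_bytes(51), bit_utilization(142, 256),
          expected_on_fraction(51, 256))
    # sibling1 local digest: capacity 16787 entries, 83936 bits
    print(digest_size_bytes(16787), bit_utilization(1648, 83936))
```

Under those assumptions, about 55% of the bits being on is what a
digest holding all 51 of its expected entries should look like.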

> Peer Digests:
> no guess stats for all peers available
> 
> Per-peer statistics:
> 
> peer digest from 192.168.1.51
> no guess stats for 192.168.1.51 available
> 
> event          timestamp    secs from now    secs from init
> initialized    1506952649    -1602              +0
> needed         1506953341     -910            +692
> requested      1506953341     -910            +692
> received       1506953341     -910            +692
> next_check     1506956584    +2333           +3935
> 
> peer digest state: needed: yes, usable: yes, requested:  no
> 
> last retry delay: 0 secs
> last request response time: 0 secs
> last request result: success
> 
> peer digest traffic:
> requests sent: 1, volume: 0 KB
> replies recv:  1, volume: 0 KB
> 
> peer digest structure:
> 192.168.1.51 digest: size: 32 bytes
> entries: count: 51 capacity: 51 util: 100%
> deletion attempts: 0
> bits: per entry: 5 on: 142 capacity: 256 util: 55%
> bit-seq: count: 131 avg.len: 1.95

If I am reading the above sibling2 stats correctly, then sibling2
downloaded and cached a tiny 32-byte digest1 910 seconds ago and will
refresh the cached copy in 2333 seconds. That next check will come
(3935-692)/60 = (2333+910)/60 = 3243/60, or about 54 minutes, after
digest1 birth. You should be able to correlate that with the digest1
generation stats reported by 192.168.1.51 at the time when this digest
was generated.
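
The arithmetic can be reproduced from the quoted timestamps with a few
lines of Python. The variable names are mine; the numbers are copied
from the sibling2 stats above.

```python
# Timestamps from the quoted sibling2 stats (seconds since the epoch).
initialized = 1506952649
received = 1506953341
next_check = 1506956584

# "secs from now" for the received event was -910.
now = received + 910

secs_init_to_received = received - initialized    # "+692" column
secs_until_next_check = next_check - now          # "+2333" column
secs_received_to_check = next_check - received    # 2333 + 910
minutes_received_to_check = secs_received_to_check / 60

if __name__ == "__main__":
    print(secs_init_to_received, secs_until_next_check,
          secs_received_to_check, round(minutes_received_to_check))
```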

HTH,

Alex.