[squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question

Jester Purtteman jester at optimera.us
Wed Oct 28 03:52:22 UTC 2015


That’s kind of what I figured.  My thinking is that there is probably a higher fraction of content tags that would match, but it would take some care to calculate a checksum that isn't in fact larger than the data itself.  In any event, it seems plausible to me.  Sounds like cache digests may be the right spot to start reading.  If I get anywhere, I'll keep you guys in the loop... more importantly, I'll ask a bunch of questions :)

Jester Purtteman, PE
OptimERA Inc,
(907) 581-4983 - Office
(360) 701-7875 - Cell

-----Original Message-----
From: Alex Rousskov [mailto:rousskov at measurement-factory.com] 
Sent: Tuesday, October 27, 2015 10:09 AM
To: squid-users at lists.squid-cache.org
Cc: Jester Purtteman <jester at optimera.us>
Subject: Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question

On 10/26/2015 10:14 PM, Jester Purtteman wrote:

> I have been wrestling with squid for a while and my reading has 
> brought “Cache-Digests” to my attention.  I suspect the answer is 
> “that would be
...
> As far as I can tell from (very limited) experimenting and reading, 
> this doesn’t **appear** to be how it works,


Hello Jester,

    With a few configuration adjustments and code modifications, you can make Cache Digests help with your use case. Cache Digests make decisions based on request URLs. It sounds like you want to make decisions based on response body as well. It is possible to change the code to do that, but it will be a lot of non-trivial work and there will always be some false positives because Cache Digests are not meant to give always-precise answers.
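
To illustrate why a URL-keyed digest can never give always-precise answers, here is a toy Bloom-filter sketch in Python. It is not Squid's code: the class name ToyDigest, its methods, and the default sizes are made up for this example, and a real Cache Digest is sized and keyed by Squid itself. The point is only that a membership test on a compact bit array can return "maybe present" for a URL that was never added, and that it knows nothing about the response body.

```python
import hashlib


class ToyDigest:
    """Toy Bloom filter keyed on request URL (illustrative only, not Squid's code)."""

    HASHES = 4  # one MD5 digest is 16 bytes, i.e. four 4-byte words

    def __init__(self, bits=1024):
        self.bits = bits
        self.mask = bytearray((bits + 7) // 8)

    def _positions(self, url):
        # Slice one MD5 of the URL into four words; each word picks one bit.
        digest = hashlib.md5(url.encode("utf-8")).digest()
        for i in range(self.HASHES):
            word = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield word % self.bits

    def add(self, url):
        for pos in self._positions(url):
            self.mask[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        # True means "possibly cached"; False means "definitely not cached".
        return all(self.mask[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


digest = ToyDigest()
digest.add("http://example.com/a")
print(digest.might_contain("http://example.com/a"))
```

The smaller the bit array relative to the number of cached URLs, the more often unrelated URLs collide, which is the trade-off between digest size and false-positive rate.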


As Amos mentioned, there are existing/standard HTTP mechanisms that are meant to decrease pointless fetches across expensive links. However, just like Cache Digests, "as is", they may not work well in your use case. Those mechanisms make decisions based on origin-server-supplied headers such as ETags. As you said, that information may be missing or false in many responses.

Just like with Cache Digests, with a few configuration adjustments and code modifications, you can make those standard mechanisms work better for you. For example, you can teach Squid to generate its own ETag-like content+header checksums that can be used in conditional HTTP requests that Squid understands. None of this is easy, but it is doable.
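
As a rough sketch of what an ETag-like content+header checksum could look like, the Python below hashes the response body together with a small, fixed set of headers and compares the result against an If-None-Match value. Everything here is an assumption made for illustration: the function names synthetic_etag and revalidate, the choice of SHA-256, the 32-character truncation, and the headers that are included. Squid does not ship this logic; it would have to be added through code modifications.

```python
import hashlib


def synthetic_etag(body, headers, keep=("content-encoding", "content-type")):
    """Return a fixed-size quoted tag derived from the body and selected headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    h = hashlib.sha256()
    for name in sorted(keep):
        # Only stable headers are hashed; volatile ones such as Date are ignored.
        h.update(f"{name}:{lowered.get(name, '')}\n".encode("utf-8"))
    h.update(body)
    # The tag stays 34 characters long no matter how large the body is.
    return '"' + h.hexdigest()[:32] + '"'


def revalidate(if_none_match, body, headers):
    """Return (status, tag): 304 if the peer's copy matches, else 200."""
    tag = synthetic_etag(body, headers)
    if if_none_match == tag:
        return 304, tag
    return 200, tag


tag = synthetic_etag(b"hello", {"Content-Type": "text/plain"})
print(revalidate(tag, b"hello", {"content-type": "text/plain"}))
```

Because the tag has a fixed length, it only saves traffic for bodies noticeably larger than the tag plus the request overhead, which matches the concern about checksums that end up larger than the data.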


There have been many proposals on how to solve this problem. I do not think there is a single winning approach. Everybody seems to experiment with their own tweaks of the existing tools and standards.

If you are looking for a solution that will cost you a few days/weeks of development and sysadmin work, I do not think there is one. If you are willing and able to invest a lot more, then I recommend that you estimate the expected savings _before_ you invest in an expensive solution. Getting reliable estimates is a complicated project on its own, but it is still a lot cheaper than investing months into a solution that does not meet your needs.


HTH,

Alex.


