[squid-dev] Digest related question.

Amos Jeffries squid3 at treenet.co.nz
Mon Feb 23 10:27:13 UTC 2015


On 22/02/2015 2:31 p.m., Eliezer Croitoru wrote:
> On 22/02/2015 02:46, Amos Jeffries wrote:
>> The response to a HEAD request is supposed to be exactly identical to a
>> response to the GET, but with the body/payload/entity cropped off. Even
>> the Content-Length headers etc should be present saying what size the
>> body would have been.
> 
> So, just to make sure I understand the reality:
> if I start Squid with 0 objects and run only one GET
> transaction, would it create two objects?
> From what I saw in the code in the past (very long ago), I remember
> that it might not work that way.
> So I cannot just run a HEAD request and expect it to reflect too much
> information about the cached data.

No, once the GET response is cached Squid can answer either GET or HEAD
requests from it.
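The HEAD/GET parity described above can be demonstrated with a minimal sketch. This is not Squid code; it uses a throwaway local HTTP server (an assumption standing in for any origin or cache) to show that a correct HEAD response carries the same headers as the GET, including Content-Length, but no body:

```python
# Hedged demo: a HEAD response should be the GET response minus the body,
# with headers such as Content-Length still describing the would-be body.
# The local server below is a stand-in, not part of Squid.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello from the cache\n"

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Identical headers, body cropped off.
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/file")
get = conn.getresponse()
get_body = get.read()
conn.request("HEAD", "/file")
head = conn.getresponse()
head_body = head.read()
conn.close()
server.shutdown()

print(get.getheader("Content-Length"), len(get_body))    # "21" and 21
print(head.getheader("Content-Length"), len(head_body))  # "21" and 0
```

Both responses advertise the same Content-Length; only the GET actually delivers the bytes.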

> 
> About the store log, I have one issue with it: I would not be able
> to "know" whether an object is in the cache unless I followed
> everything in the store log, including cache removals.

Yes. That's why it's only suitable for the proof of concept.

The final version should probably use HTCP to query the cache index
live. But until the Link feature you are experimenting with has actually
been implemented in Squid, that will not work, since HTCP does not
understand the Link keys.

The choice is between doing a PoC with an awkward interface, or just
designing and writing the feature in place.
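For the PoC route, the store-log "follower" discussed here can be sketched as a small state machine over log lines. This is a hedged illustration only: the sample lines and field positions are assumptions (Squid's store.log puts the action tag in the second field and the URL last, but the exact layout should be verified against the running Squid version):

```python
# Hedged sketch of a store.log "follower" keeping an in-memory set of
# currently cached URLs. Assumptions: field 2 is the action tag
# (SWAPOUT = written to cache, RELEASE = removed) and the URL is the
# last field; the sample lines below are fabricated for illustration.
def apply_store_log_line(cached, line):
    fields = line.split()
    if len(fields) < 3:
        return  # ignore malformed/short lines
    action, url = fields[1], fields[-1]
    if action == "SWAPOUT":
        cached.add(url)
    elif action == "RELEASE":
        cached.discard(url)

cached = set()
sample = [
    "1424.001 SWAPOUT 00 000001 key1 200 - - - 21 21 GET http://example.com/a",
    "1424.002 SWAPOUT 00 000002 key2 200 - - - 21 21 GET http://example.com/b",
    "1424.003 RELEASE 00 000001 key1 200 - - - 21 21 GET http://example.com/a",
]
for line in sample:
    apply_store_log_line(cached, line)

print(sorted(cached))  # ['http://example.com/b']
```

As noted above, this only tracks objects from one Squid startup onward; it cannot reconstruct state that predates the log.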


> 
> So one process that follows the store log and records its state would
> be enough to supply the currently "cached" object list per Squid
> instance, from startup until Squid shuts down.
> Trying to "save" that list across a shutdown would be more
> difficult.
> 
> From another point of view, for the actual content digest I would
> need some way to receive the file/response content, and I was thinking
> about an ICAP-based solution which might be combined with the store
> log "follower".
> 


IIRC, some while back Alex was working on providing Content-MD5 hashes
for all cached objects. I forget what happened to that project. It's
needed for other things too, so it would be worth getting into trunk.
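For reference, Content-MD5 (RFC 1864) is just the base64-encoded 128-bit MD5 digest of the entity body, so whoever computes it (origin, cache, or follower) would do something like this sketch:

```python
# Content-MD5 per RFC 1864: base64 of the raw MD5 digest of the body.
import base64
import hashlib

def content_md5(body: bytes) -> str:
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

print(content_md5(b"hello\n"))
```

Note this digests the entity body only, unlike an ETag, which is an opaque validator with no defined relationship to the content bytes.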


> I have a setup in mind for this kind of situation:
> fetch a URL, digesting it as it goes, then keep it "followed";
> if the associated URL is purged from the cache, the
> digest is also erased from the DB.
> 
> Now the next part is the right way to "know" whether there is a way to
> pre-find the request digest from the origin server.
> It seems like a HEAD request to the origin server might be enough.
> So the origin server must be publicly accessible, so that the "follower"
> can fetch a HEAD of the file and look up any digest data in the
> response.

Only if the upstream server delivers Content-MD5 or similar in the
response headers (*not* ETag). A HEAD or HTCP query should then return it.
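That follower step can be sketched as below. The origin here is a local stand-in server (an assumption, so the example is self-contained); the point is that the client issues a HEAD and reads Content-MD5 if present, ignoring ETag because it is not a content digest:

```python
# Hedged sketch: HEAD the origin and extract a digest header.
# The local "origin" below is a stand-in that happens to send Content-MD5;
# real origins may send nothing, in which case getheader() returns None.
import base64
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"payload"
MD5_B64 = base64.b64encode(hashlib.md5(BODY).digest()).decode("ascii")

class Origin(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Content-MD5", MD5_B64)
        self.send_header("ETag", '"not-a-digest"')  # deliberately ignored
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Origin)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("HEAD", "/file")
resp = conn.getresponse()
resp.read()
digest = resp.getheader("Content-MD5")  # None if the origin sends no digest
conn.close()
server.shutdown()

print(digest == MD5_B64)  # True
```

When `digest` comes back as None, the follower has no trustworthy pre-computed digest and would have to fall back to digesting the body itself (e.g. via the ICAP idea above).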


> In turn, a metalink file link in the HEAD headers might be usable for
> more complicated options.
> There is another issue with digest "validation", which could be offered
> as an option to black/whitelist a server for digest/metalink credibility.
> 
> Do you think this idea looks a bit sane?

Yes.

Amos

