[squid-dev] The next step towards: StoreID and metalink.

Eliezer Croitoru eliezer at ngtech.co.il
Mon Sep 21 23:43:10 UTC 2015


On 22/09/2015 00:57, Amos Jeffries wrote:
> Using 302 kind of defeats the purpose of metalinks: that the content can
> be fetched from alternative URLs if one breaks in any way (by the
> client, or by Squid on revalidation retries).

> Using StoreID helper to re-write the IDs to the same cached content is
> more inline with the metalinks model, in that Squid and/or client can
> revalidate the cached object from whatever URI the client is fetching
> and it updates all the other ones the StoreID maps to that sfileno ID.

You are right.
Indeed it defeats StoreID but there is another side to the whole picture.
Squid's revalidation code assumes the ETag is the same across servers, 
so one issue is that using ETags with metalinks defeats Squid's basic 
cache validation logic.
As a base rule, an ETag should either be absent when metalink hash 
headers are present, or be consistent across all of the mirror servers.
So in some environments, using a 302 with an override option in the URL 
query terms might be the right choice. (I think it is the least proper 
solution, and this is one of the reasons I designed StoreID the way I did.)

> Note that all this use of eCAP and StoreID is for PoC testing how
> metalinks works. Long-term these actions should all be an internal
> feature of Squid. If we find that Squid needs to be altered to make the
> PoC work then we are probably better just patching the related part of
> metalinks operation in directly, or re-designing the PoC.

OK.
As for the PoC part of the issue:
I would first like to verify and test the influence of such hashing 
calculations (in this case SHA256) on server operation under load.
I am running the module on my LAN proxy with SSL-Bump enabled to get a 
feel for its effect.
I have observed that actual download rates on a 20Mbps line work just 
fine, but this is very far from real traffic on a loaded server.
Even though I like the metalink concepts, there is also the possibility 
of a wrong design.
So before proving the concept itself, I am looking for test candidates 
who will run the module, if possible in a controlled production 
environment.
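For a first-order number on the raw hashing cost before any load testing, a quick micro-benchmark is enough (a sketch using Python's hashlib; the data volume and chunk size are arbitrary choices, and real per-response overhead on a proxy will differ):

```python
import hashlib
import time

def sha256_throughput(total_mb=64, chunk_kb=64):
    """Hash `total_mb` MB of zeros in `chunk_kb` KB chunks, roughly
    mimicking streaming a response body through the digest, and
    return the throughput in MB/s."""
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    h = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(n_chunks):
        h.update(chunk)
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

print("SHA256: %.0f MB/s" % sha256_throughput())
```

On any recent CPU this is orders of magnitude above a 20Mbps line, which matches the observation that a single LAN client feels no effect; the open question is what happens when many concurrent responses are hashed at once.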

Other than that, I had the idea of using an internal StoreID for hashed 
content, such as "http://hash.squid.internal/sha256/calculated_hash".
If we trust the source server, we can simply apply that StoreID without 
any relationship to the actual hashing of the content.
If we only partially trust the source, we can "test" the accuracy of the 
source's metalink hashes and black/white/grey-list it accordingly.
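A minimal StoreID helper along those lines might look like the sketch below. The `url_to_hash` table is a placeholder for however the metalink hashes would actually reach the helper; the I/O assumes the default non-concurrent StoreID helper protocol, where each request line starts with the URL and the reply is `OK store-id=...` or `ERR`.

```python
#!/usr/bin/env python3
# Sketch of a Squid StoreID helper that maps any mirror URL whose
# metalink SHA256 is already known onto the internal hash-based ID
# "http://hash.squid.internal/sha256/<hash>".
import sys

# Placeholder mapping: in a real deployment this would be fed from
# the metalink processing, not hard-coded.
url_to_hash = {
    "http://mirror-a.example/file.iso": "deadbeef",
    "http://mirror-b.example/file.iso": "deadbeef",
}

def store_id_for(url):
    h = url_to_hash.get(url)
    if h is None:
        return "ERR"  # no rewrite; Squid uses the URL as-is
    return "OK store-id=http://hash.squid.internal/sha256/" + h

def main():
    for line in sys.stdin:
        url = line.split()[0]          # first token is the request URL
        sys.stdout.write(store_id_for(url) + "\n")
        sys.stdout.flush()             # Squid expects one reply per line

if __name__ == "__main__":
    main()
```

With this, both mirror URLs collapse onto the same internal store entry, so a hit through either one serves (and revalidation through either one updates) the same cached object.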

So there are a couple of models for how we can treat metalinks:
- Fully trust the source
- Partially trust the source
- Training/checking to verify whether we trust the source
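One way the three trust levels above could drive hash checking (a hypothetical sketch; none of these names correspond to existing Squid configuration):

```python
# Sketch: a per-source trust level deciding whether to verify the
# metalink hash of a response before reusing the cached content.
FULL, PARTIAL, TRAINING = "full", "partial", "training"

def should_verify_hash(trust_level, sampled=False):
    """FULL trust skips verification entirely; PARTIAL verifies only
    a sampled subset of responses; TRAINING verifies everything in
    order to build up (or revoke) trust in the source."""
    if trust_level == FULL:
        return False
    if trust_level == PARTIAL:
        return sampled
    return True  # TRAINING: always check

print(should_verify_hash(FULL))                   # False
print(should_verify_hash(PARTIAL, sampled=True))  # True
print(should_verify_hash(TRAINING))               # True
```

A source would start in TRAINING, be promoted to PARTIAL or FULL as its hashes keep checking out, and be demoted (grey/black-listed) on a mismatch.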

With the above model we can put the whole internal cache-lookup logic 
aside and think about something global.

What do you think about the trust idea I had in mind? If you have 
another trust approach, please share your ideas.

Eliezer
