[squid-dev] The next step towards: StoreID and metalink.

Eliezer Croitoru eliezer at ngtech.co.il
Mon Sep 21 21:40:27 UTC 2015


Alex, Amos,

As a first step, I am trying to grasp what can be done in the
current state of Squid and eCAP without any changes.
Currently Squid provides methods to modify a request and/or response.

Basic StoreID support in a REQMOD adapter should be possible in any
case, and since an eCAP module has more information about the request
details, it can decide better than the current StoreID helpers.
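For comparison, here is a minimal sketch of the kind of StoreID helper
such an adapter could replace. The mirror-URL pattern and hostnames are
hypothetical; the "OK store-id=..." / "ERR" response lines follow the
Squid 3.4+ helper protocol (concurrency disabled):

```python
#!/usr/bin/env python3
"""Minimal Squid StoreID helper sketch (hypothetical URL pattern)."""
import re
import sys

# Hypothetical example: collapse per-mirror download URLs onto one ID.
MIRROR_RE = re.compile(r'^https?://mirror\d+\.example\.com(/pub/.+)$')

def store_id(url: str) -> str:
    """Return one Squid helper response line for the given URL."""
    m = MIRROR_RE.match(url.strip())
    if m:
        return "OK store-id=http://download.example.com" + m.group(1)
    return "ERR"  # no mapping; Squid keys the object on the URL itself

def main() -> None:
    for line in sys.stdin:
        # Helper input is "<URL> [extras]"; the URL is the first token.
        parts = line.split()
        sys.stdout.write(store_id(parts[0] if parts else "") + "\n")
        sys.stdout.flush()  # Squid expects one unbuffered reply per line

if __name__ == "__main__":
    main()
```

An eCAP REQMOD adapter could return the same mapping as an annotation,
with the added benefit of seeing the full request headers.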
Leaving StoreID aside and returning to hashes:
Currently an eCAP module can calculate hashes on the fly and, at the
end of the transaction, write the result to either a log file or a DB.
For now, the main benefit I see from this is the option to find
duplicate content based on the hash.
For example: running a DB lookup for similar hashes, or something
like sorting by hash.
Pseudo-code:
   iterate over the (hash, url) pairs
   if the hash exists, add the url to its array
   else create a new hash-to-url array mapping
   then list hashes mapped to more than one url and get statistics for those
* the statistics can be based on an access.log object/download size lookup
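The pseudo-code could be sketched concretely like this (the log
entries and hash values below are made up for illustration):

```python
#!/usr/bin/env python3
"""Sketch of the duplicate-content lookup: group logged (hash, url)
pairs and report hashes seen under more than one URL. Object sizes
could then be joined in from an access.log lookup."""
from collections import defaultdict

def find_duplicates(records):
    """records: iterable of (content_hash, url) pairs."""
    by_hash = defaultdict(set)
    for content_hash, url in records:
        by_hash[content_hash].add(url)
    # Keep only hashes reachable through more than one URL.
    return {h: sorted(urls) for h, urls in by_hash.items() if len(urls) > 1}

# Hypothetical log entries for illustration only.
log = [
    ("abc123", "http://mirror1.example.com/f.iso"),
    ("abc123", "http://mirror2.example.com/f.iso"),
    ("def456", "http://example.com/page.html"),
]
print(find_duplicates(log))
# {'abc123': ['http://mirror1.example.com/f.iso', 'http://mirror2.example.com/f.iso']}
```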

It would require a relational DB, or some other way to store it in a
K/V DB.

I think that in the current state of eCAP I can only build a
statistics tool on top of an eCAP module.

The actual cases which might benefit from a cache lookup are
metalinks. An "If Modified Digest" check might also benefit from it.

There is another way to de-duplicate content for metalinks, using
cache object planting/redirection.
The procedure would involve a setup which allows, for example, 302
redirection planting.
I will describe it in more depth:
A response from a trusted source is found with metalink sources.
Once the hash has been validated, an object "insertion" or
"planting" stage starts.
In this stage each and every one of the link URLs is planted in the
DB with a 302 redirect URL.
(It can be inserted into Squid's cached objects or into an external DB.)
The result will be:
If someone requests a specific URL which is in the DB, a 302
redirection will be issued towards the already cached and hashed URL.
It's not 100% foolproof unless there is knowledge about the cache
internals, but as Amos suggested in the past, the store.log might be
enough to make it possible to track cache removals and insertions.
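A rough sketch of the planting step as a Squid url_rewrite helper,
assuming the planted mappings live in a simple lookup table that
stands in for the external DB (all URLs are hypothetical). A Squid
3.4+ url_rewrite helper can reply "OK status=302 url=..." to make
Squid issue the redirect itself, or "ERR" to leave the request alone:

```python
#!/usr/bin/env python3
"""Sketch of 302 "planting" as a Squid url_rewrite helper."""
import sys

# Planted mappings: mirror URL -> already cached (hashed) URL.
# In practice this would be the external DB populated when a
# metalink response is scanned and its hash validated.
REDIRECTS = {
    "http://mirror1.example.com/pub/f.iso":
        "http://download.example.com/pub/f.iso",
}

def rewrite(url: str) -> str:
    """Return one helper response line for the given URL."""
    target = REDIRECTS.get(url)
    if target:
        return "OK status=302 url=" + target
    return "ERR"  # not planted; pass the request through unchanged

def main() -> None:
    for line in sys.stdin:
        # Helper input is "<URL> [extras]"; the URL is the first token.
        parts = line.split()
        sys.stdout.write(rewrite(parts[0] if parts else "") + "\n")
        sys.stdout.flush()

if __name__ == "__main__":
    main()
```

The invalidation problem remains: entries would have to be dropped
from the table when store.log shows the cached target was removed.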

Which of these ideas is the more realistic one, compared to changing
Squid and/or eCAP?

Eliezer

On 21/09/2015 23:10, Alex Rousskov wrote:
> On 09/21/2015 01:28 PM, Amos Jeffries wrote:
>> On 22/09/2015 6:19 a.m., Alex Rousskov wrote:
>>> On 09/19/2015 05:28 PM, Eliezer Croitoru wrote:
>>>
>>>> There are things which can help an ECAP module to decide how to do
>>>> things when handling metalinks related affairs.
>>>> For example, a cache object lookup from within an ECAP module.
>>>> So basically the logic of the ECAP module is 100% blind to the cache
>>>> internals.
>>>
>>> FWIW, there have been requests to expose host cache state to eCAP
>>> adapters. It can be easily done (via Cache-Lookup metadata) for the
>>> current request [in pre-cache REQMOD]. Doing so will even be compatible
>>> with ICAP services.
>>>
>>> Unfortunately, most real use cases I know about actually need a true
>>> "cache lookup using an arbitrary request and/or StoreID" functionality
>>> and not just a "is the current request likely to be served from cache?"
>>> answer. Supporting arbitrary lookups would require expanding eCAP API. I
>>> would support such expansion, especially if we have several different
>>> specific use cases that back the new API design up.
>
>> What I have been thinking of for Metalinks was a StoreID helper that
>> took the requested URI and matched it against an index/DB of its own to
>> see if there was a previous ID for it.
>
> Mapping a given request [URL] to Store ID(s) can be supported using
> eCAP-supplied annotations without any eCAP API changes. An adapter can
> simply return a "use these IDs" annotation with Store ID(s) that Squid
> would recognize. That part requires straightforward Squid changes,
> especially if only REQMOD support is needed.
>
> My response to Eliezer's question above was primarily about allowing an
> eCAP adapter to ask the host application (such as Squid) whether an
> arbitrary request [URI] is likely to be satisfied from its cache. That
> part requires non-trivial eCAP API changes [and it would be good to have
> more specific use cases to guide those changes].
>
>
>> That could be paired with an eCAP RESPMOD adapter that scanned for
>> metalink URIs and added records to the StoreID index/DB for them. So
>> that future requests for those URIs got the now-cached content.
>
> Sure.
>
>
>> There are issues around invalidation to work out. Specifically when the
>> cached object gets replaced and has different metalinks attached, how to
>> find the obsolete ones. Doing a delete operation at speed may be
>> problematic.
>
>
>> If the index/DB could be Squid's internal cache index, or special-case
>> per-cache indexes like Transients but for the eCAP adapter, then the
>> StoreID helper may not even be necessary. But that might be a lot more work.
>
>
> If we want tight integration between Squid cache and an adapter,
> including things like notifications about purged cached entries, then we
> probably want the Store to "adapt" its cache _operations_ (add, update,
> delete cache entry) rather than HTTP messages. In other words, in
> addition to icap_service and ecap_service, we would have a cache_service
> (that can still use eCAP API but send cache operations instead of real
> HTTP messages for "adaptation").
>
>
> HTH,
>
> Alex.
>
> _______________________________________________
> squid-dev mailing list
> squid-dev at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-dev
>


