[squid-users] squid stores multiple copies of identical ETags

Alex Rousskov rousskov at measurement-factory.com
Sat Jun 27 20:59:45 UTC 2020


On 6/27/20 2:31 PM, Tabacchiera, Stefano wrote:

> Consider a cacheable object ... with all its response headers set
> (content-length/last-modified/etag/expiration/etc.). When the client
> requests it with "no-cache", that prevents Squid from serving the
> cached on-disk object and forces the retrieval of a fresh copy from
> the origin server.

> But THIS new copy is the same identical object which is already on
> disk (same url/size/etc.)

Squid does not know that the response headers and body have not changed.
Squid could, in theory, trust the URL+Vary+ETag+etc. combination as a
precise response identity, but it is a bit risky to do that by default
because ETags/etc. might lie. There is currently no code implementing
that optimization either.
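To make the unimplemented idea concrete, here is a minimal sketch of what such an identity check might look like, using plain dicts to stand in for stored and freshly fetched entries. None of these names are Squid internals; they are invented for illustration, and the ETag-trust caveat above applies.

```python
# Hypothetical sketch of the unimplemented optimization: treat
# URL + Vary key + ETag as a precise response identity. If the fresh
# response matches the stored one, the on-disk body could be reused
# instead of being rewritten. Risky as a default: ETags might lie.

def same_stored_object(stored, fresh):
    """Return True if the fresh response appears identical to the
    stored one by URL/Vary/ETag/length; require a non-empty ETag."""
    return (stored["url"] == fresh["url"]
            and stored.get("vary_key") == fresh.get("vary_key")
            and stored.get("etag") is not None
            and stored.get("etag") == fresh.get("etag")
            and stored.get("content_length") == fresh.get("content_length"))

stored = {"url": "http://example.com/a", "etag": '"v1"', "content_length": 100}
fresh  = {"url": "http://example.com/a", "etag": '"v1"', "content_length": 100}
assert same_stored_object(stored, fresh)
```

A real implementation would also have to decide what to do when the check fails or when the origin omits ETags entirely, which is part of why this is not a safe default.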


> In my (maybe faulty) understanding, this could be avoided by simply
> looking in the store log and finding that this particular object
> already exists on disk.

Squid could do that if it trusts ETag/etc and updates stored headers.
Squid does not do that (yet?). Even the header update part is not fully
supported yet!
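In outline, a header update that keeps the stored body would amount to something like the sketch below. Again, this is hypothetical pseudocode with invented names (`cache`, `update_stored_headers`), not Squid code; it only illustrates the "refresh metadata, keep body" idea.

```python
# Hypothetical sketch: refresh a cached entry's headers in place while
# leaving its on-disk body untouched, once the ETag match is trusted.
# `cache` maps a URL to {"headers": ..., "body": ...}; illustrative only.

def update_stored_headers(cache, url, fresh_headers):
    entry = cache.get(url)
    if entry is None:
        return False                        # nothing stored; normal miss
    entry["headers"] = dict(fresh_headers)  # refresh Date/Expires/etc.
    return True                             # body left untouched

cache = {"http://example.com/a": {"headers": {"Date": "old"},
                                  "body": b"payload"}}
update_stored_headers(cache, "http://example.com/a", {"Date": "new"})
```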


> Since this doesn't seem to be happening, chances are: squid doesn't
> care about storing multiple copies on disk

To be more accurate, Squid does not store multiple copies of (what Squid
considers to be) the same response -- only one object can be indexed per
URL/Vary. Bugs notwithstanding, Squid will overwrite the old response
(for some definition of "overwrite") with the new one.
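The one-object-per-URL/Vary indexing can be pictured with a toy store keyed on a (URL, Vary key) pair, where writing a "new" copy of the same object necessarily displaces the old one. A sketch with invented names, not Squid's actual index:

```python
# Toy illustration of one-entry-per-key indexing: the index maps a
# (url, vary_key) pair to a single stored response, so storing the
# same object again overwrites rather than duplicates it.

class ToyStore:
    def __init__(self):
        self.index = {}

    def store(self, url, response, vary_key=None):
        self.index[(url, vary_key)] = response   # replaces any old entry

    def lookup(self, url, vary_key=None):
        return self.index.get((url, vary_key))

s = ToyStore()
s.store("http://example.com/a", "old body")
s.store("http://example.com/a", "new body")      # same URL/Vary: overwrite
assert len(s.index) == 1
assert s.lookup("http://example.com/a") == "new body"
```

The on-disk file of the displaced entry still has to be deleted separately, which is where the cleanup question below comes in.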

I do not know much about aufs -- that code has been neglected for a
while -- but perhaps aufs simply does not have enough time to delete its
old/unused files? Try setting cache_swap_low and cache_swap_high to the
same very low value, perhaps even zero (to avoid backgrounding the
cleanup task).
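For example, in squid.conf (zero is the extreme case; any matching low pair should have the same effect):

```
# Collapse the swap watermarks so cleanup runs eagerly rather than
# as a backgrounded task.
cache_swap_low 0
cache_swap_high 0
```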



HTH,

Alex.
