[squid-users] squid stores multiple copies of identical ETags

Amos Jeffries squid3 at treenet.co.nz
Fri Jun 26 12:36:19 UTC 2020


On 27/06/20 12:19 am, Tabacchiera, Stefano wrote:
> Hello there,
> 
> I have the following issue with squid (squid-3.5.20-12.el7_6.1.x86_64 
> on RHEL 7.6)
> 
> A client requests a JSON file with a “no-cache” header via the proxy.
> 
> Squid forwards the request to the origin server, which replies with an
> “ETag” header (the object is cacheable).
> 
> Squid stores the object in cache_dir and forwards it back to the client.
> 
> 
> The client is pushing the same request at a high rate (~10/sec),
> regardless of its cacheable status.
> 
> Squid keeps forwarding the request and – here is my issue – keeps
> storing the same identical object on disk.
> 
> I have thousands of copies of the same ETag on disk.
> 

Objects are *not* stored by ETag. They are stored by *URL* (or URL+Vary
header).
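To illustrate the point, here is a simplified model of a cache lookup key.
It is an assumption-laden sketch, not Squid's actual key function: the key
is built from the URL plus, when the stored response carried a Vary header,
the request header values that Vary names. The function name store_key and
the example URL are hypothetical. Note that the ETag is not an input at all.

```python
from typing import Mapping, Optional


def store_key(url: str,
              request_headers: Mapping[str, str],
              vary: Optional[str] = None) -> tuple:
    """Toy cache key: URL, plus the request headers named by Vary.

    Hypothetical model only; the ETag plays no part in the key.
    """
    if not vary:
        # No Vary header: every request for this URL maps to one key.
        return (url,)
    # HTTP header names are case-insensitive, so normalise them.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # Append (header-name, request-value) pairs for each varied header.
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)
```

Without Vary, requests sending different Accept-Encoding values share one
key; with "Vary: Accept-Encoding" they get separate keys.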

Also, an object existing on disk does not mean it is considered the
"latest" version of that object. It only means that no other object has
needed to use that same cache slot/file since the existing object was
stored there.
Clearing cache slots/files the instant their content becomes obsolete
would cause up to _double_ the amount of disk writing, since every store
would need an erase as well as a write. Squid already does a huge amount
of writes.
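A toy model of that trade-off follows. It is not Squid's cache_dir code;
the class ToyDiskCache, its round-robin slot allocator and the erase_stale
flag are all hypothetical. Each store writes into the next slot and points
the index at it, so older copies stay on disk until their slot is reused.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDiskCache:
    """Toy slot-based disk store (NOT Squid's real cache_dir code)."""
    num_slots: int
    erase_stale: bool = False  # hypothetical: wipe the old copy immediately
    writes: int = 0            # disk write operations performed so far
    slots: list = field(default_factory=list)
    index: dict = field(default_factory=dict)
    next_slot: int = 0

    def __post_init__(self) -> None:
        self.slots = [None] * self.num_slots

    def store(self, key: str, body: bytes) -> int:
        if self.erase_stale and key in self.index:
            # Erasing the superseded copy costs one extra write.
            self.slots[self.index[key]] = None
            self.writes += 1
        slot = self.next_slot
        self.next_slot = (slot + 1) % self.num_slots
        old = self.slots[slot]
        if old is not None and self.index.get(old[0]) == slot:
            # Slot reuse evicts whatever live object was stored there.
            del self.index[old[0]]
        self.slots[slot] = (key, body)
        self.writes += 1
        self.index[key] = slot  # only the newest copy is reachable
        return slot

    def lookup(self, key: str) -> Optional[bytes]:
        slot = self.index.get(key)
        return None if slot is None else self.slots[slot][1]

    def copies_on_disk(self, key: str) -> int:
        return sum(1 for e in self.slots if e is not None and e[0] == key)
```

Storing the same URL ten times into a 1000-slot store leaves ten copies on
disk after 10 writes, yet lookup() only ever sees the newest one. With
erase_stale=True only one copy remains, but it takes 19 writes.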

In general, if the same object occurring N times on disk is a problem,
you have misconfigured cache_dir parameters, e.g. the cache_dir size is
too big for the physical disk it is stored on. Each type of cache_dir is
optimized for different object types on different OSes; selecting which
one to use can be important for high-performance installations.


> 
> Is there a way to avoid this? I think Squid should store a single copy
> per URL/ETag of the object.
> 
> I’d like to avoid an ad-hoc reload-into-ims refresh_pattern.
> 

Squid is doing exactly what the client is demanding with its use of
"no-cache".


There is nothing wrong with a reload-into-ims refresh_pattern. It is the
best way to make Squid cope with such a nasty client. The alternative is
to ignore *all* Cache-Control headers from all clients on all traffic,
which is massive overkill.


Amos

