[squid-users] squid stores multiple copies of identical ETags

Tabacchiera, Stefano stefano.tabacchiera at IGT.com
Sat Jun 27 18:31:08 UTC 2020


>>> In general, if the same object occurring N times on disk is a problem,
>>> you have issues with misconfigured cache_dir parameters, e.g. the
>>> cache_dir size is too big for the physical disk it is stored on.

>> That's the point. There's a LOT of identical objects on disk.
>> I have 2x100GB dedicated disks, ext4 noatime.
>> Each cache_dir is aufs 80000 16 256.
>> Where's the issue? I didn't even imagine this would lead to multiple
>> stored copies of the same object.

> So far the problem appears to be you not understanding how caching
> works. My previous response contains the explanation that should have
> resolved that.

Amos, I'm sorry, but I'm still confused.

Please follow me on this:
Consider a cacheable object, e.g. a static image, with all its response headers set (Content-Length/Last-Modified/ETag/expiration/etc.).
When the client requests it with "no-cache", that prevents squid from serving the cached on-disk object and forces it to retrieve a fresh copy from the origin server.
So far, so good.
But this new copy is the very same object that is already on disk (same URL/size/etc.), because the client is requesting the same object many times per second, all day.
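To sketch the exchange I mean (all header values here are illustrative, not taken from my logs):

```
GET /images/logo.png HTTP/1.1        <- client forces revalidation at the origin
Host: origin.example.com
Cache-Control: no-cache

HTTP/1.1 200 OK                      <- origin sends the full body again
Content-Length: 512000
Last-Modified: Sat, 27 Jun 2020 12:00:00 GMT
ETag: "abc123"                       <- identical to the copy squid already holds
```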

I understand that squid must serve the new copy to the client (no-cache), but what I don't get is why squid stores a new copy of THIS object on disk every time.
In my (possibly faulty) understanding this could be avoided by simply looking up the store log and finding that this particular object already exists on disk.

Since this doesn't seem to be happening, chances are: squid doesn't care about storing multiple copies on disk OR (more probably) I'm still missing something vital.

In the real case, the object is a JSON which is modified every 5 minutes. Every time it changes, it obviously gets a new ETag, a new Last-Modified, a proper Content-Length, etc.
The client requests it about 10 times per second: 10*300 ≈ 3000 copies on disk. With a mean object size of 500KB, that's 3000*500KB ≈ 1.4GB.
So a single object is wasting ~1.4GB of disk space every 5 minutes. Indeed, during a restart, squid does a lot of purging of duplicate objects.
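The arithmetic above as a quick back-of-the-envelope check (rates and sizes are the assumed figures from my case, interpreting KB as KiB for the GiB conversion):

```python
# Duplicate copies accumulated between two content changes of the object.
requests_per_sec = 10
change_interval_sec = 5 * 60                        # object changes every 5 minutes
copies = requests_per_sec * change_interval_sec     # copies stored in one interval

object_size_kb = 500                                # assumed mean object size
total_kb = copies * object_size_kb
total_gib = total_kb * 1024 / 1024**3               # KiB -> bytes -> GiB

print(copies, round(total_gib, 2))                  # 3000 copies, ~1.43 GiB
```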
Is this really necessary? I don't see the point.

You mentioned the cache_dir parameters, like the cache size compared to disk size, or L1/L2 ratio.
Can you please be more specific or point me at the right documentation?
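For reference, this is what I understand the usual sizing guidance to look like (paths are hypothetical; the percentages are a commonly cited rule of thumb, not something you stated):

```
# squid.conf sketch: on a 100GB ext4 partition, keep the cache_dir size
# well below physical capacity to leave headroom for swap.state, metadata
# and in-flight objects (often quoted as ~60-80% of the partition).
cache_dir aufs /cache1 70000 16 256
cache_dir aufs /cache2 70000 16 256
```

If my current 80000 MB on a 100GB disk is too close to the limit, that would explain the pressure, but not (to me) the duplicates.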
I'd appreciate a lot your help.

Thanks
ST

____________________________________________________________________________________ This message and its attachments are intended only for use by the addressees. Any unauthorized use, re-transmission or dissemination of it is prohibited. If you received this e-mail in error, please inform the sender immediately and delete all the material. Thank you.

