[squid-users] HDD/RAM Capacity vs store_avg_object_size

Amos Jeffries squid3 at treenet.co.nz
Wed Jul 12 16:11:56 UTC 2017


On 12/07/17 22:31, bugreporter wrote:
> Hi,
> 
> Can anybody help me to confirm my understanding of the memory usage vs the
> persistent cache capacity? Below is my understanding:
> 
> According to http://wiki.squid-cache.org/SquidFaq/SquidMemory:
> 
> 1- We need 14 MB of memory per 1 GB on disk for 64-bit Squid. The wiki has
> been there for as long as I have known Squid (i.e. I'm very old now). Is
> this information still valid?

Yes. It is a rough estimate based on the size of code objects used to 
store each request message - they have not changed in at least the past 
10 years. There may be some variance based on extra headers modern HTTP 
contains. But that is not a huge amount and the number is a rough 
estimate to begin with.
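
As a quick illustration of that rule of thumb, here is a minimal Python
sketch. The function name and the 200 GB example are hypothetical,
chosen only for illustration; the 14 MB per GB figure is the wiki's
64-bit estimate and remains a rough guide.

```python
def index_ram_mb(cache_dir_gb, mb_per_gb=14):
    """Rough RAM (in MB) needed to index a disk cache of cache_dir_gb GB.

    mb_per_gb is the wiki rule of thumb for 64-bit Squid. It is an
    estimate, not a guarantee, so treat the result as a ballpark.
    """
    return cache_dir_gb * mb_per_gb


# Hypothetical example: a 200 GB cache_dir on 64-bit Squid.
print(index_ram_mb(200))  # 2800 (MB)
```

This covers only the index; cache_mem and in-transit data come on top.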



> 
> 2- Is this assumption based on the default value of 13 KB for
> *store_avg_object_size*?

No.

That avg object size is for the full object with payload. Those payloads 
are stored inside cache_mem or cache_dir, and do not take up index 
space. So their total is limited to whatever size you configure those 
storage areas to be.

Squid uses the above directive for its startup initialization of the 
index's hash table. The table can be changed dynamically, but that is 
quite expensive in terms of CPU cycles and would delay some requests so 
this is a nice shortcut to avoid most pauses.
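
To make the role of that directive concrete: the number of index
entries to plan for is roughly the configured storage size divided by
the average object size. The sketch below shows only that division. It
is not Squid's actual sizing code, and the function name and example
figures are hypothetical.

```python
def expected_objects(storage_gb, avg_object_kb=13):
    """Estimate how many objects fit in storage_gb of cache.

    avg_object_kb plays the role of store_avg_object_size (13 KB being
    the default mentioned in the question). Illustrative only: Squid's
    real hash table sizing is not reproduced here.
    """
    storage_kb = storage_gb * 1024 * 1024
    return int(storage_kb // avg_object_kb)


# Hypothetical 200 GB of storage:
print(expected_objects(200))       # 16131938 with the 13 KB default
print(expected_objects(200, 100))  # 2097152 with a 100 KB average
```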


The 10 or 14 MB is purely for the metadata necessary to index those 
cached objects, which is the HTTP message header text plus a bunch of 
Squid code objects.


> 
> 3- If the answers to the questions above are both YES, can we deduce that
> we need *182* bytes in memory per object in the persistent cache on a
> 64-bit system?
> [*182* = (14 * 1024 * 1024) / (1024 * 1024 / store_avg_object_size)]

If you want to re-do the calculations for your own proxy start with the 
values from the cachemgr "mem" report.

To get the metadata size add the per-object sizes (first number column) 
of HttpReply + MemObject + HttpHeaderEntry + all objects whose name 
starts with HttpHdr* + StoreEntry + all objects whose name starts with 
StoreMeta*.

The rest is harder. You need to scan a disk cache and separate out the 
message headers, counting both the number of items found and the total 
size of the headers processed. Then multiply the metadata size by the 
number of objects in the cache and add the total message header size.

You now have total index size and total cache size for a given cache. 
Getting the N per GB from that should be easy and obvious.
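
The arithmetic in the steps above can be sketched in Python as follows.
The function name and all input values are hypothetical placeholders,
not measurements: take the per-object metadata size from your own
mgr:mem report and the other three numbers from your own cache scan.

```python
def index_mb_per_gb(metadata_bytes_per_object, object_count,
                    total_header_bytes, cache_bytes):
    """Return index size in MB per GB of cache.

    index size = per-object metadata * number of objects
                 + total size of the stored message headers
    """
    index_bytes = metadata_bytes_per_object * object_count + total_header_bytes
    index_mb = index_bytes / (1024 * 1024)
    cache_gb = cache_bytes / (1024 ** 3)
    return index_mb / cache_gb


# Made-up inputs purely to show the call; substitute your own figures.
n = index_mb_per_gb(
    metadata_bytes_per_object=1024,
    object_count=1_000_000,
    total_header_bytes=512 * 1_000_000,
    cache_bytes=100 * 1024 ** 3,
)
print(round(n, 2), "MB of index per GB of cache")
```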



NP: The mgr:mem "In Use" count of StoreEntry gives you approximately the 
number of currently indexed objects. It does include some non-cacheable 
objects currently being replied to, so it is not completely accurate. 
You can use that to see how the index memory use compares to the memory 
use for extra in-transit data.



> 4- Today the *store_avg_object_size* should really be greater than 13 KB.
> The mean object size I can see on my own cache is about 100 KB. Can anybody
> refer me to a website where I can find fresh information?

The value for your particular Squid can be found in the cachemgr "info" 
report. It is listed as "Mean Object Size".
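
If you want to pull that value out with a script, a minimal Python
sketch follows. It assumes you have already saved the report text (for
example from `squidclient mgr:info`) and that the value is printed as a
number followed by "KB". The function name and the sample string are
hypothetical.

```python
import re


def mean_object_size_kb(info_report):
    """Extract the "Mean Object Size" value (in KB) from mgr:info text.

    Returns None if the line is missing. Assumes the value is printed
    as a decimal number followed by "KB".
    """
    match = re.search(r"Mean Object Size:\s*([\d.]+)\s*KB", info_report)
    return float(match.group(1)) if match else None


# Hypothetical one-line excerpt of a report:
sample = "\tMean Object Size:\t106.08 KB\n"
print(mean_object_size_kb(sample))  # 106.08
```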

It varies between proxies, and is directly dependent on what your 
particular cache settings are compared to the traffic that proxy sees. 
So even two proxies receiving the same traffic might show very different 
values and it is unlikely that any reference material you find by other 
people will be anything more than a rough approximation.


For example, my test proxy caching ISP-type traffic, with a fair bit of 
Facebook, YouTube etc. going through it:
"
	Mean Object Size:	106.08 KB
"

and a production CDN proxy in front of mostly WordPress sites:
"
	Mean Object Size:	19.20 KB
"

Both with a 200 GB cache_dir and otherwise default cache settings.



> 
> 5- If I'm completely on a wrong way, can anybody help me to find a formula
> that can help me to deduce the required RAM for a given HDD capacity (and
> vice versa).
> 

Still the same one listed in the wiki page.

Though nowadays the 2^27 objects per cache_dir limitation is proving to 
be far more restrictive than the RAM index size. So depending on your 
"Mean Object Size" you may find yourself limited to using only 100 GB or 
less of a TB HDD.

Amos

