[squid-users] squid-users Digest, Vol 12, Issue 33

Du, Hongfei Hongfei.Du at InterDigital.com
Wed Aug 19 11:45:28 UTC 2015

Hi Eliezer and Amos
Sure, I have posted this to the squid-dev list, which is more relevant to this issue.

Many thanks for Amos' comments; they are really helpful. To clarify: we are looking at HTTP caching, and at this stage only at the cache-dir selection algorithm, not the peer selection algorithm. Specifically, we create three separate folders, /var/spool/cache1, /var/spool/cache2 and /var/spool/cache3, and we want Squid to place content (e.g. all elements from a single URL, as defined by one of our subscribers) strictly into a specified folder, rather than follow the built-in RR/LL rules, which are based on the status (e.g. residual capacity) of the cache server itself. The RR/LL source code we are looking for is therefore the cache-dir selection algorithm applied to local storage (the three folders mentioned above).

Also, regarding: "There are algorithms applied in layers to decide which type of storage area is used, then which one within the selected type is most appropriate, based on object availability, cacheability, size, storage area speed, object popularity, and temporal relationships to others." Can you elaborate on where we can look into these algorithms?
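
To make the intended behaviour concrete, here is a minimal sketch of a strict, table-driven mapping from a subscriber's URL to one of the three folders. The subscriber host names and the function name are hypothetical placeholders, and this is not Squid code; it only illustrates the selection rule we have in mind:

```python
from urllib.parse import urlsplit

# Assumed mapping from subscriber host name to cache directory. The host
# names are placeholders; the paths are the three folders named above.
PINNED_DIRS = {
    "subscriber-a.example": "/var/spool/cache1",
    "subscriber-b.example": "/var/spool/cache2",
    "subscriber-c.example": "/var/spool/cache3",
}


def pinned_cache_dir(url):
    """Return the directory pinned to the URL's host, or None if unpinned."""
    host = urlsplit(url).hostname
    return PINNED_DIRS.get(host)
```

An unpinned URL returns None, at which point the existing selection rules would presumably still have to apply.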

Best Regards,



Hongfei Du
Staff Engineer (UK Software)
InterDigital UK, Inc.
Shoreditch Business Center
64 Great Eastern Street
London,  EC2A 3QR
T: +44 207.749.9140
Hongfei.Du at InterDigital.com

-----Original Message-----
From: squid-users [mailto:squid-users-bounces at lists.squid-cache.org] On Behalf Of squid-users-request at lists.squid-cache.org
Sent: Tuesday, August 18, 2015 1:00 PM
To: squid-users at lists.squid-cache.org
Subject: squid-users Digest, Vol 12, Issue 33

Today's Topics:

  1. Re: Question on developing customized Cache Selection
     algorithm from Round Robin, Least Load (Amos Jeffries)


Message: 1
Date: Tue, 18 Aug 2015 22:24:45 +1200
From: Amos Jeffries <squid3 at treenet.co.nz>
To: squid-users at lists.squid-cache.org
Subject: Re: [squid-users] Question on developing customized Cache
       Selection algorithm from Round Robin, Least Load
Message-ID: <55D307ED.3030500 at treenet.co.nz>
Content-Type: text/plain; charset=utf-8

On 18/08/2015 5:42 a.m., Du, Hongfei wrote:
> Hello
> We are attempting to extend the Squid cache selection algorithm to make
> it more sophisticated, let’s say by adding WRR or WFQ. A few questions to start with:

As Eliezer said, this is really a question for the squid-dev mailing list, where the developers hang out.

WRR (weighted round-robin) is already implemented and is exactly how Squid cache_dir selection currently operates. The weighting is based on each storage area's available size and I/O load.
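
A minimal sketch of weighted round-robin over cache directories, assuming the weight is the free-space fraction scaled by (1 - load). It uses the "smooth" weighted round-robin scheme; the CacheDir fields, the weight formula and the class names are illustrative assumptions, not Squid's actual code:

```python
from dataclasses import dataclass


@dataclass
class CacheDir:
    path: str
    max_size: int   # capacity in bytes (assumed non-zero)
    used: int       # bytes currently stored
    load: float     # assumed I/O load: 0.0 (idle) to 1.0 (saturated)


def weight(d):
    """Assumed weighting: more free space and less I/O load give more weight."""
    free_fraction = (d.max_size - d.used) / d.max_size
    return free_fraction * (1.0 - d.load)


class WeightedRoundRobin:
    """Smooth weighted round-robin: picks are proportional to the weights."""

    def __init__(self, dirs):
        self.dirs = list(dirs)
        self.current = [0.0] * len(self.dirs)

    def pick(self):
        # Recompute weights on every pick so they track size and load changes.
        weights = [weight(d) for d in self.dirs]
        total = sum(weights)
        for i, w in enumerate(weights):
            self.current[i] += w
        best = max(range(len(self.dirs)), key=lambda i: self.current[i])
        self.current[best] -= total
        return self.dirs[best]
```

With two idle directories that are 50% and 25% free, the first one receives roughly two out of every three new objects.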

WFQ (weighted fair queueing) is a queueing algorithm, as the 'Q' says.
Caching != queueing. In fact a cache is so different from a queue that WFQ would badly affect performance if it were used to decide what storage an object went into.
In essence, the problem is that we cannot dictate what objects will be requested by clients. They want what they ask for. Squid's duty is 1) to answer reliably and 2) to answer as fast as possible, regardless of the object's location.

> - As we will probably have to write a new algorithm and recompile,
> does anyone know where (in which file) the existing Round Robin or Least Load algorithm is defined in the source code?

That depends on whether you mean the algorithm applied for local storage vs network sources, or the one(s) applied to individual caches for garbage collection.

> - Is there a straightforward method to instruct Squid to store
> content from the network (e.g. a URL) in a predefined specific disk folder rather than using the selection algorithm itself?

Simply stated:
The URL and all other relevant details from the transaction are hashed to look up an index and find the 32-bit 'sfileno' value, which is a UID encoding the location of the indexed object in Squid local storage.

It _sounds_ simple enough, but those "other relevant details" are a massive complication. One single URL can potentially contain all possible objects that ever have or ever will exist on the Internet. Even considering storing things as one file per URL dies a horrible death when it encounters popular modern websites.
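
A minimal sketch of that lookup, under stated assumptions: the key is a hash over the method, the URL and a string of variant details; the hash function (SHA-256 here) is an illustrative choice; and the 7-bit/25-bit split of the 32-bit sfileno is an assumed packing for illustration only. None of the names below are Squid functions:

```python
import hashlib


def store_key(method, url, variant_details=""):
    """Hash the method, URL and any variant details into an index key."""
    material = "\n".join([method, url, variant_details]).encode("utf-8")
    return hashlib.sha256(material).digest()


# Assumed packing: the top 7 bits name the storage area, the low 25 bits
# name the slot inside it. Together they fit in 32 bits.
DIR_BITS, FILE_BITS = 7, 25


def pack_sfileno(dir_no, file_no):
    if not (0 <= dir_no < (1 << DIR_BITS) and 0 <= file_no < (1 << FILE_BITS)):
        raise ValueError("dir_no or file_no out of range")
    return (dir_no << FILE_BITS) | file_no


def unpack_sfileno(sfileno):
    return sfileno >> FILE_BITS, sfileno & ((1 << FILE_BITS) - 1)


# One URL, two distinct cached objects, because the variant details differ.
index = {}
index[store_key("GET", "http://example.com/", "accept-encoding=gzip")] = pack_sfileno(1, 42)
index[store_key("GET", "http://example.com/", "accept-encoding=identity")] = pack_sfileno(2, 7)
```

The two index entries share a URL but live in different storage areas, which is why "one folder per URL" does not follow from the URL alone.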

Within Squid we refer to "the HTTP cache" as a single thing, but it is constructed of many storage areas: the individual cache_dirs and other places where HTTP objects might be found. Remote network sources are also accounted for.

There are algorithms applied in layers to decide which type of storage area is used, then which one within the selected type is most appropriate, based on object availability, cacheability, size, storage area speed, object popularity, and temporal relationships to others.
Then a sfileno is assigned if it's local storage.
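
The layering can be sketched roughly as follows. This is a simplified illustration under assumptions: only two storage types, popularity reduced to a boolean, and "most appropriate" reduced to "most free space". The Area fields and the function name are hypothetical, not Squid's:

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    kind: str             # "memory" or "disk"
    max_object_size: int  # largest object this area accepts, in bytes
    free: int             # free bytes


def choose_area(areas, cacheable, size, popular):
    """Layer 1 orders the storage types; layer 2 picks one area of that type."""
    if not cacheable:
        return None
    # Layer 1: assumed preference, popular objects try memory first.
    kinds = ["memory", "disk"] if popular else ["disk", "memory"]
    for kind in kinds:
        # Layer 2: among areas of this type that can hold the object,
        # take the one with the most free space.
        candidates = [
            a for a in areas
            if a.kind == kind and size <= a.max_object_size and size <= a.free
        ]
        if candidates:
            return max(candidates, key=lambda a: a.free)
    return None
```

Only once an area is chosen this way would a local-storage identifier be assigned to the object.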

Then objects get moved between storage areas anyway, based on need and popularity, and objects get removed from individual storage areas based on lack of popularity. Both of these affect future requests for them.

So the particulars of what you want to do matter, a lot.

FWIW, we have known outstanding needs for:

* updated cache_peer selection algorithms. Current Squid outgoing TCP connection failover works with a list of IPs that get tried until one succeeds. The old selection algorithms produce only a single IP rather than a preference-ordered set of peers to try.
- also, none of the algorithms provide byte-based loading.

* ETag-based cache index. For better-performing If-Match/If-None-Match revalidation traffic.

* 206 partial object caching. Rock can store them, but no algorithms yet exist to properly manage the pieces of incomplete objects or aggregation from different transactions.

* per-area storage indexes, instead of a Big Global Index. Working towards 64-bit sfileno, which is needed for some TB-sized caches. Rock and Transients storage areas are done, but other caches are still TODO.

* better HDD load detection. To inform the weighting of cache_dir selection algorithms. This is a hardware driver related project.

* Support for ZFS and XFS dynamic inode sizing. This causes lots of issues with "wrong" disk storage under/over usage. Another hardware driver related project.
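
For the first item in the list above (cache_peer selection), the difference between "a single winner" and "a preference-ordered set to try" can be sketched like this. The peer dictionaries, the weight-based ordering and the connect callback are all hypothetical stand-ins, not Squid's peer selection code:

```python
def rank_peers(peers):
    """Return every usable peer, best first, instead of a single winner."""
    alive = [p for p in peers if p["alive"]]
    # Assumed ordering: higher weight first, name as a stable tie-breaker.
    return sorted(alive, key=lambda p: (-p["weight"], p["name"]))


def fetch_with_failover(peers, connect):
    """Try peers in preference order until one connection succeeds."""
    for peer in rank_peers(peers):
        if connect(peer):
            return peer["name"]
    return None
```

An algorithm that returns only the top entry leaves the failover loop with nothing to fall back on when that one connection fails.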


