[squid-users] Getting the full file content on a range request, but not on EVERY get ...

Thu May 12 14:06:40 UTC 2016

On Mittwoch, 11. Mai 2016 21:37:17 Heiler Bemerguy wrote:
> Hey guys,
> 
> First take a look at the log:
> 
> root at proxy:/var/log/squid# tail -f access.log |grep
> http://download.cdn.mozilla.net/pub/firefox/releases/45.0.1/update/win32/pt-> BR/firefox-45.0.1.complete.mar 1463011781.572   8776 10.1.3.236 TCP_MISS/206
> 300520 GET
[...] 
> Now think: An user is just doing a segmented/ranged download, right?
> Squid won't cache the file because it is a range-download, not a full
> file download.
> But I WANT squid to cache it. So I decide to use "range_offset_limit
> -1", but then on every GET squid will re-download the file from the
> beginning, opening LOTs of simultaneous connections and using too much
> bandwidth, doing just the OPPOSITE it's meant to!
> 
> Is there a smart way to allow squid to download it from the beginning to
> the end (to actually cache it), but only on the FIRST request/get? Even
> if it makes the user wait for the full download, or cancel it
> temporarily, or.. whatever!! Anything!!

Well, this is exactly, what my squid_dedup helper was created for!

See my announcement: 

	Subject: [squid-users] New StoreID helper: squid_dedup
	Date: Mon, 09 May 2016 23:56:45 +0200

My openSUSE environment is fetching _all_ updates with byte-ranges from many 
servers. Therefor, I created squid_dedup.

Your specific config could look like this:

/etc/squid/dedup/mozilla.conf:
[mozilla]
match: http\:\/\/download\.cdn\.mozilla\.net/(.*)
replace: http://download.cdn.mozilla.net.%(intdomain)s/\1
fetch: true

The fetch parameter is unique among the other StoreID helper (AFAIK): it is 
fetching the object after a certain delay with a pool of fetcher threads.

The idea is: after the first access for an object, wait a bit (global setting, 
default: 15 secs), and then fetch the whole thing once. It won't solve 
anything for the first client, but for all subsequent accesses. 

The fetcher avoids fetching anything more than once by checking the http 
headers.

This is a pretty new project, but be assured, that the basic functions are 
working fine, and I will do my best to solve any upcoming issues. It is 
implemented with Python3 and prepared for supporting additional features 
easily, while keeping a good part of an eye on efficiency.

Let me know, if you're going to try it.

Pete