[squid-users] Not all html objects are being cached

Amos Jeffries squid3 at treenet.co.nz
Thu Jan 26 01:01:54 UTC 2017


On 26/01/2017 9:44 a.m., Yuri Voinov wrote:
> 
> 
> 26.01.2017 2:22, boruc wrote:
>> After analyzing requests and responses with Wireshark for a while, I
>> noticed that many sites that weren't cached used various combinations of
>> the parameters below:
>>
>> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
>> private, public, max-age, public
>> Pragma: no-cache
> If the webmaster has done this - he had good reason to. Trying to break
> the RFC in this way, you break the Internet.

Instead, use the latest Squid you can. By default Squid caches as much as
it can within the restrictions imposed by the web environment. 'Latest is
best' applies here, since we are still working on support for HTTP/1.1
features.


I recommend you use the tool at <http://redbot.org> to check a URL's
cacheability instead of Wireshark. It will tell you what those controls
actually *mean* with regard to cacheability, not just that they are used.
It will also show whether there are other problems you may not have
noticed in the various ways there are to fetch any given URL.
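
To make the "what they mean" point concrete, here is a minimal Python
sketch of how a shared cache might read a Cache-Control response header.
It is a simplification for illustration only, not Squid's actual logic:
the function names are made up, it splits naively on commas (so quoted
arguments containing commas are not handled), and it treats qualified
forms such as private="Set-Cookie" like the unqualified directive.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directive names are case-insensitive; a missing argument becomes None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_verdict(value):
    """Return a rough, human-readable verdict for a shared cache (hypothetical helper)."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "must not store"
    if "private" in d:
        return "must not store in a shared cache"
    if "no-cache" in d:
        return "may store, must revalidate before every reuse"
    if "s-maxage" in d or "max-age" in d:
        # s-maxage takes precedence over max-age for shared caches.
        age = d["s-maxage"] if "s-maxage" in d else d["max-age"]
        return "may store, fresh for %s seconds" % age
    return "may store, freshness left to heuristics"


if __name__ == "__main__":
    for header in ("no-cache, no-store, must-revalidate",
                   "private, max-age=600",
                   "public, max-age=3600"):
        print(header, "->", shared_cache_verdict(header))
```

Note that only the presence of a directive matters in this sketch, which
is why seeing the header names in a packet capture says little until you
work out which rule wins.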


The Squid options available are mostly for disabling some caching
operation - so that if you are in a situation where disabling operation
X causes operation Y to cache better you can tune the behaviour.

You can't really *force* things which are not cacheable to be stored.
They will just be replaced with a newer copy shortly after with no
benefit gained - just some possibly nasty side effects, or real monetary
costs.


>>
>> There is a possibility to disable this in squid by using
> Don't do it.
>> request_header_access and reply_header_access, however it doesn't work for
>> me, many pages still aren't in the cache. I am currently using the lines below:
>>
>> request_header_access Cache-Control deny all
>> request_header_access Pragma deny all
>> request_header_access Accept-Encoding deny all
>> reply_header_access Cache-Control deny all
>> reply_header_access Pragma deny all
>> reply_header_access Accept-Encoding deny all
>>

Ah, changing the headers on the *outgoing* traffic does not in any way
affect how Squid interprets the _previously_ received inbound messages.

==> In other words, doing the above is pointless and screws over
everybody using your proxy. Don't do that.


By erasing the Cache-Control response header delivered along with that
content you are technically in violation of international copyright laws.
==> Don't do that.


By removing the Accept-Encoding on requests (only) you can improve the HIT
ratio (by only a small amount), but at the cost of a 50-90% bandwidth
increase on each MISS - so the cost increase usually swamps the gains.

==> Making this change leads to the opposite of what you intended. Don't
do that.
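
A rough worked example of that trade-off, as a Python sketch. All of the
workload numbers are hypothetical (1000 requests, a 30% HIT ratio that
rises to 33%, 20 kB average compressed response); only the 50-90%
inflation range comes from the text above.

```python
def upstream_bytes(misses, avg_compressed_bytes, inflation_percent=0):
    """Bytes fetched from origin servers for a given number of MISSes.

    inflation_percent models how much larger each MISS becomes when the
    origin can no longer send a compressed body (0 = compression intact).
    Integer arithmetic keeps the results exact.
    """
    return misses * avg_compressed_bytes * (100 + inflation_percent) // 100


# Hypothetical workload: 1000 requests, 20 kB average compressed response.
# Accept-Encoding left alone: 30% HIT ratio, so 700 MISSes.
before = upstream_bytes(misses=700, avg_compressed_bytes=20_000)

# Accept-Encoding stripped: HIT ratio improves to 33% (670 MISSes),
# but every MISS is now 50-90% larger.
after_best = upstream_bytes(670, 20_000, inflation_percent=50)
after_worst = upstream_bytes(670, 20_000, inflation_percent=90)

if __name__ == "__main__":
    print("before:     ", before)
    print("after (50%):", after_best)
    print("after (90%):", after_worst)
```

Under these assumed numbers, upstream traffic goes from 14,000,000 bytes
to somewhere between 20,100,000 and 25,460,000 bytes. Even at the low end
of the inflation range, MISSes would have to fall from 700 to 466 or
fewer just to break even.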


Removing the Accept-Encoding header on responses is just pointless. It
controls POST/PUT payload data, which Squid cannot cache anyway. So all
you did was prevent the clients from using less bandwidth.

==> More bandwidth, more costs. Don't do that.


Removing the Pragma header is also pointless. It's used by very ancient
software from the 1990s and such.

==> If the web application was actually using the Pragma for anything
important (some do) you just screwed them over with no gains to
yourself. Don't do that.


>> I could also try refresh_pattern, but I don't think that code below will
>> work because not every URL ends with .html or .htm (because you visit
>> /www.example.com/, not /www.example.com/index.html/)
>> refresh_pattern -i \.(html|htm)$          1440   40% 40320 ignore-no-cache
>> ignore-no-store ignore-private override-expire reload-into-ims
>>


Quite. So configure the correct options.

No software is psychic enough to do operation X which you want, when you
configure it to do *only* some other non-X operation.
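
To illustrate the quoted concern about the pattern, here is a small
Python sketch that tests the regex from the refresh_pattern line above
against a few made-up URLs. It uses Python's re module rather than the
regex library Squid links against, on the assumption that this simple
pattern behaves the same in both; the URLs are hypothetical.

```python
import re

# The pattern from the quoted refresh_pattern line; -i means case-insensitive.
pattern = re.compile(r"\.(html|htm)$", re.IGNORECASE)


def matches(url):
    """True if the pattern matches anywhere in the URL (unanchored search)."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in ("http://www.example.com/",
                "http://www.example.com/index.html",
                "http://www.example.com/page.HTM",
                "http://www.example.com/index.html?x=1"):
        print(url, "->", matches(url))
```

The bare directory URL and the URL with a query string both fail to
match, so a rule written this way never applies to them.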


Amos


