[squid-users] Not all html objects are being cached

Yuri yvoinov at gmail.com
Fri Jan 27 09:47:06 UTC 2017



On 27.01.2017 9:10, Amos Jeffries wrote:
> On 27/01/2017 9:46 a.m., Yuri Voinov wrote:
>>
>> On 27.01.2017 2:44, Matus UHLAR - fantomas wrote:
>>>> On 26.01.2017 2:22, boruc wrote:
>>>>> After a little bit of analyzing requests and responses with
>>>>> Wireshark, I noticed that many sites that weren't cached had
>>>>> different combinations of the parameters below:
>>>>>
>>>>> Cache-Control: no-cache, no-store, must-revalidate, post-check,
>>>>> pre-check, private, public, max-age, public
>>>>> Pragma: no-cache
>>> On 26.01.17 02:44, Yuri Voinov wrote:
>>>> If the webmaster has done this, he had a good reason to. By trying
>>>> to break the RFC in this way, you break the Internet.
>>> Actually, no. If the webmaster has done the above, he has no damn
>>> idea what those mean (private and public?), nor how to provide
>>> properly cacheable content.
>> It was sarcasm.
>
> You may have intended it to be. But you spoke the simple truth.
>
> Other than 'public', there really are situations in which there is
> "good reason" to send that set of controls all at once.
>
> For example: any admin who wants a RESTful or SaaS application to
> actually work for all their potential customers.
>
>
> I have been watching the cycle below take place in HTTP for the past
> 20 years:
>
> Webmaster: don't cache this, please.
>
>    "Cache-Control: no-store"
>
> Proxy Admin: ignore-no-store
>
>
> Webmaster: I meant it. Don't deliver anything you cached without
> fetching an updated version.
>
>    ... "no-store, no-cache"
>
> Proxy Admin: ignore-no-cache
>
>
> Webmaster: Really, you MUST revalidate before using this data.
>
>   ... "no-store, no-cache, must-revalidate"
>
> Proxy Admin: ignore-must-revalidate
>
>
> Webmaster: Really I meant it. This is non-storable PRIVATE DATA!
>
> ... "no-store, no-cache, must-revalidate, private"
>
> Proxy Admin: ignore-private
>
>
> Webmaster: Seriously. I'm changing it on EVERY request! Don't store it.
>
> ... "no-store, no-cache, must-revalidate, private, max-age=0"
> "Expires: -1"
>
> Proxy Admin: ignore-expires
>
>
> Webmaster: Are you one of those dumb HTTP/1.0 proxies that don't
> understand Cache-Control?
>
> "Pragma: no-cache"
> "Expires: 1 Jan 1970"
>
> Proxy Admin: hehe! I already ignore-no-cache ignore-expires
>
>
> Webmaster: F*U!  May your clients batch up their traffic to slam you
> with it all at once!
>
> ... "no-store, no-cache, must-revalidate, private, max-age=0,
> pre-check=1, post-check=1"
>
>
> Proxy Admin: My bandwidth! I need to cache more!
>
> Webmaster: Doh! Oh well, so I'll have to write my application to force
> new content then.
>
> Proxy Admin: ignore-reload
>
>
> Webmaster: Now what? Oh, HTTPS won't have any damn proxies in the way....
>
> ... the cycle repeats again within HTTPS. Took all of 5 years this time.
>
> ... the cycle repeats again within SPDY. That took only ~1 year.
>
> ... the cycle repeats again within CoAP. The standards are not even
> finished yet and it's already underway.
>
>
> Stop this cycle of stupidity. It really HAS "broken the Internet".
All of that would be just great if webmasters were conscientious. I will
give just one example.

Only one example.

root@khorne /patch # wget -S http://www.microsoft.com
--2017-01-27 15:29:54--  http://www.microsoft.com/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
   HTTP/1.1 302 Found
   Server: AkamaiGHost
   Content-Length: 0
   Location: http://www.microsoft.com/ru-kz/
   Date: Fri, 27 Jan 2017 09:29:54 GMT
   X-CCC: NL
   X-CID: 2
   X-Cache: MISS from khorne
   X-Cache-Lookup: MISS from khorne:3128
   Connection: keep-alive
Location: http://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54--  http://www.microsoft.com/ru-kz/
Reusing existing connection to 127.0.0.1:3128.
Proxy request sent, awaiting response...
   HTTP/1.1 301 Moved Permanently
   Server: AkamaiGHost
   Content-Length: 0
   Location: https://www.microsoft.com/ru-kz/
   Date: Fri, 27 Jan 2017 09:29:54 GMT
   Set-Cookie: akacd_OneRF=1493285394~rv=7~id=6a2316770abdbb58a85c16676a0f84fd; path=/; Expires=Thu, 27 Apr 2017 09:29:54 GMT
   X-CCC: NL
   X-CID: 2
   X-Cache: MISS from khorne
   X-Cache-Lookup: MISS from khorne:3128
   Connection: keep-alive
Location: https://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
   HTTP/1.1 200 OK
   Cache-Control: no-cache, no-store
   Pragma: no-cache
   Content-Type: text/html
   Expires: -1
   Server: Microsoft-IIS/8.0
   CorrelationVector: BzssVwiBIUaXqyOh.1.1
   X-AspNet-Version: 4.0.30319
   X-Powered-By: ASP.NET
   Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
   Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
   Access-Control-Allow-Credentials: true
   P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
   X-Frame-Options: SAMEORIGIN
   Vary: Accept-Encoding
   Content-Encoding: gzip
   Date: Fri, 27 Jan 2017 09:29:56 GMT
   Content-Length: 13322
   Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com; expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
   Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com; expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
   Strict-Transport-Security: max-age=0; includeSubDomains
   X-CCC: NL
   X-CID: 2
   X-Cache: MISS from khorne
   X-Cache-Lookup: MISS from khorne:3128
   Connection: keep-alive
Length: 13322 (13K) [text/html]
Saving to: 'index.html'

index.html          100%[==================>]  13.01K --.-KB/s    in 0s

2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]

Can you explain to me why this static index.html has this:

Cache-Control: no-cache, no-store
Pragma: no-cache

?

What would break if a proxy ignored the Cache-Control on this page?
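
If an administrator did decide to ignore it, the knob in Squid is a
refresh_pattern line. A minimal sketch for Squid 3.5 (the TTLs are
arbitrary, the pattern is only illustrative, and for HTTPS it can only
take effect when the proxy bumps TLS, as in the transcript above):

# Illustrative only: cache this one site's HTML despite its headers.
refresh_pattern -i ^https?://www\.microsoft\.com/ 60 20% 1440 ignore-no-store ignore-private ignore-reload override-expire

Such a line has to come before any catch-all refresh_pattern, because
Squid uses the first pattern that matches the URL.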


Yes, saving traffic is the most important thing, because not everyone,
everywhere, has terabit links with unlimited data. Moreover, the number
of users keeps growing while capacity is finite. In any case, the
decision on how to handle content in such a situation should rest with
the proxy administrator, not with the developers of the proxy, who have
hardcoded their own vision, even one backed by the RFC. Because a
byte-hit ratio of 10% (vanilla Squid; after very hard work it can reach
up to 30%, but no more) is ridiculous. In such a situation it would be
more honest to cache nothing at all, and then let's just not call Squid
a caching proxy. Deploying a secondary server that demands a lot of
attention while yielding only a 10% gain is a mockery of users.
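
(For anyone who wants to check their own ratios, the cache manager
reports them. A sketch, assuming squidclient is installed and the proxy
listens on the usual local port:

squidclient -h 127.0.0.1 -p 3128 mgr:info | grep -i 'hits as %'

In Squid 3.5 the matching lines read "Hits as % of all requests" and
"Hits as % of bytes sent"; the exact wording varies between versions.)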

Let me explain the situation as I see it. Webmasters hang bans on
caching everywhere, in every way possible, because their pages are full
of advertising, and that advertising pays money. This is the same
reason Google prevents caching of YouTube. Big money. We get no money;
our goal, in fact, is to minimize the cost of traffic. We chose Squid
as our tool. And you, with your point of view, have deprived us of our
weapon against unscrupulous webmasters. That is how it looks.

Again: breaking the Internet should be my choice, not yours. Either
follow the RFC 100%, or do not break it only in part. Either put on the
pants or take off the cross, as the saying goes.
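
For reference, the admin-side "weapons" in question are refresh_pattern
options in squid.conf. A rough catch-all sketch (the numbers are
arbitrary; note that ignore-no-cache and ignore-must-revalidate were
removed in Squid 3.2, so parts of the cycle above no longer even have a
knob in current releases):

# Aggressive catch-all sketch; keep it after any site-specific patterns.
refresh_pattern . 0 20% 4320 ignore-no-store ignore-private ignore-reload override-expire override-lastmod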
>
>
> HTH
> Amos