[squid-users] Not all html objects are being cached
Yuri
yvoinov at gmail.com
Fri Jan 27 09:47:06 UTC 2017
27.01.2017 9:10, Amos Jeffries wrote:
> On 27/01/2017 9:46 a.m., Yuri Voinov wrote:
>>
>> 27.01.2017 2:44, Matus UHLAR - fantomas wrote:
>>>> 26.01.2017 2:22, boruc wrote:
>>>>> After analyzing requests and responses with Wireshark for a while, I
>>>>> noticed that many sites that weren't cached had different
>>>>> combinations of the parameters below:
>>>>>
>>>>> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check, private, public, max-age, public
>>>>> Pragma: no-cache
>>> On 26.01.17 02:44, Yuri Voinov wrote:
>>>> If the webmaster has done this, he had a good reason to. By trying to
>>>> break the RFC in this way, you break the Internet.
>>> Actually, no. If the webmaster has done the above, he has no damn
>>> idea what those mean (private and public?), or how to provide properly
>>> cacheable content.
>> It was sarcasm.
>
> You may have intended it to be. But you spoke the simple truth.
>
> Other than 'public', there really are situations that have "good reason"
> to send that set of controls all at once.
>
> For example, any admin who wants a RESTful or SaaS application to
> actually work for all their potential customers.
>
>
> I have been watching the cycle below play out in HTTP for the past
> 20 years:
>
> Webmaster: Don't cache this, please.
>
> "Cache-Control: no-store"
>
> Proxy Admin: ignore-no-store
>
>
> Webmaster: I meant it. Don't deliver anything you cached without fetching
> an updated version.
>
> ... "no-store, no-cache"
>
> Proxy Admin: ignore-no-cache
>
>
> Webmaster: Really, you MUST revalidate before using this data.
>
> ... "no-store, no-cache, must-revalidate"
>
> Proxy Admin: ignore-must-revalidate
>
>
> Webmaster: Really I meant it. This is non-storable PRIVATE DATA!
>
> ... "no-store, no-cache, must-revalidate, private"
>
> Proxy Admin: ignore-private
>
>
> Webmaster: Seriously. I'm changing it on EVERY request! Don't store it.
>
> ... "no-store, no-cache, must-revalidate, private, max-age=0"
> "Expires: -1"
>
> Proxy Admin: ignore-expires
>
>
> Webmaster: Are you one of those dumb HTTP/1.0 proxies that don't
> understand Cache-Control?
>
> "Pragma: no-cache"
> "Expires: 1 Jan 1970"
>
> Proxy Admin: Hehe! I already use ignore-no-cache and ignore-expires.
>
>
> Webmaster: F*U! May your clients batch up their traffic to slam you
> with it all at once!
>
> ... "no-store, no-cache, must-revalidate, private, max-age=0,
> pre-check=1, post-check=1"
>
>
> Proxy Admin: My bandwidth! I need to cache more!
>
> Webmaster: Doh! Oh well, then I have to write my application to force
> new content.
>
> Proxy Admin: ignore-reload
>
>
> Webmaster: Now what? Oh, HTTPS won't have any damn proxies in the way...
>
> ... the cycle repeats within HTTPS. It took all of 5 years this time.
>
> ... the cycle repeats within SPDY. That took only ~1 year.
>
> ... the cycle repeats within CoAP. The standards are not even
> finished yet and it's already underway.
>
>
> Stop this cycle of stupidity. It really HAS "broken the Internet".
All that would be just great if webmasters were conscientious. I will
give just one example. Only one.
root @ khorne /patch # wget -S http://www.microsoft.com
--2017-01-27 15:29:54-- http://www.microsoft.com/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
HTTP/1.1 302 Found
Server: AkamaiGHost
Content-Length: 0
Location: http://www.microsoft.com/ru-kz/
Date: Fri, 27 Jan 2017 09:29:54 GMT
X-CCC: NL
X-CID: 2
X-Cache: MISS from khorne
X-Cache-Lookup: MISS from khorne:3128
Connection: keep-alive
Location: http://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54-- http://www.microsoft.com/ru-kz/
Reusing existing connection to 127.0.0.1:3128.
Proxy request sent, awaiting response...
HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: https://www.microsoft.com/ru-kz/
Date: Fri, 27 Jan 2017 09:29:54 GMT
Set-Cookie: akacd_OneRF=1493285394~rv=7~id=6a2316770abdbb58a85c16676a0f84fd; path=/; Expires=Thu, 27 Apr 2017 09:29:54 GMT
X-CCC: NL
X-CID: 2
X-Cache: MISS from khorne
X-Cache-Lookup: MISS from khorne:3128
Connection: keep-alive
Location: https://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54-- https://www.microsoft.com/ru-kz/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store
Pragma: no-cache
Content-Type: text/html
Expires: -1
Server: Microsoft-IIS/8.0
CorrelationVector: BzssVwiBIUaXqyOh.1.1
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Credentials: true
P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
X-Frame-Options: SAMEORIGIN
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Fri, 27 Jan 2017 09:29:56 GMT
Content-Length: 13322
Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com; expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com; expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
Strict-Transport-Security: max-age=0; includeSubDomains
X-CCC: NL
X-CID: 2
X-Cache: MISS from khorne
X-Cache-Lookup: MISS from khorne:3128
Connection: keep-alive
Length: 13322 (13K) [text/html]
Saving to: 'index.html'
index.html 100%[==================>] 13.01K --.-KB/s in 0s
2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]
Can you explain to me why a static index.html carries these headers:
Cache-Control: no-cache, no-store
Pragma: no-cache
What could possibly break if Cache-Control were ignored on this page?
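For reference, the headers in the response above can be checked against the storage rules of RFC 7234 (HTTP/1.1 Caching). The following is a minimal Python sketch under that assumption, not Squid's actual decision logic; the function names are hypothetical. It encodes only a few rules: `no-store` forbids storage, `private` forbids storage in a shared cache, `no-cache` allows storage but requires revalidation, an invalid `Expires` such as `-1` counts as already expired, and `max-age`/`s-maxage` take precedence over `Expires`. `Pragma` is deliberately ignored because its meaning in responses is unspecified.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def expires_in_past(expires, date):
    """RFC 7234 section 5.3: an invalid Expires value such as "-1" counts as already expired."""
    try:
        return parsedate_to_datetime(expires) <= parsedate_to_datetime(date)
    except (TypeError, ValueError):
        # Unparseable date (or a naive/aware datetime mismatch): treat as expired.
        return True


def shared_cache_verdict(headers):
    """Hypothetical helper: reasons an RFC 7234 shared cache would not simply store and reuse a response."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    reasons = []
    if "no-store" in cc:
        reasons.append("no-store: must not store any part of the response")
    if "private" in cc:
        reasons.append("private: a shared cache must not store it")
    if "no-cache" in cc:
        reasons.append("no-cache: may store, but must revalidate before every reuse")
    # max-age / s-maxage take precedence over Expires (RFC 7234 section 5.3).
    # Pragma is not consulted: its meaning in responses is unspecified (section 5.4).
    if "Expires" in headers and "max-age" not in cc and "s-maxage" not in cc:
        if expires_in_past(headers["Expires"], headers.get("Date", "")):
            reasons.append("Expires: stale on arrival")
    return reasons


if __name__ == "__main__":
    # Headers copied from the microsoft.com response quoted above.
    response = {
        "Cache-Control": "no-cache, no-store",
        "Pragma": "no-cache",
        "Expires": "-1",
        "Date": "Fri, 27 Jan 2017 09:29:56 GMT",
    }
    for reason in shared_cache_verdict(response):
        print(reason)
```

Run against those headers, it prints three reasons (no-store, no-cache, and an already-expired Expires), which matches the MISS that wget reports from the proxy.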
Yes, saving traffic is what matters most, because not everyone, and not
everywhere, has terabit links with unlimited plans. Moreover, the
number of users keeps growing while capacity is finite. In any case, the
decision on how to handle content in such a situation should rest with
the proxy administrator, not with the developers of the proxy, who
hardcode their own vision, even if it follows the RFC. A byte-hit ratio
of 10% (vanilla Squid; after very hard work it reaches up to 30%, but no
more) is ridiculous. In that situation it would be more honest to cache
nothing at all, only then let's not call Squid a caching proxy. Putting
an extra server in the traffic path that requires a lot of attention,
while it yields a gain of only 10%, is a mockery of users.
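The byte-hit ratio mentioned above is the share of delivered bytes that were served from cache rather than fetched from origin. The following is a minimal Python sketch of how one might estimate it, assuming Squid's default native access.log format (field 4 is `result_code/status`, field 5 is bytes delivered). It treats any result code containing `HIT` as a cache hit; that classification is an assumption, and, for example, `TCP_REFRESH_UNMODIFIED` revalidations are counted as misses here. The function name and the sample log lines are made up for illustration.

```python
def byte_hit_ratio(log_lines):
    """Hypothetical helper: fraction of delivered bytes served from cache (native access.log format)."""
    hit_bytes = 0
    total_bytes = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        result_code = fields[3].split("/")[0]  # e.g. "TCP_MEM_HIT" from "TCP_MEM_HIT/200"
        try:
            size = int(fields[4])  # bytes delivered to the client
        except ValueError:
            continue
        total_bytes += size
        # Assumption: any result code containing "HIT" counts as served from cache.
        if "HIT" in result_code:
            hit_bytes += size
    return hit_bytes / total_bytes if total_bytes else 0.0


# Two made-up log lines, for illustration only.
sample = [
    "1485509394.100    250 10.0.0.5 TCP_MISS/200 9000 GET http://example.com/a.js - HIER_DIRECT/203.0.113.10 application/javascript",
    "1485509395.200      3 10.0.0.6 TCP_MEM_HIT/200 1000 GET http://example.com/b.css - HIER_NONE/- text/css",
]
print(f"{byte_hit_ratio(sample):.0%}")  # 10%
```

Counting bytes rather than requests is what makes the figure sensitive to large uncacheable objects: a single big MISS can outweigh many small HITs.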
Let me explain the situation as I see it. Webmasters put caching bans
everywhere, in every way possible, because their pages are full of
advertising, and that brings in money. This is the same reason Google
prevents caching of YouTube: big money. We do not get that money; in
fact, our goal is to minimize traffic costs. We chose Squid as the tool
for that. And you, with your point of view, have deprived us of our
weapons against unscrupulous webmasters. That is how it looks.
Again: breaking the Internet should be my choice, not yours. Either
follow the RFC 100%, or don't break it partially. As they say, either
put your pants on or take off the cross.
>
>
> HTH
> Amos
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users