<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
-----BEGIN PGP SIGNED MESSAGE----- <br>
Hash: SHA256 <br>
<br>
Some more details.<br>
<br>
Here are four transactions in chronological order: two from wget -S and
two from a browser (both browser requests from the same PC), all for the same URL:<br>
<br>
*root @ khorne /tmp # wget -S
<a class="moz-txt-link-freetext" href="http://www.gazeta.ru/nm2015/js/gazeta.media.query.js">http://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a>*<br>
- --2016-10-14 17:18:05--
<a class="moz-txt-link-freetext" href="http://www.gazeta.ru/nm2015/js/gazeta.media.query.js">http://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a><br>
Connecting to 127.0.0.1:3128... connected.<br>
Proxy request sent, awaiting response...<br>
HTTP/1.1 301 Moved Permanently<br>
Server: nginx<br>
Date: Fri, 14 Oct 2016 11:18:07 GMT<br>
Content-Type: text/html<br>
Content-Length: 178<br>
Location: <a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a><br>
X-Cache: MISS from khorne<br>
X-Cache-Lookup: MISS from khorne:3128<br>
Connection: keep-alive<br>
Location: <a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a>
[following]<br>
- --2016-10-14 17:18:07--
<a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a><br>
Connecting to 127.0.0.1:3128... connected.<br>
Proxy request sent, awaiting response...<br>
HTTP/1.1 200 OK<br>
Server: nginx<br>
Date: Fri, 14 Oct 2016 10:45:57 GMT<br>
Content-Type: application/javascript; charset=windows-1251<br>
Last-Modified: Fri, 30 Oct 2015 12:33:38 GMT<br>
ETag: W/"cdf370-758-52351a306ac80"<br>
Cache-Control: max-age=3600<br>
Expires: Fri, 14 Oct 2016 11:45:57 GMT<br>
Access-Control-Allow-Origin: *<br>
Age: 1930<br>
X-Cache: HIT from khorne<br>
X-Cache-Lookup: HIT from khorne:3128<br>
Transfer-Encoding: chunked<br>
Connection: keep-alive<br>
Length: unspecified [application/javascript]<br>
Saving to: 'gazeta.media.query.js'<br>
<br>
gazeta.media.query. [ <=> ] 1.84K
--.-KB/s in 0s <br>
<br>
2016-10-14 17:18:07 (138 MB/s) - 'gazeta.media.query.js' saved
[1880]<br>
<br>
/HTTP object in cache and HIT./<br>
<br>
*root @ khorne /tmp # wget -S
<a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a>*<br>
- --2016-10-14 17:18:30--
<a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a><br>
Connecting to 127.0.0.1:3128... connected.<br>
Proxy request sent, awaiting response...<br>
HTTP/1.1 200 OK<br>
Server: nginx<br>
Date: Fri, 14 Oct 2016 10:45:57 GMT<br>
Content-Type: application/javascript; charset=windows-1251<br>
Last-Modified: Fri, 30 Oct 2015 12:33:38 GMT<br>
ETag: W/"cdf370-758-52351a306ac80"<br>
Cache-Control: max-age=3600<br>
Expires: Fri, 14 Oct 2016 11:45:57 GMT<br>
Access-Control-Allow-Origin: *<br>
Age: 1953<br>
X-Cache: HIT from khorne<br>
X-Cache-Lookup: HIT from khorne:3128<br>
Transfer-Encoding: chunked<br>
Connection: keep-alive<br>
Length: unspecified [application/javascript]<br>
Saving to: 'gazeta.media.query.js.1'<br>
<br>
gazeta.media.query. [ <=> ] 1.84K
--.-KB/s in 0s <br>
<br>
2016-10-14 17:18:30 (120 MB/s) - 'gazeta.media.query.js.1' saved
[1880]<br>
<br>
/HTTPS object in cache and HIT too./<br>
<br>
This is ok.<br>
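<br>
As a quick sanity check of "in cache and still fresh", here is a minimal Python sketch that compares max-age with the Age header from the two wget -S responses above. It is a simplification: it only looks at Cache-Control: max-age and Age, and ignores Expires, heuristic freshness and the fuller age calculation a real cache performs. The function names are made up for illustration.<br>
<br>
<pre>
```python
def parse_max_age(cache_control):
    """Return max-age in seconds from a Cache-Control value, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def remaining_freshness(cache_control, age):
    """Seconds left before the response goes stale (negative if already stale).

    Simplified assumption: freshness lifetime is max-age and the current
    age is taken directly from the Age header.
    """
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return None  # Expires / heuristic freshness is not covered here
    return max_age - age


# Values copied from the two wget -S responses above.
first = remaining_freshness("max-age=3600", 1930)
second = remaining_freshness("max-age=3600", 1953)
print(first, second)
```
</pre>
Both values are positive, which matches the HIT that Squid reports for both requests.<br>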
<br>
*Ctrl+F5 (force reload) from browser:*<br>
<br>
1476443947.419 92 192.168.100.103 TCP_MISS/200 2323 GET
<a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a> -
HIER_DIRECT/81.19.72.0 application/javascript<br>
<br>
MISS: this is OK too, because the browser sends no-cache on a forced reload.<br>
<br>
At this point we are sure the object is cached, right? Both in the proxy cache
and in the client cache (the client is the same in attempts 3 and 4). Now comes
a refresh from the browser on the same page (same session), which is
equivalent to a page auto-refresh.<br>
<br>
*F5 (refresh) from the same browser:*<br>
<br>
1476443997.252 96 192.168.100.103 TCP_MISS/304 353 GET
<a class="moz-txt-link-freetext" href="https://www.gazeta.ru/nm2015/js/gazeta.media.query.js">https://www.gazeta.ru/nm2015/js/gazeta.media.query.js</a> -
HIER_DIRECT/81.19.72.0 -<br>
<br>
Here it is. The object is in the proxy cache and in the client cache, and
revalidation succeeds: the object has not changed. The result must be
TCP_REFRESH_UNMODIFIED, and that is the tag we get for an HTTP object requested via the browser.<br>
<br>
/But HTTPS is logged as TCP_MISS/304! We expected
TCP_REFRESH_UNMODIFIED/304, because this is a refresh operation, we are
sure the object is in both caches (proxy and client), and revalidation succeeds,
yet it is marked as a MISS./<br>
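<br>
To see how often this happens, one could tally access.log entries by URL scheme, result code and status. Below is a minimal Python sketch, assuming the default native Squid access.log layout (whitespace-separated fields, as in the two browser entries above). The sample lines are those two entries; the function names are hypothetical.<br>
<br>
<pre>
```python
from collections import Counter
from urllib.parse import urlsplit

# The two browser entries quoted above (assumed native access.log layout).
SAMPLE = [
    "1476443947.419 92 192.168.100.103 TCP_MISS/200 2323 GET "
    "https://www.gazeta.ru/nm2015/js/gazeta.media.query.js - "
    "HIER_DIRECT/81.19.72.0 application/javascript",
    "1476443997.252 96 192.168.100.103 TCP_MISS/304 353 GET "
    "https://www.gazeta.ru/nm2015/js/gazeta.media.query.js - "
    "HIER_DIRECT/81.19.72.0 -",
]


def parse_line(line):
    """Split one log line into the fields needed for the tally."""
    fields = line.split()
    result, _, status = fields[3].partition("/")
    url = fields[6]
    return {
        "result": result,
        "status": int(status),
        "url": url,
        "scheme": urlsplit(url).scheme,
    }


def tally(lines):
    """Count entries per (scheme, result code, HTTP status)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        counts[(entry["scheme"], entry["result"], entry["status"])] += 1
    return counts


def miss_304(lines):
    """Entries logged as TCP_MISS although the status is 304."""
    entries = [parse_line(line) for line in lines]
    return [e for e in entries if e["result"] == "TCP_MISS" and e["status"] == 304]


counts = tally(SAMPLE)
suspects = miss_304(SAMPLE)
print(counts)
print([e["url"] for e in suspects])
```
</pre>
Running the same tally over a full access.log would show whether TCP_MISS/304 appears only for https URLs.<br>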
<br>
Why is an HTTP refresh logged as TCP_REFRESH_UNMODIFIED, while the same
object via HTTPS is logged as TCP_MISS? As shown above, the object has no
headers that prevent caching.<br>
<br>
Is it a bug or a feature? Because when a site moves to HTTPS, it
will show a lower hit ratio with the same content. That seems wrong.<br>
<br>
Note: this is a news site. There are no private headers or any other
cache-preventing headers.<br>
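<br>
The "no cache-preventing headers" claim can be checked mechanically. Here is a small Python sketch, assuming the response headers are available as a plain dict with the exact header names shown in the wget -S output above. The set of directives and the function name are my own choice for illustration, not anything taken from Squid.<br>
<br>
<pre>
```python
# Response directives that restrict a shared cache: no-store and private
# forbid storing; no-cache allows storing but forces revalidation.
RESTRICTING = {"no-store", "private", "no-cache"}


def cache_blockers(headers):
    """Return the restricting Cache-Control directives found, sorted."""
    value = headers.get("Cache-Control", "")
    found = []
    for directive in value.split(","):
        name = directive.strip().partition("=")[0].lower()
        if name in RESTRICTING:
            found.append(name)
    return sorted(found)


# Headers copied from the 200 OK response shown by wget -S above.
response_headers = {
    "Content-Type": "application/javascript; charset=windows-1251",
    "Last-Modified": "Fri, 30 Oct 2015 12:33:38 GMT",
    "ETag": 'W/"cdf370-758-52351a306ac80"',
    "Cache-Control": "max-age=3600",
    "Expires": "Fri, 14 Oct 2016 11:45:57 GMT",
}

print(cache_blockers(response_headers))
```
</pre>
An empty list here is consistent with the object being cacheable by the proxy.<br>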
<br>
14.10.2016 15:57, Yuri Voinov wrote:<br>
<span style="white-space: pre;">><br>
> It seems I found the root of the problem.<br>
><br>
> When a cached object is refreshed via HTTP, it is logged as TCP_REFRESH_UNMODIFIED in access.log, which is correct.<br>
> When the _same_ _cached_ _object_ is refreshed via HTTPS, it is logged as TCP_MISS/304 in access.log, and this is wrong.<br>
><br>
> It seems like a bug.<br>
><br>
> 14.10.2016 3:44, Yuri Voinov wrote:<br>
><br>
> > 14.10.2016 2:48, Alex Rousskov wrote:<br>
><br>
> > > On 10/13/2016 01:44 PM, Yuri Voinov wrote:<br>
><br>
> > >> However, this is nothing more than word games, Alex.<br>
><br>
> > > ... unless the definition of a hit affects your billing or your<br>
> > > interpretation of Squid documentation or the developer interpretation of<br>
> > > the code. Definitions matter! You yourself have seen their importance<br>
> > > when you showed your excellent byte hit ratio results but folks were<br>
> > > looking at the ordinary document hit ratio numbers instead.<br>
><br>
> > Sure. But the difference between TCP_HIT itself and a byte hit is obvious.<br>
><br>
> > >> The question is -<br>
> > >> can we more or less significant differences from known what hit proxy<br>
> > >> code level and / or transactions which, obviously, on the proxy level,<br>
> > >> we can see in its entirety.<br>
><br>
> > > Sorry, I do not understand the question.<br>
><br>
> > I mean that at the proxy level, seeing the transaction as a<br>
> > whole, we are able to distinguish a hit (or something like it) from all other<br>
> > transactions. We see the whole session in its entirety. We see repeated<br>
> > queries from the same client to the same resource. Accordingly, we can<br>
> > judge quite clearly, from the headers sent by the client or the<br>
> > server, what is happening. Correct?<br>
><br>
> > Specifically, in this particular case, the proxy IMS settings are enabled:<br>
><br>
> > refresh_all_ims on<br>
> > reload_into_ims on<br>
><br>
> > At the web-page level we have a periodic reload/refresh directive, which<br>
> > forces a freshness check of the content (after it is initially stored in<br>
> > the shared cache).<br>
><br>
> > In this situation (and I've checked that this web page's elements are stored in<br>
> > the cache) TCP_MISS/304 means TCP_REFRESH_UNMODIFIED.<br>
><br>
> > So, this is exactly a HIT.<br>
><br>
> > Not literally, perhaps, but in fact. Correct?<br>
><br>
> > >>> Unfortunately, there are too many definitions of a "hit".<br>
><br>
> > >> There are not many definitions of a hit. We are talking about the caching<br>
> > >> proxy, which is basically no different from all the other caches, and<br>
> > >> subject to the same rules.<br>
><br>
> > > You are oversimplifying a complex subject matter. If Squid delivers a<br>
> > > single response comprising 1000 bytes from the cache and 10 bytes from<br>
> > > the origin server, is that a hit or a miss? If Squid delivers the entire<br>
> > > response from the cache but spends 10 minutes talking to the origin<br>
> > > server about that object first, is that a hit or a miss? Different<br>
> > > people will give you different answers to those questions.<br>
><br>
> > 10 minutes is a bit above the TCP timeout, so it will be aborted, I think. So<br>
> > Squid writes TCP_MISS_ABORTED in access.log. :)<br>
><br>
> > > We have [poorly defined] byte hits, document hits, revalidation hits,<br>
> > > stale hits, partial hits, etc., etc.<br>
><br>
> > True enough. The documentation is the problem.<br>
><br>
> > >> If the first access does not find an object in the cache, it requests<br>
> > >> from the network,<br>
><br>
> > > yes<br>
><br>
> > >> saves in the cache,<br>
><br>
> > > or does not<br>
><br>
> > Yes. Maybe, or maybe not. But in this case:<br>
> > 1) We know the transaction history and we know the object(s) are in the cache.<br>
> > 2) The proxy can easily check it, right? Just swap the object in from disk to<br>
> > memory. If this succeeds, the object is in the cache, so we can qualify it as a HIT.<br>
> > Otherwise, it is definitely a MISS.<br>
><br>
> > >> and re-treatment or gets a hit,<br>
><br>
> > > or does not<br>
><br>
> > >> "the object is not changed." Dot.<br>
><br>
> > > or the Squid-cached object did not change but the client-cached object<br>
> > > did. Or vice versa.<br>
><br>
> > The client-cached object comes from Squid. They (ideally) must not<br>
> > differ. The client cache and Squid's cache operate as a chain, one being the<br>
> > source for the other.<br>
><br>
> > >> If the time in the cache<br>
> > >> object lifetime expires, or a lifetime on the server timed out - the<br>
> > >> object is requested again and a miss is recorded.<br>
><br>
> > > * Yes, if you define a miss as "contact with the origin server".<br>
><br>
> > I want to add: "contact with the origin server to get content", not for<br>
> > revalidation purposes. If revalidation returns "Object not changed",<br>
> > this is a positive result and must be qualified as a HIT, IMO.<br>
><br>
> > > * No, if contact with the origin server is OK for a hit as long as the<br>
> > > server does not send the response _body_ back to Squid.<br>
><br>
> > .... when revalidation succeeds, i.e. the object in the shared cache is not stale,<br>
> > this is a HIT. We're not interested in the client browser's cache state. Only<br>
> > the shared cache matters.<br>
><br>
> > >> if<br>
> > >> the proxy responds to the client "has not changed", it means, in fact,<br>
> > >> that the client has a copy of the object<br>
><br>
> > > Yes.<br>
><br>
> > >> and a copy of the proxy object,<br>
><br>
> > > The copy in the proxy cache may be different from the copy in the client<br>
> > > cache or may not exist at all.<br>
><br>
> > Yes. If the object does not exist in the proxy, this is a proxy MISS. If the client cache<br>
> > does not contain the object, the client goes to the proxy and asks it for the object. If it is<br>
> > found, excellent: for the client this is a MISS, for the proxy a HIT. If the proxy also does not<br>
> > contain the object, it is a MISS-MISS and the object is loaded from the origin.<br>
><br>
> > >> the proxy and responds to the client, performing REFRESH that the object<br>
> > >> did not change. What is this, if not hit?<br>
><br>
> > > Assuming the proxy asked the origin server whether the object in the<br>
> > > client (or the proxy, depending on the circumstances) cache is fresh,<br>
> > > for many, it is<br>
><br>
> > > * a [document] miss (because there was a potentially very slow contact<br>
> > > with the origin server) or<br>
> > > * a [byte] hit (because the response body came from the Squid cache and<br>
> > > not from the origin server).<br>
><br>
> > > Resisting the existence of different valid hit definitions is futile<br>
> > > IMO. State what _your_ definition is (be as precise as possible; this<br>
> > > may require several iterations) and then you may ask whether a<br>
> > > particular transaction is a hit.<br>
><br>
> > > Alex.<br>
><br>
> > I agree that there are a number of boundary cases. However, in most<br>
> > cases we are dealing with a relatively simple chain, which is what should be<br>
> > considered, in my opinion. How should revalidation and its results be<br>
> > regarded? If revalidation confirms that the object is<br>
> > not stale and not expired, it's a hit, isn't it?<br>
><br>
> > If revalidation fails (the object is stale/expired), everything is clear and<br>
> > there is nothing to discuss. Definitely a miss.<br>
><br>
> > Well, let's say we do not know and cannot know about the object in the<br>
> > client cache. Assume also that we do not want to check whether this<br>
> > object is in the proxy cache. Let us assume that we do not want to spend<br>
> > resources to figure out what happened to the object afterwards, in the<br>
> > client's browser or in the proxy's disk cache. OK. In that case, wouldn't it<br>
> > be more correct to log TCP_NONE/304?<br>
><br>
> > In this case, we are saying plainly: "We do not know whether it is a hit or not.<br>
> > We only know that the object has not changed since the last<br>
> > request/revalidation. We do not want to know, and you can interpret it<br>
> > any way you like".<br>
><br>
> > It would be more correct, it seems to me, than just to say:<br>
> > "TCP_MISS/304: this is a cache miss, whether it really was or not."<br>
><br>
> > WBR, Yuri<br>
><br>
></span><br>
<br>
-----BEGIN PGP SIGNATURE-----
<br>
Version: GnuPG v2
<br>
<br>
iQEcBAEBCAAGBQJYAMKnAAoJENNXIZxhPexGLz4H/RjDMSnsVTHDHpBqksSB28jd
<br>
SbeNAv34cRd9ECGFb0kM1I7tYe4CdwBEbXWLMdDhYc4vW9AGq70Tc55d+CMl65BV
<br>
VW/vkVcDge6g4yJ1YHTZE+sb3djlTIkjurDUTo+VZ6LUXcly58IFR2DoFNTNtU6D
<br>
K0n5zfSKkYuw9TKr3tTp9hVyldDRHI3iSvBBsE70AM1iwdTTcLKg8P6i6q51MZEx
<br>
SliuJ6gWT3o05guceGspSusBL5fRdU0twUpJhcXohI4oM0JafNmhV29CYhi4t3KU
<br>
OaCDraL8ntIHzIcLMHB74Mz0vGrHXojeot+bZ8crtpsumMt9BuUDg5HZTEj3KEU=
<br>
=4K9l
<br>
-----END PGP SIGNATURE-----
<br>
<br>
</body>
</html>