[squid-users] TCP_MISS/304 question

Yuri Voinov yvoinov at gmail.com
Fri Oct 14 11:34:00 UTC 2016


A few more details.

Here are 4 transactions in chronological order: two from wget -S and two
from a browser on a single PC, all for the same URL:

*root @ khorne /tmp # wget -S http://www.gazeta.ru/nm2015/js/gazeta.media.query.js*
--2016-10-14 17:18:05--  http://www.gazeta.ru/nm2015/js/gazeta.media.query.js
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 301 Moved Permanently
  Server: nginx
  Date: Fri, 14 Oct 2016 11:18:07 GMT
  Content-Type: text/html
  Content-Length: 178
  Location: https://www.gazeta.ru/nm2015/js/gazeta.media.query.js
  X-Cache: MISS from khorne
  X-Cache-Lookup: MISS from khorne:3128
  Connection: keep-alive
Location: https://www.gazeta.ru/nm2015/js/gazeta.media.query.js [following]
--2016-10-14 17:18:07--  https://www.gazeta.ru/nm2015/js/gazeta.media.query.js
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: nginx
  Date: Fri, 14 Oct 2016 10:45:57 GMT
  Content-Type: application/javascript; charset=windows-1251
  Last-Modified: Fri, 30 Oct 2015 12:33:38 GMT
  ETag: W/"cdf370-758-52351a306ac80"
  Cache-Control: max-age=3600
  Expires: Fri, 14 Oct 2016 11:45:57 GMT
  Access-Control-Allow-Origin: *
  Age: 1930
  X-Cache: HIT from khorne
  X-Cache-Lookup: HIT from khorne:3128
  Transfer-Encoding: chunked
  Connection: keep-alive
Length: unspecified [application/javascript]
Saving to: 'gazeta.media.query.js'

gazeta.media.query.     [ <=>               ]   1.84K  --.-KB/s    in 0s

2016-10-14 17:18:07 (138 MB/s) - 'gazeta.media.query.js' saved [1880]

/Requested via HTTP: after the 301 redirect, the object is in the cache and is a HIT./

*root @ khorne /tmp # wget -S https://www.gazeta.ru/nm2015/js/gazeta.media.query.js*
--2016-10-14 17:18:30--  https://www.gazeta.ru/nm2015/js/gazeta.media.query.js
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: nginx
  Date: Fri, 14 Oct 2016 10:45:57 GMT
  Content-Type: application/javascript; charset=windows-1251
  Last-Modified: Fri, 30 Oct 2015 12:33:38 GMT
  ETag: W/"cdf370-758-52351a306ac80"
  Cache-Control: max-age=3600
  Expires: Fri, 14 Oct 2016 11:45:57 GMT
  Access-Control-Allow-Origin: *
  Age: 1953
  X-Cache: HIT from khorne
  X-Cache-Lookup: HIT from khorne:3128
  Transfer-Encoding: chunked
  Connection: keep-alive
Length: unspecified [application/javascript]
Saving to: 'gazeta.media.query.js.1'

gazeta.media.query.     [ <=>               ]   1.84K  --.-KB/s    in 0s

2016-10-14 17:18:30 (120 MB/s) - 'gazeta.media.query.js.1' saved [1880]

/Requested via HTTPS: the object is in the cache and is a HIT too./

This is OK.
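
The headers in both wget runs are consistent with a fresh cached copy. Here is a minimal sketch of the arithmetic, assuming the simplified rule that a response is fresh while its Age is below max-age. Squid's real freshness decision also depends on refresh_pattern and related settings, which are not modeled here.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the wget -S output of the first HIT.
date_hdr = "Fri, 14 Oct 2016 10:45:57 GMT"
expires_hdr = "Fri, 14 Oct 2016 11:45:57 GMT"
max_age = 3600  # Cache-Control: max-age=3600
age = 1930      # Age: 1930

origin_date = parsedate_to_datetime(date_hdr)
expires = parsedate_to_datetime(expires_hdr)

# Expires minus Date should agree with max-age.
lifetime_from_expires = int((expires - origin_date).total_seconds())

# Simplified freshness rule (an assumption): fresh while Age < max-age.
remaining = max_age - age
is_fresh = remaining > 0

# Date plus Age is the moment the proxy served the cached copy.
served_at = origin_date + timedelta(seconds=age)

print(lifetime_from_expires, remaining, is_fresh, served_at.isoformat())
```

Date plus Age lands on 11:18:07 GMT, the same second as the Date header of the 301 response in the first wget run, so the Age value is consistent with the rest of the output.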

*Ctrl+F5 (force reload) from browser:*

1476443947.419     92 192.168.100.103 TCP_MISS/200 2323 GET https://www.gazeta.ru/nm2015/js/gazeta.media.query.js - HIER_DIRECT/81.19.72.0 application/javascript

MISS: this is OK too, because the client browser sends no-cache.

At this point we are sure the object is in cache, right? Both in the proxy
cache and in the client cache (the client is the same in attempts 3 and 4).
Now refresh from the browser on the same page (same session), which is
equivalent to a page auto-refresh.
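
For reference, a plain refresh normally becomes a conditional request carrying the validators the client saved from the earlier 200 response. The sketch below is only an illustration. The headers the browser actually sent were not captured, and `revalidate` is a hypothetical stand-in for the origin's decision, using weak ETag comparison as described in RFC 7232.

```python
from email.utils import parsedate_to_datetime

# Validators copied from the 200 response shown in the wget output.
STORED_ETAG = 'W/"cdf370-758-52351a306ac80"'
STORED_LAST_MODIFIED = "Fri, 30 Oct 2015 12:33:38 GMT"


def opaque_tag(etag: str) -> str:
    """Drop the weak prefix so that W/"x" and "x" compare equal (weak comparison)."""
    return etag[2:] if etag.startswith("W/") else etag


def revalidate(request_headers: dict, current_etag: str, current_last_modified: str) -> int:
    """Hypothetical origin-side check: return 304 if the validators still match, else 200."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # Naive comma split; good enough for this sketch.
        candidates = [tag.strip() for tag in inm.split(",")]
        if "*" in candidates:
            return 304
        if any(opaque_tag(tag) == opaque_tag(current_etag) for tag in candidates):
            return 304
        return 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        if parsedate_to_datetime(current_last_modified) <= parsedate_to_datetime(ims):
            return 304
    return 200


# Assumed request headers for the two browser actions (not captured in the log).
force_reload = {"Cache-Control": "no-cache", "Pragma": "no-cache"}
plain_refresh = {"If-None-Match": STORED_ETAG, "If-Modified-Since": STORED_LAST_MODIFIED}

print(revalidate(force_reload, STORED_ETAG, STORED_LAST_MODIFIED))
print(revalidate(plain_refresh, STORED_ETAG, STORED_LAST_MODIFIED))
```

Under these assumptions the force reload carries no validators and gets a full response, while the plain refresh gets 304 as long as the object is unchanged.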

*F5 (refresh) from the same browser:*

1476443997.252     96 192.168.100.103 TCP_MISS/304 353 GET https://www.gazeta.ru/nm2015/js/gazeta.media.query.js - HIER_DIRECT/81.19.72.0 -

Here it is. The object is in the proxy cache and in the client cache, and
revalidation is OK: the object has not changed. It must be
TCP_REFRESH_UNMODIFIED, and that is the tag we get for an HTTP object via
the browser.

/But shit! HTTPS gives TCP_MISS/304! We expected to get
TCP_REFRESH_UNMODIFIED/304, because this is a refresh operation, we are
sure the object is in both caches (proxy and client), and revalidation is
OK, yet this is marked as a MISS./

Why does an HTTP refresh give TCP_REFRESH_UNMODIFIED, while the same object
via HTTPS gives TCP_MISS? As shown above, the object has no headers
preventing caching.

Is it a bug or a feature? Because when a site moves to HTTPS, it will get a
lower hit ratio with the same content. That seems wrong.
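
To make the concern concrete, here is a small sketch of how a request hit ratio shifts depending on whether TCP_MISS/304 is counted as a hit. The sample entries are made up for illustration, and the choice of which tags count as hits is an assumption, not Squid's own accounting.

```python
# Hypothetical (result code, HTTP status) pairs; not real log data.
sample = [
    ("TCP_HIT", 200),
    ("TCP_MISS", 200),
    ("TCP_MISS", 304),
    ("TCP_REFRESH_UNMODIFIED", 304),
    ("TCP_MISS", 304),
]


def hit_ratio(entries, count_miss_304_as_hit: bool) -> float:
    """Share of requests counted as hits under the chosen definition."""
    hits = 0
    for code, status in entries:
        if "HIT" in code or code == "TCP_REFRESH_UNMODIFIED":
            hits += 1
        elif count_miss_304_as_hit and code == "TCP_MISS" and status == 304:
            hits += 1
    return hits / len(entries)


strict = hit_ratio(sample, count_miss_304_as_hit=False)
lenient = hit_ratio(sample, count_miss_304_as_hit=True)
print(strict, lenient)
```

The same traffic reports a different ratio under each definition, which is why the tag chosen for a successful HTTPS revalidation matters for statistics.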

Note: this is a news site. There are no private headers or any other
cache-preventing headers.

14.10.2016 15:57, Yuri Voinov wrote:
>
> It seems I found the root of problem.
>
> When cached object refreshed and goes via HTTP, it gives
> TCP_REFRESH_UNMODIFIED in access.log, which is correct.
> When the _same_ _cached_ _object_ refreshes and goes via HTTPS, it
> gives TCP_MISS/304 in access.log, and this is wrong.
>
> It seems like bug.
>
> 14.10.2016 3:44, Yuri Voinov wrote:
> >
> > 14.10.2016 2:48, Alex Rousskov wrote:
> > > On 10/13/2016 01:44 PM, Yuri Voinov wrote:
> > >
> > >> However, this is nothing more than word games, Alex.
> > >
> > > ... unless the definition of a hit affects your billing or your
> > > interpretation of Squid documentation or the developer
> > > interpretation of the code. Definitions matter! You yourself have
> > > seen their importance when you showed your excellent byte hit ratio
> > > results but folks were looking at the ordinary document hit ratio
> > > numbers instead.
> >
> > Sure. But difference with TCP_HIT itself and byte hit is obvious.
> >
> > >> The question is -
> > >> can we more or less significant differences from known what hit
> > >> proxy code level and / or transactions which, obviously, on the
> > >> proxy level, we can see in its entirety.
> > >
> > > Sorry, I do not understand the question.
> >
> > I want to say that on the proxy level, seeing the transaction as a
> > whole, we are able to differentiate hit or his likeness from all other
> > transactions. We see the whole session in its entirety. We see repeated
> > queries of the same client to the same resource. Accordingly, we can
> > quite clearly be judged by the behavior of the header from the client or
> > server that is happening. Correctly?
> >
> > Specifically, in this particular case. Proxy IMS settings is enabled:
> >
> > refresh_all_ims on
> > reload_into_ims on
> >
> > On web-page level we have: periodically reload/refresh directive, which
> > is forces to check (after initially store in shared cache) freshness of
> > content.
> >
> > In this situation (and I've checked this web-page elements stored in
> > cache) TCP_MISS/304 means TCP_REFRESH_UNMODIFIED.
> >
> > So, this is HIT exactly.
> >
> > I'm not saying - literally. And in fact. Correctly?
> >
> > >>> Unfortunately, there are too many definitions of a "hit".
> > >
> > >> There is no many definitions of hit. We are talking about the
> > >> caching proxy, which is basically no different from all the other
> > >> caches, and subject to the same rules.
> > >
> > > You are oversimplifying a complex subject matter. If Squid delivers a
> > > single response comprising 1000 bytes from the cache and 10 bytes from
> > > the origin server, is that a hit or a miss? If Squid delivers the
> > > entire response from the cache but spends 10 minutes talking to the
> > > origin server about that object first, is that a hit or a miss?
> > > Different people will give you different answers to those questions.
> >
> > 10 minutes a bit above TCP timeout and will be aborted, I think. So,
> > Squid's write TCP_MISS_ABORTED in access.log. :)
> >
> > > We have [poorly defined] byte hits, document hits, revalidation hits,
> > > stale hits, partial hits, etc., etc.
> >
> > What yes - yes. The documentation is the problem.
> >
> > >> If the first access does not find an object in the cache, it
> > >> requests from the network,
> > >
> > > yes
> > >
> > >> saves in the cache,
> > >
> > > or does not
> >
> > Yes. May be or may be not. But in this case we are:
> > 1) Know about transaction history and we know the object(s) in cache.
> > 2) Proxy can easy check it, right? Just swap in object from disk in
> > memory. If this success, object in cache, so we can qualify it as HIT.
> > Otherwise, exactly MISS.
> >
> > >> and re-treatment or gets a hit,
> > >
> > > or does not
> > >
> > >> "the object is not changed." Dot.
> > >
> > > or the Squid-cached object did not change but the client-cached
> > > object did. Or vice versa.
> >
> > Client-cached object gives from Squid. They (ideally) must not be the
> > different. Client cache and squid's cache operates like chain, one is
> > source for another.
> >
> > >> If the time in the cache
> > >> object lifetime expires, or a lifetime on the server timed out - the
> > >> object is requested again and a miss is recorded.
> > >
> > > * Yes, if you define a miss as "contact with the origin server".
> >
> > I want to add: "contact with the origin server for get content". Not for
> > revalidation purposes. If revalidation returns "Object not changed" -
> > this is positive and must be qualified as HIT IMO.
> >
> > > * No, if contact with the origin server is OK for a hit as long as
> > > the server does not send the response _body_ back to Squid.
> >
> > .... when revalidation true - i.e. object in shared cache not stale,
> > this is HIT. We're not interested in client browser's cache state. Only
> > shared cache matters.
> >
> > >> if
> > >> the proxy responds to the client "has not changed", it means, in
> > >> fact, that the client has a copy of the object
> > >
> > > Yes.
> > >
> > >> and a copy of the proxy object,
> > >
> > > The copy in the proxy cache may be different from the copy in the
> > > client cache or may not exist at all.
> >
> > Yes. If object not exists in proxy - this is proxy MISS. If client cache
> > not contains object - client go to proxy and asks it about object. Found
> > - excellent, for client this is MISS, for proxy - HIT. If proxy also not
> > contains object - it will be MISS-MISS and loading object from origin.
> >
> > >> the proxy and responds to the client, performing REFRESH that the
> > >> object did not change. What is this, if not hit?
> > >
> > > Assuming the proxy asked the origin server whether the object in the
> > > client (or the proxy, depending on the circumstances) cache is fresh,
> > > for many, it is
> > >
> > > * a [document] miss (because there was a potentially very slow
> > > contact with the origin server) or
> > >
> > > * a [byte] hit (because the response body came from the Squid cache
> > > and not from the origin server).
> > >
> > > Resisting the existence of different valid hit definitions is futile
> > > IMO. State what _your_ definition is (be as precise as possible; this
> > > may require several iterations) and then you may ask whether a
> > > particular transaction is a hit.
> > >
> > > Alex.
> >
> > I agree that there are a number of boundary cases. However, in most
> > cases we are dealing with a relatively simple chain, which should be
> > considered and, in my opinion. How is it to be regarded revalidation
> > facilities and its results? If revalidation confirms that the object is
> > not stale and not expired - it's a hit, is not it?
> >
> > If revalidation fails - object stale/expired - everything is clear and
> > there is nothing to discuss. Definitely miss.
> >
> > Well, let's say we do not know and can not know about the object in the
> > client cache. Assume also that we do not want to check - whether this
> > object is in the cache proxy. Let us assume that we do not want to spend
> > resources to figure out what happened to the object in the future, in
> > client's browser, or on proxy's disk cache. Ok. Is, in this case, would
> > not be more correct to write in log TCP_NONE/304?
> >
> > In this case, we're talking directly - "We do not know, hit it or not.
> > We only know that the object has not changed since the last
> > request/revalidation. We do not want to know, and you can interpret it
> > any way you like".
> >
> > It would be more correct, it seems to me, than just to say -
> > "TCP_MISS/304 - This is a cache miss, whatever it was not really."
> >
> > WBR, Yuri

