[squid-users] unexplained MISSes

Josep Borrell jborrell at central.aplitec.com
Sat Oct 11 14:36:18 UTC 2014


Hi Amos,

I know that the first request is always a miss.
I'm reproducing the same vídeo from the same PC, and same browser erasing the cache between tests.

Are there any doc that explain the meaning of all responses like ORIGINAL_DST ?
And are there any way to know the reason of a MISS or a HIT ?
I tried with the debug options, but the reason of a MISS isn't there.

Thanks

Josep


-----Mensaje original-----
De: squid-users [mailto:squid-users-bounces at lists.squid-cache.org] En nombre de Amos Jeffries
Enviado el: viernes, 10 de octubre de 2014 19:05
Para: squid-users at lists.squid-cache.org
Asunto: Re: [squid-users] unexplained MISSes

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/10/2014 5:00 a.m., Josep Borrell wrote:
> Hi,
> 
> I'm trying build a squid server that can cache youtube request for a 
> school. I'm using squid 3.4.7 compiled from source on Ubuntu server 
> 14.04 I have a lot of request that are cached, but not served from 
> cache and generate a TCP_MISS/200 I'm trying to figure why this 
> requests are not served from cache. Please find attached squid.conf 
> and cache.log sample with debug enabled.
> 
> I hope someone can help me.
> 
> Thanks
> 
> Josep
> 
> *************************************************** access.log
> *******************************************************************
>
>  1412953551.775    167 192.168.1.112 TCP_MISS/200 4344 GET
> http://i1.ytimg.com/vi/9_iANxI-Mrc/default.jpg -
> ORIGINAL_DST/74.125.230.3 image/jpeg
> 

I see ORIGINAL_DST, meaning you are intercepting users traffic. This interception action naturally imposes some big problems which Squid has to deal with.

IF the domain name does not specifically resolve for Squid to the same IP the user/client was contacting the object is NOT safely cacheable.
Squid will therefore not cache it, so as to prevent one smart-alec student causing http://i1.ytimg.com/vi/9_iANxI-Mrc/default.jpg to be fetched from a malware host - thus infecting anyone else loading the "image" they get back.

That required decision by Squid may be part of your problem, or not.
Check the log file for mentions of "Host header forgery" by this clients connection or the previous one(s) fetching that same URL.

NP: if there is *none* previously fetching the URL, there is your answer. For it to have become cached someone has to fetch it - the first fetch is *always* a MISS.


> ****************************** cache.log
> **********************************************************************
> ********************
>
> 
- ----------
> 2014/10/10 17:05:51.772| ctx: enter level  0:
> 'http://ytimg.com.squid.internal/vi/9_iANxI-Mrc/default.jpg' 
> 2014/10/10 17:05:51.773| http.cc(705) processReplyHeader:
> processReplyHeader: key '9F1BB8D27BED16A8B74F8995105B2941' 
> 2014/10/10 17:05:51.773| http.cc(749) processReplyHeader: HTTP Server 
> local=192.168.111.10:59210 remote=74.125.230.3:80 FD 65
> flags=1 2014/10/10 17:05:51.773| http.cc(750) processReplyHeader:
> HTTP Server REPLY: --------- HTTP/1.1 200 OK Content-Type:
> image/jpeg Last-Modified: Thu, 01 Jan 1970 00:23:21 GMT Date: Fri,
> 10 Oct 2014 10:45:51 GMT Expires: Fri, 10 Oct 2014 16:45:51 GMT
> X-Content-Type-Options: nosniff Server: sffe Content-Length: 3861
> X-XSS-Protection: 1; mode=block Age: 15619 Cache-Control: public,
> max-age=21600 Alternate-Protocol: 80:quic,p=0.01
> 
> ���� ---------- 2014/10/10 17:05:51.773| ctx: exit level
> 0 2014/10/10 17:05:51.773| ctx: enter level  0:
> 'http://ytimg.com.squid.internal/vi/9_iANxI-Mrc/default.jpg' 
> 2014/10/10 17:05:51.773| http.cc(919) haveParsedReplyHeaders: HTTP
> CODE: 200 2014/10/10 17:05:51.773| refresh.cc(247) refreshCheck:
> refreshCheck:
> 'http://ytimg.com.squid.internal/vi/9_iANxI-Mrc/default.jpg' 
> 2014/10/10 17:05:51.773| refresh.cc(262) refreshCheck:
> refreshCheck: Matched '^http:\/\/ytimg\.com\.squid\.internal.*
> 604800 80%% 4794000' 2014/10/10 17:05:51.773| refresh.cc(264)
> refreshCheck:  age:    15679 2014/10/10 17:05:51.773|
> refresh.cc(266) refreshCheck:  check_time:     Fri, 10 Oct 2014
> 15:06:51 GMT 2014/10/10 17:05:51.773| refresh.cc(268) refreshCheck:
> entry->timestamp:       Fri, 10 Oct 2014 10:45:32 GMT 2014/10/10
> 17:05:51.773| refresh.cc(171) refreshStaleness: FRESH: expires
> 1412959532 >= check_time 1412953611 2014/10/10 17:05:51.773|
> refresh.cc(288) refreshCheck: Staleness = -1 2014/10/10 17:05:51.773| 
> refresh.cc(373) refreshCheck: refreshCheck: object isn't stale.. 
> 2014/10/10 17:05:51.773| refresh.cc(375)
> refreshCheck: refreshCheck: returning FRESH_EXPIRES 2014/10/10 
> 17:05:51.773| http.cc(482) cacheableReply: YES because HTTP status
> 200


That image is cacheable according to its headers. Provided the next client fetches it again within the next 99mins, and the interception limits do not prevent storage.

Check for mentions of "Host header forgery" by this same user/client or the previous one fetching this URL.


> 
> **********************************************************************
> squid.conf *******************************************************
> 
> # debug helper #debug_options 84,3 #debug Refresh Calculation 
> #debug_options 22,3 # HTTP #debug_options 11,3
> 
> debug_options 11,3 22,3 84,3
> 
> 
> acl customernet src 172.16.212.154/32 acl customernet src
> 192.168.1.0/24
> 
> 
> acl SSL_ports port 443 acl SSL_ports port 873  # rsync
> 
> acl Safe_ports port 80         # http acl Safe_ports port 21
> # ftp acl Safe_ports port 443        # https acl Safe_ports port 70
> # gopher acl Safe_ports port 210        # wais acl Safe_ports port
> 1025-65535 # unregistered ports acl Safe_ports port 280        #
> http-mgmt acl Safe_ports port 488        # gss-http acl Safe_ports
> port 591        # filemaker acl Safe_ports port 777        #
> multiling http acl CONNECT method CONNECT
> 
> http_access deny !Safe_ports http_access deny CONNECT !SSL_ports 
> http_access allow localhost manager http_access deny manager 
> http_access allow localhost http_access allow customernet http_access 
> deny all # http_access allow all
> 
> #HTTPS (SSL) trafic interception options always_direct allow SSL_ports 
> ssl_bump server-first SSL_ports #sslproxy_cert_error deny all 
> #sslproxy_flags DONT_VERIFY_PEER sslcrtd_program 
> /usr/lib/squid3/ssl_crtd -s /var/spool/squid3_ssldb -M 4MB 
> sslcrtd_children 8 startup=1 idle=1
> 
> acl rewritedoms dstdomain .c.youtube.com .googlevideo.com .ytimg.com 
> store_id_program /usr/lib/squid3/storeid_file_rewrite
> /etc/squid3/storeid_rewrite store_id_children 40 startup=10 idle=5
> concurrency=0 store_id_access allow rewritedoms store_id_access deny 
> all
> 
> 
> http_port 3128 http_port 8080 intercept https_port 8081 intercept 
> ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB 
> cert=/etc/squid3/ssl_cert/squidcert.pem
> 
> 
> forward_max_tries 25 cache_mem 2 GB maximum_object_size_in_memory
> 25 MB maximum_object_size 1 GB
> 
> visible_hostname squid-v2 dns_v4_first on
> 
> 
> coredump_dir /var/spool/squid3 cache_replacement_policy heap LFUDA 
> cache_dir aufs /var/spool/squid3 14000 16 256

14GB of disk trying to cache YT videos may not be enough space.

If the objects being fetched above were legitimately discarded to make room for newer content then MISS is entirely to be expected.

> 
> refresh_pattern
> ^http:\/\/video-srv\.(googlevideo|youtube)\.com\.squid\.internal.*
> 10080 80%  79900 override-lastmod override-expire ignore-reload 
> ignore-must-revalidate ignore-private refresh_pattern
> ^http:\/\/ytimg\.com\.squid\.internal.*  10080 80%  79900 
> override-lastmod override-expire ignore-reload ignore-must-revalidate 
> ignore-private


If you look at those server reply headers you can see the Cache-Control explicitly says the reply is "public". This means that the "ignore-private" setting is useless on these images.
 * ignoring privacy controls is a highly nasty thing to do to start with. I suggest strongly that you remove it from the ytimg pattern options as unnecessary.

Also, here is an artifact of the override options which you may not be
aware:

 The Expires header coming from YT indicates that the reply is cacheable for 6hrs from the time it was generated. That is 21600 seconds, no less.
 Your override-expires is actively *reducing* the storage time to
10800 seconds (the 'min' field of the pattern options).

 Therefore to maximise caching you should *drop* the override-expires option on the ytimg pattern.


NP: override-lastmod is usually also a bad thing to do, but I see you are actually hitting THE use-case for which it legitimately exist...
server emitting incorrect Last-Modified value (1970? sure).



Amos

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUOBHWAAoJELJo5wb/XPRjGv4H/iKpL4UUBOXVsy7BixBeM5xp
JoyRtwRcfH8ksI/G622KANQ2XzOMLyYr9BX5X8ofBGjT8WRRxEbQDzp/EVADB7+6
xO/OwW2413YiYSmgszOTx0joAdUFB4R4ShnCEjDHQ6RJFKIquQql58hDN+6eGHFZ
AkzrlL+V+CzkpOGWmu51kBLlaSMQJ2SaWOJMw0ykn4giuL87XYeRyQ9seFR9Q5Ll
7cKi0QZflcCvDnwa87vPfnvsQfBF/cFjuv7/NigGsO8IklgQ/T3xPf+A6WVXmURj
USLjOmFFsA6xaLZMBeUnmWo4kjX5iYXT1S14yQokpf+DUPfaM0+bt/hUFNLnkuk=
=1TqK
-----END PGP SIGNATURE-----
_______________________________________________
squid-users mailing list
squid-users at lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


More information about the squid-users mailing list