[squid-users] Caching http google deb files

Hardik Dangar hardikdangar+squid at gmail.com
Wed Oct 5 10:27:42 UTC 2016


Hey Amos,

I have implemented your patch at

and added the following to my squid.conf:
archive_mode allow all

and my refresh pattern is,
refresh_pattern dl-ssl.google.com/.*\.(deb|zip|tar|rpm) 129600 100% 129600
ignore-reload ignore-no-store override-expire override-lastmod ignor$

But I am still not able to cache it. Can you tell from the output below what
the problem might be? Do I need to configure anything extra?
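
As a quick sanity check outside Squid, the regex part of the refresh_pattern
can be tried against the request URL. This is only a sketch: Python's re
module stands in for the regex library Squid uses, which is an assumption
that should hold for a pattern this simple, and the helper name
pattern_matches is made up for illustration.

```python
import re

# Regex copied from the refresh_pattern line above (min/percent/max and
# the options are left out, since only the URL match is being checked).
PATTERN = r"dl-ssl.google.com/.*\.(deb|zip|tar|rpm)"

# URL of the .deb download requested in this message.
URL = "http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb"


def pattern_matches(pattern: str, url: str) -> bool:
    """Return True if the regex matches anywhere in the URL (unanchored search)."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    print(pattern_matches(PATTERN, URL))
```

If the pattern matches, the refusal to cache is probably not caused by the
refresh_pattern regex itself, and the reply headers are the next place to look.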

Here is the debug output for the same request:
------------------------------------------------------------------------------------------------

2016/10/05 15:46:25.319 kid1| 5,2| TcpAcceptor.cc(220) doAccept: New
connection on FD 14
2016/10/05 15:46:25.319 kid1| 5,2| TcpAcceptor.cc(295) acceptNext:
connection on local=[::]:3128 remote=[::] FD 14 flags=9
2016/10/05 15:46:25.319 kid1| 11,2| client_side.cc(2346) parseHttpRequest:
HTTP Client local=192.168.1.1:3128 remote=192.168.1.76:51236 FD 12 flags=1
2016/10/05 15:46:25.319 kid1| 11,2| client_side.cc(2347) parseHttpRequest:
HTTP Client REQUEST:
---------
GET
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
HTTP/1.1
Host: dl-ssl.google.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101
Firefox/49.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie:
NID=88=109tS20j8Ec0EQb5HzuNnbwtsl4sK64aakVRn-2qOe91Zv4e3st9lfyik8qQe7d12J4xBDCmdKMwiXY98a2dj4mOitaP4AbJV6fD7o9YKTxE7MziEkNCJ45GiDszPM8wXca5cuYK_gE4QVrU52VqzSa1IzmHbh_7XKsvYuDCSsgIMZaC8d4Fp01vrAU8dHPXGopVpBIxgpHwAjPv8NvLFM3e4y-um5A8umQ-GCFmpaaLd1_1jyafkNLTj-9Ix4hfsw;
SID=1ANPj1-lw03bKfunZfrmk8ZsjEcTl5AiLgwzgtzki8MZ3JuvGyYgiP7LRJ05U1HQWbf76g.;
HSID=AUu5M-p2Rw1uDb2_0; APISID=ss4uEw9eIOgmsZXv/ARs9Vws4Es_o_sfVX
Connection: keep-alive
Upgrade-Insecure-Requests: 1


----------
2016/10/05 15:46:25.320 kid1| 85,2| client_side_request.cc(744)
clientAccessCheckDone: The request GET
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
is ALLOWED; last ACL checked: CONNECT
2016/10/05 15:46:25.320 kid1| 85,2| client_side_request.cc(720)
clientAccessCheck2: No adapted_http_access configuration. default: ALLOW
2016/10/05 15:46:25.320 kid1| 85,2| client_side_request.cc(744)
clientAccessCheckDone: The request GET
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
is ALLOWED; last ACL checked: CONNECT
2016/10/05 15:46:25.320 kid1| 17,2| FwdState.cc(133) FwdState: Forwarding
client request local=192.168.1.1:3128 remote=192.168.1.76:51236 FD 12
flags=1, url=
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
2016/10/05 15:46:25.320 kid1| 44,2| peer_select.cc(258) peerSelectDnsPaths:
Find IP destination for:
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb'
via dl-ssl.google.com
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(280) peerSelectDnsPaths:
Found sources for '
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
'
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(281) peerSelectDnsPaths:
  always_direct = ALLOWED
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(282) peerSelectDnsPaths:
   never_direct = DENIED
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
         DIRECT = local=[::] remote=[2404:6800:4008:c02::be]:80 flags=1
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
         DIRECT = local=0.0.0.0 remote=74.125.23.136:80 flags=1
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
         DIRECT = local=0.0.0.0 remote=74.125.23.93:80 flags=1
2016/10/05 15:46:25.417 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
         DIRECT = local=0.0.0.0 remote=74.125.23.91:80 flags=1
2016/10/05 15:46:25.418 kid1| 44,2| peer_select.cc(286) peerSelectDnsPaths:
         DIRECT = local=0.0.0.0 remote=74.125.23.190:80 flags=1
2016/10/05 15:46:25.418 kid1| 44,2| peer_select.cc(295) peerSelectDnsPaths:
       timedout = 0
2016/10/05 15:46:25.418 kid1| 14,2| ipcache.cc(924) ipcacheMarkBadAddr:
ipcacheMarkBadAddr: dl-ssl.google.com [2404:6800:4008:c02::be]:80
2016/10/05 15:46:25.567 kid1| 11,2| http.cc(2203) sendRequest: HTTP Server
local=192.168.1.1:36674 remote=74.125.23.136:80 FD 13 flags=1
2016/10/05 15:46:25.567 kid1| 11,2| http.cc(2204) sendRequest: HTTP Server
REQUEST:
---------
GET /dl/linux/direct/mod-pagespeed-beta_current_i386.deb HTTP/1.1
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101
Firefox/49.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie:
NID=88=109tS20j8Ec0EQb5HzuNnbwtsl4sK64aakVRn-2qOe91Zv4e3st9lfyik8qQe7d12J4xBDCmdKMwiXY98a2dj4mOitaP4AbJV6fD7o9YKTxE7MziEkNCJ45GiDszPM8wXca5cuYK_gE4QVrU52VqzSa1IzmHbh_7XKsvYuDCSsgIMZaC8d4Fp01vrAU8dHPXGopVpBIxgpHwAjPv8NvLFM3e4y-um5A8umQ-GCFmpaaLd1_1jyafkNLTj-9Ix4hfsw;
SID=1ANPj1-lw03bKfunZfrmk8ZsjEcTl5AiLgwzgtzki8MZ3JuvGyYgiP7LRJ05U1HQWbf76g.;
HSID=AUu5M-p2Rw1uDb2_0; APISID=ss4uEw9eIOgmsZXv/ARs9Vws4Es_o_sfVX
Host: dl-ssl.google.com
Cache-Control: max-age=7776000
Connection: keep-alive


----------
2016/10/05 15:46:25.780 kid1| ctx: enter level  0: '
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
'
2016/10/05 15:46:25.780 kid1| 11,2| http.cc(717) processReplyHeader: HTTP
Server local=192.168.1.1:36674 remote=74.125.23.136:80 FD 13 flags=1
2016/10/05 15:46:25.780 kid1| 11,2| http.cc(718) processReplyHeader: HTTP
Server REPLY:
---------
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 6662208
Content-Type: application/x-debian-package
Etag: "fa383"
Last-Modified: Thu, 15 Sep 2016 19:24:00 GMT
Server: downloads
Vary: *
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
Date: Wed, 05 Oct 2016 10:16:25 GMT

!<arch>
debian-binary   1473872866  0     0     100644  4         `
2.0
control.tar.gz  1473872866  0     0     100644  7806      `
----------
2016/10/05 15:46:25.780 kid1| ctx: exit level  0
2016/10/05 15:46:25.780 kid1| 20,2| store.cc(949) checkCachable:
StoreEntry::checkCachable: NO: not cachable
2016/10/05 15:46:25.780 kid1| 20,2| store.cc(949) checkCachable:
StoreEntry::checkCachable: NO: not cachable
2016/10/05 15:46:25.781 kid1| 88,2| client_side_reply.cc(2005)
processReplyAccessResult: The reply for GET
http://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb
is ALLOWED, because it matched all
2016/10/05 15:46:25.781 kid1| 11,2| client_side.cc(1392)
sendStartOfMessage: HTTP Client local=192.168.1.1:3128 remote=
192.168.1.76:51236 FD 12 flags=1
2016/10/05 15:46:25.781 kid1| 11,2| client_side.cc(1393)
sendStartOfMessage: HTTP Client REPLY:
---------
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 6662208
Content-Type: application/x-debian-package
ETag: "fa383"
Last-Modified: Thu, 15 Sep 2016 19:24:00 GMT
Server: downloads
Vary: *
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
Date: Wed, 05 Oct 2016 10:16:25 GMT
Connection: keep-alive


----------
2016/10/05 15:46:25.781 kid1| 20,2| store.cc(949) checkCachable:
StoreEntry::checkCachable: NO: not cachable
2016/10/05 15:46:25.781 kid1| 20,2| store.cc(949) checkCachable:
StoreEntry::checkCachable: NO: not cachable
2016/10/05 15:46:25.781 kid1| 20,2| store.cc(949) checkCachable:
StoreEntry::checkCachable: NO: not cachable
2016/10/05 15:46:25.782 kid1| 20,2| store.cc(949) checkCachable:
StoreEntry::checkCachable: NO: not cachable
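
The "not cachable" lines do not say why, so a small script can help narrow it
down by scanning the server reply headers for things that commonly stop a
shared cache from reusing a response. This is a simplified sketch and not
Squid's actual checkCachable logic; the function names are hypothetical, and
only two conditions are checked (Vary: * and Cache-Control: no-store).

```python
from typing import Dict, List

# Server reply headers copied from the debug output above.
REPLY = """HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 6662208
Content-Type: application/x-debian-package
Etag: "fa383"
Last-Modified: Thu, 15 Sep 2016 19:24:00 GMT
Server: downloads
Vary: *
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
Date: Wed, 05 Oct 2016 10:16:25 GMT
"""


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse a raw reply into a dict keyed by lower-cased header name."""
    headers: Dict[str, str] = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_blockers(headers: Dict[str, str]) -> List[str]:
    """List header values that usually prevent a stored reply from being reused."""
    reasons: List[str] = []
    vary = [v.strip() for v in headers.get("vary", "").split(",")]
    if "*" in vary:
        # Under HTTP caching rules a Vary value of "*" never matches a later request.
        reasons.append("Vary: *")
    directives = [d.strip().lower() for d in headers.get("cache-control", "").split(",")]
    if "no-store" in directives:
        reasons.append("Cache-Control: no-store")
    return reasons


if __name__ == "__main__":
    print(cache_blockers(parse_headers(REPLY)))
```

For the reply above, the only item this flags is the Vary: * header, which
fits the discussion quoted below.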



On Tue, Oct 4, 2016 at 8:00 PM, Hardik Dangar <hardikdangar+squid at gmail.com>
wrote:

> Wow, I hadn't thought of that. Google might need tracking data; that
> could be the reason they have blindly added the Vary: * header. Oh, the
> irony: a company that lectures all of us on how to deliver content is
> doing such a thing.
>
> I have looked at your patch, but how do I enable it? Do I need to write
> a custom ACL? I know I need to compile and reinstall after applying the
> patch, but what exactly do I need to do in squid.conf? Looking at your
> patch, I am guessing I need to write an archive ACL, or I am too naive to
> understand C code :)
>
> Also, is reply_header_replace any good for this?
>
>
> On Tue, Oct 4, 2016 at 7:47 PM, Amos Jeffries <squid3 at treenet.co.nz>
> wrote:
>
>> On 5/10/2016 2:34 a.m., Hardik Dangar wrote:
>> > Hey Amos,
>> >
>> > We have about 50 clients which download the same Google Chrome update
>> > every 2 or 3 days, which means 2.4 GB. Although the response says Vary,
>> > the requested file is the same and everything is downloaded via apt update.
>> >
>> > Is there any option just like ignore-no-store? I know I am asking for too
>> > much, but it seems very silly on Google's part that they are sending the
>> > Vary header in a place where they shouldn't, as no matter how you access
>> > those URLs you are only going to get those deb files.
>>
>>
>> Some things G does only make sense when you ignore all the PR about
>> wanting to make the web more efficient and consider it's a company whose
>> income is derived by recording data about people's habits and activities.
>> Caching can hide that info from them.
>>
>> >
>> > Can I hack the Squid source code to ignore the Vary header?
>> >
>>
>> Google are explicitly saying the response changes. I suspect there is
>> something involving Google account data being embedded in some of the
>> downloads. For tracking, etc.
>>
>>
>> If you are wanting to test it I have added a patch to
>> <http://bugs.squid-cache.org/show_bug.cgi?id=4604> that should implement
>> archival of responses where the ACLs match. It is completely untested by
>> me beyond building, so YMMV.
>>
>> Amos
>>
>>
>