[squid-users] Caching http google deb files

Hardik Dangar hardikdangar+squid at gmail.com
Wed Oct 5 18:40:46 UTC 2016


Hey Jok,

Thanks for the suggestion, but the big issue with that approach is that I
would have to download the whole repository (about 80-120 GB) first, and
then download another 20 to 25 GB every week. We hardly use any of that
except a few popular repos. The other big issue I always have with most of
these tools is third-party repos. squid-deb-proxy is quite reliable, but
again it is just Squid with a custom config, nothing else, and it fails to
cache the Google debs.

Squid is perfect for me because it caches things the first time they are
requested, so the next time anybody asks for them they are already there.
The problem arises when big companies like Google and GitHub do not want us
to cache their content and use various tricks to prevent it. My issue is
that the same Google deb files get downloaded 50 times in a single day as
apt updates run, and I waste hundreds of GB on the same content. In the
country where I live bandwidth is very expensive and fast connections are
costly, so this is important for me.
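
For reference, the behaviour is easy to confirm by looking at the response
headers directly. A quick check from the proxy box (the URL below is just an
illustrative path into Google's Chrome apt repository, not taken from my
logs):

    # inspect the headers served with the repo metadata / deb files
    curl -sI http://dl.google.com/linux/chrome/deb/dists/stable/Release

    # a "Vary: *" header (or restrictive Cache-Control directives such as
    # no-store) in the output is what keeps Squid from reusing a cached copy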

@Amos,

I think it's about time Squid's code was updated to handle hard-to-cache
cases like Google and GitHub. I am interested in writing a proposal; I will
share it on squid-dev soon, ask for ideas, and try to get official approval
so that I can build this to Squid's standards.

But before that, could you help me with a few things? Essentially I don't
have much experience with C code, as I have spent most of my working life on
the PHP, Python and JavaScript side. I know how to write C, but I am not an
expert at it. So I want to know whether Squid follows any particular design
pattern beyond the general OOP style. I also want to understand Squid's
workflow, i.e. what happens when it receives a request, how ACLs are applied
programmatically, and how refresh patterns are applied. Is there a way I can
debug and check whether a refresh pattern was applied for a given URL, and
whether reply_header_replace actually replaced a header? Seeing those lines
in the debug output would help me a lot. I know debug options can help, but
if I turn them up to level 9 it is very difficult to wade through so many
debug entries.
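
From what I have read so far, I am assuming something like the following
would narrow the output down (please correct me if I have the section
numbers wrong; I am going by the debug-sections list, where 22 is refresh
calculation and 28 is access control):

    # squid.conf: keep everything at level 1, raise only the sections of interest
    debug_options ALL,1 22,3 28,5

    # then watch cache.log while fetching the URL through the proxy, e.g.:
    #   tail -f /var/log/squid/cache.log | grep -Ei 'refresh|acl'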

My idea is to develop a module that does not change any of the existing
Squid code and is only loaded when it is explicitly enabled in the Squid
config. So I want to know whether there is any piece of code already in
Squid that behaves in a similar way, just like your archive mode; a rough
sketch of how I imagine configuring it follows below.
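
Just to make the idea concrete, here is roughly how I picture wiring it up
in squid.conf. The directive name below is purely hypothetical; it does not
exist in Squid today and is not the name used in your bug 4604 patch:

    # hypothetical example only -- 'archive_response' is NOT a real directive
    acl google_repo dstdomain dl.google.com
    acl deb_files urlpath_regex \.deb$

    # cache matching responses even when the origin marks them uncacheable
    archive_response allow google_repo deb_files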




On Wed, Oct 5, 2016 at 9:49 PM, Jok Thuau <jok at spikes.com> wrote:

> This is sort of off-topic, but have you considered using deb repo
> mirroring software?
> (It would mean that you need to update your clients to point to that
> rather than Google, but that's not really difficult.)
> Software like aptly (aptly.info) is really good at this (though a
> little hard to get going in the first place), or a deb-caching proxy
> (apt-cacher-ng? squid-deb-proxy?).
>
>
> On Tue, Oct 4, 2016 at 7:30 AM, Hardik Dangar <
> hardikdangar+squid at gmail.com> wrote:
>
>> Wow, I hadn't thought of that. Google might need the tracking data; that
>> could be the reason they have blindly put a "Vary: *" header on it. Oh,
>> the irony: the company that lectures all of us on how to deliver content
>> is doing such a thing.
>>
>> I have looked at your patch, but how do I enable it? Do I need to write a
>> custom ACL? I know I need to recompile and reinstall after applying the
>> patch, but what exactly do I need to do in the squid.conf file? Looking at
>> your patch I am guessing I need to write an archive ACL, or maybe I am too
>> naive to understand the C code :)
>>
>> Also, is reply_header_replace any good for this?
>>
>>
>> On Tue, Oct 4, 2016 at 7:47 PM, Amos Jeffries <squid3 at treenet.co.nz>
>> wrote:
>>
>>> On 5/10/2016 2:34 a.m., Hardik Dangar wrote:
>>> > Hey Amos,
>>> >
>>> > We have about 50 clients which download the same Google Chrome update
>>> > every 2 or 3 days, which means about 2.4 GB. Although the response
>>> > says Vary, the requested file is the same and it is all downloaded via
>>> > apt update.
>>> >
>>> > Is there any option just like ignore-no-store? I know I am asking for
>>> > too much, but it seems very silly on Google's part that they are
>>> > sending a Vary header in a place where they shouldn't, as no matter how
>>> > you access those URLs you are only going to get those deb files.
>>>
>>>
>>> Some things G does only make sense when you ignore all the PR about
>>> wanting to make the web more efficient and consider that it's a company
>>> whose income is derived from recording data about people's habits and
>>> activities. Caching can hide that info from them.
>>>
>>> >
>>> > Can I hack the Squid source code to ignore the Vary header?
>>> >
>>>
>>> Google are explicitly saying the response changes. I suspect there is
>>> something involving Google account data being embedded in some of the
>>> downloads. For tracking, etc.
>>>
>>>
>>> If you are wanting to test it I have added a patch to
>>> <http://bugs.squid-cache.org/show_bug.cgi?id=4604> that should implement
>>> archival of responses where the ACLs match. It is completely untested by
>>> me beyond building, so YMMV.
>>>
>>> Amos
>>>
>>>
>>
>> _______________________________________________
>> squid-users mailing list
>> squid-users at lists.squid-cache.org
>> http://lists.squid-cache.org/listinfo/squid-users
>>
>>
>