[squid-users] Windows Updates a Caching Stub zone, A windows updates store.

Eliezer Croitoru eliezer at ngtech.co.il
Tue Jul 19 19:11:41 UTC 2016


Hey Omid,

I will try to answer the subject in general; it should cover the answers to
what you have asked.

Windows Updates can, to some extent, be cached by combining Squid's StoreID
with a refresh_pattern.
However, the nature of Squid is to be a "cache", and in many cases, since we
can predict that specific content will be needed often, it is preferable to
store the objects outright.
The trade-off of caching alone is using the wire and the clients to fetch the
same exact content over and over again, while assuring its consistency and
integrity.
For example, most Windows updates can be publicly cached for 48 hours, which
should be enough for a "torrent" (some would call it a DoS) of updates.
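
For illustration, a minimal StoreID setup could look like this (the helper
path and the rules file here are assumptions, adjust them to your
installation; the storeid_file_rewrite helper ships with Squid 3.4 and
later):

store_id_program /usr/lib/squid/storeid_file_rewrite /etc/squid/storeid_rewrite.conf
store_id_children 5 startup=1

And in /etc/squid/storeid_rewrite.conf, one TAB-separated pattern and
replacement per line, for example:

^http:\/\/.*\.download\.windowsupdate\.com\/(.*)	http://download.windowsupdate.com.squid.internal/$1

This maps the many CDN mirror hostnames onto one internal cache key, so
identical objects are stored only once.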

A refresh_pattern which has the options ignore-no-store ignore-reload
ignore-private ignore-must-revalidate override-expire override-lastmod
will reduce some of the bandwidth consumed by the clients, but it also
"breaks" some other features of the cache.
Since Squid 3.x there has been a change inside Squid to prefer integrity
over caching, due to changes in the nature of the Internet.
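
As an illustration only (the regex and the timings here are assumptions, not
a recommendation), such a pattern capped at the 48 hours mentioned above
could look like:

refresh_pattern -i \.(cab|exe|ms[iuf]|psf)$ 1440 100% 2880 ignore-no-store ignore-reload ignore-private ignore-must-revalidate override-expire override-lastmod

Note that the min and max values are in minutes, so 2880 minutes is the 48
hours.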

MS are cache-friendly in general, so you probably won't need override-lastmod
and a couple of the other options in the refresh_pattern definition.
The location of a refresh_pattern in squid.conf makes no difference to
whether caching happens at all, but ordering matters, just like with many FW
and ACL rules: the first pattern that matches wins.
This is because the squid.conf parser evaluates the refresh_patterns one at a
time, from top to bottom in the squid.conf file; see the sketch below.
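
A small sketch of the ordering (the first line is a placeholder pattern, the
second is Squid's stock catch-all); the specific pattern must come before the
catch-all, or it will never be consulted:

refresh_pattern -i \.windowsupdate\.com/.*\.(cab|psf)$ 1440 100% 2880 ignore-no-store ignore-private override-expire
refresh_pattern . 0 20% 4320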

To prevent duplication of content, as Amos advised, you should use the
"cache" config directive.
Take a peek at the docs: http://www.squid-cache.org/Doc/config/cache/
And also the example at:
http://wiki.squid-cache.org/SquidFaq/SquidAcl#how_do_I_configure_Squid_not_to_cache_a_specific_server.3F
And remember to add "cache allow all" after the "cache deny".
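
A minimal sketch, reusing the "wu" acl from your own config below:

acl wu dstdom_regex \.download\.windowsupdate\.com$
cache deny wu
cache allow all

This keeps Squid from storing a second copy of objects the store service
already holds.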

About the "msip" acl you have added:
It's not really needed and can also cause strange things if some request
for\to another domain would be sent to this cache_peer.
This is due to this service nature to return a 500 error on requests for
non windows update domains.

If you notice weird behavior with this store service, such as unexpected
space consumption, there are a couple of steps you should take (see the
sketch after this list):
- stop the fetcher's crontab entry (to reset to a known baseline)
- verify that there are currently no stored responses that were supposed to
be "private", i.e. use this script:
https://gist.github.com/elico/5ae8920a4fbc813b415f8304cf1786db
- check how many unique requests are stored, i.e. "ls
/cache1/request/v1/ | wc -l"
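
Roughly, these checks could look like the following shell session (the
script filename here is hypothetical, standing for the gist above saved
locally):

# 1. comment out the fetcher entry in the crontab:
crontab -e
# 2. count stored responses that carry a "private" Cache-Control header,
#    using the gist above saved locally (hypothetical filename):
sh check_private_responses.sh
# 3. count the unique stored requests:
ls /cache1/request/v1/ | wc -l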

The next step would be to examine the requests dump, i.e. "tar cvJf
requests.tar.xz /cache1/request/v1/", and send me these dumps for analysis.
If you need to filter the requests before sending them to me, you will need
to check whether there are cookies in the request files.
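
Assuming the files under /cache1/request/v1/ are plain-text HTTP request
dumps (an assumption on my side), a quick way to spot cookies would be:

grep -ril 'cookie:' /cache1/request/v1/ | wc -l

A non-zero count means some request files should be scrubbed before sending
them.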

I believe that once we have the numbers of responses carrying private or
public Cache-Control headers, we will have a much simpler starting point for
the next step.

Just to mention, a garbage-collection operation should be done before the
actual full fetch.
In my experiments I couldn't find evidence of a situation like yours, but I
assumed that some networks would have issues at this level.
I will enhance my fetcher to avoid fetching private content, but to be sure
of the right move I need both the statistics and the requests dump.

My code is not a cache which manages any level of expiration or validation
of the content.
It's a simple HTTP web service combined with a special file-system structure
and a forward proxy.

I will be available tonight on my Skype: elico2013
Also on the Squid IRC channel at irc.freenode.net with the nick: elico
And of course by email.
Just contact me so we can understand the situation better.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: eliezer at ngtech.co.il


-----Original Message-----
From: squid-users [mailto:squid-users-bounces at lists.squid-cache.org] On
Behalf Of Omid Kosari
Sent: Tuesday, July 19, 2016 1:59 PM
To: squid-users at lists.squid-cache.org
Subject: Re: [squid-users] Windows Updates a Caching Stub zone, A windows
updates store.

Eliezer Croitoru-2 wrote
> Hey Omid,
> 
> Indeed my preference is that if you can ask, ask, and I will try to give
> you a couple more details on the service and the subject.

Hey Eliezer,

1. I have refresh patterns from days before your code.
Currently I prefer not to store Windows updates in Squid's internal storage,
to avoid duplication.
Now what should I do? Delete this refresh pattern? Or even create a pattern
not to cache Windows updates?

refresh_pattern -i (microsoft|windowsupdate)\.com/.*?\.(cab|exe|dll|ms[iuf]|asf|wm[va]|dat|zip|iso|psf)$ 10080 100% 172800 ignore-no-store ignore-reload ignore-private ignore-must-revalidate override-expire override-lastmod

2. Is the position of your squid config important, to prevent logical
conflicts?
For example, should it be placed before the above refresh patterns to
prevent duplication?

acl wu dstdom_regex \.download\.windowsupdate\.com$
acl wu-rejects dstdom_regex stats
acl GET method GET
cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query no-netdb-exchange name=ms1
cache_peer_access ms1 allow GET wu !wu-rejects
cache_peer_access ms1 deny all
never_direct allow GET wu !wu-rejects
never_direct deny all

3. Is it a good idea to change your squid config as below, to get more hits?
Or maybe it is a big mistake!

acl msip dst 13.107.4.50
acl wu dstdom_regex \.download\.windowsupdate\.com$ \.download\.microsoft\.com$
acl wu-rejects dstdom_regex stats
acl GET method GET
cache_peer 127.0.0.1 parent 8080 0 proxy-only no-tproxy no-digest no-query no-netdb-exchange name=ms1
cache_peer_access ms1 allow GET wu !wu-rejects
cache_peer_access ms1 allow GET msip !wu-rejects
cache_peer_access ms1 deny all
never_direct allow GET wu !wu-rejects
never_direct allow GET msip !wu-rejects
never_direct deny all

4. Current storage capacity is 500G, and more than 50% of it is already full
and growing fast.
Is there any mechanism for garbage collection in your code?
If not, is it a good idea to remove files based on last access time (ls -ltu
/cache1/body/v1/)?
Should I also delete old files from the header and request folders?




--
View this message in context:
http://squid-web-proxy-cache.1019090.n4.nabble.com/Windows-Updates-a-Caching-Stub-zone-A-windows-updates-store-tp4678454p4678581.html
Sent from the Squid - Users mailing list archive at Nabble.com.
_______________________________________________
squid-users mailing list
squid-users at lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users