[squid-users] urlpath_regex

Amos Jeffries squid3 at treenet.co.nz
Thu Mar 12 10:23:36 UTC 2015


On 12/03/2015 9:14 p.m., James Harper wrote:
> I have just noticed that urlpath_regex isn't doing what I want:
> 
> acl wuau_repo dstdomain .download.windowsupdate.com
> acl wuau_path urlpath_regex -i \.psf$
> acl dst_server dstdomain server
> acl apt_cacher browser apt-cacher
> 
> cache deny dst_server
> cache deny apt_cacher
> cache deny wuau_repo
> cache allow all
> 
> url_rewrite_program /usr/local/squid/libexec/ext_apt_rewrite
> url_rewrite_children 5 startup=0 idle=1 concurrency=0
> url_rewrite_access deny apt_cacher
> url_rewrite_access allow wuau_repo wuau_path
> url_rewrite_access deny all
> 
> 
> Basically I am using apt-cacher to cache windows updates (apt-cacher
> appears to handle and cache partial requests nicely while squid
> doesn't). The main static windows update content files appear to be
> .psf. So basically, if the destination is windows update, and the path
> ends in .psf, then rewrite the url to go to my apt-cacher server (called
> server, conveniently). If the browser string is apt-cacher then don't
> rewrite, to avoid loops. Also don't cache anything to do with these servers.

Three things:

* by re-writing you are generating an entirely new request with the
apt-cacher server URL as the destination. The HTTP message details about
what was originally requested, and from where, are *gone* when the
traffic leaves for the server. The solution for that is outlined at the
end of this mail.

* the .cab files also contain "static" content for the updates
(installer files, DLLs, etc.), particularly for the older Windows
versions.

* if you leave Squid allowed to cache the content, it will do so for
most of the visible page content people see, and for a moderate portion
of the updates as well. It's just the range request fetches inside
archives that Squid won't cache without tuning.
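
As a rough, untested sketch of what that tuning could look like: the
idea is to let Squid fetch the whole object when a client asks for a
range, so the complete file can be stored. The 200 MB sizes are
assumptions, not measured values, and the "cache deny wuau_repo" line
in your config would have to go for any of this to take effect.

```conf
# Reuses the wuau_repo ACL already defined in the quoted squid.conf.
# Sizes below are assumed; set them to the largest update you expect.

# Fetch the full object even when the client only asks for a range.
range_offset_limit 200 MB wuau_repo

# Allow objects of that size to be stored at all.
maximum_object_size 200 MB

# Keep downloading an object even if the client aborts.
quick_abort_min -1 KB
```

The trade-off is bandwidth: a small range request can trigger a full
download of a large file.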

> 
> It works except for the wuau_path acl. The line:
> 
> url_rewrite_access allow wuau_repo wuau_path
> 
> matches all paths on the wuau_repo access list, not just those ending in
> .psf. I get hits for paths ending in .cab, and even worse, paths with a
> ? in them.
>
> What am I doing wrong?

Strange. It should only be matching for URLs ending in ".psf".

Be aware that the regex is quite literally only looking at the line
ending, so it also matches URLs such as these:

 http://download.windowsupdate.com/blah.cab?voodoo.psf
 http://download.windowsupdate.com/?BLAH.PSF

If you need it to match only the things that look (to humans) like
on-disk file names I recommend:

   -i \.psf(\?.*)?$
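
One caveat on that pattern: the optional group can match nothing, so a
URL whose query string happens to end in ".psf" would still match. If
that matters, a stricter variant is to anchor at the start of the path
and forbid any "?" before the ".psf". A sketch only, not tested against
your build or regex library:

```conf
# Stricter variant (assumption, not taken from a working config):
#   ^[^?]*    everything up to .psf must be free of '?'
#   \.psf     the file extension, at the end of the path part
#   (\?.*)?$  optionally followed by a query string
acl wuau_path urlpath_regex -i ^[^?]*\.psf(\?.*)?$
```

With this, /blah.cab?voodoo.psf and /?BLAH.PSF should no longer match,
while /x/file.psf and /x/file.psf?a=1 still should.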


Did you build your Squid normally, or with a specific regex library?

Is there another config line adding to wuau_path somewhere in your
squid.conf?

Is Squid actually running with the squid.conf you think it is?
 (sounds silly, but mistakes happen)
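
To see what the ACLs are actually matching, one option is to raise the
debug level for the access control code (debug section 28) and watch
cache.log while a test request goes through. Level 3 is my guess at a
useful verbosity; raise it if the output is too thin.

```conf
# Temporary, for troubleshooting only: logs get large quickly.
# ALL,1 keeps everything else at the default level;
# 28,3 raises verbosity for section 28 (access control).
debug_options ALL,1 28,3
```

Remove the line again once you have found the culprit.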



Also, unrelated to this ... have you tried using cache_peer instead of
URL-rewriting?

 cache_peer server parent 8080 0
 cache_peer_access server allow wuau_repo wuau_path
 cache_peer_access server deny all

I am assuming it's listening on port 8080. It may need the
"originserver" option as well.
apt-cacher should fetch from the Internet directly, not loop back
through Squid.
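
A slightly fuller sketch of the same idea, with the loop guard from your
original config carried over. Untested; the port is still an assumption
(a stock apt-cacher install commonly listens on 3142, so check yours),
and whether "originserver" is wanted depends on how apt-cacher expects
to be addressed.

```conf
# Port 8080 is assumed. no-query disables ICP queries to the peer.
# Add "originserver" if apt-cacher should be treated as a web server
# rather than as a proxy.
cache_peer server parent 8080 0 no-query

# Never hand apt-cacher's own fetches back to it (loop guard).
cache_peer_access server deny apt_cacher
cache_peer_access server allow wuau_repo wuau_path
cache_peer_access server deny all

# Force matching requests through the peer instead of going direct.
never_direct deny apt_cacher
never_direct allow wuau_repo wuau_path
```

Unlike the rewrite, the request keeps its original URL and headers on
the way to the peer.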

Amos


