[squid-users] urlpath_regex

James Harper james at ejbdigital.com.au
Thu Mar 12 10:52:59 UTC 2015


> Three things:
> 
> * by re-writing you are generating an entirely new request with the
> apt-cacher server URL as the destination. The HTTP message details about
> what was originally requested and from where is *gone* when the traffic
> leaves for the server. The solution for that is outlined at the end of
> this mail.
> 
> * the .cab files also contain "static" content for the updates installer files
> and DLLs etc., particularly for the older Windows versions.

Long before I started using apt-cacher, I was having a problem with .cab?xxx files being cached when they shouldn't be, so I'm excluding them for now.

> * if you leave Squid as being allowed to cache the content it will do so
> for most of the visible page contents people see. And for a moderate
> portion of the updates as well. It's just the range request fetches
> inside archives that Squid won't cache without tuning.

AFAICT, the "tuning" involves telling squid to grab the whole file, and with some of these updates being >800MB, I think squid is the wrong tool for the job here.

My setup has squid running on an old laptop with a fairly small disk, and apt-cacher running on a server with a very large disk. The server isn't always on though, so I can't put squid there. I'm content with windows updates failing when the server is off, it's just for a home network.

> >
> > It works except for the wuau_path acl. The line:
> >
> > url_rewrite_access allow wuau_repo wuau_path
> >
> > matches all paths on the wuau_repo access list, not just those ending in
> > .psf. I get hits for paths ending in .cab, and even worse, paths with a
> > ? in them.
> >
> > What am I doing wrong?
> 
> Strange. It should only be matching for URLs ending in ".psf".
> 
> Be aware that the regex quite literally looks only at the line ending,
> so it also matches URLs such as these:
> 
>  http://download.windowsupdate.com/blah.cab?voodoo.psf
>  http://download.windowsupdate.com/?BLAH.PSF
> 
> If you need it to match only the things that look (to humans) like
> on-disk file names I recommend:
> 
>    -i \.psf(\?.*)?$

I will rewrite as per your suggestion, but as I'm only applying the regex to windowsupdate.com urls I don't think this is the problem I'm having.
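A quick way to check which URL paths each pattern catches is to run them through a regex engine offline. The sketch below uses Python's re module rather than the POSIX regex library Squid uses, which should behave the same for patterns this simple. It assumes the original wuau_path pattern was "-i \.psf$" (the acl line itself is not quoted here); the third pattern and all sample paths are hypothetical.

```python
import re

# end_only: assumed reconstruction of the original wuau_path regex.
# suggested: the pattern recommended above.
# path_only: hypothetical stricter variant that rejects a "?" before ".psf".
PATTERNS = {
    "end_only": r"\.psf$",
    "suggested": r"\.psf(\?.*)?$",
    "path_only": r"^[^?]*\.psf(\?.*)?$",
}

# urlpath_regex is applied to the path and query string, not the scheme or host.
SAMPLE_PATHS = [
    "/blah.cab?voodoo.psf",
    "/?BLAH.PSF",
    "/msdownload/update/file.psf",
    "/msdownload/update/file.psf?p=1",
    "/msdownload/update/file.cab",
]


def matches(pattern: str, path: str) -> bool:
    # "-i" in squid.conf makes the match case-insensitive.
    return re.search(pattern, path, re.IGNORECASE) is not None


def table() -> dict:
    # For each pattern, list the sample paths it matches (in sample order).
    return {
        name: [p for p in SAMPLE_PATHS if matches(rx, p)]
        for name, rx in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, hits in table().items():
        print(f"{name:10} {hits}")
```

With these samples the suggested pattern still matches /blah.cab?voodoo.psf and /?BLAH.PSF, because a ".psf" at the very end of the query string satisfies it. Adding a leading ^[^?]* is one way to require the ".psf" to sit in the path part.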

> Did you build your Squid normally, or with a specific regex library?

Normally, I guess.

# /usr/local/squid/sbin/squid -v
Squid Cache: Version 3.5.2
Service Name: squid
configure options:  '--enable-storeio=ufs' '--enable-linux-netfilter' '--with-openssl' --enable-ltdl-convenience

> 
> Is there another config line adding to wuau_path somewhere in your
> squid.conf?

No

> 
> Is Squid actually running with the squid.conf you think it is?
>  (sounds silly, but mistakes happen)

Yes

I also tried the same thing with http_access and that works as expected: *.psf files are allowed and non-*.psf files are denied. I'm thinking bug at this point... I'll do some more testing and see if I can narrow it down.

> Also, unrelated to this ... have you tried using cache_peer instead of
> URL-rewriting ?
> 
>  cache_peer server parent 8080 0
>  cache_peer_access server allow wuau_repo wuau_path
>  cache_peer_access server deny all
> 
> I'm assuming it's listening on port 8080. It may need the "originserver"
> option as well.

apt-cacher expects a url such that if you wanted to get:

http://ftp.debian.org/debian/somefile

you would ask apt-cacher for:

http://server:3142/apt-cacher/ftp.debian.org/debian/somefile

And it knows about Debian package lists, what it can throw out and when, etc., and if you specify ftp.debian.org and ftp.au.debian.org as apt sources, it's smart enough to know that a file cached from one can be used for the other. I'm probably pushing the friendship by using it to cache Windows packages, but so far it seems to do the job okay. Keeping the archive clean might be a problem though.
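For reference, the URL mapping apt-cacher expects can be written as a small url_rewrite_program helper. This is a minimal sketch, assuming Squid 3.4 or later helper replies ("OK rewrite-url=..." to rewrite, "ERR" to leave the URL alone), helper concurrency turned off so each request line starts with the URL, and the server:3142 address from the example above. The function names are hypothetical; which requests reach the helper is still decided by url_rewrite_access in squid.conf.

```python
from typing import IO, Optional
from urllib.parse import urlsplit

# Assumed apt-cacher location, taken from the example URL above.
APT_CACHER_BASE = "http://server:3142/apt-cacher/"


def to_apt_cacher(url: str, base: str = APT_CACHER_BASE) -> Optional[str]:
    """Map http://host/path?query to <base>host/path?query.

    Returns None for anything that is not a plain http:// URL.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.netloc:
        return None
    tail = parts.netloc + parts.path
    if parts.query:
        tail += "?" + parts.query
    return base + tail


def handle_line(line: str) -> str:
    """Answer one url_rewrite_program request line.

    Assumes no concurrency channel-ID, so the first token is the URL;
    the remaining tokens (client address, method, etc.) are ignored.
    """
    fields = line.split()
    if not fields:
        return "ERR"
    new_url = to_apt_cacher(fields[0])
    if new_url is None:
        return "ERR"  # ERR: leave the URL unchanged
    return "OK rewrite-url=" + new_url


def serve(stdin: IO[str], stdout: IO[str]) -> None:
    # Squid keeps the helper running and sends one request per line;
    # each reply must be flushed so Squid is not left waiting.
    for line in stdin:
        stdout.write(handle_line(line) + "\n")
        stdout.flush()
```

To run it as a helper, a script would call serve(sys.stdin, sys.stdout) and be named in squid.conf via url_rewrite_program (the script path is up to you). Answering ERR for non-http URLs keeps those requests untouched.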

> apt-cacher should fetch from the Internet, not loop back through Squid.
>

Given that the server might also retrieve other URLs, and all the proxying is transparent, the best I could come up with was to match the User-Agent string apt-cacher sends when it requests a URL:

acl apt_cacher browser apt-cacher

and then exclude that from caching and rewriting.
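The intended rule ordering can be modelled in a few lines. This is a simplified sketch of how Squid evaluates an access list such as url_rewrite_access: lines are tried top to bottom, all ACLs named on one line must match (they are ANDed), the first matching line wins, and if none match the result is the opposite of the last line's action. The ACL bodies are hypothetical stand-ins: wuau_repo is assumed to be a dstdomain-style check on windowsupdate.com, wuau_path uses the regex suggested above, and apt_cacher mimics "acl apt_cacher browser apt-cacher".

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple
from urllib.parse import urlsplit


@dataclass
class Request:
    url: str
    user_agent: str


Acl = Callable[[Request], bool]
Rule = Tuple[str, List[Acl]]


def wuau_repo(req: Request) -> bool:
    # Hypothetical: treat windowsupdate.com and its subdomains as the repo.
    host = urlsplit(req.url).hostname or ""
    return host == "windowsupdate.com" or host.endswith(".windowsupdate.com")


def wuau_path(req: Request) -> bool:
    # Like urlpath_regex: match against path + query, case-insensitively.
    parts = urlsplit(req.url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    return re.search(r"\.psf(\?.*)?$", path, re.IGNORECASE) is not None


def apt_cacher(req: Request) -> bool:
    # Like a "browser" ACL: a regex matched against the User-Agent header.
    return re.search(r"apt-cacher", req.user_agent) is not None


def evaluate(rules: List[Rule], req: Request) -> str:
    """First line whose ACLs all match wins; rules must be non-empty."""
    for action, acls in rules:
        if all(acl(req) for acl in acls):
            return action
    # No line matched: the default is the opposite of the last line's action.
    return "deny" if rules[-1][0] == "allow" else "allow"


# Model of:
#   url_rewrite_access deny apt_cacher
#   url_rewrite_access allow wuau_repo wuau_path
#   url_rewrite_access deny all
url_rewrite_access: List[Rule] = [
    ("deny", [apt_cacher]),
    ("allow", [wuau_repo, wuau_path]),
    ("deny", []),  # an empty ACL list always matches, like "all"
]
```

Putting the apt_cacher deny line first means apt-cacher's own fetches are never rewritten back to itself, whatever URL they ask for. A similar "cache deny apt_cacher" line would cover the caching side.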

Thanks

James

