[squid-users] Centralized Squid - design and implementation

Tue Nov 18 12:39:37 UTC 2014

On Tue, 2014-11-18 at 08:35 -0300, Carlos Defoe wrote:
> Well, you just wrote a load balancer in PHP, with a load balancing
> algorithm in it. It serves the same purpose as HAproxy (I don't really
> use HAproxy, so I don't know, but I use the F5 big-ip which is
> perfectly capable of testing Internet links behind squid). In your
> scheme, WPAD is being used to tell the clients where the load balancer
> (a webserver with a php script) is, and PAC probably as the answer
> format, which returns a currently valid proxy node address directly to
> the client. But as far as I know, once the client gets the PAC answer,
> it will not refresh until the browser is restarted, so it might be a
> small problem there.
> 
> But it is a good solution, as proved by your decade of using it, and
> much cheaper than an F5. As for the DNS trick, it is intended to
> increase high availability of the web servers that are serving
> wpad.dat (or your php script), because if it runs on only one
> webserver, at some point no clients will find anything at all.
> 
> Well, there's a lot of ways of doing the same thing, including ucarp,
> squid cache_peer as Amos said... It's just a matter of picking the one
> that fits.
> 
> On Tue, Nov 18, 2014 at 3:31 AM, Jason Haar <Jason_Haar at trimble.com> wrote:
> > On 18/11/14 16:07, Carlos Defoe wrote:
> >> As for my scenario, I also use wpad to configure some exceptions, some
> >> clients that will use a completely different proxy, etc...
> > Our "wpad.dat" is actually a PHP script which tests that the "official"
> > proxy (per client subnet) is actually working (with caching of the
> > results for performance reasons of course), if not it flicks them off to
> > another site's proxy server. Much better than trying to do dynamic DNS
> > tricks with a local HAproxy. ie if you have actually lost local Internet
> > access due to an ISP outage, HAproxy isn't going to help. But if WPAD
> > knows that a WAN-connected proxy is still working - why not point your
> > users at that instead?
> >
> > We've been doing this for 10+ years, 99% of the time it's never needed,
> > but when it's needed, it works :-)
> >
> > --
> > Cheers
> >
> > Jason Haar
> > Corporate Information Security Manager, Trimble Navigation Ltd.
> > Phone: +1 408 481 8171
> > PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
> >
> >
> > _______________________________________________
> > squid-users mailing list
> > squid-users at lists.squid-cache.org
> > http://lists.squid-cache.org/listinfo/squid-users

web servers providing pac/wpad don't need to be a single point of
failure, given that multiple instances of web servers can be behind a
load balancer, just like squid.  i have this arrangement, and get plenty
of reliability out of it.  it scales well too.

i have set up my VIP for the proxies in such a way that if you hit port
8080 you get load balanced to the pool with all members in it.  if you
hit the VIP on port 8081, you get load balanced to a pool with only the
first proxy in it, 8082 goes to the second proxy, etc.  this allows me
to test each proxy individually, and because the VIP name is the same,
the same kerberos ticket satisfies the auth requests.  at work, we have
F5s as well, and as a service check we attempt to GET some content we
host, and attempt to GET google or cnn.  the check requires that at
least one of the GETs succeed, in order to mark the device up.  i don't
have the external check in my HAProxy configs, but might have to look
into it.
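
A minimal sketch of the two ideas above, written as plain JavaScript rather
than as F5 or HAProxy configuration. The names nodeIsUp and buildPools and
the node names squid1/squid2 are hypothetical, and the probe results are
assumed to come from whatever performs the actual GETs.

```javascript
// A node is marked up if at least one probe (internal content or an
// external site) succeeded. An empty probe list counts as down.
function nodeIsUp(probeResults) {
  return probeResults.some(function (probe) {
    return probe.ok === true;
  });
}

// Port layout described above: the base port maps to the pool with all
// members, base+1 to the first proxy only, base+2 to the second, and so on.
function buildPools(nodes, basePort) {
  var pools = {};
  pools[basePort] = nodes.slice();
  nodes.forEach(function (node, index) {
    pools[basePort + index + 1] = [node];
  });
  return pools;
}

// Hypothetical usage with two proxies behind one VIP.
console.log(buildPools(["squid1", "squid2"], 8080));
console.log(nodeIsUp([{ name: "internal", ok: false }, { name: "external", ok: true }]));
```

Requiring only one successful GET means a node is not marked down when
either the internal server or a single external site is having trouble.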

as for my pac/wpad script, i have logic in it to send requests proxied
or unproxied, based on my design or security decisions.  i have logic
for direct access domains, direct access hosts, direct access networks,
proxied domains (forces the use of the proxy, overriding any other
logic), proxied hosts (again, override logic), and hosts that are forced
via a specific proxy by sending the request to a specific port on the
VIP.
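
The rule categories above could be sketched in a PAC file roughly as
follows. This is an illustration, not the actual script: every list entry
except bpk2.com and proxy.bpk2.com is a made-up placeholder, and the order
of precedence (pinned hosts, then forced-proxy rules, then direct rules) is
an assumption. A real PAC file would normally use the browser-supplied
helpers such as isInNet(); the sketch uses plain string and integer checks
so it also runs outside a browser, which means the network rule only
matches hosts given as literal IPv4 addresses.

```javascript
var VIP = "proxy.bpk2.com";
var DEFAULT_CHAIN = "PROXY " + VIP + ":8080; PROXY " + VIP + ":8081; PROXY " + VIP + ":8082";

// Hypothetical rule tables.
var PINNED_HOSTS = { "partner.example.org": 8081 }; // forced via one proxy
var PROXIED_HOSTS = ["mirror.example.net"];         // always proxied
var PROXIED_DOMAINS = ["forced.bpk2.com"];          // always proxied
var DIRECT_HOSTS = ["localhost"];
var DIRECT_DOMAINS = ["bpk2.com"];
var DIRECT_NETS = [["10.0.0.0", "255.0.0.0"]];

// True if host is the domain itself or a name underneath it.
function inDomain(host, domain) {
  return host === domain || host.slice(-(domain.length + 1)) === "." + domain;
}

// Convert a dotted quad to an unsigned 32-bit integer, or null if not IPv4.
function ipv4ToInt(ip) {
  var m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  return ((+m[1] << 24) | (+m[2] << 16) | (+m[3] << 8) | +m[4]) >>> 0;
}

function inNet(host, net, mask) {
  var h = ipv4ToInt(host), n = ipv4ToInt(net), k = ipv4ToInt(mask);
  if (h === null || n === null || k === null) return false;
  return ((h & k) >>> 0) === ((n & k) >>> 0);
}

function FindProxyForURL(url, host) {
  var i;
  host = host.toLowerCase();
  // 1. Hosts pinned to a specific proxy via its dedicated VIP port.
  if (Object.prototype.hasOwnProperty.call(PINNED_HOSTS, host)) {
    return "PROXY " + VIP + ":" + PINNED_HOSTS[host];
  }
  // 2. Forced-proxy rules, which override the direct rules below.
  for (i = 0; i < PROXIED_HOSTS.length; i++) {
    if (host === PROXIED_HOSTS[i]) return DEFAULT_CHAIN;
  }
  for (i = 0; i < PROXIED_DOMAINS.length; i++) {
    if (inDomain(host, PROXIED_DOMAINS[i])) return DEFAULT_CHAIN;
  }
  // 3. Direct-access hosts, domains, and networks.
  for (i = 0; i < DIRECT_HOSTS.length; i++) {
    if (host === DIRECT_HOSTS[i]) return "DIRECT";
  }
  for (i = 0; i < DIRECT_DOMAINS.length; i++) {
    if (inDomain(host, DIRECT_DOMAINS[i])) return "DIRECT";
  }
  for (i = 0; i < DIRECT_NETS.length; i++) {
    if (inNet(host, DIRECT_NETS[i][0], DIRECT_NETS[i][1])) return "DIRECT";
  }
  // 4. Everything else is proxied.
  return DEFAULT_CHAIN;
}
```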

the bulk of my access will be proxied, and i return the VIP on port 8080
as the primary proxy, and then ports 8081, 8082, etc as secondary,
tertiary, and so on.  that way the browser will always get all possible
avenues for access, should something be wrong with one or more of the
VIPs.  what i am not sure of is whether HAProxy will reply with a RST
when no pool members are available for a given VIP/pool.  we have this
setup at work on the F5s, and i'm not sure if i have it in HAProxy (or
if i can do it at all).
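
As a rough illustration of the return value described above, the chain can
be generated from the VIP name, the base port, and the number of proxies.
The helper name buildProxyChain is hypothetical. Browsers try the entries
of a PAC result in order, so the shared pool on the base port comes first
and the single-proxy ports act as fallbacks.

```javascript
// Build "PROXY vip:base; PROXY vip:base+1; ..." for nodeCount proxies.
// Offset 0 is the pool with all members; offsets 1..nodeCount are the
// single-member pools.
function buildProxyChain(vipHost, basePort, nodeCount) {
  var entries = [];
  for (var offset = 0; offset <= nodeCount; offset++) {
    entries.push("PROXY " + vipHost + ":" + (basePort + offset));
  }
  return entries.join("; ");
}

// Two proxies behind the VIP, as in the ports mentioned above.
console.log(buildProxyChain("proxy.bpk2.com", 8080, 2));
```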

i would suggest that if you use a pac/wpad solution, you look into
pactester, which is a google summer of code project that executes pac
files and provides output indicating what actions would be returned to
the browser, given a URL.  so, with my setup if i call pactester and
give it http://www.google.com, it returns to me:

PROXY proxy.bpk2.com:8080; PROXY proxy.bpk2.com:8081; PROXY proxy.bpk2.com:8082

if i call pactester with http://www.bpk2.com, it returns to me:

DIRECT

with a bit of scripting and a couple of files with URLs in them, i can
quickly evaluate my proxy script, validate the logic and perform a
rudimentary syntax and punctuation check on any changes i make to the
script.
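
The same kind of batch check can also be sketched without pactester, using
Node.js to evaluate the PAC text against a table of URLs and expected
answers. This is a substitute technique, not the pactester workflow
described above. The inlined PAC source, the single stubbed helper
(dnsDomainIs), and the function names loadPac and checkPac are all
assumptions made for the example; a real PAC file may call other helpers
that would need stubs as well.

```javascript
const vm = require("vm");

const CHAIN =
  "PROXY proxy.bpk2.com:8080; PROXY proxy.bpk2.com:8081; PROXY proxy.bpk2.com:8082";

// Hypothetical miniature PAC file, inlined so the sketch is self-contained.
// In practice the text would be read from the real wpad.dat.
const pacSource = `
function FindProxyForURL(url, host) {
  if (dnsDomainIs(host, ".bpk2.com")) return "DIRECT";
  return "${CHAIN}";
}`;

// Load the PAC text into a fresh context. A syntax error in the PAC text
// throws here, which doubles as a rudimentary syntax check.
function loadPac(source) {
  const context = vm.createContext({
    // Minimal stand-in for one browser-supplied PAC helper.
    dnsDomainIs: (host, domain) => host.endsWith(domain),
  });
  vm.runInContext(source, context);
  return (url) => context.FindProxyForURL(url, new URL(url).hostname);
}

// Return the cases whose actual answer differs from the expected one.
function checkPac(source, cases) {
  const evaluate = loadPac(source);
  return cases
    .map(([url, expected]) => ({ url, expected, actual: evaluate(url) }))
    .filter((result) => result.actual !== result.expected);
}

const cases = [
  ["http://www.google.com", CHAIN],
  ["http://www.bpk2.com", "DIRECT"],
];
const failures = checkPac(pacSource, cases);
console.log(failures.length === 0 ? "all cases passed" : failures);
```

Keeping the URL and expected-answer pairs in one table makes it cheap to
rerun the whole set after every change to the script.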