[squid-users] Centralized Squid - design and implementation

Brendan Kearney bpk678 at gmail.com
Sun Nov 16 16:51:43 UTC 2014


On Sun, 2014-11-16 at 17:22 +0100, Kinkie wrote:
> On Sun, Nov 16, 2014 at 4:54 PM, alberto <alberto.furia at gmail.com> wrote:
> > Hello everyone,
> > first of all, thanks to the Squid community for such a great job.
> 
> Hello Alberto,
> 
> [...]
> 
> > I have some questions that I would like to share with you:
> >
> > 1. I would like to leave the solution we are using now (WPAD
> > balancing). In a situation like the one I have described, with
> > centralized Squid serving the spokes/branches, what is the best
> > solution for clustering/HA? If one of the centralized nodes were to
> > "die", I would like client machines not to remain "hanging" but to
> > continue working on an active node without disruption. Would a
> > hierarchy of proxies be the solution?
> 
> If you want to maximize the efficiency of your balancing solution, you
> probably want a slightly different approach: instead of using the
> client IP as the hashing key, hash on the destination host.
> e.g. have a PAC file like this (untested, and to be adjusted):
> 
> function FindProxyForURL(url, host) {
>    var dest_ip = dnsResolve(host);
>    // dnsResolve() returns null on lookup failure; prefer proxy1 then
>    if (!dest_ip)
>      return "PROXY local_proxy1:port; PROXY local_proxy2:port; DIRECT";
>    // hash on the final digit of the destination IP
>    var dest_hash = parseInt(dest_ip.slice(-1), 10) % 2;
>    if (dest_hash)
>      return "PROXY local_proxy1:port; PROXY local_proxy2:port; DIRECT";
>    return "PROXY local_proxy2:port; PROXY local_proxy1:port; DIRECT";
> }
> This will balance on the final digit of the destination IP of the
> service. The downside is that it requires DNS lookups by the clients,
> and that if the primary local proxy fails, it takes a few seconds (up
> to 30) for clients to give up and fail over to the secondary.
> 
> The local proxies can then either go directly to the origin server
> (for intranet destinations) or use a balancing mechanism such as CARP
> (see the documentation for the cache_peer directive in Squid) to
> maximize efficiency, especially for Internet destinations.
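> 
> A rough sketch of what that could look like in squid.conf on each
> local proxy (untested; peer and domain names are placeholders):
> 
> # distribute requests across the parent caches using the CARP hash;
> # load is redistributed automatically if a parent dies
> cache_peer parent1.example.net parent 3128 0 carp
> cache_peer parent2.example.net parent 3128 0 carp
> # go direct for intranet destinations, via the parents otherwise
> acl intranet dstdomain .example.internal
> always_direct allow intranet
> never_direct allow all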
> 
> The only single point of failure at the HTTP level in this design is
> the PAC file server; it will be up to you to make that reliable.
> 
> > 2. Bearing in mind that all users will be AD authenticated, which URL
> > filtering/blacklist solution do you suggest?
> > In the past I have worked a lot with SquidGuard and DansGuardian, but
> > they no longer seem to be the state of the art.
> > I've been thinking about two different solutions:
> >   2a. Use native Squid ACLs with the squidblacklist.org lists
> > (http://www.squidblacklist.org/)
> >   2b. Use urlfilterdb (http://www.urlfilterdb.com/products/overview.html)
> 
> I don't know, sorry.
> 
> > 3. Which GNU/Linux distro do you suggest? I was thinking about Debian
> > Jessie (just frozen) or CentOS 7.
> 
> http://wiki.squid-cache.org/BestOsForSquid
> 

I have all my Squid instances (only two right now) share their caches
as siblings (3128 is each peer's HTTP port, 4827 its HTCP port):

cache_peer 192.168.25.1 sibling 3128    4827    htcp=no-clr
cache_peer 192.168.50.1 sibling 3128    4827    htcp=no-clr

This allows anything already cached to be served from the local cache
or from a sibling instead of from the Internet, and the sibling cache
is very likely to be faster than the Internet.

I use HAProxy to load balance based on the least number of connections
per pool member. Since the caches are shared, I don't need to pin a
client or request to any particular proxy, either permanently or for a
period of time. With HAProxy, I only see a couple of seconds of
interruption when one proxy goes offline, which is generally trivial
from the end user's perspective. I have it log when instances go
offline or come back online, and the stats web interface is handy for
quickly checking status.
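
As a rough illustration (not my actual config; the bind addresses are
made up, the pool member IPs are the two Squid instances above), the
relevant haproxy.cfg pieces would look something like this:

# balance TCP connections across the two Squid instances,
# always picking the member with the fewest open connections
listen squid_pool
    bind *:3128
    mode tcp
    balance leastconn
    server proxy1 192.168.25.1:3128 check   # health-checked members
    server proxy2 192.168.50.1:3128 check

# the stats web interface mentioned above
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats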

While I don't have any suggestions about which filtering option to use,
I will note that the DansGuardian versions I have found are only
HTTP/1.0 compliant, so you are likely losing gzip compression at the
protocol layer, and caching is likely affected, too.


