[squid-users] load balancing and site failover

Thu Mar 26 00:53:01 UTC 2015

On 26/03/2015 10:26 a.m., Brendan Kearney wrote:
> On Wed, 2015-03-25 at 15:03 +1300, Amos Jeffries wrote:
>> On 25/03/2015 9:55 a.m., brendan kearney wrote:
>>> Was not sure if bugzilla was used for mailing list issues.  If you would
>>> like me to open one, I will but it looks like the list is working again.
>>
>> Bugzilla is used, list bugs under the "project services" product.
>>
>>
>> As for your query...
>>
>>> On Mar 24, 2015 2:25 PM, "Brendan Kearney" wrote:
>>>
>>>> On Tue, 2015-03-24 at 10:18 -0400, Brendan Kearney wrote:
>>>>> while load balancing is not a requirement in a proxy environment, it
>>>>> does afford a great deal of functionality, scaling and fault tolerance
>>>>> in one.  several if not many on this list probably employ them for their
>>>>> proxies and likely other technologies, but they are not all created
>>>>> equal.
>>>>>
>>>>> i recently looked to see if a specific feature was in HAProxy.  i was
>>>>> looking to see if HAProxy could reply to a new connection with a RST
>>>>> packet if no pool member was available.
>>>>>
>>>>> the idea behind this is, if all of the proxies are not passing the
>>>>> service check and are marked down by the load balancer, the reply of a
>>>>> RST in the TCP handshake (i.e. SYN -> RST, not SYN -> SYN/ACK -> ACK)
>>>>> tells the browser to failover to the next proxy assigned by the PAC
>>>>> file.
>>>>>
>>>>> where i work, we have this configuration working.  the load balancers
>>>>> are configured with the option to send a reset when no proxy is
>>>>> available in the pool.  the PAC file assigns all 4 of the proxy VIPs in
>>>>> a specific order based on which proxy VIP is assigned as the primary.
>>>>> In every case, if the primary VIP does not have an available pool
>>>>> member, the browser fails over to the next in the list.  failover would
>>>>> happen again, if the secondary VIP replies with a RST during the
>>>>> connection establishing.  the process repeats until a TCP connection
>>>>> establishes or all proxies assigned have been exhausted.  the browser
>>>>> will use the proxy VIP that it successfully connects to, for the
>>>>> duration of the session.  once the browser is closed and reopened, the
>>>>> evaluation of the PAC file occurs again, and the process starts anew.
>>>>> plug-ins such as Proxy Selector are the exception to this, and can be
>>>>> used to reevaluate a PAC file by selecting it for use.
>>>>>
>>>>> we have used this configuration several times, when we found an ISP link
>>>>> was flapping or some other issue more global in nature than just the
>>>>> proxies was affecting our egress and internet access.  i can attest to
>>>>> the solution as working and elegantly handling site wide failures.
>>>>>
>>>>> being that the solutions where i work are proprietary commercial
>>>>> products, i wanted to find an open source product that does this.  i
>>>>> have been a long time user of HAProxy, and have recommended it for
>>>>> others here, but sadly they cannot perform this function.  per their
>>>>> mailing list, they use the network stack of the OS for connection
>>>>> establishment and cannot cause a RST to be sent to the client during a
>>>>> TCP handshake if no pool member is available.
>>>>>
>>>>> they suggested an external helper that manipulates IPTables rules based
>>>>> on a pool member being available.  they do not feel that a feature like
>>>>> this belongs in a layer 4/7 reverse proxy application.
>>
>> They are right. HTTP != TCP.
> i didn't confuse that detail.  it was unknown to me that HAProxy could
> not tie layer 7 status to layer 3/4 actions.  the decisions they made
> and how they architected the app is why they cannot do this, not that it
> is technically impossible to do it.  i may be spoiled because i work
> with equipment that can do this for me.
>>
>> In particular TCP depends on routers having a full routing map of the
>> entire Internet (provided by BGP) and deciding the best upstream hop
>> based on that global info. Clients have one (and only one) upstream
>> router for each server they want to connect to.
> i will contest this.  my router does not need a full BGP map to route
> traffic locally on my LAN or remotely out its WAN interface.  hell, it
> does not even run BGP, and i can still get to the intarwebs, no problem.
> it too, only has one upstream router / default route.

Then your router has more in common with a proxy than usual. It's operating
with a next-hop packet relay model (OSPF? MPLS?) rather than an
end-to-end model (BGP with RIB/FIB).

>>
>> In HTTP each proxy (aka router) performs independent upstream connection
>> attempts, failover, and verifies it worked before responding to the
>> client with a final response. Each proxy only has enough detail to check
>> its upstream(s). Each proxy can connect to any server (subject to ACLs).
> how are you comparing a HTTP proxy (a layer 7 application) to a router
> (a layer 3 device)?  routers route traffic and proxies proxy traffic.

Meanwhile, routers proxy TCP packets and proxies route HTTP messages.

It's the behaviour abstraction I'm talking about here.
[if you don't want to dive into theory, skip to the end of this mail]

The algorithms each is capable of are the same, despite differences in
the details of the layer and in which mechanisms are optimal for their
protocol.
DNS recursive resolvers are also a type of caching proxy, and SMTP email
servers too. Of the surviving "old" protocols only FTP seems not to have
proxies as an integral part of the design.

> very different functions.  routers don't look past a certain point in the
> headers in order to make decisions on where to send the traffic.

Same for proxies. I'm sure you're aware that Squid doesn't look into HTTP
payloads, just as routers don't look into TCP payloads.

> proxies look all the way to the end of the headers and sometimes into
> the payload, too.

No, proxies don't look into payloads. Content filters do that bit instead
of the proxy.
Just like routers can be enhanced by DPI systems doing payload
inspection at their level.

In both systems the DPI / content filter is where they cross over into
firewall activity.

>  proxies are more akin to a protocol specific
> firewall.

You have no firewall ability built into your router?

>  proxies also don't send the incoming traffic out an interface.

A layer-3/4 interface is equivalent to a layer-5/7 socket in the abstract.

> they terminate the client session, and initiate a new session on behalf
> of the client.

not necessarily, that is the architectural difference between HTTP and
TCP. SOCKS proxies and TCP have much more in common in architectural design.

What the proxies do is terminate the *TCP* session. Obviously all
non-TCP protocols will do that.
The HTTP layer is more in line with UDP here than TCP. Although at the
layer just above HTTP there is the "browsing session" concept
implemented with cookies/auth etc., which is retained across both TCP
layer connections. For cases where TCP-level sessions need to be
emulated there is connection pinning in the proxy.

>  simply because the proxy can elect how to send a request
> it is making on behalf of a client, does not make a proxy a router.

The names are terminology for the activity performed. TCP routers
normally do not re-select the destination endpoint, just the path.
The proxy selects the actual endpoint destination.

What you can configure each to do crosses over and you can specifically
configure a proxy to become a router, and vice versa. Doing it though
screws with their normal and optimal behaviour. A router becomes like a
proxy by turning on NAT - and we probably both know the hell that
causes. Making a proxy do strictly router behaviour has a similar range
of nasty side effects.

Yes each one (proxy vs router) is designed to work at different levels,
in different ways. But how you use them determines what they do.

In particular this is the difference which your request is asking us to
provide a way to disable: making the proxy return TCP RST if a
particular endpoint is not available, despite other endpoints
potentially being usable.
Making the proxy behave like a router.

>  the
> fact that one connection is terminated and a new one is initiated rules
> out a proxy from being any kind of router, in my opinion.

That rules out all devices containing NAT functionality or LB software
from being routers.

>  even with SSL
> or the CONNECT Method, the connection is still made by the proxy to the
> remote server.  the client never makes a connection to the remote
> server, therefore the traffic was not routed.  it was proxied.

By definition the CONNECT method is not a proxy. In particular it is the
request that the proxy stop being an HTTP proxy and become a tunnel / TCP
relay.
It is roughly equivalent to a router sending packets down a tunnel / VPN
or SOCKS interface (the proxy is the outgoing end of the tunnel/VPN).

>>
>>>>>
>>>>> my search for a load balancer solution went through ipvsadm, balance and
>>>>> haproxy before i selected haproxy.  haproxy was more feature rich than
>>>>> balance, and easier to implement than ipvsadm.  do any other list
>>>>> members have a need for such a feature from their load balancers?  do
>>>>> any other list members have site failover solutions that have been
>>>>> tested or used and would consider sharing their design and/or pain
>>>>> points?  i am not looking for secret sauce or confidential info, but
>>>>> more high level architecture decisions and such.
>>
>>
>> I haven't tested it but this should do what you are asking:
>>
>>  acl err http_status 500-505 408
>>  deny_info TCP_RESET err
>>  http_reply_access deny err
>>
>> It replaces the response from Squid with a TCP RST packet.
> this is useful in the case that the proxy is alive and well, but cannot
> get to the internet.  in my example, the ISP issue would seem to be
> covered, though i am not sure how the actual implementation would go.
> the client has a TCP session established with the load balancer, which
> gets the full SYN -> SYN/ACK -> ACK treatment.  the load balancer would
> get the SYN -> RST from the proxy, and presumably sends the RST back to
> the client.  While this does seem to hold up logically, the
> implementation may have nuances that have to be dealt with.  Does the
> RST in the middle of an established TCP session cause the browser to
> failover to the next proxy assigned?  i would have to test that out.

Nod.
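
One way to explore that question outside a browser is to compare how a
client sees a RST that arrives after the handshake has completed with one
that arrives during it. The following is only a minimal Python sketch under
stated assumptions: the function names are hypothetical, the SO_LINGER
packing assumes a POSIX-style struct linger, and a browser may well react
differently from a plain socket client.

```python
import socket
import struct
import threading


def reset_after_accept(listener):
    """Accept one connection, then abort it so the peer receives a RST."""
    conn, _ = listener.accept()
    # l_onoff=1, l_linger=0: close() aborts the connection instead of
    # doing the normal FIN shutdown (POSIX struct linger layout assumed).
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()


def observe_reset(address, timeout=2.0):
    """Report at which stage the client notices the connection failing."""
    try:
        sock = socket.create_connection(address, timeout=timeout)
    except ConnectionRefusedError:
        # RST in reply to the SYN: the handshake never completed.
        return "refused during handshake"
    with sock:
        try:
            data = sock.recv(1)
        except ConnectionResetError:
            # RST on an already established connection.
            return "reset after handshake"
    return "closed cleanly" if data == b"" else "data received"
```

To try it, bind a listening socket on 127.0.0.1, run reset_after_accept()
on it in a thread, and call observe_reset() with the listener's address.
The two failures surface as different exceptions at different stages, which
is why failover on a mid-session RST cannot be assumed from failover on a
refused handshake.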

> 
> now, what about the case where the proxies are not alive and well behind
> the load balancer, and they are not able to reply with a RST?  This is
> the scenario that i would want the load balancer to be able to manage.
> this is where tying a layer 7 status to a layer 3/4 action on the load
> balancer becomes relevant.  then, the ability for the load balancer to
> do this negates the need to manage this in the proxy layer, and removes
> any nuances that may be encountered with the implementation.

That is a normal TCP error case. The same thing happens if there is any
network level outage, or if packets are destined to a non-existent IP
range. It depends on the LB software what will be done about it.
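
For illustration, the client-side failover this thread relies on (try each
proxy VIP in PAC order, move on when the handshake fails) can be sketched
roughly as below. This is a minimal Python sketch, not how any browser is
actually implemented: first_reachable() is a hypothetical name and the
2-second timeout is an arbitrary assumption.

```python
import socket


def first_reachable(proxies, timeout=2.0):
    """Return the first (host, port) that completes a TCP handshake.

    proxies is an ordered list of (host, port) pairs, like the ordered
    proxy list a PAC file hands back. Returns None if all of them fail.
    """
    for host, port in proxies:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except ConnectionRefusedError:
            # SYN answered with RST: fail over to the next entry at once.
            continue
        except OSError:
            # No answer at all (timeout, unreachable): failover still
            # happens, but only after waiting out the timeout.
            continue
    return None
```

The two except branches are where the RST matters: a refused connection
fails over immediately, while a silent failure costs the full timeout for
every dead entry ahead of a working one.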

Amos