[squid-users] load balancing and site failover

Thu Mar 26 22:22:31 UTC 2015

On Thu, 2015-03-26 at 13:53 +1300, Amos Jeffries wrote:
> On 26/03/2015 10:26 a.m., Brendan Kearney wrote:
> > On Wed, 2015-03-25 at 15:03 +1300, Amos Jeffries wrote:
> >> On 25/03/2015 9:55 a.m., brendan kearney wrote:
> >>> Was not sure if bugzilla was used for mailing list issues.  If you would
> >>> like me to open one, I will but it looks like the list is working again.
> >>
> >> Bugzilla is used, list bugs under the "project services" product.
> >>
> >>
> >> As for your query...
> >>
> >>> On Mar 24, 2015 2:25 PM, "Brendan Kearney" wrote:
> >>>
> >>>> On Tue, 2015-03-24 at 10:18 -0400, Brendan Kearney wrote:
> >>>>> while load balancing is not a requirement in a proxy environment, it
> >>>>> does afford a great deal of functionality, scaling and fault tolerance
> >>>>> in one.  several if not many on this list probably employ them for their
> >>>>> proxies and likely other technologies, but they are not all created
> >>>>> equal.
> >>>>>
> >>>>> i recently looked to see if a specific feature was in HAProxy.  i was
> >>>>> looking to see if HAProxy could reply to a new connection with a RST
> >>>>> packet if no pool member was available.
> >>>>>
> >>>>> the idea behind this is, if all of the proxies are not passing the
> >>>>> service check and are marked down by the load balancer, the reply of a
> >>>>> RST in the TCP handshake (i.e. SYN -> RST, not SYN -> SYN/ACK -> ACK)
> >>>>> tells the browser to failover to the next proxy assigned by the PAC
> >>>>> file.
> >>>>>
> >>>>> where i work, we have this configuration working.  the load balancers
> >>>>> are configured with the option to send a reset when no proxy is
> >>>>> available in the pool.  the PAC file assigns all 4 of the proxy VIPs in
> >>>>> a specific order based on which proxy VIP is assigned as the primary.
> >>>>> In every case, if the primary VIP does not have an available pool
> >>>>> member, the browser fails over to the next in the list.  failover would
> >>>>> happen again, if the secondary VIP replies with a RST during the
> >>>>> connection establishing.  the process repeats until a TCP connection
> >>>>> establishes or all proxies assigned have been exhausted.  the browser
> >>>>> will use the proxy VIP that it successfully connects to, for the
> >>>>> duration of the session.  once the browser is closed and reopened, the
> >>>>> evaluation of the PAC file occurs again, and the process starts anew.
> >>>>> plug-ins such as Proxy Selector are the exception to this, and can be
> >>>>> used to reevaluate a PAC file by selecting it for use.
> >>>>>
> >>>>> we have used this configuration several times, when we found an ISP link
> >>>>> was flapping or some other issue more global in nature than just the
> >>>>> proxies was affecting our egress and internet access.  i can attest to
> >>>>> the solution as working and elegantly handling site wide failures.
> >>>>>
> >>>>> being that the solutions where i work are proprietary commercial
> >>>>> products, i wanted to find an open source product that does this.  i
> >>>>> have been a long time user of HAProxy, and have recommended it for
> >>>>> others here, but sadly they cannot perform this function.  per their
> >>>>> mailing list, they use the network stack of the OS for connection
> >>>>> establishment and cannot cause a RST to be sent to the client during a
> >>>>> TCP handshake if no pool member is available.
> >>>>>
> >>>>> they suggested an external helper that manipulates IPTables rules based
> >>>>> on a pool member being available.  they do not feel that a feature like
> >>>>> this belongs in a layer 4/7 reverse proxy application.
> >>
> >> They are right. HTTP != TCP.
> > i didnt confuse that detail.  it was unknown to me that HAProxy could
> > not tie layer 7 status to layer 3/4 actions.  the decisions they made
> > and how they architected the app is why they cannot do this, not that it
> > is technically impossible to do it.  i may be spoiled because i work
> > with equipment that can do this for me.
> >>
> >> In particular TCP depends on routers having a full routing map of the
> >> entire Internet (provided by BGP) and deciding the best upstream hop
> >> based on that global info. Clients have one (and only one) upstream
> >> router for each server they want to connect to.
> > i will contest this.  my router does not need a full BGP map to route
> > traffic locally on my LAN or remotely out its WAN interface.  hell, it
> > does not even run BGP, and i can still get to the intarwebs, no problem.
> > it too, only has one upstream router / default route.
> 
> Then your router has more in common with a proxy than usual. It's operating
> with a next-hop packet relay model (OSPF? MPLS?) rather than an
> end-to-end model (BGP with RIB/FIB).
DOCSIS 2 -> ethernet on the WAN side and locally connected on the LAN
side. :D  oh, and a static route pointing a /24 for vpn traffic to a
specific device.
> 
> >>
> >> In HTTP each proxy (aka router) performs independent upstream connection
> >> attempts, failover, and verifies it worked before responding to the
> >> client with a final response. Each proxy only has enough detail to check
> >> its upstream(s). Each proxy can connect to any server (subject to ACLs).
> > how are you comparing a HTTP proxy (a layer 7 application) to a router
> > (a layer 3 device)?  routers route traffic and proxies proxy traffic.
> 
> while, routers proxy TCP packets and proxies route HTTP messages.
> 
> Its the behaviour abstraction I'm talking about here.
> [if you dont want to dive into theory skip to the end of this mail]
> 
> The algorithms each are capable of are the same despite differences in
> details of layer and what designed mechanisms are optimal for their
> protocol.
> DNS recursive resolvers are also a type of caching proxy, and SMTP email
> servers too. Of the surviving "old" protocols only FTP seems not to have
> proxies as an integral part of the design.
recursive DNS queries, yes i can see that as being proxied.  SMTP gets a
bit gray.  MUA -> MRA -> MTA, store and forward, ehh.  you can make a
convincing argument for or against, but the "on behalf of" piece is
there.  FTP and the PORT command (and subsequent bounce attack) could be
considered a proxied connection, but i dont think that was an intended
design.
> 
> 
> > very different functions.  routers dont look past a certain point in the
> > headers in order to make decisions on where to send the traffic.
> 
> Same for proxies. I'm sure you're aware that Squid doesn't look into HTTP
> payloads same as routers dont look into TCP payloads.
> 
> > proxies look all the way to the end of the headers and sometimes into
> > the payload, too.
> 
> No, proxies dont look into payloads. Content filters do that bit instead
> of the proxy.
> Just like routers can be enhanced by DPI systems doing payload
> inspection at their level.
> 
> In both systems the DPI / content filter is where they cross over into
> firewall activity.
i sometimes forget that i have two technologies bolted together on a
"proxy appliance" and that is why i misstated that.  yes, proxies dont
inspect payload.  content inspection systems do that.
> 
> >  proxies are more akin to a protocol specific
> > firewall.
> 
> You have no firewall ability built into your router?
from a puritanical perspective, no.  routing decisions are based on
layer 3 info.  ports are layer 4 info.  a true, layer 3 only router
would not make a good firewall at all.  that is not to say you cannot
enforce a security posture with a router, as part of an overall
strategy.
> 
> >  proxies also dont send the incoming traffic out an interface.
> 
> layer-3/4 interface is equivalent to layer-5/7 socket in abstract.
you're missing the point.  routers dont accept an incoming connection, and
initiate a separate outgoing connection, in order to facilitate the
end-to-end conversation.  proxies do.
> 
> > they terminate the client session, and initiate a new session on behalf
> > of the client.
> 
> not necessarily, that is the architectural difference between HTTP and
> TCP. SOCKS proxies and TCP have much more in common in architectural design.
> 
> What the proxies do is terminate the *TCP* session. Obviously all
> non-TCP protocols will do that.
> The HTTP layer is more inline with UDP here than TCP. Although at the
> layer just above HTTP there is the "browsing session" concept
> implemented with cookies/auth etc which is retained across both TCP
> layer connections. For cases where TCP-level sessions need to be
> emulated there is connection pinning in the proxy.
socks proxies are a different animal outside the scope of what i
intended to discuss.

http can be considered almost stateless, like udp, but it is still a
connection oriented protocol.  the sessions are generally short and
chatty.  these qualities are why CDNs work and can use anycast for load
distribution.
> 
> 
> >  simply because the proxy can elect how to send a request
> > it is making on behalf of a client, does not make a proxy a router.
> 
> The names are terminology for the activity performed. TCP routers
> normally do not re-select the destination endpoint, just the path.
> The proxy selects the actual endpoint destination.
> 
> What you can configure each to do crosses over and you can specifically
> configure a proxy to become a router, and vice versa. Doing it though
> screws with their normal and optimal behaviour. Router becomes like a
> proxy by turning on NAT - and we probably both know the hell that
> causes. Making a proxy do strictly router behaviour has a similar range
> of nasty side effects.
um, no.  NAT does not cause a connection to be proxied.  NAT is a header
rewrite operation, and the connection is from endpoint to endpoint.  it
is not proxied, where the router terminates the incoming connection and
initiates a new outgoing connection to facilitate the conversation.
> 
> Yes each one (proxy vs router) is designed to work at different levels,
> in different ways. But how you use them determines what they do.
> 
> In particular this is the difference which your request is asking us to
> provide a way to disable. Making the proxy return TCP RST if a
> particular endpoint is not available - despite other endpoints
> potentially being usable.
>  Making the proxy behave like a router.
i am looking for the load balancer to return the RST.  i asked what
others here may be doing in that space, if any are doing something.  you
provided an example of how to send a RST from a proxy when a proxy
cannot fulfill the request.
> 
> 
> >  the
> > fact that one connection is terminated and a new one is initiated rules
> > out a proxy from being any kind of router, in my opinion.
> 
> That rules out all devices containing NAT functionality or LB software
> from being routers.
no, again NAT does not result in a terminated incoming connection and a
new outgoing connection.  Load balancing does do this as it is TCP
proxying of connections.  in many cases load balancers that do TCP
proxying are intelligent and aware of TCP based protocols such as SMTP,
LDAP, HTTP, etc.  it is this intelligence that i was hoping could be
tied to the TCP proxying piece, so as to reply with the RST on the
frontend when no backend server is available.
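
for illustration, a minimal sketch of the external helper idea mentioned
earlier in the thread: when every pool member is marked down, install a
rule that answers new SYNs on the VIP port with a RST, and remove it once
any member recovers.  this is only a sketch under assumptions.  the
function names, the backend names and port 3128 are hypothetical, the
health states are assumed to come from whatever check is already in
place, and the code only builds the iptables command line without
running it.

```python
from typing import Dict, List, Optional


def reset_rule_action(backend_up: Dict[str, bool], rule_installed: bool) -> Optional[str]:
    """Decide whether the reject rule must be inserted, deleted, or left alone.

    backend_up maps a (hypothetical) backend name to its health state.
    """
    any_up = any(backend_up.values())
    if not any_up and not rule_installed:
        return "insert"   # whole pool is down: start resetting new connections
    if any_up and rule_installed:
        return "delete"   # at least one member is back: accept connections again
    return None           # rule is already in the right state


def iptables_command(action: str, vip_port: int) -> List[str]:
    """Build (but do not run) the iptables command for the given action."""
    flag = {"insert": "-I", "delete": "-D"}[action]
    return [
        "iptables", flag, "INPUT",
        "-p", "tcp", "--dport", str(vip_port), "--syn",
        "-j", "REJECT", "--reject-with", "tcp-reset",
    ]


if __name__ == "__main__":
    state = {"proxy1": False, "proxy2": False}  # assumed health check results
    action = reset_rule_action(state, rule_installed=False)
    if action is not None:
        print(" ".join(iptables_command(action, 3128)))
```

matching only SYN packets means established sessions are left alone and
only new connection attempts get the reset, which is the SYN -> RST
behaviour described above.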
> 
> >  even with SSL
> > or the CONNECT Method, the connection is still made by the proxy to the
> > remote server.  the client never makes a connection to the remote
> > server, therefore the traffic was not routed.  it was proxied.
> 
> By definition CONNECT method is not a proxy. In particular it is the
> request that the proxy stop being a HTTP proxy and become a tunnel / TCP
> relay.
even the CONNECT Method results in one incoming connection being
terminated, and a new outgoing connection being initiated.  that is
proxying.
> It is roughly equivalent to a router sending packets down a tunnel / VPN
> or SOCKS interface (the proxy is the outgoing end of the tunnel/VPN).
no it isnt, because the router is not making the connection.  it's
facilitating the connection. again no in and out connections.  the
endpoint sending the SYN is what is seen on the other end of the VPN as
making the connection.  while the packet may or may not be NAT'ed, the
connection is endpoint to endpoint.
> 
> 
> 
> >>
> >>>>>
> >>>>> my search for a load balancer solution went through ipvsadm, balance and
> >>>>> haproxy before i selected haproxy.  haproxy was more feature rich than
> >>>>> balance, and easier to implement than ipvsadm.  do any other list
> >>>>> members have a need for such a feature from their load balancers?  do
> >>>>> any other list members have site failover solutions that have been
> >>>>> tested or used and would consider sharing their design and/or pain
> >>>>> points?  i am not looking for secret sauce or confidential info, but
> >>>>> more high level architecture decisions and such.
> >>
> >>
> >> I havent tested it but this should do what you are asking:
> >>
> >>  acl err http_status 500-505 408
> >>  deny_info TCP_RESET err
> >>  http_reply_access deny err
> >>
> >> It replaces the response from Squid with a TCP RST packet.
> > this is useful in the case that the proxy is alive and well, but cannot
> > get to the internet.  in my example, the ISP issue would seem to be
> > covered, though i am not sure how the actual implementation would go.
> > the client has a TCP session established with the load balancer, which
> > gets the full SYN -> SYN/ACK -> ACK treatment.  the load balancer would
> > get the SYN -> RST from the proxy, and presumably sends the RST back to
> > the client.  While this does seem to hold up logically, the
> > implementation may have nuances that have to be dealt with.  Does the
> > RST in the middle of an established TCP session cause the browser to
> > failover to the next proxy assigned?  i would have to test that out.
> 
> Nod.
> 
> > 
> > now, what about the case where the proxies are not alive and well behind
> > the load balancer, and they are not able to reply with a RST?  This is
> > the scenario that i would want the load balancer to be able to manage.
> > this is where tying a layer 7 status to a layer 3/4 action on the load
> > balancer becomes relevant.  then, the ability for the load balancer to
> > do this negates the need to manage this in the proxy layer, and removes
> > any nuances that may be encountered with the implementation.
> 
> 
> That is a normal TCP error case. Same things happen if there is any
> network level outage, or packets destined to a non-existent IP range. It
> depends on the LB software what will be done about it.
> 
> Amos
>
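
The client-side failover described at the top of the thread (a RST during
the handshake moves the client on to the next proxy in the PAC list) can
be approximated outside a browser for testing.  The sketch below is an
assumption-laden illustration, not how any browser implements PAC
failover: the function name and the VIP host names are hypothetical, and
a timeout is treated the same as a reset.

```python
import socket
from typing import Iterable, Optional, Tuple

Address = Tuple[str, int]


def first_reachable(proxies: Iterable[Address], timeout: float = 2.0) -> Optional[Address]:
    """Return the first proxy that completes a TCP handshake, or None."""
    for host, port in proxies:
        try:
            # create_connection performs the full SYN -> SYN/ACK -> ACK handshake.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except ConnectionRefusedError:
            # The peer answered the SYN with a RST: move to the next proxy.
            continue
        except OSError:
            # Timeouts, unreachable hosts and DNS failures also trigger failover here.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical VIP names, in the order a PAC file might list them.
    vips = [("vip1.example.com", 3128), ("vip2.example.com", 3128)]
    print(first_reachable(vips))
```

Pointing it at a VIP whose pool is down shows whether the load balancer
answers the SYN with a RST (fast failover) or lets the connection hang
until the timeout.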