[squid-users] parent peer timeout (Amos Jeffries)

Tue Nov 21 16:00:17 UTC 2017

Hi Amos, thanks for taking the time to analyze this.

>Are you actually terminating the peer, or just simulating it some other way?
My method of testing is shutting down the service on the parent "192.168.1.1" with "/etc/init.d/squid stop". With the service stopped there are no remaining active connections and no new ones are being established; all I see are TCP RST responses.
It seems there is a TCP timer that is not configurable, because of the time it takes to notice the dead peer:
> 2017/11/20 22:55:02| Ready to serve requests.
> 2017/11/20 22:55:03| storeLateRelease: released 0 objects
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| Detected DEAD Parent: 192.168.1.1
My objective is to configure dead peer detection based only on TCP connection failures. Can this be achieved?

Do I need to allow a specific type of traffic with "cache_peer_access" statements for dead peer detection to happen? If I comment those lines out, dead peer detection works, but I need them enabled so I can filter what traffic those parent peers accept.

regards,
ignacio

On 21/11/17 14:09, Ignacio Freyre wrote:
> Hi guys, I have a simple configuration that I'm testing with 2 parent proxies for a specific domain: if parent proxy 192.168.1.1 fails, fail over to proxy 192.168.1.2.
> I have a couple of questions:
> 1) Having configured "connect-timeout=3" and "connect-fail-limit=2", failover takes about 2 minutes. How can I reduce failover time?

Are you actually terminating the peer, or just simulating it some other way?

The behaviour you are seeing is what will happen for the particular
error you are causing. I suspect you are only simulating a firewall
rule table overload (i.e. the firewall suddenly stops allowing *new*
connections) instead of an actual peer machine disconnect or shutdown.

The connect-timeout=3 option makes *new* TCP connections signal failure if
the SYN+ACK takes more than 3 seconds to return. Otherwise it is a
successful connect.
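
As a rough annotated sketch, this is how those two options read on the cache_peer lines from the quoted configuration. The comments paraphrase the squid.conf documentation and do not change the config itself:

```
# connect-timeout=3    : a new TCP connection attempt to this peer counts as
#                        failed if it is not established within 3 seconds
# connect-fail-limit=2 : the peer is marked dead after 2 failed connection
#                        attempts
cache_peer 192.168.1.1 parent 3128 0 no-query no-digest connect-timeout=3 connect-fail-limit=2
cache_peer 192.168.1.2 parent 3128 0 no-query no-digest connect-timeout=3 connect-fail-limit=2
```

Both options only come into play when Squid actually attempts a new connection to the peer.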

Added to that, Squid is HTTP/1.1 software these days, which means it uses
persistent connections and pipelining to reduce the need for new TCP
connections at all. So that type of network failure may have zero effect on
the proxy<->peer communications, exactly as intended by the HTTP/1.1 design.

> 2) If I enable cache_peer_access statements, failover never happens because the peers don't get detected as dead

With no-query and no-digest you disabled the features (ICP queries and
cache digests) used as primary methods of detecting dead peers.
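
For illustration only, a sketch of the same peers with ICP queries left enabled. It assumes that each parent listens for ICP on UDP port 3130 and permits queries from this proxy through its icp_access rules, and that this proxy also has an ICP port open. None of that appears in the original configuration, so treat it all as assumptions:

```
# Hypothetical variant: open a local ICP port, set the peers' ICP port to
# 3130, and drop no-query so ICP replies (or their absence) can feed dead
# peer detection.
icp_port 3130

# no-digest is kept here; cache digests need digest support on both sides.
cache_peer 192.168.1.1 parent 3128 3130 no-digest connect-timeout=3 connect-fail-limit=2
cache_peer 192.168.1.2 parent 3128 3130 no-digest connect-timeout=3 connect-fail-limit=2
```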

Additionally, restricting traffic with cache_peer_access removes
further hints that would otherwise come from HTTP and TCP traffic.

It is hard to say how those two things are impacting your proxy's peer
selection logic, since it is also complicated by the things mentioned
above about #1.

> 
> #CONFIGURATION START
> #hostname
> visible_hostname testing
> 
> #parent proxy's
> cache_peer 192.168.1.1 parent 3128 0 no-query no-digest connect-timeout=3 connect-fail-limit=2
> cache_peer 192.168.1.2 parent 3128 0 no-query no-digest connect-timeout=3 connect-fail-limit=2
> 
> #send traffic to peers
> acl foo_url url_regex site\.domain\.com
> never_direct allow foo_url

Regex is the second slowest ACL type around. To match a domain,
generally use the dstdomain ACL type instead.
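
A minimal sketch of the same rules using dstdomain, assuming the intent is to match the single host site.domain.com. The ACL name foo_dom is made up for this example:

```
# dstdomain compares the destination host name of the request;
# no regex matching against the full URL is involved.
acl foo_dom dstdomain site.domain.com
never_direct allow foo_dom

cache_peer_access 192.168.1.1 deny !foo_dom
cache_peer_access 192.168.1.2 deny !foo_dom
```

Note that the url_regex version matches the pattern anywhere in the URL, including the path, so the two are not strictly equivalent. A leading dot (.domain.com) would make dstdomain match subdomains as well.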

> 
> #peer access
> cache_peer_access 192.168.1.1 deny !foo_url
> cache_peer_access 192.168.1.2 deny !foo_url
> 
> #allow all for testing purposes
> http_access allow all
> 

Not a good idea even for testing purposes. If there is a problem with
your intended http_access rules, that needs solving before anything else
can be properly investigated, since what the proxy is allowed to handle
affects what can happen for outbound attempts.
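
As a sketch of a tighter test setup, assuming the test clients live in 192.168.1.0/24. Both the address range and the ACL name test_clients are assumptions, not taken from the original configuration:

```
# Allow only the (assumed) test client network, deny everything else.
acl test_clients src 192.168.1.0/24
http_access allow test_clients
http_access deny all
```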

> # Squid normally listens to port 3128
> http_port 3128
> 
> # Leave coredumps in the first cache dir
> coredump_dir /var/spool/squid
> 
> # Add any of your own refresh_pattern entries above these.
> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
> refresh_pattern .               0       20%     4320
> #CONFIGURATION END
> 
> LOGS that I see when peer is detected as dead
> 2017/11/20 22:55:02| Ready to serve requests.
> 2017/11/20 22:55:03| storeLateRelease: released 0 objects
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| TCP connection to 192.168.1.1/3128 failed
> 2017/11/20 22:56:55| Detected DEAD Parent: 192.168.1.1
> 

Configure "debug_options 28,3" to see the peer selection results.
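
In squid.conf that could look like the sketch below. ALL,1 keeps every other section at the default level. The commented-out line adds section 44, which should be the peer selection section; verify that number against the debug sections list shipped with your Squid version before relying on it:

```
debug_options ALL,1 28,3

# Possible alternative, section number to be verified for your version:
# debug_options ALL,1 28,3 44,3
```

The extra output goes to cache.log once the configuration is reloaded, for example with "squid -k reconfigure".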

Amos