[squid-users] gateway failure

Amos Jeffries squid3 at treenet.co.nz
Tue Sep 5 10:22:01 UTC 2017


On 05/09/17 21:31, Vieri wrote:
> Hi,
> 
> I'm sometimes getting hit by ERR_GATEWAY_FAILURE. I'd like to know what could be causing this issue.
> When this happens on a production server, I don't have much time to investigate.
> I usually only have enough time to ssh into the squid server, test internet access via command line, and before I know it, the issue's gone.
> 

Squid generates GATEWAY_FAILURE when a URL-redirector/rewriter is not 
responding or when TLS handshakes fail.

If it is the crypto issues, that is exactly the kind of thing which your 
SSH connection will not be able to get through either, so of course the 
problem is already gone by the time that TCP + encryption succeeds.


> Nothing much in cache.log. I have debug_options rotate=1 ALL,1. I'd rather not set ALL,9 on a production system for something that happens maybe only once every 2 or 3 days.
> I'm not sure however which sections and levels to set so I can get an idea as to why I'm getting ERR_GATEWAY_FAILURE.
> 
> https://wiki.squid-cache.org/KnowledgeBase/DebugSections
> 
> Any suggestions?

In the absence of ALL,9 (or ALL,6) you will have to work your way through 
the list of components involved with upstream server connections and 
then any components you are using that can slow Squid down in general or 
periodically.

DNS, Comm, and TLS levels - and also things like Digest creation, store 
rebuild, and cache replacement policy actions. Unfortunately most of 
those are major components used all the time, so not much better than 
ALL,6 in terms of log output.


Main focus obviously is on the domain/server(s) whose URL hit the issue, 
but anything else could be impacting the transaction latency so it is by 
no means certain to be that server.


I would start with DNS to see if the results are coming back fast 
enough. With the latest Squid you will also have to check all the 
permutations of DNS response ordering and timing, since the "Happy 
Eyeballs" algorithms can mean Squid is only working with partial DNS 
results and failing when the IPs in that incomplete set all belong to 
broken servers.
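
As a rough way to watch DNS timing from the Squid machine, something like 
the Python sketch below could be run periodically against the problem 
domain(s). Note the assumptions: it uses the operating system resolver 
via socket.getaddrinfo, not Squid's internal DNS client, so it can only 
hint at slow or partial answers. The function names are made up for this 
illustration.

```python
import socket
import time


def time_dns_lookup(host, port=443, resolver=socket.getaddrinfo):
    """Resolve host and return (elapsed_seconds, sorted unique IPs).

    resolver is injectable so the function can be exercised without
    network access; by default the OS resolver is used.
    """
    start = time.monotonic()
    try:
        results = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        results = []  # a failed lookup is reported as an empty IP set
    elapsed = time.monotonic() - start
    ips = sorted({entry[4][0] for entry in results})
    return elapsed, ips


def split_by_family(ips):
    """Split textual addresses into (IPv4 list, IPv6 list)."""
    v4 = [ip for ip in ips if ":" not in ip]
    v6 = [ip for ip in ips if ":" in ip]
    return v4, v6


if __name__ == "__main__":
    # example.com is a placeholder; substitute the domain that hit the error.
    elapsed, ips = time_dns_lookup("example.com")
    v4, v6 = split_by_family(ips)
    print(f"{elapsed:.3f}s A={v4} AAAA={v6}")
```

Comparing the IPv4 and IPv6 lists over several runs shows whether one 
address family is sometimes missing or slow to arrive, which is the 
partial-result situation described above.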


Then check for ICMPv4/v6 issues on the route(s) between Squid and all 
the server IPs. A lot of networks still have ICMP disabled fully or 
partially in ways which can break route recovery. Lack of ICMP is how 
temporary router power spikes etc. halfway across the Internet can kill 
traffic on your network for brief times.
  On the one hand these ICMP issues are not temporary (though Squid may 
only hit them if trying certain IPs), on the other it is not something 
that can be tested or logged from inside Squid. You will need to set up 
some sort of monitor to watch the servers Squid connects to - maybe a 
trigger to automatically check anything that results in the gateway 
error being logged, before you can log in manually.
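
A minimal sketch of such a trigger, in Python, might look like the 
following. It rests on several assumptions: that each relevant log line 
contains the literal text ERR_GATEWAY_FAILURE (the default access.log 
format may not include it, so a custom log format may be needed), that 
the request URL or CONNECT host:port appears as a whitespace-separated 
token, and that a plain TCP connect is an acceptable stand-in probe. A 
TCP connect does not test ICMP itself; it only records whether the 
server was reachable from the Squid machine at that moment. All function 
names are hypothetical.

```python
import socket
import time
from urllib.parse import urlsplit

ERROR_MARKER = "ERR_GATEWAY_FAILURE"  # assumed to appear literally in the log


def extract_host(token):
    """Return the hostname in a URL or host:port token, else None."""
    if "://" in token:
        return urlsplit(token).hostname
    host, sep, port = token.rpartition(":")
    if not (sep and host and port.isdigit()):
        return None
    if host.startswith("[") and host.endswith("]"):
        return host[1:-1]  # bracketed IPv6 literal
    # A bare IPv6 address (e.g. a client IP) is not a host:port pair.
    return host if ":" not in host else None


def hosts_with_gateway_failure(lines):
    """Collect unique hosts, in order, from lines carrying the error marker."""
    hosts = []
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        for token in line.split():
            host = extract_host(token)
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def probe_tcp(ip, port=443, timeout=3.0):
    """Attempt a TCP connect; return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start
```

Feeding newly appended log lines to hosts_with_gateway_failure() and then 
calling probe_tcp() for every IP of each returned host gives a timestamped 
record taken close to the moment of failure, rather than minutes later.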


Next on the line would be TLS handshake behaviours with all IPs of the 
problem server(s). That is easier to test after the fact, but don't take 
success as a guarantee. It could still be a temporary failure in the 
handshake.
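
For the after-the-fact test, a small Python sketch along these lines 
could attempt a handshake against every resolved IP of a host. It is 
only an approximation: it uses Python's ssl module with default 
settings, so the TLS library, cipher list and trust store may differ 
from what Squid uses, and the function names are invented for this 
example.

```python
import socket
import ssl


def resolve_all(host, port=443):
    """Return every unique IP the OS resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def tls_handshake(ip, host, port=443, timeout=5.0):
    """Connect to one IP and attempt a TLS handshake using host for SNI.

    Returns (True, negotiated_protocol) on success, or
    (False, error_description) on any connect or handshake failure.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, f"{type(exc).__name__}: {exc}"


def check_host(host, port=443):
    """Handshake with each IP of host; return {ip: (ok, detail)}."""
    return {ip: tls_handshake(ip, host, port) for ip in resolve_all(host, port)}


if __name__ == "__main__":
    # example.com is a placeholder for the problem server.
    for ip, (ok, detail) in check_host("example.com").items():
        print(ip, "OK" if ok else "FAIL", detail)
```

Testing each IP separately matters because a name-based test picks only 
one address and can succeed while another IP of the same server is broken.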

From there it is pot-luck, hoping there are some clues lying about to 
hint at good directions.

Amos

