[squid-users] logging: TCP connection to x.x.x.x/3128 failed

Mon Apr 3 00:55:06 UTC 2023

On 3/30/23 07:58, Waldemar Brodkorb wrote:

> we recently updated one of our stage 1 proxies to Ubuntu 22.04 with
> Squid 5.8. The setup is like so:
> clients <-> loadbalancer <-> stage 1 proxies <-> stage 2 proxies <-> internet
> 
> Now the cache.log on the stage 1 proxy is polluted with a lot of
> messages like: TCP connection to x.x.x.x/3128 failed
> 
> The messages appear when a client tries to connect to a website which
> does not resolve via DNS. The parent stage 2 proxy then sends a
> 50x error and the stage 1 proxy logs messages like the above for each
> of the six stage 2 proxies.

> How can we suppress these messages 

I do not think you can suppress these messages without changing Squid 
source code.

Please note that when Squid considers the connection to a cache_peer
failed, it decrements the "up" counter for that peer. If that counter
reaches zero, the peer will be marked as dead. The counter starts at N,
the value of the cache_peer connect-fail-limit=N option. The counter is
restored to N when a connection succeeds, so you may not normally see a
dead peer for this specific reason, but that is just luck.
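
To illustrate the counter logic described above, here is a minimal
Python sketch. It is a toy model, not Squid code: the class and method
names are made up for illustration, and the limit of 3 is an arbitrary
demo value rather than any Squid default.

```python
class PeerHealth:
    """Toy model of the per-peer "up" counter; names are illustrative, not Squid's."""

    def __init__(self, connect_fail_limit):
        self.connect_fail_limit = connect_fail_limit  # N in connect-fail-limit=N
        self.up = connect_fail_limit  # the counter starts at N

    def connection_failed(self):
        # Each failure attributed to the peer decrements the counter.
        if self.up > 0:
            self.up -= 1

    def connection_succeeded(self):
        # Any success restores the counter to N.
        self.up = self.connect_fail_limit

    @property
    def dead(self):
        # The peer is considered dead once the counter reaches zero.
        return self.up == 0


peer = PeerHealth(connect_fail_limit=3)  # 3 is an arbitrary value for the demo
peer.connection_failed()
peer.connection_failed()
print(peer.up, peer.dead)   # 1 False
peer.connection_succeeded()
print(peer.up, peer.dead)   # 3 False
for _ in range(3):
    peer.connection_failed()
print(peer.up, peer.dead)   # 0 True
```

In this model, a peer stays alive only as long as some successful
connection arrives before N consecutive failures accumulate.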

> or can they be fixed to what is really happening?

I suspect any fixes would require changing Squid sources. Squid v6 
commit 022dbab might be helpful here, at least as an inspiration (it 
does not apply to v5 cleanly and it focuses on 4xx peer responses while 
your messages are related to 5xx peer responses).

Squid v6 has a cache_log_message directive that can be used to suppress
some level-1 errors, but the error you are writing about is not one of
them, and, more importantly, I would not recommend suppressing it (see
the "Please note..." paragraph above for the discussion of this error's
potential importance). That directive is not available in v5.

Ultimately, what you are describing sounds like a Squid bug to me: Squid
should not blame its cache_peer for request-target DNS resolution errors
outside that peer's control. Fixing that bug can be controversial because
an admin may want Squid to automatically start bypassing a cache_peer 
that is, for example, misconfigured and cannot resolve _any_ 
request-targets. One could argue that detection (and bypass) of (such) 
problematic cache_peers should be done differently, but it is unknown 
whether others would agree with any specific logic changes in that area.

For example, the proposal to add an ACL-driven cache_peer_fault 
directive[1] (to give the admin more control over alive/dead decisions) 
was rejected as "overkill"[2], preserving the approach that relies on 
hard-coding decisions (including commit 022dbab fixes mentioned above).

[1] https://github.com/squid-cache/squid/blob/25431f18f2f5e796b8704c85fc51f93b6cc2a73d/src/cf.data.pre#L4019

[2] https://github.com/squid-cache/squid/pull/1166#issuecomment-1295806530

Going forward, one can try to argue that HTTP 502 cache_peer responses
should never be treated as that cache_peer's fault (as far as alive/dead
decisions are concerned) despite the fact that some 502 responses may be
related to cache_peer problems rather than basic DNS errors. That would
be a straightforward change, especially in v6. If that attempt fails,
one could also try to re-introduce the cache_peer_fault directive to let
the admin decide.

https://wiki.squid-cache.org/SquidFaq/AboutSquid#how-to-add-a-new-squid-feature-enhance-of-fix-something
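
The 502 idea above can be sketched as a small decision function. This
is a hypothetical Python illustration only: the function name and the
exempt_502 flag are invented here, and treating every other 5xx
response as a peer fault is an assumption based on the behavior
reported in this thread, not a statement about the actual Squid code.

```python
from http import HTTPStatus


def counts_as_peer_fault(status, exempt_502):
    """Hypothetical policy: should this peer response hurt the peer's "up" counter?"""
    # Proposed change: a 502 from the peer is never blamed on the peer.
    if exempt_502 and status == HTTPStatus.BAD_GATEWAY:
        return False
    # Assumption: any 5xx peer response counts against the peer.
    return 500 <= status <= 599


for code in (200, 502, 503):
    print(code, counts_as_peer_fault(code, False), counts_as_peer_fault(code, True))
```

The trade-off is the one noted above: with the exemption, a peer that
answers 502 to everything because of its own problems would no longer
be detected as dead through this path.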

Cheers,

Alex.