[squid-users] 6.x gives frequent connection to peer failed - spurious?

Stephen Borrill squid at borrill.org.uk
Wed Nov 22 10:15:13 UTC 2023


On 21/11/2023 15:55, Alex Rousskov wrote:
> On 2023-11-21 08:38, Stephen Borrill wrote:
>> On 15/11/2023 21:55, Alex Rousskov wrote:
>>> On 2023-11-10 05:46, Stephen Borrill wrote:
>>>
>>>> With 6.x (currently 6.5) there are very frequent (every 10 seconds 
>>>> or so) messages like:
>>>> 2023/11/10 10:25:43 kid1| ERROR: Connection to 127.0.0.1:8123 failed
>>>
>>>
>>>> why is this logged as a connection failure
>>>
>>> The current error wording is too assuming and, in your case, 
>>> evidently misleading. The phrase "Connection to X failed" should be 
>>> changed to something more general like "Cannot contact cache_peer X" 
>>> or "Cannot communicate with cache_peer X".
>>>
>>> CachePeer::countFailure() patches welcome.
> 
>> But the point is that it _can_ communicate with the peer, but the peer 
>> itself can't service the request. The peer returning 503 shouldn't be 
>> logged as a connection failure
> 
> 
> My bad. I missed the fact that the described DNS error happens at a 
> _peer_ Squid. Sorry.
> 
> 
> Currently, Squid v6 treats most CONNECT-to-peer errors as a sign of a 
> broken peer. In 2022, 4xx errors were excluded from that set[1]. At that 
> time, we also proposed to make that decision configurable using a new 
> cache_peer_fault directive[2], but the new directive was blocked as an 
> "overkill"[3], so we hard-coded 4xx exclusion instead.
> 
> Going forward, you have several options, including these two:
> 
> 1. Convince others that Squid should treat all 503 CONNECT errors from 
> peers as it already treats all 4xx errors. Hard-code that new logic.
> 
> 2. Convince others that cache_peer_fault or a similar directive is a 
> good idea rather than an overkill. Resurrect its implementation[2].
> 
> 
> [1] 
> https://github.com/squid-cache/squid/commit/022dbabd89249f839d1861aa87c1ab9e1a008a47
> 
> [2] 
> https://github.com/squid-cache/squid/commit/25431f18f2f5e796b8704c85fc51f93b6cc2a73d
> 
> [3] https://github.com/squid-cache/squid/pull/1166#issuecomment-1295806530


2) seems sensible. Especially in the case where you have a single 
cache_peer and cannot go direct. No benefit to marking it as dead.

However, I'm currently running with 1) as per below and this stops the 
non-existent domains counting against the peer (which surely opens it to 
a DoS attack):

--- src/CachePeer.cc.orig       2023-11-22 08:30:17.524266325 +0000
+++ src/CachePeer.cc    2023-11-22 08:31:05.394052184 +0000
@@ -71,7 +71,7 @@
  void
  CachePeer::noteFailure(const Http::StatusCode code)
  {
-    if (Http::Is4xx(code))
+    if (Http::Is4xx(code) || code == Http::scServiceUnavailable)
          return; // this failure is not our fault

      countFailure();


>>>  > do I need to worry about it beyond it filing up the logs needlessly?
>>>
>>> In short, "yes".
>>>
>>> I cannot accurately assess your specific needs, but, in most 
>>> environments, one should indeed worry that their cache_peer server 
>>> names cannot be reliably resolved because failed resolution attempts 
>>> waste Squid resources and increase transaction response time. 
>>> Moreover, if these failures are frequent enough (relative to peer 
>>> usage attempts), the affected cache_peer will be marked as DEAD (as 
>>> you have mentioned):
>>>
>>>  > 2023/11/09 08:55:22 kid1| Detected DEAD Parent: 127.0.0.1:8123
>>
>> Problem seems to be easily reproducible:
>>
>> 1# env https_proxy=http://127.0.0.1:8084 curl https://www.invalid.domain/
>> curl: (56) CONNECT tunnel failed, response 503
>> 2# grep invalid /usr/local/squid/logs/access.log|tail -1
>> 1700573429.015      4 127.0.0.1:8084 TCP_TUNNEL/503 0 CONNECT 
>> www.invalid.domain:443 - FIRSTUP_PARENT/127.0.0.1:8123 -
>> 3# date -r 1700573429 '+%Y/%m/%d %H:%M:%S'
>> 2023/11/21 13:30:29
>> 4# grep '2023/11/21 13:30:29' /usr/local/squid/logs/cache.log
>> 2023/11/21 13:30:29 kid1| ERROR: Connection to 127.0.0.1:8123 failed
>>
>>>> With 4.x there were no such messages.
>>>>
>>>> By comparing to the peer squid logs, these seems to tally with DNS 
>>>> failures:
>>>> peer_select.cc(479) resolveSelected: PeerSelector1688 found all 0 
>>>> destinations for bugzilla.tucasi.com:443
>>>>
>>>> Full ALL,2 log at the time of the reported connection failure:
>>>>
>>>> 2023/11/10 10:25:43.162 kid1| 5,2| TcpAcceptor.cc(214) doAccept: New 
>>>> connection on FD 17
>>>> 2023/11/10 10:25:43.162 kid1| 5,2| TcpAcceptor.cc(316) acceptNext: 
>>>> connection on conn3 local=127.0.0.1:8123 remote=[::] FD 17 flags=9
>>>> 2023/11/10 10:25:43.162 kid1| 11,2| client_side.cc(1332) 
>>>> parseHttpRequest: HTTP Client conn13206 local=127.0.0.1:8123 
>>>> remote=127.0.0.1:57843 FD 147 flags=1
>>>> 2023/11/10 10:25:43.162 kid1| 11,2| client_side.cc(1336) 
>>>> parseHttpRequest: HTTP Client REQUEST:
>>>> 2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(707) 
>>>> clientAccessCheckDone: The request CONNECT bugzilla.tucasi.com:443 
>>>> is ALLOWED; last ACL checked: localhost
>>>> 2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(683) 
>>>> clientAccessCheck2: No adapted_http_access configuration. default: 
>>>> ALLOW
>>>> 2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(707) 
>>>> clientAccessCheckDone: The request CONNECT bugzilla.tucasi.com:443 
>>>> is ALLOWED; last ACL checked: localhost
>>>> 2023/11/10 10:25:43.162 kid1| 44,2| peer_select.cc(460) 
>>>> resolveSelected: Find IP destination for: bugzilla.tucasi.com:443' 
>>>> via bugzilla.tucasi.com
>>>> 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(479) 
>>>> resolveSelected: PeerSelector1526 found all 0 destinations for 
>>>> bugzilla.tucasi.com:443
>>>> 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(480) 
>>>> resolveSelected:    always_direct = ALLOWED
>>>> 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(481) 
>>>> resolveSelected:     never_direct = DENIED
>>>> 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(482) 
>>>> resolveSelected:         timedout = 0
>>>> 2023/11/10 10:25:43.163 kid1| 4,2| errorpage.cc(1397) buildBody: No 
>>>> existing error page language negotiated for ERR_DNS_FAIL. Using 
>>>> default error file.
>>>> 2023/11/10 10:25:43.163 kid1| 33,2| client_side.cc(617) swanSong: 
>>>> conn13206 local=127.0.0.1:8123 remote=127.0.0.1:57843 flags=1
>>>>
>>>> If my analysis is correct why is this logged as a connection failure 
>>>> and do I need to worry about it beyond it filing up the logs 
>>>> needlessly?
>>>>
>>>> My concern is that this could lead to the parent being incorrectly 
>>>> declared DEAD thus impacting other traffic:
>>>>
>>>> 2023/11/09 08:55:22 kid1| Detected DEAD Parent: 127.0.0.1:8123
>>>>      current master transaction: master4581234
>>>>
>>>
> 



More information about the squid-users mailing list