[squid-users] IPv4 addresses go missing - markAsBad wrong?

Stephen Borrill squid at borrill.org.uk
Tue Jan 9 10:56:16 UTC 2024


On 09/01/2024 09:51, Stephen Borrill wrote:
> On 09/01/2024 03:41, Alex Rousskov wrote:
>> On 2024-01-08 08:31, Stephen Borrill wrote:
>>> I'm trying to determine why squid 6.x (seen with 6.5) connected via 
>>> IPv4-only periodically fails to connect to the destination and then 
>>> requires a restart to fix it (reload is not sufficient).
>>>
>>> The problem appears to be that a host that has one address each of 
>>> IPv4 and IPv6 occasionally has its IPv4 address go missing as a 
>>> destination. On closer inspection, this appears to happen when the 
>>> IPv6 address (not the IPv4) address is marked as bad. A log fragment 
>>> is as follows:
>>>
>>> 2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(460) 
>>> resolveSelected: Find IP destination for: 
>>> clientservices.googleapis.com:443' via clientservices.googleapis.com
>>> 2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(1174) handlePath: 
>>> PeerSelector82284 found conn696198 local=0.0.0.0 
>>> remote=142.250.187.227:443 HIER_DIRECT flags=1, destination #1 for 
>>> clientservices.googleapis.com:443
>>> 2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(1174) handlePath: 
>>> PeerSelector82284 found conn696199 local=[::] 
>>> remote=[2a00:1450:4009:820::2003]:443 HIER_DIRECT flags=1, 
>>> destination #2 for clientservices.googleapis.com:443
>>> 2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(479) 
>>> resolveSelected: PeerSelector82284 found all 2 destinations for 
>>> clientservices.googleapis.com:443
>>> 2024/01/08 13:18:40.245 kid1| 14,2| ipcache.cc(1031) markAsBad: 
>>> [2a00:1450:4009:820::2003]:443 of clientservices.googleapis.com
>>> 2024/01/08 13:18:40.245 kid1| 14,3| ipcache.cc(946) seekNewGood: 
>>> succeeded for clientservices.googleapis.com: 
>>> [2a00:1450:4009:820::2003] #2/2-1
>>> 2024/01/08 13:18:40.245 kid1| 14,3| ipcache.cc(978) restoreGoodness: 
>>> cleared all IPs for clientservices.googleapis.com; now back to 
>>> [2a00:1450:4009:820::2003] #2/2-1
>>> 2024/01/08 13:18:42.065 kid1| 14,3| Address.cc(389) lookupHostIP: 
>>> Given Non-IP 'clientservices.googleapis.com': hostname or servname 
>>> not provided or not known
>>> 2024/01/08 13:18:42.065 kid1| 44,2| peer_select.cc(460) 
>>> resolveSelected: Find IP destination for: 
>>> clientservices.googleapis.com:443' via clientservices.googleapis.com
>>> 2024/01/08 13:18:42.065 kid1| 14,3| Address.cc(389) lookupHostIP: 
>>> Given Non-IP 'clientservices.googleapis.com': hostname or servname 
>>> not provided or not known
>>> 2024/01/08 13:18:42.065 kid1| 44,2| peer_select.cc(1174) handlePath: 
>>> PeerSelector82372 found conn697148 local=[::] 
>>> remote=[2a00:1450:4009:820::2003]:443 HIER_DIRECT flags=1, 
>>> destination #1 for clientservices.googleapis.com:443
>>> 2024/01/08 13:18:42.065 kid1| 44,2| peer_select.cc(479) 
>>> resolveSelected: PeerSelector82372 found all 1 destinations for 
>>> clientservices.googleapis.com:443
>>>
>>>
>>> This shows two subsequent connection attempts to 
>>> clientservices.googleapis.com. The first one has both IPv4 and IPv6 
>>> destinations. The IPv6 address is passed to markAsBad. 
>>
>> Yes.
>>
>>
>>> After that the IPv4 address is not listed as a destination.
>>
>> I do not see that. I see IPv6 address being selected as the first 
>> destination (instead of the IPv4 address).
> 
> My reading of the the log files is that all possible destinations are 
> listed, e.g. "found all 18 destinations for www.googleapis.com:443". 
> Prior to markAsBad being called both IP addresses for 
> clientservices.googleapis.com are shown, afterwards the IPv4 is not 
> listed. It isn't just that IPv6 is being selected as the first 
> destination, it is the ONLY destination as the IPv4 address is no longer 
> a candidate.
> 
> 
>> I cannot explain why that happens though. Moreover, a combination of 
>> certain lines in your debug output near "seekNewGood" do not make 
>> sense to me -- I do not see how it is possible for Squid to display 
>> those exact debugging details, but I am probably missing something. 
>> Can you retest and repost similar lines with 14,9 (or at least 14,7) 
>> added to your debug_options (or share those lines privately; the more 
>> lines you can share, the better)?
> 
> Thanks, I have started a new run with 14,7 and will pass it on privately 
> unless I can distil out the relevant lines to be small enough for the list.

A distilled version is here. Let me know if you want the full log.

This looks suspicious to me:
have: [2001:4860:4802:32::78]:443 at 0 in 216.239.38.120 #1/2-0

2024/01/09 09:34:34.102 kid1| 14,4| ipcache.cc(617) nbgethostbyname: 
forcesafesearch.google.com
2024/01/09 09:34:34.102 kid1| 14,4| ipcache.cc(657) 
ipcache_nbgethostbyname_: ipcache_nbgethostbyname: HIT for 
'forcesafesearch.google.com'
2024/01/09 09:34:34.102 kid1| 14,7| ipcache.cc(253) forwardIp: 
216.239.38.120
2024/01/09 09:34:34.102 kid1| 44,2| peer_select.cc(1174) handlePath: 
PeerSelector128957 found conn1010793 local=0.0.0.0 
remote=216.239.38.120:443 HIER_DIRECT flags=1, destination #1 for 
forcesafesearch.google.com:443
2024/01/09 09:34:34.102 kid1| 14,7| ipcache.cc(253) forwardIp: 
[2001:4860:4802:32::78]
2024/01/09 09:34:34.102 kid1| 44,2| peer_select.cc(1174) handlePath: 
PeerSelector128957 found conn1010794 local=[::] 
remote=[2001:4860:4802:32::78]:443 HIER_DIRECT flags=1, destination #2 
for forcesafesearch.google.com:443
2024/01/09 09:34:34.102 kid1| 14,7| ipcache.cc(236) finalCallback: 
0x18f816d38
2024/01/09 09:34:34.102 kid1| 44,2| peer_select.cc(479) resolveSelected: 
PeerSelector128957 found all 2 destinations for 
forcesafesearch.google.com:443
2024/01/09 09:34:34.352 kid1| 14,7| ipcache.cc(990) have: 
[2001:4860:4802:32::78]:443 at 0 in 216.239.38.120 #1/2-0
2024/01/09 09:34:34.352 kid1| 14,2| ipcache.cc(1031) markAsBad: 
[2001:4860:4802:32::78]:443 of forcesafesearch.google.com
2024/01/09 09:34:34.352 kid1| 14,3| ipcache.cc(946) seekNewGood: 
succeeded for forcesafesearch.google.com: [2001:4860:4802:32::78] #2/2-1
2024/01/09 09:34:34.352 kid1| 14,3| ipcache.cc(978) restoreGoodness: 
cleared all IPs for forcesafesearch.google.com; now back to 
[2001:4860:4802:32::78] #2/2-1

2024/01/09 09:34:35.683 kid1| 14,4| ipcache.cc(617) nbgethostbyname: 
forcesafesearch.google.com
2024/01/09 09:34:35.683 kid1| 14,3| Address.cc(389) lookupHostIP: Given 
Non-IP 'forcesafesearch.google.com': hostname or servname not provided 
or not known
2024/01/09 09:34:35.683 kid1| 14,4| ipcache.cc(657) 
ipcache_nbgethostbyname_: ipcache_nbgethostbyname: HIT for 
'forcesafesearch.google.com'
2024/01/09 09:34:35.683 kid1| 14,7| ipcache.cc(253) forwardIp: 
[2001:4860:4802:32::78]
2024/01/09 09:34:35.683 kid1| 44,2| peer_select.cc(1174) handlePath: 
PeerSelector128974 found conn1010956 local=[::] 
remote=[2001:4860:4802:32::78]:443 HIER_DIRECT flags=1, destination #1 
for forcesafesearch.google.com:443
2024/01/09 09:34:35.683 kid1| 14,7| ipcache.cc(236) finalCallback: 
0x18f816d38
2024/01/09 09:34:35.683 kid1| 44,2| peer_select.cc(479) resolveSelected: 
PeerSelector128974 found all 1 destinations for 
forcesafesearch.google.com:443
2024/01/09 09:34:35.683 kid1| 14,7| ipcache.cc(990) have: 
[2001:4860:4802:32::78]:443 at 0 in [2001:4860:4802:32::78] #2/2-1


>>> Note that there have been many connections to 
>>> clientservices.googleapis.com prior to this where markAsBad was not 
>>> called, even though IPv6 connectivity was never available.
>>
>> No markAsBad() is probably normal if Squid did not try to establish an 
>> IPv6 connection or did not wait long enough to know the result of that 
>> attempt. However, that does not explain why Squid selected an IPv6 
>> address as the next "good" address right after marking that IPv6 
>> address as bad (at "restoreGoodness" line) when there was another good 
>> IP address available. It is as if Squid stored two identical IPv6 
>> addresses (and not IPv4 ones), but that should not happen either.
> 
> This is tangentially related to this thread too:
> https://lists.squid-cache.org/pipermail/squid-users/2023-November/026266.html
> 
> Once only the IPv6 address is being used, then it returns 503 for that 
> host and thus can quickly get marked as dead by a downstream squid 
> meaning it does not get used at all (and if it's the only peer all 
> access stops).
> 



More information about the squid-users mailing list