[squid-users] BUG 3556

Stephen Borrill squid at borrill.org.uk
Thu Oct 15 14:06:04 UTC 2020


On 15/10/2020 14:59, Alex Rousskov wrote:
> On 10/15/20 4:07 AM, Stephen Borrill wrote:
>> At a few installations of squid 4.12 (patched for GREASE) on NetBSD
>> 9, I'm seeing that occasionally one of the listening ports no longer 
>> accepts connections (it doesn't reject them, but a connection does
>> not get established).> The port appears random; it's not the same
>> every time and isn't related to ports with SSL interception. A
>> restart of squid fixes it.
> 
>> Looking through the logs, this appears to coincide with lines such as:
>>
>> 2020/10/14 22:32:16 kid1| ERROR: getsockname() failed to locate local-IP
>> on local=[::] remote=10.0.106.147:61996 FD 25 flags=1: (22) Invalid argument
>> 2020/10/14 22:32:16 kid1| BUG 3556: FD 25 is not an open socket.
>>
>> This looks similar to Alex Rousskov's recent observations:
>> https://bugs.squid-cache.org/show_bug.cgi?id=3556#c15
> 
> Please keep in mind that those "BUG 3556" messages warn us about Squid
> bugs elsewhere/somewhere in Squid code. For each particular message
> instance, the exact bug is unknown a priori, and several different bugs
> have triggered these messages in the past. While the original bug 3556
> report was for a specific bug, the log messages were not (and are not).
> 
> 
>> However, we have also seen with at sites where there is no SSL
>> interception (the above lines are from such an installation).
> 
>> 1) Am I right that triggering BUG 3556 could lead to the described symptoms?
> 
> I would rephrase this as "Failure to obtain the (intended) IP address of
> an (intercepted) connection leads to BUG 3556 messages."
> 
> 
>> 2) Is this a squid bug or triggered by a problem in the underlying OS?
> 
> Those "BUG 3556" messages indicate a Squid bug. There is no question
> about that. However, the ERROR messages may indicate a Squid
> bug/deficiency and/or an environment (OS configuration, etc.) problem.
> In summary, you are dealing with multiple problems here. You should
> focus on the ERROR messages, not "BUG 3556" messages.
> 
> 
>> If the latter, where to start looking?
> 
> Check system log for errors. Perhaps you are exhausting some system
> resource?

That's always my first port of call anyway, there are no reports.

> I would also try to map ERROR messages to client transactions in hope to
> spot some common pattern behind those failed transactions.
> Unfortunately, I do not know whether Squid (especially Squid v4) would
> log these failed transactions.
> 
> 
>> 3) What workarounds are there?
> 
> a) Monitor logs and automatically restart the Squid instance if needed.
> 
> b) Patch Squid to kill the affected process. Adding "assert(false);"
> after the ERROR message is printed in Comm::TcpAcceptor::oldAccept()
> will kill the process. Killing a single worker may or may not be enough
> in SMP mode; it would be interesting and potentially useful to know
> whether that is enough.

We aren't using SMP mode.

> You may be able to easily test your workaround using the trick I
> outlined in https://bugs.squid-cache.org/show_bug.cgi?id=3556#c15

I have also been pointed to your comment here:
https://bugs.squid-cache.org/show_bug.cgi?id=5069#c1

I'm currently testing the COMM_ERROR -> NOMESSAGE patch as bug 5069
sounds identical.

>> Given that a restart fixes it, in some
>> respects it would be better for squid to quit so it can be restarted
>> rather than continue to run in a half-working state.
> 
> Yes, earlier Squids were written using the "Do whatever you can to stay
> up" or "Damn the torpedoes!" principle. FWIW, I am pushing for reversing
> the relevant code logic to follow the "Squid instance that cannot
> provide an essential service explicitly requested by the admin should
> quit with an error" principle, but it will take time to achieve that ideal.
> 
> 
>> 4) Related to 3), are there any other ways to detect the problem when it
>> is happening besides parsing logs or testing if all ports are accepting
>> connections? This could be used to trigger an automated restart as a
>> temporary workaround.
> 
> Yes, see suggestion 3b above.
> 
> 
> HTH,
> 
> Alex.
> 



More information about the squid-users mailing list