[squid-dev] [PATCH] Bug 4430 Squid crashes on shutdown while cleaning up idle ICAP connections.

Amos Jeffries squid3 at treenet.co.nz
Wed Mar 2 01:10:33 UTC 2016


On 2/03/2016 5:21 a.m., Christos Tsantilas wrote:
> On 03/01/2016 04:36 AM, Amos Jeffries wrote:
>> On 1/03/2016 9:31 a.m., Christos Tsantilas wrote:
>>> Hi all,
>>>
>>>    Squid crashes on shutdown with the following backtrace:
>>> (gdb) bt
>>> #0  0x00000000007138d8 in commSetConnTimeout(RefCount<Comm::Connection>
>>> const&, int, RefCount<AsyncCall>&) ()
>>> #1  0x0000000000510ddf in
>>> commUnsetConnTimeout(RefCount<Comm::Connection> const&) ()
>>> #2  0x00000000006104a4 in
>>> IdleConnList::clearHandlers(RefCount<Comm::Connection> const&) ()
>>> #3  0x000000000041226b in IdleConnList::closeN(unsigned long) ()
>>> #4  0x0000000000412bf9 in IdleConnList::~IdleConnList() ()
>>> #5  0x0000000000617e9b in Adaptation::Icap::ServiceRep::~ServiceRep() ()
>>> #6  0x0000000000618579 in Adaptation::Icap::ServiceRep::~ServiceRep() ()
>>> #7  0x000000000060e79c in Adaptation::DetachServices() ()
>>> #8  0x0000000000606da0 in Adaptation::Config::freeService() ()
>>> #9  0x0000000000606e4e in Adaptation::Config::~Config() ()
>>> #10 0x001237f3e8664e29 in __run_exit_handlers (status=0,
>>> listp=0x7f3e5c9cf5a8 <__exit_funcs>,
>>>      run_list_atexit=run_list_atexit at entry=true) at exit.c:82
>>>
>>>
>>> The global Adaptation::Icap::TheConfig object is automatically
>>> destroyed when Squid exits. Its destructor destroys Icap::ServiceRep
>>> objects that, in turn, close all open connections in the idle
>>> connections pool. Since this happens after comm_exit has destroyed all
>>> Comm structures associated with those connections, Squid crases.
>>>
>>> I am attaching two patches. A simple  patch
>>> (Squid-Segfault-on-Shutdown-t2.patch) which I believe it should applied
>>> to squid-3.5, but also I am posting a second preview patch
>>> (Squid-Segfault-on-Shutdown-preview-trunk-t4.patch) with a possible long
>>> term solution.
>>>
>>> I am suggesting to commit for now the squid3.5 patch while we are
>>> discussing a long term solution for this or future similar problems for
>>> trunk.
>>>
>>
>> Okay. +1 on that 3.5 patch going in.
>>
>> For the other:
>>
>> Please split the IndependentRunner and ConnStateData parts out of the
>> main patch. It is unrelated to the ICAP bug specifics.
> 
> OK.
> However the IndepedentRunner is a solution for runner objects which can
> be destroyed outside the runners registry.
> The ICAP IdleConnList are such objects.
> 

They are that. But the ConnStateData is already present and would
benefit. The two changes can therefore be separated and tested
independently to ensure the IndepedentRunner design is okay without the
patch size explosion for IdleConnList.


>>
>> On the one hand;
>>   I am a little worried about the virtual change to finishedShutdown(). I
>> have found myself several times already mistakenly implementing that
>> hook in child Runners. The lack of virtual was what made me take a
>> second look and fix my design bug.
> 
> The finishShutdown should not be used or overwritten by kid classes.
> Maybe we can convert it to a private function.

Nod.

> 
>>   If we go ahead with this I would kind of like some other mechanism for
>> IndependentRunner to use to avoid "delete this" that requires a bit more
>> dev thought and work to make use of than simply overridding the virtual
> 
> In this case we are running the risk to make it complex and finally add
> more bugs than those we are trying to solve...


I was thinking along the lines of some private/protected flag that had
to be explicitly set by IndependentRunner (in its ctor?) to skip the
delete. The name of that flag would be part of the documentation that is
was being dangerous.
 Just something harder than defining a virtual method.


> 
>> method. Making it easier to use IndependentRunner than to duplicate its
>> purpose by accident is a good thing.
>>
>>
>> On the other hand;
>>   ConnStateData being a Runner situation was only intended to be a
>> temporary middle-term workaround until we had a global list of client
>> connections workingproperly and the comm_exit was itself turned into a
>> proper Runner.
> 
> ok.
> 
>>
>>
>> On the other hand;
>>   comm_exit action still needs to be made into a Runner. Which will make
>> its exact timing less predictable and require us to design some
>> friendlier behaviour for it in the case of still-open FD.
>>   - for a long-term solution we should aim at that happening sooner
>> rather than later.
>>   - this does not conflict with the need to fix the ICAP idle list and
>> many other FD uses.
> 
> I do not disagree with this as a general plan. What are you suggesting?


I have only got a few half-formed (aka bad) ideas at present. I keep
floating back to the idea of a Runner that tries to do some action and
if it does not succeed immediately causes itself to happen at the end of
the current runner chain.

That might mean adapting the Registry to take a success/fail result from
the hooks, or Runners de-registering and re-registering themselves. Or
havig two hooks, one being 'hopeful', and another later being absolute.
Pretty bad / rough ideas right now.

Amos



More information about the squid-dev mailing list