[squid-dev] [PATCH] No reconfiguration during shutdown

Alex Rousskov rousskov at measurement-factory.com
Fri Oct 30 20:58:14 UTC 2015


On 10/26/2015 11:24 AM, Amos Jeffries wrote:
> On 27/10/2015 5:00 a.m., Alex Rousskov wrote:
>> Hello,
>>
>>     To avoid crashes, prohibit pointless reconfiguration during shutdown.
>>
>> Also consolidated and polished signal action handling code:
>>
>> 1. For any executed action X, clear do_X at the beginning of action X
>>    code because once we start X, we should accept/queue more X
>>    requests (or inform the admin if we reject them).
>>
>> 2. Delay any action X requested during startup or reconfiguration
>>    because the latter two actions modify global state that X depends
>>    on. Inform the admin that the requested action is being delayed.
>>
>> 3. Cancel any action X requested during shutdown. We cannot run X
>>    during shutdown because shutdown modifies global state that X
>>    depends on, and we never come back from shutdown so there is no
>>    point in delaying X. Inform the admin that the requested action is
>>    canceled.
>>
>> Repeated failed attempts to fix crashes related to various overlapping
>> signal actions confirm that this code is a lot trickier than it looks.
>> This change introduces a more systematic/comprehensive approach to
>> resolving associated conflicts compared to previous ad hoc attempts.
>>
>> For example, there were several changes related to bug 3574 (trunk
>> r14354), but trunk Squid still crashes if SIGHUP is received at the
>> "wrong" time. I hope this fix will kill the remaining similar bugs or at
>> least make future fixes easier.
>>
>>     http://bugs.squid-cache.org/show_bug.cgi?id=3574
>>
> 
> +1 on this patch.
> 
> Please apply with a "--fixes squid:3574" and bug reference in the commit
> title.

Done (r14374).



>> One possible future work is to split shutdown into two states:
>>
>> * scheduled (waiting for timeout to expire; may not affect some of the
>>   signal actions) and
>> * in-progress (blocks out all other actions).
>>
>> Currently, the two states are merged into one in trunk code (there is
>> only one shutting_down global). This fix does not attempt to address
>> that deficiency. Factory does not plan to work on this in the
>> foreseeable future. Please feel free to solve this problem!
> 
> 
> I did (re-)discover that the final cycle through SignalsEngine whe
> hutdown timeout ends does indeed drain the AsyncQueue. But not wait for
> any other types of pending I/O or FD events that might appear during
> that drain. That is paving the way for the current swap.state read/write
> crashes on shutdown.

Glad you found that bug. We should fix it, but the correct fix is not
trivial because we have to both loop when stop looping at some point. I
resist the temptation to discuss specifics on this thread.


> I plan to work towards Runners doing all the shutdown handling and in
> particular hooking some components into that which are currently not
> paying any attention to shutdown termination (ie the swap.state and DNS
> sockets FD). Once that conversion is completed we shall see what remains
> that needs any async handling after timout ends.


Thank you,

Alex.



More information about the squid-dev mailing list