[squid-dev] [PATCH] No reconfiguration during shutdown

Mon Oct 26 17:24:56 UTC 2015

On 27/10/2015 5:00 a.m., Alex Rousskov wrote:
> Hello,
> 
>     To avoid crashes, prohibit pointless reconfiguration during shutdown.
> 
> Also consolidated and polished signal action handling code:
> 
> 1. For any executed action X, clear do_X at the beginning of action X
>    code because once we start X, we should accept/queue more X
>    requests (or inform the admin if we reject them).
> 
> 2. Delay any action X requested during startup or reconfiguration
>    because the latter two actions modify global state that X depends
>    on. Inform the admin that the requested action is being delayed.
> 
> 3. Cancel any action X requested during shutdown. We cannot run X
>    during shutdown because shutdown modifies global state that X
>    depends on, and we never come back from shutdown so there is no
>    point in delaying X. Inform the admin that the requested action is
>    canceled.
> 
> Repeated failed attempts to fix crashes related to various overlapping
> signal actions confirm that this code is a lot trickier than it looks.
> This change introduces a more systematic/comprehensive approach to
> resolving associated conflicts compared to previous ad hoc attempts.
> 
> For example, there were several changes related to bug 3574 (trunk
> r14354), but trunk Squid still crashes if SIGHUP is received at the
> "wrong" time. I hope this fix will kill the remaining similar bugs or at
> least make future fixes easier.
> 
>     http://bugs.squid-cache.org/show_bug.cgi?id=3574
> 

+1 on this patch.

Please apply with a "--fixes squid:3574" and bug reference in the commit
title.

> 
> One possible future work is to split shutdown into two states:
> 
> * scheduled (waiting for timeout to expire; may not affect some of the
>   signal actions) and
> * in-progress (blocks out all other actions).
> 
> Currently, the two states are merged into one in trunk code (there is
> only one shutting_down global). This fix does not attempt to address
> that deficiency. Factory does not plan to work on this in the
> foreseeable future. Please feel free to solve this problem!

I did (re-)discover that the final cycle through SignalsEngine whe
hutdown timeout ends does indeed drain the AsyncQueue. But not wait for
any other types of pending I/O or FD events that might appear during
that drain. That is paving the way for the current swap.state read/write
crashes on shutdown.

I plan to work towards Runners doing all the shutdown handling and in
particular hooking some components into that which are currently not
paying any attention to shutdown termination (ie the swap.state and DNS
sockets FD). Once that conversion is completed we shall see what remains
that needs any async handling after timout ends.

Amos