[squid-dev] Rock store stopped accessing discs
Alex Rousskov
rousskov at measurement-factory.com
Tue Mar 14 19:41:33 UTC 2017
On 03/14/2017 10:43 AM, Heiler Bemerguy wrote:
> On 07/03/2017 20:26, Alex Rousskov wrote:
>> How can a disker response get stuck? Most likely, something unusual
>> happened ~13 days ago. This could be a Squid bug and/or a kid restart.
> root at proxy:~# ps auxw |grep squid-
> proxy 10225 0.0 0.0 13964224 21708 ? S Mar10 0:10 (squid-coord-10) -s
> proxy 10226 0.1 12.5 14737524 8268056 ? S Mar10 7:14 (squid-disk-9) -s
> proxy 10227 0.0 11.6 14737524 7686564 ? S Mar10 3:08 (squid-disk-8) -s
> proxy 10228 0.1 14.9 14737540 9863652 ? S Mar10 7:30 (squid-disk-7) -s
> proxy 18348 3.5 10.3 17157560 6859904 ? S Mar13 48:44 (squid-6) -s
> proxy 18604 2.8 9.0 16903948 5977728 ? S Mar13 37:28 (squid-4) -s
> proxy 18637 1.7 10.8 16836872 7163392 ? R Mar13 23:03 (squid-1) -s
> proxy 20831 15.3 10.3 17226652 6838372 ? S 08:50 39:51 (squid-2) -s
> proxy 21189 5.3 2.8 16538064 1871788 ? S 12:29 2:12 (squid-5) -s
> proxy 21214 3.8 1.5 16448972 1012720 ? S 12:43 1:03 (squid-3) -s
> Diskers aren't dying, but workers are, a lot.
I suspect that worker deaths may cause SMP queues to get stuck, but I
have not validated that theory. We probably need to add more code to SMP
queues so that they can recover from untimely kid deaths.
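To make the suspected failure mode concrete, here is a toy, single-process Python model of one worker-to-disker queue. It is not Squid's implementation: the real queues are shared between separate kid processes, and the class name `ToyUniQueue`, the "wake the reader only when a push makes the queue non-empty" rule, and the `recover()` step are all hypothetical. Under that assumed rule, a kid death that loses one wake-up leaves items nobody pops, and every later push sees a non-empty queue, so it sends no wake-up either.

```python
from collections import deque


class ToyUniQueue:
    """Hypothetical bounded one-writer/one-reader queue (not Squid code).

    Assumed protocol: the writer wakes the reader only when a push turns
    an empty queue into a non-empty one; a woken reader drains everything.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.reader_awake = False  # set by a (simulated) wake-up message

    def push(self, item, deliver_wakeup=True):
        """Return "ok", or "overflow" when the queue is full."""
        if len(self.items) >= self.capacity:
            return "overflow"
        was_empty = not self.items
        self.items.append(item)
        if was_empty and deliver_wakeup:
            self.reader_awake = True
        # When was_empty is False, no wake-up is sent: the writer assumes
        # the reader already knows that work is pending.
        return "ok"

    def reader_poll(self):
        """The reader pops everything, but only if it was woken up."""
        if not self.reader_awake:
            return []
        popped = list(self.items)
        self.items.clear()
        self.reader_awake = False
        return popped

    def recover(self):
        """Hypothetical recovery: re-wake the reader of a non-empty queue."""
        if self.items:
            self.reader_awake = True


q = ToyUniQueue(capacity=3)
# Simulate a kid death that loses the wake-up for the very first push.
q.push("read#1", deliver_wakeup=False)
results = [q.push(f"read#{i}") for i in range(2, 6)]
print(results)          # ['ok', 'ok', 'overflow', 'overflow']
print(q.reader_poll())  # [] : the queue stays stuck
q.recover()
print(q.reader_poll())  # ['read#1', 'read#2', 'read#3']
```

In this model, the "more code" that would let a queue recover is simply something that notices a non-empty queue whose reader is asleep (for example, after a kid restart) and re-sends the wake-up.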
> Another weird thing: lots of timeouts and overflows are happening during
> off-peak hours. From 0h to 7h we have only about 1-2% of the clients we
> usually have from 8h to 17h (business hours).
If a queue is stuck, you will see these errors and warnings as long as
there is some need for disk I/O. The traffic volume does not matter.
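A toy count of I/O outcomes shows why volume does not matter once a queue is stuck. This is a hypothetical model, not Squid code: it assumes a healthy reader serves every request immediately, while a stuck reader never pops, so queued requests eventually time out and requests that arrive at a full queue overflow. The function names and the 1024-entry capacity are illustrative assumptions.

```python
def simulate_io(num_requests, capacity, queue_stuck):
    """Toy model (not Squid code): count I/O outcomes for one queue.

    Assumes a healthy reader pops each request right away; a stuck reader
    never pops, so queued requests time out and later ones overflow.
    """
    queued = 0
    outcomes = {"done": 0, "timeout": 0, "overflow": 0}
    for _ in range(num_requests):
        if not queue_stuck:
            outcomes["done"] += 1      # popped and served right away
        elif queued < capacity:
            queued += 1                # sits in the stuck queue forever...
            outcomes["timeout"] += 1   # ...so its caller eventually gives up
        else:
            outcomes["overflow"] += 1  # no room left in the stuck queue
    return outcomes


def failure_ratio(outcomes):
    """Fraction of requests that timed out or overflowed."""
    total = sum(outcomes.values())
    if total == 0:
        return 0.0
    return (outcomes["timeout"] + outcomes["overflow"]) / total


for label, load in [("night", 20), ("day", 2000)]:
    stats = simulate_io(load, capacity=1024, queue_stuck=True)
    print(label, stats, failure_ratio(stats))  # both ratios are 1.0
```

Low off-peak traffic only lowers the absolute number of errors; every request that needs the stuck queue still fails.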
> I still don't know how /cache3 stopped while /cache4 is still active, even
> with all those warnings and errors. :/
Do you expect Squid to function well in the presence of assertions and
to explain what went wrong while asserting? Unfortunately, we are very
far from that kind of robustness and self-diagnosis nirvana!
I have not studied your error messages in detail, but it is possible
that there are not-yet-stuck queues that feed cache4 while all cache3
queues are stuck. There is one SMP queue for each worker:disker pair.
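The per-pair layout can be sketched as a small table of queues keyed by (worker, disker). The kid IDs come from the ps listing above (six workers, three diskers, hence 18 pairs), but which disker serves /cache3 or /cache4 is not shown in the thread, so the mapping and the set of stuck queues below are purely hypothetical.

```python
from itertools import product

WORKERS = [1, 2, 3, 4, 5, 6]   # squid-1 .. squid-6 in the ps listing
DISKERS = [7, 8, 9]            # squid-disk-7 .. squid-disk-9

# One queue per worker:disker pair; the value records whether it is stuck.
queues = {pair: False for pair in product(WORKERS, DISKERS)}

# Hypothetical scenario: disker 8 serves /cache3 and disker 9 serves /cache4
# (the real mapping is not shown in the thread).
for w in WORKERS:
    queues[(w, 8)] = True          # every queue feeding /cache3 is stuck
for w in (2, 5):
    queues[(w, 9)] = True          # only some queues feeding /cache4 are stuck


def live_workers(queues, disker):
    """Workers that can still reach `disker` through a non-stuck queue."""
    return [w for (w, d), is_stuck in sorted(queues.items())
            if d == disker and not is_stuck]


print(len(queues))              # 18
print(live_workers(queues, 8))  # [] : /cache3 receives no I/O at all
print(live_workers(queues, 9))  # [1, 3, 4, 6] : /cache4 still gets some I/O
```

Because each disker is fed by several independent queues, partially stuck queues only degrade a cache_dir, while a disker whose queues are all stuck appears to stop entirely.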
Alex.