[squid-dev] Rock store stopped accessing discs
Heiler Bemerguy
heiler.bemerguy at cinbesa.com.br
Tue Mar 14 16:43:48 UTC 2017
On 07/03/2017 20:26, Alex Rousskov wrote:
> These stuck disker responses probably explain why your disks do not
> receive any traffic. It is potentially important that both disker
> responses shown in your logs got stuck at approximately the same
> absolute time ~13 days ago (around 2017-02-22, give or take a day;
> subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time
> zone to know the "exact" time when those stuck requests were queued).
>
> How can a disker response get stuck? Most likely, something unusual
> happened ~13 days ago. This could be a Squid bug and/or a kid restart.
>
> * Do all currently running Squid kid processes have about the same start
> time? [1]
>
> * Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old
> non-debugging messages/warnings?
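Just to sanity-check that time math (assuming the stuck log line was from 07/03, the day of your message, and treating everything as UTC): 1136930911 ms is roughly 1136931 seconds, i.e. about 13.16 days, so something like

date -u -d '2017-03-07 15:53:05 UTC - 1136931 seconds'
Wed Feb 22 12:04:14 UTC 2017

does land on 2017-02-22, as you said.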
I searched the log files from those days and found nothing unusual; grep returns nothing for ipcIo6.381049w7 or ipcIo6.153009r8.
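(The search was along these lines, assuming the default cache.log location; adjust the path if your logs live elsewhere:

grep -E 'ipcIo6\.(381049w7|153009r8)' /var/log/squid/cache.log*

and it matched nothing in any of them.)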
On that day I didn't verify whether the kids still had the same uptime; I reformatted the /cache2, /cache3 and /cache4 partitions and started fresh with squid -z. But looking at ps right now, I think I can answer that question:
root@proxy:~# ps auxw |grep squid-
proxy    10225  0.0  0.0 13964224   21708 ?  S  Mar10   0:10 (squid-coord-10) -s
proxy    10226  0.1 12.5 14737524 8268056 ?  S  Mar10   7:14 (squid-disk-9) -s
proxy    10227  0.0 11.6 14737524 7686564 ?  S  Mar10   3:08 (squid-disk-8) -s
proxy    10228  0.1 14.9 14737540 9863652 ?  S  Mar10   7:30 (squid-disk-7) -s
proxy    18348  3.5 10.3 17157560 6859904 ?  S  Mar13  48:44 (squid-6) -s
proxy    18604  2.8  9.0 16903948 5977728 ?  S  Mar13  37:28 (squid-4) -s
proxy    18637  1.7 10.8 16836872 7163392 ?  R  Mar13  23:03 (squid-1) -s
proxy    20831 15.3 10.3 17226652 6838372 ?  S  08:50  39:51 (squid-2) -s
proxy    21189  5.3  2.8 16538064 1871788 ?  S  12:29   2:12 (squid-5) -s
proxy    21214  3.8  1.5 16448972 1012720 ?  S  12:43   1:03 (squid-3) -s
The diskers aren't dying, but the workers are, a lot, with that "assertion failed: client_side_reply.cc:1167: http->storeEntry()->objectLen() >= headers_sz" error.
Looking at df and iostat, it seems /cache3 isn't being accessed anymore right now. (I think that is disk-8 above; look at its CPU time usage.)
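To keep an eye on that, I'm basically watching something like this (device names omitted since they depend on where /cache2-/cache4 are mounted):

iostat -dx 5
df -h /cache2 /cache3 /cache4

and the device backing /cache3 shows essentially no I/O while the others stay busy.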
Another weird thing: lots of timeouts and overflows are happening during non-active hours. From 0h to 7h we have maybe 1-2% of the clients we usually have from 8h to 17h (business hours).
2017/03/14 00:26:50 kid3| WARNING: abandoning 23 /cache4/rock I/Os after at least 7.00s timeout
2017/03/14 00:26:53 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/03/14 02:14:48 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.68259w9
2017/03/14 06:33:43 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.55919w9
2017/03/14 06:57:53 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.58130w9
This cache4 partition is where huge files would be stored:
maximum_object_size 4 GB
cache_dir rock /cache2 110000 min-size=0 max-size=65536 max-swap-rate=150 swap-timeout=360
cache_dir rock /cache3 110000 min-size=65537 max-size=262144 max-swap-rate=150 swap-timeout=380
cache_dir rock /cache4 110000 min-size=262145 max-swap-rate=150 swap-timeout=500
I still don't know how /cache3 stopped while /cache4 is still active, even with all those warnings and errors... :/
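Maybe it's also worth comparing the three dirs through the cache manager, e.g. (assuming squidclient is installed and the proxy answers on localhost:3128):

squidclient -h 127.0.0.1 -p 3128 mgr:storedir

to see whether /cache3's per-directory numbers (current size, entries) ever change, or whether only /cache2 and /cache4 keep moving.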
--
Atenciosamente / Best Regards,
Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751