[squid-dev] Rock store stopped accessing discs

Tue Mar 7 23:26:23 UTC 2017

On 03/07/2017 01:08 PM, Heiler Bemerguy wrote:

> Some log from right now...

Here is my analysis:

> 15:53:05.255| ipc/Queue.h findOldest: peeking from 7 to 6 at 1

Squid worker (kid6) is looking at the queue of disker (kid7) responses.
There is just one response in the queue.

> IpcIoFile.cc canWait: cannot wait: 1136930911 oldest: ipcIo6.381049w7

Squid worker is trying to estimate whether it has enough time to queue
(and eventually perform) more disk I/O. The expected wait is
1'136'930'911 milliseconds or ~13 days. That is longer than the
configured cache_dir swap-timeout (a few hundred milliseconds) so Squid
refuses to queue this disk I/O request:

> store_io.cc storeCreate: no swapdirs for e:=sw1p2RDV/0x206e17d0*4

The same story happens in your other log snippets AFAICT:

> 15:53:05.302| ipc/Queue.h findOldest: peeking from 8 to 6 at 1
> IpcIoFile.cc canWait: cannot wait: 1136930958 oldest: ipcIo6.153009r8
> store_io.cc storeCreate: no swapdirs for e:=msw1DV/0x3a4d5870*2

and

> 15:53:05.318| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
> IpcIoFile.cc canWait: cannot wait: 1136930974 oldest: ipcIo6.381049w7
> store_io.cc storeCreate: no swapdirs for e:m381432=w1p2DV/0x3ace6740*4

and

> 15:53:05.793| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
> IpcIoFile.cc canWait: cannot wait: 1136931448 oldest: ipcIo6.381049w7
> store_io.cc storeCreate: no swapdirs for e:=sw1V/0x206e17d0*1

Focusing on one disker (kid7), we can see that the oldest response does
not change: It is always ipcIo6.381049w7. This stuck response results in
gradual increment of the expected wait time with every check, matching
wall clock time increment:

> 15:53:05.255| cannot wait: 1136930911
> 15:53:05.318| cannot wait: 1136930974
> 15:53:05.793| cannot wait: 1136931448

These stuck disker responses probably explain why your disks do not
receive any traffic. It is potentially important that both disker
responses shown in your logs got stuck at approximately the same
absolute time ~13 days ago (around 2017-02-22, give or take a day;
subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time
zone to know the "exact" time when those stuck requests were queued).

How can a disker response get stuck? Most likely, something unusual
happened ~13 days ago. This could be a Squid bug and/or a kid restart.

* Do all currently running Squid kid processes have about the same start
time? [1]

* Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old
non-debugging messages/warnings?

[1]
http://stackoverflow.com/questions/5731234/how-to-get-the-start-time-of-a-long-running-linux-process

Thank you,

Alex.