[squid-dev] Rock store stopped accessing discs

Heiler Bemerguy heiler.bemerguy at cinbesa.com.br
Tue Mar 7 20:08:39 UTC 2017



Em 07/03/2017 15:22, Alex Rousskov escreveu:
>
>> 2017/02/18 13:48:19 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.9082w9
>> 2017/02/18 13:48:42 kid4| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo4.3371w9
>> 2017/02/18 14:06:01 kid9| WARNING: /cache4/rock delays I/O requests for 9.97 seconds to obey 200/sec rate limit
>> 2017/02/18 14:06:34 kid9| WARNING: /cache4/rock delays I/O requests for 21.82 seconds to obey 200/sec rate limit
>> 2017/02/18 14:06:42 kid4| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:06:47 kid3| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:06:48 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:06:49 kid4| WARNING: abandoning 4 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:06:54 kid3| WARNING: abandoning 2 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:07:55 kid9| WARNING: /cache4/rock delays I/O requests for 68.64 seconds to obey 200/sec rate limit
>> 2017/02/18 14:08:03 kid5| WARNING: abandoning 511 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:08:47 kid2| WARNING: abandoning 20 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:08:51 kid3| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 14:08:54 kid1| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
>> 2017/02/18 15:26:35 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.31404w9
>> 2017/02/18 15:29:00 kid9| WARNING: /cache4/rock delays I/O requests for 9.92 seconds to obey 200/sec rate limit
>> 2017/02/18 15:29:13 kid9| WARNING: /cache4/rock delays I/O requests for 8.23 seconds to obey 200/sec rate limit
>> 2017/02/18 15:29:45 kid9| WARNING: /cache4/rock delays I/O requests for 8.86 seconds to obey 200/sec rate limit
>> 2017/02/18 15:30:06 kid9| WARNING: /cache4/rock delays I/O requests for 7.34 seconds to obey 200/sec rate limit
>> 2017/02/18 15:30:27 kid9| WARNING: /cache4/rock delays I/O requests for 7.65 seconds to obey 200/sec rate limit
>> 2017/02/18 15:30:48 kid9| WARNING: /cache4/rock delays I/O requests for 8.97 seconds to obey 200/sec rate limit
>> 2017/02/18 15:31:09 kid9| WARNING: /cache4/rock delays I/O requests for 8.52 seconds to obey 200/sec rate limit
>> 2017/02/18 15:31:22 kid9| WARNING: /cache4/rock delays I/O requests for 10.61 seconds to obey 200/sec rate limit
>> 2017/02/18 17:19:40 kid9| WARNING: /cache4/rock delays I/O requests for 10.22 seconds to obey 200/sec rate limit
> There is a Squid bug and/or your cache disks could not keep up with the
> I/O load. Please note that I/O load during initial cache index rebuild
> is higher.
>
> There are several ways to proceed IMO:
>
> 1. Figure out why your Squid is not using the disk cache _now_. This
> will require enabling debugging, at least for a few seconds, and then
> analyzing cache.log. I recommend enabling debugging in one worker kid
> only (i.e., sending its process the right signal instead of running
> "squid -k debug"). See Squid wiki for debugging details. Do not restart
> or reconfigure Squid until you collect those debugging logs! Please note
> that the logs may contain private user info.

Some log from right now...

2017/03/07 15:53:05.255 kid6| 19,4| MemObject.cc(431) isContiguous: 
MemObject::isContiguous: Returning true
2017/03/07 15:53:05.255 kid6| 73,3| HttpRequest.cc(658) storeId: sent 
back effectiveRequestUrl: http://www.agenciabelem.com.br/signalr/hubs
2017/03/07 15:53:05.255 kid6| 20,3| store_swapmeta.cc(52) 
storeSwapMetaBuild: storeSwapMetaBuild URL: 
http://www.agenciabelem.com.br/signalr/hubs
2017/03/07 15:53:05.255 kid6| 54,2| ../../../src/ipc/Queue.h(490) 
findOldest: peeking from 7 to 6 at 1
2017/03/07 15:53:05.255 kid6| 47,2| IpcIoFile.cc(415) canWait: cannot 
wait: 1136930911 oldest: ipcIo6.381049w7
2017/03/07 15:53:05.255 kid6| 20,2| store_io.cc(38) storeCreate: 
storeCreate: no swapdirs for e:=sw1p2RDV/0x206e17d0*4
2017/03/07 15:53:05.255 kid6| 90,3| store_client.cc(729) invokeHandlers: 
InvokeHandlers: 69538C9E111C077214004DE73D0F257B
2017/03/07 15:53:05.255 kid6| 90,3| store_client.cc(735) invokeHandlers: 
StoreEntry::InvokeHandlers: checking client #0
2017/03/07 15:53:05.255 kid6| 90,3| store_client.cc(755) 
storePendingNClients: storePendingNClients: returning 1
2017/03/07 15:53:05.255 kid6| 20,3| store.cc(494) unlock: Client 
unlocking key 69538C9E111C077214004DE73D0F257B e:=sp2RDV/0x206e17d0*4
2017/03/07 15:53:05.255 kid6| 17,3| FwdState.cc(274) ~FwdState: FwdState 
destructor start
--
2017/03/07 15:53:05.302 kid6| 90,3| store_client.cc(661) 
storeUnregister: storeUnregister: called for 
'C8303E098834088226809BF615D12C51'
2017/03/07 15:53:05.302 kid6| 19,4| MemObject.cc(431) isContiguous: 
MemObject::isContiguous: Returning true
2017/03/07 15:53:05.302 kid6| 20,3| store_swapmeta.cc(52) 
storeSwapMetaBuild: storeSwapMetaBuild URL: 
http://www.ormnews.com.br/library/bootstrap/css/bootstrap.min.css
2017/03/07 15:53:05.302 kid6| 54,2| ../../../src/ipc/Queue.h(490) 
findOldest: peeking from 8 to 6 at 1
2017/03/07 15:53:05.302 kid6| 47,2| IpcIoFile.cc(415) canWait: cannot 
wait: 1136930958 oldest: ipcIo6.153009r8
2017/03/07 15:53:05.302 kid6| 20,2| store_io.cc(38) storeCreate: 
storeCreate: no swapdirs for e:=msw1DV/0x3a4d5870*2
2017/03/07 15:53:05.302 kid6| 20,3| store.cc(457) lock: storeUnregister 
locked key C8303E098834088226809BF615D12C51 e:=msDV/0x3a4d5870*3
2017/03/07 15:53:05.302 kid6| 90,3| store_client.cc(755) 
storePendingNClients: storePendingNClients: returning 0
2017/03/07 15:53:05.302 kid6| 20,3| store.cc(494) unlock: 
storeUnregister unlocking key C8303E098834088226809BF615D12C51 
e:=msDV/0x3a4d5870*3
2017/03/07 15:53:05.302 kid6| 20,3| store.cc(494) unlock: 
clientReplyContext::removeStoreReference unlocking key 
C8303E098834088226809BF615D12C51 e:=msDV/0x3a4d5870*2
2017/03/07 15:53:05.302 kid6| 33,3| Pipeline.cc(69) popMe: Pipeline 
0xc8a25a0 drop 0x1b8b2cd0*3
--
2017/03/07 15:53:05.318 kid6| 20,3| store_swapout.cc(377) 
mayStartSwapOut: already allowed
2017/03/07 15:53:05.318 kid6| 73,3| HttpRequest.cc(658) storeId: sent 
back effectiveRequestUrl: 
http://img.olx.com.br/images/31/310706022882854.jpg
2017/03/07 15:53:05.318 kid6| 20,3| store_swapmeta.cc(52) 
storeSwapMetaBuild: storeSwapMetaBuild URL: 
http://img.olx.com.br/images/31/310706022882854.jpg
2017/03/07 15:53:05.318 kid6| 54,2| ../../../src/ipc/Queue.h(490) 
findOldest: peeking from 7 to 6 at 1
2017/03/07 15:53:05.318 kid6| 47,2| IpcIoFile.cc(415) canWait: cannot 
wait: 1136930974 oldest: ipcIo6.381049w7
2017/03/07 15:53:05.318 kid6| 20,2| store_io.cc(38) storeCreate: 
storeCreate: no swapdirs for e:m381432=w1p2DV/0x3ace6740*4
2017/03/07 15:53:05.318 kid6| 90,3| store_client.cc(729) invokeHandlers: 
InvokeHandlers: 5E66A439D7F0C1ED9A6B0742518C59B6
2017/03/07 15:53:05.318 kid6| 90,3| store_client.cc(735) invokeHandlers: 
StoreEntry::InvokeHandlers: checking client #0
2017/03/07 15:53:05.318 kid6| 11,3| http.cc(1090) persistentConnStatus: 
local=10.1.10.9:59284 remote=104.113.39.233:80 FD 612 flags=1 eof=0
2017/03/07 15:53:05.318 kid6| 5,3| comm.cc(559) commSetConnTimeout: 
local=10.1.10.9:59284 remote=104.113.39.233:80 FD 612 flags=1 timeout 600
2017/03/07 15:53:05.318 kid6| 5,4| AsyncCallQueue.cc(55) fireNext: 
entering TunnelBlindCopyReadHandler(local=10.1.10.9:54832 
remote=201.57.89.205:443 FD 144 flags=1, data=0x2d04
83d8, size=4344, buf=0x207b41b0)
--
2017/03/07 15:53:05.793 kid6| 20,3| store.cc(1342) validLength: 
storeEntryValidLength: Checking '96501FC1CC75A4D3B9B6BA8A2D899B45'
2017/03/07 15:53:05.793 kid6| 19,4| MemObject.cc(431) isContiguous: 
MemObject::isContiguous: Returning true
2017/03/07 15:53:05.793 kid6| 20,3| store_swapmeta.cc(52) 
storeSwapMetaBuild: storeSwapMetaBuild URL: 
http://globoesporte.globo.com/busca/?q=gabriel+jesus&ps=on
2017/03/07 15:53:05.793 kid6| 54,2| ../../../src/ipc/Queue.h(490) 
findOldest: peeking from 7 to 6 at 1
2017/03/07 15:53:05.793 kid6| 47,2| IpcIoFile.cc(415) canWait: cannot 
wait: 1136931448 oldest: ipcIo6.381049w7
2017/03/07 15:53:05.793 kid6| 20,2| store_io.cc(38) storeCreate: 
storeCreate: no swapdirs for e:=sw1V/0x206e17d0*1
2017/03/07 15:53:05.793 kid6| 90,3| store_client.cc(729) invokeHandlers: 
InvokeHandlers: 96501FC1CC75A4D3B9B6BA8A2D899B45
2017/03/07 15:53:05.793 kid6| 20,3| store.cc(494) unlock: 
StoreEntry::forcePublicKey+Vary unlocking key 
96501FC1CC75A4D3B9B6BA8A2D899B45 e:=sV/0x206e17d0*1
2017/03/07 15:53:05.793 kid6| 90,3| store_client.cc(755) 
storePendingNClients: storePendingNClients: returning 0
2017/03/07 15:53:05.793 kid6| 20,3| store.cc(396) destroyStoreEntry: 
destroyStoreEntry: destroying 0x206e17d8
2017/03/07 15:53:05.793 kid6| 20,3| store.cc(378) destroyMemObject: 
destroyMemObject 0x354a3970

> 2. Reducing max-swap-rate from 200 to, say, 20. If your disks cannot
> keep up _and_ there is a Squid bug that screws something up when your
> disks cannot keep up, then this blind configuration change may avoid
> triggering that bug.

I'll try to use this on a last resort attempt...

> 3. Collect enough iostat 5-second outputs (or equivalent) to correlate
> system performance with cache.log messages. I would also collect other
> system activity during those hours. The "atop" tool may be useful for
> collecting everything in one place. You will probably want to restart
> Squid for a clean experiment/collection.
>

Trying to figure out how to use atop information...

-- 
Best Regards,

Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751



More information about the squid-dev mailing list