[squid-users] squid hanging in 100% steal

Thu Jan 24 13:23:03 UTC 2019

On 25/01/19 1:24 am, Marc wrote:
> Hi,
> 
> For some reason my squid sometimes hangs (after weeks of running
> smoothly) in 100% steal, until I kill the process and restart it, after
> which the process will again run stable for weeks.

What does "100% steal" mean?

> 
> It's running on an AWS EC2 instance, squid version:
> squid-3.5.20-10.34.amzn1.x86_64, see below for some debugging info.
> Any idea what could be the problem here? Thanks!
> 
> top:
> [11:56:49][root@ip-172-31-9-138 ~]# top
> top - 11:57:11 up 218 days, 17:36,  1 user,  load average: 1.06, 1.17, 1.09
> Tasks:  81 total,   2 running,  79 sleeping,   0 stopped,   0 zombie
> Cpu(s):  4.5%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 95.2%st
> Mem:    501220k total,   405748k used,    95472k free,    65512k buffers
> Swap:        0k total,        0k used,        0k free,    88948k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 29963 squid     20   0  290m 171m 7472 R 99.9 35.1 672:59.73 squid
>     1 root      20   0 19648 2480 2148 S  0.0  0.5   0:02.05 init
> <snip>
> 
> vmstat:
> [11:57:39][root@ip-172-31-9-138 ~]# vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  1      0  95408  65536  89052    0    0     0     4    1    1  0  0 99  0  0
>  1  0      0  95408  65536  89040    0    0     0     4   56   36  5  0  0  0 95
>  2  0      0  95408  65536  89040    0    0     0     0   54   18  5  0  0  0 95
>  1  0      0  95408  65536  89040    0    0     0     0   57   30  5  0  0  0 95
>  1  0      0  95408  65536  89040    0    0     0     4   52   25  5  0  0  0 95
>  3  0      0  95408  65536  89040    0    0     0     0   52   14  6  0  0  0 94
>  1  0      0  95408  65536  89040    0    0     0     0   50   26  4  0  0  0 96
>  2  0      0  95408  65536  89040    0    0     0     0   53   21  6  0  0  0 94
>  1  0      0  95408  65540  89036    0    0     0    12   62   38  5  0  0  0 95
>  2  0      0  95408  65540  89040    0    0     0    36   55   14  5  0  0  0 95
>  1  0      0  95408  65540  89040    0    0     0     0   51   34  5  0  0  0 95
> 
> gdb:
> [11:55:07][root@ip-172-31-9-138 ~]# sudo gdb -n -batch -ex backtrace -pid 29963
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x00000000007bca52 in CbcPointer<Comm::TcpAcceptor>::operator=(CbcPointer<Comm::TcpAcceptor> const&) ()
> #0  0x00000000007bca52 in CbcPointer<Comm::TcpAcceptor>::operator=(CbcPointer<Comm::TcpAcceptor> const&) ()
> #1  0x00000000007bc3d4 in Comm::AcceptLimiter::kick() ()
> #2  0x0000000000721867 in AsyncCall::make() ()
> #3  0x00000000007259e2 in AsyncCallQueue::fireNext() ()
> #4  0x0000000000725e20 in AsyncCallQueue::fire() ()
> #5  0x00000000005b0089 in EventLoop::runOnce() ()
> #6  0x00000000005b0178 in EventLoop::run() ()
> #7  0x00000000006192cc in SquidMain(int, char**) ()
> #8  0x0000000000514b3b in main ()
> 

This looks like it may be one of the symptoms of
<https://bugs.squid-cache.org/show_bug.cgi?id=4885>, which was fixed in
the Squid-4.3 release.

Please try the current Squid-4 release to see whether the issue is
already resolved. v3.5 is no longer supported, so if this is a bug we
will need traces and a reproduction on a current Squid version (v4 or
v5) for anyone to have a realistic chance of fixing it.

Amos
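
On the question of what "100% steal" means: the "st" column in the top
and vmstat output quoted above is what Linux reports as steal time, i.e.
time during which the virtual CPU was runnable but the hypervisor was
running something else. The kernel exposes it as the eighth counter on
the aggregate "cpu" line of /proc/stat. What follows is only a minimal
Python sketch, assuming that standard /proc/stat field order. The
function names (parse_cpu_line, steal_percent, read_cpu_line) are made up
for illustration and are not part of Squid or any library. It computes
the steal share between two samples:

```python
# Counters on the aggregate "cpu" line of /proc/stat, in kernel order.
# The guest counters that follow are left out because the kernel already
# accounts for them inside "user" and "nice".
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("kernel did not report a steal counter")
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Percentage of elapsed CPU ticks between two samples that were stolen."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return 100.0 * deltas["steal"] / total


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as stat_file:
        for line in stat_file:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found in " + path)
```

Calling read_cpu_line() twice about a second apart, parsing both samples
with parse_cpu_line() and passing them to steal_percent() should give a
figure comparable to the "st" column of "vmstat 1".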