[squid-users] High CPU usage

Amos Jeffries squid3 at treenet.co.nz
Fri Apr 15 11:22:48 UTC 2016


On 15/04/2016 7:26 p.m., Mohammad Sadegh Nasiri wrote:
> Hi
> 
> Does anyone knows why my squid cpu usage is 100%?
> 

Before trying to answer, you need to be aware that when it needs to,
Squid will push CPU, RAM, disk I/O, etc. right up to the hardware limits.


Your first trace tells the story of roughly 1150 requests per second.
Very few transactions overlap, so Squid spends most of its time pushing
out individual responses, or small groups of responses, very fast. That
reaches about 800 Mbps with Squid still spending a measurable chunk of
its time (~30%) waiting for something to do.


Your second trace shows a proxy receiving almost as many client
requests per second, but now juggling about 8,000 of them at a time,
often with around 250 of them needing something done on every cycle.
That's a lot of work, so Squid has slowed down to about 600 Mbps and is
now using all of the CPU it can get.

I think at some point between the traces something went a bit slower,
or some clients started a big transaction that created more overlap, or
the workload simply hit a peak that needed more CPU than was available.
Since the CPU can only give 100%, that thing took a little while to
finish. That resulted in some transaction overlap, which made those
transactions take more CPU to finish, so Squid stayed at 100% slightly
longer, and round it goes in a feedback loop.
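
If it helps to picture that loop, here is a toy model (just Python
arithmetic, nothing to do with Squid internals; the load and overhead
numbers are invented): work arrives every second, and the cost of
getting through it grows with the backlog of overlapping transactions,
so a short spike above 100% CPU leaves the proxy pinned for a while
afterwards.

  # Toy model of the feedback loop, not Squid code. Numbers are made up.
  CAPACITY = 1.0      # CPU-seconds available per second (one core at 100%)
  BASE_LOAD = 0.9     # normal demand: ~90% CPU
  SPIKE_LOAD = 1.2    # a short peak that exceeds the CPU
  OVERHEAD = 0.02     # extra CPU cost per unit of backlog (overlap penalty)

  backlog = 0.0
  for second in range(30):
      demand = SPIKE_LOAD if 5 <= second < 10 else BASE_LOAD
      # work this second: new demand plus the backlog, inflated by the
      # cost of juggling the transactions already overlapping
      needed = (demand + backlog) * (1.0 + OVERHEAD * backlog)
      done = min(needed, CAPACITY)
      backlog = needed - done
      print(f"t={second:2d}s  cpu={done / CAPACITY:4.0%}  backlog={backlog:.2f}")

  # The spike only lasts 5 seconds, but the CPU stays at 100% well after
  # it ends while the backlog drains; raise OVERHEAD or BASE_LOAD a
  # little and the backlog never drains at all.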


The numbers that I'm looking at for that are:

 client_http.requests   =  1150 ->  1124/sec

 client_http.kbytes_in  =   841 ->   773/sec
 client_http.kbytes_out = 57428 -> 44051/sec
 server.all.kbytes_in   = 43019 -> 33436/sec
 server.all.kbytes_out  =   705 ->   637/sec

  (adding these gives a ~800 Mbps -> ~600 Mbps drop)

 select_loops      = 14571 ->    69/sec
 select_fds        = 27229 -> 17470/sec

 median_select_fds = 0.000000 -> 253.007812
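
For reference, the rough arithmetic behind those Mbps figures, treating
the four kbyte counters as KB/sec averages:

  before = (841, 57428, 43019, 705)     # client in/out + server in/out, KB/sec
  after  = (773, 44051, 33436, 637)

  for label, kbps in (("first trace", before), ("second trace", after)):
      mbps = sum(kbps) * 8 / 1000       # KB/s -> kbit/s -> Mbit/s
      print(f"{label}: ~{mbps:.0f} Mbps")
  # first trace: ~816 Mbps, second trace: ~631 Mbps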

Significantly more FDs need attention each time Squid checks, so it
checks fewer times per second, which means even more to do on each
check, and so on.
 - the low select loops per second is what I think is driving the
service times to be longer. They are still under 1 second, so not very
noticeable to clients.
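
A quick check over the quoted select counters shows that FDs-per-check
figure directly:

  loops = {"first trace": 14571, "second trace": 69}     # select_loops per sec
  fds   = {"first trace": 27229, "second trace": 17470}  # select_fds per sec

  for trace in loops:
      # file descriptors serviced per pass of the select/poll loop
      print(f"{trace}: {fds[trace] / loops[trace]:.1f} FDs per loop")
  # first trace: 1.9, second trace: 253.2 -- matching the median_select_fds jump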


Watch the median_select_fds to see if it is dropping. If so, Squid is
(slowly) recovering after the peak event. Otherwise Squid is falling
behind the workload.
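
If you want to watch that without re-reading the whole report by hand,
a minimal sketch along these lines should do. It assumes the squidclient
tool is installed and can reach the proxy with its default host/port; if
your cache manager needs a password, use "mgr:5min@password" instead.

  import re, subprocess, time

  while True:
      # pull the 5-minute averages from the cache manager
      out = subprocess.run(["squidclient", "mgr:5min"],
                           capture_output=True, text=True).stdout
      m = re.search(r"median_select_fds\s*=\s*([\d.]+)", out)
      print(time.strftime("%H:%M:%S"), m.group(1) if m else "counter not found")
      time.sleep(60)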


Amos


