[squid-users] Slowly rising CPU load (eventually hits 100)

Thu Mar 31 16:44:03 UTC 2016

On 03/31/2016 07:53 AM, squid at peralex.com wrote:

> Every week or so I run into a problem where squid's CPU usage starts
> growing slowly, reaching 100% over the course of a day or so.  When
> running normally its CPU usage is usually less than 5%.  Restarting
> squid fixes the problem.

My working theory is that the longer you let your Squid run, the bigger
the objects it may store in RAM, increasing the severity of the linear
search delays mentioned below. A similar pattern may also be caused by
larger objects becoming more popular on certain days of the week.

> Attaching GDB and getting a stack trace while squid is stuck at 100%
> generally gives me this:
> 
> #0  0x00000000005deef4 in mem_node::end ()
> #1  0x00000000005df076 in mem_node::dataRange ()
> #2  0x0000000000625d34 in mem_hdr::NodeCompare ()
> #3  0x0000000000628ad1 in SplayNode<mem_node*>::splay<mem_node*> ()
> #4  0x0000000000628b85 in Splay<mem_node*>::find<mem_node*> ()
> #5  0x0000000000625f8e in mem_hdr::getBlockContainingLocation ()
> #6  0x0000000000625ff8 in mem_hdr::hasContigousContentRange ()
> #7  0x00000000005e00fe in MemObject::isContiguous ()
> #8  0x0000000000649d05 in StoreEntry::mayStartSwapOut ()

IIRC, this is a known linear search in Squid local memory code that does
not scale with object size. It was discovered a year or more ago, but I
am not aware of anybody working to optimize it since then.

In summary, dealing with in-RAM objects significantly larger than 1MB
may slow Squid down to a crawl with 100% CPU usage as the symptom and
backtraces pointing to getBlockContainingLocation() or similar code. The
bigger the object, the longer Squid takes to scan its nodes.
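
To illustrate the scaling only, here is a toy model in Python. It is
not Squid code: NODE_SIZE, node_count() and find_node_linear() are
hypothetical names, and the 4096-byte node size is an assumption made
for the sketch. It just shows that a front-to-back scan visits more
nodes as the object grows.

```python
# Toy model only: this is NOT Squid's actual data structure or code.
NODE_SIZE = 4096  # assumed bytes per in-memory node (hypothetical value)


def node_count(object_size):
    """Number of fixed-size nodes needed to hold object_size bytes."""
    return (object_size + NODE_SIZE - 1) // NODE_SIZE


def find_node_linear(object_size, offset):
    """Scan nodes from the start until one covers `offset`.

    Returns (node_index, nodes_visited); node_index is None when no
    node covers the offset.
    """
    visited = 0
    for index in range(node_count(object_size)):
        visited += 1
        start = index * NODE_SIZE
        end = min(start + NODE_SIZE, object_size)
        if start <= offset < end:
            return index, visited
    return None, visited


if __name__ == "__main__":
    # Look up the last byte of progressively larger objects.
    for size in (64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        _, visited = find_node_linear(size, size - 1)
        print(f"{size:>10} bytes: {visited} nodes scanned")
```

In this model the work per lookup grows in proportion to the object
size, which matches the symptom described above if such lookups happen
often.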

Unfortunately, I do not remember whether this affects just cached or
both cached and in-transit objects.

> Does anybody have any suggestions on how to fix/improve this?  

Short term, try limiting the size of in-RAM objects using
maximum_object_size_in_memory first. If that solves the problem, then,
most likely, only cached objects are affected.
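
A squid.conf fragment along these lines would apply that limit. The
512 KB value is only an illustrative starting point, not a tested
recommendation; pick a limit that fits your own traffic.

```
# squid.conf: cap the size of objects kept in the memory cache
# (512 KB is an example value, not a recommendation)
maximum_object_size_in_memory 512 KB
```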

Also, forcing the shared memory cache (even if you are not using SMP
Squid) might help, but the shared memory cache does not cache Varying
objects, so I hesitate to recommend that as a solution for non-SMP
Squids. I am also not sure whether the shared memory cache avoids this
bug: while the shared memory code itself does not have the above linear
search, SMP Squid still uses local memory code and might hit the same
linear search.
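
If you want to experiment with this anyway, I believe the relevant
squid.conf directive is memory_cache_shared. Treat the fragment below
as an untested sketch and check the documentation for your Squid
version before relying on it.

```
# squid.conf: request the shared memory cache explicitly
# (untested sketch; verify against your Squid version's documentation)
memory_cache_shared on
```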

> Should I file a bug?

Yes, of course. It is a [serious] bug. If it had been filed a long time
ago, we would have more information about it (at least) and would have
to guess less. Please link to or copy this email exchange into your bug
report.

Thank you,

Alex.