[squid-users] tuning squid memory (aka avoiding the reaper)

Aaron Turner synfinatic at gmail.com
Thu Sep 28 20:19:54 UTC 2017


Ok, so I did some research, and here's what I'm finding:

If I set sslflags=NO_DEFAULT_CA on http_port and disable both the
memory and disk caches, then memory usage is very stable.  It goes up
for a little bit and then pretty much stabilizes (it actually goes up
and down a little, but doesn't seem to be growing or trending up).
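
In squid.conf terms, that baseline looks something like this (a
sketch; the port and cert path are placeholders, not my exact setup):

  # bump port that skips OpenSSL's default CA bundle
  http_port 3128 ssl-bump cert=/etc/squid/example.pem sslflags=NO_DEFAULT_CA
  cache_mem 0 MB      # no memory cache
  cache deny all      # no caching at all; no cache_dir lines defined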

I then enabled the memory cache (10GB worth) and ran that for a
while.  As the cache filled, memory usage obviously went up.  Once the
cache filled, memory usage continued to increase, but at a slower
rate.  Unlike before, it doesn't seem to stabilize.  I'm seeing memory
usage increase in top (virtual, resident & shared) as well as in
mgr:info's "Total accounted" line.  It's not growing as fast as it did
before I added the sslflags option, but it is growing.
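
(For reference, I'm pulling that counter with squidclient, something
along the lines of:

  squidclient -h 127.0.0.1 -p 3128 mgr:info | grep 'Total accounted'

with the host/port matching the http_port in use.)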

What other information would be useful to debug this?

--
Aaron Turner
https://synfin.net/         Twitter: @synfinatic
My father once told me that respect for the truth comes close to being
the basis for all morality.  "Something cannot emerge from nothing,"
he said.  This is profound thinking if you understand how unstable
"the truth" can be.  -- Frank Herbert, Dune


On Mon, Sep 25, 2017 at 8:26 PM, Alex Rousskov
<rousskov at measurement-factory.com> wrote:
> On 09/25/2017 05:23 PM, Aaron Turner wrote:
>> So I'm testing squid 3.5.26 on an m3.xlarge w/ 14GB of RAM.  Squid is
>> the only "real" service running (besides sshd and the like).  I'm
>> running 4 workers and 2 rock cache_dirs (rough config sketch below).
>> The workers seem to be growing without bound, and after ~30min or so
>> the kernel starts killing off processes until memory is freed.  Yes,
>> my clients (32 of them) are hitting this at about 250 URLs/min, which
>> doesn't seem that crazy, but ¯\_(ツ)_/¯
>>
>> cache_mem 1 GB resulted in workers exceeding 4GB resident.  So I tried
>> 500 MB; same problem.  Now I'm down to 250 MB and I'm still seeing
>> workers using 3-4GB of RAM after a few minutes, and still growing.
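>>
>> In squid.conf terms, the worker/cache setup is roughly this (sizes
>> and paths are placeholders):
>>
>>   workers 4
>>   cache_dir rock /var/cache/squid/rock1 10000
>>   cache_dir rock /var/cache/squid/rock2 10000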
>
> It is not the Squid memory cache that consumes your RAM, apparently.
>
>
>> the docs indicate I should expect total memory to be roughly 3x cache_mem.
>
> ... which is an absurd formula for those using disk caches: Roughly
> speaking, most large busy Squids spend most of their RAM on
>
> * memory cache,
> * disk cache indexes,
> * SSL-related caches, and
> * in-flight transactions.
>
> Only one of those 4 components is proportional to cache_mem, with a
> coefficient closer to 1 than to 3.
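>
> To put rough, purely illustrative numbers on that: a 10 GB memory
> cache costs somewhat more than 10 GB once per-object index overhead
> is added; disk cache indexes are often ballparked at roughly 10-15
> MB of RAM per GB of on-disk cache; and each in-flight transaction
> buffers on the order of tens of KB (read_ahead_gap and I/O buffers),
> so thousands of concurrent transactions add up quickly.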
>
>
>> mgr:info reports:
>
> Thank you for posting this useful info. When you are using disk caching,
> please also include the mgr:storedir report.
>
>
>> I'm trying to figure out why and how to fix it.
>
> I recommend disabling all caching (memory and disk) and SslBump (if any)
> to establish a baseline first. If everything looks stable and peachy for
> a few hours, record/store the baseline measurements, and add one new
> memory consumer (e.g., the memory cache). Ideally, this testing should
> be done in a lab rather than on real users, but YMMV.
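>
> For example, once the no-cache baseline looks stable, a first
> increment could be as small as (sketch):
>
>   cache_mem 256 MB
>
> with everything else unchanged, and so on, one consumer at a time.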
>
>
>> One thing I've read about the cache_mem knob is:
>>
>> "If circumstances require, this limit will be exceeded.
>>
>> Specifically, if your incoming request rate requires more than
>> 'cache_mem' of memory to hold in-transit objects, Squid will
>> exceed this limit to satisfy the new requests.  When the load
>> decreases, blocks will be freed until the high-water mark is
>> reached.  Thereafter, blocks will be used to store hot
>> objects."
>
> The above is more-or-less accurate, but please note that in-transit
> objects do not usually eat memory cache RAM in SMP mode. It is usually
> best to think of in-flight transactions as a distinct SMP memory
> consumer IMO.
>
>
>> Not sure if this is the cause of my problem?
>
> It could be -- it is difficult for me to say by looking at one random
> mgr:info snapshot. If I have to guess based on that snapshot alone, then
> my answer would be "no" because you have less than 4K concurrent
> transactions and transaction response times are low. Hopefully somebody
> else on the list can tell you more.
>
>
>
>> The FAQ says to try a different malloc, so I tried recompiling with
>> --enable-dlmalloc, but that had no impact.
>
> Do not bother unless your deployment environment is very unusual. This
> hint was helpful 20 years ago, but is rarely relevant these days AFAIK.
> See above for a different attack plan.
>
>
> HTH,
>
> Alex.

