[squid-users] Squid memory error after vm.swappiness changed from 60 to 10

Bike dernikov1 dernikov1 at gmail.com
Mon Nov 13 09:34:32 UTC 2017


On Fri, Nov 10, 2017 at 4:43 PM, Alex Rousskov
<rousskov at measurement-factory.com> wrote:
> On 11/10/2017 06:16 AM, Bike dernikov1 wrote:
>
>> So you suggest that we totally disable disk swap (or just for
>> debugging)?
>
> I would aim for totally disabling disk swap, but getting to that
> configuration is easier if you keep swap enabled and consider any
> non-trivial swap use as a problem that you need to fix. After all
> non-trivial swap uses are eliminated, it does not really matter much
> whether you keep swap enabled or not!

That is also my goal, and I have other scenarios in mind.
We have spare older IBM X3550 servers (8 cores), which can be upgraded
to 48 GB of RAM (using parts from other old servers).
On the new ones, we have only 24 GB of RAM on the primary server and 16 GB on the second.
For disabling swap, the X3550 servers would be better, but they have only 8 cores.
That could be a problem, although for now the load is only 0.5 on 12 cores (24 with HT).
I know that, core for core, the new CPUs can process 4x-8x more, so more testing :)
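For reference, a minimal sketch of the knobs discussed in this thread
(vm.swappiness is the real kernel parameter; the sysctl.d file name is
just an example):

  # Lower the kernel's tendency to swap (runtime change):
  sysctl -w vm.swappiness=10

  # Persist it across reboots:
  echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf

  # Or disable disk swap entirely, as Alex suggests aiming for:
  swapoff -a   # and comment out swap entries in /etc/fstab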

> Please note that this suggestion is specific to performance-sensitive
> Squid servers -- many other servers have very legitimate reasons to use
> swap. YMMV.
>
Oracle is a good example.
We have one Oracle server that swap killed: users were furious, and the
load average was continuously over 10 (8 cores, 16 GB of RAM).
We upgraded the RAM to 48 GB. Now the maximum load is 4-5 at peaks and
1-2 during the day. Swap is still enabled :) it uses 9 GB, but all of the RAM is in use too :).
Users have stopped calling since the upgrade.

>
>> That setting on production can be disaster.
>
> Squid swapping in production is an arguably worse disaster, as you have
> learned. In many cases, it is better to deal with a lack of swap than to
> rely on swap's magical effects that most humans poorly understand. YMMV.

In this scenario, swap acts as a backup cache (as I understand it), like a tier in a SAN?
Swapped-out data could be read back into memory when it is used again,
but it stays on disk and is purged after some time if unused?
Or am I mistaken?
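One way to check that understanding against reality is to watch the
kernel's actual swap traffic; a minimal sketch using standard tools,
nothing Squid-specific assumed:

  # Report activity every 5 seconds; si = memory swapped in from
  # disk per second, so = memory swapped out to disk per second:
  vmstat 5

Sustained non-zero si/so on a busy Squid box is the kind of
non-trivial swap use Alex suggests treating as a problem to fix.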

>> We will try; we reduced the number of auth helpers and prolonged LDAP caching.
>> Little steps, but at least every bit counts.

We reduced the Kerberos helpers to 100 on Friday; 20 of those 100 were
active, and 13 of those had non-zero request/reply counts.
We have now reduced them to 50, and no problems appeared over the
weekend. The current settings:

cache_mem 14G
vm.swappiness = 60
LDAP cache set to 24h
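For concreteness, a sketch of roughly how the Squid side of those
settings could look in squid.conf. This assumes negotiate/Kerberos
authentication; the helper path and service principal are placeholders,
and authenticate_ttl is just one way to prolong credential caching (if
the LDAP lookups are group checks through external_acl_type, its ttl=
option is the analogous knob):

  # Kerberos authentication with at most 50 helper processes:
  auth_param negotiate program /usr/lib/squid/negotiate_kerberos_auth -s HTTP/proxy.example.com@EXAMPLE.COM
  auth_param negotiate children 50

  # Cache successful authentication results for 24 hours:
  authenticate_ttl 24 hours

  # Memory cache size:
  cache_mem 14 GB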

Memory stats (free -m, values in MiB):

             total        used        free      shared  buff/cache   available
Mem:          24101       16032        1488         156        6580        7497
Swap:         24561        1907       22654
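To see how much of that ~1.9 GB of swap belongs to Squid itself, the
kernel exposes per-process swap usage; a minimal sketch (with SMP
workers, pidof returns several PIDs, hence the loop):

  # VmSwap is how much of each Squid process is swapped out:
  for pid in $(pidof squid); do
      grep VmSwap /proc/$pid/status
  done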

> ... unless those bits are actually hurting. You may need to understand
> your server memory usage better to reduce the time spent on trying
> random changes. For example, if auth helpers are close to being
> overloaded, reducing their number may make things worse.

No overload after the changes. When we saw that we were not using that
many helpers, it seemed logical to reduce their number in small steps.
I agree, random testing can be painful, and on production even more so.
Can you recommend the best way to analyze memory (other than free -m and top/htop)?
squid-internal-mgr/mem has nice detail; I will start there. Do you
have a better way?
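For reference, two common ways to pull that cache manager report
(assuming the manager ACL allows localhost and Squid listens on port 3128):

  # Via the squidclient utility shipped with Squid:
  squidclient mgr:mem

  # Or over plain HTTP:
  curl http://127.0.0.1:3128/squid-internal-mgr/mem

mgr:info is also worth a look for an overall memory summary.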

> Alex.

Thanks for the help,

