[squid-users] SQUID memory error after vm.swappiness changed from 60 to 10

Fri Nov 17 19:03:55 UTC 2017

On Fri, Nov 17, 2017 at 3:53 AM, Amos Jeffries <squid3 at treenet.co.nz> wrote:
> On 17/11/17 03:49, Bike dernikov1 wrote:
>>
>> On Thu, Nov 16, 2017 at 8:58 AM, Amos Jeffries wrote:
>>>
>>> On 16/11/17 01:32, Bike dernikov1 wrote:
>>>>
>>>>
>>>>
>>>> If I can ask under the same title:
>>>> Yesterday we had an error in the logs (syslog, cache.log, dmesg, access.log):
>>>>
>>>> segfault at 8 ip ....... sp ..... error 4 is squid
>>>> process pid exited due to signal 11 with status 0
>>>>
>>>> Squid restarted. That was at the end of the work day, and I didn't notice
>>>> any change while surfing.
>>>> I noticed a change in used memory, and after I went through the logs I found
>>>> the segfault.
>>>>
>>>> Can you point me to how to analyze what happened?
>>>> Could it be a problem with the kernel?
>>>>
>>>
>>> How to retrieve info about this type of thing is detailed at
>>> <https://wiki.squid-cache.org/SquidFaq/BugReporting>.
>>
>>
>> I wasn't sure it was a bug, so I didn't want to report it as one.
>> Now that you confirm it can be a bug, I will prepare to retrieve
>> the info.
>> I just hope the bug won't happen under high load in the middle of a working day.
>>
>
> The how-tos are only on that page because if you are reporting that kind of
> bug those details are mandatory. You don't have to be reporting a bug to use
> the techniques.
>
> That said, a segfault is almost always a bug, though it could be a bug in the
> system environment or hardware rather than in Squid. The details you get from
> looking at the traces should indicate which of those it actually is.

In the beginning we had many crashes, and we thought we had a hardware bug.
We had two different servers, a Fujitsu RX600 and an IBM X3550M3. We were
testing Squid on CentOS and Debian.
Debian won because of its newer squidGuard version, in which LDAP
authorization works.
The first upgrade to Debian 9 (stable) broke the installation on the Fujitsu: it
couldn't boot with the new kernel.
The same Debian worked on the IBM X3550M3, so testing was a nightmare.
We returned to the stable kernel, and the problems disappeared until now.
Still, there has been only one segfault so far in 3 days.

>>
>>> NP: If you do not have core files enabled, then the data from that
>>> segfault
>>> is probably gone irretrievably. You may need to use the script to capture
>>> segfault details from a running proxy (the 'minimal downtime' section).
>>
>>
>> I am sure that I didn't enable it.
>>
>
> Okay, then you will need to for further diagnosis.

From Monday we will start with the reconfiguration. Each day brings a new problem.
The migration has slowed to a stop :(
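
Regarding the core files mentioned above: a rough way to check whether a
running process is even allowed to write one is to look at its core file size
limit. The following is only a minimal sketch in Python, assuming a Linux
system with /proc mounted; the function names parse_core_limit and
core_dumps_possible are made up for illustration and are not part of Squid or
any Squid tooling.

```python
from pathlib import Path


def parse_core_limit(limits_text):
    """Return (soft, hard) core file size limits from /proc/<pid>/limits text.

    Returns None if the "Max core file size" row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            fields = line.split()
            # Row layout: Max core file size <soft> <hard> <units>
            return fields[4], fields[5]
    return None


def core_dumps_possible(pid):
    """Hypothetical helper: True if the soft core limit of `pid` is non-zero.

    Assumption: Linux, and permission to read /proc/<pid>/limits.
    """
    text = Path(f"/proc/{pid}/limits").read_text()
    limit = parse_core_limit(text)
    # A soft limit of "0" means the kernel will not write a core file.
    return limit is not None and limit[0] != "0"
```

Calling core_dumps_possible() with the PID of the Squid worker would only say
whether the limit permits a core file; where the file ends up, and whether
Squid itself is configured to allow it, still has to be checked as described on
the BugReporting page.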

Today we had a different problem (exhausted inodes). The logs exploded
with "no space on disk" errors (although the disk was 60% free).
Luckily, we found the cause (sarg and the script-generated
reports) in under 5 minutes.
We lost half a day rewriting the scripts. I hope we have solved that
problem for good :).
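
For anyone hitting the same thing: "no space" errors with plenty of free disk
usually mean the filesystem has run out of inodes rather than blocks. A small
check that reports both could look like the sketch below. It is Python, uses
only os.statvfs from the standard library, and the names percent_used and
inode_report are hypothetical; the path to check is an assumption and should be
whatever filesystem holds the reports.

```python
import os


def percent_used(total, free):
    """Percentage of `total` that is in use; 0.0 if the total is unknown (0)."""
    if total <= 0:
        return 0.0
    return 100.0 * (total - free) / total


def inode_report(path):
    """Return inode and block usage percentages for the filesystem at `path`."""
    st = os.statvfs(path)
    return {
        # f_files / f_ffree: total and free inodes on the filesystem
        "inodes_used_pct": percent_used(st.f_files, st.f_ffree),
        # f_blocks / f_bfree: total and free data blocks
        "blocks_used_pct": percent_used(st.f_blocks, st.f_bfree),
    }
```

If inodes_used_pct is near 100 while blocks_used_pct is much lower, the
filesystem is full of small files, which matches what we saw with the generated
reports. Some filesystems report 0 total inodes, in which case the sketch
returns 0.0 rather than a meaningful figure.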

> Amos

We couldn't have done it without your help.
Thanks a lot.