[squid-users] Will squid core dump with worker threads? Investigating squid crash, 3.5.23

Jester Purtteman jester at optimera.us
Fri Jan 13 21:32:41 UTC 2017


Hello,

 

I am having periodic crashes of my squid server, and I am not getting core
dump files.  I have set "workers 6" in my squid.conf, and I know from
reading the debugging wiki page that threads can cause trouble.  I have
confirmed permissions on the directory I'm dumping to, so I don't *think*
that is the issue.
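
Beyond directory permissions, a few system settings commonly decide whether a core file is written at all. The following is a minimal sketch, assuming a Linux host with bash; the function names are made up for illustration. It prints the core size limit of the current shell plus two kernel settings: kernel.core_pattern (on stock Ubuntu this usually pipes cores to apport) and fs.suid_dumpable (which may matter for a daemon that starts as root and drops privileges). The limit that counts is the one in the environment squid is started from, and squid.conf's coredump_dir is worth checking as well.

```shell
#!/bin/bash
# Interpret a core-size limit as reported by `ulimit -c`.
core_limit_status() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited to $1 blocks" ;;
    esac
}

# Print the settings that most often suppress core files on Linux.
report_core_settings() {
    local f
    echo "core size limit: $(core_limit_status "$(ulimit -c)")"
    for f in /proc/sys/kernel/core_pattern /proc/sys/fs/suid_dumpable
    do
        if [ -r "$f" ]
        then
            echo "$f = $(cat "$f")"
        else
            echo "$f is not readable on this system"
        fi
    done
}

report_core_settings
```

If the limit reads "disabled", running "ulimit -c unlimited" in the shell or init script that launches squid is the usual first thing to try.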

 

FWIW, here is what I know so far about the crashes (core dump to follow;
I'll retry without workers and see what happens):

 

I am running Squid-3.5.23-R14129 on a stock Ubuntu 16.04 configured with:

 

./configure --prefix=/usr   --localstatedir=/var
--libexecdir=/usr/lib/squid    --srcdir=.   --datadir=/usr/share/squid
--sysconfdir=/etc/squid   --with-default-user=proxy   --with-logdir=/var/log
--with-pidfile=/var/run/squid.pid --enable-linux-netfilter
--enable-cache-digests --enable-storeio=ufs,aufs,diskd,rock
--enable-async-io=30 --enable-http-violations --enable-zph-qos
--with-netfilter-conntrack --with-filedescriptors=65536 --with-large-files

 

About once a day it crashes, and the following line in cache.log is about
my only lead:

 

assertion failed: MemBuf.cc:216: "0 <= tailSize && tailSize <= cSize"

 

From the possibly-interesting-but-who-knows-maybe-unrelated files, there is
one additional detail.  I had this version running on an Ubuntu 14.04 machine
until last week, on which I had installed GCC-4.9 (so I could test squid
4.0), and it had run stable from December 20th to January 5th without
any crashes.  Then something totally outside of squid went horribly off the
rails.  Ubuntu dropped support for the 3.x series kernels, so I updated to
4.4 (from the Ubuntu repositories) and that caused /proc/sys/net/bridge to
go away.  While testing an unrelated issue, I ran a script that I adapted
from http://wiki.squid-cache.org/ConfigExamples/Intercept/LinuxBridge which
contains a dangerous couple of lines I had not contemplated before:

 

cd /proc/sys/net/bridge/
for i in *
do
   echo 0 > $i
done
unset i

 

When /proc/sys/net/bridge went away, the cd failed, and the script then
proceeded to overwrite every file in the directory I happened to be in with
a 0.  OOPS!  I tell this bit so that my fellow admins get a laugh at my
expense, and as a cautionary tale.  CHECK the status of that command before
you let it do other things!  As it turns out, tproxy works fine without
echoing '0' into all those files, but if you want to leave it on the page,
may I suggest the following revision to the wiki page:

 

#!/bin/bash
# Only touch the bridge settings if the cd actually succeeded.
if cd /proc/sys/net/bridge
then
  for i in *
  do
    echo 0 > "$i"
  done
  unset i
else
  echo "WARNING! /proc/sys/net/bridge does not exist, you can 'sudo modprobe br_netfilter' to get it, but you may not need it"
fi

 

That just checks whether the changedir worked, and if it didn't it issues a
warning instead of cooking all the files in your current directory, which is
nice!
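
Another way to get the same protection is to never change directory at all and loop over absolute paths instead. The following is a minimal sketch of that idea, not the wiki's script; the function name zero_files_in is made up for illustration, and the real call is left commented out so the block is harmless to paste.

```shell
#!/bin/bash
# Write "0" into every regular file directly under the given directory,
# without ever changing the working directory.
zero_files_in() {
    local dir=$1 f
    if [ ! -d "$dir" ]
    then
        echo "WARNING: $dir does not exist; nothing changed" >&2
        return 1
    fi
    for f in "$dir"/*
    do
        # Skip non-files, including the literal pattern left when the
        # directory is empty and the glob matches nothing.
        [ -f "$f" ] || continue
        echo 0 > "$f"
    done
}

# Intended call on the bridge host (needs root), commented out on purpose:
# zero_files_in /proc/sys/net/bridge
```

Because every path is built from the directory argument, a missing directory can only produce a warning and never a write into the current directory.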

 

Anyway, after that happened, for reasons completely unknown (though I suspect
they are related to bridging), the machine that had been my squid server
seized completely, so I installed Ubuntu 16.04 and have since run into this
crash.  Once I have a core dump, I'll post it and my configuration, which is
pretty stock with the exception of a couple of ACLs and "workers 6".  I am
wondering if the move from gcc 4.9 to gcc 5.4 (the stock gcc in Ubuntu 16.04)
may be a culprit; I might recompile with gcc 4.9 and see if the issue resolves.

 

In any event, is there a way to get a core with worker threads?  My system
benefits from them, so I'd rather not turn them off but I want a core dump.
Also let me know if there are other details that would be useful.  Adding
much in the way of debugging is going to be a challenge because it takes a
day or so to get to a crash, and I don't necessarily have the disk space to
hold the volume of cache.log output generated over that period (I have a
60 GB log partition).  If there is some clever way of limiting cache.log and
causing it to round-robin or something, I'm happy to try things.  Thank you!
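
One possible approach to the round-robin idea is a small script, run from cron every few minutes, that asks squid to rotate its logs once cache.log passes a size threshold. The following is a minimal sketch under stated assumptions: the log path, the 5 GB threshold, and the function names are illustrative, and it relies on "squid -k rotate" together with the logfile_rotate directive in squid.conf, which sets how many old copies are kept.

```shell
#!/bin/bash
# Succeed when the file exists and is larger than max_bytes.
log_exceeds() {
    local file=$1 max_bytes=$2 size
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding wc may add.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$max_bytes" ]
}

# Assumed values for illustration; adjust to the local setup.
CACHE_LOG=${CACHE_LOG:-/var/log/cache.log}
MAX_BYTES=${MAX_BYTES:-$((5 * 1024 * 1024 * 1024))}

# Ask the running squid to rotate its logs when cache.log is too big.
rotate_if_needed() {
    if log_exceeds "$CACHE_LOG" "$MAX_BYTES"
    then
        squid -k rotate
    fi
}

# Example cron usage (hypothetical script path):
# */5 * * * * /usr/local/bin/rotate-cache-log.sh
```

With logfile_rotate set to a small number, the log partition then holds at most that many copies of roughly MAX_BYTES each, plus whatever is written between checks.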

 

Jester Purtteman

 
