[squid-users] Will squid core dump with worker threads? Investigating squid crash, 3.5.23

Amos Jeffries squid3 at treenet.co.nz
Sat Jan 14 05:37:11 UTC 2017


On 14/01/2017 10:32 a.m., Jester Purtteman wrote:
> Hello,
> 
>  
> 
> I am having periodic crashes of my squid server, and I am not getting core
> dump files.  I have set "workers 6" in my squid.conf, and I know that
> threads can cause trouble from reading the debugging wiki page.  I have
> confirmed permissions on the directory I'm dumping to, so I don't *think*
> that is the issue.

Core dumps are produced by your OS when a program crashes. You may need
to enable them explicitly.
<http://wiki.squid-cache.org/SquidFaq/BugReporting#Resource_Limits>
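
On a typical Linux system that means removing the core size limit and,
optionally, pointing the kernel at a writable location. A minimal sketch
(the dump directory here is only an example):

  # allow unlimited core file size for this shell and its children
  ulimit -c unlimited

  # name cores with the executable name and PID, in a known directory
  sysctl -w kernel.core_pattern=/var/cache/squid/core.%e.%p

Squid's own coredump_dir directive controls which directory it changes
into before dumping.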

> 
> FWIW (core dump to follow, I'll retry without workers and see what happens)
> I am having squid crashes.  Details I have so far as are as follows:
> 
>  
> 
> I am running Squid-3.5.23-R14129 on a stock Ubuntu 16.04 configured with:
> 
>  
> 
> ./configure --prefix=/usr   --localstatedir=/var
> --libexecdir=/usr/lib/squid    --srcdir=.   --datadir=/usr/share/squid
> --sysconfdir=/etc/squid   --with-default-user=proxy   --with-logdir=/var/log
> --with-pidfile=/var/run/squid.pid --enable-linux-netfilter
> --enable-cache-digests --enable-storeio=ufs,aufs,diskd,rock
> --enable-async-io=30 --enable-http-violations --enable-zph-qos
> --with-netfilter-conntrack --with-filedescriptors=65536 --with-large-files
> 
>  
> 
> About once a day it is crashing, with the following line as just about my
> only lead in cache.log:
> 
>  
> 
> assertion failed: MemBuf.cc:216: "0 <= tailSize && tailSize <= cSize"
> 

This is <http://bugs.squid-cache.org/show_bug.cgi?id=4606>. We have
narrowed it down to something about the collapsed revalidation behaviour
that became visible after the recent security fix.

> 
> From the possibly interesting-but-who-knows-maybe-unrelated-files, there is
> one additional detail.  I had this version running on an Ubuntu 14.04
> machine until last week, on which I had installed GCC-4.9 (so I could test
> squid 4.0), and it had run stable from December 20th to January 5th without
> any crashes.  Then something totally outside of squid went horribly off the
> rails.  Ubuntu dropped support for the 3.x series kernels, so I updated to
> 4.4 (from the Ubuntu repositories) and that caused /proc/sys/net/bridge to
> go away.  While testing an unrelated issue, I ran a script that I adapted
> from http://wiki.squid-cache.org/ConfigExamples/Intercept/LinuxBridge which
> contains a dangerous couple lines I had not before contemplated:
> 
> 
> cd /proc/sys/net/bridge/
> for i in *
> do
>    echo 0 > $i
> done
> unset i
> 
> When /proc/sys/net/bridge went away, the change directory failed, and the
> script then proceeded to overwrite everything in the current working
> directory with 0's.  OOPS!  I
> tell this bit so that my fellow admins get a laugh at my expense, and as a
> cautionary tale.  CHECK the status of that command before you let it do
> other things!  As it turns out, tproxy works fine without echoing '0' into
> all those files, but if you want to leave the script on the page, may I
> suggest the following revision to the wiki page:
> 

Thank you. There is no need for the cd, nor for the * glob to be without
a fixed path. I have updated the wiki to prevent this.
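
For instance, something along these lines avoids the problem entirely
(a sketch, not necessarily the exact wording now on the wiki):

  # use an absolute glob; if the directory is missing the glob simply
  # does not match, and nothing gets overwritten
  for i in /proc/sys/net/bridge/*
  do
     [ -e "$i" ] && echo 0 > "$i"
  done
  unset i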

It is true that TPROXY does not require bridging, and bridging has
nothing that particularly requires TPROXY. Except that for real
transparency they are usually both wanted.

>  
> 
> In any event, is there a way to get a core with worker threads?  My system
> benefits from them, so I'd rather not turn them off but I want a core dump.

<http://wiki.squid-cache.org/SquidFaq/BugReporting#Coredump_Location>
<http://wiki.squid-cache.org/SquidFaq/BugReporting#Resource_Limits>

Workers are not threads. They are separate processes, so each has its own
PID, which the core file should be associated with. Also, only the
individual worker which crashed will have a core dump generated; that
worker should then be restarted automatically by the master process.
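
Since each worker is its own process, a kernel.core_pattern that includes
%p (as in the sketch above) keeps the workers' cores from overwriting each
other. Once you have a core from the crashed worker, the backtrace is the
useful part for a bug report. A sketch, with an illustrative binary path
and core file name:

  gdb --batch -ex backtrace /usr/sbin/squid /var/cache/squid/core.squid.12345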


> Also let me know if there are other details that would be useful.  Adding
> much in the way of debugging is going to be a challenge because it takes a
> day or so to get to a crash, and I don't necessarily have the disk space to
> hold the volume of cache log generated over that period of time (I have a
> 60-gb log partition).  If there is some clever way of limiting cache.log and
> causing it to round-robin or something, I'm happy to try things.  Thank you!
> 

Using the rotate=N option on the debug_options directive will rotate
cache.log a different (usually smaller) number of times than access.log.
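
For example (a sketch; ALL,1 is the normal verbosity level):

  # in squid.conf: keep only 2 old copies of cache.log on 'squid -k rotate'
  debug_options rotate=2 ALL,1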

Or, if you use logrotate you can set it to rotate cache.log when it
reaches a certain size, see
<http://stackoverflow.com/questions/20162176/centos-linux-setting-logrotate-to-maximum-file-size-for-all-logs>
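
A sketch of such a logrotate entry (path and sizes are just examples;
copytruncate avoids having to signal Squid to reopen the log):

  /var/log/cache.log {
      size 1G
      rotate 4
      compress
      missingok
      copytruncate
  }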

Or, you could set up cache.log to be a pipe to somewhere else with more
disk space.
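
A rough outline of that approach, assuming a FIFO and a remote host with
spare disk (treat it as a sketch, not a tested recipe):

  mkfifo /var/log/cache.log.fifo
  # drain the fifo to a machine with more disk
  ssh loghost 'cat >> /big/cache.log' < /var/log/cache.log.fifo &
  # then in squid.conf:
  #   cache_log /var/log/cache.log.fifo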

Amos


