[squid-users] Will squid core dump with worker threads? Investigating squid crash, 3.5.23

Jester Purtteman jester at optimera.us
Tue Jan 17 00:44:34 UTC 2017


> -----Original Message-----
> From: squid-users [mailto:squid-users-bounces at lists.squid-cache.org] On
> Behalf Of Amos Jeffries
> Sent: Friday, January 13, 2017 9:37 PM
> To: squid-users at lists.squid-cache.org
> Subject: Re: [squid-users] Will squid core dump with worker threads?
> Investigating squid crash, 3.5.23
> 
> On 14/01/2017 10:32 a.m., Jester Purtteman wrote:
> > Hello,
> >
> >
> >
> > I am having periodic crashes of my squid server, and I am not getting
> > core dump files.  I have set "workers 6" in my squid.conf, and I know
> > that threads can cause trouble from reading the debugging wiki page.
> > I have confirmed permissions on the directory I'm dumping to, so I
> > don't *think* that is the issue.
> 
> Core dumps are done by your OS when programs crash. You may need to
> turn it on explicitly.
> <http://wiki.squid-cache.org/SquidFaq/BugReporting#Resource_Limits>

Will continue prodding.  I think systemd is doing something funny, because the process somehow isn't getting the same ulimits I launch it with.  I used to get dumps and I don't anymore, and systemd is one of the big changes between this setup and the old one.
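
If it does turn out to be systemd clamping the limit, the plan is a drop-in override along these lines (just a sketch; it assumes the unit is called squid.service and that the dump directory is writable by the proxy user):

mkdir -p /etc/systemd/system/squid.service.d
cat > /etc/systemd/system/squid.service.d/coredump.conf <<'EOF'
[Service]
# allow unlimited core files for the squid master and its workers
LimitCORE=infinity
EOF
systemctl daemon-reload
systemctl restart squid

# check the limit the running process actually received
grep "Max core file size" /proc/$(pidof -s squid)/limits
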
> 
> >
> > FWIW (core dump to follow, I'll retry without workers and see what
> > happens) I am having squid crashes.  Details I have so far are as follows:
> >
> >
> >
> > I am running Squid-3.5.23-R14129 on a stock Ubuntu 16.04 configured with:
> >
> >
> >
> > ./configure --prefix=/usr   --localstatedir=/var
> > --libexecdir=/usr/lib/squid    --srcdir=.   --datadir=/usr/share/squid
> > --sysconfdir=/etc/squid   --with-default-user=proxy   --with-logdir=/var/log
> > --with-pidfile=/var/run/squid.pid --enable-linux-netfilter
> > --enable-cache-digests --enable-storeio=ufs,aufs,diskd,rock
> > --enable-async-io=30 --enable-http-violations --enable-zph-qos
> > --with-netfilter-conntrack --with-filedescriptors=65536
> > --with-large-files
> >
> >
> >
> > About once a day it crashes, with the following line as just about my
> > only lead in cache.log:
> >
> >
> >
> > assertion failed: MemBuf.cc:216: "0 <= tailSize && tailSize <= cSize"
> >
> 
> This is <http://bugs.squid-cache.org/show_bug.cgi?id=4606>. We have
> narrowed it down to something about the collapsed revalidation behaviour
> that became visible after the recent security fix.

If I do not use collapsed forwarding, would it be safe to revert to 3.5.22?  Crashes are happening roughly daily, and I don't really want to put a "babysitter" script in place to keep it running if I have better options.  For now, leaving collapsed forwarding off and not applying that patch seems like the better answer.
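
Just so it is explicit in the config rather than relying on the default, I have this in squid.conf (a sketch; collapsed_forwarding defaults to off in 3.5 anyway):

# keep collapsed forwarding disabled until bug 4606 is resolved
collapsed_forwarding off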

> 
> >
> > From the possibly interesting-but-who-knows-maybe-unrelated-files,
> > there is one additional detail.  I had this version running on an
> > Ubuntu 14.04 machine until last week, on which I had installed GCC-4.9
> > (so I could test squid 4.0), and it had run stable from December
> > 20th to January 5th without any crashes.  Then something totally
> > outside of squid went horribly off the rails.  Ubuntu dropped support
> > for the 3.x series kernels, so I updated to
> > 4.4 (from the Ubuntu repositories) and that caused
> > /proc/sys/net/bridge to go away.  While testing an unrelated issue, I
> > ran a script that I adapted from
> > http://wiki.squid-cache.org/ConfigExamples/Intercept/LinuxBridge which
> > contains a couple of dangerous lines I had not contemplated before:
> >
> >
> > cd /proc/sys/net/bridge/
> > for i in *
> > do
> >    echo 0 > $i
> > done
> > unset i
> >
> > When /proc/sys/net/bridge went away, the change of directory failed, and
> > the script then proceeded to turn everything in whatever directory it
> > was started from into 0's.  OOPS!  I tell this bit so that my fellow
> > admins get a laugh at my expense, and as a cautionary tale: CHECK the
> > status of that cd before you let the script do anything else!  As it
> > turns out, tproxy works fine without echoing '0' into all those files,
> > but if you want to keep the snippet on the page, may I suggest revising
> > it on the wiki?
> >
> 
> Thank you. There is no need for the cd, and the * should not be used
> without a fixed path. I have updated the wiki to prevent this.

Glad my foolish mistakes can make the world better :)
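
For the record, the defensive version of that snippet I have settled on looks roughly like this (a sketch; it assumes you still want those sysctls zeroed at all, which tproxy apparently does not need):

# only touch the bridge sysctls if the directory actually exists,
# and never depend on the current working directory
if [ -d /proc/sys/net/bridge ]; then
    for i in /proc/sys/net/bridge/*; do
        echo 0 > "$i"
    done
    unset i
fi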

> 
> It is true that TPROXY does not require bridging, and bridging has
> nothing that particularly requires TPROXY. Except that for real
> transparency they are usually both wanted.

I am running a TPROXY system on a bridge, and my script has had that snippet in it from the start.  But it kept working after the bridging code was moved into a module, and it kept working even when the module was not loaded.  So whatever effect setting everything in that directory to 0 has is beyond my understanding, and it appears not to be necessary on the latest Ubuntu at least.  I guess that is part of my question: is something silently broken, as far as you know, because I am not loading that kernel module?
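
For anyone else who hits this on a 4.4 kernel: the bridge-netfilter code was split out into the br_netfilter module on newer kernels, so /proc/sys/net/bridge only appears once that module is loaded.  This is roughly how I have been checking (a sketch):

# see whether the module and its sysctls are present at all
lsmod | grep br_netfilter || echo "br_netfilter not loaded"
ls /proc/sys/net/bridge/ 2>/dev/null || echo "no bridge sysctls present"

# load it explicitly if the old bridge-nf behaviour is wanted
modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables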

> 
> >
> >
> > In any event, is there a way to get a core with worker threads?  My
> > system benefits from them, so I'd rather not turn them off, but I want a
> > core dump.
> 
> <http://wiki.squid-cache.org/SquidFaq/BugReporting#Coredump_Location>
> <http://wiki.squid-cache.org/SquidFaq/BugReporting#Resource_Limits>
> 
> Workers are not threads. They are processes, so each has its own PID,
> which the core file should be associated with. Also, only the individual
> worker which crashed will have a core dump generated; it should then be
> restarted automatically by the master process.
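
Good to know.  To keep one crashing worker from overwriting another worker's dump, I will probably put the PID into the core file name, something like this (a sketch; the directory is only an example and has to be writable by the proxy user):

# name core files after the executable and PID, one per crashed worker
sysctl -w kernel.core_pattern=/var/cache/squid/core.%e.%p
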
> 
> 
> > Also let me know if there are other details that would be useful.
> > Adding much in the way of debugging is going to be a challenge because
> > it takes a day or so to get to a crash, and I don't necessarily have
> > the disk space to hold the volume of cache.log generated over that
> > period of time (I have a 60 GB log partition).  If there is some
> > clever way of limiting cache.log and making it round-robin or something,
> > I'm happy to try things.  Thank you!
> >
> 
> Using the rotate=N option on the debug_options directive will rotate
> cache.log a different (usually smaller) number of times than access.log.
> 
> Or, if you use logrotate you can set it to rotate when the cache.log gets to a
> certain size.
> see
> <http://stackoverflow.com/questions/20162176/centos-linux-setting-logrotate-to-maximum-file-size-for-all-logs>
> 
> Or, you could also set up cache.log to be a pipe to somewhere else with
> more disk space.

I'll dig in, thanks for the ideas.  It sounds like I don't need to work too hard on debugging at this time, since this appears to be a well enough known bug.  But it would be handy to be able to generate usable debug logs without swamping my system, so I'll see what I can come up with.  Thank you again!
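
What I will probably try first is capping cache.log by size rather than by schedule, roughly along these lines (a sketch; the path comes from my --with-logdir setting and the size is just an example):

# rotate cache.log whenever it reaches ~2 GB, keep a few compressed
# copies, and truncate in place so squid can keep its file handle
cat > /etc/logrotate.d/squid-cachelog <<'EOF'
/var/log/cache.log {
    size 2000M
    rotate 5
    compress
    missingok
    notifempty
    copytruncate
}
EOF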

--Jester


