[squid-users] squid, SMP and authentication and service regression over time
Eugene M. Zheganin
emz at norma.perm.ru
Mon May 16 18:27:06 UTC 2016
Hi.
I have been using squid for a long time to authenticate and authorize
users accessing the Internet via LDAP in a Windows corporate
environment (Basic/NTLM/GSS-SPNEGO). Several months ago I had to
switch to the SMP scheme, because a single process would sometimes
consume a whole core, bottlenecking the users served by it. CPU
utilization improved, but I discovered several issues.
The first one I became aware of is non-functional SNMP (since there's no
solution, I just had to sacrifice it). But the second one is more
disturbing. I discovered that after some uptime (usually a couple of
weeks, a month at best) squid somehow degrades and stops
authorizing users. I have about 600 active users on my biggest site
(without SNMP I'm not sure how many simultaneous users I have), and it
usually starts like this: someone (it starts with one person)
complains that he has lost his access to the Internet, though not
entirely. At first the access is very slow, and the victim has to wait
several minutes for a page to load. Others are unaffected at this time.
From time to time the victim is eventually able to load one or two tabs
in the browser, but by the end of the day it becomes unusable, and my
support has to step in. Then it gets escalated to me. At first I was
debugging various Kerberos stuff, NTLM, the victim's machine domain
membership and so on. But today I managed to figure out that all I have
to do is restart squid (sounds silly, but I don't like to
restart things, like in the "IT Crowd" TV series; it's a last-resort
measure for when I'm desperate). If I'm stubborn enough to continue
the investigation, soon I get 2 users complaining, then 3, then more.
During previous outages I eventually restarted squid anyway (to change
the domain controller in the Kerberos config, if I blamed that; to
disable connection pooling in the external Kerberos/LDAP helper, if I
blamed that), so each time there was a candidate to blame. But this time
I just decided to restart squid, since I had started to think squid
itself was the main cause, et voila. I should also mention that I have
run this AAA scheme in squid for years and didn't have this issue
previously. I also have about a dozen other squids running the same
(very similar) config, with the same AAA stuff (Basic/NTLM/GSS-SPNEGO)
and the same AD group checking, only for different group memberships,
and none of them has this issue. So I think SMP is involved, really.
I realize this is a poor problem report: "Something degrades, I restart
squid, please help, I think it's SMP-related". But the thing is, I
don't know where to start narrowing this down. If anyone has a
good idea, please let me know.
Thanks.
Eugene.