[squid-users] squid, SMP and authentication and service regression over time

Eugene M. Zheganin emz at norma.perm.ru
Mon May 16 18:27:06 UTC 2016


Hi.

I have been using squid for a long time to authenticate/authorize 
users accessing the Internet with LDAP in a Windows corporate 
environment (Basic/NTLM/GSS-SPNEGO), and recently (several months 
ago) I had to switch to the SMP scheme, because one process sometimes 
started to eat a whole core, thus bottlenecking the users on it. The 
situation with CPU efficiency improved; however, I discovered several 
issues. 
The first one I became aware of is the non-functional SNMP (since 
there's no solution, I just had to sacrifice it). But the second one is 
more disturbing. I discovered that after some uptime (usually a couple 
of weeks, a month at best) squid somehow degrades and stops 
authorizing users. I have about 600 active users on my biggest site 
(without SNMP I'm not sure how many simultaneous users I have), but 
it usually starts like this: someone (it starts with one person) 
complains that he has lost his access to the Internet - not entirely, 
though. At first the access is very slow, and the victim has to wait 
several minutes for a page to load. Others are unaffected at this 
time. From time to time the victim is eventually able to load one or 
two tabs in the browser, but by the end of the day this becomes 
unusable, and my support has to come in. Then this gets escalated to 
me. First I was 
debugging various Kerberos stuff, NTLM, the victim's machine domain 
membership and so on. But today I managed to figure out that all I have 
to do is just restart squid, yeah (sounds silly, but I don't like to 
restart things like in the "IT Crowd" TV series; this is kind of a 
last-resort measure for when I'm desperate). If I'm stubborn enough to 
continue the investigation, soon I get 2 users complaining, then 3, 
then more. 
During previous outages I eventually restarted squid as well (to change 
the domain controller in the Kerberos config, if I blamed that; to 
disable the external Kerberos/LDAP helper connection pooling, if I 
blamed that) - so each time there was another candidate to blame. But 
this time I just decided to restart squid, since I had started to think 
squid itself was the main cause, et voila. I should also mention that I 
have run this AAA scheme in squid for 
years, and I didn't have this issue previously. I also have about a 
dozen other squids running the same (very similar) config - same AAA 
stuff (Basic/NTLM/GSS-SPNEGO), same AD group checking, only for 
different group memberships - and none of them has this issue. I 
really think SMP is involved.

I realize this is a poor problem report: "Something degrades, I restart 
squid, please help, I think it's SMP-related". But the thing is, I 
don't know where to start narrowing this down. If anyone has a good 
idea, please let me know.

Thanks.
Eugene.

