[squid-users] Socket handle leak?

paolo.prinx at gmail.com
Fri Jul 12 10:58:08 UTC 2024


Hello, apologies in advance for the silly question.
We are having some stability issues with our Squid farms after a recent upgrade from CentOS/Squid 3.5.x to Ubuntu/Squid 5.7/6.9. I wonder if anyone here has seen something similar and might have a suggestion about what we are obviously missing?

In short, after running for a certain period the servers run out of file descriptors. We see a slowly growing number of TCP or TCPv6 socket handles that eventually hits the configured maximum. The handles do not get released until squid is restarted (-k restart).
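For reference, this is roughly how we watch the count grow (a minimal sketch; it assumes a single squid worker owned by the "proxy" user and needs root, so adjust the pgrep pattern to your layout):

    # sample squid's open-FD count once a minute via /proc
    PID=$(pgrep -o -u proxy squid)
    while sleep 60; do
        echo "$(date +%T) fds=$(ls /proc/$PID/fd | wc -l)"
    done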

It is somewhat similar to what is reported under https://access.redhat.com/solutions/3362211 . They state that:
   - If an application fails to close() its socket descriptors and continues to allocate new sockets, it can use up all the system memory on TCP(v6) slab objects.
   - Note that some of these sockets will not show up in /proc/net/sockstat(6). Sockets that still have a file descriptor but are in the TCP_CLOSE state will consume a slab object, but will not be accounted for in /proc/net/sockstat(6) or in "ss" or "netstat" output.
   - Whether this is an application socket leak can be determined by stopping the application processes that are consuming sockets. If the slab objects in /proc/slabinfo are then freed, the application is responsible, as that means the destructor routines found open file descriptors to sockets in the process.

"This is most likely to be a case of the application not handling error conditions correctly and not calling close() to free the FD and socket."


For example, on a server running the unmodified squid 5.7 package:

list of open files:
lsof |wc -l
    56963
 
of which 35K in TCPv6:
lsof |grep proxy |grep TCPv6 |wc -l
    35301
under /proc I see fewer objects:
    cat  /proc/net/tcp6 |wc -l
    3095
but the number of objects in the slabs is high:
cat /proc/slabinfo |grep TCPv6
    MPTCPv6                0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
    tw_sock_TCPv6       1155   1155    248   33    2 : tunables    0    0    0 : slabdata     35     35      0
    request_sock_TCPv6      0      0    304   26    2 : tunables    0    0    0 : slabdata      0      0      0
    TCPv6              38519  38519   2432   13    8 : tunables    0    0    0 : slabdata   2963   2963      0
I have 35K of lines like this:
lsof |grep proxy |grep TCPv6 |more
    squid        1049              proxy   13u     sock                0,8        0t0    5428173 protocol: TCPv6
    squid        1049              proxy   14u     sock                0,8        0t0   27941608 protocol: TCPv6
    squid        1049              proxy   24u     sock                0,8        0t0   45124047 protocol: TCPv6
    squid        1049              proxy   25u     sock                0,8        0t0   50689821 protocol: TCPv6
    ...
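Those entries appear as plain "sock ... protocol: TCPv6" rather than as connections, i.e. the FD is still open but the socket is gone from /proc/net/tcp6. A rough way to count such orphans (a sketch; same single-process and "proxy"-user assumptions as above, and for a stricter check one would also grep /proc/net/tcp and /proc/net/udp6):

    # socket inodes held open by squid but absent from the TCPv6 table
    PID=$(pgrep -o -u proxy squid)
    readlink /proc/$PID/fd/* |
        sed -n 's/^socket:\[\(.*\)\]$/\1/p' |
        while read inode; do
            grep -q " $inode " /proc/net/tcp6 || echo "$inode"
        done | wc -l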

We thought maybe this was a weird IPv6 thing, as we only route IPv4, so we compiled a more recent version of squid with no IPv6 support. The leak simply moved to TCP4:
lsof |wc -l
    120313

cat /proc/slabinfo |grep TCP
    MPTCPv6                0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
    tw_sock_TCPv6          0      0    248   33    2 : tunables    0    0    0 : slabdata      0      0      0
    request_sock_TCPv6      0      0    304   26    2 : tunables    0    0    0 : slabdata      0      0      0
    TCPv6                208    208   2432   13    8 : tunables    0    0    0 : slabdata     16     16      0
    MPTCP                  0      0   1856   17    8 : tunables    0    0    0 : slabdata      0      0      0
    tw_sock_TCP         5577   5577    248   33    2 : tunables    0    0    0 : slabdata    169    169      0
    request_sock_TCP    1898   2002    304   26    2 : tunables    0    0    0 : slabdata     77     77      0
    TCP               102452 113274   2240   14    8 : tunables    0    0    0 : slabdata   8091   8091      0

cat /proc/net/tcp |wc -l
    255
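For reference, the IPv4-only test build was configured roughly like this (a sketch; our real configure options are longer and omitted here):

    # Squid's configure script accepts --disable-ipv6
    ./configure --disable-ipv6
    make
    sudo make install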
After restarting squid, the slab objects are released and the open file descriptors drop to a reasonable value, which further suggests it is squid hanging on to these FDs:

lsof |grep proxy |wc -l
    1221
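Until we find the root cause, the only workaround seems to be the restart itself; something along these lines could be cron'd (a sketch, with an arbitrary 50000 threshold):

    # restart squid when its FD count approaches the configured maximum
    PID=$(pgrep -o -u proxy squid)
    if [ "$(ls /proc/$PID/fd | wc -l)" -gt 50000 ]; then
        squid -k restart
    fi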

Any suggestions? I guess it's something blatantly obvious, but we've been looking at this for a couple of days and we're not getting anywhere...
Thanks again
