[squid-users] Socket handle leak?
Alex Rousskov
rousskov at measurement-factory.com
Fri Jul 12 13:42:53 UTC 2024
On 2024-07-12 06:58, paolo.prinx at gmail.com wrote:
> We are having some stability issues with our squid farms after a recent
> upgrade from Centos/Squid 3.5.x to Ubuntu/Squid 5.7/6.9.
> In short, after running for a certain period the servers run out of file
> descriptors. We see a slowly growing number of TCP or TCPv6 socket
> handles
Assuming that your Squids are not under an ever-increasing load, what
you describe sounds like a Squid bug. I do not see any obviously related
fixes in the latest official code, so it is possible that this bug is
unknown to Squid developers and is still present in v6+. I recommend the
following steps:
1. Forget about Squid v5. Aim to upgrade to Squid v6.
2. Collect a few mgr:filedescriptors cache manager snapshots from a
problematic Squid, in the hope of discovering a common theme among the
leaked descriptors' metadata. Share your findings (and/or a pointer to
compressed snapshots).
3. Check cache.log for frequent (or at least persistent) ERROR and
WARNING messages and report your findings.
4. Does your Squid grow its resident memory usage as well? Descriptor
leaks are often (but not always!) accompanied by memory leaks. The
latter are sometimes easier to pinpoint. If (and only if) your Squid is
leaking a lot of memory, then collect a few dozen mgr:mem snapshots
(e.g., one every busy hour) and share a pointer to a compressed snapshot
archive for analysis by Squid developers. There is at least one v6
memory leak fixed in master/v7 (Bug 5322), but, hopefully, you are not
suffering from that memory leak (otherwise the noise from that leak may
obscure what we are looking for).
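For step 2, snapshots can be collected periodically with `squidclient
mgr:filedescriptors` and then summarized. A minimal Python sketch of the
summarizing part; the whitespace-separated column layout and the sample
lines are assumptions for illustration, not actual Squid output, so adjust
the parsing to your Squid version:

```python
# Sketch: tally mgr:filedescriptors snapshot lines by their remote
# address/description field to spot a common theme among leaked sockets.
# ASSUMPTION: each descriptor line starts with a numeric FD followed by
# type, timeout, nread, nwrite, then a free-form description.
from collections import Counter

def tally_descriptions(snapshot: str) -> Counter:
    counts = Counter()
    for line in snapshot.splitlines():
        parts = line.split(None, 5)  # FD, type, tout, nread, nwrite, rest
        if len(parts) == 6 and parts[0].isdigit():
            counts[parts[5]] += 1    # group by the trailing description
    return counts

# Hypothetical sample lines, for illustration only:
sample = """\
13 Socket 0 1 2 203.0.113.7:443 idle server connection
14 Socket 0 1 2 203.0.113.7:443 idle server connection
24 Socket 0 0 0 198.51.100.9:80 half-closed client
"""
print(tally_descriptions(sample).most_common(1))
```

Descriptions whose counts keep growing across snapshots taken hours apart
are the prime leak suspects.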
You may continue this triage on this mailing list or file a bug report
at https://bugs.squid-cache.org/enter_bug.cgi?product=Squid
Thank you,
Alex.
> It is somewhat similar to what is reported under
> https://access.redhat.com/solutions/3362211 . They state that:
>
> * If an application fails to close() its socket descriptors and
>   continues to allocate new sockets, then it can use up all the system
>   memory on TCP(v6) slab objects.
> * Note some of these sockets will not show up in /proc/net/sockstat(6).
>   Sockets that still have a file descriptor but are in the TCP_CLOSE
>   state will consume a slab object, but will not be accounted for in
>   /proc/net/sockstat(6) or "ss" or "netstat".
> * It can be determined whether this is an application socket leak by
>   stopping the application processes that are consuming sockets. If
>   the slab objects in /proc/slabinfo are freed, then the application
>   is responsible, as that means that destructor routines have found
>   open file descriptors to sockets in the process.
>
> "This is most likely to be a case of the application not handling
> error conditions correctly and not calling close() to free the FD and
> socket."
>
>
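The stop-the-application slab check described in the quoted solution can be
scripted. A minimal sketch, assuming the standard /proc/slabinfo layout
where the second column is the active object count (read slabinfo, stop
squid, read it again, compare); the sample strings echo the figures quoted
in this thread:

```python
# Compare active TCP slab objects before and after stopping the suspect
# application (e.g. via `systemctl stop squid` between the two reads).
def tcp_slab_objects(slabinfo: str) -> dict:
    objs = {}
    for line in slabinfo.splitlines():
        fields = line.split()
        if fields and fields[0] in ("TCP", "TCPv6"):
            objs[fields[0]] = int(fields[1])  # <active_objs> column
    return objs

before = "TCPv6 38519 38519 2432 13 8 : tunables 0 0 0 : slabdata 2963 2963 0"
after  = "TCPv6   208   208 2432 13 8 : tunables 0 0 0 : slabdata 16 16 0"
drop = tcp_slab_objects(before)["TCPv6"] - tcp_slab_objects(after)["TCPv6"]
print(drop)  # a large drop after stopping squid implicates the application
```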
> For example, on a server running the unmodified squid 5.7 package:
>
> list of open files:
>
> lsof |wc -l
> 56963
>
>
> of which 35K in TCPv6:
>
> lsof |grep proxy |grep TCPv6 |wc -l
>
> 35301
>
> under /proc I see fewer objects:
> cat /proc/net/tcp6 |wc -l
> 3095
>
> but the number of objects in the slabs is high:
> cat /proc/slabinfo |grep TCPv6
> MPTCPv6                  0      0   2048 16 8 : tunables 0 0 0 : slabdata    0    0 0
> tw_sock_TCPv6         1155   1155    248 33 2 : tunables 0 0 0 : slabdata   35   35 0
> request_sock_TCPv6       0      0    304 26 2 : tunables 0 0 0 : slabdata    0    0 0
> TCPv6                38519  38519   2432 13 8 : tunables 0 0 0 : slabdata 2963 2963 0
>
> I have 35K of lines like this:
> lsof |grep proxy |grep TCPv6 |more
> squid 1049 proxy 13u sock 0,8 0t0  5428173 protocol: TCPv6
> squid 1049 proxy 14u sock 0,8 0t0 27941608 protocol: TCPv6
> squid 1049 proxy 24u sock 0,8 0t0 45124047 protocol: TCPv6
> squid 1049 proxy 25u sock 0,8 0t0 50689821 protocol: TCPv6
> ...
>
>
> We thought maybe this is a weird IPv6 thing, as we only route IPv4, so
> we compiled a more recent version of squid with no v6 support. The leak
> just moved to TCP4...
>
> lsof |wc -l
> 120313
>
> cat /proc/slabinfo |grep TCP
> MPTCPv6                  0      0   2048 16 8 : tunables 0 0 0 : slabdata    0    0 0
> tw_sock_TCPv6            0      0    248 33 2 : tunables 0 0 0 : slabdata    0    0 0
> request_sock_TCPv6       0      0    304 26 2 : tunables 0 0 0 : slabdata    0    0 0
> TCPv6                  208    208   2432 13 8 : tunables 0 0 0 : slabdata   16   16 0
> MPTCP                    0      0   1856 17 8 : tunables 0 0 0 : slabdata    0    0 0
> tw_sock_TCP           5577   5577    248 33 2 : tunables 0 0 0 : slabdata  169  169 0
> request_sock_TCP      1898   2002    304 26 2 : tunables 0 0 0 : slabdata   77   77 0
> TCP                 102452 113274   2240 14 8 : tunables 0 0 0 : slabdata 8091 8091 0
>
>
> cat /proc/net/tcp |wc -l
> 255
>
> After restarting squid the slab objects are released and the open file
> descriptors drop to a reasonable value. This further suggests it is
> squid hanging on to these FDs.
>
> lsof |grep proxy |wc -l
> 1221
>
>
> Any suggestions? I guess it's something blatantly obvious, but we have
> been looking at this for a couple of days and we are not getting
> anywhere...
>
> Thanks again
>
>
>
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> https://lists.squid-cache.org/listinfo/squid-users