[squid-users] Socket handle leak?

Alex Rousskov rousskov at measurement-factory.com
Fri Jul 12 13:42:53 UTC 2024


On 2024-07-12 06:58, paolo.prinx at gmail.com wrote:

> We are having some stability issues with our squid farms after a recent 
> upgrade from Centos/Squid 3.5.x to Ubuntu/Squid 5.7/6.9.

> In short, after running for a certain period the servers run out of file 
> descriptors. We see a slowly growing number of TCP or TCPv6 socket 
> handles

Assuming that your Squids are not under an ever-increasing load, what 
you describe sounds like a Squid bug. I do not see any obviously related 
fixes in the latest official code, so it is possible that this bug is 
unknown to Squid developers and is still present in v6+. I recommend the 
following steps:

1. Forget about Squid v5. Aim to upgrade to Squid v6.

2. Collect a few mgr:filedescriptors cache manager snapshots from a 
problematic Squid, in the hope of discovering a common theme in the 
leaked descriptors' metadata (see the first example below). Share your 
findings (and/or a pointer to compressed snapshots).

3. Check cache.log for frequent (or at least persistent) ERROR and 
WARNING messages and report your findings (see the second example 
below).

4. Does your Squid's resident memory usage grow as well? Descriptor 
leaks are often (but not always!) accompanied by memory leaks, and the 
latter are sometimes easier to pinpoint. If (and only if) your Squid 
is leaking a lot of memory, then collect a few dozen mgr:mem snapshots 
(e.g., one every busy hour; see the third example below) and share a 
pointer to a compressed snapshot archive for analysis by Squid 
developers. There is at least one v6 memory leak fixed in master/v7 
(Bug 5322), but, hopefully, you are not suffering from that one; 
otherwise the noise from that leak may obscure what we are looking for.
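
For step 2, an untested sketch that collects one timestamped snapshot 
per run; it assumes squidclient is installed and that your cache 
manager answers on localhost port 3128 without a password (adjust to 
your setup):

     # collect one mgr:filedescriptors snapshot, timestamped
     squidclient -h 127.0.0.1 -p 3128 mgr:filedescriptors \
         > "fd-$(date +%Y%m%d-%H%M%S).txt"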
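
For step 3, a quick way to surface recurring messages; the log path 
assumes the default Ubuntu package location:

     # count recurring ERROR/WARNING messages, ignoring timestamps
     grep -ohE '(ERROR|WARNING): .*' /var/log/squid/cache.log* \
         | sort | uniq -c | sort -rn | head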
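
For step 4, a rough collection loop (same squidclient assumptions as 
above) that also records the resident memory of your squid processes:

     # one mgr:mem snapshot per hour for a day, plus RSS readings
     for i in $(seq 1 24); do
         squidclient -h 127.0.0.1 -p 3128 mgr:mem \
             > "mem-$(date +%Y%m%d-%H%M%S).txt"
         ps -C squid -o pid=,rss= >> squid-rss.log
         sleep 3600
     done
     # archive the snapshots for sharing
     tar czf mem-snapshots.tar.gz mem-*.txt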


You may continue this triage on this mailing list or file a bug report 
at https://bugs.squid-cache.org/enter_bug.cgi?product=Squid


Thank you,

Alex.



> It is somewhat similar to what is reported at 
> https://access.redhat.com/solutions/3362211 . They state that:
> 
>   * "If an application fails to close() its socket descriptors and
>     continues to allocate new sockets then it can use up all the
>     system memory on TCP(v6) slab objects."
>   * "Note some of these sockets will not show up in
>     /proc/net/sockstat(6). Sockets that still have a file descriptor
>     but are in the TCP_CLOSE state will consume a slab object, but
>     will not be accounted for in /proc/net/sockstat(6) or "ss" or
>     "netstat"."
>   * It can be determined whether this is an application socket leak
>     by stopping the application processes that are consuming sockets.
>     If the slab objects in /proc/slabinfo are then freed, the
>     application is responsible, as that means the destructor routines
>     found open file descriptors to sockets in the process.
> 
> "This is most likely to be a case of the application not handling
> error conditions correctly and not calling close() to free the FD
> and socket."
> 
> 
> For example, on a server with squid 5.7 (unmodified package):
> 
> list of open files:
> 
>     lsof |wc -l
>     56963
> 
> 
> of which 35K in TCPv6:
> 
>     lsof |grep proxy |grep TCPv6 |wc -l
>      35301
> 
> under /proc I see fewer objects:
>      cat  /proc/net/tcp6 |wc -l
>      3095
> 
> but the number of objects in the slabs is high:
>      cat /proc/slabinfo |grep TCPv6
>      MPTCPv6                  0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
>      tw_sock_TCPv6         1155   1155    248   33    2 : tunables    0    0    0 : slabdata     35     35      0
>      request_sock_TCPv6       0      0    304   26    2 : tunables    0    0    0 : slabdata      0      0      0
>      TCPv6                38519  38519   2432   13    8 : tunables    0    0    0 : slabdata   2963   2963      0
> 
> I have 35K lines like this:
>      lsof |grep proxy |grep TCPv6 |more
>      squid        1049              proxy   13u     sock                0,8        0t0    5428173 protocol: TCPv6
>      squid        1049              proxy   14u     sock                0,8        0t0   27941608 protocol: TCPv6
>      squid        1049              proxy   24u     sock                0,8        0t0   45124047 protocol: TCPv6
>      squid        1049              proxy   25u     sock                0,8        0t0   50689821 protocol: TCPv6
> ...
> 
> 
> We thought this might be a weird IPv6 thing, as we only route IPv4, 
> so we compiled a more recent version of squid with no v6 support. The 
> leak just moved to TCP4...
> 
> lsof |wc -l
> 120313
> 
> cat /proc/slabinfo |grep TCP
> MPTCPv6                  0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
> tw_sock_TCPv6            0      0    248   33    2 : tunables    0    0    0 : slabdata      0      0      0
> request_sock_TCPv6       0      0    304   26    2 : tunables    0    0    0 : slabdata      0      0      0
> TCPv6                  208    208   2432   13    8 : tunables    0    0    0 : slabdata     16     16      0
> MPTCP                    0      0   1856   17    8 : tunables    0    0    0 : slabdata      0      0      0
> tw_sock_TCP           5577   5577    248   33    2 : tunables    0    0    0 : slabdata    169    169      0
> request_sock_TCP      1898   2002    304   26    2 : tunables    0    0    0 : slabdata     77     77      0
> TCP                 102452 113274   2240   14    8 : tunables    0    0    0 : slabdata   8091   8091      0
> 
> 
> cat /proc/net/tcp |wc -l
> 255
> 
> After restarting squid the slab objects are released and the open 
> file descriptors drop to a reasonable value. This further suggests 
> that squid is hanging on to these FDs.
> 
> lsof |grep proxy |wc -l
> 1221
> 
> 
> Any suggestions? I guess it's something blatantly obvious, but we 
> have been looking at this for a couple of days and we're not getting 
> anywhere...
> 
> Thanks again
> 
> 
> 


