[squid-users] squid-4.9 TCP_MISS_ABORTED and memory leak

Thu Jun 4 08:12:12 UTC 2020

On 2/06/20 11:44 pm, biao.wei wrote:
> hi squid developer:
>     we use squid-4.9 meet two questions：
>       (1)some request have timeout, not response to user data neither to
> request upstream from log information.

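Your logs are showing just over 14 minutes (%tr of 850826 and 850299 ms)
to deliver about 400 bytes to the client. That smells like a routing MTU
issue, probably ICMP packets being dropped, which breaks path MTU
discovery and leaves larger packets stalled.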

>       (2)memory leak, now we need restart squid to release memory.

Memory leaks are a rarity these days. Note that your large caches
require up to 530 GB of RAM spread over the 8 Squid processes, plus
whatever the active traffic is consuming, which may be several GB.
That will not all be allocated on startup; it will grow as the
caches fill, particularly the in-memory ones.

If you still think there is a leak - details are required.

> 
>        look forward to response.
> 
> 1. log format
> logformat logfmt_cdn [%{%Y-%m-%d:%H:%M:%S}tl.%03tu] %{X-Real-IP}>h
> %>a:%>p %<a:%<p %rm %03>Hs %<st %>st HTTP/%>rv "%>rs://%>rd:%>rP" "%>rp"
> "%{Referer}>h" "%{User-Agent}>h" %tr %<pt %03<Hs %Ss %Sh/%<A
> "%{Range}>h" "%{xxx-ComeFrom}>h" "%{xxx-Origin-Domain}>h"
> 
> 2. log infomation
> [2020-06-01:10:59:40.346] 123.124.197.113 10.8.120.9:40680 -:- GET 206
> 396 740 HTTP/1.1 "http://download.xx.com:80"
> "/xx_mac_install_1.0.0.114_s.dmg" "-" "Mozilla/5.0 (Windows NT 10.0;
> Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61
> Safari/537.36" 850826 - - TCP_MISS_ABORTED HIER_NONE/-
> "bytes=25165824-25165824" "tcdn" "download.xx.com"
> [2020-06-01:11:02:12.629] 123.124.197.113 10.8.120.10:49318 -:- GET 206
> 402 743 HTTP/1.1 "http://download.xx.com:80"
> "/xx_mac_install_1.0.0.114_s.dmg" "-" "Mozilla/5.0 (Windows NT 10.0;
> Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61
> Safari/537.36" 850299 - - TCP_MISS_ABORTED HIER_NONE/-
> "bytes=48234496-49283071" "tcdn" "download.xx.com"
> 
> 3. topology
>   origin site <---  [squid <- nginx]  <--- bussiness cdn <--- users
> 
> 4.  config files
> 
> squid.conf
> coredump_dir /var/spool/squid
> pid_filename /var/run/squid.pid
> httpd_suppress_version_string on
> via off
> cache_effective_user squid
> cache_effective_group squid
> 
> max_filedescriptors 65535 
> cache_swap_low 90
> cache_swap_high 95
> minimum_object_size 0 KB
> maximum_object_size 200 MB
> 
> acl urlpath_QUERY urlpath_regex -i cgi-bin \? \.php \.xml
> no_cache deny urlpath_QUERY  
> # acl urlpath_denyssl urlpath_regex -i ^https:\\
> # no_cache deny urlpath_denyssl

The name "no_cache" is obsolete since *very*, *very* long ago.

Remove the "no_" characters from the start of those config lines.
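
As a sketch, the active pair of lines from the quoted config would then
read as follows (the two commented-out lines would change the same way
if they are ever re-enabled):

```
acl urlpath_QUERY urlpath_regex -i cgi-bin \? \.php \.xml
cache deny urlpath_QUERY
```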

> 
> logformat logfmt_cdn [%{%Y-%m-%d:%H:%M:%S}tl.%03tu] %{X-Real-IP}>h
> %>a:%>p %<a:%<p %rm %03>Hs %<st %>st HTTP/%>rv "%>rs://%>rd:%>rP" "%>rp"
> "%{Referer}>h" "%{User-Agent}>h" %tr %<pt %03<Hs %Ss %Sh/%<A
> "%{Range}>h" "%{EEO-ComeFrom}>h" "%{xxx-Origin-Domain}>h"
> acl nolog_url url_regex -i cache_object
> 
> cache_log /data/cache.log
> cache_store_log none
> logfile_rotate 1
> 
> quick_abort_min 512 KB
> quick_abort_max 512 KB
> quick_abort_pct 80
> 
> range_offset_limit -1
> reload_into_ims on
> # collapsed_forwarding on
> 
> client_db off
> dns_v4_first on
> dns_nameservers 127.0.0.1
> dns_retransmit_interval 0.01 second
> dns_timeout 0.01 second
> dns_defnames off
> ignore_unknown_nameservers on
> ipcache_size 10240
> ipcache_low 90
> ipcache_high 95
> fqdncache_size 1024

Many of the above settings are irrelevant. They set things to the
default value those directives use anyway. You can simplify by removing
those lines entirely from your config file.
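
For illustration, assuming a stock Squid 4 build (this list is from
memory of the documented defaults, so please check it against the
squid.conf.documented file shipped with your package), lines such as
these from the quoted block should be safe to delete:

```
# Believed to match the built-in defaults; verify before removing.
cache_store_log none
dns_defnames off
ignore_unknown_nameservers on
ipcache_low 90
ipcache_high 95
fqdncache_size 1024
```

Lines that differ from the defaults, such as ipcache_size 10240, are a
deliberate choice and would stay.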

> positive_dns_ttl 365 days
> negative_dns_ttl 365 days

This is very bad. negative_dns_ttl is a *minimum* limit for DNS value
storage. There will be up to a year's delay before Squid notices
whenever servers are added, renumbered, or removed from the network
your diagram labels "origin site".
 negative_dns_ttl should be in the order of seconds or minutes.

 positive_dns_ttl is okay with large values, but should still be in the
order of days or weeks to avoid long-term timeouts trying to connect to
servers that no longer exist.
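
A minimal sketch of what that could look like. The exact numbers are
only an assumption for illustration, pick values that match how often
your origin servers actually change:

```
# Illustrative values only, not a recommendation for every network.
negative_dns_ttl 5 minutes
positive_dns_ttl 7 days
```

Removing both lines and letting Squid use its built-in defaults is
another reasonable option.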

> 
> peer_connect_timeout 30 seconds

This is the default, no need to configure it to this value.

> 
> connect_timeout 10 minutes

Really 10min for a TCP SYN+ACK packet exchange?

This is a major DoS risk, and not just for Squid. It allows an attacker
to consume the system's networking resources completely, causing the
entire machine to become non-responsive.
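
As a hedged example, something closer to the following would be far
safer. The 1 minute value is my assumption of the Squid 4 default, so
simply deleting the connect_timeout line should have the same effect:

```
# Assumed to equal the built-in default; shorter is fine on a fast LAN.
connect_timeout 1 minute
```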

Amos