[squid-users] Crash: every 19 hours: kernel: Out of memory: Kill process (squid)

Cherukuri, Naresh ncherukuri at partycity.com
Fri Aug 11 17:30:02 UTC 2017


Thank You Amos. Appreciate your help!

Thanks & Regards,
Naresh

-----Original Message-----
From: Amos Jeffries [mailto:squid3 at treenet.co.nz] 
Sent: Friday, August 11, 2017 12:50 PM
To: squid-users at lists.squid-cache.org
Cc: Cherukuri, Naresh
Subject: Re: [squid-users] Crash: every 19 hours: kernel: Out of memory: Kill process (squid)

On 12/08/17 01:13, Cherukuri, Naresh wrote:
> Amos,
> 
> Please find below my squid conf and access logs and memory output in MB. 
> Appreciate any help.
> 
> Memory Info:
> 
> [root@******prod ~]# free -m
> 
>               total       used       free     shared    buffers     cached
> 
> Mem:         11845       4194       7651         41        190       1418
> 
> -/+ buffers/cache:       2585       9260
> 
> Swap:        25551        408      25143
> 
> Squidconf:
> 
> [root@******prod squid]# more squid.conf
> 
> #
> 
> # Recommended minimum configuration:
> 
> #
> 
> max_filedesc 4096

Ouch. Squid requires between 2 and 6 sockets (FD) per client connection, and clients tend to make upwards of 8 connections per domain being contacted (dozens per web page loaded). While HTTP/1.1 improvements can reduce that average a lot, 4K FD cannot serve 7K clients well.

The above number should be at least 4x the expected client count to cope with load, at least 8x would be better for peak times.
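
As a rough illustration of that sizing (the 7K client figure is the one mentioned above, the directive name is copied from your config, and the OS per-process limit has to be raised to match or Squid cannot actually use the value):

```
# 4x an expected 7000 clients = 28000 as the minimum;
# 8x = 56000 leaves headroom for peak times.
max_filedesc 56000
```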


> 
> acl manager proto cache_object
> 
> visible_hostname ******prod
> 
> logfile_rotate 10
> 
> access_log /cache/access.log
> 
> acl localnet src 172.16.0.0/16
> 
> acl backoffice_users src 10.136.0.0/13
> 
> acl h****_backoffice_users src 10.142.0.0/15
> 
> acl re****_users src 10.128.0.0/13
> 
> acl hcity_r*****_users src 10.134.0.0/15
> 

So you are immediately confusing "users" with "clients". They are 
different, and at the Squid layer of networking the difference starts to 
matter.

For example, are you aware that automated software and devices without 
any user logged in can still perform network transactions through the 
proxy? Usually it is not a problem, but if you are thinking of only 
*users* utilizing the proxy you could be in for a major surprise.


> acl par**** url_regex par****
> 
> acl SSL_ports port 443
> 
> acl Safe_ports port 80          # http
> 
> #acl Safe_ports port 21         # ftp
> 
> acl Safe_ports port 443         # https
> 
> #acl Safe_ports port 70         # gopher
> 
> #acl Safe_ports port 210                # wais
> 
> #acl Safe_ports port 1025-65535 # unregistered ports
> 
> #acl Safe_ports port 280                # http-mgmt
> 
> #acl Safe_ports port 488                # gss-http
> 
> #acl Safe_ports port 591                # filemaker
> 
> #acl Safe_ports port 777                # multiling http
> 

NP: "Safe_ports" is meant to block HTTP connections made to ports whose 
defined protocols are known to be dangerous for HTTP messages to go to. 
Those protocols are so similar they can be confused with HTTP messages 
and things go badly wrong.

By commenting out most of the Safe_ports entries you are making the 
default security rules (if you re-enable them) decide that any web API 
using a port other than 80 or 443 is prohibited to all your clients, 
e.g. services like http://pc-sep.pcwhq.par****.net:8014/ in your logs.


The above change to the default http_access security rules behaviour is 
probably why you had to move your custom config to the top of the 
http_access list - thus bypassing everything making use of the above 
ports definitions.


> acl CONNECT method CONNECT
> 
> acl backoffice_allowed_sites url_regex "/etc/squid/backoffice_allowed_sites"
> 
> acl h***_backoffice_allowed_sites url_regex 
> "/etc/squid/backoffice_allowed_sites"
> 
> acl backoffice_blocked_sites url_regex "/etc/squid/backoffice_blocklist"
> 
> acl h***_backoffice_blocked_sites url_regex 
> "/etc/squid/backoffice_blocklist"
> 
> acl re****_allowed_sites url_regex "/etc/squid/re****_allowed_sites"
> 
> acl h****_reg****_allowed_sites url_regex 
> "/etc/squid/h***_reg*****_allowed_sites"

Do all these URLs *actually* need regex? Regex is the second slowest 
and most memory-consuming type of ACL Squid provides.

If you can reduce that to just domain names (which can have wildcard 
subdomains), the dstdomain ACL type would improve both memory and 
performance a fair bit.
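
A sketch of what that could look like, assuming the list really can be expressed as plain domain names. The file name backoffice_allowed_domains and the example.com / example.net entries are hypothetical:

```
# /etc/squid/backoffice_allowed_domains holds one domain per line, e.g.
#   .example.com        (leading dot also matches all subdomains)
#   www.example.net     (matches this exact host only)
acl backoffice_allowed_sites dstdomain "/etc/squid/backoffice_allowed_domains"
```

The http_access lines that reference backoffice_allowed_sites would not need to change.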


> 
> #
> 
> http_access allow localnet reg***_allowed_sites
> 
> http_access deny backoffice_users backoffice_blocked_sites
> 
> http_access deny h***_backoffice_users backoffice_blocked_sites
> 
> http_access allow backoffice_users backoffice_allowed_sites
> 
> http_access allow h***_backoffice_users backoffice_allowed_sites
> 
> http_access allow reg****_users reg****_allowed_sites
> 
> http_access allow h***_reg****_users h***_reg****_allowed_sites
> 
> no_cache deny par****
> 

"no_cache" has not existed for many years. Even the "cache" directive is 
now deprecated.

Most likely you want to use "store_miss deny ..." to prevent caching of 
objects.
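
For the rule quoted above that would be something like the following, assuming the par**** ACL stays as defined (the masked name is copied as-is from your config) and your Squid version supports store_miss:

```
# Do not store responses for requests matching the par**** ACL
store_miss deny par****
```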



> http_access deny all
> 

"deny all" ... so none of the below security protections will work.

Your custom http_access lines should be down ....

> #http_access allow manager localhost
> 
> #http_access deny manager
> 

(Sure, manager report access can be moved or removed. In fact the current 
recommendation is to place it after the CONNECT rule below.

BUT, be aware that removing it *allows* all clients to access the Squid 
management interfaces. Probably not what you intended with the above. 
Only the sysadmin should need that access.)

> # Deny requests to certain unsafe ports
> 
> http_access deny !Safe_ports
> 
> # Deny CONNECT to other than secure SSL ports
> 
> #http_access deny CONNECT !SSL_ports
> 
> http_access  allow CONNECT SSL_ports

Oh boy. You are lucky these were disabled by the above "deny all".

The default config rule was specifically crafted to prevent anonymous 
tunneling to send arbitrary bytes to arbitrary ports with no proxy 
control possible. Only HTTPS (port 443) is safe enough to deliver by 
default.

Other ports can be permitted if you need them by adding to the SSL_ports 
ACL. Absolutely DO NOT change that to an "allow CONNECT" like above.
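
For example, if CONNECT tunnels to one extra port were genuinely needed (port 8443 here is purely hypothetical), a sketch would be:

```
acl SSL_ports port 443
acl SSL_ports port 8443        # hypothetical additional HTTPS port

# Keep the default rule form: deny CONNECT to anything not listed
http_access deny CONNECT !SSL_ports
```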

> 
> # We strongly recommend the following be uncommented to protect innocent
> 
> # web applications running on the proxy server who think the only
> 
> # one who can access services on "localhost" is a local user
> 
> http_access deny to_localhost
> 

... Your custom http_access lines should be down here. After the basic 
security protections.
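
Roughly, the ordering being described is the following. This is only a sketch: it shows just the unmasked backoffice rules from your config, and it assumes Safe_ports has first been extended to list every port your clients legitimately use (such as the 8014 seen in your log).

```
# Basic security protections first
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager
http_access deny to_localhost

# Custom rules next (the remaining masked rules would follow the same pattern)
http_access deny backoffice_users backoffice_blocked_sites
http_access allow backoffice_users backoffice_allowed_sites

# And finally deny all other access to this proxy
http_access deny all
```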


> # Example rule allowing access from your local networks.
> 
> # Adapt localnet in the ACL section to list your (internal) IP networks
> 
> # from where browsing should be allowed
> 
> #http_access allow localnet
> 
> http_access allow localhost
> 
> # And finally deny all other access to this proxy
> 
> http_access deny all
> 
> # Squid normally listens to port 3128
> 
> http_port 3128 ssl-bump \
> 
> key=/etc/squid/pc****sslcerts/pc*****prod.pkey \
> 
> cert=/etc/squid/pc******sslcerts/pc*****prod.crt \
> 

cert= before key=. The most recent Squid will complain loudly, and may 
not start if there is no cert to associate with the key.

> generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
> 

You should also add the option sslflags=NO_DEFAULT_CA here.

The lack of it is probably a big part of your memory problem, as OpenSSL 
grabs a huge amount of memory per client connection (ouch!) to store the 
"globally trusted CA" certificates - which are pointless on that type of 
connection and never used in your setup.
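
Putting the cert/key ordering and the extra flag together, the port line might look like this. The /path/to/... values are placeholders for your (masked) certificate and key files, and the syntax is the Squid-3.x form used in your config:

```
http_port 3128 ssl-bump \
    cert=/path/to/proxy.crt \
    key=/path/to/proxy.pkey \
    sslflags=NO_DEFAULT_CA \
    generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
```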


> acl step1 at_step SslBump1
> 
> ssl_bump peek step1
> 
> #ssl_bump bump all
> 
> ssl_bump bump backoffice_users !localnet !h***_backoffice_users 
> !reg****_users !h***_reg***_users !par***
> 

Outwardly that may look reasonable, but it can be simplified a fair bit.

e.g. a message sent by a client whose IP is in the range 10.136.0.0/13 
cannot simultaneously be sent using an IP in the range 172.16.0.0/16 or 
10.128.0.0/13.

You only need to exclude (with '!') when the excluded ACL can match 
things that would otherwise be caught by the other ACLs in the list. For 
example, h****_backoffice_users is a subset of backoffice_users, and 
the par*** ACL matches on wholly different criteria than src-IP.
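
Applied to the ACLs defined above, the rule could probably shrink to the following. The names are copied as they appear (masked) in your ssl_bump line, and this assumes the address ranges are exactly the ones in the quoted acl lines:

```
# localnet (172.16.0.0/16) and the 10.128.0.0/13 ranges can never overlap
# backoffice_users (10.136.0.0/13), so excluding them is redundant.
ssl_bump peek step1
ssl_bump bump backoffice_users !h***_backoffice_users !par***
```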



> #sslproxy_capath /etc/ssl/certs
> 
> sslproxy_cert_error allow all
> 
> always_direct allow all
> 
> sslproxy_flags DONT_VERIFY_PEER
> 

Just no. Remove the above three lines. Then fix any TLS/SSL issues you 
find in a proper way - which will be different per-problem, and 
worthwhile fixing.

Since 3.2 Squid has been able to mimic errors in the fake certs it 
generates, so silencing them does more harm than good.

If there are errors that actually cannot be fixed AND happen to be safe 
for a proxy to hide from the end-user [be very sure of that first], then 
set only that error into an "sslproxy_cert_error allow ..." rule. Let 
all other errors go on through to the client software; often it is 
better able to cope with them cleanly than the proxy can.
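
A minimal sketch of such a narrow exception. Both the chosen error and the host name legacy.example.internal are hypothetical; substitute whatever you have actually verified to be safe:

```
# Hypothetical: tolerate only an expired certificate, and only for one
# internal host that cannot be fixed right now.
acl tolerated_error ssl_error X509_V_ERR_CERT_HAS_EXPIRED
acl unfixable_server dstdomain legacy.example.internal

sslproxy_cert_error allow unfixable_server tolerated_error
sslproxy_cert_error deny all
```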


And for the log:

All these 403s suggest the relevant URLs are not in your *_sites ACL 
definitions.

> 
> 1502424001.504      0 10.138.142.6 TCP_DENIED/403 4175 GET 
> http://update.scansoft.com/GetCertificate.asp? - HIER_NONE/- text/html
> 
> 1502424001.533    329 10.140.230.6 TAG_NONE/200 0 CONNECT 
> watson.telemetry.microsoft.com:443 - HIER_DIRECT/65.55.252.202 -
> 
> 1502424001.543      0 10.141.80.6 TCP_DENIED/403 4167 GET 
> http://update.scansoft.com/Version.asp? - HIER_NONE/- text/html
> 
> 1502424001.546    331 10.140.230.6 TAG_NONE/200 0 CONNECT 
> watson.telemetry.microsoft.com:443 - HIER_DIRECT/65.55.252.202 -
> 
> 1502424001.551  29923 10.130.27.24 TCP_MISS_ABORTED/000 0 GET 
> http://pc-sep.pcwhq.par****.net:8014/secars/secars.dll? - 
> HIER_DIRECT/10.1.2.35 -
> 

That is a little more worrying. Your web server at 10.1.2.35 appears not 
to have produced a response for at least 30 seconds.



> 
> Cachelog errors I am seeing daily:
> 
> Error negotiating SSL connection on FD 26: error:140A1175:SSL 
> routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback (1/-1)
> 
> Error negotiating SSL connection on FD 1175: error:14094416:SSL 
> routines:SSL3_READ_BYTES:sslv3 alert certificate unknown (1/0)
> 

First thing to do is ensure that your OpenSSL library is up to date. 
That should resolve most cipher and TLS protocol related issues.


Second thing is to ensure that the ca-certificates package (or whatever 
your OS calls it) is kept up to date. It changes every few weeks. 
Squid-3.x cannot download CA certs on demand so this is particularly 
important, and you may need to configure the 
sslproxy_foreign_intermediate_certs directive with additional 
intermediate CA certs - only as needed and after checking them though.
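
If it does come to that, the directive takes a file of PEM certificates. The path below is hypothetical:

```
# Hypothetical PEM bundle holding only the vetted intermediate CA certificates
sslproxy_foreign_intermediate_certs /etc/squid/extra-intermediate-CAs.pem
```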




> 2017/08/02 09:01:02 kid1| Error negotiating SSL on FD 989: 
> error:00000000:lib(0):func(0):reason(0) (5/-1/104) ##Very rare i found 
> few not frequently
> 

These opaque error codes tend to mean non-TLS traffic is being pushed 
into port 443. It varies based on what software your clients are running.



> 2017/08/02 09:01:43 kid1| Queue overload, rejecting # too many times
> 
> 2017/08/02 09:01:45 kid1| Error negotiating SSL connection on FD 1749: 
> (104) Connection reset by peer ## too many times
> 
> 2017/08/02 10:12:58 kid1| WARNING: Closing client connection due to 
> lifetime timeout ## only one
> 
> 2017/08/07 22:37:56 kid1| comm_open: socket failure: (24) Too many open 
> files

Either Squid has run out of FD, or your OS has set a per-process open FD 
limit far too low for what Squid requires. Hitting the limit with either 
disk files or network sockets results in the above message.

> 
> 2017/08/07 22:39:37 kid1| WARNING: Error Pages Missing Language: en
> 
> 2017/08/07 22:39:37 kid1| '/usr/share/squid/errors/en-us/ERR_DNS_FAIL': 
> (24) Too many open files
> 
> 2017/08/07 22:39:37 kid1| WARNING: Error Pages Missing Language: en-us
> 

Your system appears to be missing the Squid langpack for error page 
localization. Or if you have one it needs updating to match the error 
pages your Squid version uses.
<http://www.squid-cache.org/Versions/langpack/>


> 2017/08/07 22:01:42 kid1| WARNING: All 32/32 ssl_crtd processes are busy.
> 
> 2017/08/07 22:01:42 kid1| WARNING: 32 pending requests queued
> 
> 2017/08/07 22:01:42 kid1| WARNING: Consider increasing the number of 
> ssl_crtd processes in your config file.

As Squid said, you can try increasing the crtd processes being run.

Though the above says 32 are running, while your squid.conf 
sslcrtd_children directive says a maximum of 8 are to be run.
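
A sketch of raising it. The numbers are illustrative only and should be tuned against how often the "processes are busy" warning recurs:

```
# Hypothetical sizing: allow up to 64 helpers, start 8 at launch,
# keep 4 idle ones ready for bursts.
sslcrtd_children 64 startup=8 idle=4
```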


Alternatively, since you are bumping you may want to try Squid-4. The 
SSL-Bump code and behaviour is quite a bit better in the latest version 
even though it is in beta still.


> 
> 2017/08/11 00:58:56 kid1| WARNING: Closing client connection due to 
> lifetime timeout
> 
> 2017/08/09 12:55:45 kid1| WARNING! Your cache is running out of 
> filedescriptors
> 


Yes, well. I covered that already, the above just confirms it.


Amos

