[squid-users] Sudden but sustained high bandwidth usage

Amos Jeffries squid3 at treenet.co.nz
Wed Mar 2 04:06:46 UTC 2016


On 2/03/2016 10:57 a.m., Heiler Bemerguy wrote:
> 
> Hey guys.
> 
> For the third time, we got a sudden spike in bandwidth usage, almost saturating our 
> link, and it won't stop until squid is restarted.
> I'm totally SURE this inbound traffic comes from squid. It's like it's downloading 
> stuff itself....
> 

Yes, it probably is. Or something very close...

> 
> Note that after squid was restarted near 10:45, the network usage drops 
> immediately and doesn't climb as high as before anymore..
> 
> This pattern started to happen when I changed from ROCK+AUFS to ROCK+ROCK, squid 
> 3.5.14 x64.

Please upgrade to 3.5.15 ASAP, or better still the latest snapshot if
you have trouble with the main release (a few more side effects have
been fixed this week).

> 
> Here's the most important conf settings.. I appreciate all comments about it.
> 
> acl windowsupdate dstdomain .ws.microsoft.com .windowsupdate.microsoft.com 
> .update.microsoft.com .windowsupdate.com
> http_access allow windowsupdate
> range_offset_limit none windowsupdate

I suspect you are hitting a case of clients aborting downloads of Win10
files early and Squid continuing to try to complete them.

The secret downloads the GWX application does of multi-GB files on a
per-machine basis have been quite a problem for several people over the
last few months.
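
If you want to stop Squid finishing those fetches once the client has
gone away, one possible tuning looks like the sketch below (the 512 MB
cap and the abort thresholds are example figures of mine, not a
measured recommendation for your traffic):

  # only prefetch ranged WU objects up to a bounded size
  range_offset_limit 512 MB windowsupdate
  # when a client aborts, drop the server fetch unless under ~16 KB
  # remains or ~95% has already been received
  quick_abort_min 0 KB
  quick_abort_max 16 KB
  quick_abort_pct 95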


> cache_mem 4 GB
> maximum_object_size_in_memory 5 MB
> memory_replacement_policy heap GDSF
> cache_replacement_policy heap LFUDA
> maximum_object_size 10 GB
> cpu_affinity_map process_numbers=1,2,3,4,5,6 cores=1,2,3,4,5,6
> workers 2
> cache_dir rock /cache2/rock1 90000 min-size=0 max-size=32768
> cache_dir rock /cache/rock1 300000 min-size=32768 max-size=10737418240
> store_dir_select_algorithm round-robin

Don't force-configure this when you already have min/max controlling
which dirs are usable. Squid's default should round-robin anyway, but it
can also pick a better, best-fit store when one exists.
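
In config terms that just means dropping the directive and letting the
size ranges you already have decide placement (only the relevant lines
shown):

  cache_dir rock /cache2/rock1 90000 min-size=0 max-size=32768
  cache_dir rock /cache/rock1 300000 min-size=32768 max-size=10737418240
  # no store_dir_select_algorithm line; the min/max ranges already
  # determine which dir an object may be stored in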

> read_ahead_gap 4096 KB
> client_request_buffer_max_size 2048 KB


 !!! 2MB packets !??

Please have a read of
<http://www.bufferbloat.net/projects/bloat/wiki/Introduction>

This buffer only needs to store the maximum size of expected HTTP request
mime headers on a single request. That is ~64KB for Squid due to
hardcoded internal limits. Going far beyond that leads to trouble.

Having a larger buffer for multiple requests can be a small help with
pipelining. BUT you have completely disabled that performance-enhancing
feature of HTTP in your proxy (the *_persistent_connections off settings
below).
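
If there is no special need to buffer huge request headers, something
close to that internal ceiling is safer (64 KB below simply mirrors the
limit mentioned above; re-enabling pconn is shown only to illustrate the
point about pipelining):

  client_request_buffer_max_size 64 KB
  # pipelining / persistent connections can only help while these
  # stay at their default "on"
  client_persistent_connections on
  server_persistent_connections on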



> dns_v4_first on
> ipcache_size 80000
> fqdncache_size 40000
> memory_pools on
> memory_pools_limit 150 MB
> reload_into_ims on
> connect_retries 3
> cache_swap_low 98
> cache_swap_high 99
> store_avg_object_size 92 KB
> client_idle_pconn_timeout 30 seconds
> client_persistent_connections off
> server_persistent_connections off
> 
> error.log right in this moment:

Ayayeye, you got many troubles.

> 08:55:29 kid1| local=10.1.10.9:3080 remote=10.107.0.71:54515 FD 3665 flags=1: 
> read/write failure: (32) Broken pipe
> 09:00:02 kid2| snmpHandleUdp: FD 55 recvfrom: (11) Resource temporarily unavailable
> 09:00:02 kid1| snmpHandleUdp: FD 29 recvfrom: (11) Resource temporarily unavailable

Unresolved bug in Squid.

> 09:02:14 kid2| WARNING: Closing client connection due to lifetime timeout
> 09:02:14 kid2| 
> http://prod.video.msn.com/tenant/amp/entityid/BBq7uZY?blobrefkey=103&$blob=1

That would be a single HTTP request+reply transaction that took more
than 24hrs (!?) to complete.
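
For reference, that "lifetime timeout" comes from the client_lifetime
directive, whose default is 1 day, which is why these show up as
day-long transactions (shown as-is, not as a change you need to make):

  # default value; client connections older than this are force-closed
  client_lifetime 1 day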


> 09:03:34 kid1| WARNING: HTTP: Invalid Response: Bad header encountered from 
> http://sable.madmimi.com/view?id=24371.4971993.01561ff3e8e7c09ac362ded25f80a76b AKA 
> sable.madmimi.com/view?id=24371.4971993.01561ff3e8e7c09ac362ded25f80a76b

Okay. That server is being a bad HTTP citizen. This is just info to help
with the client complaints you will probably get about the 4xx/5xx
errors contacting that site through Squid. If you want to assist in
fixing it, you can report the issue to its admin.


> 09:03:38 kid1| WARNING: Closing client connection due to lifetime timeout
> 09:03:38 kid1| 
> http://download.windowsupdate.com/d/msdownload/update/software/defu/2016/02/am_delta_patch_1.213.7305.0_59c57acaccbdfa7fa9dd5574f0a7ded60de11963.exe

Another 24hr one ?


> 09:04:15 kid1| WARNING: HTTP: Invalid Response: Bad header encountered from 
> http://sable.madmimi.com/view?id=24371.4931972.b5133065c861d91790f59bf39ef1abf3 AKA 
> sable.madmimi.com/view?id=24371.4931972.b5133065c861d91790f59bf39ef1abf3
> 09:04:28 kid2| WARNING: Closing client connection due to lifetime timeout
> 09:04:28 kid2| http://www.ingressocerto.com/facet-search.json?f=/p-data-Offset:2

Getting a lot of these long transactions.

> 09:04:42 kid2| Could not parse headers from on disk object

This innocent-seeming message is related to the CVE-2016-2571 issue. It
is a sign that the vulnerability was triggered in some past transaction.
Squid is handling this part of the fallout though, so what has happened
*right now* is okay.

> 09:05:02 kid1| snmpHandleUdp: FD 29 recvfrom: (11) Resource temporarily unavailable
> 09:05:02 kid2| snmpHandleUdp: FD 55 recvfrom: (11) Resource temporarily unavailable
> 09:05:02 kid1| snmpHandleUdp: FD 29 recvfrom: (11) Resource temporarily unavailable
> 09:05:18 kid2| SECURITY ALERT: Missing hostname in URL 'http://'. see access.log 
> for details.

Should be self-explanatory. Your proxy appears to be under attack.
<http://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages>

<snip many repeats of earlier problems>
> 09:25:49 kid1| urlParse: URL too large (8231 bytes)
> 09:27:24 kid1| urlParse: URL too large (8231 bytes)
> 09:27:46 kid2| urlParse: URL too large (10742 bytes)

These should also be self-explanatory. They are also attack signatures
for certain types of buffer-overrun attacks.
Squid is coping, but you should really do something forceful to whack
the source of these requests over the head.
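
For example, once access.log shows where these requests originate,
something like this shuts them out at the proxy (the address below is
only a placeholder of mine, not taken from your logs):

  # place the deny above your existing allow rules
  acl overlong_url_src src 192.0.2.0/24
  http_access deny overlong_url_src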

It might be related to the ALERT situation. For example a "GET
http://... HTTP/1.1" where the ... is an 8-10 KB long "domain name".


<snip more repeats>
> 09:50:28 kid1| Could not parse headers from on disk object
> 09:50:28 kid1| varyEvaluateMatch: Oops. Not a Vary object on second attempt, 
> 'http://pix04.revsci.net/D08734/a1/0/3/0.js?DM_LOC=%3Dhttp%3A%2F%2Fna.com%3FdlxInitiated%3Dtrue%26nada%3D%26naid%3D2015121611542932923036123812%26namp%3D' 
> 'accept-encoding="gzip,%20deflate,%20sdch"'
> 09:50:28 kid1| clientProcessHit: Vary object loop!

Probably a side effect of the other nasties going on. Though some people
do see this happening and it has open bug report(s), we are still trying
to get to the bottom of it.


> 09:50:46 kid1| helperHandleRead: unexpected reply on channel 0 from redirector 
> #Hlpr301 'OK'
> 09:50:46 kid1| helperHandleRead: unexpected reply on channel 0 from redirector 
> #Hlpr301 'OK'

** URGENT PROBLEM: **

The redirector helper you are using is broken. It is presenting either
multiple lines per reply, or replies without being asked about any URL.
In both cases Squid will be given wrong instructions and will re-write
random requests to some other URL when producing the reply.

This could be the root cause behind some of those weird long request
timeouts or aborted transaction issues. It will *definitely* result in
clients randomly being given the wrong objects in their replies.
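
One thing worth checking (an assumption on my part, since the redirector
setup is not shown) is a concurrency mismatch: if squid.conf asks for
channel-IDs but the helper does not echo them back, replies get
attributed to the wrong requests. Either make the helper channel-aware
or keep concurrency off, e.g.:

  # helper not channel-aware -> concurrency must stay at 0
  # (children/startup/idle figures below are examples only)
  url_rewrite_children 20 startup=5 idle=2 concurrency=0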


So, my advice:

 * fix the redirector. See which of the other issues / side effects
disappear along with it.

 * if they remain, track down what those SECURITY ALERT messages are
about. Get that fixed if you can.


I expect the high bandwidth will reduce with those two issues above
resolved and the WU settings altered.

You can also further improve things by looking into the too-long URL
issues if they remain.

Amos

