[squid-users] Huge amount of time_wait connections after upgrade from v2 to v3

Amos Jeffries squid3 at treenet.co.nz
Sat Jul 8 12:46:48 UTC 2017


On 08/07/17 02:06, Ivan Larionov wrote:
> Thank you for the fast reply.
> 
>> On Jul 7, 2017, at 01:10, Amos Jeffries <squid3 at treenet.co.nz> wrote:
>>
>>> On 07/07/17 13:55, Ivan Larionov wrote:
>>>
>>> However, I assumed that this is a bug and that I could find an older version which worked fine. I started testing from 3.1.x all the way to 3.5.26 and this is what I found:
>>> * All versions before 3.5.21 work fine. There are no issues with a huge number of TIME_WAIT connections under load.
>>> * 3.5.20 is the last version that works correctly.
>>> * 3.5.21 is the first broken version.
>>> * 3.5.23, 3.5.25, 3.5.26 are broken as well.
>>> This effectively means that the bug was introduced somewhere between 3.5.20 and 3.5.21.
>>> I hope this helps and I hope you'll be able to find the issue. If you can create a bug report based on this information and post it here, that would be awesome.
>>
>> The changes in 3.5.21 were fixes to some common crashes and better caching behaviour. So I expect at least some of the change is due to higher traffic throughput on proxies previously restricted by those problems.
>>
> 
> I can't imagine how a throughput increase could result in a 500x increase in the TIME_WAIT connection count.
> 

More requests per second generally means more TCP connections churning.

Also when going from Squid-2 to Squid-3 there is a change from HTTP/1.0
to HTTP/1.1 and the accompanying switch from MISS to near-HIT
revalidations. Revalidations usually only have headers without payload,
so the same bytes/sec can contain orders of magnitude more of those than
MISSes - which is the point of having them.


> In our prod environment, when we updated from 2.7.x to 3.5.25, we saw an increase in the TIME_WAIT count from 100 to 10000. This is 100x.
> 

Compared to what RPS change? Given the above traffic change this may be
reasonable for a v2 to v3 jump. Our own very rough lab tests on old
hardware have shown rates for Squid-2 at ~900 RPS and Squid-3 at around
1900 RPS.
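
To put rough numbers on the churn argument, here is a back-of-envelope
sketch in Python. It is only an estimate under stated assumptions: the
60 second TIME_WAIT duration is the Linux default (other systems differ),
the machine being measured is assumed to be the side that closes every
connection, and the function names are made up for illustration.

```python
# Rough steady-state estimate: sockets in TIME_WAIT ~= closes/sec * TIME_WAIT duration.
# Assumption: 60 s is the fixed TIME_WAIT length on Linux; other OSes differ.
TIME_WAIT_SECONDS = 60


def time_wait_estimate(rps, requests_per_connection=1,
                       time_wait_seconds=TIME_WAIT_SECONDS):
    """Estimate sockets sitting in TIME_WAIT at a steady request rate.

    Assumes this host actively closes every connection (only the side
    that closes first keeps the socket in TIME_WAIT).
    """
    closes_per_second = rps / requests_per_connection
    return closes_per_second * time_wait_seconds


def close_rate_for(time_wait_count, time_wait_seconds=TIME_WAIT_SECONDS):
    """Invert the estimate: closes/sec needed to sustain a TIME_WAIT count."""
    return time_wait_count / time_wait_seconds


if __name__ == "__main__":
    # Worst case (one request per connection) versus 10 requests per connection.
    for rps in (900, 1900):
        print(rps, time_wait_estimate(rps), time_wait_estimate(rps, 10))
    # Close rate implied by an observed TIME_WAIT count.
    print(close_rate_for(10000))
```

The useful part is the second function: an observed TIME_WAIT count
translates directly into a connection close rate, which can then be
compared against the RPS actually being sent.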


> When I was load testing different versions yesterday I was always sending the same RPS to them. The update from 3.5.20 to 3.5.21 resulted in a jump from 20 to 10000 in the TIME_WAIT count. This is 500x.
> 
> I know that TIME_WAIT connections are fine in general, until you have too many of them.
> 

At this point I'd check that your testing software supports HTTP/1.1 
pipelines. It may be giving you worst-case results with per-message TCP 
churn rather than what will occur normally (pipelines of N requests per 
TCP connection).
Though seeing such a jump between Squid-3 releases is worrying.
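
As a quick way to see the difference in TCP churn, here is a minimal
Python sketch. It is not a load tester and says nothing about Squid
itself: it starts a throwaway local HTTP/1.1 server (the class names and
the run() helper are made up for this example) and counts how many TCP
connections the client opens with and without connection reuse. Note that
http.client reuses a persistent connection sequentially rather than doing
true pipelining, but the N-requests-per-connection effect is the same.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


class CountingServer(ThreadingHTTPServer):
    """Local test server that counts accepted TCP connections."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accepted = 0

    def get_request(self):
        request = super().get_request()
        self.accepted += 1  # one increment per accepted TCP connection
        return request


def run(total_requests, reuse):
    """Send total_requests GETs; return how many TCP connections were used."""
    server = CountingServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for _ in range(total_requests):
            if conn is None:
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()  # per-message TCP churn
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.accepted


if __name__ == "__main__":
    print("connection per request:", run(20, reuse=False))
    print("persistent connection: ", run(20, reuse=True))
```

If your load generator behaves like the first case, every request costs
one TCP connection and, on whichever side closes first, one TIME_WAIT
entry.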

Amos

