[squid-users] Huge amount of time_wait connections after upgrade from v2 to v3

Sat Jul 8 16:51:50 UTC 2017

RPS didn't change. Throughput didn't change. Our prod load is 200-700 RPS
per server (changes during the day) and my load test load was constant 470
RPS.
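
As a rough sanity check on these numbers, here is a small sketch of the arithmetic. It assumes the proxy runs on Linux, where a socket stays in TIME_WAIT for a fixed 60 seconds, and that the TIME_WAIT sockets are on the Squid host (that is, Squid is the side closing the connection first). Under those assumptions the steady-state TIME_WAIT count is roughly the active-close rate multiplied by 60. The function names are made up for illustration.

```python
# Assumption: Linux holds a closed socket in TIME_WAIT for a fixed 60 seconds.
TIME_WAIT_SECONDS = 60


def closes_per_second(time_wait_count, hold_seconds=TIME_WAIT_SECONDS):
    """Active-close rate needed to sustain a steady-state TIME_WAIT count."""
    return time_wait_count / hold_seconds


def share_of_requests(time_wait_count, rps, hold_seconds=TIME_WAIT_SECONDS):
    """Fraction of requests that would have to end in an active close."""
    return closes_per_second(time_wait_count, hold_seconds) / rps


if __name__ == "__main__":
    # TIME_WAIT counts reported in this thread at a constant 470 RPS.
    for version, count in (("3.5.20", 20), ("3.5.21", 10000)):
        rate = closes_per_second(count)
        share = share_of_requests(count, rps=470)
        print(f"{version}: {rate:.1f} closes/s, {share:.1%} of requests")
```

With the figures from this thread, 10000 TIME_WAIT sockets would correspond to about 167 closes per second, roughly a third of the 470 RPS test load, whereas 20 would correspond to well under one close per second. So the same request rate can produce both counts; what differs is how often connections get closed.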

Clients didn't change. It doesn't matter whether they use HTTP/1.1 or 1.0, because
the only thing that changed is the Squid version. And as I figured out, it's
not actually about the 2.7 to 3.5 upgrade; it's all about the difference between
3.5.20 and 3.5.21.

I'm sorry but anything you say about throughput doesn't make any sense.
Load pattern didn't change. Squid still handles the same amount of requests.

I think I'm going to load test every patch applied to 3.5.21 from this
page:
http://www.squid-cache.org/Versions/v3/3.5/changesets/SQUID_3_5_21.html so
I'll be able to point to the exact change which introduced this behavior. I'll
try to do it during the weekend or maybe on Monday.
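
A sketch of how that search could be organized. Testing every patch one by one works; the version below swaps in a binary search instead, which needs fewer load-test runs but assumes the patches apply cleanly in order on top of 3.5.20 and that the problem stays present once introduced. `is_bad` is a hypothetical callback standing in for "build Squid with the first n patches, run the load test, check the TIME_WAIT count"; nothing here is part of Squid itself.

```python
def first_bad_patch(patches, is_bad):
    """Return the index of the patch that introduces the bad behavior.

    patches: ordered list of patch identifiers between the good and bad release.
    is_bad(n): hypothetical callback; True if a build with the first n
               patches applied shows the TIME_WAIT explosion under load.
    Assumes is_bad(0) is False and is_bad(len(patches)) is True.
    """
    good, bad = 0, len(patches)  # 'good' patches applied is fine, 'bad' is broken
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    # The build broke when the patch at index bad - 1 was added.
    return bad - 1


if __name__ == "__main__":
    # Placeholder patch names and a fake test where the 7th patch is the culprit.
    patches = [f"patch-{i}" for i in range(1, 11)]
    culprit = first_bad_patch(patches, is_bad=lambda n: n >= 7)
    print(patches[culprit])
```

For N patches this takes about log2(N) load-test runs instead of N. If the patches do not apply independently of each other, falling back to testing them one at a time is the safer option.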

On Sat, Jul 8, 2017 at 5:46 AM, Amos Jeffries <squid3 at treenet.co.nz> wrote:

> On 08/07/17 02:06, Ivan Larionov wrote:
>
>> Thank you for the fast reply.
>>
>> On Jul 7, 2017, at 01:10, Amos Jeffries <squid3 at treenet.co.nz> wrote:
>>>
>>> On 07/07/17 13:55, Ivan Larionov wrote:
>>>>
>>>>
>>>> However I assumed that this is a bug and that I can find older version
>>>> which worked fine. I started testing from 3.1.x all the way to 3.5.26 and
>>>> this is what I found:
>>>> * All versions until 3.5.21 work fine. There are no issues with a huge amount
>>>> of TIME_WAIT connections under load.
>>>> * 3.5.20 is the last working version.
>>>> * 3.5.21 is the first broken version.
>>>> * 3.5.23, 3.5.25, 3.5.26 are broken as well.
>>>> This effectively means that the bug is somewhere in between 3.5.20 and
>>>> 3.5.21.
>>>> I hope this helps and I hope you'll be able to find the issue. If you
>>>> can create a bug report based on this information and post it here it would
>>>> be awesome.
>>>>
>>>
>>> The changes in 3.5.21 were fixes to some common crashes and better
>>> caching behaviour. So I expect at least some of the change is due to higher
>>> traffic throughput on proxies previously restricted by those problems.
>>>
>>>
>> I can't imagine how a throughput increase could result in a 500 times higher
>> TIME_WAIT connection count.
>>
>>
> More requests per second generally means more TCP connections churning.
>
> Also when going from Squid-2 to Squid-3 there is a change from HTTP/1.0 to
> HTTP/1.1 and the accompanying switch from MISS to near-HIT revalidations.
> Revalidations usually only have headers without payload so the same
> bytes/sec can contain orders of magnitude more of those than MISS - which is
> the point of having them.
>
>
>> In our prod environment when we updated from 2.7.x to 3.5.25 we saw
>> increase from 100 to 10000. This is 100x.
>>
>>
> Compared to what RPS change? Given the above traffic change this may be
> reasonable for a v2 to v3 jump. Our own very rough lab tests on old
> hardware have shown rates for Squid-2 at ~900 RPS and Squid-3 at around 1900
> RPS.
>
>
>> When I was load testing different versions yesterday I was always sending
>> the same amount of RPS to them. Update from 3.5.20 to 3.5.21 resulted in
>> jump from 20 to 10000 TIME_WAIT count. This is 500x.
>>
>> I know that time_wait is fine in general. Until you have too many of them.
>>
>>
> At this point I'd check that your testing software supports HTTP/1.1
> pipelines. It may be giving you worst-case results with per-message TCP
> churn rather than what will occur normally (pipelines of N requests per TCP
> connection).
> Though seeing such a jump between Squid-3 releases is worrying.
>
> Amos
>
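
For anyone who wants to reproduce the TIME_WAIT counts quoted above, here is a minimal sketch of one way to collect them. It assumes a Linux host and reads `/proc/net/tcp` and `/proc/net/tcp6`, where the `st` column holds the socket state as a hex code and `06` is TIME_WAIT. The function names are made up for illustration, and this is not necessarily how the numbers in this thread were gathered.

```python
from collections import Counter

TIME_WAIT = "06"  # hex state code in the "st" column of /proc/net/tcp


def count_states(proc_net_tcp_text):
    """Count sockets per state code from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3:
            counts[fields[3]] += 1  # fields: sl, local, remote, st, ...
    return counts


def count_time_wait(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Total TIME_WAIT sockets over IPv4 and IPv6 (Linux only)."""
    total = 0
    for path in paths:
        try:
            with open(path) as fh:
                total += count_states(fh.read())[TIME_WAIT]
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled, or not running on Linux
    return total


if __name__ == "__main__":
    print(count_time_wait())
```

Running it once per second during a load test gives a time series that can be compared between a 3.5.20 and a 3.5.21 build at the same request rate.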

-- 
With best regards, Ivan Larionov.