[squid-users] Linearly increasing delays in HTTPS proxy CONNECTS / 3.5.20

Mon Sep 2 11:22:47 UTC 2019

> On 30 Aug 2019, at 16.40, Alex Rousskov <rousskov at measurement-factory.com> wrote:
> 
> On 8/30/19 8:16 AM, Ilari Laitinen wrote:
> 
>> I suspect Squid might be waiting for local TCP ports from the kernel
>> (or something related).
> 
> IIRC, ephemeral source port allocator is instantaneous -- Squid either
> gets a port or a port allocation error, without waiting. When we
> overload the server with high-performance tests (without an explicit
> port manager), we see port allocation errors rather than stalled tests.
> However, perhaps that is not true in your OS/environment.

Ah yes, of course. I actually saw this error first-hand when setting up the test environment.
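
For reference, a back-of-the-envelope estimate of when such port allocation errors would be expected. This is only a sketch under assumptions not measured in this thread: the Linux default net.ipv4.ip_local_port_range of 32768-60999, a 60-second TIME_WAIT, and the kernel choosing source ports per destination (which may not hold if Squid binds an outgoing address explicitly). The function names are made up for illustration.

```python
# Rough ceiling on outgoing connections to one (remote IP, remote port) pair.
# Assumptions (not measured here): Linux default port range, 60 s TIME_WAIT.

def max_sustained_rate(port_low: int, port_high: int, time_wait_s: float) -> float:
    """Connections per second before every local port sits in TIME_WAIT."""
    ports = port_high - port_low + 1  # the range is inclusive on both ends
    return ports / time_wait_s


def read_port_range(path: str = "/proc/sys/net/ipv4/ip_local_port_range") -> tuple[int, int]:
    """Read the live range on a Linux host; the file holds two integers."""
    with open(path) as f:
        low, high = f.read().split()
    return int(low), int(high)


if __name__ == "__main__":
    low, high = 32768, 60999  # assumed defaults; use read_port_range() on a real host
    one_ip = max_sustained_rate(low, high, 60.0)
    print(f"one remote IP:   {one_ip:.0f} conn/s")
    print(f"four remote IPs: {4 * one_ip:.0f} conn/s")
```

Under these assumptions, spreading the traffic over all four addresses would roughly quadruple the headroom compared to using one address at a time.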

>> Right now, there are four different IP addresses returned for the
>> target cloud service. For practical purposes, they are returned in a
>> random order. The traffic would ideally be spread over all of them.
>> Unfortunately it is evident both from the debug log and from the TCP
>> dump that Squid is using only one of the addresses at a time. The
>> amount of connections in the TIME_WAIT state for that single IP
>> address gets very close to the maximum defined by the
>> net.ipv4.ip_local_port_range sysctl. After a while (a minute or so in
>> the recording) this address changes presumably in response to a new
>> DNS query result.
> 
> In theory, Squid should round-robin across all destination IP addresses
> for a single host name. If your Squid v3 does not, it is probably a
> Squid bug that can be fixed [by upgrading].
> 
> Said that, IIRC, the notion of "round robin" is rather vague in Squid
> because there are several places where an IP may be requested for the
> same host name inside the same transaction. I would not be surprised if
> that low-level round-robin behavior results in the same IP being used
> for most transactions in some environments (until an error or a new DNS
> query reshuffles the IPs). Debugging logs may expose this problem.

I looked into this further. All our TCP dumps so far show the same thing: Squid almost always uses exactly one remote address at a time. The increasing delays start right after Squid has switched to a new remote IP and last precisely until another switch happens (typically the next one). The problem does not occur every time and is not limited to a single target IP.
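
In case someone wants to check the same thing without a full packet capture, here is a minimal sketch that counts TIME_WAIT sockets per remote address. It assumes the usual column layout of `ss -tan` on Linux (State, Recv-Q, Send-Q, local address:port, peer address:port); the function name is hypothetical and this is not the tooling used for the observations above.

```python
from collections import Counter


def time_wait_by_peer(ss_output: str) -> Counter:
    """Count TIME-WAIT sockets per peer IP from `ss -tan` text output."""
    counts: Counter = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "TIME-WAIT":
            continue  # skips the header line and sockets in other states
        # rpartition splits on the last colon, so IPv6 peers keep their colons
        peer_ip, _, _port = fields[4].rpartition(":")
        counts[peer_ip] += 1
    return counts
```

Feeding it the output of `ss -tan` every few seconds and printing `most_common()` should show whether the TIME_WAIT sockets pile up on a single remote address or are spread across all of them.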

Now that I know that this is not expected and is possibly related to a bug, I’ll look into upgrading Squid from the platform default.

>> One possible workaround that I can think of is setting a short
>> positive_dns_ttl, but this doesn’t fully guarantee an even
>> distribution, now does it?
> 
> No, it does not. Moreover, Squid v3 had some TTL handling bugs that were
> fixed (in v4 and later code) by the Happy Eyeballs project. Taking all
> the known problems into the account, it is difficult for me to predict
> the effect of changing TTLs. Said that, it does not hurt to try! Maybe
> you will be lucky, and a simple configuration change will remove the
> cause of increasing transaction delays.

Thank you very much for your informed and timely replies!

I’ll report our results here.

Best,

-- 
Ilari Laitinen