[squid-users] Linearly increasing delays in HTTPS proxy CONNECTS / 3.5.20

Fri Aug 30 12:16:49 UTC 2019

>> I noticed small but consistent spikes in squid's disk cache usage
>> coinciding with the issue at hand. This seemed strange, given there
>> was no other traffic during the tests and proxied HTTPS means there's
>> nothing to cache (right?).
> 
> Correct. To avoid suspecting disks, configure Squid to log to a
> RAM-based partition and remove cache_dirs [until you solve the problem].

The behaviour persisted with this configuration. Not a disk bottleneck, then.

>> The servers are located in an IPv4-only local network. Every outgoing
>> request is supposed to be IPv4. The servers do have IPv6 interfaces
>> but there is no traffic at all. Squid periodically queries AAAA
>> records. Is it possible that new connections get queued while squid
>> is busy trying to use IPv6 after receiving the new AAAAs?
> 
> If "no traffic at all" means "zero IPv6 packets", then it is not
> possible. Otherwise, it is possible (only the latest Squid (i.e. future
> v5) does not have this kind of problem).
> 
>> I have very little control over the environment. Is dns_v4_first
>> worth a try in my scenario?
> 
> It is not a reliable solution, but it would not hurt as far as
> performance is concerned.

Tried dns_v4_first; the problem persisted. I also noticed that the platform (in general) tries to resolve IPv6 first, but the TCP dumps contain no IPv6 packets at all. This is baffling, because there were indeed some unrelated open IPv6 connections on the Squid server (reported by netstat).

>> What should I look into next?
> 
> 1. Check system logs.

Nothing out of the ordinary.

> 2. Check atop output while the problem is present. If this is a resource
> bottleneck, atop may expose it.

No bottlenecks identified this way.

> 3. If there is IPv6 traffic, to eliminate IPv6 as a suspect, you can
> disable IPv6 on the box, use Squid built without IPv6 support, or even
> use a DNS forwarder that, for example, rejects all AAAA queries. All
> these solutions can and should be validated by examining actual IPv6
> traffic. And none of them are needed if there is no IPv6 traffic at all.

This is something I may need to look into further.

> 4. With delays ranging into _seconds_ it should be fairly easy for a
> capable Squid developer to figure out what your Squid is doing by
> looking at Squid debugging logs. You can post a link to compressed
> cache.log here for analysis, but you should first simplify your workload
> so that it has CONNECT tunnels and nothing else (if you have not
> already) and enable debugging when the problem is present (e.g., use
> "squid -k debug" although it is currently better to send the right
> signal manually).

I unfortunately cannot share the debug log because it contains some sensitive information. We nevertheless recorded what ended up being a huge sample.

I suspect Squid might be waiting for local TCP ports from the kernel (or something related). Why am I thinking this?

Right now, there are four different IP addresses returned for the target cloud service. For practical purposes, they are returned in a random order. The traffic would ideally be spread over all of them. Unfortunately, it is evident from both the debug log and the TCP dump that Squid is using only one of the addresses at a time. The number of connections in the TIME_WAIT state for that single IP address gets very close to the number of local ports allowed by the net.ipv4.ip_local_port_range sysctl. After a while (a minute or so in the recording), this address changes, presumably in response to a new DNS query result.
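
For anyone wanting to reproduce the observation above, here is a minimal sketch of how the TIME_WAIT sockets could be counted per remote address and compared with the local port range. It assumes a Linux host with the usual /proc/net/tcp layout, IPv4 only, and a little-endian CPU (addresses in that file are printed in host byte order). The function names are mine and purely illustrative.

```python
import socket
import struct
from collections import Counter

TIME_WAIT = "06"  # state code for TIME_WAIT in /proc/net/tcp


def hex_to_ipv4(hex_addr: str) -> str:
    """Decode an IPv4 address as printed in /proc/net/tcp.

    Assumption: little-endian host (e.g. x86), so the hex digits are
    byte-reversed relative to dotted-quad order.
    """
    return socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))


def time_wait_by_remote(proc_net_tcp: str) -> Counter:
    """Count TIME_WAIT sockets per (remote IP, remote port)."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        ip_hex, port_hex = fields[2].split(":")  # rem_address column
        counts[(hex_to_ipv4(ip_hex), int(port_hex, 16))] += 1
    return counts


def port_range_size(range_text: str) -> int:
    """Number of local ports in 'low<whitespace>high' (inclusive range)."""
    low, high = map(int, range_text.split())
    return high - low + 1


def summarize(proc_net_tcp: str, range_text: str, top: int = 5) -> list:
    """Return one line per busiest remote endpoint with its share of the range."""
    limit = port_range_size(range_text)
    return [
        f"{ip}:{port} TIME_WAIT={n} ({n / limit:.0%} of {limit} local ports)"
        for (ip, port), n in time_wait_by_remote(proc_net_tcp).most_common(top)
    ]
```

Feeding `summarize()` the contents of /proc/net/tcp and /proc/sys/net/ipv4/ip_local_port_range while the delays are happening should show whether a single remote endpoint holds nearly the whole range.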

Could this be the bottleneck? Is there a way to configure Squid to use all returned IP addresses?

One possible workaround that I can think of is setting a short positive_dns_ttl, but this doesn’t fully guarantee an even distribution, now does it?
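
To put a rough number on this, here is a back-of-the-envelope sketch. It rests on several assumptions that I have not verified for this setup: the proxy is the side that closes first (so TIME_WAIT lands on its local ports), a single source IP is used, TIME_WAIT lasts 60 seconds as it does on stock Linux, and the kernel can reuse a local port towards a different remote address. The function name is made up for illustration.

```python
def max_new_connections_per_second(
    local_ports: int,
    time_wait_seconds: float,
    remote_addresses_in_use: int,
) -> float:
    """Upper bound on sustained new outgoing connections per second.

    Each closed connection is assumed to pin one local port, for one remote
    address, for the whole TIME_WAIT period. Spreading traffic over more
    remote addresses multiplies the number of usable (port, remote) pairs.
    """
    return local_ports * remote_addresses_in_use / time_wait_seconds


# Illustrative values only: a 32768-60999 port range has 28232 ports.
one_address = max_new_connections_per_second(28232, 60, 1)
four_addresses = max_new_connections_per_second(28232, 60, 4)
```

Under these assumptions the ceiling scales with the number of remote addresses actually in use within one TIME_WAIT period, which is why a short positive_dns_ttl would only help to the extent that successive lookups happen to pick different addresses.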

>> Could setting up "workers N" help, for example?
> 
> The answer depends on your definition of "help": Large number of workers
> may mask the problem to the point where it no longer bothers you, but I
> would not make the setup a lot more complex until you know where the
> current bottleneck is. In fact, I would go into the opposite direction
> of making the setup as simple as possible!

Thanks, this is what I wanted to hear. :)

Best,

-- 
Ilari Laitinen