[squid-users] connections from particular users sometimes get stuck
Alex Rousskov
rousskov at measurement-factory.com
Fri Sep 30 15:12:46 UTC 2016
On 09/30/2016 12:38 AM, Eugene M. Zheganin wrote:
> And the
> main sign indicating there's something wrong with this initial
> transaction was the fact that 407 answer took 42 seconds to appear in
> both tcpdump captures.
To avoid misunderstanding: There are many red flags in your logs,
including excessive processing delays. I am ignoring the ones that do
not appear to be related to the "stuck browser" problem.
>>> tcpdump capture taken from a client machine:
>>> http://zhegan.in/files/squid/squid-stuck-client.pcap
>> This capture is missing most of the second transaction packets
>> (tcp.stream eq 186). I do not know why tcpdump was unable to collect them.
> This is because of the gap between pressing Ctrl+C in the client machine
> cmd console and in the squid server ssh console (in that order): I had
> to switch between windows on my desktop. About a minute had already
> passed since the beginning of the transaction (42 seconds plus quite
> some time), and I was worried that the debug log would keep growing,
> making it difficult to navigate.
Please do not worry about log navigation -- you are already posting logs
with 95+% irrelevant transactions; adding 2% more would make no
difference. Focus on collecting complete logs. However, please
_compress_ your logs before posting them.
>> In summary, your browser is probably stuck because Squid could not
>> accept a connection. Why did that accept call fail with ECONNABORTED?
>> If you cancel browser wait and repeat the request, will it work?
> Sometimes, but only in about 3-5% of cases.
This may be your pathway to the answer! A _random_ ECONNABORTED
accept(2) error would have a ~100% probability of disappearing on the
second attempt. Your observations seem to confirm that this error
(assuming Squid gets ECONNABORTED for all stuck transactions) is caused
by something on that client or something between that specific client
and Squid.
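The arithmetic behind that argument can be sketched as follows. This is only an illustration: it assumes that attempts fail independently, and the 1-in-1000 error rate is a made-up number, not something taken from the logs.

```python
def retry_success_if_random(p_fail: float) -> float:
    """Chance that a retry succeeds when every accept(2) call fails
    independently with probability p_fail."""
    return 1.0 - p_fail


def implied_failure_rate(observed_retry_success: float) -> float:
    """Per-attempt failure rate that independent failures would need
    in order to produce the observed retry success rate."""
    return 1.0 - observed_retry_success


# Hypothetical: a rare random error hitting 1 in 1000 accepts.
print(f"random error, retry succeeds: {retry_success_if_random(0.001):.1%}")

# Reported: retries work only about 3-5% of the time.
for observed in (0.03, 0.05):
    print(f"observed {observed:.0%} retry success -> "
          f"implied per-attempt failure rate {implied_failure_rate(observed):.0%}")
```

A per-attempt failure rate that high, seen from one client only, is hard to explain as random noise, which is why the cause is more likely tied to that client or its path to Squid.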
We obviously do not know what is wrong yet, but just for the sake of an
example, consider mismatched negotiated Ethernet settings that lead
to packet loss and similar low-level problems, which in turn lead to TCP
accept(2) errors from Squid's point of view. Again, I am not claiming
that this is what is going on.
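For readers unfamiliar with how such an error looks to a server, here is a minimal sketch of one step of an accept loop. It is not Squid code; `accept_once` and the `stats` dictionary are hypothetical names used only for illustration. ECONNABORTED is typically reported when a connection is dropped after the kernel has queued it but before accept(2) hands it to the application.

```python
import errno


def accept_once(listener, stats):
    """Try one accept() on a listening socket.

    Returns the new connection, or None if the pending connection
    was aborted before it could be accepted.
    """
    try:
        conn, _addr = listener.accept()
    except OSError as e:
        if e.errno == errno.ECONNABORTED:
            # The client never gets a usable connection here, so from
            # its side the request may simply appear to hang.
            stats["aborted"] = stats.get("aborted", 0) + 1
            return None
        raise  # any other error is not handled in this sketch
    stats["accepted"] = stats.get("accepted", 0) + 1
    return conn
```

Counting the aborted accepts separately, as above, makes it easy to check whether they line up with the stuck transactions.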
> Most of the time
> repeated requests lead to the same timeout, Chrome shows the request as
> "pending" in the developer tools network tab for dozens of seconds,
> and so on.
Very good. This is something we can use to investigate.
> I still have this machine in stuck state (I think), what should I focus on?
I recommend the following:
0. Start capturing to/from-Squid packets on the client machine.
   Make sure tcpdump does not do DNS resolution (if applicable).
1. Start capturing to/from-clientS packets on the Squid machine.
   Make sure tcpdump does not do DNS resolution (if applicable).
2. Start Squid debugging.
3. On both machines (if possible):
   Collect "netstat -s" or equivalent TCP stack stats.
   Collect "ifconfig ..." or equivalent high-level interface stats.
   Collect "ethtool -S ..." or equivalent low-level interface stats.
4. Reproduce the problem. Wait until Chrome times out.
5. On both machines (if possible):
   Collect "netstat -s" or equivalent TCP stack stats.
   Collect "ifconfig ..." or equivalent high-level interface stats.
   Collect "ethtool -S ..." or equivalent low-level interface stats.
6. Reproduce the problem again. Wait until Chrome times out.
   If possible, use a slightly different URL for this test.
   For example, append "?test2" to the URL in #4, assuming
   that will not screw things up.
7. On both machines (if possible):
   Collect "netstat -s" or equivalent TCP stack stats.
   Collect "ifconfig ..." or equivalent high-level interface stats.
   Collect "ethtool -S ..." or equivalent low-level interface stats.
8. Wait 60 seconds.
9. Stop all captures and Squid debugging.
10. Archive, compress, and share all the logs and the test URL(s).
    When archiving, please preserve file modification times for
    logs from steps 3, 5, and 7.
Adjust the procedure as needed, of course.
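The point of collecting the same stats in steps 3, 5, and 7 is to see which counters moved while the problem was being reproduced. A rough sketch of such a comparison follows. It assumes that each counter line starts with an integer followed by a label and that section headers end with a colon. Real "netstat -s" output differs between operating systems, so lines in other shapes are silently ignored, and the sample text is made up.

```python
import re

# Matches lines such as "    1234 segments retransmitted".
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(text):
    """Map "section label" to the integer that starts each counter line."""
    counters = {}
    section = ""
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            # Mask digits inside the label so that embedded numbers
            # (e.g. byte counts) do not change the key between snapshots.
            label = re.sub(r"\d+", "N", m.group(2))
            counters[section + label] = int(m.group(1))
        elif line.strip().endswith(":"):
            section = line.strip() + " "
    return counters


def changed_counters(before, after):
    """Return {label: delta} for counters that differ between snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {k: new[k] - old.get(k, 0)
            for k in new if new[k] != old.get(k, 0)}


# Made-up sample snapshots, not real output from the affected machines.
before = "tcp:\n\t10 connections accepted\n\t2 retransmit timeouts\n"
after = "tcp:\n\t11 connections accepted\n\t7 retransmit timeouts\n"
print(changed_counters(before, after))
```

Counters that jump only during the reproduction windows (steps 4 and 6) are the ones worth looking at first.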
Alex.