[squid-users] connections from particular users sometimes get stuck
Eugene M. Zheganin
emz at norma.perm.ru
Fri Sep 30 06:38:07 UTC 2016
Hi.
On 29.09.2016 23:17, Alex Rousskov wrote:
> On 09/29/2016 02:58 AM, Eugene M. Zheganin wrote:
>> This time turbodom.ru entries are present in the debug log
> Yes, there are two complete HTTP transactions with that domain. One is a
> 407 Authentication Required and one is a 301 redirect:
>
>> HTTP/1.1 301 Moved Permanently
> ...
>> Location: http://turbodom.ru/index.html/
> I see no relevant problems with those two transactions.
Me neither, but the original request wasn't 'GET
http://turbodom.ru/index.html/', it was 'GET
http://turbodom.ru/index.html', without the trailing slash. The trailing
slash was added by the web server, due to its specific configuration. And
the main sign that something is wrong with this initial transaction is
that the 407 answer took 42 seconds to appear in both tcpdump captures.
>
>
>> tcpdump capture taken from a client machine:
>> http://zhegan.in/files/squid/squid-stuck-client.pcap
> This capture is missing most of the second transaction packets
> (tcp.stream eq 186). I do not know why tcpdump was unable to collect them.
This is because of the gap between the Ctrl-C issued in the client
machine's cmd console and the one issued in the squid server's ssh console
(in this exact order): I had to switch between windows on my desktop. By
then about a minute had passed since the beginning of the transaction (42
seconds plus quite some time), and I was worried that the debug log would
keep growing, making it difficult to navigate.
>> tcpdump capture taken from squid machine, on the interface the client
>> machine is connected via:
>> http://zhegan.in/files/squid/squid-stuck-server-to-client.pcap
> This one has both HTTP transactions described above (tcp.stream eq 206).
>
> It also contains a related and incomplete transaction that follows the
> above redirect (tcp.stream eq 346). According to tcpdump, that
> transaction starts around 13:31:26.
>
> Squid cache log does not mention that third transaction (or that TCP
> connection), probably because Squid could not accept it. There were a
> few accept failures (ECONNABORTED) right around the time of that
> third/missing transaction:
>
>> 13:31:25.060 kid1| accept failure: (53) Software caused connection abort
>> 13:31:25.865 kid1| accept failure: (53) Software caused connection abort
>> 13:31:25.904 kid2| accept failure: (53) Software caused connection abort
> The timestamps are a ~second off, but AFAICT, they are a ~second off for
> successful accepts as well, so it is probably just a tcpdump logging
> artifact.
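To check whether these accept failures cluster around the stuck request, the cache log lines can be counted per second. This is only a minimal sketch: it assumes the "HH:MM:SS.mmm kidN| accept failure: (errno) text" layout shown in the excerpt above, and the regex and function name are my own illustration, not anything provided by Squid.

```python
import re
from collections import Counter

# Layout assumed from the quoted excerpt:
#   "HH:MM:SS.mmm kidN| accept failure: (errno) text"
# search() is used so that a leading date on the line, if any, is ignored.
ACCEPT_FAILURE = re.compile(
    r"(?P<second>\d{2}:\d{2}:\d{2})\.\d{3} (?P<kid>kid\d+)\| "
    r"accept failure: \((?P<errno>\d+)\) (?P<text>.*)"
)


def accept_failures_per_second(lines):
    """Count accept failures per (second, errno), summed over all kids."""
    counts = Counter()
    for line in lines:
        match = ACCEPT_FAILURE.search(line)
        if match:
            counts[(match["second"], int(match["errno"]))] += 1
    return counts


# The three lines quoted above, used as sample input.
sample = [
    "13:31:25.060 kid1| accept failure: (53) Software caused connection abort",
    "13:31:25.865 kid1| accept failure: (53) Software caused connection abort",
    "13:31:25.904 kid2| accept failure: (53) Software caused connection abort",
]

for (second, errno), n in sorted(accept_failures_per_second(sample).items()):
    print(second, errno, n)  # prints "13:31:25 53 3"
```

Feeding the whole log through the same function (for example, by iterating over an open file) would show whether such bursts also occur when nobody is complaining.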
>
> In summary, your browser is probably stuck because Squid could not
> accept a connection. Why did that accept call fail with ECONNABORTED? I
> cannot say for sure -- the packet trace is rather dirty/misleading
> (e.g., it shows the redirect packet being sent to the client _after_ the
> client follows that redirect which does not make sense).
>
> Any relevant errors in your system logs?
Nothing except
Limiting closed port RST response from 294 to 200 packets/sec
repeated many times, which doesn't look related to this, because these
messages keep popping up even when nobody's complaining. I've already done
the initial investigation and found no signs of resource starvation,
meaning no connection/file/mbuf/memory/pf state starvation happens.
At least I didn't find any. Plus, starvation would affect all of the
clients randomly, right? Not just this particular one.
>
> If you cancel browser wait and repeat the request, will it work?
Sometimes, but only about 3-5% of the time. Most of the time repeated
requests lead to the same timeout: Chrome shows the request as
"pending" in the developer tools Network tab for dozens of seconds,
and so on.
> If this
> was just a random accept failure, then it should work on the second try.
> If it does not work again, then there is something more serious going on
> (but you would need to collect more logs to study that).
>
> The connection accepting code in Squid is in poor shape, but I do not
> think those minor code problems affect this particular use case.
>
I still have this machine in the stuck state (I think). What should I focus on?
Thanks.
Eugene.