[squid-users] Squid for Windows Repeatedly Crashing

Van Order, Drew (US - Hermitage) dvanorder at deloitte.com
Sun Feb 24 13:40:15 UTC 2019


This is helpful, and I especially appreciate the time given it is the weekend.

The Squids are confusing me, as everything is well behaved at the moment. One server was erroring off and on for a few hours earlier today, but stopped after a reboot.

It does appear that redirecting roughly 125 servers to no longer use the proxy has helped. Unfortunately, our F5 guy can't tell me how many IP addresses remain coming into this F5 VIP, which would give me the number of servers, and an idea how loaded this thing is. I have good reason to believe it is under 1,000. He has shown us graphs indicating the VIP isn't stressed, but I will keep working on him, b/c I can't imagine not being able to report how many distinct IP addresses hit the VIP.
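
A rough way to get a distinct-address count from the Squid side, sketched under assumptions: it presumes Squid's default native access.log layout, where the third whitespace-separated field is the client address. The function name `count_clients` and the sample lines are invented for illustration. If the F5 source-NATs connections, Squid will only see the F5's own addresses, so the figure would have to come from the F5 or a firewall instead.

```python
from collections import Counter

def count_clients(lines):
    """Tally requests per client address from native-format access.log lines."""
    clients = Counter()
    for line in lines:
        fields = line.split()
        # Assumed native layout: time elapsed client code/status bytes method URL ...
        if len(fields) >= 3:
            clients[fields[2]] += 1
    return clients

# Invented sample lines (not taken from the thread), for illustration only.
sample = [
    "1550670153.101     45 10.1.2.3 TCP_TUNNEL/200 3912 CONNECT example.com:443 - HIER_DIRECT/192.0.2.10 -",
    "1550670154.202     51 10.1.2.4 TCP_TUNNEL/200 4100 CONNECT example.com:443 - HIER_DIRECT/192.0.2.10 -",
    "1550670155.303     48 10.1.2.3 TCP_TUNNEL/200 3850 CONNECT example.com:443 - HIER_DIRECT/192.0.2.10 -",
]

clients = count_clients(sample)
print(len(clients), "distinct client addresses")
for addr, hits in clients.most_common():
    print(addr, hits)
```

To run it against a real log, pass an open file object in place of `sample`.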

I don't have a Visio, but

Server running the Microsoft Monitoring Agent sends data over tcp/443-->Internal facing firewall(s)-->F5 VIP-->one of 4 Squids-->internet 

Each of the 4 VMWare Squids has 4 proc and 8 GB memory, 10 GB NIC.

We're a large enterprise with multiple data centers and many subnets, so there are quite a few firewalls, and most of the time a server must go through more than one firewall. Can't help but wonder if firewall exhaustion could cause the symptoms.

Revision: I typed the above last night. This morning, the server that had been erroring is at it again, but stopped. Others are fine. Interesting problem.

-----Original Message-----
From: eliezer at ngtech.co.il <eliezer at ngtech.co.il> 
Sent: Saturday, February 23, 2019 12:16 PM
To: Van Order, Drew (US - Hermitage) <dvanorder at deloitte.com>; 'Amos Jeffries' <squid3 at treenet.co.nz>; squid-users at lists.squid-cache.org
Subject: [EXT] RE: [squid-users] Squid for Windows Repeatedly Crashing

The following tool might help you understand the status of the open connections,
and whether the socket is being closed (I think Windows Server 2016 is a very good OS...).
https://www.nirsoft.net/utils/cports.html
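
As a scriptable alternative to a GUI tool, the same information can be tallied from `netstat -an` output. This is only a sketch: the helper name `tally_tcp_states` is hypothetical, port 3128 is assumed because it is Squid's default (the thread does not say which port is used), and the sample text is invented to mimic the Windows column layout.

```python
from collections import Counter

def tally_tcp_states(netstat_text, local_port):
    """Count TCP connection states for rows whose local address uses local_port."""
    states = Counter()
    suffix = f":{local_port}"
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows: proto, local address, foreign address, state (optionally a PID).
        if len(fields) >= 4 and fields[0] == "TCP" and fields[1].endswith(suffix):
            states[fields[3]] += 1
    return states

# Invented sample in the shape of Windows `netstat -an` output.
sample = """
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:3128           0.0.0.0:0              LISTENING
  TCP    10.0.0.5:3128          10.0.0.9:51234         ESTABLISHED
  TCP    10.0.0.5:3128          10.0.0.9:51240         TIME_WAIT
  TCP    10.0.0.5:445           10.0.0.7:50111         ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""

for state, count in tally_tcp_states(sample, 3128).most_common():
    print(state, count)

# On a live server the text could come from:
#   subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
```

Sampling this every minute or so and watching whether LISTENING disappears, or whether TIME_WAIT grows without bound, would show whether a limit is being approached before the errors start.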

There is a possibility that some OS TCP limit is being reached, hence the socket closure.
If you are using F5 you can easily find out the load at the crash point.
I assume that if a normal Squid instance can take a load of 900k requests per second at a fairly constant rate for more than a minute, then the issue might lie somewhere other than Squid.
I am fairly sure that if you do not have anyone who is knowledgeable enough about Windows sockets, sessions and FW limitations, you will have to either:
- learn it yourself
- find an expert
- use an OS that is more than 20% supported by any of the Squid-Cache team members and other developers around the globe.

Just to say a good word about Windows Server 2016: I compared it to Windows 10 under load, and it seems to take a lot more load.
It also not only takes the load but balances it well (with open source software designed for Windows).

Also, if you have a specific use case, maybe a specific proxy can be customized for it.
Let me know if you wish to share more details on the configuration, so I can take my time and see whether there is a solution other than Squid.

Eliezer

----
Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: eliezer at ngtech.co.il


-----Original Message-----
From: squid-users <squid-users-bounces at lists.squid-cache.org> On Behalf Of Van Order, Drew (US - Hermitage)
Sent: Friday, February 22, 2019 15:32
To: Amos Jeffries <squid3 at treenet.co.nz>; squid-users at lists.squid-cache.org
Subject: Re: [squid-users] Squid for Windows Repeatedly Crashing

The test box I set up outside the F5 finally started exhibiting these errors, once I pointed roughly 60 machines to it. It took a few hours.
Sounds like this narrows it down to either the OS itself (seems unlikely, other apps would crash), or the litany of agents our security folks have mandated. It may indeed be necessary to move to Linux.

Thank you very much for your time!

-----Original Message-----
From: Amos Jeffries <squid3 at treenet.co.nz>
Sent: Thursday, February 21, 2019 11:31 PM
To: Van Order, Drew (US - Hermitage) <dvanorder at deloitte.com>; squid-users at lists.squid-cache.org
Subject: [EXT] Re: [squid-users] Squid for Windows Repeatedly Crashing

On 22/02/19 4:21 am, Van Order, Drew (US - Hermitage) wrote:
> Thank you for replying, and that's an excellent point.
>
> Short answer--definitely not in a container, these are garden variety
> VMWare instances. I've already flagged the OS power settings to maximum
> performance, so nothing should be going to sleep. I'll doublecheck, though.
>
> So, if I understand correctly, this error could also be indicative of an
> issue in between the agent and Squid. Agents first go through a firewall,
> then the F5 before reaching Squid.

No, that is not what I meant.

The port Squid has already opened and used syscall listen(2) on is what is being closed (or its address corrupted) outside of Squid. That should only ever be closed by Squid itself. Thus the error.

It is being closed repeatedly. Thus the abort/shutdown. This is not a crash; it is an intentional shutdown by Squid due to these fatal (non-recoverable) errors.


>
> [Stopped, reason:Listener socket closed job1]: (14) Bad address
>
> Any thoughts on this error, which tends to be more common than the other?
>
> 2019/02/20 09:42:33 kid1| comm_poll: poll failure: (14) Bad address
> 2019/02/20 09:42:33 kid1| Select loop Error. Retry 2
>

Notice how the error from the OS "(14) Bad Address" is the same. This is just another display of the same problem. Maybe the poll() layer reporting the exact same error as Squid tries to recover. Maybe for other non-listener ports also being corrupted somehow.

If non-listener ports are having that same error it would be a sign the machine memory is being corrupted rather than other software touching the listener ports specifically.
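
One way to see whether the same OS error shows up across different parts of the log is to tally the "(code) message" suffixes. The sketch below assumes log lines end with that suffix, as in the lines quoted above. The name `tally_os_errors` is hypothetical, and the symbolic name is looked up in the errno table of whatever machine runs the script, which may not match the Squid host's.

```python
import errno
import re
from collections import Counter

# Matches a trailing "(14) Bad address" style suffix.
ERRNO_RE = re.compile(r"\((\d+)\)\s+(.+?)\s*$")

def tally_os_errors(lines):
    """Count (code, symbolic name, message) triples found at the end of log lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            code = int(match.group(1))
            # errno.errorcode reflects the local platform, not necessarily Squid's.
            name = errno.errorcode.get(code, "unknown")
            counts[(code, name, match.group(2))] += 1
    return counts

# Lines copied from the messages quoted in this thread.
log_lines = [
    "[Stopped, reason:Listener socket closed job1]: (14) Bad address",
    "2019/02/20 09:42:33 kid1| comm_poll: poll failure: (14) Bad address",
    "2019/02/20 09:42:33 kid1| Select loop Error. Retry 2",
]

for (code, name, message), hits in tally_os_errors(log_lines).items():
    print(code, name, message, hits)
```

Grouping the matching lines by the text before the suffix would then show whether only the listener is affected or other descriptors as well.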


( The details you have provided so far have no hints about where the problem may be coming from, and I am not having any ideas about possibilities either. I just hope the above explanation of meaning can help you think of things to look at for more hints on this very weird issue. )

Amos