[squid-users] Optimizing squid
Eliezer Croitoru
eliezer at ngtech.co.il
Fri Feb 26 12:18:49 UTC 2016
Hey again,
It took me some time...
The number of clients is sometimes irrelevant compared to other factors.
Take, for example, a network with 30k clients/users which only access a basic
email service. So it is quite possible that at some periods of time your
service will show this kind of load average.
You should continue to monitor the service with a couple of tools to
identify what the load looks like around the clock. Also try to dump the basic
info page you attached data from; every 5 minutes would be OK as a starter.
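If it helps, here is a rough Ruby sketch of such a periodic dump (assuming
squidclient is installed and squid answers on localhost port 3128; the output
directory and interval are just placeholders, adjust to your setup):

  #!/usr/bin/env ruby
  # Dump the cache manager "info" page every 5 minutes into timestamped files.
  # Assumes squidclient is available and squid listens on localhost:3128.
  require "fileutils"

  INTERVAL = 300                        # seconds between dumps
  OUTDIR   = "/var/log/squid/mgr-info"  # pick any directory you like
  FileUtils.mkdir_p(OUTDIR)

  loop do
    stamp = Time.now.strftime("%Y%m%d-%H%M%S")
    info  = `squidclient -h localhost -p 3128 mgr:info 2>&1`
    File.write(File.join(OUTDIR, "info-#{stamp}.txt"), info)
    sleep INTERVAL
  end

You could of course do the same thing from cron; the point is only to have
snapshots you can compare later.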
And since you asked somewhere in the thread about *why* squid is so
unique when it comes to CPU load average, I think you deserve a detailed response.
I am not *the* expert, but you probably already know that there are a couple of
forms in which network service software gets built.
I do not have the example I wanted at hand right now, but I once wrote example Ruby
code for an endless queue "loop" program which consumes a full CPU in an
instant. Queue-based event handling software is usually not as well
understood as a simple "select"-based loop.
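From memory it looked roughly like this (a sketch, not the original code);
note how the worker never blocks, so it pegs a core even with an empty queue:

  # Busy-polling queue worker: burns ~100% of a core even when idle,
  # because it never blocks between checks of the queue.
  queue = Queue.new

  worker = Thread.new do
    loop do
      next if queue.empty?   # nothing to do? check again immediately
      job = queue.pop
      puts "handled #{job}"
    end
  end

  sleep 10                   # watch it in top for a few seconds
  worker.kill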
The design should be such that the software never consumes CPU cycles when they
are not required, but in most cases you will not see it in a *wait* state.
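The "proper" version of the same toy would block until work actually arrives,
which is exactly the state where top shows the process as idle/waiting:

  # Same worker, but Queue#pop blocks the thread until a job arrives,
  # so the process consumes almost no CPU while there is nothing to do.
  queue = Queue.new

  worker = Thread.new do
    loop do
      job = queue.pop        # sleeps here, no busy spinning
      puts "handled #{job}"
    end
  end

  queue << "some job"
  sleep 1                    # let the worker handle it, then exit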
A couple of times I tried to understand how squid works, and only after
writing a couple of models in a couple of languages did I kind of understand the basic
concept. Eventually I got a really good description from Amos which
confirmed my assumptions.
Most network services these days are based on some event-driven
engine/code with threading in it. It has been the most used approach for
years (I don't know since when). Most of these event-driven approaches
are efficient but lack a couple of key points, and in most cases, since the
developers are not novices, they build this software well and cover
the special cases.
Squid, however, is an old piece of gold which uses a queue and not only
events. Since most event-driven services use some kind of
"select", which puts the software into a *wait* state, you will
probably catch these services in *wait* mode in top from time to time.
If these services are constantly working/running, you probably won't catch them
in *wait* mode in top.
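To make "some kind of select" concrete, here is a bare-bones select-based echo
loop (illustration only; the port is arbitrary and this is not how squid itself
is structured):

  # Minimal select-based loop: IO.select blocks in the kernel until one of
  # the sockets is readable, so the process sits in a wait state while idle.
  require "socket"

  server  = TCPServer.new(12345)
  clients = []

  loop do
    readable, = IO.select([server] + clients)  # blocks here, ~0% CPU
    readable.each do |io|
      if io == server
        clients << server.accept
      else
        data = io.read_nonblock(4096, exception: false)
        if data.nil? || data == :wait_readable
          clients.delete(io)
          io.close
        else
          io.write(data)                       # echo it back
        end
      end
    end
  end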
Specifically about the relationship between high CPU and disk IO: I can
assume that if a service relies on a queue rather than event-based IO, it
can sometimes be confusing to understand why exactly the CPU is being
used. This is because in most event-based disk/IO programs there might be
some use of file/IO "splice" for reads or writes, which pushes
most of the IO work into kernel land rather than user land.
The kernel is probably the best at handling some IO operations
efficiently (CPU-wise).
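The closest thing I can show in Ruby is IO.copy_stream, which lets the kernel
move the bytes where the platform supports it (for example via sendfile on
Linux) instead of looping over read/write in user land; host, port and file
path here are just placeholders:

  # Send a file over a socket, letting the kernel do the copying where
  # possible (e.g. sendfile) instead of user-land read/write loops.
  require "socket"

  sock = TCPSocket.new("127.0.0.1", 8080)        # placeholder destination
  File.open("/var/log/squid/access.log") do |f|  # any large file will do
    bytes = IO.copy_stream(f, sock)
    puts "sent #{bytes} bytes"
  end
  sock.close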
The above is far from complete, but I think it's enough to understand
that sometimes you might expect one thing from top and it will not
reflect what you assume; then you need some insight into things.
Maybe I can describe event-driven code compared to a queue by comparing
an ambulance or emergency service to a supermarket or restaurant queue.
Unless there is a special event, the driver and the medic of the
ambulance will be idle, while in a restaurant you can see that as
the restaurant fills up, things start to get busy.
If you will "top" them both you will encounter a mostly idle(wait)
process and in the other hand a continuously growing load process.
If indeed the restaurant was designed to be event driven based it would
look somehow like the emergency service. Mostly idle but when triggered
then getting very busy.
Again, it's not 100% accurate, so don't hold me to it; maybe
others here can give a couple of better examples or descriptions than I do.
If you have specific questions about anything related to squid, just ask.
Eliezer
* It is possible that some look-ups will cause the issues you described.
The first thing to do is to limit the cache_dir sizes and to try to
calculate, based on a couple of weeks of analysis, a reasonable amount of
cache for this machine (not related to the storage media).
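As a starting point it could look something like the lines below in squid.conf;
the numbers are only illustrative and should come from your own analysis:

  # Illustrative only: bound the disk cache instead of letting it grow,
  # and size cache_mem to what the box can actually spare.
  cache_mem 8 GB
  maximum_object_size_in_memory 1 MB
  cache_dir aufs /var/spool/squid 40000 16 256   # 40000 MB, L1=16, L2=256 dirs
  maximum_object_size 512 MB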
On 24/02/2016 21:44, Heiler Bemerguy wrote:
>
> Hi Eliezer, thanks for your reply.
>
> As you've suggested, I removed all cache_dirs to verify if the rest was
> stable/fast and raised cache_mem to 10GB. I didn't disable access logs
> because we really need them.
>
> And it is super fast, I can't even notice it using only ONE core.. (and
> it isn't running as smp)
>
> %Cpu0 : 0,7 us, 1,0 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu1 : 8,8 us, 5,6 sy, 0,0 ni, 76,1 id, 0,0 wa, 0,0 hi, 9,5 si, 0,0 st
> %Cpu2 : 8,7 us, 4,0 sy, 0,0 ni, 83,3 id, 0,0 wa, 0,0 hi, 4,0 si, 0,0 st
> %Cpu3 : 5,4 us, 3,4 sy, 0,0 ni, 86,2 id, 0,0 wa, 0,0 hi, 5,0 si, 0,0 st
> %Cpu4 : 7,8 us, 5,1 sy, 0,0 ni, 73,5 id, 6,8 wa, 0,0 hi, 6,8 si, 0,0 st
> %Cpu5 : 1,0 us, 1,0 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 11604 proxy 20 0 11,6g 11g 5232 S 48,4 72,2 72:31.24 squid
>
> Start Time: Wed, 24 Feb 2016 15:38:59 GMT
> Current Time: Wed, 24 Feb 2016 19:18:30 GMT
> Connection information for squid:
> Number of clients accessing cache: 1433
> Number of HTTP requests received: 2532800
> Average HTTP requests per minute since start: 11538.5
> Select loop called: 68763019 times, 0.192 ms avg
> Storage Mem size: 9874500 KB
> Storage Mem capacity: 94.2% used, 5.8% free
>
> I don't think I had a bottleneck on I/O itself, maybe the hash/search of
> cache indexes was too much for a single thread?
>
> Best Regards,
>