[squid-users] Optimizing squid
Eliezer Croitoru
eliezer at ngtech.co.il
Fri Feb 26 12:18:49 UTC 2016
Hey again,
It took me some time...
The number of clients is sometimes irrelevant compared to other factors.
Take, for example, a network with 30k clients/users which only access a basic
email service. So it is quite possible that at some periods of time your
service will show this kind of load average.
You should continue to monitor the service with a couple of tools to
identify what the load looks like around the clock. Also try to dump the basic
info page you attached data from; every 5 minutes would be OK as a starter.
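If it helps, here is a rough Ruby sketch of such a periodic dump (assuming
squidclient is installed and squid answers on localhost port 3128; the output
directory and interval are just placeholders, adjust to your setup):

  #!/usr/bin/env ruby
  # Dump the cache manager "info" page every 5 minutes into timestamped files.
  # Assumes squidclient is available and squid listens on localhost:3128.
  require "fileutils"

  INTERVAL = 300                        # seconds between dumps
  OUTDIR   = "/var/log/squid/mgr-info"  # pick any directory you like
  FileUtils.mkdir_p(OUTDIR)

  loop do
    stamp = Time.now.strftime("%Y%m%d-%H%M%S")
    info  = `squidclient -h localhost -p 3128 mgr:info 2>&1`
    File.write(File.join(OUTDIR, "info-#{stamp}.txt"), info)
    sleep INTERVAL
  end

You could of course do the same thing from cron; the point is only to have
snapshots you can compare later.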
And since you asked somewhere in the thread about *why* squid is so
unique when it comes to CPU load average, I think you deserve a detailed response.
I am not *the* expert, but you probably already know that there are a couple of
forms in which network service software gets built.
I do not have the example I wanted at hand right now, but I once wrote example Ruby
code for an endless queue "loop" program which consumes a full CPU in an
instant. Queue-based event handling software is usually not as well
understood as a simple "select"-based loop.
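From memory it looked roughly like this (a sketch, not the original code);
note how the worker never blocks, so it pegs a core even with an empty queue:

  # Busy-polling queue worker: burns ~100% of a core even when idle,
  # because it never blocks between checks of the queue.
  queue = Queue.new

  worker = Thread.new do
    loop do
      next if queue.empty?   # nothing to do? check again immediately
      job = queue.pop
      puts "handled #{job}"
    end
  end

  sleep 10                   # watch it in top for a few seconds
  worker.kill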
The design should be such that the software never consumes CPU cycles when they
are not required, but in most cases you will not see it in a *wait* state.
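The "proper" version of the same toy would block until work actually arrives,
which is exactly the state where top shows the process as idle/waiting:

  # Same worker, but Queue#pop blocks the thread until a job arrives,
  # so the process consumes almost no CPU while there is nothing to do.
  queue = Queue.new

  worker = Thread.new do
    loop do
      job = queue.pop        # sleeps here, no busy spinning
      puts "handled #{job}"
    end
  end

  queue << "some job"
  sleep 1                    # let the worker handle it, then exit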
A couple of times I tried to understand how squid works, and only after
writing a couple of models in a couple of languages did I kind of understand the basic
concept. Eventually I got a really good description from Amos which
confirmed my assumptions.
Most network services these days are based on some event-driven
engine/code with threading in it. It has been the most used approach for
years (I don't know since when). Most of these event-driven approaches
are efficient but lack a couple of key points, and in most cases, since the
developers are not novices, they build this software well and cover
the special cases.
Squid, however, is an old piece of gold which uses a queue and not only
events. Since most event-driven services use some kind of
"select", which puts the software into a *wait* state, you will
probably catch these services in *wait* mode in top from time to time.
If these services are constantly working/running, you probably won't catch them
in *wait* mode in top.
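To make "some kind of select" concrete, here is a bare-bones select-based echo
loop (illustration only; the port is arbitrary and this is not how squid itself
is structured):

  # Minimal select-based loop: IO.select blocks in the kernel until one of
  # the sockets is readable, so the process sits in a wait state while idle.
  require "socket"

  server  = TCPServer.new(12345)
  clients = []

  loop do
    readable, = IO.select([server] + clients)  # blocks here, ~0% CPU
    readable.each do |io|
      if io == server
        clients << server.accept
      else
        data = io.read_nonblock(4096, exception: false)
        if data.nil? || data == :wait_readable
          clients.delete(io)
          io.close
        else
          io.write(data)                       # echo it back
        end
      end
    end
  end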
Specifically about the relationship between high CPU and disk IO: I can
assume that if a service relies on a queue rather than event-based IO, it
can sometimes be confusing to understand why exactly the CPU is being
used. This is because in most event-based disk/IO programs there might be
some use of file/IO "splice" for reads or writes, which pushes
most of the IO work into kernel land rather than user land.
The kernel is probably the best at handling some IO operations
efficiently (CPU-wise).
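The closest thing I can show in Ruby is IO.copy_stream, which lets the kernel
move the bytes where the platform supports it (for example via sendfile on
Linux) instead of looping over read/write in user land; host, port and file
path here are just placeholders:

  # Send a file over a socket, letting the kernel do the copying where
  # possible (e.g. sendfile) instead of user-land read/write loops.
  require "socket"

  sock = TCPSocket.new("127.0.0.1", 8080)        # placeholder destination
  File.open("/var/log/squid/access.log") do |f|  # any large file will do
    bytes = IO.copy_stream(f, sock)
    puts "sent #{bytes} bytes"
  end
  sock.close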
The above is far from complete, but I think it's enough to understand
that sometimes you might expect one thing from top and it will not
reflect what you assume; then you need some insight into things.
Maybe I can describe event-driven code compared to a queue by comparing
an ambulance or emergency service to a supermarket or restaurant queue.
Unless there is a special event, the driver and the medic of the
ambulance will be idle, while in a restaurant you can see that as
the restaurant fills up, things start to get busy.
If you will "top" them both you will encounter a mostly idle(wait)
process and in the other hand a continuously growing load process.
If indeed the restaurant was designed to be event driven based it would
look somehow like the emergency service. Mostly idle but when triggered
then getting very busy.
Again, it's not 100% accurate, so don't hold me to it; maybe
others here can give a couple of better examples or descriptions than I do.
If you have specific questions about anything related to squid, just ask.
Eliezer
* It is possible that some look-ups will cause the issues you described.
The first thing to do is to limit the cache_dir sizes and to try to
calculate, based on a couple of weeks of analysis, a reasonable amount of
cache for this machine (not related to the storage media).
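As a starting point it could look something like the lines below in squid.conf;
the numbers are only illustrative and should come from your own analysis:

  # Illustrative only: bound the disk cache instead of letting it grow,
  # and size cache_mem to what the box can actually spare.
  cache_mem 8 GB
  maximum_object_size_in_memory 1 MB
  cache_dir aufs /var/spool/squid 40000 16 256   # 40000 MB, L1=16, L2=256 dirs
  maximum_object_size 512 MB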
On 24/02/2016 21:44, Heiler Bemerguy wrote:
>
> Hi Eliezer, thanks for your reply.
>
> As you've suggested, I removed all cache_dirs to verify if the rest was
> stable/fast and raised cache_mem to 10GB. I didn't disable access logs
> because we really need them.
>
> And it is super fast, I can't even notice it using only ONE core.. (and
> it isn't running as smp)
>
> %Cpu0 : 0,7 us, 1,0 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu1 : 8,8 us, 5,6 sy, 0,0 ni, 76,1 id, 0,0 wa, 0,0 hi, 9,5 si, 0,0 st
> %Cpu2 : 8,7 us, 4,0 sy, 0,0 ni, 83,3 id, 0,0 wa, 0,0 hi, 4,0 si, 0,0 st
> %Cpu3 : 5,4 us, 3,4 sy, 0,0 ni, 86,2 id, 0,0 wa, 0,0 hi, 5,0 si, 0,0 st
> %Cpu4 : 7,8 us, 5,1 sy, 0,0 ni, 73,5 id, 6,8 wa, 0,0 hi, 6,8 si, 0,0 st
> %Cpu5 : 1,0 us, 1,0 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 11604 proxy 20 0 11,6g 11g 5232 S 48,4 72,2 72:31.24 squid
>
> Start Time: Wed, 24 Feb 2016 15:38:59 GMT
> Current Time: Wed, 24 Feb 2016 19:18:30 GMT
> Connection information for squid:
> Number of clients accessing cache: 1433
> Number of HTTP requests received: 2532800
> Average HTTP requests per minute since start: 11538.5
> Select loop called: 68763019 times, 0.192 ms avg
> Storage Mem size: 9874500 KB
> Storage Mem capacity: 94.2% used, 5.8% free
>
> I don't think I had a bottleneck on I/O itself, maybe the hash/search of
> cache indexes was too much for a single thread?
>
> Best Regards,
>