[squid-users] squid centos and osq_lock

Amos Jeffries squid3 at treenet.co.nz
Fri Jul 31 18:56:03 UTC 2015


On 1/08/2015 4:06 a.m., Josip Makarevic wrote:
> Marcus, tnx for your info.
> OS is centos 6 w kernel  2.6.32-504.30.3.el6.x86_64
> Yes, cpu_affinity_map is good and with 6 instances there is load only on
> first 6 cores and the server is 12 core, 24 HT

Then I suspect the mutex and locking activity is the kernel scheduling
work onto the HT cores.
 Under high load Squid will max out a physical core's worth of cycles.
HT essentially tries to squeeze a second thread's worth of work out of
each physical core. But trying to push 200% capacity into a physical
core with Squid workloads only leads to trouble.
 It is far better to tie Squid with affinity to one instance per
physical core and let the extra HT capacity be available to the OS and
other supporting things the Squid instance needs to have happen externally.


> each instance is bound to 1 core. Instance 1 = core1, instance 2 = core 2
> and so on so that should not be the problem.
> I've tried with 12 workers but that's even worse.

You do need to be very careful about which core numbers are the HT core
vs the physical core ID. Last time I saw anyone doing it, every second
number was a real physical core ID. YMMV.
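
As a rough sketch only, assuming the odd core numbers turn out to be one
per physical core on your box, the per-instance affinity lines would look
something like the following. Check
/sys/devices/system/cpu/cpu*/topology/thread_siblings_list first, and
keep in mind that Squid counts cores from 1 while Linux counts CPUs from 0.

```conf
# squid1.conf (assumed mapping, verify against your own topology)
cpu_affinity_map process_numbers=1 cores=1

# squid2.conf
cpu_affinity_map process_numbers=1 cores=3

# squid3.conf
cpu_affinity_map process_numbers=1 cores=5
```

And so on for the remaining instances, skipping whichever IDs are the HT
siblings.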

> 
> Let me try to explain:
> on non-smp with traffic at ~300mbits we have load of ~4 (on 6 workers).
> in that case, actual user time is about 10-20% and 70-80% is sys time
> (osq_lock) and there are no connection timeouts.
> 
> If I switch to SMP 6 workers user time goes up but sys time goes up too and
> there are connection timeouts and the load jumps to ~12.
> If I give it more workers only load jumps and more connections are being
> dropped to the point that load goes to 23/24 and the entire server is slow
> as hell.
> 
> So, best performance so far are with 6 non-smp workers.
> 
> For now I have 2 options:
> 1. Install older squid (3.1.10 centos repo) and try it then
> 2. build custom 64bit kernel with RCU and specific cpu family support (in
> progress).
> 
> The end idea is to be able to sustain 1gig of traffic on this server :)
> Any advice is welcome

I agree with Marcus then. Non-SMP is the way to go at present.
The main benefit of SMP support in current Squid is for caching
de-duplication (ie rock store).


Also some things to note:

* a good percentage of the speed of Squid is the 20-40% caching HIT rate
normal HTTP traffic has. Albeit memory-only caching on highest
performance boxen. Memory hits are 4-6 orders of magnitude faster than
network fetches. This has little to do with anything you can control
(normally). The (relatively) slow speed of origin servers creating the
content is the bottleneck. Even "static" content may be encoded to the
client's requested preference on each fetch, which takes time.


* Going by our lab tests and real-world results so far I rate Squid
per-worker at ~50Mbps on a 3.1GHz core, and ~70Mbps on 3.7GHz. Your 12
cores will only get you up around 800 Mbits IMHO (that's after tuning). I
would gladly be proven wrong though :-)


* Squid effectively *polls* all the listening ports every 10ms or once
every 10 I/O events (whichever is faster). So running with 1024
listening ports is a bit counter-productive, more time could be spent
checking those ports than doing work.
 That said, going from one to multiple listening ports does make a speed
improvement. Finding the sweet spot between those trends is something
else to tune for.
 <http://wiki.squid-cache.org/MultipleInstances#Tips>


> 2015-07-31 14:53 GMT+02:00 Marcus Kool:
> 
>> osq_lock is used in the kernel for the implementation of a mutex.
>> It is not clear which mutex so we can only guess.
>>
>> Which version of the kernel and distro do you use?
>>
>> Since mutexes are used by Squid SMP, I suggest to switch for now to Squid
>> non-SMP.
>>
>> What is the value of cpu_affinity_map in all config files?
>> You say they are static. But do you allocate each instance on a different
>> core?
>> Does 'top' show that all CPUs are used?
>>
>> Do you have 24 cores or 12 hyperthreaded cores?
>> In case you have 12 real cores, you might want to experiment with 12
>> instances of Squid and then try to upscale.
>>
>> Make maximum_object_size large, a max size of 16K will prohibit the
>> retrieval of objects larger than 16K.
>> I am not sure about 'maximum_object_size_in_memory 16 KB' but let it be
>> infinite and do not worry since
>> cache_mem is zero.
>>
>> Marcus
>>
>>
>>
>> On 07/31/2015 03:52 AM, Josip Makarevic wrote:
>>
>>> Hi Amos,
>>>
>>>   cache_mem 0
>>>   cache deny all
>>>
>>> already there.
>>> Regarding number of nic ports we have 4 10G eth cards 2 in each bonding
>>> interface.
>>>
>>> Well, entire config would be way too long but here is the static part:
>>> via off
>>> cpu_affinity_map process_numbers=1 cores=2
>>> forwarded_for delete
>>> visible_hostname squid1
>>> pid_filename /var/run/squid1.pid

Remove these...

>>> icp_port 0
>>> htcp_port 0
>>> icp_access deny all
>>> htcp_access deny all
>>> snmp_port 0
>>> snmp_access deny all

... to here. They do nothing but slow Squid-3 down.

>>> dns_nameservers x.x.x.x
>>> cache_mem 0
>>> cache deny all
>>> pipeline_prefetch on

In Squid-3.4 and later this is set to the length of pipeline you want to
accept.

NP: 'on' traditionally has meant a pipeline length of 1 (two parallel
requests). Longer lengths are not yet well tested but generally they seem
to work okay.
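
For example, a minimal sketch for Squid-3.4 or later, assuming you want
to keep the same behaviour the old 'on' gave:

```conf
# numeric pipeline length; 1 matches what 'on' traditionally meant
pipeline_prefetch 1
```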


>>> memory_pools off
>>> maximum_object_size 16 KB
>>> maximum_object_size_in_memory 16 KB

Like Marcus said: without even memory caching, these two have no useful
effect.

There is one related setting, "read_ahead_gap", which affects performance
by tuning the amount of undelivered object data Squid will buffer in
transient memory. A higher value means faster servers can finish
sending earlier, so the resources held for them are released for other
uses.
 Tuning this is a fine art since it modulates how much Squid's internal
buffers (and pipeline prefetching) read off the TCP buffers. And all of
those buffers have limits of their own and may contain multiple requests'
data.
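
As a starting point only (the 64 KB figure is an assumption picked for
illustration, not a tested recommendation; the default is 16 KB):

```conf
# assumed starting value; raise or lower it while watching memory use
read_ahead_gap 64 KB
```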


>>> ipcache_size 0

Remove this. Without the IP cache Squid will be forced to do about 4
remote DNS lookups for every single HTTP request - *minimum*. Maybe more
if you apply any access controls to the traffic.
 If anything, increase the ipcache size to store more results.
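
Something like this, where 4096 is just an illustrative guess (the
default is 1024 entries):

```conf
# assumed value; size it to the number of distinct hosts you see
ipcache_size 4096
```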


>>> cache_store_log none

Not needed in Squid-3. You can remove.

>>> half_closed_clients off
>>> include /etc/squid/rules
>>> access_log /var/log/squid/squid1-access.log

Logging I/O slows Squid down. I suggest making that a daemon, TCP or UDP
log output.
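
A sketch of what that could look like, reusing your existing log path.
The UDP collector address is hypothetical and only there to show the
syntax:

```conf
# hand log writing to a helper process instead of the worker itself
access_log daemon:/var/log/squid/squid1-access.log squid

# or send log lines over UDP to a collector (hypothetical address)
#access_log udp://192.0.2.10:5140 squid
```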


>>> cache_log /var/log/squid/squid1-cache.log
>>> coredump_dir /var/spool/squid/squid1
>>> refresh_pattern ^ftp:           1440    20%     10080
>>> refresh_pattern ^gopher:        1440    0%      1440
>>> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
>>> refresh_pattern .               0       20%     4320

Without caching you can remove these *entirely*.

>>>
>>> acl port0 myport 30000

Mumble. Less reliable than myportname, but it is infinitesimally faster
when it does work at all.
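
If you want to try the myportname form, a minimal sketch, assuming your
listening port line is a plain "http_port 30000" (the name "in30000" is
made up for illustration):

```conf
# "in30000" is a hypothetical port name, pick whatever suits you
http_port 30000 name=in30000
acl port0 myportname in30000
tcp_outgoing_address x.x.x.x port0
```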

>>> http_access allow testhost
>>> tcp_outgoing_address x.x.x.x port0
>>>
>>> include is there for basic ACL - safe ports and so on - to minimize
>>> config file footprint since it's static and same for every worker.
>>>
>>> and so on 44 more times in this config file

Only put "http_access allow testhost" once. Every time you test ACLs
Squid slows down.
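
A sketch of the layout, assuming the repeated blocks differ only in the
port ACL and the outgoing address (port1 and 30001 are made up here, and
where the allow line sits relative to the deny rules in your "rules" file
still matters):

```conf
# tested once, ahead of the per-port blocks
http_access allow testhost

acl port0 myport 30000
tcp_outgoing_address x.x.x.x port0

# port1 / 30001 are assumed names for the next block
acl port1 myport 30001
tcp_outgoing_address x.x.x.x port1
```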

Some ACLs are a worse drag than others. You can probably optimize even
the default recommended security settings you shuffled into the "rules"
file to operate better.


>>>
>>> Do you know of any good article how to tune kernel locking or have any
>>> idea why is it happening?
>>> I cannot find any good info on it and all I've found are bits and pieces
>>> of kernel source code.

Sorry no. All I found was the same.

Though I do know that one of the big differences between Linux 2.6 and
3.0 was the removal of the "Big Kernel Lock", a change that allowed Linux
to run on multi-core systems properly. It could be CentOS 6 itself biting
you with its ancient kernel version.


Amos

