[squid-users] very poor performance of rock cache ipc
Alex Rousskov
rousskov at measurement-factory.com
Sun Oct 15 03:42:39 UTC 2023
On 2023-10-14 12:04, Julian Taylor wrote:
> On 14.10.23 17:40, Alex Rousskov wrote:
>> On 2023-10-13 16:01, Julian Taylor wrote:
>>
>>> When using squid for caching using the rock cache_dir setting the
>>> performance is pretty poor with multiple workers.
>>> The reason for this is due to the very high number of systemcalls
>>> involved in the IPC between the disker and workers.
>>
>> Please allow me to rephrase your conclusion to better match (expected)
>> reality and avoid misunderstanding:
>>
>> By design, a mostly idle SMP Squid should use a lot more system calls
>> per disk cache hit than a busy SMP Squid would:
>>
>> * Mostly idle Squid: Every disk I/O may require a few IPC messages.
>> * Busy Squid: Bugs notwithstanding, disk I/Os require no IPC messages.
>>
>>
>> In your single-request test, you are observing the expected effects
>> described in the first bullet. That does not imply those effects are
>> "good" or "desirable" in your use case, of course. It only means that
>> SMP Squid was not optimized for that use case; SMP rock design was
>> explicitly targeting the opposite use case (i.e. a busy Squid).
>
> The reproducer uses a single request; the very same thing can be
> observed on a very busy squid
If a busy Squid sends lots of IPC messages between worker and disker,
then either there is a Squid bug we do not know about OR that disker is
just not as busy as one might expect it to be.
In Squid v6+, you can observe disker queues using the mgr:store_queues cache
manager report. In your environment, do those queues always have lots of
requests when Squid is busy? Feel free to share (a pointer to) a
representative sample of those reports from your busy Squid.
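For example, assuming squidclient is installed and the proxy is listening
on its default port, something like the following should fetch that report:

    squidclient mgr:store_queues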
N.B. Besides worker-disker IPC messages, there are also worker-worker
cache synchronization IPC messages. They also have the same "do not send
IPC messages if the queue has some pending items already" optimization.
> and the workaround improves both the single
> request case and the actual heavily loaded production squid in the same way.
FWIW, I do not think that observation contradicts anything I have said.
> The hardware involved has a 10G card, not SSDs but lots of RAM, so it has
> a very high page cache hit rate, and the squid is very busy, so much so
> that it is overloaded by system CPU usage in the default configuration
> with the rock cache. The network or disk bandwidth is barely ever
> utilized more than 10%, with all 8 CPUs busy on system load.
The above facts suggest that the disk is just not used much OR there is
a bug somewhere. Slower (for any reason, including CPU overload) IPC
messages should lead to longer queues and the disappearance of "your
queue is no longer empty!" IPC messages.
> The only way to get the squid to utilize the machine is to increase the
> IO size via the request buffer change or not use the rock cache. UFS
> cache works ok in comparison, but requires multiple independent squid
> instances as it does not support SMP.
>
> Increasing the IO size to 32KiB as I mentioned does allow the squid
> workers to utilize a good 60% of the hardware network and disk
> capabilities.
Please note that I am not disputing this observation. Unfortunately, it
does not help me guess where the actual/core problem or bottleneck is.
Hopefully, the mgr:store_queues cache manager report will shed some light.
>> Roughly speaking, here, "busy" means "there are always some messages
>> in the disk I/O queue [maintained by Squid in shared memory]".
>>
>> You may wonder how it is possible that an increase in I/O work results
>> in a decrease (and, hopefully, elimination) of related IPC messages.
>> Roughly speaking, a worker must send an IPC "you have a new I/O
>> request" message only when its worker->disker queue is empty. If the
>> queue is not empty, then there is no reason to send an IPC message to
>> wake up disker because disker will see the new message when dequeuing
>> the previous one. Same for the opposite direction: disker->worker...
> This is probably true if you have slow disks and are actually I/O bound,
> but with fast disks or a high page cache hit rate you essentially see this
> IPC ping pong and very little actual work being done.
AFAICT, "too slow" IPC messages should result in non-empty queues and,
hence, no IPC messages at all. For this logic to work, it does not
matter whether the system is I/O bound or not, whether disks are "slow"
or not.
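To illustrate the intended pattern, here is a simplified sketch (invented
names, not Squid's actual Ipc queue code, and it ignores the sleep/wakeup
race that a real implementation must close): the producer asks for an IPC
wakeup only when its push turns an empty queue into a non-empty one.

    // Simplified sketch of the "notify only on empty->non-empty" pattern.
    // Names are invented; capacity checks and the sleep/notify race are
    // deliberately ignored here.
    #include <atomic>
    #include <cstddef>

    struct IoRequest { /* offset, length, buffer handle, ... */ };

    class SharedQueue { // single producer, single consumer, in shared memory
    public:
        // Returns true iff the consumer may be idle and needs an IPC wakeup,
        // i.e. the queue was empty before this push.
        bool push(const IoRequest &req) {
            const std::size_t tail = tail_.load(std::memory_order_relaxed);
            const bool wasEmpty = (tail == head_.load(std::memory_order_acquire));
            slots_[tail % Capacity] = req;
            tail_.store(tail + 1, std::memory_order_release);
            return wasEmpty; // otherwise the consumer sees req while draining
        }

        bool pop(IoRequest &req) {
            const std::size_t head = head_.load(std::memory_order_relaxed);
            if (head == tail_.load(std::memory_order_acquire))
                return false; // drained; go back to waiting for a wakeup
            req = slots_[head % Capacity];
            head_.store(head + 1, std::memory_order_release);
            return true;
        }

    private:
        static constexpr std::size_t Capacity = 1024;
        IoRequest slots_[Capacity];
        std::atomic<std::size_t> head_{0};
        std::atomic<std::size_t> tail_{0};
    };

Under sustained load the queue rarely drains, push() keeps returning false,
and no IPC messages are sent at all; the per-request IPC cost shows up only
when the queue keeps emptying, i.e. when Squid is mostly idle.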
>> > Is it necessary to have these read chunks so small
>>
>> It is not. Disk I/O size should be at least the system I/O page size,
>> but it can be larger. The optimal I/O size is probably very dependent
>> on traffic patterns. IIRC, Squid I/O size is at most one Squid page
>> (SM_PAGE_SIZE or 4KB).
>>
>> FWIW, I suspect there are significant inefficiencies in disk I/O
>> related request alignment: The code does not attempt to read from and
>> write to disk page boundaries, probably resulting in multiple
>> low-level disk I/Os per one Squid 4KB I/O in some (many?) cases. With
>> modern non-rotational storage these effects are probably less
>> pronounced, but they probably still exist.
> The kernel drivers will mostly handle this for you if multiple requests
> are available, but this is also almost irrelevant with current hardware;
> typically it will be so fast that software overhead will make it hard to
> utilize modern large disk arrays properly
I doubt that doing twice as many low-level disk I/Os (due to wrong
alignment) is irrelevant, but we do not need to agree on that to make
progress: Clearly, excessive low-level disk I/O is not the bottleneck
in your current environment.
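As a side note, here is a tiny sketch (names and constants are mine, not
Squid's) of why alignment can double the low-level I/Os: a 4 KiB read that
starts on a device page boundary touches one page, while the same read at
an unaligned offset straddles two.

    // Sketch only: count the device pages a single read touches, assuming
    // 4 KiB device pages. Names and constants are illustrative.
    #include <cstdint>
    #include <cstdio>

    static constexpr std::uint64_t DiskPageSize = 4096;

    std::uint64_t pagesTouched(std::uint64_t offset, std::uint64_t length) {
        const std::uint64_t firstPage = offset / DiskPageSize;
        const std::uint64_t lastPage = (offset + length - 1) / DiskPageSize;
        return lastPage - firstPage + 1;
    }

    int main() {
        // A page-aligned 4 KiB read stays within a single device page...
        std::printf("%llu\n", (unsigned long long)pagesTouched(8192, 4096)); // 1
        // ...while the same read at an unaligned offset spans two pages.
        std::printf("%llu\n", (unsigned long long)pagesTouched(8272, 4096)); // 2
        return 0;
    }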
> you probably need to look at
> other approaches like io_uring to get rid of the classical read/write
> system call overhead dominating your performance.
Yes, but those things are complementary (i.e. not mutually exclusive).
Cheers,
Alex.