[squid-users] very poor performance of rock cache ipc

Alex Rousskov rousskov at measurement-factory.com
Tue Oct 17 03:26:50 UTC 2023


On 2023-10-16 16:24, Julian Taylor wrote:
> On 15.10.23 05:42, Alex Rousskov wrote:
>> On 2023-10-14 12:04, Julian Taylor wrote:
>>> On 14.10.23 17:40, Alex Rousskov wrote:
>>>> On 2023-10-13 16:01, Julian Taylor wrote:
>>>>
>>>
>>> The reproducer uses a single request; the very same thing can be 
>>> observed on a very busy squid.
>>
>> If a busy Squid sends lots of IPC messages between worker and disker, 
>> then either there is a Squid bug we do not know about OR that disker 
>> is just not as busy as one might expect it to be.
>>
>> In Squid v6+, you can observe disker queues using mgr:store_queues 
>> cache manager report. In your environment, do those queues always have 
>> lots of requests when Squid is busy? Feel free to share (a pointer to) 
>> a representative sample of those reports from your busy Squid.
>>
>> N.B. Besides worker-disker IPC messages, there are also worker-worker 
>> cache synchronization IPC messages. They also have the same "do not 
>> send IPC messages if the queue has some pending items already" 
>> optimization.

> I checked the queues running with the configuration from my initial 
> mail (with the number of workers increased), and the queues are 
> generally short: around 1-10 items in the queue when sending around 
> 100 parallel requests reading about 100mb data files. Here is a 
> sample: https://dpaste.com/8SLNRW5F8
> Also, even at this higher request rate than the single curl, 
> throughput was more than doubled by increasing the blocksize.
> 
> What are the queues supposed to look like on a busy squid that is not 
> spending a large portion of its time doing notify IPC?

The queues are supposed to look "not empty" -- pushing to a non-empty 
queue does not result in an IPC notification. Needless to say, the 
further away from "empty" the queues are, the smaller the chance they 
will become empty while a cache manager report is _not_ "looking" at them.
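
For illustration, here is a minimal sketch of that notify-on-empty-
transition rule (invented names, with a plain std::queue standing in 
for the shared-memory ring; the real logic lives around src/ipc/Queue.h 
and src/DiskIO/IpcIo/IpcIoFile.cc):

    #include <queue>

    struct DiskReadRequest { /* offset, length, shared page handle, ... */ };

    class IpcQueue {
        std::queue<DiskReadRequest> items; // stands in for the shared-memory ring
    public:
        bool empty() const { return items.empty(); }
        void notifyPeer() { /* would send the UDS wakeup message */ }
        void pushWithoutNotify(const DiskReadRequest &req) { items.push(req); }
        void push(const DiskReadRequest &req) {
            const bool wasEmpty = items.empty();
            items.push(req);
            if (wasEmpty)
                notifyPeer(); // the peer may be idle; wake it up
            // else: the peer is already draining the queue and will see
            // this item without another IPC message
        }
    };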


> Increasing the number of parallel requests does decrease the relative 
> overhead, but it is still pretty large: I measured about 10%-30% CPU 
> overhead in the worker and disker with 100 parallel requests served 
> from cache. Here is a snippet of a profile:
> --22.34%--JobDialer<AsyncJob>::dial(AsyncCall&)
>     |
>     |--21.19%--Ipc::UdsSender::start()
>     |       |
>     |        --21.13%--Ipc::UdsSender::write()
>     |           |
>     |           |--16.12%--Ipc::UdsOp::conn()
>     |           |          |
>     |           |           --15.84%--comm_open_uds(int, int, sockaddr_un*, int)
>     |           |                |--1.70%--commSetCloseOnExec(int)
>     |           |                 --1.56%--commSetNonBlocking(int)
>    ...
> --12.98%--comm_close_complete(int)
> 
> Clearing and constructing the large Ipc::TypedMsgHdr is also very 
> noticeable.
> 
> That the overhead is so high and the maximum throughput so low for 
> not-so-busy squids (say 1-10 requests per second, but with requests 
> averaging > 1MiB) is imo also a reason for concern and could be improved.

I agree.
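
For a sense of scale: a 100 MiB hit read in 4 KiB slices is roughly 
25,600 read requests, and when the queue keeps draining back to empty, 
nearly every one of them costs a fresh UDS connection, a send, and a 
close in each direction, plus a TypedMsgHdr construction per message.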


> If I understand the way it works correctly: the worker, when it gets 
> a request, splits it into 4k blocks and enqueues read requests into 
> the ipc queue, and if the queue is empty it emits a notify ipc so that 
> the disker starts popping from the queue.

Yes, at some level of abstraction, the above summary is not wrong. 
However, please keep in mind that, for a single HTTP transaction, most 
of the disk read requests are queued by the worker, read by the disker, 
and received back by the worker one read request at a time. There is no 
disk read "prefetching" (yet?).


> On large requests that are answered immediately from the disker the 
> problem seems to be that the queue is mostly empty and it sends an ipc 
> ping pong for each 4k block.

Due to lack of prefetching, the total size of the HTTP response does not 
really affect the queue length. Only the transaction concurrency level 
does; on average, that is determined by mean response time multiplied by 
the I/O request rate from a particular worker to a particular disker.
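
For example (with made-up but plausible numbers): a worker issuing 200 
disk I/Os per second to a disker with a mean response time of 2 ms 
keeps an average of only 200 * 0.002 = 0.4 requests in that queue, so 
most pushes find it empty and must notify -- no matter how large each 
HTTP response is.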


> So my thought was: when the request is larger than 4k, enqueue 
> multiple pending reads in the worker and only notify after a certain 
> amount has been added to the queue, and vice versa in the disker.

> So I messed around a bit trying to reduce the notifications by 
> delaying the Notify call in src/DiskIO/IpcIo/IpcIoFile.cc for larger 
> requests, but it ended up blocking after the first queue push with no 
> notify. If I understand the queue correctly, this is because the 
> reader requires a notify to initially start, and simply pushing 
> multiple read requests onto the queue without notifying will not work.

You are correct.
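
To illustrate the constraint (reusing the invented IpcQueue sketch from 
above; this is not a patch): a batching scheme must still send the 
wakeup whenever its pushes find the queue empty. Only the per-item 
notifications for an already non-empty queue can be elided, and those 
are already elided today.

    #include <vector>

    void enqueueBatch(IpcQueue &q, const std::vector<DiskReadRequest> &batch) {
        const bool wasEmpty = q.empty();
        for (const auto &req : batch)
            q.pushWithoutNotify(req);
        if (wasEmpty)
            q.notifyPeer(); // one wakeup for the whole batch; skipping this
                            // is what makes the reader block forever
    }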


> Is this approach feasible or am I misunderstanding how it works?

Prefetching is feasible in principle, but is not easy to implement well 
and will probably require configuration options (because it will slow 
down busy Squids that do not have the time to prefetch but may not know 
that).

I would consider increasing I/O size (and shared memory page size) 
instead, at least as the first step. Doing so well is not trivial 
either, but may be easier and beneficial to more use cases.
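
For a rough sense of the potential win: raising the I/O size from 4 KiB 
to 64 KiB cuts the number of read requests (and hence potential 
notifications) per response sixteen-fold -- a 100 MiB hit drops from 
~25,600 reads to ~1,600 -- at the cost of more shared memory held per 
in-flight I/O. The numbers are illustrative.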


> I also tried to add reuse of the IPC connection between calls, so that 
> the major source of overhead, tearing down and reestablishing the 
> connection, is removed. But that also turned out to be difficult, 
> because the connections are closed in various places and because of 
> the general complexity of the code.

Yes, that would be nice. Reusing sockets is especially difficult to get 
right with startup/bootstrapping, reconfiguration, and kid 
death/restart problems in mind. On the other hand, it is probably much 
easier to optimize this than to implement disk hit "prefetching".

There may be some other, more efficient IPC notification mechanisms 
available on your OS that Squid can be enhanced to support. I have not 
surveyed what is available these days.
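
For example, on Linux, an eventfd(2)-based wakeup would avoid creating 
and tearing down a UDS socket per notification. A minimal sketch of 
that mechanism only (an assumption about one possible direction, not a 
Squid patch; the fd would have to be shared with the peer process at 
startup, e.g., by inheritance across fork()):

    #include <sys/eventfd.h>
    #include <unistd.h>
    #include <cstdint>

    int makeWakeupFd() {
        return eventfd(0, EFD_NONBLOCK); // counter fd, pollable by the peer
    }

    void notifyPeer(const int efd) {
        const uint64_t one = 1;
        (void)write(efd, &one, sizeof(one)); // bump counter, wake the poller
    }

    void onWakeup(const int efd) {
        uint64_t count = 0;
        (void)read(efd, &count, sizeof(count)); // reset the counter...
        // ...then drain the shared-memory queue as usual
    }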


HTH,

Alex.


