[squid-users] Proxy server to support a large number of simultaneous requests

ngtech1ltd at gmail.com ngtech1ltd at gmail.com
Mon Jun 12 08:08:04 UTC 2023


Hey Ankor,

Thanks for sharing the scenario.
At first I was thinking to myself: why Squid? Is it the best choice for this scenario?
After walking through my list of caching proxies, including a couple I wrote myself, I came to the conclusion:
well, Squid-Cache is simple to use and just works.
Compared to other caching mechanisms, Squid is so simple to configure that it leaves many of them in the dust.

Thanks,
Eliezer


From: squid-users <squid-users-bounces at lists.squid-cache.org> On Behalf Of Andrey K
Sent: Tuesday, June 6, 2023 16:08
To: Alex Rousskov <rousskov at measurement-factory.com>
Cc: squid-users at lists.squid-cache.org
Subject: Re: [squid-users] Proxy server to support a large number of simultaneous requests

Hello, Alex,

I have trimmed the quoted correspondence because the full message exceeded the size limit for the mailing list.

Thank you so much for your time, analysis, and recommendations.

I disabled the cache_dir and now squid works as expected - there is only one request to the origin content server:
- on the small file:
      1 NONE_NONE/503/- HIER_NONE/-
      4 TCP_CF_HIT/200/- HIER_NONE/-
    128 TCP_HIT/200/- HIER_NONE/-
    366 TCP_MEM_HIT/200/- HIER_NONE/-
      1 TCP_MISS/200/200 FIRSTUP_PARENT/parent_proxy

- on the large file:
     17 TCP_CF_HIT/200/- HIER_NONE/-
    482 TCP_HIT/200/- HIER_NONE/-
      1 TCP_MISS/200/200 FIRSTUP_PARENT/parent_proxy

I think this configuration is perfect for caching online video broadcasts. Chunks of video are requested by clients simultaneously and only for a short period of time, so there is no need to save them to disk.
As my VM has 32 GB of RAM, I can configure a sufficiently large cache_mem, say 20000 MB, to cache the video broadcasts.
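
A minimal squid.conf sketch of that memory-only setup might look like this (the cache_mem value and the per-object limit are illustrative guesses, not tested recommendations):

  # No cache_dir directive at all: objects are cached in memory only
  cache_mem 20000 MB
  # Raise the per-object memory-cache limit (default 512 KB) so that
  # video chunks fit; 16 MB is an assumed ceiling for chunk size
  maximum_object_size_in_memory 16 MB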


Kind regards,
     Ankor.



Mon, 5 Jun 2023 at 17:31, Alex Rousskov <rousskov at measurement-factory.com>:
On 6/2/23 03:29, Andrey K wrote:

>  > Can you repeat this test and share a pointer to the corresponding
>  > compressed cache.log, containing those 500 (or fewer, as long as the
>  > problem is reproduced!) concurrent transactions. One or many of those
>  > concurrent transactions resulted in the unwanted entry deletion. The log
>  > may show what happened in that case.

> I cleared the rock cache, set the debug level, restarted squid, cleared 
> the cache.log, ran 500-threads test, waited for it to finish and 
> launched curl to make sure it returned TCP_MISS.
> Then stopped squid to limit the cache.log file.
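
(For reference, a full-detail cache.log like that is typically captured with a squid.conf line along these lines; ALL,9 is maximally verbose, so keep the capture window short:

  # Maximum verbosity for all debug sections; huge logs, short runs only
  debug_options ALL,9
)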


Thank you for sharing that log! I only had time to study a few misses. 
They all stemmed from the same sequence of events:

1. A collapsed request finds the corresponding entry in the cache.
2. Squid decides that this request should open the disk file.
3. The rock disk entry is still being written (i.e. "swapped out"),
    so the attempt to swap it in fails (TCP_SWAPFAIL_MISS).
4. The request goes to the origin server.
5. The fresh response deletes the existing cached entry.
6. When a subsequent request finds the cached entry marked for
    deletion, it declares a cache miss (TCP_MISS) and goes to step 4.

Disclaimer: The above sequence of events causes misses, but it may not 
be the only or even the primary cause. I do not have enough free time to 
rule out or confirm other causes (and order them by severity).
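
(For context: the "collapsed request" in step 1 assumes collapsed forwarding is enabled, which in squid.conf is the directive below; I am inferring from the observed behavior that your configuration has it on:

  # Let concurrent requests for the same URL share one pending
  # response instead of each contacting the origin separately
  collapsed_forwarding on
)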


Squid can (and should) handle concurrent swapout/swapins better, and we 
may be able to estimate that improvement potential for your workload 
without significant development, but, for the next step, I suggest 
disabling cache_dir and testing whether you get substantially better 
results with the memory cache alone. The shared memory cache also has 
periods during which an entry being written cannot be read, but, compared 
to the disk cache, those periods are much shorter IIRC. I would like to 
confirm that
this simplified mode of operation works well for your workload before I 
suggest code changes that would rely, in part, on this mode.
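
(If you later test with SMP workers, note that the memory cache is shared among workers only when a configuration roughly like the sketch below is in effect; the worker count here is purely illustrative:

  # Hypothetical SMP setup: several kid processes, one shared memory cache
  workers 4
  memory_cache_shared on
)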


Thank you,

Alex.



