[squid-users] External helper consumes too many DB connections

roee klinger roeeklinger60 at gmail.com
Tue Feb 8 16:08:33 UTC 2022


> You have at least two basic options:


> A. Enhance Squid to let SMP workers share helpers. I assume that you
> have C SMP workers and N helpers per worker, with C and N significantly
> greater than 1. Instead of having N helpers per worker and C*N helpers
> total, you will have just one concurrent helper per worker and C helpers
> total. This will be a significant, generally useful improvement that
> should be officially accepted if implemented well. This enhancement
> requires serious Squid code modifications in a neglected error-prone
> area, but it is certainly doable -- Squid already shares rock diskers
> across workers, for example.


> B. Convert your helper from a database client program to an Aggregator
> client program (and write the Aggregator). Depending on your needs and
> skill, you can use TCP or Unix Domain Sockets (UDS) for
> helper-Aggregator communication. The Aggregator may look very similar to
> the current helper, except it will not use stdin/stdout for
> receiving/sending helper queries/responses. This option also requires
> development, but it is much simpler than option A.
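A minimal sketch of what option B's Aggregator and a converted helper's query path might look like in Python. The socket path, the line protocol ("user password" in, "OK"/"ERR" out), and the in-memory credential store standing in for the single MariaDB connection are all illustrative assumptions, not Squid or MariaDB specifics:

```python
# Sketch of a connection Aggregator: many helpers connect over a Unix
# Domain Socket; the Aggregator answers all of them from ONE shared
# credential store (here a dict standing in for the single database
# connection). Socket path and wire protocol are made-up assumptions.
import os
import socket
import socketserver
import threading

SOCKET_PATH = "/tmp/auth-aggregator.sock"  # hypothetical path

# Stand-in for data fetched over a single DB connection.
CREDENTIALS = {"alice": "secret", "bob": "hunter2"}
cred_lock = threading.Lock()

class HelperHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One helper per connection; queries are "user password\n" lines.
        for line in self.rfile:
            try:
                user, password = line.decode().split()
            except ValueError:
                self.wfile.write(b"ERR\n")
                continue
            with cred_lock:
                ok = CREDENTIALS.get(user) == password
            self.wfile.write(b"OK\n" if ok else b"ERR\n")

def start_aggregator():
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    server = socketserver.ThreadingUnixStreamServer(SOCKET_PATH, HelperHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def query(user, password):
    # What a converted helper would do instead of opening its own DB link.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCKET_PATH)
        s.sendall(f"{user} {password}\n".encode())
        return s.makefile().readline().strip()

if __name__ == "__main__":
    server = start_aggregator()
    print(query("alice", "secret"))   # OK
    print(query("alice", "wrong"))    # ERR
    server.shutdown()
```

With this shape, only the Aggregator ever talks to MariaDB, so the database sees one client no matter how many Squid workers and helpers are running.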


Thank you, Alex, I will keep these in mind.

I thought about the following approach:

1. Run only one Python helper; it fetches the data from the main DB every
minute.
2. This helper runs with concurrency enabled.
3. The helper then spawns worker threads, each of which answers requests
on stdin/stdout and reads the data from the main process that spawned
them.

What do you think about taking this route?

It would require no extra DBs and no tweaks to Squid, but maybe I am
missing something.
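Sketched in Python, assuming Squid's basic-auth concurrency protocol ("<channel-ID> <username> <password>" in, "<channel-ID> OK|ERR" out) and with the per-minute MariaDB fetch mocked out, the idea above might look like:

```python
# Sketch of the proposed single-helper design: a background thread
# refreshes a shared in-memory copy of the credentials once a minute,
# and requests on stdin are answered from that copy. The DB fetch is a
# placeholder; real table/column names would be your own.
import sys
import threading
import time

credentials = {}            # shared in-memory copy of the auth data
cred_lock = threading.Lock()

def fetch_from_db():
    # Placeholder for the single MariaDB query issued once per minute.
    return {"alice": "secret"}

def refresher(interval=60):
    global credentials
    while True:
        fresh = fetch_from_db()
        with cred_lock:
            credentials = fresh
        time.sleep(interval)

def handle(line, out=sys.stdout):
    # One request: "<channel-ID> <username> <password>".
    parts = line.split()
    if len(parts) != 3:
        return
    channel, user, password = parts
    with cred_lock:
        ok = credentials.get(user) == password
    out.write(f"{channel} {'OK' if ok else 'ERR'}\n")
    out.flush()

def main():
    threading.Thread(target=refresher, daemon=True).start()
    for line in sys.stdin:
        # Each request could also be dispatched to a worker thread;
        # stdout writes would then need their own lock.
        handle(line)

if __name__ == "__main__":
    main()
```

Note that with channel-based concurrency a single threaded process may be enough; separate child processes would not share the in-memory cache without extra IPC.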

Best regards,
Roee

On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov <
rousskov at measurement-factory.com> wrote:

> On 2/8/22 09:50, roee klinger wrote:
>
> > Alex: If there are a lot more requests than your users/TTLs should
> >       generate, then you may be able to decrease db load by figuring out
> >       where the extra requests are coming from.
>
> > actually, I don't think it matters much now that I think about it
> > again, since as per my requirements, I need to reload the cache every
> > 60 seconds, which means that even if it is perfect, MariaDB will
> > still get a high load. I think the second approach will be better
> > suited.
>
> Your call. Wiping out the entire authentication cache every 60 seconds
> feels odd, but I do not know enough about your environment to judge.
>
>
> > Alex: aggregating helper-db connections (helpers can be written to
> >       talk through a central connection aggregator)
> >
>
> > That sounds like exactly what I am looking for, how would one go about
> > doing this?
>
> You have at least two basic options:
>
> A. Enhance Squid to let SMP workers share helpers. I assume that you
> have C SMP workers and N helpers per worker, with C and N significantly
> greater than 1. Instead of having N helpers per worker and C*N helpers
> total, you will have just one concurrent helper per worker and C helpers
> total. This will be a significant, generally useful improvement that
> should be officially accepted if implemented well. This enhancement
> requires serious Squid code modifications in a neglected error-prone
> area, but it is certainly doable -- Squid already shares rock diskers
> across workers, for example.
>
> B. Convert your helper from a database client program to an Aggregator
> client program (and write the Aggregator). Depending on your needs and
> skill, you can use TCP or Unix Domain Sockets (UDS) for
> helper-Aggregator communication. The Aggregator may look very similar to
> the current helper, except it will not use stdin/stdout for
> receiving/sending helper queries/responses. This option also requires
> development, but it is much simpler than option A.
>
>
> HTH,
>
> Alex.
>
>
> > On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
> >
> >     On 2/8/22 09:13, roee klinger wrote:
> >
> >      > I am running multiple instances of Squid in a K8S environment,
> >      > each Squid instance has a helper that authenticates users based
> >      > on their username and password, the scripts are written in Python.
> >      >
> >      > I have been facing an issue that, when under load, the helpers
> >      > (even with 3600 sec TTL) swamp the MariaDB instance, causing it
> >      > to reach 100% CPU, I believe because each helper opens up its own
> >      > connection to MariaDB, which ends up as a lot of connections.
> >      >
> >      > My initial idea was to create a Redis DB next to each Squid
> >      > instance and connect each Squid to its own dedicated Redis. I
> >      > will sync Redis with MariaDB every minute, thus decreasing the
> >      > connection count from a few hundred to just 1 every minute. This
> >      > will also improve speeds, since Redis is much faster than MariaDB.
> >      >
> >      > The problem is, however, that there will still be many
> >      > connections from Squid to Redis, and that will probably consume
> >      > a lot of DB resources as well, which I don't know how to
> >      > optimize, since it seems that Squid opens many processes, and
> >      > there is no way to get them to talk to each other (except TTL
> >      > values, which seem not to help in my case, which I also don't
> >      > understand why that is).
> >      >
> >      > What is the best practice to handle this, considering I have the
> >      > following requirements:
> >      >
> >      >     1. Fast
> >      >     2. Refresh data every minute
> >      >     3. Consume as few DB resources as possible
> >
> >     I would start from the beginning: Does the aggregate number of
> >     database requests match your expectations? In other words, do you
> >     see lots of database requests that should not be there given your
> >     user access patterns and authentication TTLs? In yet other words,
> >     are there many repeated authentication accesses that should have
> >     been authentication cache hits?
> >
> >     If there are a lot more requests than your users/TTLs should
> >     generate, then you may be able to decrease db load by figuring out
> >     where the extra requests are coming from. For example, it is
> >     possible that your authentication cache key includes some noise
> >     that renders caching ineffective (e.g., see comments about
> >     key_extras in squid.conf.documented). Or maybe you need a bigger
> >     authentication cache.
> >
> >     If the total stream of authentication requests during peak hours
> >     is reasonable, with few unwarranted cache misses, then you can
> >     start working on aggregating helper-db connections (helpers can be
> >     written to talk through a central connection aggregator) and/or
> >     adding database power (e.g., by introducing additional databases
> >     running on previously unused hardware -- just like your MariaDB
> >     idea).
> >
> >
> >     Cheers,
> >
> >     Alex.
> >
>
>
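As an aside on the key_extras point quoted above: if the helper is wired in via external_acl_type, every %token in its format line becomes part of the cache key, so extra tokens (e.g. %SRC) make the cache per-client and can defeat it. A hedged squid.conf fragment (the helper path and sizes are made up) that keeps the cache key to just the login name:

```
# Cache key is exactly the format tokens: %LOGIN alone caches per user.
# Adding %SRC would cache per user+IP and multiply cache misses.
external_acl_type db_auth ttl=3600 negative_ttl=60 cache=10000 \
    children-max=20 children-startup=2 concurrency=10 \
    %LOGIN /usr/local/bin/auth-helper.py
acl authed external db_auth
http_access allow authed
```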