[squid-users] External helper consumes too many DB connections
roee klinger
roeeklinger60 at gmail.com
Tue Feb 8 17:02:57 UTC 2022
Hey Alex,
Thank you for your reply; I'm sorry, I think I explained myself poorly.
What I meant by option C is to have basically 3 functions: two functions
for handling stdin/stdout, and one function that will fetch the data from
the DB every 60 seconds and save it into a global variable for the other
functions to use.
Then, when a new stdin request comes in, the stdin handler will simply
read from that variable instead of from the DB.
I see the following benefits in this approach:
1. We will have only one DB connection every 60 seconds, per Squid worker
instance.
2. It will be very fast, since the stdin handler will simply read from a
local variable.
> you will have as many database clients as there are workers in your
> Squid instance

You are definitely right, but since this will be much faster, I think I
will be able to decrease my number of workers significantly.
Also, might we be able to use concurrency=n here to decrease it further?
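If so, the helper side of squid.conf might look something like this
(a hypothetical fragment; the helper path is made up):

```
# One helper process per worker, multiplexing up to 100 lookups over a
# single stdin/stdout pair; Squid prefixes each request line with a
# channel ID when concurrency is enabled.
auth_param basic program /usr/local/bin/auth_helper.py
auth_param basic children 1 concurrency=100
```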
Would love to hear your thoughts on this,
Roee
On Tue, Feb 8, 2022 at 6:38 PM Alex Rousskov <
rousskov at measurement-factory.com> wrote:
> On 2/8/22 11:08, roee klinger wrote:
>
> > I thought about the following approach:
> >
> > 1. Have only one python helper, this helper fetches the data every
> > minute from the main DB.
> > 2. This helper has concurrency set for it.
> > 3. The helper then spawns child processes using multithreading, each
> > process responds to std/stdout and reads the data from the main process
> > which spawned it.
> >
> > What do you think about taking this route?
> >
> > It will require no extra DBs and no tweaks to Squid, but maybe I am
> > missing something
>
> With this approach (let's call it C), you will have as many database
> clients as there are workers in your Squid instance, just like in option
> A. Option C is probably a lot easier to implement for a given helper
> than the generic option A. Option B gives you one database client per
> Squid instance.
>
> It is not clear to me why C parallelizes reading/writing from/to
> stdin/stdout -- I doubt that task is the bottleneck in your environment.
> I would expect a single stdin reader thread and a single stdout writer
> thread instead.
>
> This is not my area of expertise, but if you do go the option C route,
> you may need to protect the helper's stdin/stdout descriptors with a
> mutex so that threads can read/write from/to stdin/stdout without
> getting mangled/partial reads and mangled/overlapping writes.
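That single-reader/single-writer layout might look roughly like this in
Python (check_credentials() is a hypothetical lookup against the shared
data; queues remove the need for a descriptor mutex, since only one
thread ever touches each stream):

```python
import queue
import sys
import threading

requests = queue.Queue()   # lines read from stdin
responses = queue.Queue()  # whole reply lines awaiting the writer

def check_credentials(payload):
    """Hypothetical lookup against the shared in-memory data."""
    raise NotImplementedError

def reader():
    """Sole consumer of stdin: no locking needed on the descriptor."""
    for line in sys.stdin:
        requests.put(line.rstrip("\n"))

def worker():
    """Any number of these may run; they never touch stdin/stdout."""
    while True:
        channel, payload = requests.get().split(" ", 1)
        responses.put("%s %s" % (channel, check_credentials(payload)))

def writer():
    """Sole producer on stdout: whole lines are queued, so replies can
    never interleave or arrive partially written."""
    while True:
        sys.stdout.write(responses.get() + "\n")
        sys.stdout.flush()
```

Starting one reader, one writer, and a small pool of workers gives the
parallel DB lookups without any risk of mangled I/O.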
>
> Alex.
>
>
> > On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov wrote:
> >
> > On 2/8/22 09:50, roee klinger wrote:
> >
> > > Alex: If there are a lot more requests than your users/TTLs should
> > > generate, then you may be able to decrease db load by figuring out
> > > where the extra requests are coming from.
> >
> > > actually, I don't think it matters much now that I think about it
> > > again, since as per my requirements, I need to reload the cache
> > > every 60 seconds, which means that even if it is perfect, MariaDB
> > > will still get a high load. I think the second approach will be
> > > better suited.
> >
> > Your call. Wiping out the entire authentication cache every 60
> > seconds feels odd, but I do not know enough about your environment
> > to judge.
> >
> >
> > > Alex: aggregating helper-db connections (helpers can be written to
> > > talk through a central connection aggregator)
> > >
> >
> > > That sounds like exactly what I am looking for; how would one go
> > > about doing this?
> >
> > You have at least two basic options:
> >
> > A. Enhance Squid to let SMP workers share helpers. I assume that you
> > have C SMP workers and N helpers per worker, with C and N
> > significantly greater than 1. Instead of having N helpers per worker
> > and C*N helpers total, you will have just one concurrent helper per
> > worker and C helpers total. This will be a significant, generally
> > useful improvement that should be officially accepted if implemented
> > well. This enhancement requires serious Squid code modifications in a
> > neglected error-prone area, but it is certainly doable -- Squid
> > already shares rock diskers across workers, for example.
> >
> > B. Convert your helper from a database client program to an
> > Aggregator client program (and write the Aggregator). Depending on
> > your needs and skill, you can use TCP or Unix Domain Sockets (UDS)
> > for helper-Aggregator communication. The Aggregator may look very
> > similar to the current helper, except it will not use stdin/stdout
> > for receiving/sending helper queries/responses. This option also
> > requires development, but it is much simpler than option A.
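The helper side of option B's UDS conversation might look like this (the
socket path and the one-line query/reply protocol are assumptions for
illustration, not an existing Squid interface):

```python
import socket

AGGREGATOR_SOCKET = "/var/run/auth-aggregator.sock"  # hypothetical path

def query_aggregator(user, password, sock_path=None):
    """Send one newline-terminated query to the Aggregator and return
    its one-line reply (e.g. 'OK' or 'ERR')."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path or AGGREGATOR_SOCKET)
        s.sendall(("%s %s\n" % (user, password)).encode())
        buf = b""
        # Read until the newline that terminates the reply.
        while not buf.endswith(b"\n"):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
        return buf.decode().strip()
```

The Aggregator process would hold the single MariaDB connection and
answer these queries from its own periodically refreshed snapshot.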
> >
> >
> > HTH,
> >
> > Alex.
> >
> >
> > > On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
> > >
> > > On 2/8/22 09:13, roee klinger wrote:
> > >
> > > > I am running multiple instances of Squid in a K8S environment;
> > > > each Squid instance has a helper that authenticates users based
> > > > on their username and password. The scripts are written in
> > > > Python.
> > > >
> > > > I have been facing an issue that, under load, the helpers (even
> > > > with a 3600 sec TTL) swamp the MariaDB instance, causing it to
> > > > reach 100% CPU; basically, I believe, because each helper opens
> > > > up its own connection to MariaDB, which ends up as a lot of
> > > > connections.
> > > >
> > > > My initial idea was to create a Redis DB next to each Squid
> > > > instance and connect each Squid to its own dedicated Redis. I
> > > > will sync Redis with MariaDB every minute, thus decreasing the
> > > > connection count from a few hundred to just 1 every minute. This
> > > > will also improve speeds since Redis is much faster than MariaDB.
> > > >
> > > > The problem is, however, that there will still be many
> > > > connections from Squid to Redis, and that will probably consume
> > > > a lot of DB resources as well, which I don't actually know how
> > > > to optimize, since it seems that Squid opens many processes, and
> > > > there is no way to get them to talk to each other (except TTL
> > > > values, which seem not to help in my case, though I don't
> > > > understand why).
> > > >
> > > > What is the best practice to handle this, considering I have the
> > > > following requirements:
> > > >
> > > > 1. Fast
> > > > 2. Refresh data every minute
> > > > 3. Consume the least amount of DB resources possible
> > >
> > > I would start from the beginning: Does the aggregate number of
> > > database requests match your expectations? In other words, do you
> > > see lots of database requests that should not be there given your
> > > user access patterns and authentication TTLs? In yet other words,
> > > are there many repeated authentication accesses that should have
> > > been authentication cache hits?
> > >
> > > If there are a lot more requests than your users/TTLs should
> > > generate, then you may be able to decrease db load by figuring out
> > > where the extra requests are coming from. For example, it is
> > > possible that your authentication cache key includes some noise
> > > that renders caching ineffective (e.g., see comments about
> > > key_extras in squid.conf.documented). Or maybe you need a bigger
> > > authentication cache.
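For context, the caching knobs being referred to live in squid.conf; a
hypothetical fragment (values and helper path made up):

```
# See squid.conf.documented for the authoritative option list.
auth_param basic program /usr/local/bin/auth_helper.py
auth_param basic children 20 startup=2 idle=1
auth_param basic credentialsttl 1 hour
# If key_extras is configured, each extra format code becomes part of
# the cache key; per-request noise there turns would-be hits into misses.
```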
> > >
> > > If the total stream of authentication requests during peak hours
> > > is reasonable, with few unwarranted cache misses, then you can
> > > start working on aggregating helper-db connections (helpers can be
> > > written to talk through a central connection aggregator) and/or
> > > adding database power (e.g., by introducing additional databases
> > > running on previously unused hardware -- just like your MariaDB
> > > idea).
> > >
> > >
> > > Cheers,
> > >
> > > Alex.
> > >
> >
>
>