[squid-users] External helper consumes too many DB connections

Tue Feb 8 17:42:41 UTC 2022

On 2/8/22 12:02, roee klinger wrote:

> What I meant by option C is basically to have 3 functions: 2 functions
> for stdin/stdout, and one function that will fetch the data from the DB
> every 60 seconds and save it into a global variable for the other
> functions to use.

Ah, thank you for that clarification. It addresses most of my earlier 
concerns. Ideally, you want four functions AFAICT:

1. stdin reader
2. stdout writer
3. answer generator
4. db updater
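The four roles above can be sketched as one Python helper with four threads joined by two FIFO queues. This is a minimal sketch under stated assumptions, not a reference implementation. `Helper`, `load_snapshot`, and the `username -> password` snapshot format are hypothetical. The sketch assumes non-concurrent basic-auth helper I/O ("user password" in, "OK"/"ERR" out), with both fields URL-escaped by Squid. Because there is a single answer generator and the queues are FIFO, answers leave in request order, which is what Squid expects when `concurrency` is not used.

```python
import hmac
import queue
import sys
import threading
from typing import Callable, Dict, Optional, TextIO
from urllib.parse import unquote

# Hypothetical snapshot format: username -> password. A real deployment
# would more likely keep password hashes here.
Snapshot = Dict[str, str]


class Helper:
    """Sketch of the four roles: stdin reader, answer generator,
    stdout writer, and DB updater, connected by two FIFO queues."""

    def __init__(self, load_snapshot: Callable[[], Snapshot],
                 refresh_seconds: float = 60.0) -> None:
        self._load = load_snapshot             # hypothetical "give me everything" query
        self._refresh = refresh_seconds
        self._snapshot = load_snapshot()       # load once before serving
        self._requests: "queue.Queue[Optional[str]]" = queue.Queue()
        self._answers: "queue.Queue[Optional[str]]" = queue.Queue()

    def read_requests(self, stdin: TextIO) -> None:
        """1. stdin reader: the only code that reads stdin."""
        for line in stdin:
            self._requests.put(line.rstrip("\n"))
        self._requests.put(None)               # EOF: Squid closed our stdin

    def generate_answers(self) -> None:
        """3. answer generator: consults the in-memory snapshot, never the DB."""
        while (line := self._requests.get()) is not None:
            user, _, password = line.partition(" ")
            # Assumes Squid URL-escapes both fields for basic auth helpers.
            user, password = unquote(user), unquote(password)
            expected = self._snapshot.get(user)  # one reference read
            ok = expected is not None and hmac.compare_digest(
                expected.encode(), password.encode())
            self._answers.put("OK" if ok else "ERR")
        self._answers.put(None)

    def write_answers(self, stdout: TextIO) -> None:
        """2. stdout writer: the only code that writes stdout."""
        while (answer := self._answers.get()) is not None:
            stdout.write(answer + "\n")
            stdout.flush()                     # Squid is waiting for this line

    def update_snapshot(self, stop: threading.Event) -> None:
        """4. DB updater: one query per period, then a single reference swap."""
        while not stop.wait(self._refresh):
            try:
                # Readers see either the old or the new dict, never a mix.
                self._snapshot = self._load()
            except Exception as exc:
                # Keep serving the previous snapshot; Squid logs helper stderr.
                print(f"snapshot refresh failed: {exc}", file=sys.stderr)

    def run(self, stdin: TextIO, stdout: TextIO) -> None:
        stop = threading.Event()
        threading.Thread(target=self.update_snapshot, args=(stop,),
                         daemon=True).start()
        workers = [
            threading.Thread(target=self.read_requests, args=(stdin,)),
            threading.Thread(target=self.generate_answers),
            threading.Thread(target=self.write_answers, args=(stdout,)),
        ]
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        stop.set()                             # stop the updater on EOF
```

In a real helper, the entry point would be something like `Helper(load_from_db).run(sys.stdin, sys.stdout)`, where `load_from_db` is a hypothetical function that runs the full-table query against MariaDB.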

I agree that you can join 1+3 (or 2+3) together, but you are sacrificing
some parallelism if you join. Whether that sacrifice is important or
not depends on various local factors.

> Then, when a new stdin request comes in, the stdin handler will simply
> read from that variable instead of from the DB.

Yes, and pass the answer to the stdout writing thread/function.

> I see the following benefits in this approach:
> 
>     1. We will have only one DB connection every 60 seconds, per Squid
>     worker instance.

Yes, one per Squid worker. Hopefully, 60 seconds (divided by the number 
of workers if the database cannot parallelize these "give me everything" 
queries) will be enough to receive (the relevant portion of) the database.
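The timing caveat above is simple arithmetic. With a hypothetical 8 workers and a database that serializes the "give me everything" queries, each query must finish within 60 / 8 = 7.5 seconds. The random first-refresh offset below is an extra assumption, not something proposed in the thread. It is one way to keep helpers that start at the same moment from all querying the database at once. Both function names are illustrative.

```python
import random
from typing import Optional


def per_query_budget(period_s: float, workers: int, db_parallel: bool) -> float:
    """Seconds one full-snapshot query may take if every worker's helper
    refreshes once per period."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # If the database runs these queries one at a time, they share the period.
    return period_s if db_parallel else period_s / workers


def first_refresh_delay(period_s: float,
                        rng: Optional[random.Random] = None) -> float:
    """Random start offset in [0, period_s) to spread helper refreshes."""
    rng = rng or random.Random()
    return rng.random() * period_s
```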

>     2. It will be very fast since the stdin handler will simply read from
>     a local variable.

Yes, assuming the query is simple and/or the database is small.

>     you will have as many database
>     clients as there are workers in your Squid instance

> You are definitely right, but as this will be much faster, I think I will
> be able to decrease my number of workers significantly.

Whether you can decrease the number of Squid workers depends on where 
the bottlenecks are. If your Squid workers are mostly idle now, then 
yes, you will be able to decrease their number (but you can do that even 
without helper rewrites then AFAICT).

> Also, we might be able to use concurrency=n here to decrease it further?

Yes, probably. Even with just one helper thread/function answering 
helper queries, giving Squid the ability to submit the next query 
without waiting for the answer to the previous one will parallelize I/O 
across the helper/Squid boundary, which is a good thing.
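With a `concurrency=N` option on the `auth_param basic children` line, Squid prefixes each helper request with a channel-ID. It then matches replies to requests by that ID, so replies may come back in any order. The sketch below assumes requests of the form "channel-ID user password" and replies of the form "channel-ID OK|ERR". It answers "BH" (broken helper) when the lookup itself fails, rather than silently dropping the request. `check` is a hypothetical lookup callable, and URL-decoding of the fields is left out for brevity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TextIO


def answer_line(line: str, check: Callable[[str, str], bool]) -> str:
    """Turn one 'channel-ID user password' request into a reply line."""
    channel_id, _, credentials = line.rstrip("\n").partition(" ")
    user, _, password = credentials.partition(" ")
    try:
        ok = check(user, password)
    except Exception:
        return f"{channel_id} BH"     # helper-side failure, not a denial
    return f"{channel_id} {'OK' if ok else 'ERR'}"


def serve_concurrent(lines: Iterable[str], out: TextIO,
                     check: Callable[[str, str], bool],
                     threads: int = 4) -> None:
    """Answer requests in whatever order they complete; the channel-ID
    lets Squid match each reply to its request."""
    out_lock = threading.Lock()

    def handle(line: str) -> None:
        reply = answer_line(line, check)
        with out_lock:                # never interleave two reply lines
            out.write(reply + "\n")
            out.flush()

    with ThreadPoolExecutor(max_workers=threads) as pool:
        for line in lines:
            pool.submit(handle, line)
```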

Cheers,

Alex.

> On Tue, Feb 8, 2022 at 6:38 PM Alex Rousskov wrote:
> 
>     On 2/8/22 11:08, roee klinger wrote:
> 
>      > I thought about the following approach:
>      >
>      > 1. Have only one Python helper; this helper fetches the data every
>      > minute from the main DB.
>      > 2. This helper has concurrency set for it.
>      > 3. The helper then spawns child processes using multithreading; each
>      > process responds to stdin/stdout and reads the data from the main
>      > process which spawned it.
>      >
>      > What do you think about taking this route?
>      >
>      > It will require no extra DBs and no tweaks to Squid, but maybe I am
>      > missing something.
> 
>     With this approach (let's call it C), you will have as many database
>     clients as there are workers in your Squid instance, just like in
>     option A. Option C is probably a lot easier to implement for a given
>     helper than the generic option A. Option B gives you one database
>     client per Squid instance.
> 
>     It is not clear to me why C parallelizes reading/writing from/to
>     stdin/stdout -- I doubt that task is the bottleneck in your
>     environment.
>     I would expect a single stdin reader thread and a single stdout writer
>     thread instead.
> 
>     This is not my area of expertise, but if you do go option C route, you
>     may need to protect helper's stdin/stdout descriptors with a mutex so
>     that threads can read/write from/to stdin/stdout without getting
>     mangled/partial reads and mangled/overlapping writes.
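If several threads share the helper's stdin/stdout, as suggested above, the mutex idea can be sketched as a small wrapper. Each read returns one whole request line, and each write emits one whole, flushed answer line. Separate read and write locks keep a blocked reader from stalling writers. `LockedLineIO` and `worker` are hypothetical names. The explicit locks make the whole-line guarantee visible instead of relying on the file object's internal buffering. Note that with several answering threads, replies can leave out of order, so this only fits a helper configured with `concurrency`, where each reply carries its request's channel-ID.

```python
import threading
from typing import Optional, TextIO


class LockedLineIO:
    """Whole-line reads and writes shared by several threads."""

    def __init__(self, stdin: TextIO, stdout: TextIO) -> None:
        self._in, self._out = stdin, stdout
        self._read_lock = threading.Lock()    # one reader at a time
        self._write_lock = threading.Lock()   # one writer at a time

    def read_line(self) -> Optional[str]:
        """Return the next request line, or None at EOF."""
        with self._read_lock:
            line = self._in.readline()
        return line.rstrip("\n") if line else None

    def write_line(self, text: str) -> None:
        """Write one complete answer line and flush it to Squid."""
        with self._write_lock:
            self._out.write(text + "\n")
            self._out.flush()


def worker(lio: LockedLineIO) -> None:
    # Hypothetical per-thread loop: answer "OK" under each channel-ID.
    while (line := lio.read_line()) is not None:
        lio.write_line(f"{line.split(' ', 1)[0]} OK")
```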
> 
>     Alex.
> 
> 
>      > On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov wrote:
>      >
>      >     On 2/8/22 09:50, roee klinger wrote:
>      >
>      >      > Alex: If there are a lot more requests than your users/TTLs should
>      >      >       generate, then you may be able to decrease db load by
>      >      >       figuring out where the extra requests are coming from.
>      >
>      >      > Actually, I don't think it matters much now that I think about
>      >      > it again, since, as per my requirements, I need to reload the
>      >      > cache every 60 seconds, which means that even if it is perfect,
>      >      > MariaDB will still get a high load. I think the second approach
>      >      > will be better suited.
>      >
>      >     Your call. Wiping out the entire authentication cache every 60
>      >     seconds feels odd, but I do not know enough about your environment
>      >     to judge.
>      >
>      >
>      >      > Alex: aggregating helper-db connections (helpers can be written to
>      >      >       talk through a central connection aggregator)
>      >      >
>      >
>      >      > That sounds like exactly what I am looking for; how would one go
>      >      > about doing this?
>      >
>      >     You have at least two basic options:
>      >
>      >     A. Enhance Squid to let SMP workers share helpers. I assume that you
>      >     have C SMP workers and N helpers per worker, with C and N
>      >     significantly greater than 1. Instead of having N helpers per worker
>      >     and C*N helpers total, you will have just one concurrent helper per
>      >     worker and C helpers total. This will be a significant, generally
>      >     useful improvement that should be officially accepted if implemented
>      >     well. This enhancement requires serious Squid code modifications in
>      >     a neglected error-prone area, but it is certainly doable -- Squid
>      >     already shares rock diskers across workers, for example.
>      >
>      >     B. Convert your helper from a database client program to an
>      >     Aggregator client program (and write the Aggregator). Depending on
>      >     your needs and skill, you can use TCP or Unix Domain Sockets (UDS)
>      >     for helper-Aggregator communication. The Aggregator may look very
>      >     similar to the current helper, except it will not use stdin/stdout
>      >     for receiving/sending helper queries/responses. This option also
>      >     requires development, but it is much simpler than option A.
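Option B can be sketched in Python over a Unix domain socket with newline-terminated messages. This is a minimal sketch under stated assumptions. The socket path, the line protocol, and the names `serve_client`, `serve_forever`, and `relay` are all hypothetical. `answer` stands for whatever the Aggregator uses to answer a query, such as its one database connection or an in-memory snapshot. Since each helper connection gets its own thread, `answer` must be thread-safe.

```python
import os
import socket
import threading
from typing import Callable, TextIO

Answer = Callable[[str], str]   # hypothetical: query line -> "OK"/"ERR"


def serve_client(conn: socket.socket, answer: Answer) -> None:
    """Aggregator side: answer newline-terminated queries from one helper."""
    with conn:
        rf = conn.makefile("r", encoding="utf-8", newline="\n")
        wf = conn.makefile("w", encoding="utf-8", newline="\n")
        with rf, wf:
            for line in rf:
                wf.write(answer(line.rstrip("\n")) + "\n")
                wf.flush()


def serve_forever(path: str, answer: Answer) -> None:
    """Aggregator main loop on a Unix domain socket at a hypothetical path."""
    if os.path.exists(path):
        os.unlink(path)                 # remove a stale socket file
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen()
        while True:
            conn, _ = server.accept()
            threading.Thread(target=serve_client, args=(conn, answer),
                             daemon=True).start()


def relay(stdin: TextIO, stdout: TextIO, rf: TextIO, wf: TextIO) -> None:
    """Helper side: forward each Squid query to the Aggregator and pass the
    answer back; the helper itself never talks to the database."""
    for line in stdin:
        wf.write(line if line.endswith("\n") else line + "\n")
        wf.flush()
        reply = rf.readline()
        if not reply:
            # Exit so that Squid notices and restarts the helper.
            raise ConnectionError("Aggregator closed the connection")
        stdout.write(reply)
        stdout.flush()
```

On the helper side, `rf` and `wf` would come from `makefile("r", ...)` and `makefile("w", ...)` on a socket connected to the Aggregator's path. Each helper then holds one cheap local socket instead of its own database connection.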
>      >
>      >
>      >     HTH,
>      >
>      >     Alex.
>      >
>      >
>      >      > On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
>      >      >
>      >      >     On 2/8/22 09:13, roee klinger wrote:
>      >      >
>      >      >      > I am running multiple instances of Squid in a K8S
>      >      >      > environment. Each Squid instance has a helper that
>      >      >      > authenticates users based on their username and password;
>      >      >      > the scripts are written in Python.
>      >      >      >
>      >      >      > I have been facing an issue: when under load, the helpers
>      >      >      > (even with a 3600 sec TTL) swamp the MariaDB instance,
>      >      >      > causing it to reach 100% CPU. I believe this is basically
>      >      >      > because each helper opens up its own connection to MariaDB,
>      >      >      > which ends up as a lot of connections.
>      >      >      >
>      >      >      > My initial idea was to create a Redis DB next to each Squid
>      >      >      > instance and connect each Squid to its own dedicated Redis.
>      >      >      > I will sync Redis with MariaDB every minute, thus decreasing
>      >      >      > the connection count from a few 100s to just 1 every minute.
>      >      >      > This will also improve speeds since Redis is much faster
>      >      >      > than MariaDB.
>      >      >      >
>      >      >      > The problem is, however, that there will still be many
>      >      >      > connections from Squid to Redis, and I think that will
>      >      >      > probably consume a lot of DB resources as well. I don't
>      >      >      > actually know how to optimize that, since it seems that Squid
>      >      >      > opens many processes, and there is no way to get them to talk
>      >      >      > to each other (except TTL values, which do not seem to help in
>      >      >      > my case, and I also don't understand why that is).
>      >      >      >
>      >      >      > What is the best practice to handle this, considering that I
>      >      >      > have the following requirements?
>      >      >      >
>      >      >      >     1. Fast
>      >      >      >     2. Refresh data every minute
>      >      >      >     3. Consume the least amount of DB resources possible
>      >      >
>      >      >     I would start from the beginning: Does the aggregate number of
>      >      >     database requests match your expectations? In other words, do
>      >      >     you see lots of database requests that should not be there given
>      >      >     your user access patterns and authentication TTLs? In yet other
>      >      >     words, are there many repeated authentication accesses that
>      >      >     should have been authentication cache hits?
>      >      >
>      >      >     If there are a lot more requests than your users/TTLs should
>      >      >     generate, then you may be able to decrease db load by figuring
>      >      >     out where the extra requests are coming from. For example, it is
>      >      >     possible that your authentication cache key includes some noise
>      >      >     that renders caching ineffective (e.g., see comments about
>      >      >     key_extras in squid.conf.documented). Or maybe you need a bigger
>      >      >     authentication cache.
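The "are there unwarranted cache misses?" question can be checked offline once helper queries are collected. For example, the helper could print each query with a timestamp to stderr, which Squid typically writes to cache.log. The input format and `repeat_lookups` below are hypothetical. The sketch counts helper queries that arrived while an earlier answer for the same key should still have been cached under the configured TTL (for example, `auth_param basic credentialsttl`). Comparing the count for the full query with the count per username hints at whether extra key material, such as something added via key_extras, is splitting one user's cache entries.

```python
from typing import Callable, Dict, Iterable, Tuple


def repeat_lookups(queries: Iterable[Tuple[float, str]], ttl_s: float,
                   key: Callable[[str], str] = lambda q: q) -> Dict[str, int]:
    """Count helper queries that arrived while an earlier answer for the
    same key should still have been cached (suspected cache misses)."""
    last_seen: Dict[str, float] = {}
    repeats: Dict[str, int] = {}
    for ts, query in sorted(queries):
        k = key(query)
        prev = last_seen.get(k)
        if prev is not None and ts - prev < ttl_s:
            repeats[k] = repeats.get(k, 0) + 1
        last_seen[k] = ts          # every helper answer (re)fills the cache
    return repeats
```

In this toy data, queries for user "alice" with two different client addresses show no repeats per full key, but one repeat per username. That is the pattern a noisy cache key would produce.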
>      >      >
>      >      >     If the total stream of authentication requests during peak hours
>      >      >     is reasonable, with few unwarranted cache misses, then you can
>      >      >     start working on aggregating helper-db connections (helpers can be
>      >      >     written to talk through a central connection aggregator) and/or
>      >      >     adding database power (e.g., by introducing additional databases
>      >      >     running on previously unused hardware -- just like your Redis
>      >      >     idea).
>      >      >
>      >      >
>      >      >     Cheers,
>      >      >
>      >      >     Alex.
>      >      >
>      >
>