[squid-users] External helper consumes too many DB connections

Tue Feb 8 16:38:25 UTC 2022

On 2/8/22 11:08, roee klinger wrote:

> I thought about the following approach:
> 
> 1. Have only one python helper, this helper fetches the data every 
> minute from the main DB.
> 2. This helper has concurrency set for it.
> 3. The helper then spawns child processes using multithreading, each 
> process responds to stdin/stdout and reads the data from the main process 
> which spawned it.
> 
> What do you think about taking this route?
> 
> It will require no extra DBs and no tweaks to Squid, but maybe I am 
> missing something

With this approach (let's call it C), you will have as many database 
clients as there are workers in your Squid instance, just like in option 
A. Option C is probably a lot easier to implement for a given helper 
than the generic option A. Option B gives you one database client per 
Squid instance.
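
To make the data side of option C concrete, here is a minimal sketch of a helper that keeps one in-memory copy of the credentials and replaces it once a minute from a single database client. Everything named here is hypothetical: CredentialSnapshot, the fetch_all callable (assumed to return a {username: password} dict from your database), and the 60-second default. Comparing plaintext passwords is for brevity only; a real helper would compare hashes.

```python
import threading


class CredentialSnapshot:
    """In-memory copy of the credentials, replaced wholesale on each refresh."""

    def __init__(self, fetch_all, interval=60.0):
        self._fetch_all = fetch_all  # hypothetical: returns {username: password}
        self._interval = interval
        self._data = {}
        self._stop = threading.Event()

    def refresh_once(self):
        # Build the new dict first, then swap the reference, so lookups
        # never observe a half-filled table (attribute assignment is
        # atomic in CPython).
        self._data = dict(self._fetch_all())

    def _run(self):
        while not self._stop.is_set():
            try:
                self.refresh_once()
            except Exception:
                # Assumption: if the database is unreachable, keep
                # answering from the previous snapshot.
                pass
            self._stop.wait(self._interval)

    def start(self):
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def check(self, username, password):
        stored = self._data.get(username)
        return stored is not None and stored == password
```

With this shape, the database sees one query per helper per minute, regardless of how many authentication requests Squid sends to that helper.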

It is not clear to me why C parallelizes reading/writing from/to 
stdin/stdout -- I doubt that task is the bottleneck in your environment. 
I would expect a single stdin reader thread and a single stdout writer 
thread instead.
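
A rough sketch of that layout, assuming the helper runs with concurrency enabled so that each request line starts with a channel ID and each reply must echo it: one loop reads stdin, a small pool does the lookups, and one thread writes stdout. The serve function, the check callable, and the worker count are hypothetical names and values, and the "channel user password" line format is a simplification of what a basic authentication helper receives.

```python
import queue
import sys
import threading
from concurrent.futures import ThreadPoolExecutor


def serve(check, infile=sys.stdin, outfile=sys.stdout, workers=4):
    """Single reader, pooled lookups, single writer."""
    replies = queue.Queue()
    done = object()  # sentinel telling the writer to exit

    def writer():
        # The only code that touches outfile.
        while True:
            line = replies.get()
            if line is done:
                break
            outfile.write(line + "\n")
            outfile.flush()

    def handle(line):
        parts = line.split()
        if not parts:
            return  # blank line: nothing to answer
        channel = parts[0]
        try:
            ok = len(parts) == 3 and check(parts[1], parts[2])
        except Exception:
            ok = False  # assumption: a failed lookup is reported as ERR
        replies.put(f"{channel} {'OK' if ok else 'ERR'}")

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in infile:  # the only code that reads infile
            pool.submit(handle, line)
    # Leaving the with-block waits for all submitted lookups to finish.
    replies.put(done)
    writer_thread.join()
```

Replies may leave in a different order than the requests arrived; the channel ID is what lets Squid match them up.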

This is not my area of expertise, but if you do go the option C route,
you may need to protect the helper's stdin/stdout descriptors with a
mutex so that threads can read from stdin and write to stdout without
getting partial reads or overlapping, mangled writes.
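
If several threads do touch the descriptors directly, the locking could look roughly like the sketch below. The function names and the "channel result" reply format are assumptions for illustration; the point is only that each thread holds the lock for one whole line.

```python
import sys
import threading

_stdin_lock = threading.Lock()
_stdout_lock = threading.Lock()


def read_request(infile=sys.stdin):
    """Read one complete request line; returns '' at end of input."""
    with _stdin_lock:
        return infile.readline()


def send_reply(channel, result, outfile=sys.stdout):
    """Write one complete reply line while holding the lock."""
    with _stdout_lock:
        outfile.write(f"{channel} {result}\n")
        outfile.flush()
```

Flushing inside the lock matters: otherwise a buffered partial line from one thread could still be followed by output from another.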

Alex.

> On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov wrote:
> 
>     On 2/8/22 09:50, roee klinger wrote:
> 
>      > Alex: If there are a lot more requests than your users/TTLs should
>      >       generate, then you may be able to decrease db load by
>      >       figuring out where the extra requests are coming from.
> 
>      > Actually, I don't think it matters much now that I think about it
>      > again, since as per my requirements, I need to reload the cache every
>      > 60 seconds, which means that even if it is perfect, MariaDB will
>      > still get a high load. I think the second approach will be better
>      > suited.
> 
>     Your call. Wiping out the entire authentication cache every 60 seconds
>     feels odd, but I do not know enough about your environment to judge.
> 
> 
>      > Alex: aggregating helper-db connections (helpers can be written to
>      >       talk through a central connection aggregator)
>      >
> 
>      > That sounds like exactly what I am looking for. How would one go
>      > about doing this?
> 
>     You have at least two basic options:
> 
>     A. Enhance Squid to let SMP workers share helpers. I assume that you
>     have C SMP workers and N helpers per worker, with C and N significantly
>     greater than 1. Instead of having N helpers per worker and C*N helpers
>     total, you will have just one concurrent helper per worker and C
>     helpers total. This will be a significant, generally useful
>     improvement that should be officially accepted if implemented well.
>     This enhancement requires serious Squid code modifications in a
>     neglected, error-prone area, but it is certainly doable -- Squid
>     already shares rock diskers across workers, for example.
> 
>     B. Convert your helper from a database client program to an Aggregator
>     client program (and write the Aggregator). Depending on your needs and
>     skill, you can use TCP or Unix Domain Sockets (UDS) for
>     helper-Aggregator communication. The Aggregator may look very
>     similar to the current helper, except it will not use stdin/stdout
>     for receiving/sending helper queries/responses. This option also
>     requires development, but it is much simpler than option A.
> 
> 
>     HTH,
> 
>     Alex.
> 
> 
>      > On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
>      >
>      >     On 2/8/22 09:13, roee klinger wrote:
>      >
>      >      > I am running multiple instances of Squid in a K8S
>      >      > environment. Each Squid instance has a helper that
>      >      > authenticates users based on their username and password;
>      >      > the scripts are written in Python.
>      >      >
>      >      > I have been facing an issue: under load, the helpers (even
>      >      > with a 3600 sec TTL) swamp the MariaDB instance, driving it
>      >      > to 100% CPU. I believe this is because each helper opens
>      >      > its own connection to MariaDB, which adds up to a lot of
>      >      > connections.
>      >      >
>      >      > My initial idea was to create a Redis DB next to each Squid
>      >      > instance and connect each Squid to its own dedicated Redis.
>      >      > I will sync Redis with MariaDB every minute, thus decreasing
>      >      > the connection count from a few hundred to just one per
>      >      > minute. This will also improve speeds since Redis is much
>      >      > faster than MariaDB.
>      >      >
>      >      > The problem is, however, that there will still be many
>      >      > connections from Squid to Redis, and that will probably
>      >      > consume a lot of DB resources as well. I don't actually
>      >      > know how to optimize this, since it seems that Squid opens
>      >      > many processes and there is no way to get them to talk to
>      >      > each other (except TTL values, which do not seem to help in
>      >      > my case, and I don't understand why).
>      >      >
>      >      > What is the best practice to handle this, considering I have
>      >      > the following requirements:
>      >      >
>      >      >     1. Fast
>      >      >     2. Refresh data every minute
>      >      >     3. Consume the least amount of DB resources possible
>      >
>      >     I would start from the beginning: Does the aggregate number of
>      >     database requests match your expectations? In other words, do
>      >     you see lots of database requests that should not be there given
>      >     your user access patterns and authentication TTLs? In yet other
>      >     words, are there many repeated authentication accesses that
>      >     should have been authentication cache hits?
>      >
>      >     If there are a lot more requests than your users/TTLs should
>      >     generate, then you may be able to decrease db load by figuring
>      >     out where the extra requests are coming from. For example, it is
>      >     possible that your authentication cache key includes some noise
>      >     that renders caching ineffective (e.g., see comments about
>      >     key_extras in squid.conf.documented). Or maybe you need a bigger
>      >     authentication cache.
>      >
>      >     If the total stream of authentication requests during peak hours
>      >     is reasonable, with few unwarranted cache misses, then you can
>      >     start working on aggregating helper-db connections (helpers can
>      >     be written to talk through a central connection aggregator)
>      >     and/or adding database power (e.g., by introducing additional
>      >     databases running on previously unused hardware -- just like
>      >     your Redis idea).
>      >
>      >
>      >     Cheers,
>      >
>      >     Alex.
>      >
>