[squid-users] External helper consumes too many DB connections
roee klinger
roeeklinger60 at gmail.com
Tue Feb 8 17:02:57 UTC 2022
Hey Alex,
Thank you for your reply; I'm sorry, I think I explained myself poorly.
What I meant by option C is to have basically 3 functions: two functions
for handling stdin/stdout, and one function that will fetch the data from
the DB every 60 seconds and save it into a global variable for the other
functions to use.
Then, when a new stdin request comes in, the stdin handler will simply
read from that variable instead of from the DB.
I see the following benefits in this approach:
1. We will have only one DB connection every 60 seconds, per Squid worker
instance.
2. It will be very fast, since the stdin handler will simply read from a
local variable.
> you will have as many database clients as there are workers in your
> Squid instance

You are definitely right, but since this will be much faster, I think I
will be able to decrease my number of workers significantly.
Also, might we be able to use concurrency=n here to decrease it further?
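If so, the helper side of squid.conf might look something like this
(a hypothetical fragment; the helper path is made up):

```
# One helper process per worker, multiplexing up to 100 lookups over a
# single stdin/stdout pair; Squid prefixes each request line with a
# channel ID when concurrency is enabled.
auth_param basic program /usr/local/bin/auth_helper.py
auth_param basic children 1 concurrency=100
```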
Would love to hear your thoughts on this,
Roee
On Tue, Feb 8, 2022 at 6:38 PM Alex Rousskov <
rousskov at measurement-factory.com> wrote:
> On 2/8/22 11:08, roee klinger wrote:
>
> > I thought about the following approach:
> >
> > 1. Have only one python helper, this helper fetches the data every
> > minute from the main DB.
> > 2. This helper has concurrency set for it.
> > 3. The helper then spawns child processes using multithreading, each
> > process responds to std/stdout and reads the data from the main process
> > which spawned it.
> >
> > What do you think about taking this route?
> >
> > It will require no extra DBs and no tweaks to Squid, but maybe I am
> > missing something
>
> With this approach (let's call it C), you will have as many database
> clients as there are workers in your Squid instance, just like in option
> A. Option C is probably a lot easier to implement for a given helper
> than the generic option A. Option B gives you one database client per
> Squid instance.
>
> It is not clear to me why C parallelizes reading/writing from/to
> stdin/stdout -- I doubt that task is the bottleneck in your environment.
> I would expect a single stdin reader thread and a single stdout writer
> thread instead.
>
> This is not my area of expertise, but if you do go the option C route,
> you may need to protect the helper's stdin/stdout descriptors with a
> mutex so that threads can read/write from/to stdin/stdout without
> getting mangled/partial reads and mangled/overlapping writes.
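That single-reader/single-writer layout might look roughly like this in
Python (check_credentials() is a hypothetical lookup against the shared
data; queues remove the need for a descriptor mutex, since only one
thread ever touches each stream):

```python
import queue
import sys
import threading

requests = queue.Queue()   # lines read from stdin
responses = queue.Queue()  # whole reply lines awaiting the writer

def check_credentials(payload):
    """Hypothetical lookup against the shared in-memory data."""
    raise NotImplementedError

def reader():
    """Sole consumer of stdin: no locking needed on the descriptor."""
    for line in sys.stdin:
        requests.put(line.rstrip("\n"))

def worker():
    """Any number of these may run; they never touch stdin/stdout."""
    while True:
        channel, payload = requests.get().split(" ", 1)
        responses.put("%s %s" % (channel, check_credentials(payload)))

def writer():
    """Sole producer on stdout: whole lines are queued, so replies can
    never interleave or arrive partially written."""
    while True:
        sys.stdout.write(responses.get() + "\n")
        sys.stdout.flush()
```

Starting one reader, one writer, and a small pool of workers gives the
parallel DB lookups without any risk of mangled I/O.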
>
> Alex.
>
>
> > On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov wrote:
> >
> > On 2/8/22 09:50, roee klinger wrote:
> >
> > > Alex: If there are a lot more requests than your users/TTLs should
> > > generate, then you may be able to decrease db load by figuring out
> > > where the extra requests are coming from.
> >
> > > actually, I don't think it matters much now that I think about it
> > > again, since as per my requirements, I need to reload the cache
> > > every 60 seconds, which means that even if it is perfect, MariaDB
> > > will still get a high load. I think the second approach will be
> > > better suited.
> >
> > Your call. Wiping out the entire authentication cache every 60
> > seconds feels odd, but I do not know enough about your environment
> > to judge.
> >
> >
> > > Alex: aggregating helper-db connections (helpers can be written to
> > > talk through a central connection aggregator)
> > >
> >
> > > That sounds like exactly what I am looking for; how would one go
> > > about doing this?
> >
> > You have at least two basic options:
> >
> > A. Enhance Squid to let SMP workers share helpers. I assume that you
> > have C SMP workers and N helpers per worker, with C and N
> > significantly greater than 1. Instead of having N helpers per worker
> > and C*N helpers total, you will have just one concurrent helper per
> > worker and C helpers total. This will be a significant, generally
> > useful improvement that should be officially accepted if implemented
> > well. This enhancement requires serious Squid code modifications in a
> > neglected error-prone area, but it is certainly doable -- Squid
> > already shares rock diskers across workers, for example.
> >
> > B. Convert your helper from a database client program to an
> > Aggregator client program (and write the Aggregator). Depending on
> > your needs and skill, you can use TCP or Unix Domain Sockets (UDS)
> > for helper-Aggregator communication. The Aggregator may look very
> > similar to the current helper, except it will not use stdin/stdout
> > for receiving/sending helper queries/responses. This option also
> > requires development, but it is much simpler than option A.
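The helper side of option B's UDS conversation might look like this (the
socket path and the one-line query/reply protocol are assumptions for
illustration, not an existing Squid interface):

```python
import socket

AGGREGATOR_SOCKET = "/var/run/auth-aggregator.sock"  # hypothetical path

def query_aggregator(user, password, sock_path=None):
    """Send one newline-terminated query to the Aggregator and return
    its one-line reply (e.g. 'OK' or 'ERR')."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path or AGGREGATOR_SOCKET)
        s.sendall(("%s %s\n" % (user, password)).encode())
        buf = b""
        # Read until the newline that terminates the reply.
        while not buf.endswith(b"\n"):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
        return buf.decode().strip()
```

The Aggregator process would hold the single MariaDB connection and
answer these queries from its own periodically refreshed snapshot.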
> >
> >
> > HTH,
> >
> > Alex.
> >
> >
> > > On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
> > >
> > > On 2/8/22 09:13, roee klinger wrote:
> > >
> > > > I am running multiple instances of Squid in a K8S environment;
> > > > each Squid instance has a helper that authenticates users based
> > > > on their username and password. The scripts are written in
> > > > Python.
> > > >
> > > > I have been facing an issue that, under load, the helpers (even
> > > > with a 3600 sec TTL) swamp the MariaDB instance, causing it to
> > > > reach 100% CPU; basically, I believe, because each helper opens
> > > > up its own connection to MariaDB, which ends up as a lot of
> > > > connections.
> > > >
> > > > My initial idea was to create a Redis DB next to each Squid
> > > > instance and connect each Squid to its own dedicated Redis. I
> > > > will sync Redis with MariaDB every minute, thus decreasing the
> > > > connection count from a few hundred to just 1 every minute. This
> > > > will also improve speeds since Redis is much faster than MariaDB.
> > > >
> > > > The problem is, however, that there will still be many
> > > > connections from Squid to Redis, and that will probably consume
> > > > a lot of DB resources as well, which I don't actually know how
> > > > to optimize, since it seems that Squid opens many processes, and
> > > > there is no way to get them to talk to each other (except TTL
> > > > values, which seem not to help in my case, though I don't
> > > > understand why).
> > > >
> > > > What is the best practice to handle this, considering I have the
> > > > following requirements:
> > > >
> > > > 1. Fast
> > > > 2. Refresh data every minute
> > > > 3. Consume the least amount of DB resources possible
> > >
> > > I would start from the beginning: Does the aggregate number of
> > > database requests match your expectations? In other words, do you
> > > see lots of database requests that should not be there given your
> > > user access patterns and authentication TTLs? In yet other words,
> > > are there many repeated authentication accesses that should have
> > > been authentication cache hits?
> > >
> > > If there are a lot more requests than your users/TTLs should
> > > generate, then you may be able to decrease db load by figuring out
> > > where the extra requests are coming from. For example, it is
> > > possible that your authentication cache key includes some noise
> > > that renders caching ineffective (e.g., see comments about
> > > key_extras in squid.conf.documented). Or maybe you need a bigger
> > > authentication cache.
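For context, the caching knobs being referred to live in squid.conf; a
hypothetical fragment (values and helper path made up):

```
# See squid.conf.documented for the authoritative option list.
auth_param basic program /usr/local/bin/auth_helper.py
auth_param basic children 20 startup=2 idle=1
auth_param basic credentialsttl 1 hour
# If key_extras is configured, each extra format code becomes part of
# the cache key; per-request noise there turns would-be hits into misses.
```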
> > >
> > > If the total stream of authentication requests during peak hours
> > > is reasonable, with few unwarranted cache misses, then you can
> > > start working on aggregating helper-db connections (helpers can be
> > > written to talk through a central connection aggregator) and/or
> > > adding database power (e.g., by introducing additional databases
> > > running on previously unused hardware -- just like your MariaDB
> > > idea).
> > >
> > >
> > > Cheers,
> > >
> > > Alex.
> > >
> >
>
>