<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You have at least two basic options:</blockquote><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">A. Enhance Squid to let SMP workers share helpers. I assume that you<br>have C SMP workers and N helpers per worker, with C and N significantly<br>greater than 1. Instead of having N helpers per worker and C*N helpers<br>total, you will have just one concurrent helper per worker and C helpers<br>total. This will be a significant, generally useful improvement that<br>should be officially accepted if implemented well. This enhancement<br>requires serious Squid code modifications in a neglected error-prone<br>area, but it is certainly doable -- Squid already shares rock diskers<br>across workers, for example.</blockquote><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">B. Convert your helper from a database client program to an Aggregator<br>client program (and write the Aggregator). Depending on your needs and<br>skill, you can use TCP or Unix Domain Sockets (UDS) for<br>helper-Aggregator communication. The Aggregator may look very similar to<br>the current helper, except it will not use stdin/stdout for<br>receiving/sending helper queries/responses. This option also requires<br>development, but it is much simpler than option A.</blockquote><div><br></div><div>Thank you, Alex, I will keep these in mind.</div><div><br></div><div>I thought about the following approach:</div><div><br></div><div>1. Have only one python helper, this helper fetches the data every minute from the main DB.</div><div>2. This helper has concurrency set for it.</div><div>3. The helper then spawns child processes using multithreading, each process responds to std/stdout and reads the data from the main process which spawned it.</div><div><br></div><div>What do you think about taking this route?</div><div><br></div><div>It will require no extra DBs and no tweaks to Squid, but maybe I am missing something,</div><div><br></div><div>Best regards,</div><div>Roee</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov <<a href="mailto:rousskov@measurement-factory.com">rousskov@measurement-factory.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2/8/22 09:50, roee klinger wrote:<br>

Best regards,
Roee

On Tue, Feb 8, 2022 at 5:12 PM Alex Rousskov
<rousskov@measurement-factory.com> wrote:

On 2/8/22 09:50, roee klinger wrote:

> Alex: If there are a lot more requests than your users/TTLs should
> generate, then you may be able to decrease db load by figuring out
> where the extra requests are coming from.

> actually, I don't think it matters much now that I think about it
> again, since as per my requirements, I need to reload the cache every
> 60 seconds, which means that even if it is perfect, MariaDB will
> still get a high load. I think the second approach will be better
> suited.

Your call. Wiping out the entire authentication cache every 60 seconds
feels odd, but I do not know enough about your environment to judge.


> Alex: aggregating helper-db connections (helpers can be written to
> talk through a central connection aggregator)

> That sounds like exactly what I am looking for; how would one go about
> doing this?

You have at least two basic options:

A. Enhance Squid to let SMP workers share helpers. I assume that you
have C SMP workers and N helpers per worker, with C and N significantly
greater than 1. Instead of having N helpers per worker and C*N helpers
total, you will have just one concurrent helper per worker and C helpers
total. This will be a significant, generally useful improvement that
should be officially accepted if implemented well. This enhancement
requires serious Squid code modifications in a neglected error-prone
area, but it is certainly doable -- Squid already shares rock diskers
across workers, for example.
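
For illustration, with a hypothetical configuration like this, C is 4 and
N is 20, so Squid ends up with 4*20 = 80 helper processes (and 80 database
connections) in total:

    workers 4
    auth_param basic program /usr/local/bin/auth_helper.py
    auth_param basic children 20

Under option A, the same setup would need only 4 helpers, one per worker.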

B. Convert your helper from a database client program to an Aggregator
client program (and write the Aggregator). Depending on your needs and
skill, you can use TCP or Unix Domain Sockets (UDS) for
helper-Aggregator communication. The Aggregator may look very similar to
the current helper, except it will not use stdin/stdout for
receiving/sending helper queries/responses. This option also requires
development, but it is much simpler than option A.
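
For a rough idea of what the helper side of option B could look like (an
untested sketch; the socket path and the newline-delimited request/response
protocol are assumptions, and the Aggregator itself is not shown):

    #!/usr/bin/env python3
    """Sketch of option B's helper side (untested).

    Each helper keeps one connection to a local Aggregator over a Unix
    Domain Socket and relays every query to it; the Aggregator (not shown)
    is the only process that talks to MariaDB. The socket path and the
    newline-delimited protocol are assumptions, not part of the thread.
    """

    import socket
    import sys

    AGGREGATOR_SOCKET = "/var/run/squid/aggregator.sock"   # hypothetical path


    def main():
        conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        conn.connect(AGGREGATOR_SOCKET)
        rfile = conn.makefile("r")
        wfile = conn.makefile("w")

        for line in sys.stdin:
            # "<channel-ID> <username> <password>" arrives from Squid;
            # forward it verbatim and relay the Aggregator's one-line answer.
            wfile.write(line)
            wfile.flush()
            reply = rfile.readline()
            if not reply:            # Aggregator went away
                break
            sys.stdout.write(reply)  # e.g. "<channel-ID> OK" or "<channel-ID> ERR"
            sys.stdout.flush()


    if __name__ == "__main__":
        main()

The Aggregator can then keep a small, fixed pool of database connections (or a
once-a-minute snapshot), no matter how many workers and helper processes Squid
starts.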

HTH,

Alex.

> On Tue, Feb 8, 2022 at 4:41 PM Alex Rousskov wrote:
>
> On 2/8/22 09:13, roee klinger wrote:
>
> > I am running multiple instances of Squid in a K8S environment. Each
> > Squid instance has a helper that authenticates users based on their
> > username and password; the scripts are written in Python.
> >
> > I have been facing an issue that, when under load, the helpers (even
> > with a 3600 sec TTL) swamp the MariaDB instance, causing it to reach
> > 100% CPU, basically, I believe, because each helper opens its own
> > connection to MariaDB, which ends up as a lot of connections.
> >
> > My initial idea was to create a Redis DB next to each Squid instance
> > and connect each Squid to its own dedicated Redis. I will sync Redis
> > with MariaDB every minute, thus decreasing the connection count from
> > a few hundred to just 1 every minute. This will also improve speeds
> > since Redis is much faster than MariaDB.
> >
> > The problem, however, is that there will still be many connections
> > from Squid to Redis, and that will probably consume a lot of DB
> > resources as well, which I don't actually know how to optimize, since
> > it seems that Squid opens many processes and there is no way to get
> > them to talk to each other (except TTL values, which do not seem to
> > help in my case, and I don't understand why that is).
> >
> > What is the best practice to handle this, considering I have the
> > following requirements:
> >
> > 1. Fast
> > 2. Refresh data every minute
> > 3. Consume the least amount of DB resources possible
>
> I would start from the beginning: Does the aggregate number of database
> requests match your expectations? In other words, do you see lots of
> database requests that should not be there given your user access
> patterns and authentication TTLs? In yet other words, are there many
> repeated authentication accesses that should have been authentication
> cache hits?
>
> If there are a lot more requests than your users/TTLs should generate,
> then you may be able to decrease db load by figuring out where the
> extra requests are coming from. For example, it is possible that your
> authentication cache key includes some noise that renders caching
> ineffective (e.g., see comments about key_extras in
> squid.conf.documented). Or maybe you need a bigger authentication cache.
>
> If the total stream of authentication requests during peak hours is
> reasonable, with few unwarranted cache misses, then you can start
> working on aggregating helper-db connections (helpers can be written to
> talk through a central connection aggregator) and/or adding database
> power (e.g., by introducing additional databases running on previously
> unused hardware -- just like your MariaDB idea).
>
>
> Cheers,
>
> Alex.
>