[squid-users] external helper development
David Touzeau
david at articatech.com
Tue Feb 8 00:36:44 UTC 2022
You are the best.
We will run a benchmark to see the difference.
On 07/02/2022 at 16:14, Eliezer Croitoru wrote:
>
> Hey David,
>
> Since handle_stdout runs in its own thread, its sole purpose is
> to send results to stdout.
>
> If I run the following code in a simple program without the
> time.sleep(0.5) call:
>
> while RUNNING:
>     if quit > 0:
>         return
>     while len(queue) > 0:
>         item = queue.pop(0)
>         sys.stdout.write(item)
>         sys.stdout.flush()
>     time.sleep(0.5)
>
> the program will run at 100% CPU, checking the size of the queue
> over and over while only occasionally writing some data to stdout.
>
> Adding a small delay of 0.5 seconds gives the loop some “idle” time
> and prevents it from consuming all the CPU time.
>
> It’s a very old technique, and there are others that are more
> efficient, but it’s enough to demonstrate that a simple threaded
> helper is much better than any PHP code that was not meant to run as
> a STDIN/STDOUT daemon/helper.
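
One of the "more efficient" techniques alluded to above is to block on a thread-safe queue instead of polling a list and sleeping. The following is only a minimal sketch, assuming the standard-library `queue.Queue`; the names `writer`, `out_queue`, `STOP` and `demo` are hypothetical and do not come from the gist, and an in-memory stream stands in for `sys.stdout` so the sketch can be run on its own.

```python
import io
import queue
import threading

STOP = None  # hypothetical sentinel that tells the writer thread to exit


def writer(out_queue, stream):
    """Write queued reply lines to stream, blocking while the queue is empty."""
    while True:
        item = out_queue.get()  # blocks inside get(); no busy loop, no sleep
        if item is STOP:
            break
        stream.write(item)
        stream.flush()


def demo():
    """Push two replies through the writer thread and return what it wrote."""
    out_queue = queue.Queue()
    stream = io.StringIO()  # stands in for sys.stdout in this sketch
    t = threading.Thread(target=writer, args=(out_queue, stream))
    t.start()
    out_queue.put("0 OK\n")
    out_queue.put("1 OK\n")
    out_queue.put(STOP)
    t.join()
    return stream.getvalue()
```

In a real helper, `stream` would be `sys.stdout` and the worker threads would call `out_queue.put(...)` where the original code calls `queue.append(...)`. Replies are then written as soon as they are ready rather than up to half a second later.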
>
> All the best,
>
> Eliezer
>
> ----
>
> Eliezer Croitoru
>
> NgTech, Tech Support
>
> Mobile: +972-5-28704261
>
> Email: ngtech1ltd at gmail.com
>
> From: David Touzeau <david at articatech.com>
> Sent: Monday, February 7, 2022 02:42
> To: Eliezer Croitoru <ngtech1ltd at gmail.com>;
> squid-users at lists.squid-cache.org
> Subject: Re: [squid-users] external helper development
>
> Sorry Eliezer,
>
> It was a mistake. No, your code is clean.
> Impressive for a first attempt.
> Many thanks for your example; we will run our stress tool to see the
> difference.
>
> Just one question:
>
> Why did you put a 500-millisecond sleep in handle_stdout? Is it to
> let Squid close the pipe?
>
>
> On 06/02/2022 at 11:46, Eliezer Croitoru wrote:
>
> Hey David,
>
> This is not a fully complete helper, but it seems to work pretty
> well and might be better than what exists already:
>
> https://gist.githubusercontent.com/elico/03938e3a796c53f7c925872bade78195/raw/21ff1bbc0cf3d91719db27d9d027652e8bd3de4e/threaded-helper-example.py
>
> #!/usr/bin/env python
>
> import sys
> import time
> import urllib.request
> import signal
> import threading
>
> #set debug mode for True or False
> debug = False
> #debug = True
>
> queue = []
> threads = []
>
> RUNNING = True
>
> quit = 0
>
> rand_api_url = "https://cloud1.ngtech.co.il/api/test.php"
>
> def sig_handler(signum, frame):
>     sys.stderr.write("Signal is received:" + str(signum) + "\n")
>     global quit
>     quit = 1
>     global RUNNING
>     RUNNING=False
>
> def handle_line(line):
>     if not RUNNING:
>         return
>     if not line:
>         return
>     if quit > 0:
>         return
>
>     arr = line.split()
>     response = urllib.request.urlopen( rand_api_url )
>     response_text = response.read()
>
>     queue.append(arr[0] + " " + response_text.decode("utf-8"))
>
> def handle_stdout(n):
>     while RUNNING:
>         if quit > 0:
>             return
>         while len(queue) > 0:
>             item = queue.pop(0)
>             sys.stdout.write(item)
>             sys.stdout.flush()
>         time.sleep(0.5)
>
> def handle_stdin(n):
>     while RUNNING:
>         line = sys.stdin.readline()
>         if not line:
>             break
>         if quit > 0:
>             break
>
>         line = line.strip()
>         thread = threading.Thread(target=handle_line, args=(line,))
>         thread.start()
>         threads.append(thread)
>
> signal.signal(signal.SIGUSR1, sig_handler)
> signal.signal(signal.SIGUSR2, sig_handler)
> signal.signal(signal.SIGALRM, sig_handler)
> signal.signal(signal.SIGINT, sig_handler)
> signal.signal(signal.SIGQUIT, sig_handler)
> signal.signal(signal.SIGTERM, sig_handler)
>
> stdout_thread = threading.Thread(target=handle_stdout, args=(1,))
> stdout_thread.start()
> threads.append(stdout_thread)
>
> stdin_thread = threading.Thread(target=handle_stdin, args=(2,))
> stdin_thread.start()
> threads.append(stdin_thread)
>
> while(RUNNING):
>     time.sleep(3)
>
> print("Not RUNNING")
>
> for thread in threads:
>     thread.join()
>
> print("All threads stopped.")
>
> ## END
>
> Eliezer
>
> From: squid-users <squid-users-bounces at lists.squid-cache.org>
> On Behalf Of David Touzeau
> *Sent:* Friday, February 4, 2022 16:29
> *To:* squid-users at lists.squid-cache.org
> *Subject:* Re: [squid-users] external helper development
>
> Eliezer,
>
> Thanks for all this advice. Your arguments are indeed valid:
> opening a socket, sending data, receiving data and closing the
> socket costs more than direct access to a regex or an in-memory
> entry, even when the calculation has already been done.
>
> But what surprises me most is the result from a threaded Python
> plugin we wrote; its code is below.
> The PHP code is like the example you mentioned (no threads, just a
> loop that outputs OK).
>
> The results: with the Python helper, Squid freezes after 6k
> requests and browsing is no longer possible, whereas with the PHP
> code we can go up to 10k requests and Squid is happy.
> We really do not understand why Python is so slow.
>
> Here is the Python code using threads:
>
> #!/usr/bin/env python
> import os
> import sys
> import time
> import signal
> import locale
> import traceback
> import threading
> import select
> import traceback as tb
>
> class ClienThread():
>
>     def __init__(self):
>         self._exiting = False
>         self._cache = {}
>
>     def exit(self):
>         self._exiting = True
>
>     def stdout(self, lineToSend):
>         try:
>             sys.stdout.write(lineToSend)
>             sys.stdout.flush()
>
>         except IOError as e:
>             if e.errno==32:
>                 # Broken pipe error
>                 pass
>         except:
>             # any other exception
>             pass
>
>     def run(self):
>         while not self._exiting:
>             if sys.stdin in select.select([sys.stdin], [], [], 0.5)[0]:
>                 line = sys.stdin.readline()
>                 LenOfline=len(line)
>
>                 if LenOfline==0:
>                     self._exiting=True
>                     break
>
>                 if line[-1] == '\n':line = line[:-1]
>                 channel = None
>                 options = line.split()
>
>                 try:
>                     if options[0].isdigit(): channel = options.pop(0)
>                 except IndexError:
>                     self.stdout("0 OK first=ERROR\n")
>                     continue
>
>                 # Processing here
>
>                 try:
>                     self.stdout("%s OK\n" % channel)
>                 except:
>                     self.stdout("%s ERROR first=ERROR\n" % channel)
>
>
> class Main(object):
>     def __init__(self):
>         self._threads = []
>         self._exiting = False
>         self._reload = False
>         self._config = ""
>
>         for sig, action in (
>             (signal.SIGINT, self.shutdown),
>             (signal.SIGQUIT, self.shutdown),
>             (signal.SIGTERM, self.shutdown),
>             (signal.SIGHUP, lambda s, f: setattr(self, '_reload', True)),
>             (signal.SIGPIPE, signal.SIG_IGN),
>         ):
>             try:
>                 signal.signal(sig, action)
>             except AttributeError:
>                 pass
>
>     def shutdown(self, sig = None, frame = None):
>         self._exiting = True
>         self.stop_threads()
>
>     def start_threads(self):
>         sThread = ClienThread()
>         t = threading.Thread(target = sThread.run)
>         t.start()
>         self._threads.append((sThread, t))
>
>     def stop_threads(self):
>         for p, t in self._threads:
>             p.exit()
>         for p, t in self._threads:
>             t.join(timeout = 1.0)
>         self._threads = []
>
>     def run(self):
>         """ main loop """
>         ret = 0
>         self.start_threads()
>         return ret
>
>
> if __name__ == '__main__':
>     # set C locale
>     locale.setlocale(locale.LC_ALL, 'C')
>     os.environ['LANG'] = 'C'
>     ret = 0
>     try:
>         main = Main()
>         ret = main.run()
>     except SystemExit:
>         pass
>     except KeyboardInterrupt:
>         ret = 4
>     except:
>         sys.exit(ret)
>
> On 04/02/2022 at 07:06, Eliezer Croitoru wrote:
>
> As for the per-helper cache: the cost of a cache in a single helper
> is small in terms of memory compared with the cost of network
> access.
>
> Again, it’s possible to test and verify this on a loaded system
> to get results. The delay itself can be seen from the Squid side
> in the cache manager statistics.
>
> You can also compare against the following Ruby helper:
>
> https://wiki.squid-cache.org/EliezerCroitoru/SessionHelper
>
> As for a shared “base” that lets helpers avoid recomputing the
> answer to a query: it’s a good argument, but it depends on the cost
> of pulling from the cache compared with calculating the answer.
>
> A very simple string comparison or regex match would probably be
> faster than reaching a shared storage in many cases.
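
To illustrate the point about local checks, here is a minimal sketch of a per-helper in-memory cache placed in front of a regex match, with no network round trip to a shared store. The rule `BLOCKED`, the function `verdict`, the `calls` counter and the returned strings are all hypothetical, chosen only for this example.

```python
import functools
import re

# Hypothetical rule: match "blocked.example" and any of its subdomains.
BLOCKED = re.compile(r"(^|\.)blocked\.example$")

# Counts how often the real check runs (for demonstration only).
calls = {"count": 0}


@functools.lru_cache(maxsize=10000)
def verdict(host):
    """Return 'ERR' for hosts matching BLOCKED, 'OK' otherwise.

    Results are cached in this helper process, so a repeated host
    costs a dictionary lookup instead of a new regex match.
    """
    calls["count"] += 1
    return "ERR" if BLOCKED.search(host) else "OK"
```

Calling `verdict("ads.blocked.example")` twice runs the regex only once; the second call is served from the process-local cache. Whether this actually beats a shared store for a given workload still has to be measured, as noted above.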
>
> Also take into account “concurrency” support on the helper side.
>
> A helper that supports parallel processing of requests/lines
> can do better than many single helpers in more than one use case.
>
> In any case, I would suggest enabling request concurrency on the
> Squid side, since the STDIN buffer will emulate some level of
> concurrency by itself and will allow Squid to keep moving forward
> faster.
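
When concurrency is enabled for a helper, Squid prefixes each request line with a numeric channel-ID and expects the reply to start with the same ID, which is what allows replies to be sent back out of order. A minimal sketch of that parsing follows; `split_channel` and `reply` are hypothetical helper names, and the sketch assumes the rest of the line is handled elsewhere.

```python
def split_channel(line):
    """Split 'channel-ID rest...' into (channel, rest).

    channel is None when the line does not start with a numeric ID,
    which is what a helper sees when concurrency is disabled.
    """
    stripped = line.strip()
    parts = stripped.split(None, 1)
    if parts and parts[0].isdigit():
        rest = parts[1] if len(parts) > 1 else ""
        return parts[0], rest
    return None, stripped


def reply(channel, result):
    """Build a reply line, echoing the channel-ID when there is one."""
    if channel is None:
        return result + "\n"
    return "%s %s\n" % (channel, result)
```

For example, `split_channel("12 http://example.com/ -\n")` yields `("12", "http://example.com/ -")`, and `reply("12", "OK")` yields `"12 OK\n"`. A request whose first field happens to be numeric would be misread when concurrency is off, so a real helper should know which mode it runs in.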
>
> Just to mention that SquidGuard has used a per-helper cache for a
> very long time, i.e. every single SquidGuard helper has its own
> copy of the whole configuration and database files in memory.
>
> And again, if you have the option of implementing a server/service
> model in which the helpers contact a main service, you will be able
> to implement a much faster internal in-memory cache compared to a
> redis/memcached/other external daemon (this needs to be tested).
>
> A good example of this is ufdbguard, whose helpers are clients of
> the main service, which does all the heavy lifting and also holds
> one copy of the DB.
>
> I have implemented SquidBlocker this way and have seen that it
> outperforms any other service I have tried so far.
>
>