[squid-users] external helper development

Tue Feb 8 00:36:44 UTC 2022

You are the best.
We will run a benchmark to see the difference.

On 07/02/2022 at 16:14, Eliezer Croitoru wrote:
>
> Hey David,
>
> Since handle_stdout runs in its own thread, its sole purpose is
> to send results to stdout.
>
> If I run the following code in a simple program without the 0.5 second
> sleep:
>
>     while RUNNING:
>         if quit > 0:
>             return
>         while len(queue) > 0:
>             item = queue.pop(0)
>             sys.stdout.write(item)
>             sys.stdout.flush()
>         time.sleep(0.5)
>
> what will happen is that the program will run at 100% CPU, looping
> over and over checking the size of the queue,
> while only occasionally writing some data to stdout.
>
> Adding a small delay of 0.5 seconds gives the CPU some “idle” time
> in the loop, preventing it from consuming
> all the CPU time.
>
> It’s a very old technique, and there are others which are more
> efficient, but it’s enough to demonstrate that a simple
> threaded helper is much better than any PHP code that was not meant to
> run as a STDIN/STDOUT daemon/helper.
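
One of the more efficient techniques alluded to above is a blocking queue: the writer thread waits inside `queue.Queue.get()` until an item arrives, so it neither spins at 100% CPU nor adds up to 0.5 seconds of latency. This is only a minimal sketch, assuming the standard-library `queue.Queue`; the names `results`, `STOP` and `writer` are hypothetical and not part of the original helper.

```python
import queue
import sys
import threading

# Hypothetical names for illustration; not part of the original helper.
results = queue.Queue()   # thread-safe FIFO shared with the worker threads
STOP = object()           # sentinel telling the writer thread to exit


def writer(out=None):
    """Write queued result lines to `out` (stdout by default) until STOP arrives."""
    out = sys.stdout if out is None else out
    while True:
        item = results.get()   # blocks without burning CPU until an item exists
        if item is STOP:
            return
        out.write(item)
        out.flush()


if __name__ == "__main__":
    t = threading.Thread(target=writer)
    t.start()
    results.put("0 OK\n")
    results.put("1 OK\n")
    results.put(STOP)          # ask the writer to finish
    t.join()
```

Compared with the polling loop, this replaces the fixed sleep with a wake-up on each `put()`, at the cost of needing a sentinel (or a timeout on `get()`) to shut the thread down.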
>
> All the best,
>
> Eliezer
>
> ----
>
> Eliezer Croitoru
>
> NgTech, Tech Support
>
> Mobile: +972-5-28704261
>
> Email: ngtech1ltd at gmail.com
>
> *From:* David Touzeau <david at articatech.com>
> *Sent:* Monday, February 7, 2022 02:42
> *To:* Eliezer Croitoru <ngtech1ltd at gmail.com>; 
> squid-users at lists.squid-cache.org
> *Subject:* Re: [squid-users] external helper development
>
> Sorry Eliezer
>
> It was a mistake... No, your code is clean.
> Impressive for a first shot.
> Many thanks for your example, we will run our stress tool to see the
> difference...
>
> Just a question:
>
> Why did you put a 500 millisecond sleep in handle_stdout? Is
> it to let Squid close the pipe?
>
>
> On 06/02/2022 at 11:46, Eliezer Croitoru wrote:
>
>     Hey David,
>
>     Not a fully completed helper, but it seems to work pretty nicely and
>     might be better than what exists already:
>
>     https://gist.githubusercontent.com/elico/03938e3a796c53f7c925872bade78195/raw/21ff1bbc0cf3d91719db27d9d027652e8bd3de4e/threaded-helper-example.py
>
>     #!/usr/bin/env python
>     import sys
>     import time
>     import urllib.request
>     import signal
>     import threading
>
>     # set debug mode to True or False
>     debug = False
>     #debug = True
>
>     queue = []
>     threads = []
>
>     RUNNING = True
>     quit = 0
>
>     rand_api_url = "https://cloud1.ngtech.co.il/api/test.php"
>
>     def sig_handler(signum, frame):
>         sys.stderr.write("Signal is received:" + str(signum) + "\n")
>         global quit
>         quit = 1
>         global RUNNING
>         RUNNING = False
>
>     def handle_line(line):
>         if not RUNNING:
>             return
>         if not line:
>             return
>         if quit > 0:
>             return
>
>         arr = line.split()
>         response = urllib.request.urlopen(rand_api_url)
>         response_text = response.read()
>         # Echo the first token of the request back in front of the answer
>         queue.append(arr[0] + " " + response_text.decode("utf-8"))
>
>     def handle_stdout(n):
>         while RUNNING:
>             if quit > 0:
>                 return
>             while len(queue) > 0:
>                 item = queue.pop(0)
>                 sys.stdout.write(item)
>                 sys.stdout.flush()
>             time.sleep(0.5)
>
>     def handle_stdin(n):
>         while RUNNING:
>             line = sys.stdin.readline()
>             if not line:
>                 break
>             if quit > 0:
>                 break
>             line = line.strip()
>             thread = threading.Thread(target=handle_line, args=(line,))
>             thread.start()
>             threads.append(thread)
>
>     signal.signal(signal.SIGUSR1, sig_handler)
>     signal.signal(signal.SIGUSR2, sig_handler)
>     signal.signal(signal.SIGALRM, sig_handler)
>     signal.signal(signal.SIGINT, sig_handler)
>     signal.signal(signal.SIGQUIT, sig_handler)
>     signal.signal(signal.SIGTERM, sig_handler)
>
>     stdout_thread = threading.Thread(target=handle_stdout, args=(1,))
>     stdout_thread.start()
>     threads.append(stdout_thread)
>
>     stdin_thread = threading.Thread(target=handle_stdin, args=(2,))
>     stdin_thread.start()
>     threads.append(stdin_thread)
>
>     while(RUNNING):
>         time.sleep(3)
>
>     print("Not RUNNING")
>
>     for thread in threads:
>         thread.join()
>
>     print("All threads stopped.")
>     ## END
>
>     Eliezer
>
>     *From:* squid-users <squid-users-bounces at lists.squid-cache.org>
>     *On Behalf Of* David Touzeau
>     *Sent:* Friday, February 4, 2022 16:29
>     *To:* squid-users at lists.squid-cache.org
>     *Subject:* Re: [squid-users] external helper development
>
>     Eliezer,
>
>     Thanks for all this advice. Your arguments are indeed valid:
>     opening a socket, sending data, receiving data and closing
>     the socket costs more than direct access to a regex or a memory
>     entry, even if the calculation has already been done.
>
>     But what surprises me the most is that we have written a threaded
>     Python plugin, whose code I provide below.
>     The PHP code is like the example you mentioned (no threads, just a
>     loop that outputs OK).
>
>     The results: after 6k requests Squid freezes and no browsing is
>     possible, whereas with the PHP code we can go up to 10k requests and
>     Squid is happy. We really do not understand why Python is so slow.
>
>     Here is the Python code using threads:
>
>     #!/usr/bin/env python
>     import os
>     import sys
>     import time
>     import signal
>     import locale
>     import traceback
>     import threading
>     import select
>     import traceback as tb
>
>     class ClienThread():
>
>         def __init__(self):
>             self._exiting = False
>             self._cache = {}
>
>         def exit(self):
>             self._exiting = True
>
>         def stdout(self, lineToSend):
>             try:
>                 sys.stdout.write(lineToSend)
>                 sys.stdout.flush()
>
>             except IOError as e:
>                 if e.errno==32:
>                     # Broken pipe (EPIPE)
>                     pass
>             except:
>                 # any other exception
>                 pass
>
>         def run(self):
>             while not self._exiting:
>                 if sys.stdin in select.select([sys.stdin], [], [], 0.5)[0]:
>                     line = sys.stdin.readline()
>                     LenOfline=len(line)
>
>                     if LenOfline==0:
>                         self._exiting=True
>                         break
>
>                     if line[-1] == '\n':line = line[:-1]
>                     channel = None
>                     options = line.split()
>
>                     try:
>                         if options[0].isdigit(): channel = options.pop(0)
>                     except IndexError:
>                         self.stdout("0 OK first=ERROR\n")
>                         continue
>
>                     # Processing here
>
>                     try:
>                         self.stdout("%s OK\n" % channel)
>                     except:
>                         self.stdout("%s ERROR first=ERROR\n" % channel)
>
>
>
>
>     class Main(object):
>         def __init__(self):
>             self._threads = []
>             self._exiting = False
>             self._reload = False
>             self._config = ""
>
>             for sig, action in (
>                 (signal.SIGINT, self.shutdown),
>                 (signal.SIGQUIT, self.shutdown),
>                 (signal.SIGTERM, self.shutdown),
>                 (signal.SIGHUP, lambda s, f: setattr(self, '_reload', True)),
>                 (signal.SIGPIPE, signal.SIG_IGN),
>             ):
>                 try:
>                     signal.signal(sig, action)
>                 except AttributeError:
>                     pass
>
>
>
>         def shutdown(self, sig = None, frame = None):
>             self._exiting = True
>             self.stop_threads()
>
>         def start_threads(self):
>
>             sThread = ClienThread()
>             t = threading.Thread(target = sThread.run)
>             t.start()
>             self._threads.append((sThread, t))
>
>
>
>         def stop_threads(self):
>             for p, t in self._threads:
>                 p.exit()
>             for p, t in self._threads:
>                 t.join(timeout =  1.0)
>             self._threads = []
>
>         def run(self):
>             """ main loop """
>             ret = 0
>             self.start_threads()
>             return ret
>
>
>     if __name__ == '__main__':
>         # set C locale
>         locale.setlocale(locale.LC_ALL, 'C')
>         os.environ['LANG'] = 'C'
>         ret = 0
>         try:
>             main = Main()
>             ret = main.run()
>         except SystemExit:
>             pass
>         except KeyboardInterrupt:
>             ret = 4
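>         except Exception:
>             # The handler body is missing in the original message;
>             # printing the traceback here is an assumption.
>             traceback.print_exc()
>         sys.exit(ret)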
>
>     On 04/02/2022 at 07:06, Eliezer Croitoru wrote:
>
>         As for a cache in each helper: the cost of a cache in a
>         single helper is small in terms of memory compared to some
>         network access.
>
>         Again, it’s possible to test and verify this on a loaded system
>         to get results. The delay itself can be seen from the Squid side
>         in the cache manager statistics.
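
A per-helper in-memory cache of the kind described above could be sketched with the standard-library `functools.lru_cache`. The function `slow_lookup` and its artificial delay are placeholders standing in for a network query; they are assumptions for illustration and not part of any real helper.

```python
import functools
import time


def slow_lookup(key):
    """Hypothetical stand-in for a network or database query."""
    time.sleep(0.01)          # simulated network round trip
    return "OK" if key.endswith(".example.com") else "ERR"


@functools.lru_cache(maxsize=10000)   # bounded, so memory per helper stays small
def cached_lookup(key):
    """Return the cached answer, calling slow_lookup only on a cache miss."""
    return slow_lookup(key)


if __name__ == "__main__":
    for _ in range(1000):
        cached_lookup("www.example.com")
    # cache_info() reports hits and misses, which helps judge whether
    # the cache is actually saving round trips.
    print(cached_lookup.cache_info())
```

Whether such a cache pays off depends on the hit rate under real traffic, which is why measuring on a loaded system remains necessary.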
>
>         You can also compare against the following Ruby helper:
>
>         https://wiki.squid-cache.org/EliezerCroitoru/SessionHelper
>
>         About a shared “base” which allows helpers to avoid
>         computing the query… It’s a good argument; however, it
>         depends on the cost of
>         pulling from the cache compared to calculating the answer.
>
>         A very simple string comparison or regex match would
>         probably be faster than reaching shared storage in many cases.
>
>         Also take into account “concurrency” support on the
>         helper side.
>
>         A helper that supports parallel processing of requests/lines
>         can do better than many single helpers in more than one use case.
>
>         In any case I would suggest enabling request concurrency
>         on the Squid side, since the STDIN buffer will emulate some level
>         of concurrency
>         by itself and will allow Squid to keep moving forward faster.
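
A helper that handles several lines in parallel could be sketched as follows. With concurrency enabled, Squid prefixes each request line with a channel ID, and the helper echoes that ID at the start of its reply so that answers can come back out of order. This sketch assumes that line format; `decide` is a hypothetical placeholder for the real check, and the pool size of 8 is arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

write_lock = threading.Lock()   # keeps reply lines from interleaving on output


def decide(payload):
    """Hypothetical placeholder for the real lookup (regex, cache, API call)."""
    return "OK" if payload else "ERR"


def handle(line, out):
    """Answer one 'channel-ID payload' line, echoing the channel ID back."""
    channel, _, payload = line.strip().partition(" ")
    reply = "%s %s\n" % (channel, decide(payload))
    with write_lock:
        out.write(reply)
        out.flush()


def serve(inp, out, workers=8):
    """Read request lines until EOF and answer them on a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in inp:
            if line.strip():
                pool.submit(handle, line, out)
    # Leaving the 'with' block waits for all pending replies to be written.
```

In a real helper the entry point would be `serve(sys.stdin, sys.stdout)`. A bounded pool is used here instead of one thread per line so that a burst of requests cannot create an unbounded number of threads.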
>
>         Just to mention that SquidGuard has used a per-helper
>         cache for a very long time, i.e. every single SquidGuard helper
>         has its own copy of the whole
>         configuration and database files in memory.
>
>         And again, if you have the option to implement a server
>         service model in which the helpers contact this main
>         service, you will be able to implement a
>         much faster internal in-memory cache compared to a
>         Redis/memcached/other external daemon (needs to be tested).
>
>         A good example of this is ufdbguard, whose helpers
>         are clients of the main service, which does all the heavy
>         lifting and also holds
>         one copy of the DB.
>
>         I have implemented SquidBlocker this way and have seen that it
>         out-performs any other service I have tried until now.
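
The service model described above (one main service holding a single in-memory copy of the data, with helpers acting as thin clients) could be sketched like this. It only illustrates the pattern and does not reflect how ufdbguard or SquidBlocker are actually implemented; `BLOCKED`, `LookupHandler`, `start_service` and `HelperClient` are hypothetical names, and the line protocol is made up.

```python
import socket
import socketserver
import threading

# The single shared in-memory "database", held only by the main service.
# The entries are made-up examples.
BLOCKED = {"bad.example.com", "ads.example.net"}


class LookupHandler(socketserver.StreamRequestHandler):
    """Answer one hostname per line with OK or ERR."""

    def handle(self):
        for raw in self.rfile:
            host = raw.decode("utf-8").strip()
            answer = "ERR" if host in BLOCKED else "OK"
            self.wfile.write((answer + "\n").encode("utf-8"))


def start_service(host="127.0.0.1", port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LookupHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class HelperClient:
    """Thin client a helper would use; keeps one connection open for reuse."""

    def __init__(self, address):
        self.sock = socket.create_connection(address)
        self.reader = self.sock.makefile("r", encoding="utf-8")

    def lookup(self, hostname):
        self.sock.sendall((hostname + "\n").encode("utf-8"))
        return self.reader.readline().strip()

    def close(self):
        self.reader.close()
        self.sock.close()


if __name__ == "__main__":
    service = start_service()
    client = HelperClient(service.server_address)
    print(client.lookup("bad.example.com"), client.lookup("www.example.org"))
    client.close()
    service.shutdown()
    service.server_close()
```

Keeping the connection open avoids paying the connect and close cost on every request, which is the overhead the per-request socket approach suffers from.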
>
>
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users