[squid-users] external helper development
Eliezer Croitoru
ngtech1ltd at gmail.com
Mon Feb 7 15:14:55 UTC 2022
Hey David,
Since handle_stdout runs in its own thread, its sole purpose is to send results to stdout.
If I run the following loop in a simple program without the 0.5-second sleep (the last line):
while RUNNING:
    if quit > 0:
        return
    while len(queue) > 0:
        item = queue.pop(0)
        sys.stdout.write(item)
        sys.stdout.flush()
    time.sleep(0.5)
the program will run at 100% CPU, looping over and over on the queue-size check,
while only occasionally writing some data to stdout.
Adding a small delay of 0.5 seconds gives the CPU some “idle” time in the loop and prevents it from consuming
all the CPU time.
It’s a very old technique, and there are others that are more efficient, but it’s enough to demonstrate that a simple
threaded helper is much better than any PHP code that was not meant to run as a STDIN/STDOUT daemon/helper.
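One of the more efficient alternatives alluded to above is a blocking queue: the writer thread sleeps inside the queue's get() call until a reply is available, so it needs neither a busy loop nor a fixed delay. The following is only a minimal sketch of that idea, not the code from the gist; the names STOP, writer and demo are hypothetical, and an in-memory stream stands in for sys.stdout so the example can be run on its own.

```python
import io
import queue
import threading

# Hypothetical sentinel that tells the writer thread to stop.
STOP = object()


def writer(out_queue, stream):
    """Write queued replies to stream, blocking while the queue is empty."""
    while True:
        item = out_queue.get()  # blocks here without using CPU time
        if item is STOP:
            break
        stream.write(item)
        stream.flush()


def demo():
    """Push two replies through the writer thread and return what it wrote."""
    out_queue = queue.Queue()
    stream = io.StringIO()  # stands in for sys.stdout in this sketch
    thread = threading.Thread(target=writer, args=(out_queue, stream))
    thread.start()
    out_queue.put("0 OK\n")
    out_queue.put("1 OK\n")
    out_queue.put(STOP)
    thread.join()
    return stream.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

In a real helper the worker threads would call out_queue.put() with their replies and the signal handler would put STOP on the queue to shut the writer down.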
All the best,
Eliezer
----
Eliezer Croitoru
NgTech, Tech Support
Mobile: +972-5-28704261
Email: ngtech1ltd at gmail.com
From: David Touzeau <david at articatech.com>
Sent: Monday, February 7, 2022 02:42
To: Eliezer Croitoru <ngtech1ltd at gmail.com>; squid-users at lists.squid-cache.org
Subject: Re: [squid-users] external helper development
Sorry Eliezer,
It was a mistake... No, your code is clean.
Impressive for a first shot.
Many thanks for your example; we will run our stress tool to see the difference...
Just a question:
Why did you put a 500-millisecond sleep in handle_stdout? Is it to let Squid close the pipe?
On 06/02/2022 at 11:46, Eliezer Croitoru wrote:
Hey David,
It is not a fully completed helper, but it seems to work pretty nicely and might be better than what exists already:
https://gist.githubusercontent.com/elico/03938e3a796c53f7c925872bade78195/raw/21ff1bbc0cf3d91719db27d9d027652e8bd3de4e/threaded-helper-example.py
#!/usr/bin/env python
import sys
import time
import urllib.request
import signal
import threading

# set debug mode to True or False
debug = False
#debug = True

queue = []
threads = []
RUNNING = True
quit = 0
rand_api_url = "https://cloud1.ngtech.co.il/api/test.php"

def sig_handler(signum, frame):
    sys.stderr.write("Signal is received:" + str(signum) + "\n")
    global quit
    quit = 1
    global RUNNING
    RUNNING = False

def handle_line(line):
    # Runs in its own thread: one thread per request line.
    if not RUNNING:
        return
    if not line:
        return
    if quit > 0:
        return
    arr = line.split()
    response = urllib.request.urlopen(rand_api_url)
    response_text = response.read()
    # arr[0] is the first token of the request line; it is echoed back in front of the reply.
    queue.append(arr[0] + " " + response_text.decode("utf-8"))

def handle_stdout(n):
    # Writer thread: drains the queue to stdout, then sleeps to avoid a busy loop.
    while RUNNING:
        if quit > 0:
            return
        while len(queue) > 0:
            item = queue.pop(0)
            # No newline is added here, so the reply ends with one only if the API response does.
            sys.stdout.write(item)
            sys.stdout.flush()
        time.sleep(0.5)

def handle_stdin(n):
    # Reader thread: starts a worker thread for every line received on stdin.
    while RUNNING:
        line = sys.stdin.readline()
        if not line:
            break
        if quit > 0:
            break
        line = line.strip()
        thread = threading.Thread(target=handle_line, args=(line,))
        thread.start()
        threads.append(thread)

signal.signal(signal.SIGUSR1, sig_handler)
signal.signal(signal.SIGUSR2, sig_handler)
signal.signal(signal.SIGALRM, sig_handler)
signal.signal(signal.SIGINT, sig_handler)
signal.signal(signal.SIGQUIT, sig_handler)
signal.signal(signal.SIGTERM, sig_handler)

stdout_thread = threading.Thread(target=handle_stdout, args=(1,))
stdout_thread.start()
threads.append(stdout_thread)

stdin_thread = threading.Thread(target=handle_stdin, args=(2,))
stdin_thread.start()
threads.append(stdin_thread)

while RUNNING:
    time.sleep(3)

print("Not RUNNING")
for thread in threads:
    thread.join()
print("All threads stopped.")
## END
Eliezer
From: squid-users <squid-users-bounces at lists.squid-cache.org> On Behalf Of David Touzeau
Sent: Friday, February 4, 2022 16:29
To: squid-users at lists.squid-cache.org
Subject: Re: [squid-users] external helper development
Eliezer,
Thanks for all this advice. Your arguments are indeed valid: opening a socket, sending data, receiving data and closing the socket costs more than direct access to a regex or an in-memory entry, even if the calculation has already been done.
But what surprises me the most is what happened with a threaded Python plugin we wrote, whose code I provide below.
The PHP code is like the example you mentioned (no threads, just a loop that outputs OK).
The results: after 6k requests, Squid freezes and no browsing is possible, whereas with the PHP code we can reach 10k requests and Squid is happy.
We really do not understand why Python is so slow.
Here is the Python code using threads:
#!/usr/bin/env python
import os
import sys
import time
import signal
import locale
import traceback
import threading
import select
import traceback as tb

class ClienThread():
    def __init__(self):
        self._exiting = False
        self._cache = {}

    def exit(self):
        self._exiting = True

    def stdout(self, lineToSend):
        try:
            sys.stdout.write(lineToSend)
            sys.stdout.flush()
        except IOError as e:
            if e.errno==32:
                # Error Broken PIPE!"
                pass
        except:
            # other execpt
            pass

    def run(self):
        while not self._exiting:
            if sys.stdin in select.select([sys.stdin], [], [], 0.5)[0]:
                line = sys.stdin.readline()
                LenOfline=len(line)
                if LenOfline==0:
                    self._exiting=True
                    break
                if line[-1] == '\n':line = line[:-1]
                channel = None
                options = line.split()
                try:
                    if options[0].isdigit(): channel = options.pop(0)
                except IndexError:
                    self.stdout("0 OK first=ERROR\n")
                    continue
                # Processing here
                try:
                    self.stdout("%s OK\n" % channel)
                except:
                    self.stdout("%s ERROR first=ERROR\n" % channel)

class Main(object):
    def __init__(self):
        self._threads = []
        self._exiting = False
        self._reload = False
        self._config = ""
        for sig, action in (
            (signal.SIGINT, self.shutdown),
            (signal.SIGQUIT, self.shutdown),
            (signal.SIGTERM, self.shutdown),
            (signal.SIGHUP, lambda s, f: setattr(self, '_reload', True)),
            (signal.SIGPIPE, signal.SIG_IGN),
        ):
            try:
                signal.signal(sig, action)
            except AttributeError:
                pass

    def shutdown(self, sig = None, frame = None):
        self._exiting = True
        self.stop_threads()

    def start_threads(self):
        sThread = ClienThread()
        t = threading.Thread(target = sThread.run)
        t.start()
        self._threads.append((sThread, t))

    def stop_threads(self):
        for p, t in self._threads:
            p.exit()
        for p, t in self._threads:
            t.join(timeout = 1.0)
        self._threads = []

    def run(self):
        """ main loop """
        ret = 0
        self.start_threads()
        return ret

if __name__ == '__main__':
    # set C locale
    locale.setlocale(locale.LC_ALL, 'C')
    os.environ['LANG'] = 'C'
    ret = 0
    try:
        main = Main()
        ret = main.run()
    except SystemExit:
        pass
    except KeyboardInterrupt:
        ret = 4
    except:
        sys.exit(ret)
On 04/02/2022 at 07:06, Eliezer Croitoru wrote:
As for a cache in each helper: the memory cost of a cache in a single helper is small compared with the cost of network access.
Again, it’s possible to test and verify this on a loaded system to get results. The delay itself can be seen from the Squid side in the cache manager statistics.
You can also compare it with the following Ruby helper:
https://wiki.squid-cache.org/EliezerCroitoru/SessionHelper
About a shared “base” that lets helpers avoid computing the answer to a query: it’s a good argument, but it depends on the cost of
pulling from the cache compared with calculating the answer.
A very simple string comparison or regex match would probably be faster than reaching a shared storage in many cases.
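To illustrate the trade-off, here is a minimal sketch of a per-helper in-memory cache wrapped around a cheap regex check. It is an illustration only: the rule BLOCKED, the function verdict and the cache size are hypothetical and not taken from any existing helper. A lookup that hits the local cache never leaves the process, which is the cost a shared external store would have to beat.

```python
import functools
import re

# Hypothetical rule: block blocked.example and all of its subdomains.
BLOCKED = re.compile(r"(^|\.)blocked\.example$")


@functools.lru_cache(maxsize=10000)
def verdict(host):
    """Return the helper result for host; repeated hosts are served from the cache."""
    return "ERR" if BLOCKED.search(host) else "OK"


if __name__ == "__main__":
    for name in ("www.blocked.example", "squid-cache.org", "squid-cache.org"):
        print(name, verdict(name))
    # cache_info() reports how many lookups were answered from memory.
    print(verdict.cache_info())
```

Whether the cache pays off at all depends on how expensive the wrapped check is; for a single regex like this one the cached and uncached paths are both cheap, so it should be measured under load.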
Also take into account “concurrency” support on the helper side.
A helper that supports parallel processing of requests/lines can do better than many single helpers in more than one use case.
In any case, I would suggest enabling request concurrency on the Squid side, since the STDIN buffer will emulate some level of concurrency
by itself and will allow Squid to keep moving forward faster.
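With concurrency enabled, Squid prefixes each request line with a numeric channel ID, and the helper has to put the same ID in front of its reply so that answers can come back in any order. The sketch below shows only that bookkeeping; split_channel and format_reply are hypothetical names, and the request fields after the ID depend on how the helper is configured.

```python
def split_channel(line):
    """Return (channel_id, rest) for one helper request line.

    When the first token is numeric it is treated as the channel ID;
    otherwise channel_id is None and the whole line is the request.
    """
    parts = line.strip().split(None, 1)
    if parts and parts[0].isdigit():
        rest = parts[1] if len(parts) > 1 else ""
        return parts[0], rest
    return None, line.strip()


def format_reply(channel_id, result):
    """Prefix the result with the channel ID (if any) and end it with a newline."""
    if channel_id is None:
        return result + "\n"
    return "%s %s\n" % (channel_id, result)


if __name__ == "__main__":
    channel, request = split_channel("7 http://example.com/ 10.0.0.1\n")
    print(repr(request))
    print(format_reply(channel, "OK"), end="")
```

Because every reply carries its own ID, worker threads can finish in any order and a single writer can still emit each answer as soon as it is ready.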
Note that SquidGuard has used a per-helper cache for a very long time, i.e. every single SquidGuard helper has its own copy of the whole
configuration and database files in memory.
And again, if you have the option to implement a server/service model in which the helpers contact a main service, you will be able to implement
a much faster internal in-memory cache compared with Redis/memcached/another external daemon (this needs to be tested).
A good example of this is ufdbGuard, whose helpers are clients of a main service that does all the heavy lifting and also holds
one copy of the DB.
I have implemented SquidBlocker this way and have seen that it outperforms every other service I have tried so far.
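The service model can be sketched in a few lines with Python's socketserver module: one process holds the only copy of the data, and each helper is a thin client that forwards lookups to it. This is an assumption-laden illustration, not how ufdbGuard or SquidBlocker are implemented; DB, LookupHandler, start_service and query are hypothetical names, and the line-based protocol is made up for the example.

```python
import socket
import socketserver
import threading

# Hypothetical single in-memory copy of the database, held by the service only.
DB = {"blocked.example": "ERR"}


class LookupHandler(socketserver.StreamRequestHandler):
    """Answer one line per lookup: the stored verdict, or OK when unknown."""

    def handle(self):
        for raw in self.rfile:
            host = raw.decode("utf-8").strip()
            self.wfile.write((DB.get(host, "OK") + "\n").encode("utf-8"))


def start_service():
    """Start the lookup service on a free localhost port and return the server."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), LookupHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(address, host):
    """What a helper would do for one request: send the host, read the verdict."""
    with socket.create_connection(address) as conn:
        conn.sendall((host + "\n").encode("utf-8"))
        return conn.makefile("r", encoding="utf-8").readline().strip()


def demo():
    server = start_service()
    try:
        hosts = ("blocked.example", "squid-cache.org")
        return [query(server.server_address, h) for h in hosts]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

This sketch opens a new connection per lookup to stay short; a real helper would more likely keep one connection to the service open for its whole lifetime, so the socket setup cost discussed earlier in the thread is paid only once.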