[squid-dev] Squid 3.5 with nonblocking ecap adapter

Alex Rousskov rousskov at measurement-factory.com
Thu Nov 30 17:54:47 UTC 2017


On 11/30/2017 09:39 AM, Christof Gerber wrote:

> First tests with my eCAP adapter in asynchronous mode which performs
> lookups to non-blocking unix sockets seem promising.  I use std::queue
> to implement a FIFO queue in the adapter which tracks pointers to the
> file-descriptor (non-blocking unix socket) of pending lookups as well
> as a link (shared pointer) to the corresponding Xaction. Similiar as
> it is solved in ClamAV adapter.

Glad you are making progress!


> The difference though is that I do not use any threads for the actual
> asynchronous action. This of course is only possible for applications
> in which a second process on the other side of the socket does the
> work (processing the lookup), so that one can poll for its completion
> within resume().

Correct. In the ClamAV adapter case, the primary ClamAV library call is
blocking so we used threads to make that asynchronous.


> One thing I noticed which I am concerned about:
> 
> Squid with one async ecap adapter attached decreases the content
> sending speed (squid -> ecap only) significantly (from ~64kB before to
> ~16kB per vbContent cycle). I wonder if this is due to the additional
> polling (resume() and suspend()) that also happens every time before
> noteVbContentAvailable()?

I think it might be, provided your adapter forces Squid to use a shorter
polling timeout:

* In (simplified) theory, polling should stop as soon as the first data
packet arrives, and so Squid should receive the same amount of data
regardless of the polling timeout.

* In practice, I would not be surprised if less frequent (higher
timeout) polling results in larger data chunks accumulated by the TCP
stack. The theoretical "as soon as" principles are trumped by interrupt
granularity and kernel implementation assumptions/simplifications. In
other words, if we let the kernel poll longer, the kernel might poll
longer, even if there is some data available already. Since Squid reads
after polling, the longer we poll, the more data may arrive by the time
Squid reads.


> Is Squid reducing the file chunk size
> from 64kB to 16kB to meet the polling timeout (polling interval)?

I do not think so. Squid should be using the same network buffer size
regardless of the polling timeout (unless there is some
hidden/unintended dependency that I do not know about). The amount of
data in that Squid buffer depends on network traffic, polling frequency,
and other external factors.


> As speed is only slow in the squid->ecap virgin body transaction and
> it stays the same as in non-async mode for the adapted body
> transaction (ecap -> squid), I wonder why Squid behaves like that and
> how I can change or influence that?

I would start by validating the working theory that the polling timeout
affects available virgin data sizes. Do larger adapter-set resume()
timeouts increase available virgin data sizes?

Alex.


> On 2 November 2017 at 16:47, Alex Rousskov wrote:
>> On 11/02/2017 03:49 AM, Christof Gerber wrote:
>>
>>> One thing I still don't fully understand is if the asynchronous way to
>>> program and operate Squid with an eCAP adapter necessarily relies on
>>> threads?
>>
>> No, threads are just one popular way to achieve asynchrony. One may also
>> use multiple processes or (for some definition of asynchrony) event loops.
>>
>>
>>> Are A) and B) alternatives or the only options?
>>
>> If you replace B's "threads" with "threads or other asynchrony
>> mechanisms", then yes, IMO. Threads are probably the most popular way to
>> achieve asynchrony in new code.
>>
>>
>>> I've seen that both the
>>> ClamAV and sample async adapter use pthreads. In my use case I need to
>>> do a simple hash lookup to a file socket at some point during the
>>> eCAP interaction. As it will take some milliseconds until the
>>> response becomes available on the socket, I don't want to block
>>> Squid during this time.
>>
>> Yes, this is a common problem.
>>
>>
>>> But the eCAP adapter won't need to process/compute anything else
>>> during this time. So why would I bother to use threads?
>>
>> ... because without threads (or processes or event loops) your adapter
>> lookup will block the whole Squid worker while waiting for socket I/O.
>>
>> If you do not want to use threads or processes in your adapter, then you
>> can try to use an event loop model. That is what Squid uses internally
>> (each Squid worker does not have threads to process thousands of
>> transactions "concurrently").
>>
>> To use an event loop model, your adapter will need to use non-blocking
>> socket I/O and schedule _one_ I/O loop iteration every ~X milliseconds,
>> when Squid calls your Service::resume(). Try googling "I/O loop" or
>> "select loop" for starting points if you are not familiar with that
>> design pattern. The overall host-adapter interaction would be very
>> similar to what you find in the sample and ClamAV adapters.
>>
>> Disclaimer: I have not seen anybody using event loops with eCAP. I think
>> it is possible to implement that model, but there may be important
>> caveats that I am not aware of.
>>
>>
>> HTH,
>>
>> Alex.
>>
>>
>>> On 1 November 2017 at 16:23, Alex Rousskov wrote:
>>>> On 11/01/2017 03:20 AM, Christof Gerber wrote:
>>>>
>>>>> [Will Squid] be blocked until the eCAP API call returns?
>>>>
>>>> To answer the exact question above: Yes, the Squid worker making an eCAP
>>>> API call will block until that call returns. The same is true for all
>>>> other API calls, all system calls, and all internal calls. This is how
>>>> C/C++ works. I am stating the obvious for the record, in case somebody
>>>> with a different (or insufficient) programming languages background
>>>> stumbles upon this thread.
>>>>
>>>> What you are really asking, I suspect, is whether Squid or the eCAP
>>>> library uses threads to automatically make eCAP adapter operations
>>>> asynchronous to the primary Squid operations. The answer to that
>>>> question is "no": The relevant Squid code does not use threads, and
>>>> there are no threads in the eCAP library code.
>>>>
>>>> Also, there is no magical layer between Squid and 99% of eCAP calls --
>>>> Squid calls go directly to your eCAP adapter code and vice versa. IIRC,
>>>> the only (unimportant) exception to that "direct calls" observation is
>>>> the eCAP service registry API, where there is a thin eCAP layer
>>>> insulating the adapter from the host application. That layer is also
>>>> synchronous though.
>>>>
>>>>
>>>>> Is there a way other than
>>>>> programming the eCAP adapter in asynchronous mode?
>>>>
>>>> I do not think there is a better alternative. AFAICT, you only have two
>>>> options:
>>>>
>>>>   A) Change Squid to move eCAP calls to thread(s).
>>>>   B) Use threads inside the adapter to make its operations asynchronous.
>>>>
>>>> As you know, the sample async adapter and the ClamAV adapter use (B).
>>>> That approach has its problems (because it currently does not require
>>>> the host application to be threads-aware), but it works reasonably well
>>>> for many use cases.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Alex.
>>