[squid-dev] [PATCH] url_rewrite_timeout directive

Tue Nov 18 11:25:46 UTC 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 18/11/2014 10:44 p.m., Tsantilas Christos wrote:
> On 11/18/2014 01:27 AM, Amos Jeffries wrote:

>> 
>>>> in src/helper.cc:
>>>> 
>>>> * adding multiple double-empty lines in the chunk @195
>>>> 
>>>> * helper_server::checkForTimedOutRequests() is deducting
>>>> from stats.pending pending. This is incorrect, the helper
>>>> request is still pending.
>>> 
>>> Amos, I think it is better as is.... Maybe in a point of view
>>> the request is still pending, but we are not waiting an answer
>>> for this request any more.
>> 
>> Any helper which is obeying the helper protocol *will* still be 
>> consuming resources trying to satisfy it. This is what I mean by
>> forgetting and sending more requests to the helper "making the
>> overload problem worse".
> 
> Maybe we should document this behaviour in option documentation.
> The administrators who want to use timeout they have to take care
> of it.

So instead of being pointed at a specific problem needing attention
and which they probably can fix. They have to read and act on some
documentation to prepare for the risk of some future possibility
happening.

Sadly the commonness of "RTFM" shows which of those approaches is most
likely to result in critical failure.

> 
>>> 
>>>> - your act of removing the requests from the queue
>>>> invalidates the logic in helperReturnBuffer() recording and
>>>> managing late helper responses.
>>> 
>>> why?
>> 
>> Non-concurrent helpers: * #1 query times out and gets discarded. 
>> * Lookup logics finding a helper to use see a helper without
>> pending requests and deliver a second lookup (#2). * #1 response
>> arrives late and is used as response to the current pending (#2)
>> lookup.
> 
> This is will be discarded. This is because the request will get new
> requestId before retried.  The old requestId is not valid any
> more.

Non-concurrent helpers have no ID on the response with which to
determine whether it was for the currently waiting request or
something earlier.

> 
>> * #2 response arrives and is discarded.
> 
> This is will be accepted.
> 
>> - --> helper responses are hereafter off-by-1 until another
>> timeout makes it off-by-2 or lucky random timing makes it
>> off-by-0 again.
>> 
>> Please notice that the helpers with most common long duration
>> lookups are NTLM and Kerberos crypto validation. Which do not
>> support concurrency channels.
> 
> Yes. They can not use timeout feature.

But they can configure it and when configured it is always used.

For non-concurrent helpers the request-ID is only saved in the request
object which was discarded from the queue when it timed out. The #2
request updated both the srv/hlp latest-requestID on record and added
a new queued 'pending' request.

The un-numbered reply coming back can mistakenly use those stored
details about the request which has not yet come back.

With this patch concurrency is effectively mandatory - except when it
isn't.

> 
>>> 
>>>> - both of the above also corrupts the mgr report content for 
>>>> expected pending responses. By all means call (and clear)
>>>> the callback on a timeout but please do not remove the queue
>>>> entries for them until the helper actually responds to that
>>>> query.
>>> 
>>> Some of the queries will never  answered. This is may result
>>> to memory leaks.
>> 
>> see the point I made next:
>> 
>>>> - if a helper gets too many un-responded queries the traffic 
>>>> being served by it is f'ckd and we should shutdown the
>>>> helper. If an auto-restart does not work Manual intervention
>>>> is probably required from the admin to resolve why it is
>>>> timing out *at all*.
> 
> Yes this is true. But the admin can find out if many requests are
> timed out for a server using mgr. And then he can kill problematic
> server.
> 

How often do you log into your machines and check the TCP packet
failure rates in order to see if the NIC or cables need replacing?
How often do you run the fsck to see if disk drive errors are
accumulating and will need a replacement in future?

I posit you dont notice until the errors start popping up in your face.

>> 
>>>> 
>>>> Please also document clearly in the release notes that:
>>>> 
>>>> A) helper-mux.pl we have been distributing for the past few
>>>> years to encourage use of concurrency is no longer compatible
>>>> with Squid. If used it will spawn up to 2^64 helpers and DoS
>>>> the Squid server.
>>> 
>>> What is this helper? I can not find it inside squid
>>> sources....
>> 
>> Under tools/ 
>> http://wiki.squid-cache.org/Features/HelperMultiplexer
> 
> We need to fix this helper to work with new changes too.... OK.
> 
> 
>>>> B) helpers utilizing arrays to handle fixed amounts of 
>>>> concurrency channels MUST be re-written to use queues and
>>>> capable of handling a 64-bit int as index or they will be
>>>> vulnerable to buffer overrun and arbitrary memory accesses.
>>> 
>>> 
>>> OK for this
>>> 
>>>> 
>>>> C) 32-bit helpers need re-writing to handle the concurrency 
>>>> channel ID as a 64-bit integer value. If not updated they
>>>> will cause proxies to return unexpected results or timeout
>>>> once crossing the 32-bit wrap boundary. Leading to undefined
>>>> behaviour in the client HTTP traffic.
>>>> 
>>> 
>>> OK.
>>> 
>>>> 
>>>> 
>>>> My vote on this is -0. Too many problems and still no clear 
>>>> use-case has been described[1] for this to be a reasonable 
>>>> addition.
>>> 
>>> 
>>> OK. I tried to do most  of the fixes you are suggested. I hope
>>> it is OK.
>>> 

Amos

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUayy4AAoJELJo5wb/XPRjjRQH/A2yT762PpCFLTIPzHI9RSaM
FoeNfzC6Z+Gf29eWgcCkzBQMCOJjm3G950IK4BWtSd5OivI/7zAJO6KY3YQ4dnfc
KXU3+E+DrOYHdmhljBaPDYbOA9mCfMaCH32jISYTTtDYBEvNuQh1KSPh5so63aGV
u4SVCGkISACAUATy+vaeKTR4ry0c0XYDc+l55E7LvQLNJlRyQYb2H9DG2YjV8j6h
l/gNlwK0K9VJAOu0+B3GmdNSEwW05i7mdSDqw2zuuLdaORizyDpHb/IgzHAjBi9D
zZl6kxAngzfooBxwnllOnr1eTzRSAoGe7NmZMPhh89K/olGvF2I9bbefqzXHiL4=
=X+Xo
-----END PGP SIGNATURE-----