[squid-dev] [PATCH] Reuse reserved Negotiate and NTLM helpers after an idle timeout.

Sun Jul 30 03:48:33 UTC 2017

On 27/07/17 18:52, Christos Tsantilas wrote:
> The patch.
> 
> Στις 26/07/2017 12:37 μμ, ο Christos Tsantilas έγραψε:
>> Squid can be killed or maimed by enough clients that start multi-step 
>> connection authentication but never follow up with the second HTTP 
>> request while keeping their HTTP connection open. Affected helpers 
>> remain in the "reserved" state and cannot be reused for other clients. 
>> Observed helper exhaustion has happened without any malicious intent.
>>
>> To address the problem, we add a helper reservation timeout. Timed out 
>> reserved helpers may be reused by new clients/connections. To minimize 
>> problems with slow-to-resume-authentication clients, timed out 
>> reserved helpers are not reused until there are no unreserved running 
>> helpers left. The reservations are tracked using unique integer IDs.
>>

Since NTLM and Negotiate are both very stateful security protocols this 
re-use is only possible if the helper is using concurrency in its 
communications with Squid. To do so otherwise would randomly allow 
replay attacks to succeed - in a way that would be extremely nasty to 
troubleshoot. Attackers are fully able to flood an auth backend with 
traffic outside of Squid and slow it down sufficiently for this attack 
to become a problem.
  Note that the type-1 tokens where the TCP connection can be closed 
also should not reserve a helper - if that is happening it is a bug 
regardless of this patch.

To reach the desired behaviour it would be better to actually implement 
concurrency support for the stateful helpers interfaces. So we can 
ensure the helper is fully aware of the separation between clients auth 
sessions regardless of whether any given token gets replayed.

It should not be much of a step to use the concurrency channel-ID as 
part of the reservationId. Helpers that support concurrency should not 
have trouble servicing auth on other channel-ID until the total number 
of reserved channels gets extremely high. At that point restarting the 
helper is sane and the existing on-persistent-overload logics can be 
applied to whether Squid dies or simply kills the blocked helper (ERR).

This patch is going most of the way towards that already by using 
reservation-ID to replace hardcoded class dependencies, and moving a lot 
of code to the Base class. But it does not go far enough to make the 
change safe to add to Squid IMO.

NP: I have only briefly checked the code to be sure this was not 
actually doing the concurrency change yet without mentioning it. 
However, one other major issue stands out;

StatefulGetFirstAvailable() is a C-style accessor method for the 
stateful_helper, we simply have not managed to make it one properly yet. 
The value it is given is the context within/for which the caller needs 
it to find a usable *_server. Changing that context in order to find a 
different server result (for some other context) is not an appropriate 
thing to be doing.

The only thing in the new code for StatefulGetFirstAvailable() which 
might lead to it needing non-const is the HelperServerBase::reserved() 
method being indirectly called. That method should be "virtual bool 
reserved() const = 0". Fixing that should make the 
StatefulGetFirstAvailable() API const'ness change unnecessary.

-1 for now, and please do not apply this until the replay attack problem 
is fully resolved.

Amos