[squid-dev] [PATCH] url_rewrite_timeout directive

Amos Jeffries squid3 at treenet.co.nz
Sun Nov 16 11:05:26 UTC 2014


On 16/11/2014 7:38 a.m., Tsantilas Christos wrote:
> Hi all,
> 
> This patch adds the url_rewrite_timeout directive.
> 
> When configured, Squid keeps track of active requests and treats
> timed out requests to redirector as failed requests.
> 
> url_rewrite_timeout format: url_rewrite_timeout timeout time-units 
> on_timeout=<fail|bypass|retry|use_configured_response> 
> [response=<quoted-string>]
> 
> The url_rewrite_timeout directive can accept the on_timeout
> argument to allow user configure the action when the helper request
> times out. The available actions are: - fail: squid return a
> ERR_GATEWAY_FAILURE error page

Why?
 It seems to me this should be doing one of the below:

 on_timeout=use_configured_response response=ERR

 on_timeout=use_configured_response response=BH

although being able to redirect to a Squid default error page template
has its attractions regardless of this. Perhaps BH with an
error=ERR_GATEWAY_FAILURE kv-pair?


> - bypass: the url is not rewritten.

Identical to:
 on_timeout=use_configured_response response=OK

> - retry: retry the request to helper

Equivalent to what is supposed to happen for:
 on_timeout=use_configured_response response=BH

NP: if retry with different helper process is not already being done
on BH then it needs to be added.


> - use_configured_response: use a response which can be configured 
> using the the response= option
> 

So as you can see, with a change to make the URL-rewriter capable of
redirecting to one of the built-in error page templates we could
completely drop the on_timeout setting.
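For context, the syntax under discussion (taken directly from the patch
description quoted above; untested here) would read in squid.conf roughly as:

```
# From the patch description; exact behaviour untested.
# Treat URL-rewriter lookups as failed after 30 seconds and act as if
# the helper had returned the configured response:
url_rewrite_timeout 30 seconds on_timeout=use_configured_response response="OK"
```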


> 
> Technical details =====================
> 
> This patch: - adds mechanism inside helpers.cc code to handle
> timeouts: - return a pre-configured response on timeout - or
> retries on timeouts. - or timedout (Helper::TimedOut code) response
> to the  caller. The caller can select to ignore the timedout
> request, or produce an error.
> 
> - modify the client_side_request.cc to return ERR_GATEWAY_FAILURE
> error page. Also the error detail ERR_DETAIL_REDIRECTOR_TIMEDOUT is
> set to identify the type of error.
> 

For the record I am still extremely skeptical about the use-case
behind this feature. Timeout is a clear sign that the helper or system
behind it is broken (capacity overload / network congestion).
Continuing to use such systems in a way which further increases
network load is very often a bad choice. Hiding the issue is an even
worse choice.


Now the code:

in src/ConfigParser.cc:

* please call the new type flag ConfigParser::ParseKvPair_ instead.
"ParseKeyValue_" implies that only the value portion is being parsed,
not the entire kv-pair.
 - similarly NextKvPair() instead of NextKeyValue().


in src/ConfigParser.h:

* We are (or should be) calling key=value "kv-pair" throughout the rest
of Squid. Not "key=value pairs" nor "key value pair which must be
separated by '='" nor "Key=value token".


in src/auth/digest/UserRequest.cc:

* please shift the Helper::TimedOut case up above
Helper::BrokenHelper. So it is clear the case is by default a BH not
an ERR response.


in src/cache_cf.cc

* please move the *_url_rewrite_timeout() functions to redirect.cc
 - that will help reduce dependency issues when we get around to
making the redirector config a standalone FooConfig object.

* missing implementation of dump_url_rewrite_timeout().
 - if the function is really meant to be empty use a #define to set it
to no-op like the rest of the parser code instead of adding to the
already large symbol table.

* it would be simpler and easier on users to omit the
"on_timeout=use_configured_response response=" and just have the
existence of a response= kv-pair imply the on_timeout action.
 - that also allows the above suggested removal of the bypass and retry
special actions without losing any functionality.
 - that removes the FATAL errors if the admin gets the options ordering
different from what you expect.
 - and allows for forward-compatibility with unknown options. A while
loop fetching kv-pairs and processing them as found regardless of order
seems best for a field documented as "[options]".
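To illustrate the order-independent option parsing suggested above, a generic
sketch (this is NOT the real ConfigParser API; the function name and approach
here are illustrative only):

```cpp
#include <map>
#include <sstream>
#include <string>

// Generic sketch of order-independent "key=value" option parsing.
// Unknown or malformed tokens are skipped rather than FATAL.
static std::map<std::string, std::string> parseKvPairs(const std::string &line)
{
    std::map<std::string, std::string> opts;
    std::istringstream tokens(line);
    std::string token;
    while (tokens >> token) {               // fetch kv-pairs in any order
        const auto eq = token.find('=');
        if (eq == std::string::npos)
            continue;                       // ignore non-kv tokens
        opts[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return opts;
}
```

With this style, `response=OK on_timeout=fail` and `on_timeout=fail response=OK`
parse identically, so option ordering can no longer trip the admin up.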



in src/cf.data.pre:

* omit the ':' after documented on_timeout= values (if any after the
above changes). It makes it seem like the ':' is part of the value.

* s/the url is not rewritten./Do not re-write the URL/

* s/retry the request to helper/Send the lookup to the helper again/


in src/client_side_request.cc:

* Bad logic "if (!Config.onUrlRewriteTimeout.action == toutActBypass)"
 - aka. if (true == toutActBypass || false == toutActBypass)  ???
 - I think you mean:
    (Config.onUrlRewriteTimeout.action != toutActBypass)
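A minimal demonstration of why that expression misbehaves (the enum values and
names below are stand-ins for the patch's real ones): unary `!` binds tighter
than `==`, so the condition compares a bool against the enum.

```cpp
// Stand-ins for the patch's enum; illustrative only.
enum TimeoutAction { toutActFail = 0, toutActBypass = 1, toutActRetry = 2 };

// Buggy form: parses as (!action) == toutActBypass,
// i.e. a bool compared against the enum value.
static bool buggyCheck(TimeoutAction action)
{
    return !action == toutActBypass;
}

// Intended form.
static bool intendedCheck(TimeoutAction action)
{
    return action != toutActBypass;
}
```

For `toutActRetry` the buggy form yields false even though the action really is
different from bypass, while the intended form yields true.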

* please add empty line after break; between switch cases

* please define the new handleAdaptationFailure() *above* the new
calloutsError().
 - that will reduce the diff and make it clearer what actually is
changing and what not.

* it is now possible for an adaptation error to silently continue
processing. Before it would hang. I am not sure whether that is a good
or bad change. But it may be worth noting the difference in
handleAdaptationFailure() where it no longer unconditionally calls
doCallouts() at the end.


in src/helper.cc:

* adding multiple double-empty lines in the chunk @195

* helper_server::checkForTimedOutRequests() is deducting from
stats.pending. This is incorrect, the helper request is still
pending.
 - your act of removing the requests from the queue invalidates the
logic in helperReturnBuffer() recording and managing late helper
responses.
 - both of the above also corrupt the mgr report content for expected
pending responses.
 By all means call (and clear) the callback on a timeout but please do
not remove the queue entries for them until the helper actually
responds to that query.
 - if a helper gets too many un-responded queries the traffic being
served by it is broken and we should shut down the helper. If an
auto-restart does not work, manual intervention is probably required
from the admin to resolve why it is timing out *at all*.

 NP: when doing a re-try we still may get a response to this lookup
from *either* of the helpers which have been asked for it. The first
helper is in fact more likely to respond first. IMO we should be
taking advantage of that and leaving both helpers with a shared pointer
which either can call+unset the callback for.
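One way to sketch that shared-callback idea (a generic illustration, not
Squid's actual helper code; the type and member names are invented): both
outstanding lookups hold the same state, and whichever helper answers first
fires and clears the callback.

```cpp
#include <functional>
#include <string>

// Generic sketch: a lookup state shared between the original helper
// request and its retry. Whichever helper responds first invokes the
// callback; a later response finds it already cleared and is ignored.
struct SharedLookup {
    std::function<void(const std::string &)> callback;

    // Returns true if this response was the one delivered to the caller.
    bool deliver(const std::string &reply) {
        if (!callback)
            return false;           // a sibling lookup already answered
        auto cb = std::move(callback);
        callback = nullptr;         // unset so the other helper's reply is dropped
        cb(reply);
        return true;
    }
};
```

Both the original and the retried request would hold a
std::shared_ptr<SharedLookup>, so the late reply is simply dropped instead of
corrupting the pending-queue accounting.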


in src/helper/Reply.cc:
* Timeout does not need to be allocated an on-wire protocol code
("TO"). Please just use the word "Timedout"; like "Unknown" it is
purely for debug reporting.


Please also document clearly in the release notes that:

A) helper-mux.pl we have been distributing for the past few years to
encourage use of concurrency is no longer compatible with Squid. If
used it will spawn up to 2^64 helpers and DoS the Squid server.

B) helpers utilizing arrays to handle fixed amounts of concurrency
channels MUST be re-written to use queues and be capable of handling a
64-bit int as index, or they will be vulnerable to buffer overrun and
arbitrary memory accesses.

C) 32-bit helpers need re-writing to handle the concurrency channel ID
as a 64-bit integer value. If not updated they will cause proxies to
return unexpected results or time out once crossing the 32-bit wrap
boundary, leading to undefined behaviour in the client HTTP traffic.
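Point (B) can be illustrated with a generic sketch (not actual Squid helper
code; the class and member names are invented): key the pending-request table
on the full 64-bit channel ID instead of indexing a fixed-size array, so
arbitrarily large IDs cannot overrun a buffer.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Generic sketch for point (B): instead of "pending[channelId]" on a
// fixed-size array (a buffer overrun once the 64-bit ID grows past the
// array bound), look pending requests up by the full 64-bit channel ID.
class PendingRequests {
public:
    void add(uint64_t channelId, const std::string &query) {
        pending_[channelId] = query;
    }
    // Returns true (and removes the entry) if the channel ID was known.
    bool complete(uint64_t channelId) {
        return pending_.erase(channelId) > 0;
    }
private:
    std::unordered_map<uint64_t, std::string> pending_;
};
```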



My vote on this is -0. Too many problems and still no clear use-case
has been described[1] for this to be a reasonable addition.


[1] "helper is slow" and "helper times out" are not use-case
descriptions. They are symptoms of an underlying problem. The "WHY" is
highly important.
 Ignoring the helper on timeout (bypass/auto-reply) *raises* network req/sec
throughput. If the helper was slow because it was network congested or
capacity congested at the backend API you just made the problem
infinitely worse by adding more req/sec on top of it. Slow responses
are the natural feedback loop restricting traffic to levels the helper
backend system can handle. The solution is not to ignore the delay
itself, but to remove the bottleneck limit which is causing it to be
slow. Without that bottleneck the throughput performance is improved
permanently.

Amos
