[squid-dev] [PATCH] url_rewrite_timeout directive

Tsantilas Christos chtsanti at users.sourceforge.net
Mon Nov 17 17:10:51 UTC 2014


On 11/16/2014 01:05 PM, Amos Jeffries wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 16/11/2014 7:38 a.m., Tsantilas Christos wrote:
>> Hi all,
>>
>> This patch adds the url_rewrite_timeout directive.
>>
>> When configured, Squid keeps track of active requests and treats
>> timed out requests to redirector as failed requests.
>>
>> url_rewrite_timeout format: url_rewrite_timeout timeout time-units
>> on_timeout=<fail|bypass|retry|use_configured_response>
>> [response=<quoted-string>]
>>
>> The url_rewrite_timeout directive can accept the on_timeout
>> argument to allow user configure the action when the helper request
>> times out. The available actions are: - fail: squid return a
>> ERR_GATEWAY_FAILURE error page
>
> Why?
>   It seems to me this should be doing one of the below:
>
>   on_timeout=use_configured_response response=ERR
>
>   on_timeout=use_configured_response response=BH
>
> although being able to redirect to a Squid default error page template
> has its attractions regardless of this. Perhapse BH with an
> error=ERR_GATEWAY_FAILURE kv-pair ?

The on_timeout= option is easier to understand and configure.

Also this patch add the retries operation inside helpers.cc code and 
make it easier to implement similar features for other helpers too...

Also the BH retry can be implemented now using helpers retry operation.

>
>
>> - bypass: the url is not rewritten.
>
> Identical to:
>   on_timeout=use_configured_response response=OK
>
>> - retry: retry the request to helper
>
> Equivalent to what is supposed to happen for:
>   on_timeout=use_configured_response response=BH
>
> NP: if retry with different helper process is not already being done
> on BH then it needs to be added.

It is not implemented for BH redirects. Also although it is make sense 
to not use the same server for a BH response, it is not clear why it is 
needed for a timedout server?

I believe this is should have a different form. If there are many 
timedout responses from a server, then do not sent more requests for a 
while, or maybe shut it down.
We can add a todo for this one.

>
>> - use_configured_response: use a response which can be configured
>> using the the response= option
>>
>
> So as you can see, with a change to make the URL-rewriter capable of
> redirecting to one of the built-in error page templates we could
> completely drop the on_timeout setting.

I still believe that the "on-timeout" is better options and easier to 
understand and configure.

>>
>> Technical details =====================
>>
>> This patch: - adds mechanism inside helpers.cc code to handle
>> timeouts: - return a pre-configured response on timeout - or
>> retries on timeouts. - or timedout (Helper::TimedOut code) response
>> to the  caller. The caller can select to ignore the timedout
>> request, or produce an error.
>>
>> - modify the client_side_request.cc to return ERR_GATEWAY_FAILURE
>> error page. Also the error detail ERR_DETAIL_REDIRECTOR_TIMEDOUT is
>> set to identify the type of error.
>>
>
> For the record I am still extremely skeptical about the use-case
> behind this feature. Timeout i a clear sign that the helper or system
> behind it is broken (capacity overload / network congestion).
> Continuing to use such systems in a way which further increases
> network load is very often a bad choice. Hiding the issue an even
> worse choice.


Maybe the administrator does not care about unanswered queries.
Imagine a huge squid proxy, which may have to process thousands of URLs 
per minute and the administrator decided that a 10% of helpers request 
can fail. In this case we are giving him a way to configure the squid 
behaviour in this case: may ignore the probelm, or use a predefined 
answer etc

The reason the helper is not answered enough fast, maybe is a database 
or an external lookup failure (for example categorized urls list as a 
DNS service).
In these cases the system admin or service provider, may prefer a none 
answer from the helper, or a preconfigured answer, instead of waiting 
too long for the answer.

The helper can be implemented to not answer at all after a timeout period.
A policy of "if you do not answer in a day, please do not answer at all, 
I am not interested any mode" is common in human world, and in business.

>
>
> Now the code:
>
> in src/ConfigParser.cc:
>
> * please call the new type flag ConfigParser::ParseKvPair_ instead.
> "ParseKeyValue_" implies that only the value portion is being parsed,
> not the entire kv-pair.
>   - similary NextKvPair() instead of NextKeyValue().
>
>
> in src/ConfigParser.h:
>
> * We are (or should be) calling key=value "kv-pair" through the rest
> of Squid. Not "key=value pairs" nor "key value pair which must be
> separated by '='" nor "Key=value token"
>
>
> in src/auth/digest/UserRequest.cc:
>
> * please shift the Helper::TimedOut case up above
> Helper::BrokenHelper. So it is clear the case is by default a BH not
> an ERR response.
>

ok to all.

>
> in src/cache_cf.cc
>
> * please move the *_url_rewrite_timeout() functions to redirect.cc
>   - that will help reduce dependency issues when we get around to
> making the redirector config a standalone FooConfig object.

Not so easy, because of dependencies on time-related parsing functions....
I let it for now. If required we must move time parse functions to a 
library file, for example Config.cc


>
> * missing implementation of dump_url_rewrite_timeout().
>   - if the function is really meant to be empty use a #define to set it
> to no-op like the rest of the parser code instead off adding to the
> already large symbol table.

True, I forgot to implement it...
It is implemented in this patch.

>
> * it would be simpler and easier on users to omit the
> "on_timeout=use_configured_response response=" and just have the
> existence of a response= kv-pair imply the on_timeout action.
>   - that also allows the above suggested removal of bypass and retry
> special actions without loosing any functionality.
>   - that removes the FATAL errors if admin gets the options ordering
> different from what you expect.
>   - and allows for forward-compatibility with unknown options. A while
> loop fetching kv-pair and processing them as found regardess of order
> seems best for a field documented a " [options] ".
>

True.
This is fixed in this patch

>
> in src/cf.data.pre:
>
> * omit the ':' after documented on_timeout= values (if any after the
> above changes). It makes it seem like the ':' is part of the value.
>
> * s/the url is not rewritten./Do not re-write the URL/
>
> * s/retry the request to helper/Send the lookup to the helper again/

ok, fixed as you suggested in this patch

> in src/client_side_request.cc:
>
> * Bad logic "if (!Config.onUrlRewriteTimeout.action == toutActBypass)"
>   - aka. if (true == toutActBypass || false == toutActBypass)  ???
>   - I think you mean:
>      (Config.onUrlRewriteTimeout.action != toutActBypass)

Yes!

>
> * please add empty line after break; between switch cases

ok

>
> * please define the new handleAdaptationFailure() *above* the new
> calloutsError().
>   - that will reduce the diff and make it clearer what actually is
> changing and what not.


>
> * its now possible for adaptation error to silently continue
> processing. Before it would hang. I am not sure that is a good or bad
> change. But it may be worth noting the difference in
> handleAdaptationFailure() where it not unconditionally calls
> doCallouts() at the end.

There is not any change on handling adaptation errors, or should not be.
Am I loosing something?

>
>
> in src/helper.cc:
>
> * adding multiple double-empty lines in the chunk @195
>
> * helper_server::checkForTimedOutRequests() is deducting from
> stats.pending pending. This is incorrect, the helper request is still
> pending.

Amos, I think it is better as is....
Maybe in a point of view the request is still pending, but we are not 
waiting an answer for this request any more.

>   - your act of removing the requests from the queue invalidates the
> logic in helperReturnBuffer() recording and managing late helper
> responses.

why?

>   - both of the above also corrupts the mgr report content for expected
> pending responses.
>   By all means call (and clear) the callback on a timeout but please do
> not remove the queue entries for them until the helper actually
> responds to that query.

Some of the queries will never  answered. This is may result to memory 
leaks.


>   - if a helper gets too many un-responded queries the traffic being
> served by it is f'ckd and we should shutdown the helper. If an
> auto-restart does not work Manual intervention is probably required
> from the admin to resolve why it is timing out *at all*.




>   NP: when doing a re-try we still may get a response to this lookup
> from *either* of the helpers which have been asked for it. The frst
> helper is in fact more likely to respond first. IMO we should be
> taking advantage of that and leaving both helpers with shared pointer
> which either can call+unset the callback for.

Maybe this is good optimization.
However requires some work.... We can add it later if required as 
optimization...

>
>
> in src/helper/Reply.cc:
> * Timeout does not need to be allocated a on-wire protocol code
> ("TO"). Please just use the word "Timedout", like "Unknown" it is
> purely for debug reporting.

ok.


>
>
> Please also document clearly in the release notes that:
>
> A) helper-mux.pl we have been distributing for the past few years to
> encourage use of concurrency is no longer compatible with Squid. If
> used it will spawn up to 2^64 helpers and DoS the Squid server.

What is this helper? I can not find it inside squid sources....

>
> B) helpers utilizing arrays to handle fixed amounts of concurrency
> channels MUST be re-written to use queues and capable of handling a
> 64-bit int as index or they will be vulnerable to buffer overrun and
> arbitrary memory accesses.


OK for this

>
> C) 32-bit helpers need re-writing to handle the concurrency channel ID
> as a 64-bit integer value. If not updated they will cause proxies to
> return unexpected results or timeout once crossing the 32-bit wrap
> boundary. Leading to undefined behaviour in the client HTTP traffic.
>

OK.

>
>
> My vote on this is -0. Too many problems and still no clear use-case
> has been described[1] for this to be a reasonable addition.


OK.
I tried to do most  of the fixes you are suggested.
I hope it is OK.



>
>
> [1] "helper is slow" and "helper times out" are not use-case
> descriptions. They are symptoms of an underlying problem. The "WHY" is
> highly important.
>   Ignoring helper on timeout bypass/auto-reply *raises* network req/sec
> throughput. If the helper was slow because it was network congested or
> capacity congested at the backend API you just made the problem
> infinitely worse by adding more req/sec on top of it. Slow responses
> are the natural feedback loop restricting traffic to levels the helper
> backend system can handle. The solution is not to ignore the delay
> itself, but to remove the bottleneck limit which is causing it to be
> slow. Without that bottleneck the throughput performance is improved
> permanently.
>
> Amos
-------------- next part --------------
A non-text attachment was scrubbed...
Name: url_rewrite_timeout-all-t3.patch
Type: text/x-patch
Size: 161566 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20141117/578a20af/attachment-0001.bin>


More information about the squid-dev mailing list