[squid-dev] [PATCH] Retry cache peer DNS failures more frequently

Tue May 17 05:57:05 UTC 2016

Hello,

Attached is a patch which makes the changes recommended by Amos - each peer
now gets its own event for retrying resolution, dependent on the DNS TTL.
This should also address the concerns raised by Alex. A few caveats though:

 - the cache manager shows generic "peerRefreshDNS" names for each event. I
can't find any examples that give it a dynamic name, e.g. I'd like
something like "peerRefreshDNS(example.com)", but I can't think of how I'd
do that without leaking memory or making some significant changes to the
event handler system.

 - I can't figure out how to reproduce the second failure case, where a
result comes back but it has no IP addresses. I _think_ using the TTL
instead of negative_dns_ttl would be valid in that situation, but I can't
be sure. I figured this was the safest option.

 - eventDelete does not appear to be clearing out events as I expect it to,
so if you reconfigure Squid you end up with some dead events, like so:

[root at xxx ~]# squidmgr events | grep peerRefresh
Last event to run: peerRefreshDNS
peerRefreshDNS                  0.331 sec           1    yes
peerRefreshDNS                  0.679 sec           1    yes
peerRefreshDNS                  47.649 sec          1    yes
peerRefreshDNS                  61.619 sec          1    yes
peerRefreshDNS                  207.682 sec         1    yes
peerRefreshDNS                  207.682 sec         1    yes
peerRefreshDNS                  207.682 sec         1    yes
peerRefreshDNS                  207.682 sec         1    yes
peerRefreshDNS                  207.682 sec         1    yes
[root at xxx ~]# squid -k reconfigure
[root at xxx ~]# squidmgr events | grep peerRefresh
Last event to run: peerRefreshDNS
peerRefreshDNS                  0.763 sec           1    yes
peerRefreshDNS                  0.763 sec           1    yes
peerRefreshDNS                  41.755 sec          1    yes
peerRefreshDNS                  55.755 sec          1    yes
peerRefreshDNS                  56.187 sec          1    no
peerRefreshDNS                  202.250 sec         1    no
peerRefreshDNS                  202.250 sec         1    no
peerRefreshDNS                  3599.758 sec        1    yes
peerRefreshDNS                  3599.758 sec        1    yes
peerRefreshDNS                  3599.758 sec        1    yes
peerRefreshDNS                  3599.758 sec        1    yes
peerRefreshDNS                  3599.758 sec        1    yes

If I run squid -k reconfigure again, then the events with invalid callback
data are cleared out, so it doesn't grow indefinitely at least. I'm not
sure how or if I should fix this.
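
To make the first two caveats concrete, here is a rough, standalone sketch
of the idea. It is not code from the attached patch. PeerSketch,
EventSketch, LookupResult, nextRefreshDelay and scheduleRefresh are all
hypothetical names, and the rule that an empty answer is treated like an
error is an assumption for illustration only. The peer owns its label
string, so a name like "peerRefreshDNS(example.com)" needs no separate
allocation that could leak.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DNS answer; not Squid's actual ipcache type.
struct LookupResult {
    bool failed;        // the lookup returned an error
    int addressCount;   // number of IP addresses in the answer
    double ttlSeconds;  // TTL reported with the answer
};

// Hypothetical per-peer state. The peer owns its label string, so the label
// is freed together with the peer.
struct PeerSketch {
    std::string host;
    std::string eventLabel;

    explicit PeerSketch(const std::string &h):
        host(h), eventLabel("peerRefreshDNS(" + h + ")") {}
};

// Toy event record that borrows the label instead of copying it, as a
// scheduler that stores only a name pointer would.
struct EventSketch {
    const char *name;   // borrowed from PeerSketch::eventLabel
    double delaySeconds;
};

// Choose the delay before the next re-lookup for one peer.
// Assumption: errors and empty answers both use the negative TTL.
double nextRefreshDelay(const LookupResult &r, double negativeDnsTtl)
{
    if (r.failed || r.addressCount == 0)
        return negativeDnsTtl;
    return r.ttlSeconds;
}

// Queue one refresh event for the peer. The event must be removed before
// the peer is destroyed, because the name pointer would otherwise dangle.
void scheduleRefresh(std::vector<EventSketch> &queue, const PeerSketch &peer,
                     const LookupResult &r, double negativeDnsTtl)
{
    const double delay = nextRefreshDelay(r, negativeDnsTtl);
    assert(delay >= 0.0);
    queue.push_back(EventSketch{peer.eventLabel.c_str(), delay});
}
```

The catch with borrowing the label is the same one as in the third caveat:
if an event outlives its peer, the name pointer is left dangling, so stale
events have to be removed reliably before the peer goes away.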

Thank you,

Nathan.

On 10 May 2016 at 18:13, Alex Rousskov <rousskov at measurement-factory.com>
wrote:

> On 05/10/2016 01:50 AM, Amos Jeffries wrote:
>
> > Then each peer gets its own re-lookup event scheduled
>
> If applied correctly, this approach would also solve the misapplication
> problem I described in my concurrent review. Unfortunately, it requires
> serious work. Fortunately, you have already converted CachePeer from
> being a POD into a proper class. That will help!
>
>
> Thank you,
>
> Alex.
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: retry-failed-peers-v2.patch
Type: text/x-patch
Size: 20288 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20160517/cd4bb5be/attachment-0001.bin>