[squid-users] Inconsistent accessing of the cache, craigslist.org images, wacky stuff.

Amos Jeffries squid3 at treenet.co.nz
Fri Oct 30 09:31:17 UTC 2015


On 30/10/2015 4:09 p.m., Jester Purtteman wrote:
> 
> We've got a couple of thoughts going at once here, so let me condense
> it a bit.  First, yes, this is coming in over a satellite, and that
> is part of the problem.  Nothing like 560 ms to bring a connection to
> a halt.  Part of my plan is exactly as you say: optimize the links by
> setting huge TCP windows and all the rest so that I can get full
> bandwidth.  The other part of the story (and I could just be
> misunderstanding this too) is that it appears that if I have, say, 3
> or 4 clients connect for a file over the course of the download, and
> any one of them (or maybe just the last one; again, insufficient
> testing, so I don't know the exact course of events here) ends up
> requesting an IP different from the one that was looked up, the file
> appears to be dropped.

If this is a big problem, you can try "collapsed_forwarding on".

It still has a few annoying bugs of its own, one of which is that it
can slow down collapsed requests even more if the response turns out
to be non-cacheable or unusable for that client. But it would resolve
the situation above. YMMV.
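
In squid.conf that is a single directive (it defaults to off):

  # merge concurrent requests for the same URL into one upstream fetch
  collapsed_forwarding on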

> 
>> I think a worse problem is if the DNS TTL is shorter than a client
>> connection's TCP connected time. Then requests arriving after the
>> DNS TTL expired would no longer match the initial dst-IP.
> 
> That is what I think I was seeing, if by that you mean: clients A,
> B, and C all request a large file (a few hundred MB); it downloads,
> but takes more than 300 seconds (which has become a pretty common
> TTL; when did that happen?); then D requests it too, but the DNS
> updates while it's coming in, so the response suddenly gets flagged
> as a host forgery and is no longer cacheable.  I could be wrong, so
> I need to experiment, but I think that's what I am seeing.
> 
> My crazy solution is this: I have a server on a fast connection, on
> which I set up a cache with a pretty big minimum and maximum object
> size (say, 10 MB minimum, 8,000 MB maximum) and make it the parent
> of the cache out at the slow end of the universe, which is a
> transparent proxy.  The transparent proxy then uses the parent proxy
> to request the files, and when a file happens to be very big, the
> parent pre-caches it (a 100 MB file is a piece of cake for a
> 100 Mbps connection) and stores it, because the time to download it
> is trivial compared to the DNS TTL.  I set up the cache on the slow
> end to cache more aggressively, but the point is that once the cache
> down south has the file, the cache up north is requesting it from a
> system much better optimized to pull big files over, and that
> improves the odds that the DNS has not updated before the transfer
> completes.
> 
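
For concreteness, that pair of configs might look roughly like this
(hostnames, ports, and sizes are illustrative, untested values):

  # parent cache on the fast link: only bother storing big objects
  http_port 3128
  minimum_object_size 10 MB
  maximum_object_size 8000 MB

  # child cache at the slow end: fetch everything through the parent
  cache_peer fastcache.example.net parent 3128 0 no-query default
  never_direct allow all
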
> I'm not convinced my idea is valid, so I'll have to ponder it a bit,
> but I'm going to give it a shot and let you know if it makes a
> difference.  Bottom line is, it is a pretty nasty workaround, and
> there is probably a better solution if someone out there who knows C
> worth beans is into it.  I don't think there are ANY answers that
> don't involve setting up your own DNS, but after configuring BIND in
> about 7 minutes last night, I am thinking that's not a big issue.
> The obvious answers I can think of are (1) to maintain a short table
> of IPs associated with a specific domain request until all transfers
> referring back to it have passed, and rewrite the DNS resolution
> calls to refer to that table, or (2) to tag the requested IP and
> resolved IP.

Unfortunately, in a busy proxy very popular domains stay popular and
have outstanding requests almost all the time. If they have a short
outage, or migrate to other IPs (as the cloud and CDN services causing
this grief do), then the proxy is left sending new requests to dead
servers and the clients see a lot of extra, unnecessary breakage.
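
To make that concrete, option (1) amounts to something like this toy
sketch (invented names, not Squid code). Note that the refcount which
keeps a pin alive is exactly what keeps it pointing at a dead IP in
the scenario above:

  #include <map>
  #include <string>

  struct PinnedIp {
      std::string ip;
      unsigned outstanding = 0;   // transfers still using this pin
  };

  class DnsPinTable {
      std::map<std::string, PinnedIp> pins_;
  public:
      // Return the pinned IP for host, creating a pin from resolvedIp
      // when no transfer currently holds one.
      std::string acquire(const std::string &host,
                          const std::string &resolvedIp) {
          auto &pin = pins_[host];
          if (pin.outstanding++ == 0)
              pin.ip = resolvedIp;  // first user sets the pin
          return pin.ip;            // later users reuse it, fresh DNS or not
      }

      // Drop the pin once the last outstanding transfer finishes,
      // so the next request re-resolves.
      void release(const std::string &host) {
          auto it = pins_.find(host);
          if (it != pins_.end() && --it->second.outstanding == 0)
              pins_.erase(it);
      }
  };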

> 
> The last line of C I wrote was in the 90s, but I'll dig in and see if
> I can find the right place to start making a mess :).
> 

Of my ideas, the simplest to put together would be to tie the
intercepted connection to the single Host header value that got
verified.

To work on that you need to edit src/client_side_request.cc and look
for the hostVerify methods. When a verify passes, add the confirmed
hostname to the ConnStateData object (src/client_side.h), and use that
stored value to accept as verified any future requests on the same
connection that carry the same Host value, instead of re-doing DNS
lookups on the 2nd, 3rd, etc. requests.
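
As a self-contained sketch of that idea (invented names; the real code
in Squid is asynchronous and lives in the classes mentioned above):

  #include <iostream>
  #include <string>

  struct ConnState {
      // Host header value that already passed verification on this
      // connection; empty until the first verify succeeds.
      std::string verifiedHost;
  };

  // Stand-in for the real DNS-backed Host verification.
  static bool dnsVerify(const std::string &host) {
      std::cout << "DNS lookup for " << host << "\n";
      return true;
  }

  static bool verifyRequest(ConnState &conn, const std::string &host) {
      if (!conn.verifiedHost.empty() && host == conn.verifiedHost)
          return true;               // same Host as before: skip the lookup
      if (!dnsVerify(host))
          return false;              // genuine verification failure
      conn.verifiedHost = host;      // remember it for later requests
      return true;
  }

  int main() {
      ConnState conn;
      verifyRequest(conn, "example.com");  // 1st request: does the lookup
      verifyRequest(conn, "example.com");  // 2nd request: no lookup needed
  }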


> In any event, you and Eliezer have helped me get farther since
> Tuesday night than I had since August, Thank you both!
> 

Welcome, that's what we are here for. :-)

Amos

