[squid-users] Inconsistent accessing of the cache, craigslist.org images, wacky stuff.

Eliezer Croitoru eliezer at ngtech.co.il
Fri Oct 30 04:44:33 UTC 2015


Hey,

I was convinced that there was an option to disable the host forgery 
test; it would make more sense if you were running BIND and intercepting 
all client DNS traffic into it.
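
For the BIND part, a minimal sketch of what I mean (the LAN interface 
name is a placeholder, and it assumes BIND runs on the proxy box itself):

# force all client DNS through the local BIND
iptables -t nat -A PREROUTING -i eth0 -p udp --dport 53 -j REDIRECT --to-ports 53
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 53 -j REDIRECT --to-ports 53

# squid.conf: make Squid resolve through the same BIND
dns_nameservers 127.0.0.1

With Squid and the clients resolving from the same DNS cache, the dst-IP 
that Squid verifies should match the IP the client actually connected to.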

About your idea for an upstream cache:
It's a pretty nice idea, and I am pretty sure that the host forgery test 
can be disabled when you are using an upstream cache_peer.
If it is not in the code yet it should be reported as a bug (some could 
argue it is a wanted feature).
The idea by itself is not crazier than what I have done at: 
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/Coordinator
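
For reference, a minimal squid.conf sketch of the slow-side proxy 
pointing at such a parent (hostname and port are placeholders):

# send everything through the fast-side parent, never go direct
cache_peer upstream.example.net parent 3128 0 no-query no-digest default
never_direct allow all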

The idea of pre-fetching is old, and I had an intention to write an ICAP 
service that would do something like that, but with the full original 
request headers.
The issue with pre-fetching a file is that you will be required to 
download it at least twice, and the first response might not be saved 
into the cache as it should be.
If you plan to implement pre-fetching, consider using an ICAP service 
that knows the full request headers, so it can mimic the exact same 
request.
If you are interested in the ICAP idea, take a look at the ICAP service 
I wrote (in Golang) at:
https://github.com/elico/squidblocker-icap-server
You can see that in the filterByUrl function the req.Request object's 
content can be dumped and re-used for the pre-fetch.
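
For what it's worth, here is a rough Go sketch of that replay idea. It 
is not taken from the squidblocker code; the parent-proxy address, URL 
and headers are placeholders for illustration:

package main

import (
	"io"
	"log"
	"net/http"
	"net/url"
)

// prefetch replays the original request (method, URL and headers) through
// the upstream parent cache so that cache fetches and stores the object.
// "orig" stands in for the request an ICAP REQMOD service would see.
func prefetch(orig *http.Request, parentProxy string) error {
	proxyURL, err := url.Parse(parentProxy)
	if err != nil {
		return err
	}
	client := &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
	}

	req, err := http.NewRequest(orig.Method, orig.URL.String(), nil)
	if err != nil {
		return err
	}
	req.Header = orig.Header.Clone() // mimic the exact same request headers

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the body so the upstream cache completes (and stores) the transfer.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Placeholder request and proxy address, for illustration only.
	orig, _ := http.NewRequest("GET", "http://images.craigslist.org/example.jpg", nil)
	orig.Header.Set("User-Agent", "Mozilla/5.0")
	if err := prefetch(orig, "http://upstream.example.net:3128"); err != nil {
		log.Println("prefetch failed:", err)
	}
}

Sending the replay through the parent (http.ProxyURL) is what lets the 
fast-side cache fetch and store the object, so later client requests can 
be served from it.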

Eliezer

On 30/10/2015 05:09, Jester Purtteman wrote:
>
> We've got a couple of thoughts going at once here, so let me condense it a bit.  First, yes, this is coming in over a satellite, and that is part of the problem.  Nothing like 560 ms to bring a connection to a halt.  Part of my plan is exactly as you say: optimize the links by setting huge TCP windows and all the rest so that I can get full bandwidth.  The other part of the story (and I could just be misunderstanding this too) is that it appears that if I have, say, 3 or 4 clients connect for a file over the course of the download, and any one of them (or maybe just the last one; again, insufficient testing, so I don't know the exact course of events here) ends up requesting an IP different from what was looked up, Squid appeared to drop the file.
>
>> > I think a worse problem is if the DNS TTL is shorter than a client connection's TCP connected time.
>> > Then requests arriving after the DNS TTL expires would no longer match the initial dst-IP.
> That is what I think I was seeing, if by that you mean: clients A, B, and C all request a large file (a few hundred MB); it downloads but takes more than 300 seconds (which has become a pretty common TTL; when did that happen?); then D requests it too, but the DNS updates while it's coming in, and the file suddenly gets flagged as a host forgery and is no longer cacheable.  I could be wrong, so I need to experiment, but I think that's what I am seeing.
>
> My crazy solution is this: I have a server on a fast connection, on which I set up a cache with a pretty big minimum and maximum object size (say, 10 MB minimum, 8,000 MB maximum), and set it up as a parent cache to the cache out at the slow end of the universe, which is a transparent proxy.  The transparent proxy then uses the parent proxy to request the files, and when the files happen to be very big, I set up the connection to do a pre-fetch (because a 100 MB file is a piece of cake for a 100 Mbps connection) and it stores it, because the time to download was trivial compared to the DNS TTL.  I set up the cache on the slow end to cache more aggressively, but the point is that once the cache down south has the file, the cache up north is requesting the file from a system much better optimized to pull big files over, and that improves the odds that the DNS has not updated before the transfer completes.
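
A minimal squid.conf sketch for that fast-side parent, using the sizes 
described above (the cache_dir path and size are placeholders):

# parent cache on the fast link: only bother storing large objects
minimum_object_size 10 MB
maximum_object_size 8000 MB
cache_dir aufs /var/spool/squid 200000 16 256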
>
> I'm not convinced my idea is valid, so I'll have to ponder it a bit, but I'm going to give it a shot and let you know if it makes a difference.  Bottom line: it is a pretty nasty workaround, and there is probably a better solution if someone out there who knows C worth beans is into it.  I don't think there are ANY answers that don't involve setting up your own DNS, but after configuring BIND in about 7 minutes last night, I am thinking that's not a big issue.  The obvious answers I can think of are (1) to maintain a short table of IPs associated with a specific domain request until all transfers referring back to it have passed, and rewrite the DNS resolution calls to refer to that table, or (2) to tag both the requested IP and the resolved IP.
>
> The last line of C I wrote was in the 90s, but I'll dig in and see if I can find the right place to start making a mess :).
>
> In any event, you and Eliezer have helped me get farther since Tuesday night than I had gotten since August.  Thank you both!



