[squid-users] Time for cache synchronization between siblings

Fri Dec 18 15:14:19 UTC 2015

Hi Amos,

It was definitely ignorance of the tools on my part. I am using curl
for testing my setup.
I was using different URLs (a different host/IP address as part of the
URL) when issuing requests to Squid. That caused the problem I observed.

I read up on curl and found out that I can set the Host header.
When I set the Host header to the same value in all curl commands, cache
hits happen as I expected. access.log clearly shows that the URLs are
the same.
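
To illustrate why the Host header matters (curl sets it with its -H
option), here is a minimal Python sketch. It is a simplified model, not
Squid's actual code: it assumes the proxy rebuilds the request URL from
the Host header and uses that URL as the cache key. The function name,
the shortened path, and the host name media.example.com are made up for
the example.

```python
# Simplified model of a reverse proxy that keys its cache on the URL
# rebuilt from the Host header. All names here are illustrative only.

def effective_url(host_header: str, request_target: str, scheme: str = "http") -> str:
    """Rebuild the absolute URL from the Host header and the request path."""
    return f"{scheme}://{host_header.lower()}{request_target}"

# Shortened, made-up path; the query string mirrors the one in access.log.
path = "/media/stream/video/example.mp4?size=xs&start=0.000&end=5.930"

# Same path, different Host headers: two different cache keys.
key_local = effective_url("127.0.0.1:3128", path)
key_remote = effective_url("10.135.83.128:3128", path)
print(key_local == key_remote)  # False

# Same path, same Host header: one cache key, so a hit is possible.
key_a = effective_url("media.example.com", path)
key_b = effective_url("media.example.com", path)
print(key_a == key_b)  # True
```

Under this model, a request sent to 127.0.0.1 and one sent to
10.135.83.128 are for two different objects unless both carry the same
Host header.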

So I can say that squid has solved our problem nicely.

A few words about our setup. Squid is used as a reverse caching proxy.

Our application serves video fragments. We use HLS (HTTP Live
Streaming), where a given video is broken up into small fragments that
are served on demand. Since transcoding to different formats and bit
rates is CPU intensive, we want to cache frequently accessed video
fragments.

Also, we use CARP to make sure that requests for all fragments of a
given video file go to the same backend webserver, because then it would
not have to download the single large video file from the backend server.
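
The idea of sending every fragment of a video to the same backend can be
sketched in Python as below. This is not Squid's CARP hash function; it
is a generic highest-score (rendezvous) hash used only to show the
principle. It assumes that only the URL path identifies the video and
that the query string (size, start, end) is ignored when choosing a
backend. The backend names and the URL are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical backend names; not taken from the real setup.
BACKENDS = ["backend-a", "backend-b", "backend-c"]

def pick_backend(url: str, backends=BACKENDS) -> str:
    """Pick the backend with the highest hash score for the URL path.

    Only the path is hashed, so the query string (size, start, end)
    does not influence the choice.
    """
    path = urlsplit(url).path

    def score(backend: str) -> int:
        digest = hashlib.sha256(f"{backend}|{path}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)

base = "http://media.example.com/media/stream/video/VIDEO_ID"
first = pick_backend(base + "?size=xs&start=0.000&end=5.930")
second = pick_backend(base + "?size=xs&start=5.930&end=11.860")
print(first == second)  # True: both fragments map to the same backend
```

Because the score depends only on the backend name and the path, two
fragments of the same video always select the same backend.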

Thanks to this mailing list I have been able to successfully use squid.

rgds,
Sreenath

On 12/18/15, Amos Jeffries <squid3 at treenet.co.nz> wrote:
> On 18/12/2015 1:21 a.m., Sreenath BH wrote:
>> Hi,
>>
>> Thanks for the detailed response. I really appreciate it.
>>
>> Unfortunately the load balancer we use is not a squid load balancer
>> and for now I will have to use HTCP.
>>
>> Please take a look at the following lines from access.log of one of
>> the three squid servers.
>> ------------
>> 1450351827.534      0 10.135.83.129 UDP_HIT/000 0 HTCP_TST
>> http://127.0.0.1:3128/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0?size=xs&start=0.000&end=5.930&
>> - HIER_NONE/- -
>>
>> 1450351827.562     20 10.135.83.129 TCP_HIT/200 553852 GET
>> http://127.0.0.1:3128/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0?
>> - HIER_NONE/- video/mp2t
>>
>> 1450352028.731      0 10.135.83.128 UDP_MISS/000 0 HTCP_TST
>> http://10.135.83.128:3128/media/stream/video/NDo3ZDY1OTRjOS02NjM4LTQyNDMtOGMyNi0zYTc3YmI1MzI3ZjAubXA0?size=xs&start=0.000&end=5.930&
>> - HIER_NONE/- -
>> --------
>>
>> The first line indicates a hit when queried by a peer. Note that the
>> IP address is 127.0.0.1.
>> It was a UDP HIT and it was followed by the actual request for the
>> cached object, which succeeded.
>>
>> Now the third line indicates UDP query for same object, except that
>> URL has a different IP address, and the log says it was a MISS.
>>
>> I don't know what I am doing wrong, but it consistently seems to treat
>> the IP address as part of the URL for purpose of HIT/MISS decision.
>
>
> Notice how the normal HTTP request (TCP_* line) is also using
> "127.0.0.1" for origin server name.
>
> This means the two HTCP requests really are for two very different URLs.
> Completely different origin servers being contacted.
>
>
>>
>> If all requests were made from a local client (say using curl running
>> locally on the machine) and using 127.0.0.1 as IP address, HTCP works
>> correctly.
>
> What is the output of "squid -v" ?
>
> And beyond being a layer of caches behind a LB. What is the purpose of
> this installation;
>  reverse-proxy / CDN ?
>  ISP forward/explicit proxy farm?
>  Intranet gateway?
>  some mix of the above?
>
> and what is your full squid.conf ? (without comments and empty lines of
> course).
>
>
> What exactly are the clients (curl) requesting?
>  from the Squid siblings directly? or through the LB?
>
>
>>
>> Even without HTCP, just issuing the same request from localhost and
>> another machine using the externally visible IP address, squid does
>> not appear to use the cached object. I am new to HTTP and think I must
>> be doing something wrong, but can't say what.
>
> Huh? Those log lines you posted above contradict. The first HTCP said
> HIT and the TCP object fetch was served from the cache. The second said
> MISS on the other URL, so no TCP fetch.
>
> The sibling lookup and HTCP appear to be working perfectly correctly.
>
>>
>> I wonder if ICP would have fared better since it uses just the URL.
>> Might that be a reason?
>
> No. ICP always fares worse. It says UDP_HIT in a lot of cases where the
> URL is same but the followup TCP fetch discovers HTTP mime headers
> negotiating some variant object not in the sibling cache. So UDP_HIT
> followed by TCP_MISS.
>  That is almost the worst-case scenario: it causes a minimum of 2x proxy
> processing latency delays on the whole transaction from the client
> viewpoint. Ultimate worst-case is where a whole chain of parallel
> siblings get relayed through due to UDP_MISS with N times the latency
> multiplication.
>
> At least with HTCP trying to fetch variants results in UDP_MISS. That is
> the second-best *good* result.
>
>
> That is all irrelevant to this case though. Since the raw-IP is part of
> the URL, both of these siblings are working correctly.
>
>
> Problem is why is the raw-IP with port 3128 part of the URL? It smells
> like a misconfiguration or bad design somewhere. The answers to my above
> questions should help find the solution to that.
>
> Amos
>
>