[squid-users] Dynamic/CDN Content Caching Challenges
Muhammad Faisal
faisalusuf at yahoo.com
Thu Apr 14 13:18:46 UTC 2016
Hi Amos,
As you mentioned, "Better to Store-ID cache the thing its Location header
is pointing to." The problem is that the Location header contains random
strings in the URL, which produce a unique URL for the same object:
Location: http://fs37.filehippo.com/9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe
The random part of the URL is "/9546/46cfd241f1da4ae9812f512f7b36643c".
I am trying to work out how to deal with this situation.
--
Regards,
Faisal.
------ Original Message ------
From: "Muhammad Faisal" <faisalusuf at yahoo.com>
To: "Amos Jeffries" <squid3 at treenet.co.nz>;
squid-users at lists.squid-cache.org
Sent: 4/14/2016 4:21:16 PM
Subject: Re: [squid-users] Dynamic/CDN Content Caching Challenges
>Thanks, I will keep grinding away at other websites. I am currently working
>on serving streaming videos from the cache. I'm a bit confused about why
>this is a cache MISS: is it because of the 206, or some other reason?
>
>TCP_MISS/206 3874196 GET
>http://cw002.foo.net/files/videos/2015/12/30/145148227265e28-360.mp4 -
>ORIGINAL_DST/a.b.c.d video/mp4
>
>I'm trying to use a regexp in a Store-ID helper so that this is served
>from the cache and saved as a single object, because [cw002] could change,
>which would result in a different object.
>
>So is my understanding correct that a Store-ID helper is the way to deal
>with objects that are identical but originate from different hosts?
>http:\/\/(cw[0-9]+)\.foo\.net\/files\/videos\/.*\/.*\/(.*\.mp4)
>
>to store as "http://cdn.foo.net/" . $2
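A minimal sketch of such a Store-ID helper, written in Python rather than the Perl that the `. $` concatenation suggests. It assumes the video hosts are named like cw002.foo.net, that a .mp4 URL without a query string identifies the video, and that the whole path after the host (not only the file name) should be kept in the key, so two date directories holding the same file name cannot collide. The key host http://cdn.foo.net is the hypothetical one from the question. The host in the log line is cw002, so the host part of the pattern is written as cw[0-9]+, and the key is built from the path, not from the host group. The channel-ID handling and the "OK store-id=" / "ERR" replies follow the Store-ID helper protocol of Squid 3.4 and later as documented; check the documentation for your version.

```python
#!/usr/bin/env python3
"""Minimal Store-ID helper sketch for the cwNNN.foo.net video servers.

Assumptions: video hosts are named like cw002.foo.net, videos live under
/files/videos/ and end in .mp4, and the same path on any cwNNN host is the
same file. URLs with a query string are left alone.
"""
import re
import sys

# e.g. http://cw002.foo.net/files/videos/2015/12/30/145148227265e28-360.mp4
VIDEO_RE = re.compile(r"^http://cw[0-9]+\.foo\.net(/files/videos/[^?]+\.mp4)$")

# Shared key host from the question; it only has to be the same for every cwNNN.
KEY_PREFIX = "http://cdn.foo.net"


def store_id_for(url):
    """Return the shared store key for a video URL, or None if it does not match."""
    m = VIDEO_RE.match(url)
    return KEY_PREFIX + m.group(1) if m else None


def answer(line):
    """Build the helper's reply to one request line from Squid."""
    parts = line.split()
    channel = ""
    # With concurrency enabled, Squid prefixes each line with a channel ID
    # that must be echoed back at the start of the reply.
    if parts and parts[0].isdigit():
        channel = parts.pop(0) + " "
    key = store_id_for(parts[0]) if parts else None
    if key is None:
        return channel + "ERR"  # ERR: no Store-ID change for this URL
    return channel + "OK store-id=" + key


def main():
    for line in sys.stdin:
        sys.stdout.write(answer(line) + "\n")
        sys.stdout.flush()  # Squid waits on each reply, so do not buffer


if __name__ == "__main__":
    main()
```

Squid would start it through the store_id_program directive, with store_id_children controlling how many copies run (add concurrency=N there if channel IDs are wanted) and store_id_access limiting which requests are sent to it; the script path is up to you.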
>
>
>
>--
>Regards,
>Faisal.
>
>
>
>------ Original Message ------
>From: "Amos Jeffries" <squid3 at treenet.co.nz>
>To: "Muhammad Faisal" <faisalusuf at yahoo.com>;
>squid-users at lists.squid-cache.org
>Sent: 4/14/2016 3:59:14 PM
>Subject: Re: [squid-users] Dynamic/CDN Content Caching Challenges
>
>>On 14/04/2016 9:32 p.m., Muhammad Faisal wrote:
>>> Thanks Amos for the detailed response.
>>> For Squid, we are redirecting only HTTP traffic via policy routing.
>>> The object served to clients is the same, but because every user is
>>> redirected differently, a new copy of the object is stored each time.
>>>
>>> What about HTTP streaming content with a 206 response code; how do we
>>> deal with it? AFAIK Squid does not cache 206 Partial Content. Is this
>>> correct?
>>
>>Squid does not cache 206 responses from the server. But a HIT served by
>>Squid can have a 206 status.
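To illustrate how a HIT can still be a 206: once the complete object is in the cache, a Range request can be answered by slicing the stored body. This is a minimal Python sketch of that idea, not Squid's code. It assumes a single `bytes=first-last` range (or the suffix form `bytes=-N`) and falls back to a full 200 for anything else, such as multi-range requests.

```python
import re

# One range only: "bytes=first-last", "bytes=first-" or the suffix form "bytes=-N".
RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def serve_from_cache(body, range_header=None):
    """Answer a request from a fully cached body.

    Returns (status, headers, payload). Unsupported or malformed Range
    headers are ignored and the whole body is sent with 200.
    """
    total = len(body)
    full = (200, {"Content-Length": str(total)}, body)
    if range_header is None:
        return full
    m = RANGE_RE.match(range_header.strip())
    if m is None or m.group(1) == m.group(2) == "":
        return full
    first, last = m.group(1), m.group(2)
    if first == "":
        # Suffix range: the final N bytes of the object.
        length = min(int(last), total)
        start, end = total - length, total - 1
    else:
        start = int(first)
        if last != "" and int(last) < start:
            return full  # last < first is invalid syntax, so ignore the header
        end = total - 1 if last == "" else min(int(last), total - 1)
    if start >= total or start > end:
        # Nothing satisfiable inside the stored object.
        return 416, {"Content-Range": "bytes */%d" % total}, b""
    payload = body[start:end + 1]
    return 206, {
        "Content-Range": "bytes %d-%d/%d" % (start, end, total),
        "Content-Length": str(len(payload)),
    }, payload
```

The TCP_MISS/206 log line above is the opposite case: there the 206 came from the origin server, so there was no complete object in the cache to slice from.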
>>
>>>
>>> For example, with filehippo the sequence is as follows.
>>>
>>> When I click the download button there are two requests: the first
>>> returns a 301 (which contains the Location header for the requested
>>> content) and the second returns a 200:
>>>
>>> 301 Headers:
>>>
>>> GET /download/file/6853a2c840eaefd1d7da43d6f2c94863adc5f470927402e6518d70573a99114d/ HTTP/1.1
>>> Host: filehippo.com
>>> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
>>> Accept-Encoding: gzip, deflate, sdch
>>> Accept-Language: en-US,en;q=0.8
>>> Cookie: FHSession=mfzdaugt4nu11q3yfxfkjyox;
>>> FH_PreferredCulture=l=en-US&e=3/30/2017 1:38:22 PM;
>>> __utmt_UA-5815250-1=1; __qca=P0-1359511593-1459345103148;
>>> __utma=144473122.1934842269.1459345103.1459345103.1459345103.1;
>>> __utmb=144473122.3.10.1459345119355; __utmc=144473122;
>>> __utmz=144473122.1459345103.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
>>> __utmv=144473122.|1=AB%20Test=new-home-v1=1
>>> Referer: http://filehippo.com/download_vlc_64/download/56a450f832aee6bb4fda3b01259f9866/
>>> Upgrade-Insecure-Requests: 1
>>> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36
>>> (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36
>>>
>>> HTTP/1.1 301 Moved Permanently
>>> Accept-Ranges: bytes
>>> Age: 0
>>> Cache-Control: private
>>> Connection: keep-alive
>>> Content-Length: 0
>>> Content-Type: text/html
>>> Date: Wed, 30 Mar 2016 13:38:45 GMT
>>> Location: http://fs37.filehippo.com/9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe
>>> Via: 1.1 varnish
>>> X-Cache: MISS
>>> X-Cache-Hits: 0
>>> x-debug-output: FHSession=mfzdaugt4nu11q3yfxfkjyox;
>>> FH_PreferredCulture=l=en-US&e=3/30/2017 1:38:22 PM;
>>> __utmt_UA-5815250-1=1; __qca=P0-1359511593-1459345103148;
>>> __utma=144473122.1934842269.1459345103.1459345103.1459345103.1;
>>> __utmb=144473122.3.10.1459345119355; __utmc=144473122;
>>> __utmz=144473122.1459345103.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
>>> __utmv=144473122.|1=AB%20Test=new-home-v1=1
>>> X-Served-By: cache-lhr6334-LHR
>>>
>>
>>Ew. Borked server. It answers with 301 (permanent); 302 may be old, but
>>there are situations (this being one) where it actually is appropriate
>>to respond with a temporary redirect status.
>>
>>It also seems to contain an amateur attempt at cache-optimization by
>>someone who does not understand what middleware does.
>>
>>
>>You could technically force this to cache, but it's not worth it. Let
>>the site admin who made that yucky response deal with the 2x latency
>>cost they created. Better to Store-ID cache the thing its Location
>>header is pointing to.
>>
>>
>>> 200 Headers: Why is ATS (Apache Traffic Server) not caching the
>>> application/octet-stream response despite having the config
>>> proxy.config.http.cache.required_headers INT 1?
>>
>>Squid is not ATS. The 301 response above has Cache-Control: private, so
>>only the receiving browser is allowed to cache it. What was the question?
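The rule being applied here is the HTTP caching one: a shared cache such as Squid must not store a response marked Cache-Control: private (or no-store). A minimal Python check for just those two directives; it deliberately treats the qualified form private="field" as forbidding storage too, which is stricter than the specification requires, and it says nothing about whether a response is otherwise cacheable. For the 301 above, forbids_shared_storage("private") returns True.

```python
def forbids_shared_storage(cache_control):
    """True if a Cache-Control value stops a shared cache from storing the response.

    Only 'private' and 'no-store' are checked. The qualified form
    'private="..."' is treated conservatively as forbidding storage as well.
    """
    names = set()
    for directive in cache_control.split(","):
        # Directive names are case-insensitive; drop any "=value" part.
        name = directive.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return bool(names & {"private", "no-store"})
```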
>>
>>> GET /9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe HTTP/1.1
>>> Host: fs37.filehippo.com
>>
>>What do you know about the components of that URL...
>>
>>* What does "9546" mean?
>> - just a random number?
>> - some form of customer ID that VideoLAN has with Filehippo?
>> - some form of category ID that represents the VLC software type etc.?
>>
>>* What does the long random-looking hex number mean?
>> - just a random visitor session ID?
>> - the hash sum of the VLC binary being fetched?
>>
>>... or something else?
>>
>>Try some manual requests with different values and see what happens to
>>the response. Pay particular attention to the ETag response header and
>>the response size, and if you want to be paranoid, take the SHA1 and MD5
>>hashes of the response objects that look like they should be identical.
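The manual comparison described above can be scripted. A minimal Python sketch, assuming the object fits in memory and that a plain GET through urllib is enough to fetch it (the site may also expect the Cookie or Referer seen in the request above, which this does not send). It records the ETag, size, SHA1 and MD5 of each response and lists which of those differ between two fetches.

```python
import hashlib
import urllib.request


def fingerprint(etag, body):
    """Summarize one response: ETag header, size and two content hashes."""
    return {
        "etag": etag,
        "size": len(body),
        "sha1": hashlib.sha1(body).hexdigest(),
        "md5": hashlib.md5(body).hexdigest(),
    }


def fetch_fingerprint(url):
    """GET url and fingerprint the full body (read entirely into memory)."""
    with urllib.request.urlopen(url) as resp:
        return fingerprint(resp.headers.get("ETag"), resp.read())


def differences(a, b):
    """Names of the fingerprint fields that differ between two responses."""
    return sorted(key for key in a if a[key] != b.get(key))
```

For example, fetch the Location URL once as-is and once with a different hex segment, then call differences() on the two results. No difference in size, sha1 and md5 suggests that segment does not select the content; the ETag may differ between back-end servers even for identical bytes.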
>>
>>Check your logs for patterns in the URLs, and test the other files you
>>find people fetching in the same way.
>>
>>If that checks out, then you know what your Store-ID pattern can drop
>>and what needs to be kept.
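Once testing shows which parts can be dropped, the matching step for the filehippo download URL might look like this minimal Python sketch. It is hypothetical throughout: it assumes the fsNN host number, the numeric segment and the 32-digit hex segment vary per visitor without changing the file, which is exactly what the tests above are meant to confirm or refute, so the parts kept in the key are a parameter. The key host fs.filehippo.com.squid.internal is made up; the .squid.internal suffix is just a name unlikely to collide with a real URL. A Store-ID helper would reply "OK store-id=<key>" when this returns a key and "ERR" otherwise.

```python
import re

# Shape of the Location target quoted in this thread:
# http://fs37.filehippo.com/9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe
FILEHIPPO_RE = re.compile(
    r"^http://fs[0-9]+\.filehippo\.com/"
    r"(?P<num>[0-9]+)/(?P<token>[0-9a-f]{32})/(?P<name>[^/?]+)$"
)

# Hypothetical key host; not a real server.
KEY_PREFIX = "http://fs.filehippo.com.squid.internal/"


def filehippo_store_id(url, keep=("name",)):
    """Return a Store-ID key built only from the URL parts named in keep.

    The default keeps just the file name, which is safe only if testing has
    shown the number and hex token never change the downloaded bytes.
    Returns None for URLs that do not have the expected shape.
    """
    m = FILEHIPPO_RE.match(url)
    if m is None:
        return None
    parts = [m.group(p) for p in ("num", "token", "name") if p in keep]
    return KEY_PREFIX + "/".join(parts)
```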
>>
>>This is the hard way, and a "lot of work", as I mentioned earlier. If you
>>want to help the community, please contribute back by putting your
>>findings into the wiki Store-ID database pages so all that work does not
>>go to waste.
>>
>>>
>>> HTTP/1.1 200 OK
>>> Accept-Ranges: bytes
>>> Age: 739
>>> Connection: keep-alive
>>> Content-Length: 31367109
>>> Content-Type: application/octet-stream
>>> Date: Wed, 30 Mar 2016 13:26:43 GMT
>>> ETag: "81341be3a62d11:0"
>>> Last-Modified: Mon, 08 Feb 2016 06:34:21 GMT
>>>
>>
>>Amos
>>