[squid-users] Dynamic/CDN Content Caching Challenges

Muhammad Faisal faisalusuf at yahoo.com
Thu Apr 14 13:18:46 UTC 2016


Hi Amos,
You mentioned: "Better to Store-ID cache the thing its Location header
is pointing to." The problem is that the URL in the Location header
contains random-looking strings, so the same object gets a different URL
for every user:
Location: 
http://fs37.filehippo.com/9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe

The random-looking part of the URL is "/9546/46cfd241f1da4ae9812f512f7b36643c".

This is the situation I was trying to deal with.
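
A minimal Python sketch of the kind of normalization being discussed. It
assumes, without confirmation, that both the numeric segment and the
32-hex-digit segment are per-request noise and do not identify the file
contents. If either one encodes a version or a content hash, dropping it
would merge different files under one key, so the pattern has to be checked
against real responses first. The function name and the
"filehippo.squid.internal" key host are made up for illustration.

```python
import re

# Hypothetical pattern: mirror host fsNN.filehippo.com, a numeric segment,
# a 32-hex-digit segment, then the file name. Only the file name is kept.
FILEHIPPO_RE = re.compile(
    r"^http://fs\d+\.filehippo\.com/\d+/[0-9a-f]{32}/([^/?#]+)$"
)


def filehippo_store_id(url):
    """Return a normalized Store-ID for a download URL, or None if no match."""
    m = FILEHIPPO_RE.match(url)
    if m is None:
        return None
    # Made-up internal host name; it only has to be stable and unique.
    return "http://filehippo.squid.internal/" + m.group(1)
```

With this sketch, the Location URL above and the same file name served from
another fsNN mirror with a different middle part map to the same key.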

--
Regards,
Faisal.



------ Original Message ------
From: "Muhammad Faisal" <faisalusuf at yahoo.com>
To: "Amos Jeffries" <squid3 at treenet.co.nz>; 
squid-users at lists.squid-cache.org
Sent: 4/14/2016 4:21:16 PM
Subject: Re: [squid-users] Dynamic/CDN Content Caching Challenges

>Thanks, I will keep grinding on other websites. I am currently working on
>getting streaming videos served from the cache. I'm a bit confused about
>why this request is a miss: is it because of the 206, or some other reason?
>
>TCP_MISS/206 3874196 GET 
>http://cw002.foo.net/files/videos/2015/12/30/145148227265e28-360.mp4 - 
>ORIGINAL_DST/a.b.c.d video/mp4
>
>I'm trying to use a regexp with a Store-ID helper so that the video is
>saved as a single object and served from the cache, because the host
>[cw002] could change and would otherwise result in a different object.
>
>So is my understanding correct that a Store-ID helper is the way to deal
>with objects that are the same but originate from different hosts?
>http:\/\/(cws[0-9]+)\.foo.net\/files\/videos\/.*\/.*\/(.*\.mp4)
>
>to store as "http://cdn.foo.net/" . $1
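
A hedged Python sketch of such a helper follows. Three things about the
pattern as posted are worth noting: it expects hosts like "cws1", while the
logged host is "cw002"; the dot in ".foo.net" is unescaped; and $1 is the
host capture, with the file name in $2. The sketch keeps the whole path
rather than only the file name, on the assumption that the date directories
are part of the object's identity. It also assumes helper concurrency is
off, so each request line starts with the URL and each reply is either
"OK store-id=..." or "ERR". The function names are illustrative.

```python
import re
import sys

# Assumed host pattern: cw002, cw17, ... under foo.net (from the log line).
VIDEO_RE = re.compile(r"^http://cw[0-9]+\.foo\.net(/files/videos/.+\.mp4)$")


def reply_for(line):
    """Build the helper reply for one request line (URL is the first token)."""
    fields = line.split()
    if not fields:
        return "ERR"
    m = VIDEO_RE.match(fields[0])
    if m is None:
        return "ERR"  # no rewrite: Squid keeps the URL as the cache key
    # cdn.foo.net is the placeholder key host from the message above.
    return "OK store-id=http://cdn.foo.net" + m.group(1)


def serve(inp, out):
    """Answer one reply line per request line, flushing after each reply."""
    for line in inp:
        out.write(reply_for(line) + "\n")
        out.flush()  # Squid waits for each answer, so do not buffer
```

Run as a helper, the script would end with serve(sys.stdin, sys.stdout) and
be wired in through the store_id_program and store_id_access directives in
squid.conf.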
>
>
>
>--
>Regards,
>Faisal.
>
>
>
>------ Original Message ------
>From: "Amos Jeffries" <squid3 at treenet.co.nz>
>To: "Muhammad Faisal" <faisalusuf at yahoo.com>; 
>squid-users at lists.squid-cache.org
>Sent: 4/14/2016 3:59:14 PM
>Subject: Re: [squid-users] Dynamic/CDN Content Caching Challenges
>
>>On 14/04/2016 9:32 p.m., Muhammad Faisal wrote:
>>>  Thanks Amos for the detailed response.
>>>  For Squid, we are redirecting only HTTP traffic, using policy routing.
>>>  The object being served to clients is the same, but because every user
>>>  is redirected differently, a new object is stored each time.
>>>
>>>  What about HTTP streaming content with a 206 response code? How do I
>>>  deal with it? AFAIK Squid doesn't cache 206 partial content. Is this
>>>  correct?
>>
>>Squid does not cache 206 from the server. But a HIT served by Squid can
>>be 206 status.
>>
>>>
>>>  e.g filehippo below is the sequence:
>>>
>>>  When I click the download button there are two requests: the first
>>>  gets a 301 (with a Location header for the requested content) and the
>>>  second a 200:
>>>
>>>  301 headers:
>>>
>>>  GET /download/file/6853a2c840eaefd1d7da43d6f2c94863adc5f470927402e6518d70573a99114d/ HTTP/1.1
>>>  Host: filehippo.com
>>>  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
>>>  Accept-Encoding: gzip, deflate, sdch
>>>  Accept-Language: en-US,en;q=0.8
>>>  Cookie: FHSession=mfzdaugt4nu11q3yfxfkjyox;
>>>  FH_PreferredCulture=l=en-US&e=3/30/2017 1:38:22 PM;
>>>  __utmt_UA-5815250-1=1; __qca=P0-1359511593-1459345103148;
>>>  __utma=144473122.1934842269.1459345103.1459345103.1459345103.1;
>>>  __utmb=144473122.3.10.1459345119355; __utmc=144473122;
>>>  __utmz=144473122.1459345103.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
>>>  __utmv=144473122.|1=AB%20Test=new-home-v1=1
>>>  Referer: http://filehippo.com/download_vlc_64/download/56a450f832aee6bb4fda3b01259f9866/
>>>  Upgrade-Insecure-Requests: 1
>>>  User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36
>>>  (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36
>>>
>>>  HTTP/1.1 301 Moved Permanently
>>>  Accept-Ranges: bytes
>>>  Age: 0
>>>  Cache-Control: private
>>>  Connection: keep-alive
>>>  Content-Length: 0
>>>  Content-Type: text/html
>>>  Date: Wed, 30 Mar 2016 13:38:45 GMT
>>>  Location: http://fs37.filehippo.com/9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe
>>>  Via: 1.1 varnish
>>>  X-Cache: MISS
>>>  X-Cache-Hits: 0
>>>  x-debug-output: FHSession=mfzdaugt4nu11q3yfxfkjyox;
>>>  FH_PreferredCulture=l=en-US&e=3/30/2017 1:38:22 PM;
>>>  __utmt_UA-5815250-1=1; __qca=P0-1359511593-1459345103148;
>>>  __utma=144473122.1934842269.1459345103.1459345103.1459345103.1;
>>>  __utmb=144473122.3.10.1459345119355; __utmc=144473122;
>>>  __utmz=144473122.1459345103.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
>>>  __utmv=144473122.|1=AB%20Test=new-home-v1=1
>>>  X-Served-By: cache-lhr6334-LHR
>>>
>>
>>Ew. Borked server. 302 may be old but there are situations (this being
>>one) where it actually is appropriate to respond with a temporary status.
>>
>>It also seems to contain an amateur attempt at cache-optimization by
>>someone who does not understand what middleware does.
>>
>>
>>You could technically force this to cache. But it's not worth it. Let the
>>site admin who made that yucky response deal with the 2x latency cost
>>they created. Better to Store-ID cache the thing its Location header is
>>pointing to.
>>
>>
>>>  200 headers: Why is ATS not caching the octet stream despite having
>>>  CONFIG proxy.config.http.cache.required_headers INT 1
>>
>>Squid is not ATS. The 301 response above is CC:private so only the
>>receiving browser is allowed to cache it. What was the question?
>>
>>>  GET /9546/46cfd241f1da4ae9812f512f7b36643c/vlc-2.2.2-win64.exe HTTP/1.1
>>>  Host: fs37.filehippo.com
>>
>>What do you know about the components of that URL...
>>
>>* What does "9546" mean;
>>  - just a random number?
>>  - some form of customer-ID videolan have with Filehippo ?
>>  - some form of category ID that represents VLC software type etc?
>>
>>* What does the long random looking hex number mean;
>>  - just a random visitor session ID?
>>  - the hash sum for the VLC binary being fetched?
>>
>>... or something else?
>>
>>Try some manual requests with different values and see what happens to
>>the response. Pay particular attention to the ETag response header, the
>>response size, and if you want to be paranoid take the SHA1 and MD5
>>hashes of the response object when it looks like it should be identical.
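
The hash comparison described above could be sketched in Python roughly as
follows. It assumes the response bodies from the manual requests have
already been fetched or saved and are available as bytes; the function
names are illustrative only.

```python
import hashlib


def fingerprint(body):
    """Return (size, MD5 hex digest, SHA1 hex digest) for a body in bytes."""
    return (
        len(body),
        hashlib.md5(body).hexdigest(),
        hashlib.sha1(body).hexdigest(),
    )


def same_object(body_a, body_b):
    """True when size and both digests agree for the two bodies."""
    return fingerprint(body_a) == fingerprint(body_b)
```

If two downloads fetched through different random-looking URLs give the
same fingerprint, that supports treating the varying URL parts as noise.
A mismatch means some part of the URL does select the content.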
>>
>>Check your logs for patterns in the URLs and test the other files you
>>find people fetching in the same ways.
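
One rough way to look for such patterns, sketched in Python: group the
logged URLs by their final path component and list the file names that were
fetched under more than one distinct URL. Pulling the URL field out of each
access.log line depends on the configured log format and is left out here,
and the function name is made up. A shared file name is only a hint that
the objects are the same, not proof.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_filename(urls):
    """Map file name -> sorted distinct URLs, keeping names seen under 2+ URLs."""
    groups = defaultdict(set)
    for url in urls:
        # Final path component, e.g. "vlc-2.2.2-win64.exe".
        name = urlsplit(url).path.rsplit("/", 1)[-1]
        if name:
            groups[name].add(url)
    return {name: sorted(found) for name, found in groups.items() if len(found) > 1}
```

Each name it returns is a candidate for the manual checks described above.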
>>
>>If that checks out then you know what your Store-ID pattern can drop and
>>what needs to be kept.
>>
>>This is the hard way, and a "lot of work" as I mentioned earlier. If you
>>want to help the community then please contribute back by putting your
>>findings into the wiki Store-ID database pages so all that work does not
>>go to waste.
>>
>>>
>>>  HTTP/1.1 200 OK
>>>  Accept-Ranges: bytes
>>>  Age: 739
>>>  Connection: keep-alive
>>>  Content-Length: 31367109
>>>  Content-Type: application/octet-stream
>>>  Date: Wed, 30 Mar 2016 13:26:43 GMT
>>>  ETag: "81341be3a62d11:0"
>>>  Last-Modified: Mon, 08 Feb 2016 06:34:21 GMT
>>>
>>
>>Amos
>>


