[squid-users] a bit off topic. New user question
Amos Jeffries
squid3 at treenet.co.nz
Tue May 23 05:29:30 UTC 2017
On 23/05/17 11:39, George Diaz wrote:
>
> Hi
>
> Sorry for this off-topic question...
>
> I want to pre-cache some objects from some hosts of interest with wget.
>
> My question is: I want wget to download the objects to /dev/null, but
> I have not found a switch for this....
> (GNU Wget 1.5.3)
>
> I tried this:
> export http_proxy=http://mycache.com:8080/
> wget -r http://sobredinero.com -P /dev/null -nH -nd -Y on -b -l5 -t1
> -o /dev/null
>
> but this creates a /dev/null directory :) and downloads the files
> into it.
>
> Any suggestions?
>
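(On the literal wget question first: -P expects a directory, so pointing
it at /dev/null only makes wget try to create a directory of that name.
The option documented for fetch-and-discard through a proxy is
--delete-after, which removes each file right after downloading it. I
have not checked whether wget 1.5.3 already supports it, so treat this
as an untested sketch:

   export http_proxy=http://mycache.com:8080/
   wget -r -l5 -t1 -nd -nH -Y on --delete-after -o /dev/null http://sobredinero.com

The -o /dev/null part only discards the log file; the downloaded files
themselves are removed by --delete-after.)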
Some advice before you get too far into this project:
Pre-caching was a good idea back in the days of HTTP/1.0 and static
websites, where the URL was all that mattered. In today's HTTP/1.1 and
HTTP/2 world dynamic content and variants are much more common, and
both make pre-caching pretty much useless.
Before you attempt it for any domain I recommend passing a few of its
URLs through the tool at <https://redbot.org>. If that tool indicates
the site uses content negotiation or conditional HTTP features then
pre-caching is just going to cause problems.
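If redbot is not convenient, a rough first check can also be done from
the command line by looking at the caching-related response headers. A
quick sketch with curl, if you have it available:

   curl -sI http://sobredinero.com/ | egrep -i '^(Vary|Cache-Control|Expires|ETag|Last-Modified|Age)'

A Vary header, or missing/very short freshness information, is a strong
hint that pre-caching will not pay off.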
For example, running the sobredinero domain above through redbot
produces these details:
Content Negotiation
* The response body is different when content negotiation happens.
Caching
* Vary: User-Agent can cause cache inefficiency.
This means that anything you pre-cache with wget will be ignored and
probably replaced when any non-wget agent (i.e. a browser) is used to
fetch through the proxy. So you just waste all the bandwidth, time, and
storage space used pre-caching it.
Vary: User-Agent is particularly bad, since any single-character
difference in the User-Agent header causes a different object to be
referred to in the cache storage. If you wish to pre-cache these objects
in any useful way you have to know and mimic the *exact* User-Agent
header values that will be used to fetch them. For example, two
different versions of Chrome -> different User-Agent header. Internet
Explorer with different Windows Updates applied -> different User-Agent
header. As you can imagine, that is a very hard thing to predict.
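If you still want to try it despite that, the least-broken approach is
to copy the exact User-Agent string your real clients send (for example,
straight out of the proxy access.log) and make wget send the same one
with -U / --user-agent. A sketch, with a placeholder string you would
have to replace:

   UA='Mozilla/5.0 (Windows NT ...) ...'   # placeholder - paste the exact browser string here
   wget -r -l5 -t1 -nd -nH -Y on --delete-after -U "$UA" -o /dev/null http://sobredinero.com

Any character that differs from what the real browsers send still
produces a separate cache object, so this only helps for the one
User-Agent you copy.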
Note: if you have come to this idea after seeing objects from that
domain getting a lot of MISS records, the problem is very much that the
Vary header causes so many different objects to be needed that the
objects being stored are often not the right one(s) for any later client
request. Pre-caching will not solve this, but will make it worse, as
wget is just another different User-Agent.
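You can see that effect in the proxy's own logs. With the default native
access.log format the result code is the 4th field and the URL the 7th,
so something like this (adjust the log path for your install) gives a
per-result tally for the domain:

   awk '$7 ~ /sobredinero\.com/ { split($4, a, "/"); n[a[1]]++ }
        END { for (r in n) print r, n[r] }' /var/log/squid/access.log

A large TCP_MISS count next to a small TCP_HIT / TCP_MEM_HIT count for
URLs that look identical is exactly the Vary symptom described above.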
Amos