[squid-users] a bit off topic. New user question

Tue May 23 05:29:30 UTC 2017

On 23/05/17 11:39, George Diaz wrote:
>
> Hi
>
> Sorry for this off-topic question ...
>
> I want to pre-cache some objects from certain hosts of interest with wget.
>
> My question is: I want wget to download the objects to /dev/null,
> but
> I have not found the switches for this.
> (GNU Wget 1.5.3)
>
> I tried this:
> export http_proxy=http://mycache.com:8080/
> wget -r http://sobredinero.com -P /dev/null -nH -nd -Y on -b -l5 -t1 
> -o /dev/null
>
> but this creates a /dev/null directory :) and downloads the files
> into it.
>
> any suggestions ?
>

Some advice before you get too far into this project:

  Pre-caching was a good idea back in the days of HTTP/1.0 and static 
websites, where the URL was all that mattered. In today's HTTP/1.1 and 
HTTP/2 world, dynamic content and variants are much more common, and 
both make pre-caching pretty much useless.

Before you attempt it for any domain, I recommend passing a few of its 
URLs through the tool at <https://redbot.org>. If that tool indicates 
the site uses content negotiation or conditional HTTP features, then 
pre-caching will just cause problems.

For example, the sobredinero domain above produces these details:

      Content Negotiation

  * The response body is different when content negotiation happens.

      Caching

  * Vary: User-Agent can cause cache inefficiency.

This means that anything you pre-cache with wget will be ignored and 
probably replaced when any non-wget agent (i.e. a browser) is used to 
fetch through the proxy. So you just waste all the bandwidth, time, and 
storage space used for pre-caching it.
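
As a rough local complement to the redbot check, the sketch below lists
which request headers a response varies on. The function name
vary_fields is made up for illustration; it only parses headers given on
stdin, so the curl call in the comment is where the network access
happens. It does not detect content negotiation in general, only the
presence of a Vary header.

```shell
# Print the field names listed in any Vary response header read from
# stdin, one per line, lowercased. Prints nothing if there is no Vary.
# (vary_fields is a hypothetical helper name, not part of any tool.)
vary_fields() {
    tr -d '\r' |
    awk 'tolower($0) ~ /^vary:/ {
        sub(/^[^:]*:/, "")                      # drop the "Vary:" prefix
        n = split($0, f, ",")
        for (i = 1; i <= n; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", f[i])   # trim surrounding blanks
            if (f[i] != "") print tolower(f[i])
        }
    }'
}

# Example (needs network access):
#   curl -sI http://sobredinero.com/ | vary_fields
```

If "user-agent" shows up in the output, objects fetched by wget and by a
browser will be treated as different variants.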

Vary: User-Agent is particularly bad, since any single-character 
difference in the User-Agent header will cause a different object to be 
referred to in the cache storage. If you wish to pre-cache these objects 
in any useful way you have to know and mimic the *exact* User-Agent 
header values that will be used to fetch them. For example: two different 
versions of Chrome -> different User-Agent headers. Internet Explorer with 
different Windows Updates applied -> different User-Agent headers. As you 
can imagine, that is very hard to predict.
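
If you do know the exact header value, a minimal sketch of mimicking it
could look like the following. It assumes a reasonably recent GNU Wget
(I have not checked whether 1.5.3 supports --delete-after, which removes
each file after download and so avoids the /dev/null directory problem).
The function name precache_as is made up, the proxy URL is the one from
the question, and the User-Agent string must be copied from a real
client, not guessed.

```shell
# Recursively fetch a site through the proxy while sending a chosen
# User-Agent, deleting each file locally once it has been downloaded.
#   $1 = start URL
#   $2 = the exact User-Agent string a real client will send
# (precache_as is a hypothetical helper name.)
precache_as() {
    http_proxy=http://mycache.com:8080/ \
    wget -r -l5 -t1 -q --delete-after --user-agent="$2" "$1"
}

# Example (needs network access; the agent string is a placeholder):
#   precache_as http://sobredinero.com/ 'PASTE-EXACT-BROWSER-USER-AGENT-HERE'
```

This only helps the clients that send that one string; every other
User-Agent value still gets its own variant.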

Note: if you have come to this idea after seeing objects from that 
domain getting a lot of MISS records, the problem is very much that Vary 
header: it causes so many different variants to be needed that the 
objects being stored are often not the right one(s) for a later client 
request. Pre-caching will not solve this but make it worse, as wget is 
just another different User-Agent.
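
To see how a domain is actually doing, the result codes in access.log
can be tallied. This is a sketch assuming Squid's default native log
format, where field 4 is the result code (e.g. TCP_MISS/200) and field 7
is the request URL. The function name squid_results_for is made up, and
the host match is a plain substring test, so it will also count hosts
that merely begin with the given name.

```shell
# Count Squid result codes (TCP_HIT, TCP_MISS, ...) for one host,
# reading native-format access.log lines from stdin.
#   $1 = host name to look for in the request URL
# (squid_results_for is a hypothetical helper name.)
squid_results_for() {
    awk -v host="$1" '
        index($7, "://" host) {      # field 7 is the request URL
            split($4, r, "/")        # field 4 looks like TCP_MISS/200
            count[r[1]]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Example (the log path is an assumption; adjust to your install):
#   squid_results_for sobredinero.com < /var/log/squid/access.log
```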

Amos