[squid-users] Caching requests to a non standard http port help please.

Amos Jeffries squid3 at treenet.co.nz
Mon Aug 10 04:08:05 UTC 2015


On 10/08/2015 4:27 a.m., 1508 wrote:
> Hello Anthony,
> 
> Thank you for your reply.
> 
> My quest is to try and cache stuff from a site called Second Life.
> 
> My bandwidth is poor and I'm trying to cache large objects that regularly get
> re-requested.
> 
> The thing is, part of the URL will change, but the content will not.
> 
> Here are some examples of urls I am trying to cache:
> 
> http://sim9077.agni.lindenlab.com:12046/cap/db301865-157f-89d3-a98f-34acc5c9537a
> http://sim17097.agni.lindenlab.com:12046/cap/db301865-157f-89d3-a98f-34acc5c9537a
> 
> OK, some answers.
> 
> Port 10050 was an example. 
> 
> acl Safe_ports port 80		# http
> acl Safe_ports port 443	# https
> acl Safe_ports port 12043 # LL port
> acl Safe_ports port 12046 # LL port
> 

The definition for Safe_ports provided with clean installs contains the
entry:
  acl Safe_ports port 1025-65535

The only ports that are unsafe are those used by protocols such as
email, where the syntax overlaps with HTTP in dangerous ways. High
numbered ports do not usually have that problem.
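
The rule that enforces it in the stock config is:

  http_access deny !Safe_ports

So with the 1025-65535 entry left in place, ports 12043 and 12046 are
already allowed and your two extra acl lines are not strictly needed.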


> I am trying to rewrite part of the URL because the same data gets fetched
> from different computers. So
> in the URLs above the same item can be sent twice.
> 

DO NOT re-write the URL for that.

Instead, use a Store-ID helper to set the internal Squid cache ID for the
object. Store-ID helpers work very similarly to URL-rewriting ones, so
you should be able to adjust your helper easily.
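
As a rough sketch of the squid.conf side (the helper path and acl name
are just examples to adapt; the store_id_* directives exist from Squid
3.4 onward):

  acl sl_caps dstdomain .agni.lindenlab.com
  store_id_program /usr/local/bin/sl_storeid.py
  store_id_children 5 startup=1 idle=1
  store_id_access allow sl_caps
  store_id_access deny all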


> I cannot see how the standard configuration can assume part of a URL is a
> wildcard.

Those URLs above are a perfect example of why you should not be
URL-rewriting, or even assuming that they are the same object coming back.

For me the top URL produces a 404 message saying the thing does not
exist. The bottom one produces something else.

If you were re-writing (and caching) the bad hash to point at the top
server, all visitors would get the 404 message while it was
force-cached. Nobody would be able to reach the working second server,
even if they explicitly asked for that second URL.


With Store-ID the helper just tells Squid that both objects are to be
stored in the same cache location (a URI which you provide from the
helper). When anybody asks for one of those, the cache location is
checked, and if nothing is found there the URL *they asked for* is
fetched and cached.

So, visitors who ask for URL #1 will get 404 until somebody asks for URL
#2 and gets the object cached. After which both URLs work using the
stored data from URL #2.

Much nicer.
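
A minimal helper sketch along those lines (untested; the sim#####
pattern is taken from your example URLs, and "sl-cap.squid.internal" is
just an internal placeholder key, never a real host that gets contacted):

  #!/usr/bin/env python3
  # Store-ID helper sketch: Squid writes one request per line on stdin
  # ("URL [extras]") and expects "OK store-id=<key>" or "ERR" back.
  import re
  import sys

  # Collapse http://sim<N>.agni.lindenlab.com:<port>/cap/<uuid> down to
  # one key per <uuid>, whichever sim host happened to serve it.
  CAP = re.compile(r'^http://sim\d+\.agni\.lindenlab\.com:\d+(/cap/[0-9a-f-]+)$')

  for line in sys.stdin:
      fields = line.split()
      if not fields:
          continue
      m = CAP.match(fields[0])
      if m:
          sys.stdout.write('OK store-id=http://sl-cap.squid.internal%s\n' % m.group(1))
      else:
          sys.stdout.write('ERR\n')
      sys.stdout.flush()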


There is one huge requirement that you need to pay very close attention
to. For any two objects to be collapsed they need to have both semantic
and binary equivalence to each other. That means the use by the client
has to be the same, and the binary form of the object has to contain
the same details.

For games and REST APIs (things like Second Life), what appears to you
to be the same URL re-fetched constantly may actually be a constantly
changing object carrying large amounts of game environment updates, or
just different encodings of the object requested by different decoders.
Passing out the wrong version of an object could break things in hidden
ways. Be VERY careful about which objects you combine.



> 
> I have managed to rewrite static URL domain names and just change a ? into a
> / so items get stored OK.

Not okay. The items after the '?' can appear in any order (different
order means a different cache location), and the characters allowed to
be used there also vary.

Absolutely don't do that unless you are in control of the origin server.
And in that case you can have the origin give visitors URLs without the
'?' in the first place.
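
If you do keep any kind of helper, one simple way to hold it back from
query URLs is to exclude them up front (a sketch; pick your own acl name):

  acl has_query urlpath_regex \?
  store_id_access deny has_query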


Hope this helps, and good luck.

Amos

