[squid-users] HTTPS cache for Java application - only getting TCP_MISS

Thu Jun 14 11:25:29 UTC 2018

On 14/06/18 07:28, baretomas wrote:
> Hello,
> 
> I'm setting up a Squid proxy as a cache for a number (as many as possible)
> of identical JAVA applications to run their web calls through. The calls are
> ofc identical, and the response they get can safely be cached for 5-10
> seconds. 
> I do this because most of the calls is directed at a single server on the
> internet that I don't want to hammer, since I will ofc be locked out of it
> then.
> 
> Currently Im simply testing this on a single computer: the application and
> squid
> 
> The calls from the application is done using ssl / https by telling java to
> use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set up
> squid and JAVA with self-signed certificates, and the application sends its
> calls through squid and gets the reponse. No problem there (wasnt easy that
> either I must say :P ).

I was going to ask what was so hard about it. Then I looked at your
config and see that your are in fact using NAT interception instead of
the easy way.

So what _exactly_ do those -D options cause the Java applications to do
with the proxy?
 I have some suspicions, but am not familiar enough with Java API and
the specific details are critical to what you need the proxy to be doing.

> 
> The problem is that none of the calls get cached: All rows in the access.log
> hava a TCP_MISS/200 tag in them. 
> 
> I've searched all through the web for a solution to this, and have tried
> everything people have suggested. So I was hoping someone could help me?
> 
> Anyone have any tips on what to try?
> 

There are three ways to do this:

1) if you own the domain the apps are connecting to. Setup the proxy as
a normal TLS / HTTPS reverse-proxy.

2) if you have enough control of the apps to get them connecting with
TLS *to the proxy* and sending their requests there. Do that.

3) the (relatively) complicated SSL-Bump way you found. The proxy is
fully at the mercy of the the messages sent by apps and servers. Caching
is a luxury here, easily broken / prevented.

Well, there is a forth way with intercept. But that is a VERY last
resort and you already have (3) going and that is already better than
intercept. Getting to (1) or (2) would be simplest if you meet the "if
..." requirements for those.

> MY config (note Ive set the refresh_pattern like that just to see if I could
> catch anything. The plan is to modify it so it actualyl does refresh the
> responses frmo the web calls in 5-10 seconds intervals. There are commented
> out pats Ive tried with no luck there too):
> 
...

Ah. The way you write that implies a misunderstanding about refresh_pattern.

HTTP has some fixed algorithms written into the protocol that caches are
required to perform to determine if any object stored can be used or
requires replacement.

The parameters used by these algorithms come in the form of headers in
the originally stored reply message, the current clients request.
Sometimes they require revalidation, which is a quick check with the
server for updated instructions and/or content.

What refresh_pattern actually does is provide default values for those
algorithm parameters IF any one (or more) of them are missing from those
HTTP messages.

The proper way to make caching happen with your desired behaviour is for
the server to present HTTP Cache-Control header saying the object is
cacheable (ie does not forbid caching), but not for more than 10seconds.
 Cache-Control: max-age=10
OR to say that objects need revalidation, but presents a 304 status for
revalidation checks. (ie Cache-Control:no-cache)  (yeah, thats right,
"no-cache" means *do* cache).

That said, I doubt you really are wanting to force that and would be
happy if the server was instructing the the proxy as being safe to cache
an object for several minutes or any value larger than 10sec.

So what we circle back to is that you are probably trying to force
things to cache and be used long past their actual safe-to-use lifetimes
as specified by the devs most authoritative on that subject (under
10sec?). As you should be aware, this is highly unsafe thing to be doing
unless you are one of those devs - be very careful what you choose to do.

> 
> 
> # Squid normally listens to port 3128
> #http_port 3128 ssl-bump generate-host-certificates=on
> dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem
> key=/cygdrive/c/squid/etc/squid/ssl/myca.key 
> 
> http_port 3128 ssl-bump generate-host-certificates=on
> dynamic_cert_mem_cache_size=4MB
> cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
> key=/cygdrive/c/squid/etc/squid/proxyCA.pem
> 
> #https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
> key=/cygdrive/c/squid/etc/squid/proxyCA.pem
> 

Hmm. This is a Windows machine running Cygwin?
FYI: Performance is going to be terrible. It may not be super relevant
yet. Just be aware that Windows imposes limitations on usable sockets
per application - which is much smaller than a typical proxy requires.
The Cygwin people do a lot but they cannot solve some OS limitation
problems.

To meet your very first sentence "as many as possible" requirement you
will need a non-Windows machine to run the proxy on. That simple change
will get you something around 3 orders of magnitude higher peak client
capacity on the proxy.

> 
> # Uncomment the line below to enable disk caching - path format is
> /cygdrive/<full path to cache folder>, i.e.
> #cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256
> 
> # certificate generation program
> sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s
> /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
> 
> # Leave coredumps in the first cache dir
> coredump_dir /var/cache/squid
> 
> # Add any of your own refresh_pattern entries above these.
> #refresh_pattern ^ftp:		1440	20%	10080
> #refresh_pattern ^gopher:	1440	0%	1440
> #refresh_pattern -i (/cgi-bin/|\?) 0	0%	0
> #refresh_pattern -i (/cgi-bin/|\?) 1440 100% 4320 ignore-no-store
> override-lastmod override-expire ignore-must-revalidate ignore-reload
> ignore-private ignore-auth
> refresh_pattern .		1440	100%	4320 ignore-no-store override-lastmod
> override-expire ignore-must-revalidate ignore-reload ignore-private
> ignore-auth override-lastmod 
> 

* ignore-must-revalidate actively *reduces* caching. Because it disables
several of the widely used HTTP mechanisms that rely on revalidation to
allow things to be stored in a cache.
 It is *only* beneficial if the server is broken; requiring revalidation
plus not supporting revalidation.

* ignore-auth same un-intuitive effects as ignoring revalidation, again
reducing caching ability.
 This is only useful if you want to prevent caching of contents which
require any form of login to view. High security networks dealing with
classified or confidential materials find this useful - regular Internet
admin not so much.

* ignore-no-store is highly dangerous and rarely necessary. The "nuclear
option" for caching. It has the potential to eradicate user privacy and
scramble up any server personalized content (not in a good way).
 This is a last resort intended only to copy with severely braindead
applications. YMMV whether you have to deal with any of those - just
treat this an absolute last resort rather than something to play with.

Overall - in order to use these refresh-pattern controls you *need* to
know what the HTTP(S) messages going through your proxy contain in terms
of caching headers AND what those messages are doing semantically /
content wise for the client application. Using any of them as a generic
"makes caching better" thing only leads to problems in todays HTTP protocol.

> # Bumped requests have relative URLs so Squid has to use reverse proxy
> # or accelerator code. By default, that code denies direct forwarding.
> # The need for this option may disappear in the future.
> #always_direct allow all
> 
> dns_nameservers 8.8.8.8 208.67.222.222

Use of 8.8.8.8 is known to be explicitly detrimental to caching
intercepted traffic.

Those servers present different result sets based on the timing and IP
sending the query. The #1 requirement of caching intercepted (or
SSL-Bump'ed) content is that the client and proxy have the exact same
view of DNS system contents. Having the DNS reply contents change
between two consecutive and identical queries breaks that requirement.

> 
> max_filedescriptors 3200
> 
> # Max Object Size Cache
> maximum_object_size 10240 KB
> 
> 
> acl step1 at_step SslBump1
> 
> ssl_bump peek step1
> ssl_bump bump all

This causes the proxy to attempt decryption of the traffic using crypto
algorithms based solely on the ClientHello details and its own
capabilities. There is zero server crypto capabilities known for the
proxy to use to ensure traffic can actually make it to the server.

You are rather lucky that it actually worked at all. Almost any
deviation (ie emergency security updates in future) at either client or
server or proxy endpoints risks breaking the communication through this
proxy.

Ideally there would be a stare action for step2 and them bump only at
step 3.

So in summary to the things to try to get better caching:

* ditch 8.8.8.8. Use a local DNS resolver within your own network,
shared by clients and proxy. That can use 8.8.8.8 itself, the important
part is that it should be responsible for caching DNS results and
ensuring the app clients and Squid see as much the same records as possible.

* try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for
message being decrypted in the proxy. Look at those headers to see why
they are not caching normally. Use that info to inform your next
actions. It cannot tell you how the message is used by the application,
hopefully you can figure that out somehow before forcing anything unnatural.

* if you can, try pasting some of the transaction URLs into the tool at
redbot.org to see if there are any HTTP level mistakes in the apps that
could be fixed for better cacheability.

Amos