[squid-dev] [RFC] simplifying ssl_bump complexity

Sun Nov 20 12:55:50 UTC 2016

On 11/20/2016 12:06 AM, Amos Jeffries wrote:
> On 20/11/2016 12:08 p.m., Marcus Kool wrote:
>>
[snip]

>>
>> I like the intent of the proposal and the new directives tls_*.
>> What currently makes configuration in Squid 3/4 difficult is
>> the logic of 'define in step x what to do in the next step' and
>> IMO this logic is the main cause of misunderstandings and
>> incorrect configurations.  Also the terms 'bump' and 'splice'
>> do not help ease of understanding.  Since Squid evolved and
>> bumping changed from 3.3 - 3.4 - 3.5 to 4.x, and likely will
>> change again in 5.x, there is an opportunity to improve
>> things more than is proposed.
>> There is also a difference in dealing with transparent intercepted
>> connections and direct connections (browsers doing a CONNECT)
>> which also causes some misunderstandings.
>> The current ssl bump steps allow problematic configs where Squid
>> bumps or stares in one step and to splice in an other step,
>> which can be resolved (made impossible) in a new configuration syntax.
>>
>> I propose to use a new logic for the configuration directives
>> where 'define in step x what to do in the next step' is replaced
>> with a new logic 'define in step x what to do _now_'.
>
> From reading the below I think you are mistaking what "now" means to
> Squid. Input access control directives in squid.conf make a decision
> about what action to do based on some state that just arrived.

Maybe it is necessary to redefine 'now' but my point remains that
'define in step x what to do in the next step' is the cause of
most misunderstandings.

> For example:
>  HTTP message just finished parsing -> check http_access what to do with it.
>  HTTP reply message just arrived -> check http_reply_access what to do
> with it.
>
> Thus my proposal was along the lines of:
>   client hello recieved -> check tls_client_hello what to do with it.
>   server hello recieved -> check tls_server_hello what to do with it.

For both hello messages: is the decision moment the moment where it
has been peeked at?

>>
>> Below is a new proposal to attempt to make the configuration
>> more intuitive and less prone for admin misunderstandings.
>>
>> First the admin must define if there is any bumping at all.
>> This could be done with
>> https_decryption on|off
>> This is similar to tls_new_connection peek|splice but much
>> more intuitive.
>>
>> Iff https_decryption is on:
>>
>> 1) the "connection" step:
>> When a browser uses "CONNECT <FQDN>" Squid does not need to make
>> peek or splice decisions.
>> When Squid intercepts a connection to "port 443 of <IP>" no peek
>> or splice decision is made here any more.
>> This step becomes obsolete in the proposed configuration.
>
> I am still hoping that will happen. But also still getting pushback that
> people want to terminate or splice without even looking at the
> clear-text hello details.

We must know the reasons behind this pushback.  Only then sane decisions
can be made.

>>
>> 2) the "TLS client hello" step:
>> When a browser uses CONNECT, Squid has a FQDN and does not need
>> peeking a TLS client hello message. It can use the tls_client_hello
>> directives given below.
>
> Sadly this is not correct. Squid still needs to get the client hello
> details at this point. They are needed to perform bump before the server
> hello is received, and to "terminate with an error message" without
> contacting a server.

yes, correct.  Squid must do this.  But does it have to be configured?

>> When Squid intercepts a connection, Squid always peeks to retrieve
>> the SNI which is the equivalent of the FQDN used by a CONNECT.
>> In this step admins may want to define what Squid must do, e.g.
>> tls_client_hello passthrough aclfoo
>> Note that the acl 'aclfoo' can use tls::client_servername and
>> tls::client_servername should always have a FQDN if the connection
>> is https.  tls::client_servername expands to the IP address if
>> the SNI of an intercepted connection could not be retrieved.
>
> What if the SNI contradicts the CONNECT message FQDN ?
> What if a raw-IP in the CONNECT message (or TCP SYN) does not belong to
> the server named in SNI ?

:-)  I left this out on purpose to not make the post even larger than it was.
There is of course a lot of error checking.  The question is if
we have to configure it.  If yes, can we get away with one directive based
on an acl that uses tls::handshake_failure ?

> Squid would now be diverting the client transparently to a server other
> than the one it expects and caching under that FQDN. But the server cert
> would still authenticate as being the SNI host, so TLS cannot detect the
> diversion.
>
> The fake CONNECT's are a bit messy but IMHO we can only get rid of the
> first one done for intercepted connections. Although that alone would
> make both cases handle the same way.

I do not know anything about the code that generates the fake CONNECT
of an transparent interception connection, but logically there should
not be a fake CONNECT for true HTTPS (TLS+HTTP) since a browser does
not do a CONNECT, so why fake one?  Was the fake CONNECT introduced
to keep a state?  If yes, the state can be administered in a different way.

However, in case of other protocols on port 443 a fake CONNECT is a
must to be able to filter and log it.

>> For https connections with a client hello without the SNI extension:
>> tls_client_hello passthrough|terminate aclbar
>> where aclbar can contain tls::client_hello_missing_sni
>>
>> For connections that do not use TLS (i.e. not a valid
>> TLS client hello message was seen):
>> tls_client_hello passthrough|terminate aclbar2
>> where aclbar2 may contain tls::handshake_failure
>>
>> To define that the TLS handshake continues, the config can contain
>> tls_client_hello continue
>> This is a basically a no-op and not required but enhances readability
>> of a configuration.
>
> This is getting complicated again. Avoiding the first (TCP SYN) access
> control cycles will make ssl::server_name ACL contain either FQDN/SNI or
> the actual server cert name. A default "none" or "-" value when no
> FQDN/SNI is available should be sufficient for matching these clients.
> And we should have that already I think, if not its a missing feature bug.

Maybe in some cases tls_new_connection is required to be used, but
lets start with a default of
tls_new_connection continue
Under normal circumstances there is no need to define that Squid must
peek or already define that the connection must be spliced.

In my proposal there is no ssl::server_name and I did that on purpose.
While it might seem a nice idea to evaluate it from 'none' to SNI
to server name, there is no need for it and the use of tls::client_servername
and tls::server_servername explicitly refers to which servername
Squid must look at.
And at the moment that tls_new_connection is evaluated, there
is no servername when intercepted.

> Also even "no-op" ACLs still require CPU time to assemble data for ACL
> checking. So avoiding that kind of thing in configs is good.

Maybe not, the no-op was there for readability and at config parse time
Squid can simply skip it to not have any CPU cycles wasted on it
by acl matching logic.

I am not familiar with all steps/stages in the implementation but
one may get better performance in a simpler design if that reduces
the number of events that Squid processes for each TLS connection.

>> 3) the "TLS server hello" step:
>> Usually no directives are needed since rarely actions are taken
>> based on the server hello message, so the default is
>> tls_server_hello continue
>
> Once Squid gets to this stage of processing "continue" could mean either
> splice or bump, but only one of the two is possible.
>
> Bump is not legal in a lot of situations but splice may not be possible.
> So terminate is the only realistic default we can set for general use.
> As you say, its uncommon to need it, and people who do tend to know what
> they want done. So will configure that.

Admin who want https decryption do not want that the connection is
terminated by default.

You are right that tls_server_hello needs a little more thought.
Somewhere must be defined if the connection will have mode passthrough
(splice) or mode decrypt (bump).
And the right place to make that decision is at tls_client_hello
(after the SNI has been peeked at).
Then for tls_server_hello remains 'continue' and 'terminate' decisions.

> The handshake containing error signals or other protocols are different
> decisions points. Remember Squid is deciding what to do with some data
> that just arrived. If its going to tunnel/splice the server hello, there
> is no point even evaluating what the protocol inside is, and error
> alerts get delivered to the client as-is.
>
> So the existing on_unsupported_protocol and sslproxy_cert_error to make
> those handling decisions with 'ssl_error' type ACL to match the specific
> problem are fine IMO.

yes, they look fine but may need a rename to have a tls_ prefix
but I cannot find a way to define
continue when cert-is-self-signed and server_name is '.nic.es' and
otherwise terminate when cert-is-self-signed.

>> The tls_server_hello can be used to terminate specific connections.
>> In this step many types of certificate errors can be detected
>> and in the Squid configuration there must be a way to define
>> what to do for specific errors and optionally for which FQDN.
>> E.g. allow to define that connections with self-signed certificates
>> are terminates but the self-signed cert for domain foo.example.com
>> is allowed.  See also the example config below and the use of
>> tls::server_servername.
>>
>> What is left, is a configuration directive for connections
>> that use TLS as an encryption wrapper but do not use HTTP
>> inside the TLS wrapper:
>> tls_no_http passthrough|terminate   # similar to on_unsupported_protocol
>>
>> An example configuration looks like this:
>> https_decryption on
>> acl banks tls::client_servername .bank1.example.org
>> acl no_sni tls::client_hello_missing_sni
>> acl no_handshake tls::handshake_failure
>> acl hacked_server tls::server_servername evil.example.com
>> tls_client_hello passthrough banks
>> tls_client_hello terminate no_sni
>> tls_client_hello passthrough no_handshake
>> tls_client_hello continue
>> tls_server_hello terminate hacked_server
>> tls_server_hello continue
>> tls_no_http passthrough
>>
>> The configuration directives as I proposed are IMO intuitive and
>> leave very little room for misunderstandings.
>
> Looks just like what I'm proposing except your config above lets clients
> connect to servers with invalid TLS credentials (maybe not so intuitive?).

Are you referring to 'tls_client_hello passthrough no_handshake' ?
That should be 'tls_client_hello terminate no_handshake'.

>  acl banks ssl::server_name .bank1.example.org
>  acl no_sni ssl::server_name "-"
>  acl hacked_server ssl::server_name evil.example.com
>
>  tls_client_hello splice banks
>  tls_client_hello terminate no_sni
>  tls_client_hello terminate hacked_server
>  tls_client_hello peek all
>
>  tls_server_hello terminate hacked_server
>  tls_server_hello splice all
>
>  # NP: "sslproxy_cert_error deny all" is default
>
>  on_unsupported_protocol tunnel all

In my example at the time that tls_client_hello is evaluated,
the client hello has not only arrived but has also been parsed
and the SNI is known.  You also seem to intend that given
the rule 'tls_client_hello splice banks', but then why do you
have the rule 'tls_client_hello peek all' ?  Does that mean
'peek at the received client hello and continue' ?

The 'tls_client_hello continue' was intended to mean 'decrypt
the connection' (bump) but I failed to mention that.  I think
that 'tls_client_hello continue' must be replaced with
'tls_client_hello decrypt'.

There is a distinction between tls::server_servername and
tls::client_servername on one hand and ssl:server_name on the
other hand.  I prefer tls::* since it explicitly refers to
one of the servernames while ssl::server_name refers to
a dynamic object that needs evaluation and may evaluate to
different server names at different times.

One of the rules with 'terminate hacked_server' seems
redundant.

So after corrections, my example becomes
https_decryption on
acl banks tls::client_servername .bank1.example.org
acl no_sni tls::client_hello_missing_sni
acl no_handshake tls::handshake_failure
acl hacked_server tls::server_servername evil.example.com
tls_client_hello passthrough banks
tls_client_hello terminate no_sni
# tls_client_hello terminate no_handshake  # default behavior, so no rule needed
tls_client_hello decrypt
tls_server_hello terminate hacked_server
tls_server_hello continue
tls_unsupported_protocol passthrough all

In your example the meaning is to splice all connections and
with my proposed syntax the config would look like this:

https_decryption on
acl banks tls::client_servername .bank1.example.org
acl no_sni tls::client_hello_missing_sni
acl no_handshake tls::handshake_failure
acl hacked_server tls::server_servername evil.example.com
tls_client_hello passthrough banks
tls_client_hello terminate no_sni
tls_client_hello passthrough
tls_server_hello terminate hacked_server
tls_server_hello continue
tls_unsupported_protocol passthrough all

The choice of keywords for the config may seem a matter of taste,
but I feel it is more than that and that the keyword
'decrypt' is far better than 'bump' and the keyword
'passthrough' is better than 'splice'.

Marcus

> Amos
>
> _______________________________________________
> squid-dev mailing list
> squid-dev at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-dev
>