[squid-dev] [PATCH] initial GnuTLS support for encrypted server connections

Amos Jeffries squid3 at treenet.co.nz
Thu Jan 19 07:16:09 UTC 2017


On 15/01/2017 8:09 p.m., Alex Rousskov wrote:
> On 01/14/2017 10:16 AM, Amos Jeffries wrote:
> 
>> The Security::SessionPointer is converted to std::shared_ptr. This is
>> required because GnuTLS does not expose the locking like OpenSSL. Since
>> we store the SessionPointer to fd_table[].ssl we can always access it
>> from there one way or another and there is actually no need for OpenSSL
>> locking sessions now.
> 
>> -typedef Security::LockingPointer<SSL, Security::SSL_free_cpp, HardFun<int, SSL *, SSL_up_ref> > SessionPointer;
>> +typedef std::shared_ptr<SSL> SessionPointer;
> 
> I am trying to understand how a standard std::shared_ptr can co-exist
> with OpenSSL locking. Even if Squid does not lock the SSL object,
> OpenSSL would still lock and unlock it, destroying the SSL object when
> its OpenSSL lock counter reaches zero, regardless of our
> std::shared_ptr<SSL> state. What am I missing?

OpenSSL itself registers one lock when SSL_new() is called, and only
unlocks that when SSL_free() is called by the last shared_ptr being
cleared/deleted/dropped/de-referenced.

When our CreateSession() function calls the library SSL_new()
function we pass the result immediately to a shared_ptr constructor,
along with a lambda which calls the matching SSL_free() function.

So long as the shared_ptr has references it does not call the Deletor.

In other words; whenever the shared_ptr has a non-nil value the library
lock count is >= 1.

Meaning we no longer perform the detailed lock management ourselves,
just use share_ptr the way it is designed to be used.


> 
> Also, if OpenSSL requires that the SSL object is freed using an
> SSL_free() function, how can std::shared_ptr<SSL> be compatible with
> OpenSSL without that knowledge?

Via the custom Deletor lambda.

> 
> Does the patched code continue to work well with OpenSSL?
> 
> 
>>      // Our job is done. The callabck recepient will probably close the failed
>>      // peer connection and try another peer or go direct (if possible). We
>>      // can close the connection ourselves (our error notification would reach
>>      // the recepient before the fd-closure notification), but we would rather
>>      // minimize the number of fd-closure notifications and let the recepient
>>      // manage the TCP state of the connection.
>> +
>> +#if USE_GNUTLS
>> +    // but we do need to release the bad TLS related details in fd_table
>> +    // ... or GnuTLS will SEGFAULT.
>> +    const int fd = serverConnection()->fd;
>> +    Security::SessionClose(fd_table[fd].ssl, fd);
>> +#endif
> 
> 
> The "or GnuTLS will SEGFAULT" part is a red flag for me. Squid may close
> the connection without or before reaching this particular method (for a
> variety of reasons). A connection closure should always be safe. Please
> explain why the usual connection closing steps cannot "release the bad
> TLS related details in fd_table" [but the above code can].
> 
> And why should GnuTLS be treated as a special case here compared to OpenSSL?
> 

Since I wrote that I have fixed some crashes related to priorities not
being set.  Re-testing has not repeated that particular crash, so removing.

However, I think long term we should probably do something along these
lines anyway when bailing out of PeerConnector. The fd_table is left in
a horrible state with some things added (or not) by failed Job.


> 
>> +#if !USE_OPENSSL && USE_GNUTLS
>> +typedef std::unique_ptr<struct gnutls_priority_st, HardFun<void, gnutls_priority_t, &gnutls_priority_deinit>> ParsedOptionsPointer;
>> +#else
>> +typedef std::unique_ptr<long> ParsedOptionsPointer;
>> +#endif
> 
> It looks strange that GnuTLS is OK with std::unique_ptr<long> as a
> ParsedOptionsPointer when built together with OpenSSL but not when built
> alone. I doubt that is what you meant, but that is how this #if
> statement reads.

That is exactly what I meant there and elsewhere with the same.

Squid _can_ be built with both libraries --with-* statements. In that
case the OpenSSL logic should be used where possible since it is the
more well-tested code. But there are some features (like squidclient)
that are only using GnuTLS even when the main binary uses OpenSSL.

So I/we need to enforce the 'only one library' thing explicitly with the
USE_* macros. The USE_GNUTLS code block should always be either an #elif
after a USE_OPENSSL block, or used in a way that prevents it being
enabled when OpenSSL is linked and does the same feature.

Double-checking that I've found a few that were not. They are now also
fixed in this updated patch.


> 
>> +    // 'options=' value being set to session is a GnuTLS specific thing.
>> +#if !USE_OPENSSL && USE_GNUTLS
> 
> A similar concern here: If options value being set to session is a
> GnuTLS-specific thing, why is the code not executed for GnuTLS when it
> is built together with OpenSSL? In other words, why mention OpenSSL in a
> GnuTLS-specific code?
> 
> I suspect that we can never build with both OpenSSL and GnuTLS enabled
> at the same time. If that assumption is correct, then please remove the
> !USE_OPENSSL part from these (and any similar) conditions. They only
> create unnecessary doubts. And add an OpenSSL-specific conditions
> elsewhere as/if needed, of course.

see above.

> 
>> +#if !USE_OPENSSL && USE_GNUTLS
>> +typedef std::unique_ptr<struct gnutls_priority_st, HardFun<void, gnutls_priority_t, &gnutls_priority_deinit>> ParsedOptionsPointer;
>> +#else
>> +typedef std::unique_ptr<long> ParsedOptionsPointer;
>> +#endif
> 
> Why does OpenSSL need a pointer here? 

Doesn't. Reverted to a plain long as suggested.

> 
> 
> What if parsedOptions is nil here? parseOptions() are not called if
> sslOptions are empty and if there are no options=... in configuration so
> parsedOptions might be nil in some cases. I do not know whether we are
> lucky that they are never nil in these particular cases.


Good catch. Probably just lucky. I was (badly) trying to make the
generic code just always use Pointer. But as you mention this one is
rarely used outside of that library-specific code anyway.

> 
> Moreover, should not the options always be initialized, even when
> sslOptions are empty, so that we do not need to worry about their
> pointer being nil? For OpenSSL where there is no
> gnutls_set_default_priority(), but even that code can use empty
> sslOptions string to detect that defaults are in use. After all, OpenSSL
> was fine with non-pointer options before these changes...
> 

Since we need to initialize memory from the library in the case of
GnuTLS it seems to be better to simply do nothing in that case and use
the absence of a pointer value as the flag.

This also works a bit cleaner with the sslOptions.isEmpty() case in
parseOptions() for the below issue.

> 
>> +    const char *priorities = (sslOptions.isEmpty() ? nullptr : sslOptions.c_str());
>> +    int x = gnutls_priority_init(&op, priorities, &err);
> 
> Are you sure gnutls_priority_init() can handle nil priorities c-string?
> Just checking -- the manual page does not say...

Good point. Making it skip this and fallback on the default priorities
use-case when sslOptions is empty.

> 
> BTW, you can make "x" constant here. Giving it a more telling name like
> "result" here and in similar code would be nice too.
> 

Shuffled the function call so x no longer exists.

> 
>> +        fatalf("Unknown TLS option '%s'", err);
> 
> s/option/option near/ -- the err does not isolate the unknown option name.

The err is pointed to the start of the option(s) which is unknown.
Simple test shows that is correct:

 2017/01/17 01:42:28| Processing: tls_outgoing_options
options=NORMAL:fooo_broken
 2017/01/17 01:42:29| Not currently OK to rewrite swap log.
 2017/01/17 01:42:29| storeDirWriteCleanLogs: Operation aborted.
 FATAL: Unknown TLS option 'fooo_broken'

The equivalent OpenSSL logic does the same output, but for each
individual broken option as an ERROR message. And does not exit with
fatal. Which is unfortunately not [yet] possible for GnuTLS.

> 
>> +        debugs(83, 1, "Failed to set TLS options. error: " << Security::ErrorString(x));
> 
> Use DBG_IMPORTANT instead of 1?
> 
> It would be nice to print sslOptions that we failed to set (in case of
> non-nil parsedOptions) or mention that they were "default" (otherwise).
> 

Done.

>>  {
>> +    if (!sslOptions.isEmpty())
>> +        parseOptions(parsedOptions); // re-parse after sslOptions copied.
>>      memcpy(&flags, &p.flags, sizeof(flags));
>>  }
> 
> Have you considered using a shared_ptr for GnuTLS so that we do not have
> to reparse when copying?

Now done the required research, and yes we can use shared_ptr there.

We cannot go the whole way to avoiding pointers entirely like OpenSSL
does since priority_t is a raw-pointer to a struct with complex setter
functionality hidden away internally.

> 
>> +Security::PeerOptions::operator =(const Security::PeerOptions &p)
>> +{
>> +    sslOptions = p.sslOptions;
>> +    if (!sslOptions.isEmpty())
>> +        parseOptions(parsedOptions); // re-parse after sslOptions copied.
> 
> Missing parseOptions reset when p.sslOptions are empty but
> this->sslOptions where not.

Done by removing the if and using shared_ptr. Same in copy-constructor.

> 
>>  #if USE_OPENSSL
>> +        // NP: GnuTLS uses 'priorities' which are set per-session instead.
>> +        SSL_CTX_set_options(t.get(), (setOptions ? *parsedOptions : 0));
>> +
>>          // XXX: temporary performance regression. c_str() data copies and prevents this being a const method
>> -        Ssl::InitClientContext(t, *this, (setOptions ? parsedOptions : 0), parsedFlags);
>> +        Ssl::InitClientContext(t, *this, parsedFlags);
>>  #endif
> 
> 
> It feels like the SSL_CTX_set_options() call belongs to the
> InitClientContext() function. I can easily imagine somebody moving the
> InitClientContext() call while forgetting to move the
> SSL_CTX_set_options() call with it. InitClientContext() is supposed to
> initialize the context and setting options is one of the initialization
> steps...

I am in the process of decomposing Ssl::InitClientContext() down into
methods of PeerOptions. So in future there wont be a
Ssl::InitClientContext(). Instead one will use the generic
cfg.secure.createClientContext().

That is already allowing a lot of de-duplication between the code
initialization of client and server contexts, and the ssl-bump vs accel
server contexts creation.


> 
> BTW, is that XXX comment about c_str() stale? I do not see a c_str in
> this code...
> 

It refers to a c_str() which still exists inside the
Ssl::InitClientContext and affects the const-ness of the method calling
that function. When the decomposing is done so will die that XXX.

> 
>> +        BIO_TO_CLIENT = GNUTLS_SERVER,
>> +        BIO_TO_SERVER = GNUTLS_CLIENT
> 
> BIO_WITH_CLIENT and BIO_WITH_SERVER may be better names for
> _bidirectional_ I/O. Might as well fix this if you have to touch all the
> code using these names anyway.
> 

Nod. Though I would like to drop the "BIO" part and keep the "TO".
 For now I left is as-was to reduce the impact on Christos.

The directional property of the word "TO" seems a more clear indicator
about the situation where the enum are used. By comparision, creating
credentials context "WITH_X" has silent implication that it is *our*
side of things that has/is the X.

And now the enum is outsde of class Bio, it could be used for
other non-bidirectional things as well.

Left this as-was for now.

> 
>> +Security::PeerOptions::parseOptions(Security::ParsedOptionsPointer &theOut)
> 
> Please do not use the prefix "the" for parameter names. The prefix is
> used for [older] class data member names so naming method parameters
> with that prefix confuses the reader of the method who keeps looking for
> a data member named theOut.

Done.

> 
> Moreover, if the parseOptions() method is always called with the
> parsedOptions argument that is a data member of the same PeerOptions
> class, then why do you need to pass parsedOptions as a parameter at all?
> The method can just reset parsedOptions directly.
> 

Done.

> 
>> +/// close an active TLS session.
>> +/// set fdOnError to the connection FD when the session is being closed
>> +/// due to an encryption error, otherwise omit.
>> +void SessionClose(const Security::SessionPointer &, int fdOnError = -1);
> 
> Please split SessionClose() into two different functions, one for each
> use case (initiating proper SSL connection closure vs. immediate SSL
> connection state destruction). Mixing the two actions results in
> confusing API instructions and dangerous usage/implementation IMO.
> 

Well, there is no such thing as a "SSL connection" - it is security
added onto some *other* Transport Protocol's layer.
<http://www.gnutls.org/manual/gnutls.html#TLS-layers>

I agree 'close' is a bad word there, but doing what you suggest assumes
there is a connection to close and/or data is being destroyed.
Neither of those is actually happening.

Since the only case where (fd != -1) has been removed in the above
alterations I dropped the FD stuff and named it SessionSendGoodbye() -
which is all it does now.


> I suggest StartSessionClosing() and DestroySession() names (while going
> along with the current terrible choice of calling both SSL connections
> and SSL sessions "sessions" which leads to code that close an SSL
> "session" while reusing the associated SSL session!).
> 

There is no explicit destroy for SSL sessions. They can timeout of the
resume cache or be stopped from performing I/O on a particular Transport
FD/socket. The shutdown/goodbye action appears to be a temporary state
in the presence of resuming sessions.

Amos


More information about the squid-dev mailing list