[squid-dev] Broken trunk after r14735, r14726

Christos Tsantilas christos at chtsanti.net
Wed Jul 20 16:04:13 UTC 2016


On 07/19/2016 09:52 AM, Amos Jeffries wrote:
> On 18/07/2016 11:12 p.m., Christos Tsantilas wrote:
>> On 07/16/2016 03:56 PM, Amos Jeffries wrote:
>>> On 16/07/2016 7:02 a.m., Alex Rousskov wrote:
>>>> Hello,
>>>>
>>>>       There are two more recent changes that broke trunk:
>>>>
>>>> * After r14735 (Replaced TidyPointer with std::unique_ptr), Squid cannot
>>>> start due to an "std::bad_function_call" exception.
>>>>
>>>> * After r14726 (GnuTLS: support for TLS session resume): Squid segfaults
>>>> when attempting to connect to a Secure ICAP service. Official Squid
>>>> v4.0.12 suffers from this bug.
>>>>
>>>> Stack traces from both crashes are quoted at the end of this email.
>>>>
>>>> Please fix these regressions or undo the changes that created or exposed
>>>> them.
>>>>
>>>
>>> <snip>
>>>> * segfault when attempting to connect to a Secure ICAP REQMOD service
>>>> (tested with r14726, r14734):
>>>>
>>>
>>> Does this patch fix the session issue ?
>>>
>>> === modified file 'src/security/Session.cc'
>>> --- src/security/Session.cc     2016-07-07 19:03:02 +0000
>>> +++ src/security/Session.cc     2016-07-16 12:43:38 +0000
>>> @@ -53,7 +53,7 @@
>>>    void
>>>    Security::SetSessionResumeData(const Security::SessionPtr &s, const
>>> Security::SessionStatePointer &data)
>>>    {
>>> -    if (s) {
>>> +    if (data) {
>>>    #if USE_OPENSSL
>>>            (void)SSL_set_session(s, data.get());
>>>    #elif USE_GNUTLS
>>>
>>>
>>> I'm a little worried about the code calling SetSessionResumeData.
>>> OpenSSL documentation states:
>>>     "If there is already a session set inside ssl (because it was set with
>>> SSL_set_session() before or because the same ssl was already used for a
>>> connection), SSL_SESSION_free() will be called for that session."
>>>
>>> But our SetSessionResumeData() is called after setting up the sessions
>>> host data, etc. So I'm thinking all that setup in
>>> Ssl::BlindPeerConnector::initializeTls() may be thrown away by the
>>> resume action being called.
>>>
>>
>> Squid crashes at the first TLS connection trying to establish to the
>> ICAP server.  There is not any session to resume so the SSL session
>> related methods should not called at all.
>>
>> It is a strange crash. Is it a corrupted SSL object?
>>
>> I must say that I am worrying a lot for all of these changes.
>> It is very difficult for me to follow them, and already I have
>> difficulties to read and debug squid openSSL relate code.
>>
>> We are using our own naming scheme for openSSL structures, eg
>> "Security::SessionPtr" instead of "SSL *" or
>> "Security::SessionStatePointer" instead of "SSL_SESSION *" and this is
>> make it very difficult to follow Squid/openSSL code.
>>
>> It is difficult to read an example openSSL code and trying convert it to
>> squid, or reading a reference implementation and trying check against
>> squid code. Someone is obligated to search for definitions and
>> equivalents types all the time.
>>
>> I have not ever study for a good way to support both GnuTLS and openSSL
>> for squid, to know the problems,  but probably I would start to
>> implement openSSLAcceptor/gnuTlsAcceptor and
>> openSSLPeerConnector/gnuTLSPeerConnector classes.
>> Internally gnuTLS and openSSL should use their own types. This is will
>> help current and future squid developers.
>
> That is indeed where I started, and what I started doing. It turns out
> we then have to maintain feature compatibility in those two relatively
> high-level classes. Which are internally designed and implemented in
> different ways. Thats a bit tricky to start with, but seemed reasonable.
>    Then there is the killer fact that to do ssl-bump those classes are
> all using fd_table[].ssl things shared between Server and Client and
> Icap and Ecap 'sides' (head => meet brick wall, doh!).
>
>>
>> I am just expressing my worries here, I do not want to impose anything
>> and if there was a related discussion in squid-dev, sorry that I do not
>> participate and I did not express my concerns earlier.
>
> As Alex noted, there has not been any detailed discussion. Just partial
> coverage of some details during the occasional patch reviews. And it is
> good to have opinions voiced/written.
>
>
> As the one most recently looking at the ways to do combined API. I'm
> just as frustrated as you. For the same reasons. Though in the reverse
> direction. It has been quite hard to look at nice 3-4 line GnuTLS
> examples and then trying to figure out a) what 10-20 OpenSSL calls are
> used to do the equivalent, and even more annoyingly b) where those
> OpenSSL calls are spinkled around the Squid codebase.
>
> Regarding (b); TBH, I'm quite impressed you can translate from the
> OpenSSL examples to the Squid use of OpenSSL.

It is already difficult, squid openSSL related code is not  trivial, we 
should not make it worst.


>
>
> The choices we have for combining basically come down to four ways:
>
> 1) implementing everything twice. The way OpenSSL is spreading its calls
> all over the place that means the entire SSL-Bump, HTTPS receiving,
> sending, BIO, and Comm I/O layers and FD socket handling.

About SSL-Bump, already there is a try to move out of openSSL huge 
parts. An example is the SSL messages parser.

You need to implement HTTPS receiving.sending, BIO, Comm I/O layers and 
fd socket twice. You can not avoid it.

>
> A *huge* amount of duplication. Almost everything that stores OpenSSL
> raw pointers today would need duplicating to work differently for
> GnuTLS. Just because the pointer usage is different. Sometimes only in
> the meaning of 0 return values - but thats enough.

Exactly.

> 2) picking one library API and writing our own compatibility layer for
> other libraries using that API.
>
> Unfortunately, there is a history of people trying this. The GnuTLS
> compatibility openssl.h header is lacking many OpenSSL API symbols for
> good reasons which boil down to intentional major design differences in
> the libraries:
>   a) OpenSSL exposes these annoying but useful locking references. GnuTLS
> hides any locking from the callers - leaving us unsure if something is
> referenced or copied, but safe to use any pointers we're given.
>   b) OpenSSL uses N more API calls to explicitly do all sorts of things,
> with ordering dependencies IIRC - where GnuTLS uses far fewer calls.
>
> Sadly the Squid logics are sometimes heavily leveraging the locking
> behaviour of OpenSSL to pass info from one context to another, by
> updating shared structures.

This is has to do with internal OpenSSL operation. This is how this API 
works.

All of the above problems you are describing show me how difficult and 
maybe wrong is to try implement GnuTLS having in your mind the existing 
OpenSSL implementation.
The two SDKs although have similarities are different in philosophy.

Why do we need common types for both SDKs?

The only type needed by squid for openSSL is the "SSL *" which is stored 
inside fde class. And the gnutls_session_int for gnutls. These are 
should be enough
(maybe with some other like the certificate errors.).

Squid should use SSL or gnutls_session_int and pass it to openSSL or 
gnuTLS  related classes and should not need to know about other internal 
OpenSSL or GnuTLS structures.

Recently we split the PeerConnector classes from FwdState class, the 
FwdState now does not have strong dependencies to openSSL.

We can do the same for ConnStateData class.

If you decide to implement  NEW GnuTls PeerConnector class, without 
sslbump for the beggining, it is easy. The PeerConnector class is about 
500 lines of code and the BlindPeerConnector other 100 lines.

And after this is ready the common code like the certificates validation 
(150 lines of code) can be moved into a parent class.

But trying to fit GnuTLS and openSSL into the same PeerConnector class 
will fail and will make the code complex.

The SSLBump also is still under heavy development.
Again here the PeekingPeerConnector is about 400 lines of code. Is this 
code we are worrying that it will be duplicated?
Because most of the other SSL related code (~6000 lines of code) MUST 
reimplemented for gnutls.

Recently we implemented our own Handshake messages parser, we are not 
relying on openSSL for detecting SSL features and basic errors. The *Bio 
classes are very simple now, they just do buffering.
This is will help a future GnuTLS implementation too.


>
>
> 3) sprinkling #if conditions like candy through the whole codebase.
>
> (Er, yuck). We have too much of that already for OpenSSL.
>
>
> 4) designing our code to use an abstraction API that renames all the
> library structures and functions to some thing we understand easier **.
>
> Still a lot of code re-writing, and even more re-arranging so the
> library call(s) themselves happen away from the huge number of
> components mentioned in (1). But doable within a decade (maybe).
>
>
> So, I've gone with this (4) and taking Alex suggestions on how to
> abstract better as I go.

I believe that that the PeerConnector is a part of this API.
It is an AsyncJob which when finishes calls a callback function.


>
> If you think its a good way to go, we might be able to use the reverse
> compatibility direction for (3). ie. Using GnuTLS symbol names and
> implementing them for OpenSSL API. But that wont help much when it comes
> to supporting NSS for Fedora/RHEL/CentOS, PolarSSL (for SuSE? I forget),
> or Crypto for MacOS.

On server side you do not need to use a lot of "#if".

This is required for ConnStateData. But for ConnStateData we should go 
like we did for FwdState. We should implement a new class and put here 
the SSL related code.

I do not know the form of this class. Maybe we need an TlsConnStateData 
kid of ConnStateData, or maybe we need to implement TlsAcceptor classes.

Then this class should be implemented for openSSL and for gnuTls.

I know it is not easy.

>
>
> ** YMMV on easier. I fully sympathize with the examples-vs-abstraction
> problem. It might help if we communicated a bit better on that so it
> turned out to be something you understand as well, not just my ideas.
>   I had the idea that you were flooded with SSL-Bump problems? has that
> situation eased?

The new naming scheme for SSL structures disrupted me, looks to me 
difficult to use it and adds an overhead.
I am just worry.


>
>
> Amos
>


More information about the squid-dev mailing list