[squid-dev] Broken trunk after r14735, r14726

Tue Jul 19 06:52:21 UTC 2016

On 18/07/2016 11:12 p.m., Christos Tsantilas wrote:
> On 07/16/2016 03:56 PM, Amos Jeffries wrote:
>> On 16/07/2016 7:02 a.m., Alex Rousskov wrote:
>>> Hello,
>>>
>>>      There are two more recent changes that broke trunk:
>>>
>>> * After r14735 (Replaced TidyPointer with std::unique_ptr), Squid cannot
>>> start due to an "std::bad_function_call" exception.
>>>
>>> * After r14726 (GnuTLS: support for TLS session resume): Squid segfaults
>>> when attempting to connect to a Secure ICAP service. Official Squid
>>> v4.0.12 suffers from this bug.
>>>
>>> Stack traces from both crashes are quoted at the end of this email.
>>>
>>> Please fix these regressions or undo the changes that created or exposed
>>> them.
>>>
>>
>> <snip>
>>> * segfault when attempting to connect to a Secure ICAP REQMOD service
>>> (tested with r14726, r14734):
>>>
>>
>> Does this patch fix the session issue ?
>>
>> === modified file 'src/security/Session.cc'
>> --- src/security/Session.cc     2016-07-07 19:03:02 +0000
>> +++ src/security/Session.cc     2016-07-16 12:43:38 +0000
>> @@ -53,7 +53,7 @@
>>   void
>>   Security::SetSessionResumeData(const Security::SessionPtr &s, const
>> Security::SessionStatePointer &data)
>>   {
>> -    if (s) {
>> +    if (data) {
>>   #if USE_OPENSSL
>>           (void)SSL_set_session(s, data.get());
>>   #elif USE_GNUTLS
>>
>>
>> I'm a little worried about the code calling SetSessionResumeData.
>> OpenSSL documentation states:
>>    "If there is already a session set inside ssl (because it was set with
>> SSL_set_session() before or because the same ssl was already used for a
>> connection), SSL_SESSION_free() will be called for that session."
>>
>> But our SetSessionResumeData() is called after setting up the sessions
>> host data, etc. So I'm thinking all that setup in
>> Ssl::BlindPeerConnector::initializeTls() may be thrown away by the
>> resume action being called.
>>
> 
> Squid crashes at the first TLS connection trying to establish to the
> ICAP server.  There is not any session to resume so the SSL session
> related methods should not called at all.
> 
> It is a strange crash. Is it a corrupted SSL object?
> 
> I must say that I am worrying a lot for all of these changes.
> It is very difficult for me to follow them, and already I have
> difficulties to read and debug squid openSSL relate code.
> 
> We are using our own naming scheme for openSSL structures, eg
> "Security::SessionPtr" instead of "SSL *" or
> "Security::SessionStatePointer" instead of "SSL_SESSION *" and this is
> make it very difficult to follow Squid/openSSL code.
> 
> It is difficult to read an example openSSL code and trying convert it to
> squid, or reading a reference implementation and trying check against
> squid code. Someone is obligated to search for definitions and
> equivalents types all the time.
> 
> I have not ever study for a good way to support both GnuTLS and openSSL
> for squid, to know the problems,  but probably I would start to
> implement openSSLAcceptor/gnuTlsAcceptor and
> openSSLPeerConnector/gnuTLSPeerConnector classes.
> Internally gnuTLS and openSSL should use their own types. This is will
> help current and future squid developers.

That is indeed where I started, and what I started doing. It turns out
we then have to maintain feature compatibility in those two relatively
high-level classes. Which are internally designed and implemented in
different ways. Thats a bit tricky to start with, but seemed reasonable.
  Then there is the killer fact that to do ssl-bump those classes are
all using fd_table[].ssl things shared between Server and Client and
Icap and Ecap 'sides' (head => meet brick wall, doh!).

> 
> I am just expressing my worries here, I do not want to impose anything
> and if there was a related discussion in squid-dev, sorry that I do not
> participate and I did not express my concerns earlier.

As Alex noted, there has not been any detailed discussion. Just partial
coverage of some details during the occasional patch reviews. And it is
good to have opinions voiced/written.

As the one most recently looking at the ways to do combined API. I'm
just as frustrated as you. For the same reasons. Though in the reverse
direction. It has been quite hard to look at nice 3-4 line GnuTLS
examples and then trying to figure out a) what 10-20 OpenSSL calls are
used to do the equivalent, and even more annoyingly b) where those
OpenSSL calls are spinkled around the Squid codebase.

Regarding (b); TBH, I'm quite impressed you can translate from the
OpenSSL examples to the Squid use of OpenSSL.

The choices we have for combining basically come down to four ways:

1) implementing everything twice. The way OpenSSL is spreading its calls
all over the place that means the entire SSL-Bump, HTTPS receiving,
sending, BIO, and Comm I/O layers and FD socket handling.

A *huge* amount of duplication. Almost everything that stores OpenSSL
raw pointers today would need duplicating to work differently for
GnuTLS. Just because the pointer usage is different. Sometimes only in
the meaning of 0 return values - but thats enough.

2) picking one library API and writing our own compatibility layer for
other libraries using that API.

Unfortunately, there is a history of people trying this. The GnuTLS
compatibility openssl.h header is lacking many OpenSSL API symbols for
good reasons which boil down to intentional major design differences in
the libraries:
 a) OpenSSL exposes these annoying but useful locking references. GnuTLS
hides any locking from the callers - leaving us unsure if something is
referenced or copied, but safe to use any pointers we're given.
 b) OpenSSL uses N more API calls to explicitly do all sorts of things,
with ordering dependencies IIRC - where GnuTLS uses far fewer calls.

Sadly the Squid logics are sometimes heavily leveraging the locking
behaviour of OpenSSL to pass info from one context to another, by
updating shared structures.

3) sprinkling #if conditions like candy through the whole codebase.

(Er, yuck). We have too much of that already for OpenSSL.

4) designing our code to use an abstraction API that renames all the
library structures and functions to some thing we understand easier **.

Still a lot of code re-writing, and even more re-arranging so the
library call(s) themselves happen away from the huge number of
components mentioned in (1). But doable within a decade (maybe).

So, I've gone with this (4) and taking Alex suggestions on how to
abstract better as I go.

If you think its a good way to go, we might be able to use the reverse
compatibility direction for (3). ie. Using GnuTLS symbol names and
implementing them for OpenSSL API. But that wont help much when it comes
to supporting NSS for Fedora/RHEL/CentOS, PolarSSL (for SuSE? I forget),
or Crypto for MacOS.

** YMMV on easier. I fully sympathize with the examples-vs-abstraction
problem. It might help if we communicated a bit better on that so it
turned out to be something you understand as well, not just my ideas.
 I had the idea that you were flooded with SSL-Bump problems? has that
situation eased?

Amos