[squid-dev] [PATCH] TidyPointer removal

Sun Jul 10 22:20:37 UTC 2016

On 07/10/2016 02:02 AM, Amos Jeffries wrote:
> On 10/07/2016 2:38 p.m., Alex Rousskov wrote:
>> On 07/09/2016 07:10 AM, Amos Jeffries wrote:
>>> On 9/07/2016 11:18 a.m., Alex Rousskov wrote:
>>>> On 07/08/2016 12:44 AM, Amos Jeffries wrote:
>>>>> On 8/07/2016 11:24 a.m., Alex Rousskov wrote:
>>>>>>>
>>>>>>> +    void resetWithoutLocking(T *t) {
>>>>>>> +        unlock();
>>>>>>> +        raw = t;
>>>>>>>      }
>>>>>>>  
>>>>>>>      void resetAndLock(T *t) {
>>>>>>> +        if (t != get()) {
>>>>>>> +            resetWithoutLocking(t);
>>>>>>> +            lock(t);
>>>>>>> +        }
>>>>>>> +    }
>>
>>>>>> We should probably add self-reset protection to both cases or to none.
>>
>>>> Self-assignment and self-reset is a gray area. Each
>>>> assignment/reset-handling class can define what should happen under
>>>> those circumstances. The possible sane definitions are:
>>>>
>>>>   1. Do nothing (side effects lead to leaks/crashes in non-benign cases)
>>>>   2. assertion/exception (explicit error in all cases)
>>>>   3. Do the usual (side effects lead to leaks/crashes in most cases)
>>>>
>>>> Which definition to pick is our choice, but we should be _consistent_ in
>>>> that choice. I recommend #1, but it is a weak recommendation.
>>>>
>>>> In the latest patch I have seen, resetAndLock() was using #1 but
>>>> resetWithoutLocking() was using #3.
>>
>>
>>> resetAndLock() is doing #1
>>
>> Agreed.
>>
>>
>>> and there are no side effects. Because the
>>> new lock() is performed by us, not the caller. So we can confidently do
>>> nothing with no side effects.

> * For the LockingPointer(T*) ctor and resetWithoutLocking() method
>  the caller is responsible for:
>  - doing a +1 lock itself, and
>  - exactly once for *each time either is used*, and
>  - doing it before the call/use.
> 
> * For release() the caller is taking on responsibility for exactly 1
> un-lock operation.
> 
> * For all other methods and operators the recieving Pointer is
> responsible for the locking.

> Please reconsider your code examples in light of those requirements and
> you should see where the bugs in the examples are confusing you.

I know the requirements. My examples are purposefully built to contain
problematic self-reset conditions and illustrate the differences between
self-reset treatment approaches. Real callers may think they follow all
the rules, but code complexities can easily hide self-assignment/reset
from them. My examples are simple enough to clearly expose those
normally hidden problems.

It is impossible to study self-reset treatment approaches using
perfectly correct caller code. Perfectly correct caller code does not
have self-resets.

>> We cannot be confident that doing nothing will not cause problems. For
>> example, the following well-meaning caller code will assert due to those
>> unexpected "side effects" of approach #1:
>>
>> void importCertificate(LockingPointer &p, X509 *x)
>> {
>>     const int referenceCount = x->references;
>>     p.resetAndLock(x);
>>     // check that we actually locked the new certificate
>>     assert(x->references > referenceCount);
>> }
>>
>>
>> ... if importCertificate() is fed p that already manages certificate x.

> The p.resetAndLock() semantically increments the x references and
> decrements the existing pointers references before it returns. Making
> the assert '>' a bug.

Yes, the example contains code that is intentionally buggy. The example
illustrates why the do-nothing approach #3 may cause problems. The
example function code "works" all the time except when dealing with a
self-reset. When dealing with a self-reset, it asserts. Note that the
function itself does not cause that self-reset -- its callers do.

And, I repeat, these are just simple examples meant to illustrate the
problem. Similar problems in real code are often well-hidden and may
result in far more obscure side effects than an immediate crash or
assertion.

> That is true for any proper reference counting. The self-check is an
> optimization of that case. Not a requirement.

In isolation, using a specific form of a self-check is not a requirement
-- we have a choice among the three options I itemized, one of which is
do-nothing (#3). However, #2 becomes a requirement if, like you claimed,
you want to detect self-resets reliably. And #1 becomes a requirement if
we decide to promise a no-op to the callers in self-reset cases.

>>> resetWithoutLocking() cannot do #1 without adding the listed side
>>> effects for all 'benign' cases.
>>
>> In what I call benign cases, approach #1 always works correctly:
>>
>>     p.resetWithoutLocking(p->get()); // never a problem with #1!
>>
> 
> Bug:  caller of resetWithoutLocking() is not doing a +1 lock increment
> itself before that method is called.

Yes, but that is not important here. The point here is that with
approach #1 the code works correctly in benign cases where the reset
call simply was not needed at all. Again, in real code, the above
self-reset will be obscured with something like:

    // NP: getNewCertificate() locks the certificate for us
    Root.resetWithoutLocking(getNewCertificate());

but the underlying code will have the same self-reset pattern if/when
getNewCertificate() returns a raw pointer that is already managed by
Root (e.g., because somebody changed getNewCertificate() to register
with the global Root pointer _without_ fixing this caller code).

For the purpose of this discussion, there is no difference between
clearly self-resetting "p.resetWithoutLocking(p.get())" and the
well-hidden Root example above.

> Yes the bug location failing to do +1 may be somewhere in the caller of
> the above importCertificate(). But in essence the whole chain of code
> passing 'x' variable down to resetWithoutLocking() is responsible.

The question is how LockingPointer should deal with these self-reset
bugs. I suggest that we pick one of the three approaches itemized and
use that approach consistently. You committed inconsistent
LockingPointer code. That may be an even better solution for some
unknown to me reason, but no valid defense of that inconsistency has
been offered so far IMO.

>>> The words "most cases" is incorrect on #3, because most cases will be
>>> caller acting correctly and doing its +1 on the locks before our
>>> unlock() is used. Only buggy callers will lead to the listed side effects.
>>
>> Unfortunately, #3 actually leads to leaks or crashes in benign cases
>> where caller _is_ doing its +1 (as well as in some non-benign cases, of
>> course). For example:
>>
>>   x = newX509(); // creates and locks certificate x
>>   assert(x->references == 1); // OK, newX509() +1ed as expected
>>   LockingPointer p(x);
>>   assert(x->references == 1); // OK, our constructor did nothing
>>   p.resetWithoutLocking(x); // Opos, x is deleted here due to #3
> 
> Bug: using resetWithoutLocking() instead of resetAndLock(), when 'x' has
> no caller lock associated.

Yes, you are consistently pointing out bugs in the purposefully buggy
code. You are not wrong -- the code in examples _is_ problematic -- but
to illustrate bug handling problems we need those bugs in the examples.

The problem we are trying to address is how LockingPointer should deal
with problematic caller code that is likely to be present in Squid, now
or in the future. You went as far as saying something like "I want us to
expose those bugs, not hide them" but the code you committed does not do
that. My attempts to explain why it does not do that have failed so far,
possibly because you focus on the purposefully introduced bugs in the
examples rather than on the problem we are trying to solve.

All my recent examples use problematic code because we are (or, rather,
I am and we should be) discussing how to deal with problematic code.

>>   std::cout << *p->get(); // crash
>>
>> The same is true for resetAndLock(), of course
> 
> No. If it was, then Reference Counting (the universal concept) would not
> work anywhere.

What I said is a fact: resetAndLock() would lead to problems similar to
those illustrated in the above resetWithoutLocking() example if
resetAndLock() were to be implemented using the do-nothing approach #3
without self-reset protection. I do not know how you got from that fact
to your "then a universal concept would not work" statement.

Also, a universal concept of reference counting does not define how
self-assignment and self-reset events are to be handled. For example,
std::shared_ptr::reset() specifically says that "If the object pointed
to by ptr is already owned, the function results in undefined behavior".

> resetAndLock() is the method by which regular normal reference counting
> is done. The recipient smart-pointer taking care of any +1/-1 that needs
> to happen and ordering them to ensure the pointer received stays valid.

Yes. And any assignment-like method in C++, including our resetAndLock()
needs to deal with self-assignment/reset. Nothing in that method specs
can avoid the problem because the problem comes from the language
itself. The only choice we have is how to deal with that self-assignment.

> self-reset is perfectly fine IFF the caller meets the requirements I
> outlined at the top. You can't make an assert that tests that reliably.

I agree that treating benign self-reset cases differently from
non-benign ones is impossible because the reset method cannot know which
self-reset is benign -- the code will be the same for both categories.
The definition of a benign case can be something like "if the caller
knew that this is a self-reset, it would have skipped calling the
reset() method and everything would have worked OK".

We either declare all self-reset cases as bugs (and then we may use the
asserting approach #2) or we cannot assert at all (i.e., we can only use
either approach #1 or approach #3).

>> By definition, self-assignment is "p=p" or equivalent.
>> Similarly, self-reset is, by definition:
>>     p.reset*(p->get())
>>
>> or equivalent. That is, you are resetting a smart pointer using the
>> smart pointer itSELF (or equivalent). That is what resetAndLock()
>> if-statement checks, and that is the check that resetWithoutLocking()
>> currently lacks.

> For resetAndLock() that above code is fine. p ensures that pointer stays
> valid through the locking/unlocking.

For resetAndLock(), the above code is "fine" only because we have added
self-reset protection to resetAndLock()! Without that self-reset
protection of some kind, the above self-reset code will crash
resetAndLock(). It is not something in the LockingPointer _specs_ that
makes the above self-reset code safe in resetAndLock() case. It is our
decision to treat all resetAndLock() self-resets as benign that makes it
safe.

Nothing in the LockingPointer specs requires us to add self-reset
protection to resetAndLock(). It is our experience that self-reset
happens in real code that prompts us to decide how to handle it, not the
specs.

Here is what we currently do:

* We know that some self-reset cases in resetAndLock() may be serious
bugs and some may be benign. We treat all self-reset cases in
resetAndLock() as benign bugs. That is, we assume that if the caller
knew that it is dealing with a self-reset, the caller would have skipped
the reset, and everything would have worked correctly after that. If
there are actually non-benign cases, then our resetAndLock() will lead
to undefined behavior in those cases (we know how resetAndLock() is
going to behave, but not what will happen to the self-reseting code
using resetAndLock()).

* We know that some self-reset cases in resetWithoutLocking() may be
serious bugs and some may be benign. Our resetWithoutLocking() results
in undefined behavior on any self-reset. Some of those self-reset cases
could correspond to benign bugs: If the caller knew that it is dealing
with a self-reset, the caller would have skipped the reset, and
everything would work correctly after that, but we may crash or
otherwise mishandle that benign caller anyway.

Why we treat self-reset bugs in resetAndLock() callers differently from
self-reset bugs in resetWithoutLocking() callers, I do not know. Nothing
in your answers explain that asymmetry.

> It is nasty that LockingPointer has to deal with external lock fiddling.
> But we can't get away from that. 

Agreed.

> All we can do is be very careful that
> the alternative API is used correctly.

We are certainly not very careful about that. We do not even assert that
the resetWithoutLocking() pointer has been locked by the caller.
However, this thread is currently not about that. It is about treating
self-reset differently in two reset*() methods.

>>> A buggy caller code is one that is not adding that lock and still using
>>> resetWithoutLocking()
>>
>> That would be indeed a bug, but it has nothing to do with self-reset.

> All the presented examples of self-reset are demonstrations of callers
> failing that one requirement.

Let's assume that is true. Why is that relevant?

> I am completely certain that if the callers using the non-locking API
> meet that requirement the locking math works out okay with the current
> trunk code.

That assertion is not being disputed here. Indeed, if everything is
correct, then everything is correct. Unfortunately, we are discussing a
different (and more realistic) case -- there are buggy callers out
there. They think they do the right thing, but they do not. Should we
treat buggy resetWithoutLocking() callers differently from buggy
resetAndLock() callers? A "no" is a natural answer, for symmetry sake. A
"yes" answer may be correct/better, but it requires some
backing/explanation (and a source code comment!) that we can agree on.

> I am also completely certain that if we add the if-statement to prevent
> self-reset/assign unlock() happening *inside* the resetWithoutLocking()
> that callers will become unable to use it reliably in those cases. 

Here is a trivial counter-example of a caller that reliably uses
resetWithoutLocking() with do-nothing protection:

  ... some perfect/rules-obeying code here ...

  p.resetWithoutLocking(p.get()); // do-nothing makes this a no-op

  ... some perfect/rules-obeying code here ...

> How can I be so certain?
> 
>  1) The same behaviour can be implemented by ordering the new values +1
> before the old values -1. With no self-protecting if-statement.
>  --> always both operations happen even on self-reset/assign.
> 
>  2) The normal pattern we are all familiar with wraps BOTH the lock and
> unlock operations inside the if-statemet.
>  --> either both happen or neither.

> When we come to the resetWithoutLocking() it is behaviourally only an
> unlock for the old value.

Yes, the method unlocks the old value and assigns the new value, under
the assumption that somebody has locked the new value for us. In a
self-reset case, the "new" part of the assumption fails (i.e., the new
value is actually the old one in a self-reset case, by definition),
resulting in crashes and such (unless protection is added).

In a self-reset case, we could say, OK, the caller probably locked the
value once, called resetWithoutLocking() once, and now mistakenly called
resetWithoutLocking() again for the same value. To keep lock/unlock
parity, we are going to do nothing (i.e., use approach #1).

> Critically it does not lock on the new value. That means we CANNOT be
> doing the 'BOTH' behaviour of (2) inside the method.

Yes, of course, the two reset*() methods perform different operations
(or we would not have two methods). resetWithoutLocking() never does two
operations. Why is that important here?

> Therefore (1) has to be the implementation for use of this part of the
> API.

Not really. Just because resetAndLock()'s locking/unlocking pattern #2
is not applicable to resetWithoutLocking() does not mean
resetAndLock()'s locking/unlocking pattern #1 is applicable.
resetWithoutLocking() has its own locking/unlocking pattern.

> With the +1 step happening in caller, and the -1 inside
> resetWithoutLocking().

Well, yes, that is true.

>  --> this is where the requirements list I wrote at the top come from.

Nobody is disputing the requirements.

> Is that clear enough now?

What you say is mostly clear and correct, but mostly irrelevant to the
problem I am discussing. You are not answering my questions. You are
attacking bugs specifically planted in examples to illustrate bug
handling. You are defending basic requirements that nobody is attacking.
Etc. In other words, a squid-dev discussion as usual.

I do not think we should waste more time on this. If we are lucky, no
Squid code will trigger self-reset on resetWithoutLocking() and we will
not waste days or weeks tracking those bugs. If we are not that lucky, I
know that I did what I reasonably could to prevent that.

Good luck,

Alex.