[squid-dev] [RFC] Support concurrent SBuf::c_str() calls

Alex Rousskov rousskov at measurement-factory.com
Mon Oct 3 00:03:49 UTC 2016


On 10/02/2016 03:25 PM, Kinkie wrote:
> On Fri, Sep 30, 2016 at 6:03 PM, Alex Rousskov
> <rousskov at measurement-factory.com> wrote:
>> On 09/29/2016 09:19 PM, Amos Jeffries wrote:
>>> On 30/09/2016 5:03 a.m., Alex Rousskov wrote:
>>>> Should we remove the increment to make concurrent c_str() calls safe?
>>
>>> The reason it exists remember is to prevent other SBuf sharing that
>>> storage MemBuf from thinking they can append to the storage oer top of
>>> the terminator.
>>
>> Yes, I know, but that increment not only prevents problems but _causes_
>> them as well: The increment may force the SBuf to realloc, which may
>> result in deallocation of the original storage still pointed to by the
>> earlier c_str() result.
>>
>>
>>> Given that we have no easy knowledge of how the c_str()
>>> is going to be used by the recipient code we cannot guarantee lack of
>>> terminator is safe.
>>
>> Yes, with or without the increment, Squid may crash or misbehave when
>> buggy code appends to SBuf while a previous c_str() result is still in
>> use. The lack of increment is not safe. The presence of the increment is
>> not safe. The problems are different, but they exist with or without the
>> increment.
>>
>>
>>> Time to push SBuf usage forward and aggressively remove the need for
>>> c_str()?
>>
>> The need for c_str() will never go away because libraries outside of
>> Squid control need c-strings. And, due to the lack of proper QA, any
>> "aggressive push" affecting lots of Squid code is a recipe for a
>> disaster IMO.
>>
>> The question is which c_str() implementation is safer? The one that may
>> crash Squid when called twice for the same SBuf in the same expression
>> or the one that may crash Squid when appending to the same SBuf in the
>> same expression? There are already two known cases of the former. There
>> are currently no known cases of the latter.
>>
>>
>> Overall, I know of three primary ways to implement c_str():
>>
>> 1. Always 0-terminate SBuf-used storage. Safest implementation possible,
>> but has serious negative performance overheads, especially for tokenizing.
> 
> If this is the decision, might as well drop sbuf altogether in favor
> of std::string (or possibly std::string + custom mempools-friendly
> Allocator).

You are probably right, although std::string may have other funny things
going that we might want to avoid/control.


>> 2. Terminate the used storage at the time of c_str().
>>
>>     2a. Increment the used storage size. Current implementation.
>>         Leads to problems discussed in this thread.
>>
>>     2b. Do not increment the used storage size.
>>         Leads to other problems discussed in this thread.
>>
>> 3. Allocate and store a new c-string as needed, copying from the
>> storage. Safest implementation possible but has serious negative
>> performance overheads. These overheads affect different use cases than
>> those of #1.

> If we are aware of the underlying issue, there's no reason not to save
> the temporary to a naked char* and use that, as long as the SBuf is
> not touched it'll work fine.

Save to a temporary variable outside SBuf? That is not a solution
because we are trying to address the cases where the developer forgot to
do that or could not really tell that doing that is necessary. Anything
from trivial/direct

  foo(buf.c_str(), buf.c_str());

to complex/hidden

  foo(buf, buf.c_str()) which calls bar(buf1.c_str(), cstr2);



>> Needless to say, we can add clever code to reduce risks associated with
>> some implementations. For example (these are just rough sketches/ideas):
>>
>> a) Appending code in #2b can check whether it is about to overwrite the
>> 0-terminator added by a previous c_str() call and, if yes, preserve the
>> old storage (we can preserve one such storage or maintain a global FIFO
>> queue).
>>
>> b) C-string allocating code in #4 can maintain a global FIFO queue of
>> previously allocated but now supposedly unused c-strings to minimize the
>> risk of de-allocating a still used c-string.
>>
>> c) C-string allocating code in #4 can delay allocation until it might be
>> needed, similar to how (a) decides that the 0-terminator is in danger.
>>
>>
>> Going forward, do you think we should:
>>
>> i. Keep using #2a and rewrite the known double-c_str() code.
>> ii. Make double-c_str() code safe (i.e., switch to #2b).
>> iii. Add code to make the existing #2 implementation safer.
>> iiii. Switch to a completely different implementation like #1 or #4.

> v. push SBuf as suggested by Amos and document this specific corner
> case so to avoid it

I do not think "push SBuf" is a comparable alternative. "Pushing SBuf"
is not something we can safely do in a reasonable time. It is and
remains a long-term goal, but we need to address the problem "now". The
offered alternatives can be implemented in a matter of hours or weeks.
"Pushing SBuf" safely will probably take a year and will not eliminate
all c_str() calls anyway so the same problems/alternatives will remain
even after the "push" is over.

And the problematic cases are already documented -- documentation does
not help much when the API itself invites errors like concurrent c_str()
calls. Developers will continue to misuse c_str() and reviewers will
continue to miss those bugs.

Alex.



More information about the squid-dev mailing list