[squid-dev] [RFC] Support concurrent SBuf::c_str() calls

Alex Rousskov rousskov at measurement-factory.com
Fri Sep 30 17:03:25 UTC 2016


On 09/29/2016 09:19 PM, Amos Jeffries wrote:
> On 30/09/2016 5:03 a.m., Alex Rousskov wrote:
>> Should we remove the increment to make concurrent c_str() calls safe?

> The reason it exists remember is to prevent other SBuf sharing that
> storage MemBuf from thinking they can append to the storage oer top of
> the terminator. 

Yes, I know, but that increment not only prevents problems but _causes_
them as well: The increment may force the SBuf to realloc, which may
result in deallocation of the original storage still pointed to by the
earlier c_str() result.


> Given that we have no easy knowledge of how the c_str()
> is going to be used by the recipient code we cannot guarantee lack of
> terminator is safe.

Yes, with or without the increment, Squid may crash or misbehave when
buggy code appends to SBuf while a previous c_str() result is still in
use. The lack of increment is not safe. The presence of the increment is
not safe. The problems are different, but they exist with or without the
increment.


> Time to push SBuf usage forward and aggressively remove the need for
> c_str()?

The need for c_str() will never go away because libraries outside of
Squid control need c-strings. And, due to the lack of proper QA, any
"aggressive push" affecting lots of Squid code is a recipe for a
disaster IMO.

The question is which c_str() implementation is safer? The one that may
crash Squid when called twice for the same SBuf in the same expression
or the one that may crash Squid when appending to the same SBuf in the
same expression? There are already two known cases of the former. There
are currently no known cases of the latter.


Overall, I know of three primary ways to implement c_str():

1. Always 0-terminate SBuf-used storage. Safest implementation possible,
but has serious negative performance overheads, especially for tokenizing.

2. Terminate the used storage at the time of c_str().

    2a. Increment the used storage size. Current implementation.
        Leads to problems discussed in this thread.

    2b. Do not increment the used storage size.
        Leads to other problems discussed in this thread.

3. Allocate and store a new c-string as needed, copying from the
storage. Safest implementation possible but has serious negative
performance overheads. These overheads affect different use cases than
those of #1.

Needless to say, we can add clever code to reduce risks associated with
some implementations. For example (these are just rough sketches/ideas):

a) Appending code in #2b can check whether it is about to overwrite the
0-terminator added by a previous c_str() call and, if yes, preserve the
old storage (we can preserve one such storage or maintain a global FIFO
queue).

b) C-string allocating code in #4 can maintain a global FIFO queue of
previously allocated but now supposedly unused c-strings to minimize the
risk of de-allocating a still used c-string.

c) C-string allocating code in #4 can delay allocation until it might be
needed, similar to how (a) decides that the 0-terminator is in danger.


Going forward, do you think we should:

i. Keep using #2a and rewrite the known double-c_str() code.
ii. Make double-c_str() code safe (i.e., switch to #2b).
iii. Add code to make the existing #2 implementation safer.
iiii. Switch to a completely different implementation like #1 or #4.


Thank you,

Alex.



More information about the squid-dev mailing list