[squid-dev] [PATCH] Bug 4438 - string pool refactoring

Wed Mar 30 18:29:35 UTC 2016

On 03/30/2016 11:15 AM, Amos Jeffries wrote:
> On 30/03/2016 8:19 a.m., Alex Rousskov wrote:
>> This committed code does nothing useful: Static variables, such as
>> "pools" are zero-initialized by default and that happens at constant
>> initialization time.

> Not necessarily. That is OS and allocator library dependent behaviour.

It is guaranteed by the C++ standard. See the "Static initialization"
section at http://en.cppreference.com/w/cpp/language/initialization

> For exmple, my compiler disagrees with you. use-before-initialization on
> the return statement for any pool pointer not initialized by Mem::Init()
> when the memset() is removed.

AFAICT, either you are compiling some code that is different from the
one I was talking about or your compiler is buggy: Static PODs *are*
zero-initialized by default.

>> The "initialized" flag is useful for the changes in your patch, but the
>> the comment in the patch would become misleading after the memset() call
>> is removed because the flag essentially applies to the Mem::Init() call
>> and not the "pools" variable. In other words, the final code should look
>> something like this:
>>
>>
>> static MemAllocator &
> 
> In order to use a reference here MemAllocator will have to forgo
> constructor parameters, and move its initialization into a virtual
> method.

Only if you leave the related bugs intact, which obviously does not make
sense. GetPool() should be returning a reference to an initialized pool
object. How that pool was initialized is a separate issue. I do not
think that individual pool initialization API needs any [drastic]
changes. What we do need to change is when that initialization happens
and, possibly, how some of the initialized pools are stored.

> By returning a reference to pointer here (possibly a nullptr) we retain
> all the existing behaviour and code throughout Squid creating pools.

Yes, including the existing or future bugs where that nullptr is
dereferenced.

>> GetPool(size_t type)
>> {
>>     static MemAllocator *pools[MEM_MAX];
>>
>>     static bool initialized = false;
>>     if (!initialized) {
>>         Initialize(pools);
>>         initialized = true;
> 
> That was the first thing I tried. Well, using Mem::Init() as the
> Initialize() anyway. It results in recursion blowing out the stack until
> SEGFAULT.

I do not understand why you mention that. My suggestion does not imply
that you can use Mem::Init() without changes here. In fact, I have
described exactly why you cannot do that immediately below this code
snippet. And there may be other changes that you have to do to make
things work -- I am not dictating code by email.

>>     }
>>
>>     MemAllocator *pool = pools[type];
>>     assert(pool);
>>     return *pool;
>> }
>>
>> Where the static Initialize() method initializes all pools in "pools"
>> without calling GetPool(). If you want to add another safety check, then
>> add an assertion (to the Initialize() function) that Initialize() is not
>> called twice.

> The Mem::Init() method does not use GetPool() directly, it uses
> GetInstance().create which allocates a new object and self-registers the
> raw-pointer using a second call to GetPool(t).

To make pooling [initialization] safe, Initialize() or modified
Mem::Init(), cannot call GetPool() directly or indirectly. GetPool()
should return a ready-to-use pool object. Fixed initialization functions
should not _need_ to call GetPool() because they deal with
initialization only.

>> The new Initialize() function is essentially a constructor for our
>> implicit array<MemAllocator*> type...

> meaning the only useful thing the Initialize() function does would be
> that memset().

Meaning that it has to initialize all pools that we currently initialize
from Mem::Init() plus any pools that we can easily initialize from
Mem::Init() but do not (if any). All other pools should not be obtained
via the GetPool() API and, hence, should not be a part of the "pools"
array tied to that API.

>>>> AFAICT, this combination will assert if memAllocate() (or any similar
>>>> external caller) is called "too early".

>>> That is intentional.

>> That intention is wrong IMO.
>>
>> Why intentionally delay pool initialization?

> Because only the caller code (same code using memAllocate BTW) knows
> what the pool parameters need to be initialized to. Including what
> object type the allocator pointed to will be.

Either that caller code should be inside Initialize() or it should not
use the "pools" array and its GetPool() API. Otherwise, we have no
guarantee that the pool initialization caller will get to that array
before a pool user will, resulting in the same kind of bugs we are
fixing now. "Do not return pointers to callers that expect a valid
object" is a simple/basic design principle. We should not be fighting
over it! Everything else in this discussion flows from that principle.

>> I see nothing related to
>> pools initialization that cannot be done when the pools are needed for
>> the first time. The Initialize() function mentioned above can do all the
>> initialization. The cache manager registration can wait if needed, of
>> course.

> They are. Just not by GetPool(t). The relevant class operator new()
> initializes an MemAllocator (aka pool) on first dynamic object creation
> and registers it (aka. sets the GetPool(t) pointer).

Class-specific allocators do not need access to the "pools" array. They
can use their own class-specific pool.

> The proposed patch solves both initialization and destruction cycles fine.

It does not. It leaves some pools uninitialized and, hence, exposed to
the use-before-initialization danger in the current or future code.

> The quibbling is over whether GetPools(t) should be (de-)optimized to
> return a pure reference or left with the original raw-pointer based design.

This is not about the return type as such. It is about having nil pools
and then asserting when those pools are accessed before they are
initialized. It is not quibbling. The current and proposed code is bad
and will cause problems sooner or later. You are free to ignore those
dangers if you wish, of course. I cannot force you to finish fixing what
you have started to fix.

>>> We can't return a reference here since the initializer functions make
>>> use of GetPool(t) to locate the array pointer to be set.
>>
>> In the fixed code I am talking about, the initialization happens before
>> GetPool() returns. Thus, GetPool() always returns a ready-to-use pool
>> and, hence, can return a reference to it.

> That means every call to GetPools() will have to be instrumented with
> the text label and size of the object wanted back. Just in case this
> call was the first ever use of the pool wanted.
...

Not at all. The "pools" array is initialized before first use by
Initialize() code similar to the current Mem::Init(). Any pool inside
the old "pools" array that cannot be initialized from Initialize() is
removed from "pools" (because it is not safe to keep it there) and is
moved where it can be safely initialized before (or upon) the first use.

> The current design of GetPool() has only one if-condition to check its
> array exists (or not).

Yes, and I do not propose to change that.

> The allocators are statically initialized pointer
> in per-class new() members, such that the initialization occurs only on
> first-use. Just not by GetPool() itself - which I think is fine.

Returning nil from GetPool() is not fine because GetPool() does not know
whether the caller is going to initialize that pool pointer or
dereference it. I do not think it is very difficult to adjust the code
so that we do not have to live with that risk, but you have the right to
refuse to do so. I am thankful that we stop now and not five iterations
earlier when the code was in a much worse shape. Somebody can always
come back to this later, when Squid starts crashing again.

Thank you,

Alex.