[squid-dev] [PATCH] Bug 4438 - string pool refactoring

Wed Mar 30 17:15:12 UTC 2016

On 30/03/2016 8:19 a.m., Alex Rousskov wrote:
> On 03/29/2016 10:15 AM, Amos Jeffries wrote:
>> On 26/03/2016 3:28 a.m., Alex Rousskov wrote:
>>> the committed version is buggy IMHO
> 
> 
>> <http://pixy.kinkie.it/~kinkie/irc-logs/bot/index.php?d=2016-03-23>
>>
>> "
>> [20:07:04]
>> yadi
>> what I mean is the pools are MemPool class instances. if they get
>> destructed on exit() while some other class is still not-yet destructed
>> and using pooled memory we have trouble.
>>
>> [21:24:58]
>> rousskov
>> yadi, I know that you have meant that. What I am saying is that MemPool
>> cannot be destructed until there are no allocations alive.
>> "
>>
>> I may still be misunderstanding you. I took that as meaning we had it right.
> 
> The committed code was wrong. My subsequent "cannot be destructed" IRC
> comment above should be interpreted as "Squid should make premature
> MemPools Instance destruction impossible; it is currently possible".
> 
> 
>> The only other object which might share this problem is GetPool(t). Its
>> object is POD though.
> 
> I agree that the "pools" variable inside GetPool() is fine as far as
> initialization/destruction order is concerned.
> 
> 
>> +    static MemPools *Instance = nullptr;
>> +    if (!Instance) {
>> +        Instance = new MemPools;
>> +    }
> 
> 
> It is much easier (and faster) to just say:
> 
>   static MemPools *Instance = new MemPools;
> 
> 
>>      /* Must use this idiom, as we can be double-initialised
>>       * if we are called during static initialisations.
>>       */
> 
> I recommend removing that comment. It misleads more than it helps IMO.
> If you want to put a comment there, say something like
> 
>   // We must initialize on first use (which may happen during static
>   // initialization) and preserve until the last user is gone (which
>   // may happen long after main() exit). We currently preserve forever.
>   static MemPools *Instance = new MemPools;
> 

Done.

> 
> 
>>     static bool initialized = false;
>>
>>     if (!initialized) {
>>         memset(pools, '\0', sizeof(pools));
>>         initialized = true;
>>     }
> 
> This committed code does nothing useful: Static variables, such as
> "pools" are zero-initialized by default and that happens at constant
> initialization time.

Not necessarily. That is OS and allocator library dependent behaviour.

For exmple, my compiler disagrees with you. use-before-initialization on
the return statement for any pool pointer not initialized by Mem::Init()
when the memset() is removed.

> 
> The "initialized" flag is useful for the changes in your patch, but the
> the comment in the patch would become misleading after the memset() call
> is removed because the flag essentially applies to the Mem::Init() call
> and not the "pools" variable. In other words, the final code should look
> something like this:
> 
> 
> static MemAllocator &

In order to use a reference here MemAllocator will have to forgo
constructor parameters, and move its initialization into a virtual
method. MEMPROXY_CLASS macro and MemPool::GetInstance().create method
will both have to be updated to get their object from GetPool(t) then
check if it has been initialized, then call that method on it. Which is
far more complicated than what we have now.

Note that to construct a MemAllocator child one must provide a name
label and the size of objects it will produce.

By returning a reference to pointer here (possibly a nullptr) we retain
all the existing behaviour and code throughout Squid creating pools.

> GetPool(size_t type)
> {
>     static MemAllocator *pools[MEM_MAX];
> 
>     static bool initialized = false;
>     if (!initialized) {
>         Initialize(pools);
>         initialized = true;

That was the first thing I tried. Well, using Mem::Init() as the
Initialize() anyway. It results in recursion blowing out the stack until
SEGFAULT.

>     }
> 
>     MemAllocator *pool = pools[type];
>     assert(pool);
>     return *pool;
> }
> 
> Where the static Initialize() method initializes all pools in "pools"
> without calling GetPool(). If you want to add another safety check, then
> add an assertion (to the Initialize() function) that Initialize() is not
> called twice.

The Mem::Init() method does not use GetPool() directly, it uses
GetInstance().create which allocates a new object and self-registers the
raw-pointer using a second call to GetPool(t).

==> the MemAllocator does not live inside the GetPool() array. That
array is just a global registry of pointers to which allocators are active.

==> Other code outside of libmem also uses
MemPools::GetInstance().create() to initialize both their own per-type
allocator and the pools it is allocating from.

==> There may be multiple types of allocator operating depending on a)
what the MEMPOOLS environment variable was set to at the time each
allocator was first created, and b) what squid.conf pools on/off setting
was at most recent reconfigure predating the classes first use.

==> allocator lifetimes may need to predate or exceed libmem lifetime.
Particularly lasting until its owner class X's own .o unit destruction
during shutdown.

> 
> The new Initialize() function is essentially a constructor for our
> implicit array<MemAllocator*> type...

meaning the only useful thing the Initialize() function does would be
that memset().

> 
> If you are sure that modern cache manager supports registrations during
> static initializations, then Initialize() can be Mem::Init(). In that
> case, do not make Mem::Init() public. If you are not sure, then split
> memory initialization (in new private Initialize()) and cache manager
> registration (still in Mem::Init()).
> 
> 
>>>> +static MemAllocator *&
>>>> +GetPool(size_t type)
>>>> +{
>>>> +    static MemAllocator *pools[MEM_MAX];
>>>> +    static bool initialized = false;
>>>> +
>>>> +    if (!initialized) {
>>>> +        memset(pools, '\0', sizeof(pools));
>>>> +        initialized = true;
>>>> +    }
>>>> +
>>>> +    return pools[type];
>>>> +}
>>>
>>>>  /* find appropriate pool and use it (pools always init buffer with 0s) */
>>>>  void *
>>>>  memAllocate(mem_type type)
>>>>  {
>>>> -    assert(MemPools[type]);
>>>> -    return MemPools[type]->alloc();
>>>> +    assert(GetPool(type));
>>>> +    return GetPool(type)->alloc();
>>>>  }
>>>
>>> AFAICT, this combination will assert if memAllocate() (or any similar
>>> external caller) is called "too early".
> 
> 
>> That is intentional.
> 
> That intention is wrong IMO.
> 
> Why intentionally delay pool initialization?

Because only the caller code (same code using memAllocate BTW) knows
what the pool parameters need to be initialized to. Including what
object type the allocator pointed to will be.

> I see nothing related to
> pools initialization that cannot be done when the pools are needed for
> the first time. The Initialize() function mentioned above can do all the
> initialization. The cache manager registration can wait if needed, of
> course.

They are. Just not by GetPool(t). The relevant class operator new()
initializes an MemAllocator (aka pool) on first dynamic object creation
and registers it (aka. sets the GetPool(t) pointer).

Also, this way the statistics is not inadvertently initializing unused
pools when it walks the GetPool(t) 't' values.

> 
> 
>> @@ -69,12 +69,15 @@
>>      if (!initialized) {
>>          memset(pools, '\0', sizeof(pools));
>>          initialized = true;
>> +        // Mem::Init() makes use of GetPool(type) to initialize
>> +        // the actual pools. So must come after the flag is true
>> +        Mem::Init();
>>      }
> 
> 
>> I've made the situation a little bit safer by calling Mem::Init() in the
>> initialization branch of GetPool(). 
> 
> This change was the right step. Please make sure cache manager
> registration is safe at static initialization time OR move cache manager
> registration call elsewhere (e.g., leaving registration in public
> Mem::Init() while private Initialize() focuses on memory initialization
> alone).

Done. It was using a member static. I have moved that member into the
GetInstance() method, which will resolve any issues we might have had.

> 
>> But the risk still exists for pools
>> which are not initialized by Mem::init(). 
> 
> Yes, and that risk should be addressed IMO! Can we move those few
> "stray" memDataInit() calls into Initialize()? They only need class
> sizes so perhaps #including the necessary headers will not cause any new
> linking problems, but I do not know that.
> 
> If "place all memDataInit() calls in Initialize()" strategy does create
> linking problems, then each module causing these linking problems should
> probably have its own dedicated memory pool object without going through
> centralized initialization. That dedicated pool object must be
> initialized upon first use as well, of course. This option requires more
> work to avoid code duplication, but I do not think it would be very
> difficult to implement if the simpler strategy fails.
> 
> Besides the two alternatives above, I cannot think of other ways to
> fully solve the initialization order problem. Better suggestions are
> welcomed, of course.
> 

The proposed patch solves both initialization and destruction cycles fine.

The quibbling is over whether GetPools(t) should be (de-)optimized to
return a pure reference or left with the original raw-pointer based design.

Converting to a pure reference means re-architecting the whole allocator
ownership and lifetime design. Possibly for the worse as the allocators
lifetime will then match libmem instead of whatever lifetime the .o/.la
unit using it has.

> 
>> Having an assert for that case
>> is still best as it highlights that the code being changed is broken
>> when the dev runs the change.
> 
> Why is an assert (that may or may not trigger for the developer who is
> changing the code) is better than as-needed or on-first-use initialization?
> 

This one is better because any object type using this function needs a
redesign to use the newer on-first-use API of libmem. Not the old
after-Mem::Init() API.
This assert will only blow up in the fact of the coder trying to make a
class do something the whole class is not designed to do.

> 
>> We can't return a reference here since the initializer functions make
>> use of GetPool(t) to locate the array pointer to be set.
> 
> In the fixed code I am talking about, the initialization happens before
> GetPool() returns. Thus, GetPool() always returns a ready-to-use pool
> and, hence, can return a reference to it.

That means every call to GetPools() will have to be instrumented with
the text label and size of the object wanted back. Just in case this
call was the first ever use of the pool wanted. And GetPools() will have
two if-condition to check for a) pointer array initialization, and b)
individual pool initialization.

GetPool() gets called *a lot* (millions of times per minute) so doubling
the number of if-conditionals tested is not trivial. And it may not even
be possible to pass the correct parameters to all GetPool(t) uses (ie
during statistics dumping).

The current design of GetPool() has only one if-condition to check its
array exists (or not). The allocators are statically initialized pointer
in per-class new() members, such that the initialization occurs only on
first-use. Just not by GetPool() itself - which I think is fine.

Amos