[squid-dev] [PATCH] Bug 4438 - string pool refactoring

Tue Mar 29 19:19:13 UTC 2016

On 03/29/2016 10:15 AM, Amos Jeffries wrote:
> On 26/03/2016 3:28 a.m., Alex Rousskov wrote:
>> the committed version is buggy IMHO

> <http://pixy.kinkie.it/~kinkie/irc-logs/bot/index.php?d=2016-03-23>
> 
> "
> [20:07:04]
> yadi
> what I mean is the pools are MemPool class instances. if they get
> destructed on exit() while some other class is still not-yet destructed
> and using pooled memory we have trouble.
> 
> [21:24:58]
> rousskov
> yadi, I know that you have meant that. What I am saying is that MemPool
> cannot be destructed until there are no allocations alive.
> "
> 
> I may still be misunderstanding you. I took that as meaning we had it right.

The committed code was wrong. My subsequent "cannot be destructed" IRC
comment above should be interpreted as "Squid should make premature
MemPools Instance destruction impossible; it is currently possible".

> The only other object which might share this problem is GetPool(t). Its
> object is POD though.

I agree that the "pools" variable inside GetPool() is fine as far as
initialization/destruction order is concerned.

> +    static MemPools *Instance = nullptr;
> +    if (!Instance) {
> +        Instance = new MemPools;
> +    }

It is much easier (and faster) to just say:

  static MemPools *Instance = new MemPools;

>      /* Must use this idiom, as we can be double-initialised
>       * if we are called during static initialisations.
>       */

I recommend removing that comment. It misleads more than it helps IMO.
If you want to put a comment there, say something like

  // We must initialize on first use (which may happen during static
  // initialization) and preserve until the last user is gone (which
  // may happen long after main() exit). We currently preserve forever.
  static MemPools *Instance = new MemPools;

>     static bool initialized = false;
> 
>     if (!initialized) {
>         memset(pools, '\0', sizeof(pools));
>         initialized = true;
>     }

This committed code does nothing useful: Static variables, such as
"pools" are zero-initialized by default and that happens at constant
initialization time.

The "initialized" flag is useful for the changes in your patch, but the
the comment in the patch would become misleading after the memset() call
is removed because the flag essentially applies to the Mem::Init() call
and not the "pools" variable. In other words, the final code should look
something like this:

static MemAllocator &
GetPool(size_t type)
{
    static MemAllocator *pools[MEM_MAX];

    static bool initialized = false;
    if (!initialized) {
        Initialize(pools);
        initialized = true;
    }

    MemAllocator *pool = pools[type];
    assert(pool);
    return *pool;
}

Where the static Initialize() method initializes all pools in "pools"
without calling GetPool(). If you want to add another safety check, then
add an assertion (to the Initialize() function) that Initialize() is not
called twice.

The new Initialize() function is essentially a constructor for our
implicit array<MemAllocator*> type...

If you are sure that modern cache manager supports registrations during
static initializations, then Initialize() can be Mem::Init(). In that
case, do not make Mem::Init() public. If you are not sure, then split
memory initialization (in new private Initialize()) and cache manager
registration (still in Mem::Init()).

>>> +static MemAllocator *&
>>> +GetPool(size_t type)
>>> +{
>>> +    static MemAllocator *pools[MEM_MAX];
>>> +    static bool initialized = false;
>>> +
>>> +    if (!initialized) {
>>> +        memset(pools, '\0', sizeof(pools));
>>> +        initialized = true;
>>> +    }
>>> +
>>> +    return pools[type];
>>> +}
>>
>>>  /* find appropriate pool and use it (pools always init buffer with 0s) */
>>>  void *
>>>  memAllocate(mem_type type)
>>>  {
>>> -    assert(MemPools[type]);
>>> -    return MemPools[type]->alloc();
>>> +    assert(GetPool(type));
>>> +    return GetPool(type)->alloc();
>>>  }
>>
>> AFAICT, this combination will assert if memAllocate() (or any similar
>> external caller) is called "too early".

> That is intentional.

That intention is wrong IMO.

Why intentionally delay pool initialization? I see nothing related to
pools initialization that cannot be done when the pools are needed for
the first time. The Initialize() function mentioned above can do all the
initialization. The cache manager registration can wait if needed, of
course.

> @@ -69,12 +69,15 @@
>      if (!initialized) {
>          memset(pools, '\0', sizeof(pools));
>          initialized = true;
> +        // Mem::Init() makes use of GetPool(type) to initialize
> +        // the actual pools. So must come after the flag is true
> +        Mem::Init();
>      }

> I've made the situation a little bit safer by calling Mem::Init() in the
> initialization branch of GetPool(). 

This change was the right step. Please make sure cache manager
registration is safe at static initialization time OR move cache manager
registration call elsewhere (e.g., leaving registration in public
Mem::Init() while private Initialize() focuses on memory initialization
alone).

> But the risk still exists for pools
> which are not initialized by Mem::init(). 

Yes, and that risk should be addressed IMO! Can we move those few
"stray" memDataInit() calls into Initialize()? They only need class
sizes so perhaps #including the necessary headers will not cause any new
linking problems, but I do not know that.

If "place all memDataInit() calls in Initialize()" strategy does create
linking problems, then each module causing these linking problems should
probably have its own dedicated memory pool object without going through
centralized initialization. That dedicated pool object must be
initialized upon first use as well, of course. This option requires more
work to avoid code duplication, but I do not think it would be very
difficult to implement if the simpler strategy fails.

Besides the two alternatives above, I cannot think of other ways to
fully solve the initialization order problem. Better suggestions are
welcomed, of course.

> Having an assert for that case
> is still best as it highlights that the code being changed is broken
> when the dev runs the change.

Why is an assert (that may or may not trigger for the developer who is
changing the code) is better than as-needed or on-first-use initialization?

> We can't return a reference here since the initializer functions make
> use of GetPool(t) to locate the array pointer to be set.

In the fixed code I am talking about, the initialization happens before
GetPool() returns. Thus, GetPool() always returns a ready-to-use pool
and, hence, can return a reference to it.

HTH,

Alex.