[squid-dev] [PATCH] Fix for bug 4190

Alex Rousskov rousskov at measurement-factory.com
Mon Sep 7 04:45:17 UTC 2015


On 09/06/2015 04:44 AM, Amos Jeffries wrote:
> On 6/09/2015 4:40 p.m., Alex Rousskov wrote:

>>> +class UserNameCache : public RegisteredRunner
>>
>> Why is this called UserNameCache? It looks like this is a _user_ cache
>> [...] not user _name_ cache

> Legacy: it caches Auth::User keyed by-name.

Since this is a new name (unused in the old code), we can pick the right
name -- no legacy issues here. If this is the only thing that caches
Auth::User, then "keyed by-name" is not important.


> If we are going to change this it should probably be to
> "CredentialsCache" I think.

Sounds good to me.


>>> +class UserNameCache : public RegisteredRunner
>>
>> Please separate cache operation (the UserNameCache class) and
>> external-to-caches cache object creation/cleanup/destruction/etc.
>> initiation code (creating a dedicated child of RegisteredRunner for
>> that). This is a good idea in general because it simplifies code, but it
>> is especially desirable here because you have multiple caches but need
>> only one Runner object to initiate their creation/cleanup/destruction/etc.


> No. We cant have one runner.

Please note that the above comment was not about having one runner. The
discussion below does not change what the above comment is about:
Separating the UserNameCache class from the RegisteredRunner API.


> The only safe way to check if a Cache() call is usable without creating
> a whole new cache as side effect is to check the current scheme config.

There is no reason to let a small problem create a bigger one. If the
static User::Cache() methods prevent you from doing something good
(i.e., using Runners interface properly), just fix those static methods.
AFAICT, it is trivial to avoid the side effect you are describing if needed.


> With one runner we hit a race condition between the scheme config being
> re-configured and the runner needing to decide if the cache is worth
> re-configuring or deleting.


That does not compute for me: If there is a race condition, it would
exist for one runner or three runners AFAICT, but I suspect this is not
important.

Also, please note that the runner does not "need to decide". If making
that decision elsewhere is better, then the runner can simply forward
main.cc events to all existing caches. That "elsewhere" code will decide
which cache needs to be created or deleted. I am not saying the Runner
cannot decide, I am just saying there are many options here, but none of
them require the [stable/simple] runner to be the same object as the
[volatile and complex] cache object.


> It would also mean even more hard-coding of what schemes exist and need
> to be managed by the runner, when schemes are supposed to be plug-n-play
> components. AIUI the whole point of the runner API was to avoid that
> kind of hard-coding.

The runner API is essentially about hiding modules from main.cc.
Hard-coding inside a runner class is OK (but is not required, of course).


> We can have N split runners controlling a cache each. Just doing nothing
> if a CbcPointer they hold is invalid or null. That should work. But not
> one runner juggling multiple optional schemes.

I am sure it is possible to make both "one runner" and "N runners"
design work. Pick the one you think is best overall.


> Doing the split though leads to direct code duplication. Since the
> runner is just calling 1:1 methods on the cache, both have to have
> identically matching lifetimes and ctor/dtor sequences need to line up
> their actions with inverse dependency :: creating cache creates runner,
> deleting runner deletes cache, creating runner early or deleting cache
> early causes problems).

I do not know authentication code as well as you do, but I recommend
registering one runner regardless of individual cache existence and
letting that single runner call configuration/cleanup methods of the
caches that happen to exist at the time of the configuration/cleanup event.

If that simple approach requires some code adjustments (e.g., a
User::Cache() implementation without side effects), then make those
adjustments.


>>> +void
>>> +UserNameCache::Cleanup(void *data)
>>> +{
>>> +    debugs(29, 5, "checkpoint");
>>> +    if (!cbdataReferenceValid(data))
>>> +        return;
>>
>> I believe you are using eventAdd() with cbdata protection so the
>> reference has to be valid. If I have not missed some case where data may
>> be invalid, please remove this if statement.


> Squid crashes on shutdown without this. 

One more reason to remove that should-not-be-there code and fix the bug
causing the crash instead of trying to hide that bug like this. The code
path that I see checks the validity of that pointer before calling the
callback, so the extra check should not be necessary [unless there is a
bug elsewhere or a different code path that should not exist]:

> bool
> EventDialer::canDial(AsyncCall &call)
> {
...
>     if (isLockedArg && !cbdataReferenceValid(theArg))
>         return call.cancel("stale handler data");
> }


> The above line is what implements the "dont do anything" logic when the
> event *does* reach the static callback.

That line is already inside event.cc (as quoted above). If you do need
to repeat it, something went wrong.


>>> +    eventDelete(&UserNameCache::Cleanup, this);
>>
>> This call will lead to debug_trap() if the event callback happens to be
>> scheduled already (and, hence, out of the event queue but not fired
>> yet). There is currently no safe way to delete events with arguments,
>> especially long-lived ones like cleanup. Do not delete them if possible.
>>
>> If the cache is no longer needed, delete the cache object itself. This
>> will prevent the cbdata-protected event from reaching the cache object.
>> If you cannot delete the cache object because of the bugs elsewhere,
>> perhaps you can use cleared store_ as a sign that the event should not
>> be processed any more?

> We can only delete reliably on final shutdown due to cachemgr things
> needing access to Cache(), which will re-allocate it all again uselessly.

If you are talking about UserNameCache::endingShutdown(), then Squid
will run at least one main loop iteration after that call (as guaranteed
by the Runners API). Thus, it is not safe to call eventDelete() there.

And again, if the proposed Cache() implementation is in the way of doing
the right thing, change Cache() and/or its callers. It is a new method
so you can make it do whatever you need.


> Which is part of the reason Cache was inheriting from RunnerRegistry. We
> have to have one runner per cache anyway, and the registry default
> finshShutdown() takes care of all the deletion and cbdata invalidations
> at the correct times without any new code.

I doubt endingShutdown() takes care of all the deletion and cbdata
invalidations at the correct times. If I saw no signs of problems in the
proposed code, I would be content with the "it seems to work" argument.
When I see problems, that argument alone is not enough, especially when
we are talking about such significant changes that affect multiple code
areas.


HTH,

Alex.



More information about the squid-dev mailing list