[squid-dev] Timeouts for abandoned negative reviews

Alex Rousskov rousskov at measurement-factory.com
Sat Jan 11 00:20:50 UTC 2020


On 1/10/20 1:59 AM, Amos Jeffries wrote:
> On 9/01/20 11:20 am, Alex Rousskov wrote:
>>     Squid GitHub pull requests have the following problem: A core
>> developer can stall PR progress by submitting a negative review and then
>> ignoring the PR (despite others reminding them that the reviewer action
>> is required). Such stalled PRs cannot be merged because our policies
>> strictly prohibit merging of vetoed PRs.

> The "problem" you are describing is what I see as a core principle of
> the review process.

You may have missed the critical "and then..." part of my description. I
did not describe the ability to block a PR. I described the ability to
block and then ignore the PR. I called that "stalling". The ability to
block a PR (a fundamental feature) is very different from stalling (a
dereliction of duty).

[I am omitting the part of your response that discusses basic PR
blocking because this is not what I am talking about.]


>> Collection of meaningful stats is
>> prohibitively expensive, but there are ~50 PRs that were open for 100+
>> days and many of them are not abandoned/irrelevant. Here are some of the
>> worst examples (the stalled day counts below are approximate):
>>
>> * PR 369: stalled for 310 days (and counting)
>>   https://github.com/squid-cache/squid/pull/369
>>
>> * PR 59: 120-480 days, depending on when you think the PR was stalled
>>   https://github.com/squid-cache/squid/pull/59
>>
>> * PR 443: stalled for 100+ days (and counting)
>>   https://github.com/squid-cache/squid/pull/443


> Also, at a fundamental level I object to the categorization of any open
> PRs as "abandoned".

This term does not change the proposal AFAICT, but what term would you
prefer?


> Particular to this proposal I regularly review *all* PRs in a quick scan
> to see where progress can happen

It is good that you know when a PR needs your attention, but please note
that your scans have not solved the problem I am writing about. We need
to solve that problem.

The timeout might solve (or reduce) the problem not because of new
notifications, but because it sheds reviewer load. A reviewer who for
months sees authors pleading to explain what needs to be done but does
not respond is probably overloaded.


>> Stalled PRs may significantly increase development overheads and badly
>> reflect on the Squid Project as a whole. I have heard many complains
>> from contributors frustrated with the lack of Project responsiveness and
>> accountability. Stalled PRs have also been used in attempts to block
>> features from being added to the next numbered release.


> Interesting statement there. Particularly since you and I are
> essentially the only reviewers.
> 
> Are you admitting reason for PR 358 not being approved yet?

You seem to imply something nefarious on my part. Sorry, I do not know
what you are implying exactly, so I will just stick to the facts:

* PR 358 was approved a year ago and was open for about a day. I do not
think it is relevant here.

* If you meant PR 538, then I reviewed it within ~6 days of opening
(before seeing your squid-dev message), and Eduard helped you make
progress with this PR a few days earlier. And the PR is not even
blocked! Clearly, this is not a stalled PR by any stretch of the
imagination.


> I certainly have not done such underhanded politics. When I want to
> block a feature for next release I state so clearly in the PR. Though
> there is scant reason to block anything from merging to master when the
> code is good - no numbered releases come from there.

I am not sure why you are saying the above. You seem to be trying to
prove that you have not done something bad that I have done, but I do
not know what that something is, so I cannot respond in a meaningful way.


>> While 100-day stalling is unheard of in other open source projects I
>> know about, the problem with unresponsive reviewers is not unique to
>> Squid. The best (applicable to the Squid Project) solution that I know
>> of is a timeout:
>>
>> If the reviewer remains unresponsive for N days, their negative review
>> can be dismissed. The counting starts from a well-defined event S, and
>> there are at least two reminder comments addressed at the reviewer (R1
>> days after S and R2 days before the dismissal).
>>
>> Do you think timeouts can solve the problem with stalled PRs if Project
>> contributors do not attempt to game the system? Can you think of a
>> better solution?


> I do not think this will solve stalled PRs.

Why do you think timeouts will not solve stalled PRs? What do you think
will happen if we agree to enable timeouts (other than better
communication you have mentioned)?

Can you think of a solution that would work?


> It may lead to better communication when reviewers are forced to post
> regularly about why no progress. But do we really prefer PRs littered
> with a long history of that? or a clean history with the 'stalled' state
> easily found?

Sounds like a false dichotomy. I "prefer" that each blocked PR has a
clear, legitimate reason for remaining blocked and a clear way towards a
block removal (where such a removal is possible). IMO, a blocking
reviewer has a duty to engage with the PR author to get the PR into that
state. I hope we agree on this.

We could discuss whether hypothetical blocking reasons such as "I need
three months to review your changes" or "I need four months to find a
vulnerability in your changes" are legitimate (if clearly stated). I
think the correct outcome of such a discussion would be that they are
legitimate when there are no other core developers willing to carefully
approve the PR with a full understanding of the blocking reviewer
concerns (if any).


>  As the reviewer most likely to be hit by these notices, I much prefer
> the PR to end with the thing needing action  than to have to read its
> old history to figure out whether I can work on it immediately

Yes, I agree that increased notification noise is a possible negative
side effect of the timeout solution, but if we cannot find a better
solution, then I think that price is acceptable. Authors essentially
have to generate similar notifications already!


> We could use a tag set by the party suspecting a stall and unset by the
> reviewer when they next pay attention to the PR.

Sure, that procedure can work.

And just to avoid misunderstanding, please note that "paying attention"
is not what resets the timeout. A stalled PR remains stalled until the
reviewer provides sufficient information for the PR author to make progress.


> Also, IMO it will just change to a different type of stall, add the risk
> of Joe Blogs above, and revert Squid to the bygone situation where core
> developers were regularly commiting code that broke trunk/master because
> they personally thought it less broken than it was. We have this
> negative as a veto to protect against those nasties - and for the most
> part it is working.

The usefulness of a veto itself is undisputed. The "old way" did not
work well because committing without a second review was the norm. With
the timeout proposal, it is an _exceptional_ (and recognized as
unfortunate) situation with several safeguards preventing abuse.


> As I see the situation, the root cause is:
> 
> With multiple Factory personnel working paid (full time?) hours on Squid
> code there are a lot of large and complicated Factory PRs to review and
> only one non-Factory reviewer. When a non-Factory reviewer has an issue
> with any one PR it is quickly responded to (usually not fixed,
> apparently due to their commercial situation and sunken-costs?) and
> bumped back into waiting-for-reviewer status.

There is a lot of FUD here!

I will try to clarify what the "usually not fixed" part means:

* If the author knows what the reviewer wants and agrees that there is a
legitimate/in-scope/etc. problem, then the author fixes that problem, of
course.

* Otherwise, the author should ask for clarification or retraction. This
is a natural open source development process rather than a nefarious
scheme driven by "commercial situation and sunken-costs" (whatever that
means!).

Does that match your understanding of that "usually not fixed" statement?


> As I see the situation, the root cause is:
> 
> With multiple Factory personnel working paid (full time?) hours on Squid
> code there are a lot of large and complicated Factory PRs to review and
> only one non-Factory reviewer. When a non-Factory reviewer has an issue
> with any one PR it is quickly responded to (usually not fixed,
> apparently due to their commercial situation and sunken-costs?) and
> bumped back into waiting-for-reviewer status.
> 
> Non-Factory patches come along relatively rarely and quickly handled by
> the paid (almost-full time?) Factory reviewer with requests that require
> a lot of author work.
> 
> So IMO these two reviewer vs author stalls are not distinct, but
> directly linked to the originator of the PR.


I am doing my best to ignore the FUD parts to focus on understanding
what you see as the "root cause". I am having a hard time distilling the
above text into a root cause description. Is it that I request so many
modifications in your PRs that you do not have sufficient time left to
finish PR reviews?


> [ In the spirit of being open, Alex is that Factory reviewer and I that
> Non-Factory one. We both have quite a lot of overhead work like this
> discussion to spread our Squid time over, so maybe hours:minutes instead
> of days:hours depicted - I think the relative bias in time is the
> important factor there rather than absolute units. ]

I do not know what you mean by "days:hours depicted". If you mean that
absolute/calendar delays are not important, then I disagree: For my team
and for the folks waiting for the features to be merged, the absolute
wait time is very important (as well as overheads), especially when that
absolute time is measured not in days or weeks, but months and,
arguably, years.

Alex.


More information about the squid-dev mailing list