[squid-dev] Strategy about build farm nodes

Mon May 3 14:29:31 UTC 2021

On 5/3/21 12:41 AM, Francesco Chemolli wrote:
> - we want our QA environment to match what users will use. For this
> reason, it is not sensible that we just stop upgrading our QA nodes, 

I see flaws in reasoning, but I do agree with the conclusion -- yes, we
should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!

The principles I have proposed allow upgrades that do not violate key
invariants. For example, if a proposed upgrade would break master, then
master has to be changed _before_ that upgrade actually happens, not
after. Upgrades must not break master.

What this means in terms of sysadmin steps for doing upgrades is up to
you. You are doing the hard work here, so you can optimize it the way
that works best for _you_. If really necessary, I would not even object
to trial upgrades (that may break master for an hour or two) as long as
you monitor the results and undo the breaking changes quickly and
proactively (without relying on my pleas to fix Jenkins to detect
breakages). I do not know what is feasible and what the best options
are, but, again, it is up to _you_ how to optimize this (while observing
the invariants).

> - I believe we should define four tiers of runtime environments, and
> reflect these in our test setup:

>  1. current and stable (e.g. ubuntu-latest-lts).
>  2. current (e.g. fedora 34)
>  3. bleeding edge
>  4. everything else - this includes freebsd and openbsd

I doubt this classification is important to anybody _outside_ this
discussion, so I am OK with whatever classification you propose to
satisfy your internal needs.

> I believe we should focus on the first two tiers for our merge workflow,
> but then expect devs to fix any breakages in the third and fourth tiers
> if caused by their PR,

FWIW, I do not understand what "focus" implies in this statement, and
why developers should _not_ "fix any breakages" revealed by the tests in
the first two tiers.

The rules I have in mind use two natural tiers:

* If a PR cannot pass a required CI test, that PR has to change before
it can be merged.

* If a PR cannot pass an optional CI test, it is up to PR author and
reviewers to decide what to do next.

These are very simple rules that do not require developer knowledge of
any complex test node tiers that we might define/use internally.

Needless to say, the rules assume that the tests themselves are correct.
If not, the broken tests need to be fixed (by the Squid Project) before
the first bullet/rule above can be meaningfully applied (the second one
is flexible enough to allow PR author and reviewers to ignore optional
test failures).

> Breakages due to changes in nodes (e.g. introducing a new distro
> version) would be on me and would not stop the merge workflow.

What you do internally to _avoid_ breakage is up to you, but the primary
goal is to _prevent_ CI breakage (rather than to keep CI nodes "up to
date"!). If CI is broken, then it is the responsibility of the Squid
Project to fix CI ASAP. Thank you for assuming that responsibility as
far as Jenkins tests are concerned.

There are many ways to break CI and detect those breakages, of course,
but if master cannot pass required tests after a CI change, then the
change broke CI.

> What I would place on each individual dev is the case where a PR breaks
> something in the trunk-matrix,trunk-arm32-matrix, trunk-arm64-matrix,
> trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test
> and 5-pr-auto builds fail to detect the breakage because it happens on a
> unstable or old platform. 

This feels a bit out of topic for me, but I think you are saying that
some CI tests called trunk-matrix, trunk-arm32-matrix,
trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be
classified as _required_. In other words, a PR must pass those CI tests
before it can be merged. Is that the situation today? Or are you
proposing some changes to the list of required CI tests? What are those
changes?

Thank you,

Alex.