[squid-dev] Strategy about build farm nodes

Mon May 3 04:41:49 UTC 2021

On Wed, Apr 28, 2021 at 11:34 PM Alex Rousskov <
rousskov at measurement-factory.com> wrote:

> On 4/28/21 5:12 PM, Amos Jeffries wrote:
> > I'm not sure why this is so controversial still. We have already been
> > over these and have a policy from last time:
>
> Apparently, the recollections of what was agreed upon, if anything,
> during that "last time" differ. If you can provide a pointer to that
> "last time" agreement, please do so.
>

The merge workflows are agreed upon and not under discussion. Recent
discussions have highlighted some issues with what surrounds them, and I am
trying to clarify those as well.

> > * dev PR submissions use the volatile 5-pr-test, after test approval by
> > anyone in QA. Check against unstable OS nodes, as many as possible.
> > Kinkie adds/removes from that set as upgrades fix or break at CI end of
> > things.
>
> I do not know how to interpret the last sentence correctly, but, IMO, we
> should not add or change nodes if doing so breaks master tests. AFAICT
> from PR 806 discussion[1], Francesco thinks that it is not a big deal to
> do so. The current discussion is meant to resolve that disagreement.
>

Let me highlight the principles underlying my proposal. IMO our objectives
are, in descending order of importance (each point should be read as "as
far as possible given our resources"):
1. ensure we ship healthy code to a maximum number of users
2. have minimal friction in the development workflow

These objectives have a set of consequences:
- we want our QA environment to match what users will use. For this reason,
it would not be sensible to simply stop upgrading our QA nodes; otherwise we
would target something that doesn't match our users' experience
- it makes little sense to treat unstable distributions (Fedora Rawhide,
possibly CentOS Stream, Gentoo, openSUSE Tumbleweed, Debian unstable) as
first-class citizens of the testing workflow, especially in stages that are
executed often (pr-test)

This means that:
- I periodically weed out distributions that are no longer supported (e.g.
Fedora 31, Ubuntu Xenial) and add current distributions (e.g. Ubuntu
Hirsute, Fedora 34).
I accept that when I do that, it is on me to ensure that new compiler
features flagging previously undetected behaviours do not block builds. I am
currently failing at this, see
https://build.squid-cache.org/job/trunk-matrix/121/ . I will need
to develop a process with a proper staging phase.
- I believe we should define four tiers of runtime environments, and
reflect these in our test setup:
 1. current and stable (e.g. ubuntu-latest-lts). These are not expected to
change much over a span of years, and are expected to offer non-breaking
updates over their lifetime
 2. current (e.g. Fedora 34)
 3. bleeding edge: these may introduce breaking changes, which it makes
sense to follow because they might highlight real issues and because they
will eventually trickle down to current and then LTS
 4. everything else - this includes FreeBSD and OpenBSD (mostly due to the
different virtualization tech they use)

I believe we should focus on the first two tiers for our merge workflow,
but then expect devs to fix any breakages in the third and fourth tiers if
caused by their PR, while I will take care of any breakages caused by
dist-upgrades.
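
To make the tiering concrete, here is a minimal sketch of how the four tiers
could map onto merge gating. It is an illustration only: the node labels,
the NODE_TIERS table, and the function names are hypothetical and do not
describe the actual build farm configuration.

```python
from enum import IntEnum


class Tier(IntEnum):
    STABLE = 1         # current and stable, e.g. ubuntu-latest-lts
    CURRENT = 2        # current, e.g. Fedora 34
    BLEEDING_EDGE = 3  # may introduce breaking changes
    OTHER = 4          # everything else, e.g. FreeBSD, OpenBSD


# Hypothetical node labels, chosen for illustration only.
NODE_TIERS = {
    "ubuntu-latest-lts": Tier.STABLE,
    "fedora-34": Tier.CURRENT,
    "fedora-rawhide": Tier.BLEEDING_EDGE,
    "debian-unstable": Tier.BLEEDING_EDGE,
    "freebsd": Tier.OTHER,
    "openbsd": Tier.OTHER,
}


def gates_merge(node: str) -> bool:
    """Only the first two tiers take part in the merge workflow."""
    return NODE_TIERS[node] <= Tier.CURRENT


def merge_allowed(results: dict[str, bool]) -> bool:
    """A merge is allowed when every gating node passed.

    Failures on tier 3 and tier 4 nodes are reported but do not gate.
    """
    return all(ok for node, ok in results.items() if gates_merge(node))
```

Under this sketch, a failure on fedora-rawhide alone would not block a
merge, while a failure on fedora-34 would.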

> [1] https://github.com/squid-cache/squid/pull/806#issuecomment-827937563
>
>
> > * anubis auto branch tested by curated set of LTS stable nodes only.
>
> FWIW, the above list and the original list by Francesco appears to focus
> on distro stability, popularity, and other factors that are largely
> irrelevant to the disagreement at hand. The disagreement is whether it
> is OK to break master (and, hence, all PR) tests by changing CI. It does
> not matter whether that CI change comes from an upgrade of an "LTS
> stable node", "unstable node", or some other source IMO. Breaking
> changes should not be allowed (in the CI environments under our
> control). If they slip through despite careful monitoring for change
> effects, the breaking CI changes should be reverted.
>

I think it depends.
Breakages due to changes in nodes (e.g. introducing a new distro version)
would be on me and would not stop the merge workflow.
What I would make each individual dev responsible for is the case where a
PR breaks something in the trunk-matrix, trunk-arm32-matrix,
trunk-arm64-matrix, trunk-openbsd-matrix, or trunk-freebsd-matrix builds,
even if the 5-pr-test and 5-pr-auto builds fail to detect the breakage
because it happens on an unstable or old platform.
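
The split of responsibility described above can be sketched as follows.
This is only an illustration under stated assumptions: the Breakage record,
the owner labels, and the function name are hypothetical; only the job names
are taken from the text.

```python
from dataclasses import dataclass

# Post-merge jobs named in the text above.
TRUNK_JOBS = frozenset({
    "trunk-matrix",
    "trunk-arm32-matrix",
    "trunk-arm64-matrix",
    "trunk-openbsd-matrix",
    "trunk-freebsd-matrix",
})


@dataclass(frozen=True)
class Breakage:
    job: str                     # job where the failure surfaced
    caused_by_node_change: bool  # e.g. a new distro version was introduced


def breakage_owner(breakage: Breakage) -> str:
    """Return who is expected to fix a breakage (hypothetical labels)."""
    if breakage.caused_by_node_change:
        # Node changes are on the CI maintainer and do not stop merges.
        return "ci-maintainer"
    if breakage.job in TRUNK_JOBS:
        # A PR broke a trunk job, even if 5-pr-test and 5-pr-auto missed it.
        return "pr-author"
    # Cases the proposal does not spell out are left undecided here.
    return "undecided"
```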

-- 
    Francesco