[squid-dev] Strategy about build farm nodes

Francesco Chemolli gkinkie at gmail.com
Mon May 17 06:17:35 UTC 2021


>
>
> Adding new nodes with next distro release versions is a manual process
> not related to keeping existing nodes up to date (which is automated?).
>

Mostly.
Our Linux environments are Docker containers on amd64, armv7l and arm64.
On a roughly monthly cadence, I pull from our dockerfiles repo
(https://github.com/kinkie/dockerfiles) and run
$ make all push
The resulting Docker images are free for everybody to use and to test
things on, on any Docker system
(https://hub.docker.com/r/squidcache/buildfarm). Just run
$ docker run -ti --rm -u jenkins squidcache/buildfarm:$(uname -m)-<distro name> /bin/bash -l
(note: the above command will not preserve any artifacts once the shell
exits)
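
If you need build artifacts to survive the container, a minimal variation
is to bind-mount a host directory; a sketch (the ./artifacts path and the
/home/jenkins home directory are my assumptions, not part of our setup):
$ mkdir -p artifacts
$ docker run -ti --rm -u jenkins \
    -v "$PWD/artifacts:/home/jenkins/artifacts" \
    squidcache/buildfarm:$(uname -m)-<distro name> /bin/bash -l
Anything written under /home/jenkins/artifacts inside the container then
stays in ./artifacts on the host after the shell exits.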

Adding a new Linux distro means copying and tweaking a Dockerfile, testing
things, and updating our Jenkins jobs. I do this roughly every 6 months.
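
In practice that workflow is roughly the following (the directory and tag
names here are illustrative, not the repo's exact layout):
$ git clone https://github.com/kinkie/dockerfiles && cd dockerfiles
$ cp -r ubuntu-focal ubuntu-jammy    # start from the closest existing distro
$ $EDITOR ubuntu-jammy/Dockerfile    # bump the base image, adjust package names
$ docker build -t squidcache/buildfarm:$(uname -m)-ubuntu-jammy ubuntu-jammy
$ docker push squidcache/buildfarm:$(uname -m)-ubuntu-jammy
Then the relevant Jenkins jobs get pointed at the new tag.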

FreeBSD, OpenBSD and (hopefully soon) Windows are hand-managed VMs that
change much more slowly.


> >> What I would place on each individual dev is the case where a PR breaks
> >> something in the trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix,
> >> trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test
> >> and 5-pr-auto builds fail to detect the breakage because it happens on an
> >> unstable or old platform.
> > This feels a bit out of topic for me, but I think you are saying that
> > some CI tests called trunk-matrix, trunk-arm32-matrix,
> > trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be
> > classified as _required_.
>
> That is how I read the statement too.
>

In a world of infinite resources and very efficient testing, sure.
But in a space where a single OS/compiler combo takes 2 hours on Linux and
4 hours on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours end to
end, we need to optimize. Making any of these jobs a blocking requirement
would make these times 4+ times larger (a full trunk-matrix takes just
about a day on amd64 and 2 days on arm64, versus 6 hours for 5-pr-test),
and the complaint would then be that development or release is slowed down
by the amount of testing done.

My proposal aims to test and land the PR on the systems where we can be
efficient and that are relevant, and to fix any remaining issues after the
fact with follow-up PRs, which stay a soft requirement for the dev
introducing the change. A dev who doesn't have access to a given OS can
still test their work there with the anybranch-* jobs.
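
Reproducing a farm build locally is also an option. A sketch, assuming a
squid source checkout and its test-builds.sh script (the mount paths are
my example, not anything the farm mandates):
$ git clone https://github.com/squid-cache/squid
$ docker run -ti --rm -u jenkins \
    -v "$PWD/squid:/home/jenkins/squid" \
    squidcache/buildfarm:$(uname -m)-<distro name> /bin/bash -l
# then, inside the container:
$ cd ~/squid && ./test-builds.sh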

Can we do better? Sure.
For the sake of cost-mindedness, a lot of the build farm nodes run in my
home: a couple of Raspberry Pis, an Intel NUC, and I'm in the process of
purchasing a second (and second-hand) NUC that comes with a Windows
license. The setup is meant to be thrifty; I'm mindful of burning
Foundation resources for little gain, and I'm performing a balancing act
between always-on VMs, on-demand VMs and my own gadgets.

The real game changer would be rethinking how we do things to reduce the
amount of testing needed.

For instance: we currently have a lot of different build-time
configurations meant to save core memory in runtime environments. Is it
maybe time to revisit this decision and move these checks to run time?
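
To make the cost concrete: every independent on/off build-time option
multiplies the matrix. A toy sketch (the two options are just examples
picked from ./configure --help):
$ for pools in --enable-delay-pools --disable-delay-pools; do
    for esi in --enable-esi --disable-esi; do
      ./configure $pools $esi && make check
    done
  done
Two options already mean 4 builds; N options mean 2^N builds per
OS/compiler combo, while run-time checks would collapse that to one build.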
Unfortunately, one of the problems we have is that we're running blind: we
don't know what configurations our users deploy, we can only assume, and
that makes this conversation much more susceptible to opinions and harder
to build consensus on.

-- 
    Francesco