Continuous Integration

Started: 2022-12-06
Decided: 2023-01-27
Last amended: 2023-04-04

Why

The sudden decommissioning of Hydra forces us to revisit our Continuous Integration (CI) setup.

Decision

We rely predominantly on Buildkite as our CI system.

We build artifacts and run checks on them. Artifacts include compiled executables, but also the source code itself. Checks include unit and integration tests, but also source code linters. We specify our artifacts in `flake.nix`, and most of our checks, too.

We perform these builds and checks at different granularities. To keep our use of computing resources within reasonable limits, we don’t automatically check everything on every commit. Instead, the granularities are:

  • post-commit = (at most once) after each commit; for a quick sanity check after a push
  • pre-merge = before merging each pull request; for a reasonably complete, automated check that master will satisfy functional requirements
  • post-merge = after merging each pull request to master; for an exhaustive check that master satisfies functional requirements on all platforms
  • nightly = every night; for an exhaustive, automated check of functional and non-functional requirements

The following table lists our artifacts, the checks performed on them (`.` for build), the granularity at which the check is performed, and the CI system used for doing that:

| Artifact | Check | Granularity | CI System | (Status) |
|---|---|---|---|---|
| Source code | Code formatting style | post-commit | Buildkite | 🔵 |
| Documentation | . | post-commit | Github Action | 🔵 |
| Pull request (PR) | Mergeable to master concurrently with other PRs | pre-merge | Bors | 🔵 |
| Compiled modules | . | post-commit | Buildkite | 🔵 |
| | Unit tests (linux) | post-commit | Buildkite | 🔵 |
| | Unit tests (macos) | post-merge | Buildkite | 🔵 |
| | Unit tests (windows) | nightly | Github Action | 🔵 |
| Executables / Release archive | . (linux) | pre-merge | Buildkite | 🟡 ADP-2502 |
| | . (macos) | post-merge | Buildkite | 🔵 |
| | . (windows, cross-compiled) | pre-merge | Buildkite | 🔵 |
| Executables | Integration tests (linux) | pre-merge | Buildkite | 🔵 |
| | Integration tests (macos) | | Buildkite | 🔴 ADP-2522 |
| | Integration tests (windows) | | Github Action | 🔴 ADP-2517 |
| | Benchmarks | nightly | Buildkite | 🔵 |
| Release archive | E2E tests | nightly | Github Action | 🔵 |
| Docker image | . | post-commit | Buildkite | 🔵 |

Legend: Status 🔵 = working; 🟡 = needs work; 🔴 = not working
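To illustrate how the table can be read, here is a minimal Python sketch that encodes a few of its rows as data and asks which checks run at a given granularity. The names `Granularity`, `Check`, `CHECKS` and `checks_at` are hypothetical and exist only for this example; they are not part of our `flake.nix` or of any CI system's API.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    POST_COMMIT = "post-commit"
    PRE_MERGE = "pre-merge"
    POST_MERGE = "post-merge"
    NIGHTLY = "nightly"


@dataclass(frozen=True)
class Check:
    artifact: str
    name: str  # "." would mean "build the artifact"
    granularity: Granularity
    ci_system: str


# A subset of the rows of the table, copied by hand (illustrative only).
CHECKS = [
    Check("Source code", "Code formatting style", Granularity.POST_COMMIT, "Buildkite"),
    Check("Compiled modules", "Unit tests (linux)", Granularity.POST_COMMIT, "Buildkite"),
    Check("Compiled modules", "Unit tests (macos)", Granularity.POST_MERGE, "Buildkite"),
    Check("Compiled modules", "Unit tests (windows)", Granularity.NIGHTLY, "Github Action"),
    Check("Executables", "Integration tests (linux)", Granularity.PRE_MERGE, "Buildkite"),
    Check("Executables", "Benchmarks", Granularity.NIGHTLY, "Buildkite"),
]


def checks_at(granularity):
    """Return the checks that run automatically at the given granularity."""
    return [c for c in CHECKS if c.granularity is granularity]


# Names of the checks that run every night, in table order.
nightly_names = [c.name for c in checks_at(Granularity.NIGHTLY)]
```

With the rows above, `nightly_names` contains the Windows unit tests and the benchmarks.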

Details

Granularity

  • Granularity refers to automatic actions taken by the CI system. It should be possible to trigger a build or check manually at any time.
  • The purpose of granularity is to conserve computing resources. In a world with infinite resources, the system would perform every build and check on every commit.
  • The name of the granularity “post-commit” was chosen for brevity: the action is performed automatically on the latest commit after a `git push`, not on the git commits in between. In other words, the action is performed at most once per commit.
  • We use the “post-merge” granularity for actions that
    • consume scarce resources and have a high chance of failing, e.g. builds and checks on macOS
  • We use the “nightly” granularity for actions that
    • consume many resources, e.g. benchmarks
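The "at most once per commit" behaviour of the post-commit granularity can be sketched as follows. This is a minimal illustration under our own assumptions, not how Buildkite implements it: the function name `commits_to_check` and the `already_checked` set are hypothetical.

```python
def commits_to_check(pushed_commits, already_checked):
    """Select the commits a post-commit action should run on for one push.

    pushed_commits: commit ids contained in the push, oldest first.
    already_checked: set of commit ids checked earlier; updated in place.
    """
    if not pushed_commits:
        return []
    head = pushed_commits[-1]  # only the latest commit of the push is considered
    if head in already_checked:
        return []  # at most once per commit
    already_checked.add(head)
    return [head]


seen = set()
first = commits_to_check(["a1", "b2", "c3"], seen)  # intermediate commits are skipped
second = commits_to_check(["c3"], seen)             # same head pushed again
```

Here `first` contains only the head commit `"c3"`, and `second` is empty because that commit was already checked.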

CI System

As a general rule, we choose

  • Github Actions for actions that
    • are very simple and do not require a nix store / environment
    • run on Windows
  • Buildkite otherwise
    • especially for actions that require a nix store
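Written as a decision function, the general rule above might look like the following sketch. The function `choose_ci_system` and its parameters are hypothetical names for this example, and individual rows of the table may deviate from the rule.

```python
def choose_ci_system(*, is_simple, needs_nix_store, runs_on_windows):
    """Apply the general rule for picking a CI system for one action."""
    if runs_on_windows:
        return "Github Actions"  # Buildkite has no Windows machine
    if is_simple and not needs_nix_store:
        return "Github Actions"
    return "Buildkite"  # the default, especially when a nix store is needed


unit_tests_windows = choose_ci_system(
    is_simple=False, needs_nix_store=False, runs_on_windows=True
)
build_executables = choose_ci_system(
    is_simple=False, needs_nix_store=True, runs_on_windows=False
)
```

Note that an action cross-compiled for Windows on a Linux machine does not count as running on Windows for the purposes of this rule.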

There is a tension here, because we have to set up some checks (e.g. unit tests) in two different environments due to the different availability of operating systems:

  • Linux, macOS: in Buildkite
  • Windows: in Github Actions

We hope to address this tension by requesting a Windows machine for use with Buildkite.

Platform macOS

At the time of writing, we have two Mac mini machines that act as Buildkite agents. Unfortunately, they are frequently overloaded, which causes builds or checks to fail. Hence, we only use the granularities “post-merge” or “nightly” for them.

Company Processes

For developing and maintaining our CI, we may use DevX/SRE expertise from IOG.

  • Our tribe is responsible for choosing our CI tooling
  • Our tribe should have a process for getting DevX/SRE support
  • Our tribe’s DevX/SRE resources can help teach us how to debug problems that arise
  • Link to the SRE Chapter of IOG

Rationale

Artifacts and checks

The two main concerns of a CI pipeline are: building artifacts and running checks.

The purpose of building an artifact is to produce, say, an executable or HTML. The purpose of running a check is to verify that the artifact satisfies certain properties, e.g. that all unit tests pass.

Different CI systems, like Hydra, Cicero, Buildkite or Github Actions, have a different focus regarding these concerns.

  • The world view of Hydra is that everything is about building artifacts. Hydra was surprisingly successful as a CI tool, because this world view can be used for running checks, too: they can be expressed as trivial artifacts, where success of the check is equivalent to success of building `()`, and failure of the check is equivalent to failure of the artifact build.
  • The world view of Github Actions, Buildkite or Cicero is that everything is about running checks. The drawback is that building artifacts is more difficult and we have problems managing the build cache.
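The idea of expressing a check as a trivial artifact can be sketched in a few lines of Python. This is only an analogy to what Hydra does with Nix derivations; `BuildFailure` and `check_as_artifact` are hypothetical names introduced for this example.

```python
class BuildFailure(Exception):
    """Raised when building an artifact fails."""


def check_as_artifact(check):
    """Turn a check (a callable returning True or False) into a build step.

    The resulting build produces the trivial artifact `()` when the check
    passes, and fails when the check fails.
    """
    def build():
        if not check():
            raise BuildFailure("check failed")
        return ()  # the trivial artifact
    return build


passing_build = check_as_artifact(lambda: 1 + 1 == 2)
artifact = passing_build()
```

A system that only knows how to build artifacts can then run any check, because a failed check surfaces as a failed build.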

For us, the main takeaway is that we should try to separate these concerns clearly.

Our artifacts include source code and compiled executables. We have different checks on these: linters and style checkers on the source code, unit and integration tests on the executables.

As we are coming from Hydra, compiling executables is easiest to do through a cached nix store. At the moment, it looks like only Buildkite has good support for that; hence we choose Buildkite.

Our options for CI system

Buildkite

  • Pro: Good at artifacts, working nix cache
  • Pro: Good documentation, easy to write
  • Con: Dependency on machine (currently provided by SRE / Samuel Leathers)
  • Con: No Windows machine
  • Con: Dependency on permissions (currently only SRE / Samuel Leathers has write permission)

In a pinch, these dependencies can be resolved by forking the repository and providing our own machines.

Github Action

  • Neutral: Good at small actions, but problems at scale
  • Neutral: Good documentation, but a bit cumbersome to write
  • Pro: No dependency on machine
  • Pro: Windows machine
  • Pro: No dependency on permissions

Cicero

  • Con: Poor at artifacts, nix cache currently not working properly
  • Con: Poor documentation
  • Neutral: Dependency on machine (provided by SRE, but they have a long-term commitment)
  • Con: No Windows machine
  • Pro: No dependency on permissions

References

[1] G Kim, K Behr, G Spafford; The Phoenix Project; IT Revolution Press (2013). A business novel about the DevOps movement: make the flow of work visible and automate it, to an extreme of, say, 30 releases per day.

[2] Cicero on Github

Scratchbook

Random Findings

Installing Nix with the `cachix/install-nix-action` Github Action: https://github.com/input-output-hk/cardano-node/blob/db396b163af615aa89286aa985583ef8843cfcde/.github/workflows/check-mainnet-config.yml#L16-L23

Documentation Findings

Cicero

Cicero = An engine for executing actions. An “action” is an arbitrary program (Bash, Python, Nix, …) that is run in the Nomad execution environment.

Tullia = A domain-specific language, embedded in the Nix language, for expressing actions to be run with Cicero. This is useful when writing Cicero actions that mainly build stuff with Nix.