Mini Git Docker Pipeline

Watch git repositories and trigger arbitrary pipelines such as docker build and run commands. Light on runtime resources and configuration overhead.

Oftentimes, CI/CD pipeline runners are integrated into a whole software development platform that comes with various other features such as ticketing, wikis, and repository management. Even more standalone approaches – e.g., Jenkins – come with heavy dependencies, considerable resource usage, and configuration effort. The additional complexity, however, generally enables a web application for online configuration, advanced user and secret management, remote and load-balanced agents, stage dependencies, a live build log tail, etc.

All of this is not strictly needed and is out of scope for this project. Instead, the focus lies on the core functionality that is widely considered standard:

Which actual build or deployment system to use is up to the built images and container runs that form the outcome. Apart from that, live build logs and a writable artifacts directory are available locally and can be post-processed or served when needed. Further features of pipelined include:

Mode of Operation

Once started, the pipelined daemon periodically checks the given git remotes for new branches or tags. If those match the configuration, corresponding build jobs are enqueued.

Background build jobs are executed in parallel up to a certain global limit and are not affected by a reconfiguration signal. At any time, the following process signals are handled (see the sketch after this list):

SIGHUP
Reload project configuration file without interruption, i.e., already enqueued or running jobs can seamlessly continue.
SIGINT
Don’t accept new jobs, wait for pending ones to complete (if any), and exit.
SIGQUIT, SIGTERM
Cancel any running jobs and exit promptly.
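
For illustration, the signal handling could be wired up roughly as follows – a minimal sketch using Python's standard signal module with made-up handler names, not the actual implementation:

import signal

reload_requested = False
shutdown_requested = False

def request_reload(signum, frame):
    # SIGHUP: re-read the project configuration; queued and running jobs continue
    global reload_requested
    reload_requested = True

def request_drain(signum, frame):
    # SIGINT: stop accepting new jobs, let pending ones finish, then exit
    global shutdown_requested
    shutdown_requested = True

def request_cancel(signum, frame):
    # SIGQUIT/SIGTERM: cancel running jobs and exit as soon as possible
    raise SystemExit(1)

signal.signal(signal.SIGHUP, request_reload)
signal.signal(signal.SIGINT, request_drain)
signal.signal(signal.SIGQUIT, request_cancel)
signal.signal(signal.SIGTERM, request_cancel)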

The first build step is to check out the corresponding commit into a local build directory, based on either a full or a shallow clone. In general, all git and docker commands are executed by calling their respective binaries, with their output captured, possibly parsed, and piped to a live build log file.
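
As a rough illustration of that pattern – not the actual code, with a made-up helper name and example log path – a command's output could be captured and streamed into the live build log like this:

import subprocess

def run_logged(args, log_path, cwd=None):
    # run a command, tee its combined output into the live build log,
    # and return the exit code plus the captured lines for later parsing
    lines = []
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("+ " + " ".join(args) + "\n")
        proc = subprocess.Popen(args, cwd=cwd, text=True,
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        for line in proc.stdout:
            log.write(line)
            log.flush()
            lines.append(line.rstrip("\n"))
    return proc.wait(), lines

# e.g. run_logged(["git", "--version"], "/tmp/build.log")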

If enabled, docker build and run form the actual build steps, using the project’s Dockerfile. The build can either be the main “target” that produces a tagged image, or merely a mostly cached no-op that prepares dependencies and an entrypoint for running the image as a container. Apart from the checkout directory, the run has a “drop” directory mounted, which can be used to persist artifacts. Optionally, the checkout (or artifact) directory can be mirrored to a local directory or remote destination, as supported by rsync.

By this means, arbitrary build environments and pipeline definition approaches are possible, given they can be installed and run via Docker. Additionally, custom hooks via callback scripts are supported.
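
For example, a success_callback could be a small Python script that records finished builds. The BUILD_PROJECT and BUILD_SHORT_COMMIT variable names are taken from the configuration example below; the log path is made up, and the exact set of exported variables should be verified against an actual build environment:

#!/usr/bin/env python3
# example success_callback: append the finished build to a local log file
import datetime
import os

project = os.environ.get("BUILD_PROJECT", "unknown")
commit = os.environ.get("BUILD_SHORT_COMMIT", "unknown")

with open("/var/log/pipelined-builds.log", "a", encoding="utf-8") as fh:
    fh.write(f"{datetime.datetime.now().isoformat()} {project} {commit} succeeded\n")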

A build directory with the checkout, artifacts directory, and a command log is left behind and can be processed or served at will.

Configuration

All handled projects and their settings are defined in a single self-contained configuration file. An example of a single project that uses all available flags, mostly with their defaults, could look like this:

[example-project]

# enable or ignore the whole section
;enabled=True

# git remote to watch and checkout
remote=git@git.example.com:example-project.git
# shallow or full clone upon build
;shallow=True

# which ref/heads/ references should be watched and built, regular expression
branches=master|main|[0-9]+-.*
# which ref/tags/ tags should be watched and built, regular expression
;tags=

# additional environment variables from file or inline, also explicitly passed as build ARG and run ENV
;env_file=/dev/null
;env=
;    FOO=bar
;    BAR=baz

# dockerfile to use, relative to checkout directory
;dockerfile=Dockerfile
# allow pulling new upstream images during build
;pull=True
# enable docker build layer cache
;cache=True

# build a container image, enables or disables docker overall
;build=False
# run the container after building the image
;run=False

# mount /var/run/docker.sock, with severe security implications
;docker_in_docker=False
# mount the root image read-only when running
;read_only=True
# mount a writable /tmp, size in percent
;tmpfs=50
# tag the image, not done by default
;tag=${BUILD_PROJECT}-${BUILD_SHORT_COMMIT}
# add a label to the image and container, none by default
;label=com.example.pipeline=True

# enable mirror to local directory or remote destination
;rsync.destination=/var/www/html
# enable destination-side deletion
;rsync.delete=False
# compare by checksum, not modification time and size
;rsync.checksum=False
# preserve user, permissions, and times - this usually requires root
;rsync.preserve=False
# include file, absolute path or relative to checkout directory
;rsync.include=MANIFEST.in

# git, docker, and callback timeout, none by default
;timeout=3600
# executed after checkout, before build, if any
;prepare_callback=/usr/bin/env
# executed at the end of a so-far successful build
;success_callback=/usr/local/bin/new-build.sh
# executed upon error
;error_callback=/usr/local/bin/failed-build.sh

The file format follows the standard Python ini syntax, which also allows a special DEFAULT section and config-time interpolation when needed. For runtime interpolation and environment variables, see below.
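
Settings shared by several projects can thus be moved into a DEFAULT section and only overridden where needed – a hypothetical example:

[DEFAULT]
branches=master|main
shallow=True

[project-a]
remote=git@git.example.com:project-a.git

[project-b]
remote=git@git.example.com:project-b.git
branches=master|main|[0-9]+-.*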

Environment Variables

The configuration allows setting additional environment variables via inline definitions or from a separate file. Tags and labels support $-based (possibly braced) interpolation, as provided by the standard Python string template implementation.
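
Conceptually, a configured tag such as ${BUILD_PROJECT}-${BUILD_SHORT_COMMIT} expands roughly like this – a sketch with made-up values; whether unknown variables are left as-is or raise an error depends on the implementation:

from string import Template

env = {"BUILD_PROJECT": "example-project", "BUILD_SHORT_COMMIT": "5bf417f5"}

tag = Template("${BUILD_PROJECT}-${BUILD_SHORT_COMMIT}").safe_substitute(env)
print(tag)  # example-project-5bf417f5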

Overall, the available runtime variables are, with increasing precedence:

The full environment is passed along to all actual git, docker, rsync, and callback commands. Together with the per-project variables, this makes it possible to globally or selectively override their configuration or behaviour.
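
As a conceptual sketch of how such an environment could be assembled – the concrete layers and their order here are an assumption, later updates simply override earlier ones:

import os

def build_environment(default_env, file_env, inline_env):
    # each update() overrides what came before, i.e. increasing precedence
    merged = dict(os.environ)   # process environment of the daemon
    merged.update(default_env)  # built-in per-build variables
    merged.update(file_env)     # entries read from env_file
    merged.update(inline_env)   # inline env= definitions
    return merged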

Default and project variables are also exposed to docker as ARG and ENV for image builds and container runs, respectively.

Usage

The project configuration file is passed to the main script and can be reloaded dynamically via SIGHUP:

usage: pipelined.py [-h] [--limit NUM] [--interval SECONDS] --workdir PATH --config CONFIG.INI [--dump-config]

Watch git repositories and trigger arbitrary pipelines as docker build and run commands. Light on runtime resources and configuration overhead.

options:
  -h, --help           show this help message and exit
  --limit NUM          parallel build limit (default: 1)
  --interval SECONDS   git remote check interval (default: 60)
  --workdir PATH       output and state directory (default: None)
  --config CONFIG.INI  project configuration file (default: None)
  --dump-config        dump config and exit (default: False)

There is no theoretical upper limit on parallel jobs; it should be chosen according to the host resources available for Docker commands. Each project’s remote is polled for new refs at the given interval in seconds. The work directory is used for checkouts, artifacts, and build logs – see below. Like any other file or subdirectory, it is created when needed, respecting the running user’s umask.

Installation

As there are no additional requirements apart from a standard Python 3.8+ installation, the script can simply be copied, for example to /usr/local/bin/pipelined. Otherwise, only a project configuration file and a state directory are needed, such as /etc/pipelined.ini and /var/pipelined, respectively.

Calls to git and docker should be “transparent”, which simply means the user that pipelined runs as must be configured accordingly.

It should be possible to set up and run pipelined in docker itself. However, this might bring little gain, as only very few dependencies are needed while full access to the docker socket is still required. Alternatively, a simple systemd unit can be used to run the daemon:

[Unit]
Description=git docker pipelined
BindsTo=docker.socket
After=docker.socket

[Service]
User=root
Group=root
UMask=0027

# remove failed, incomplete, and old builds
#ExecStartPre=-/usr/bin/find /var/pipelined -mindepth 3 -maxdepth 3 -type d -name '*.tmp' -print -exec rm -rf {} \;
#ExecStartPre=-/usr/bin/find /var/pipelined -mindepth 3 -maxdepth 3 -type d -mtime +30 -print -exec rm -rf {} \;
# could possibly filter by label but might make sense to do globally anyway
#ExecStartPre=-/usr/bin/docker container prune --filter until=720h --force
#ExecStartPre=-/usr/bin/docker image prune --filter until=720h --force

Type=exec
ExecStart=/usr/local/bin/pipelined --limit 1 --interval 60 --workdir /var/pipelined --config /etc/pipelined.ini
ExecReload=/usr/bin/kill -HUP $MAINPID
RestartKillSignal=SIGINT
KillSignal=SIGTERM
KillMode=process

[Install]
WantedBy=multi-user.target

Maintenance of build directories and container images is not done automatically. If storage is a concern, old data can be pruned for example as suggested in the service file above or by using the custom callbacks.

/var/pipelined/
 \_ example-project/refs%2Fheads%2Fmaster/5bf417f5.tmp/
     \_ build.log
     \_ drop/
     |   \_ … (artifacts)
     \_ build/
         \_ .git/
         \_ … (checkout)

The (URL-encoded) per-commit state directory contains the build log (command outputs), the checkout (and working) directory, and the writable volume mount that can be used for artifacts. A .tmp suffix is added during the build (and kept for failed builds) and removed upon success.
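
For illustration, the directory name corresponds to the URL-encoded ref – a sketch, assuming plain percent-encoding as in the tree above:

from pathlib import Path
from urllib.parse import quote

workdir = Path("/var/pipelined")
project, ref, commit = "example-project", "refs/heads/master", "5bf417f5"

# "refs/heads/master" becomes "refs%2Fheads%2Fmaster"
state_dir = workdir / project / quote(ref, safe="") / f"{commit}.tmp"
print(state_dir)  # /var/pipelined/example-project/refs%2Fheads%2Fmaster/5bf417f5.tmp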

Command Reference

For reference and testing, all calls to git and docker are documented below. If these bare commands work for the given user, so should the pipeline, as environment and permissions are inherited. The actually executed command lines can be found in the respective build log.

Using the custom callback handlers to run code from within the checkout should be considered a security risk, similar in severity to mounting the docker socket. It is thus recommended to use these only for local scripts and operations that cannot be done from within the docker stages.

Git

Periodically, at the configured interval, each project is checked for new commits using ls-remote:

git --no-pager ls-remote --heads --tags --refs --quiet -- BUILD_REMOTE
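
The listing contains one line per reference: the commit hash and the fully qualified ref name, separated by a tab. A minimal sketch of how it might be matched against the configured branch and tag patterns – not the actual implementation, which may match differently:

import re

def parse_refs(output, branch_pattern, tag_pattern):
    # map matching ref names to their commit hashes
    matches = {}
    for line in output.splitlines():
        commit, _, ref = line.partition("\t")
        if ref.startswith("refs/heads/") and re.fullmatch(branch_pattern, ref[len("refs/heads/"):]):
            matches[ref] = commit
        elif ref.startswith("refs/tags/") and tag_pattern and re.fullmatch(tag_pattern, ref[len("refs/tags/"):]):
            matches[ref] = commit
    return matches

# e.g. parse_refs(ls_remote_output, r"master|main|[0-9]+-.*", r"")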

The output is parsed for commit hashes of refs/heads/ and refs/tags/ references, as sketched above. Fetching or cloning a specific commit is not always supported or allowed, so a “standard” clone is done instead:

git --no-pager -C RUN_PATH clone --no-local --no-hardlinks --recurse-submodules --verbose -- BUILD_REMOTE build

If shallow cloning is enabled, the following flags are added:

… --depth 1 --shallow-submodules --no-tags --single-branch --branch BRANCH_OR_TAG …

This is followed by a checkout to switch to the commit in question:

git --no-pager -C BUILD_PATH -c advice.detachedHead=false checkout --recurse-submodules --force --no-guess --detach COMMIT

Note that this might fail on a shallow clone if the remote HEAD has moved on in the meantime, in which case the build is skipped.

Docker

The minimal build command – if configured – uses the project’s Dockerfile from within the checkout directory:

docker build --rm --force-rm --file BUILD_PATH/DOCKERFILE -- BUILD_PATH

Depending on the configuration, the following flags can be added:

… --pull --no-cache --tag TAG_FROM_ENV --label LABEL_FROM_ENV --build-arg BUILD_* …

As --quiet would prevent build logs and no consistent tagging is enforced, the build output needs to be parsed for the image id (see the sketch at the end of this subsection). If enabled, the just-built image is used to run a new container:

docker run --rm --pull never -v BUILD_PATH:/build:rw -v DROP_PATH:/drop:rw --name PROJECT_NAME-COMMIT -- IMAGE_ID

These flags might also be present, when configured accordingly:

… --read-only -v /var/run/docker.sock:/var/run/docker.sock --label LABEL_FROM_ENV --tmpfs /tmp:size=TMP_SIZE%,mode=1777,rw,relatime --env BUILD_* …
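
Regarding the image id mentioned above: with the legacy (non-BuildKit) builder it can typically be recovered from a trailing "Successfully built <id>" line, as in the sketch below; BuildKit output differs, so treat this as an illustration rather than the exact logic:

import re

def parse_image_id(build_output):
    # legacy docker builder prints e.g. "Successfully built 0123456789ab"
    image_id = None
    for line in build_output.splitlines():
        match = re.match(r"Successfully built ([0-9a-f]+)", line.strip())
        if match:
            image_id = match.group(1)
    return image_id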

Rsync

If an rsync destination is configured, the checkout directory can be mirrored to a local path or a remote destination:

rsync --verbose --recursive --links --exclude .git -- BUILD_PATH/ RSYNC_DESTINATION

If there is a run stage, the DROP_PATH is used as the source directory instead. Depending on the other flags set, several additional arguments might be present:

… --checksum --owner --group --perms --times --delete-after --delete-excluded --include-from BUILD_PATH/RSYNC_INCLUDE …

If the clone, prepare_callback, build, run, rsync, and success_callback commands succeed, the pipeline is considered successful overall.

Code & Download