============================
Design of The Invirtibuilder
============================

Introduction
============

The Invirtibuilder is an automated Debian package builder, APT
repository manager, and Git repository hosting tool. It is intended
for projects that consist of a series of Debian packages each tracked
as a separate Git repository, and designed to keep the Git and APT
repositories in sync with each other. The Invirtibuilder supports
having multiple threads, or "pockets" of development, and can enforce
different access control and repository consistency rules for each
pocket.

Background and Goals
====================

The Invirtibuilder was originally developed for Invirt_, a project of
the MIT SIPB_. When we went to develop a tool for managing our APT and
Git repositories, we had several goals, each of which informed the
design of the Invirtibuilder:

* One Git repository per Debian package.

  Because of how Git tracks history, it's better suited for tracking a
  series of small repositories, as opposed to one large one
  [#]_. Furthermore, most preexisting tools and techniques for dealing
  with Debian packages in Git repositories (such as git-buildpackage_
  or `VCS location information`_) are designed exclusively for this
  case.

* Synchronization between Git and APT repositories.

  In our previous development models, we would frequently merge
  development into trunk without necessarily being ready to deploy it
  to our APT repository (and by extension, our servers) yet. However,
  once the changes had been merged in, it was no longer possible to
  see the current state of the APT repository purely from inspection
  of the source control repository.

* Support for multiple *pockets* of development.

  For the Invirt_ project, we maintain separate production and
  development environments. Initially, they each shared the same APT
  repository. To test changes, we had to install them into the APT
  repository and install the update on our development cluster, and
  simply wait to take the update on our production cluster until
  testing was completed. When designing the Invirtibuilder, we wanted
  the set of packages available to our development cluster to be
  separate from the packages in the production cluster.

* Different ACLs for different pockets.

  Access to our development cluster is relatively unrestricted—we
  freely grant access to interested developers to encourage
  contributions to the project. Our production cluster, on the other
  hand, has a much higher standard of security, and access is limited
  to the core maintainers of the service. The Invirtibuilder needed to
  support that separation of privilege.

* Tool-enforced version number restrictions.

  Keeping our packages in APT repositories adds a few restrictions to
  the version numbers of packages. First, version numbers in the APT
  repository must be unique. That is, you cannot have two different
  packages of the same name and version number. Second, version
  numbers are expected to be monotonically increasing. If a newer
  version of a package had a lower version number than the older
  version, dpkg would consider this a downgrade. Downgrades are not
  supported by dpkg, and will not even be attempted by APT.

  In order to avoid proliferation of version numbers used only for
  testing purposes, we opted to bend the latter rule for our
  development pocket.

* Tool-enforced consistent history.

  In order for the Git history to be meaningful, we chose to require
  that each version of a package that is uploaded into the APT
  repository be a fast-forward of the previous version.

  Again, to simplify and encourage testing, we bend this rule for the
  development pocket as well.

Design
======

Configuration
-------------

For the Invirt_ project's use of the Invirtibuilder, we adapted our
existing configuration mechanism. Our configuration lives in a single
YAML_ file. Here is the portion of that file that configures
builds::

 build:
  pockets:
   prod:
    acl: system:xvm-root
    apt: stable
   dev:
    acl: system:xvm-dev
    apt: unstable
    allow_backtracking: yes
  tagger:
   name: Invirt Build Server
   email: invirt@mit.edu

The Invirtibuilder allows naming Invirtibuilder pockets separately
from their corresponding Git branches or APT components. However, if
either the ``git`` or ``apt`` properties of the pocket are
unspecified, they are assumed to be the same as the name of the
pocket.

The ``acl`` attributes for each pocket are interpreted within our
authorization modules to determine who is allowed to request builds on
a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
names of AFS groups, which we use for authorization.

The ``tagger`` attribute indicates the name and e-mail address to be
used whenever the Invirtibuilder generates new Git repository objects,
such as commits or tags.

Finally, as noted in `Background and Goals`_, we wanted the ability to
skip the version number and Git fast-forward checks for our
development pocket. The ``allow_backtracking`` attribute was
introduced to indicate that preference. When it is set to ``yes``
(i.e. YAML's "true" value), neither fast-forwards nor increasing
version numbers are enforced when validating builds. The attribute is
assumed to be false if undefined.
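
The defaulting rules described in this section can be summarized in a
short Python sketch. The helper name ``resolve_pocket`` and the
dictionary layout are assumptions made for illustration only; they are
not taken from the Invirtibuilder's code. The input mirrors the
``pockets`` mapping above as a YAML parser might load it.

.. code-block:: python

   def resolve_pocket(name, pockets):
       """Return the effective settings for one pocket (hypothetical helper)."""
       raw = pockets[name]
       return {
           "acl": raw["acl"],
           # Unspecified ``git`` or ``apt`` properties default to the pocket name.
           "git": raw.get("git", name),
           "apt": raw.get("apt", name),
           # ``allow_backtracking`` is treated as false when undefined.
           "allow_backtracking": bool(raw.get("allow_backtracking", False)),
       }


   # The ``pockets`` mapping from the snippet above, written out by hand.
   # We assume the parser has already turned ``yes`` into ``True``.
   POCKETS = {
       "prod": {"acl": "system:xvm-root", "apt": "stable"},
       "dev": {"acl": "system:xvm-dev", "apt": "unstable",
               "allow_backtracking": True},
   }

With this mapping, the ``prod`` pocket resolves to the Git branch
``prod`` and the APT component ``stable``, with backtracking disabled.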

Git Repositories
----------------

In order to make it easy to check out all packages at once, and to
keep the state of the APT repository under version control, we create
a "superproject" using Git submodules [#]_.

There is one Git branch in the superproject corresponding to each
pocket of development. Each branch contains a submodule for each
package in the corresponding component of the APT repository, and the
submodule commit referred to by the head of the Git branch matches the
revision of the package currently in the corresponding component of
the APT repository. Thus, the heads of the Git superproject match the
state of the components in the APT repository.

Each of the submodules also has a branch for each pocket. The head of
that branch points to the revision of the package that is currently in
the corresponding component of the APT repository. This provides a
convenient branching point for new development. Additionally, there is
a Git tag for every version of the package that has ever been uploaded
to the APT repository.

Because the Invirtibuilder and its associated infrastructure are
responsible for keeping the superproject in sync with the state of the
APT repository, an update hook disallows all pushes to the
superproject.

Pushes to the submodules, on the other hand, are almost entirely
unrestricted. As with the superproject, the Git branches for each
pocket and Git tags are maintained by the build infrastructure, so
pushes to them are disallowed. Outside of that, we make no
restrictions on the creation or deletion of branches, nor are pushes
required to be fast-forwards.
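
As a rough illustration, the push rules for both kinds of repository
could be expressed as a single decision function. This is a sketch
under our own assumptions: the name ``push_allowed`` and its arguments
are hypothetical, and a real Git update hook would call something like
it with the ref name it receives and exit non-zero to reject the push.

.. code-block:: python

   def push_allowed(refname, pocket_branches, superproject=False):
       """Decide whether an update hook should accept a push to ``refname``."""
       if superproject:
           # The superproject is maintained solely by the build infrastructure.
           return False
       if refname.startswith("refs/tags/"):
           # Tags record uploaded versions and are created by the builder.
           return False
       if refname.startswith("refs/heads/"):
           branch = refname[len("refs/heads/"):]
           # Pocket tracking branches are reserved; other branches are open.
           return branch not in pocket_branches
       # No other restrictions are described, so accept everything else.
       return True

Note that the function never inspects the old and new commit IDs,
which reflects the fact that non-fast-forward pushes are permitted.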

The Build Queue
---------------

We considered several ways to trigger builds of new package versions
using Git directly. However, we realized that what we actually wanted
was a separate build queue where each build request was handled and
processed independently of any requests before or after it. It's not
possible to have these semantics using Git as a signaling mechanism
without breaking standard assumptions about how remote Git
repositories work.

In order to trigger builds, then, we needed a side-channel. Since it
was already widely used in the Invirt_ project, we chose to use
remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.

To trigger a new build, a developer calls remctl against the build
server with a pocket, a package, and a commit ID from that package's
Git repository. The remctl daemon then calls a script which validates
the build and adds it to the build queue. Because of the structure of
remctl's ACLs, we are able to have different ACLs depending on which
pocket the build is destined for. This allows us to fulfill our design
goal of having different ACLs for different pockets.

For simplicity, the queue itself is maintained as a directory of
files, where each file is a queue entry. To maintain order in the
queue, the file names for queue entries are of the form
``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
indicates a random hexadecimal digit. Each file contains the
parameters passed in over remctl (pocket, package, and commit ID to
build), as well as the Kerberos principal of the user that requested
the build, for logging.
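
A file name of this shape can be produced with the Python standard
library, as in the sketch below. The function name is hypothetical,
and two details are assumptions on our part: the timestamp is taken in
UTC, and the suffix is a version 4 UUID, which has the right layout
but fixes a few of its bits rather than randomizing every digit.

.. code-block:: python

   import datetime
   import uuid


   def queue_entry_name(now=None):
       """Build a queue file name that sorts chronologically (hypothetical)."""
       # Assumption: UTC timestamps; the text above does not name a time zone.
       now = now or datetime.datetime.now(datetime.timezone.utc)
       # The timestamp prefix gives ordering; the UUID suffix avoids
       # collisions between requests that arrive within the same second.
       return "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4())

Sorting the directory listing by name then yields the entries in
request order, to a resolution of one second.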

The Build Daemon
----------------

To actually execute builds, we run a separate daemon to monitor for
new build requests in the build queue. The daemon uses inotify so that
it's triggered whenever a new item is added to the build
queue. Whenever an item in the build queue triggers the build daemon,
the daemon first validates the build, then executes the build, and
finally updates both the APT repository and Git superproject with the
results of the build. The results of all attempted builds are recorded
in a database table for future reference.

Build Validation
````````````````

The first stage of processing a new build request is validating the
build. First, the build daemon checks the version number of the
requested package in each pocket of the repository. If the package is
present in any other pocket with the same version number, but the Git
commit for the package is different, the build errors out, because it
is not possible for an APT repository to contain two different
packages with the same name and version number.

Next, the build daemon checks to make sure that the version number of
the new package is a higher version number than the version currently
in the APT repository, as version numbers must be monotonically
increasing.

Finally, we require new packages to be fast-forwards in Git of the
previous version of the package. This is verified as well.

As mentioned above, the ``allow_backtracking`` attribute can be set
for a pocket to bypass the latter two checks in development
environments.

When the same package with the same version is inserted into multiple
places in the same APT repository, the MD5 hash of the package is used
to validate that it hasn't changed. Because rebuilding the same
package causes the MD5 hash to change, when a version of a package
identical to a version already in the APT repository is added to
another pocket, we need to copy it directly. Since the validation
stage already has all of the necessary information to detect this
case, if the same version of a package is already present in another
pocket, the validation stage returns this information.
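
The checks in this section can be sketched as one function. Everything
named here is hypothetical: ``state`` is assumed to map each pocket to
the (version, commit) pair currently published for the package, and
the two comparison callbacks are supplied by the caller. One plausible
choice would be to back them with ``dpkg --compare-versions`` and
``git merge-base --is-ancestor``, but that pairing is our suggestion
and is not specified above.

.. code-block:: python

   class InvalidBuild(Exception):
       """Raised when a build request fails validation."""


   def validate_build(pocket, version, commit, state, allow_backtracking,
                      version_newer, is_fast_forward):
       """Validate a request; return the pocket to copy from, or None."""
       copy_from = None
       for other, (other_version, other_commit) in state.items():
           if other == pocket or other_version != version:
               continue
           if other_commit != commit:
               # APT cannot hold two different packages with one name/version.
               raise InvalidBuild("version %s already built from %s"
                                  % (version, other_commit))
           # Same version and commit: the existing package can be copied.
           copy_from = other

       current = state.get(pocket)
       if current is not None and not allow_backtracking:
           current_version, current_commit = current
           if not version_newer(version, current_version):
               raise InvalidBuild("version number must increase")
           if not is_fast_forward(current_commit, commit):
               raise InvalidBuild("commit is not a fast-forward")
       return copy_from

Returning the source pocket lets the caller skip the rebuild and copy
the existing package, which keeps its MD5 hash unchanged.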

Build Execution
```````````````

Once the build has been validated, it can be executed. The requested
version of the package is exported from Git, and then a Debian source
package is generated. Next, the package itself is built using sbuild.

sbuild creates an ephemeral build chroot for each build that has only
essential build packages and the build dependencies for the package
being built installed. We use sbuild for building packages for several
reasons. First, it helps us verify that all necessary build
dependencies have been included in our packages. Second, it helps us
ensure that configuration files haven't been modified from their
upstream defaults (which could cause problems for packages using
config-package-dev_).

The build daemon keeps the build logs from all attempted builds on the
filesystem for later inspection.

Repository Updates
``````````````````

Once the build has been successfully completed, the APT and Git
repositories are updated to match the new state. First, a new tag is
added to the package's Git repository for the current version
[#]_. Next, the pocket tracking branch in the submodule is also
updated with the new version of the package. Then a new commit is
created on the superproject which updates the package's submodule to
point to the new version of the package. Finally, the new version of
the package is included in the appropriate component of the APT
repository.

Because the Git superproject, the Git submodules, and the APT
repository are all updated together at the end of each successful
build, the Git repositories and the APT repository always stay in
sync.
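
The order of these updates can be written down as a small planning
function. This is only an outline under our own naming: the step
labels and the function ``plan_repository_updates`` are hypothetical,
and each step stands in for the real Git or APT operation.

.. code-block:: python

   def plan_repository_updates(package, version, pocket, allow_backtracking):
       """List the update steps, in order, for a successful build."""
       steps = []
       if not allow_backtracking:
           # Tags are only created for pockets that enforce version
           # consistency, as the footnote on tagging explains.
           steps.append(("tag-version", package, version))
       steps.append(("update-pocket-branch", package, pocket))
       steps.append(("commit-superproject", package, pocket))
       steps.append(("include-in-apt", package, pocket))
       return steps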

Build Failures
``````````````

If any of the above stages of executing a build fails, the failure is
trapped and recorded along with the build record in the database for
later inspection. Regardless of success or failure, the
build daemon runs any scripts in a hook directory. The hook directory
could contain scripts to publish the results of the build in whatever
way is deemed useful by the developers.
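
The failure handling described here amounts to a try/except around
the build stages followed by an unconditional hook run. The sketch
below illustrates that shape. The function name, the ``record``
callback, and the convention of passing the outcome to each hook as a
command-line argument are all assumptions for the example.

.. code-block:: python

   import os
   import subprocess


   def run_build(stages, hook_dir, record):
       """Run build stages, record the outcome, then run every hook."""
       failure = None
       try:
           for stage in stages:
               stage()
       except Exception as exc:
           # Trap the failure so it can be stored with the build record.
           failure = "%s: %s" % (type(exc).__name__, exc)
       record(failure)

       # Hooks run regardless of success or failure.
       status = "failure" if failure else "success"
       if os.path.isdir(hook_dir):
           for name in sorted(os.listdir(hook_dir)):
               path = os.path.join(hook_dir, name)
               if os.path.isfile(path) and os.access(path, os.X_OK):
                   # Assumption: hooks receive the outcome as an argument.
                   subprocess.call([path, status])
       return failure is None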

Security
========

As noted above, our intent was for a single instance of the
Invirtibuilder to be used for both our trusted production environment
and our untrusted development environment. In order to be trusted for
the production environment, the Invirtibuilder needs to run in the
production environment as well. However, it would be disastrous if
access to the development environment allowed a developer to insert
malicious packages into the production APT repository.

In terms of policy, we enforce this distinction using the remctl ACL
mechanism described in `The Build Queue`_. But is that mechanism on
its own actually secure?

Only mostly, it turns out.

While actual package builds run unprivileged (with the help of the
fakeroot_ tool), packages can declare arbitrary build dependencies
that must be installed for the package build to run. Packages'
maintainer scripts (post-install, pre-install, pre-removal, and
post-removal scripts) run as root. This means that by uploading a
malicious package that another package build-depends on, then
triggering a build of the second package, it is possible to gain root
privileges. Since breaking out of the build chroot as root is trivial
[#]_, it is theoretically possible for developers with any level of
access to the APT repositories to root the build server.

One minor protection from this problem is the Invirtibuilder's
reporting mechanism. A single independent malicious build can't
compromise the build server on its own. Even if a second build
compromises the build server, the first build will have already been
reported through the hook mechanism described in `Build Failures`_. We
encourage users of the Invirtibuilder to include hooks that send
notifications of builds over e-mail or some other mechanism such that
there are off-site records. The server will still be compromised, but
there will be an audit trail.

Such a vulnerability will always be a concern so long as builds are
isolated using chroots. It is possible to protect against this sort of
attack by strengthening the chroot mechanism (e.g. with grsecurity_)
or by using a more isolated build mechanism
(e.g. qemubuilder_). However, we decided that the security risk didn't
justify the additional implementation effort or runtime overhead.

Future Directions
=================

While the Invirtibuilder was written as a tool for the Invirt_
project, taking advantage of infrastructure specific to Invirt, it was
designed with the hope that it could one day be expanded to be useful
outside of our infrastructure. Here we outline what we believe the
next steps for development of the Invirtibuilder are.

One deficiency that affects Invirt_ development already is the
assumption that all packages are Debian-native [#]_. Even for packages
which have a non-native version number, the Invirtibuilder will create
a Debian-native source package when the package is exported from Git
as part of the `Build Execution`_. Correcting this requires a means to
find and extract the upstream tarball from the Git repository. This
could probably be done by involving the pristine-tar_ tool.

The Invirtibuilder is currently tied to the configuration framework
developed for the Invirt_ project. To be useful outside of Invirt, the
Invirtibuilder needs its own, separate mechanism for providing and
parsing configuration. It should not be difficult to keep a similar
YAML configuration mechanism while moving it to a separate file for
the Invirtibuilder. And of course, as part of that process, filesystem
paths and the like that are currently hard-coded should be replaced
with configuration options.

The Invirtibuilder additionally relies on the authentication and
authorization mechanisms used for Invirt_. Our RPC protocol of choice,
remctl_, requires a functional Kerberos environment for
authentication, limiting its usefulness for one-off projects not
associated with an already existing Kerberos realm. We would like to
provide support for some alternative RPC mechanism—possibly
ssh. Additionally, there needs to be some way to expand the build ACLs
for each pocket that isn't tied to Invirt's authorization
framework. One option would be providing an executable in the
configuration that, when passed a pocket as a command-line argument,
prints out all of the principals that should have access to that
pocket.
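
To make the last idea concrete, here is one way the calling side could
look. This is purely a sketch of a feature that does not exist yet:
the function name is hypothetical, and we assume the executable prints
one principal per line. The stand-in command and the principals it
prints are made up for the example.

.. code-block:: python

   import subprocess
   import sys


   def principals_for_pocket(acl_command, pocket):
       """Ask an external program which principals may build on ``pocket``."""
       output = subprocess.check_output(list(acl_command) + [pocket],
                                        universal_newlines=True)
       # Assumption: one principal per line; blank lines are ignored.
       return {line.strip() for line in output.splitlines() if line.strip()}


   # A stand-in for the configured executable, printing made-up principals.
   DEMO_COMMAND = [sys.executable, "-c",
                   "print('alice@EXAMPLE.COM'); print('bob@EXAMPLE.COM')"]

Calling ``principals_for_pocket(DEMO_COMMAND, "dev")`` runs the
stand-in with ``dev`` as its argument and collects the printed names
into a set.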

.. _config-package-dev: http://debathena.mit.edu/config-packages
.. _fakeroot: http://fakeroot.alioth.debian.org/
.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
.. _grsecurity: http://www.grsecurity.net/
.. _Invirt: http://invirt.mit.edu
.. _pristine-tar: http://joey.kitenet.net/code/pristine-tar/
.. _qemubuilder: http://wiki.debian.org/qemubuilder
.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
.. _SIPB: http://sipb.mit.edu
.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
.. _YAML: http://yaml.org/

.. [#] http://lwn.net/Articles/246381/
.. [#] A Git submodule is a second Git repository embedded at a
       particular path within the superproject and fixed at a
       particular commit.
.. [#] Because we don't enforce version consistency for pockets with
       ``allow_backtracking`` enabled, we don't create new tags for
       builds on those pockets either.
.. [#] http://kerneltrap.org/Linux/Abusing_chroot
.. [#] http://people.debian.org/~mpalmer/debian-mentors_FAQ.html#native_vs_non_native