--- /dev/null
+============================
+Design of The Invirtibuilder
+============================
+
+Introduction
+============
+
+The Invirtibuilder is an automated Debian package builder, APT
+repository manager, and Git repository hosting tool. It is intended
+for projects that consist of a series of Debian packages each tracked
+as a separate Git repository, and designed to keep the Git and APT
+repositories in sync with each other. The Invirtibuilder supports
+having multiple threads, or "pockets" of development, and can enforce
+different access control and repository consistency rules for each
+pocket.
+
+Background and Goals
+====================
+
+The Invirtibuilder was originally developed for Invirt_, a project of
+the MIT SIPB_. When we went to develop a tool for managing our APT and
+Git repositories, we had several goals, each of which informed the
+design of the Invirtibuilder:
+
+* One Git repository per Debian package.
+
+ Because of how Git tracks history, it's better suited for tracking a
+ series of small repositories, as opposed to one large one
+ [#]_. Furthermore, most pre-existing tools and techniques for
+ dealing with Debian packages in Git repositories (such as
+ git-buildpackage_ or `VCS location information`_) are designed
+ exclusively for this case.
+
+* Synchronization between Git and APT repositories.
+
+ In our previous development models, we would frequently merge
+ development into trunk without necessarily being ready to deploy it
+ to our APT repository (and by extension, our servers) yet. However,
+ once the changes had been merged in, it was no longer possible to
+ see the current state of the APT repository purely from inspection
+ of the source control repository.
+
+* Support for multiple *pockets* of development.
+
+ For the Invirt_ project, we maintain separate production and
+ development environments. Initially, they each shared the same APT
+ repository. To test changes, we had to install them into the APT
+ repository and install the update on our development cluster, and
+ simply wait to take the update on our production cluster until
+ testing was completed. When designing the Invirtibuilder, we wanted
+ the set of packages available to our development cluster to be
+ separate from the packages in the production cluster.
+
+* Different ACLs for different pockets.
+
+ Access to our development cluster is relatively unrestricted—we
+ freely grant access to interested developers to encourage
+ contributions to the project. Our production cluster, on the other
+ hand, has a much higher standard of security, and access is limited
+ to the core maintainers of the service. The Invirtibuilder needed to
+ support that separation of privilege.
+
+* Tool-enforced version number restrictions.
+
+ Keeping our packages in APT repositories adds a few restrictions to
+ the version numbers of packages. First, version numbers in the APT
+ repository must be unique. That is, you cannot have two different
+ packages of the same name and version number. Second, version
+ numbers are expected to be monotonically increasing. If a newer
+ version of a package had a lower version number than the older
+ version, dpkg would consider this a downgrade. Downgrades are not
+ supported by dpkg, and will not even be attempted by APT.
+
+ In order to avoid proliferation of version numbers used only for
+ testing purposes, we opted to bend the latter rule for our
+ development pocket.
+
+* Tool-enforced consistent history.
+
+ In order for the Git history to be meaningful, we chose to require
+ that each version of a package that is uploaded into the APT
+ repository be a fast-forward of the previous version.
+
+ Again, to simplify and encourage testing, we bend this rule for the
+ development pocket as well.
+
+Design
+======
+
+Configuration
+-------------
+
+For the Invirt_ project's use of the Invirtibuilder, we adapted our
+existing configuration mechanism. Our configuration consists of a
+single YAML_ file. Here is the snippet we use for our build
+configuration::
+
+ build:
+   pockets:
+     prod:
+       acl: system:xvm-root
+       apt: stable
+     dev:
+       acl: system:xvm-dev
+       apt: unstable
+       allow_backtracking: yes
+   tagger:
+     name: Invirt Build Server
+     email: invirt@mit.edu
+
+The Invirtibuilder allows naming Invirtibuilder pockets separately
+from their corresponding Git branches or APT components. However, if
+either the ``git`` or ``apt`` properties of the pocket are
+unspecified, they are assumed to be the same as the name of the
+pocket.
+
+The ``acl`` attributes for each pocket are interpreted within our
+authorization modules to determine who is allowed to request builds on
+a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
+names of AFS groups, which we use for authorization.
+
+The ``tagger`` attribute indicates the name and e-mail address to be
+used whenever the Invirtibuilder generates new Git repository objects,
+such as commits or tags.
+
+Finally, it was mentioned in `Background and Goals`_ that we wanted
+the ability to not force version number consistency or Git
+fast-forwards for our development pocket. The ``allow_backtracking``
+attribute was introduced to indicate that preference. When it is set
+to ``yes`` (i.e. YAML's "true" value), neither fast-forwards nor
+increasing version numbers are enforced when validating builds. The
+attribute is assumed to be false if undefined.
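+
+The defaulting rules described above can be sketched in a few lines
+of Python. This is an illustration only, not the Invirtibuilder's
+actual code: ``resolve_pocket`` and ``POCKETS`` are hypothetical
+names, and the mapping is written as a plain dictionary in the shape
+a YAML parser would be expected to produce for the snippet above.
+

```python
def resolve_pocket(pockets, name):
    """Return the effective settings for one pocket, applying defaults.

    Hypothetical helper: `pockets` is the parsed `build.pockets` mapping.
    """
    raw = pockets[name]
    return {
        "acl": raw["acl"],
        # Unspecified git/apt names fall back to the pocket's own name.
        "git": raw.get("git", name),
        "apt": raw.get("apt", name),
        # Backtracking is off unless explicitly enabled.
        "allow_backtracking": bool(raw.get("allow_backtracking", False)),
    }


# The pockets mapping as it would look once loaded (YAML "yes" -> True).
POCKETS = {
    "prod": {"acl": "system:xvm-root", "apt": "stable"},
    "dev": {
        "acl": "system:xvm-dev",
        "apt": "unstable",
        "allow_backtracking": True,
    },
}

for pocket_name in sorted(POCKETS):
    print(pocket_name, resolve_pocket(POCKETS, pocket_name))
```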
+
+Git Repositories
+----------------
+
+In order to make it easy to check out all packages at once, and for
+version controlling the state of the APT repository, we create a
+"superproject" using Git submodules [#]_.
+
+There is one Git branch in the superproject corresponding to each
+pocket of development. Each branch contains a submodule for each
+package in the corresponding component of the APT repository, and the
+submodule commit referred to by the head of the Git branch matches the
+revision of the package currently in the corresponding component of
+the APT repository. Thus, the heads of the Git superproject match the
+state of the components in the APT repository.
+
+Each of the submodules also has a branch for each pocket. The head of
+that branch points to the revision of the package that is currently in
+the corresponding component of the APT repository. This provides a
+convenient branching point for new development. Additionally, there is
+a Git tag for every version of the package that has ever been uploaded
+to the APT repository.
+
+Because the Invirtibuilder and its associated infrastructure are
+responsible for keeping the superproject in sync with the state of the
+APT repository, an update hook disallows all pushes to the
+superproject.
+
+Pushes to the submodules, on the other hand, are almost entirely
+unrestricted. As with the superproject, the Git branches for each
+pocket and Git tags are maintained by the build infrastructure, so
+pushes to them are disallowed. Outside of that, we make no
+restrictions on the creation or deletion of branches, nor are pushes
+required to be fast-forwards.
+
+The Build Queue
+---------------
+
+We considered several ways to trigger builds of new package versions
+using Git directly. However, we realized that what we actually wanted
+was a separate build queue where each build request was handled and
+processed independently of any requests before or after it. It's not
+possible to have these semantics using Git as a signalling mechanism
+without breaking standard assumptions about how remote Git
+repositories work.
+
+In order to trigger builds, then, we needed a side-channel. Since it
+was already widely used in the Invirt_ project, we chose to use
+remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.
+
+To trigger a new build, a developer calls remctl against the build
+server with a pocket, a package, and a commit ID from that package's
+Git repository. The remctl daemon then calls a script which validates
+the build and adds it to the build queue. Because of the structure of
+remctl's ACLs, we are able to have different ACLs depending on which
+pocket the build is destined for. This allows us to fulfil our design
+goal of having different ACLs for different pockets.
+
+For simplicity, the queue itself is maintained as a directory of
+files, where each file is a queue entry. To maintain order in the
+queue, the file names for queue entries are of the form
+``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
+indicates a random hexadecimal digit. Each file contains the
+parameters passed in over remctl (pocket, package, and commit ID to
+build), as well as the Kerberos principal of the user that requested
+the build, for logging.
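+
+As a rough sketch of the naming scheme, the following Python
+generates a queue entry name and lists pending entries in order. The
+function names are hypothetical, and two details are assumptions that
+the design does not specify: the timestamp is taken in UTC, and the
+random suffix is produced with ``uuid.uuid4()``, which happens to
+yield the same 8-4-4-4-12 layout.
+

```python
import os
import time
import uuid


def queue_entry_name(now=None):
    """Build a name of the form YYYYMMDDHHMMSS_<random hex groups>.

    `now` is seconds since the epoch; None means the current time.
    Using UTC and uuid4 here is an assumption, not documented behavior.
    """
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
    return "%s_%s" % (stamp, uuid.uuid4())


def pending_entries(queue_dir):
    """Return queue entry names oldest first.

    Because each name starts with a fixed-width timestamp, a plain
    lexicographic sort is also a chronological sort.
    """
    return sorted(os.listdir(queue_dir))


print(queue_entry_name())
```

+
+The random suffix only has to keep two requests made within the same
+second from colliding; ordering comes entirely from the timestamp.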
+
+The Build Daemon
+----------------
+
+To actually execute builds, we run a separate daemon to monitor for
+new build requests in the build queue. The daemon uses inotify so that
+it's triggered whenever a new item is added to the build
+queue. Whenever an item in the build queue triggers the build daemon,
+the daemon first validates the build, then executes the build, and
+finally updates both the APT repository and Git superproject with the
+results of the build. The results of all attempted builds are recorded
+in a database table for future reference.
+
+Build Validation
+````````````````
+
+The first stage of processing a new build request is validating the
+build. First, the build daemon checks the version number of the
+requested package in each pocket of the repository. If the package is
+present in any other pocket with the same version number, but the Git
+commit for the package is different, the build errors out, because it
+is not possible for an APT repository to contain two different
+packages with the same name and version number.
+
+Next, the build daemon checks to make sure that the version number of
+the new package is a higher version number than the version currently
+in the APT repository, as version numbers must be monotonically
+increasing.
+
+Finally, we require new packages to be fast-forwards in Git of the
+previous version of the package. This is verified as well.
+
+As mentioned above, the ``allow_backtracking`` attribute can be set
+for a pocket to bypass the latter two checks in development
+environments.
+
+When the same package with the same version is inserted into multiple
+places in the same APT repository, the MD5 hash of the package is used
+to validate that it hasn't changed. Because rebuilding the same
+package causes the MD5 hash to change, when a version of a package
+identical to a version already in the APT repository is added to
+another pocket, we need to copy it directly. Since the validation
+stage already has all of the necessary information to detect this
+case, if the same version of a package is already present in another
+pocket, the validation stage returns this information.
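+
+The checks above can be summarized in a short Python sketch. It is a
+simplified model rather than the daemon's real code: the function
+name, its arguments, and the ``published`` mapping are all
+hypothetical. Version comparison and the Git ancestry test are passed
+in as callables, because real Debian version ordering and a real
+fast-forward check (for example via ``git merge-base --is-ancestor``)
+are outside the scope of the sketch.
+

```python
def validate_build(pocket, version, commit, published, allow_backtracking,
                   version_gt, is_ancestor):
    """Validate a build request (illustrative sketch only).

    published: maps pocket name -> (version, commit) for this package,
        for every pocket that currently contains it.
    Returns the name of a pocket to copy the existing package from, or
    None if the package has to be built. Raises ValueError on rejection.
    """
    copy_from = None
    for other, (other_version, other_commit) in sorted(published.items()):
        if other == pocket or other_version != version:
            continue
        if other_commit != commit:
            # Same name and version, different source: never allowed.
            raise ValueError("version %s already in %s from another commit"
                             % (version, other))
        # Identical version already built elsewhere: copy, do not rebuild.
        copy_from = other

    current = published.get(pocket)
    if current is not None and not allow_backtracking:
        current_version, current_commit = current
        if not version_gt(version, current_version):
            raise ValueError("version must be higher than %s" % current_version)
        if not is_ancestor(current_commit, commit):
            raise ValueError("commit is not a fast-forward")
    return copy_from


# Stand-ins for demonstration; real Debian versions need a real comparator.
def dotted_gt(a, b):
    return [int(p) for p in a.split(".")] > [int(p) for p in b.split(".")]


HISTORY = {"bbb": {"aaa"}, "ccc": {"aaa", "bbb"}}  # commit -> its ancestors


def in_history(old, new):
    return old == new or old in HISTORY.get(new, set())


PUBLISHED = {"prod": ("0.1.2", "aaa"), "dev": ("0.1.3", "bbb")}
print(validate_build("prod", "0.1.3", "bbb", PUBLISHED, False,
                     dotted_gt, in_history))
```

+
+In the example, version 0.1.3 is promoted to ``prod`` from the same
+commit that is already in ``dev``, so the sketch reports ``dev`` as
+the pocket to copy from instead of asking for a rebuild.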
+
+Build Execution
+```````````````
+
+Once the build has been validated, it can be executed. The requested
+version of the package is exported from Git, and then a Debian source
+package is generated. Next, the package itself is built using sbuild.
+
+sbuild creates an ephemeral build chroot for each build that has only
+essential build packages and the build dependencies for the package
+being built installed. We use sbuild for building packages for several
+reasons. First, it helps us verify that all necessary build
+dependencies have been included in our packages. Second, it helps us
+ensure that configuration files haven't been modified from their
+upstream defaults (which could cause problems for packages using
+config-package-dev_).
+
+The build daemon keeps the build logs from all attempted builds on the
+filesystem for later inspection.
+
+Repository Updates
+``````````````````
+
+Once the build has been successfully completed, the APT and Git
+repositories are updated to match the new state. First, a new tag is
+added to the package's Git repository for the current version
+[#]_. Next, the pocket tracking branch in the submodule is also
+updated with the new version of the package. Then a new commit is
+created on the superproject which updates the package's submodule to
+point to the new version of the package. Finally, the new version of
+the package is included in the appropriate component of the APT
+repository.
+
+Because the Git superproject, the Git submodules, and the APT
+repository are all updated together as the final step of each
+successful build, the Git repositories and the APT repository stay in
+sync.
+
+Build Failures
+``````````````
+
+If any of the above stages of executing a build fails, the failure is
+trapped and recorded along with the build record in the database for
+later inspection. Regardless of success or failure, the
+build daemon runs any scripts in a hook directory. The hook directory
+could contain scripts to publish the results of the build in whatever
+way is deemed useful by the developers.
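+
+One way such a hook directory could be processed is sketched below.
+The function name is hypothetical, and several details are
+assumptions rather than documented behavior: hooks run in sorted
+order, only executable files are run, and the build's details are
+passed to each hook as command-line arguments.
+

```python
import os
import subprocess


def run_hooks(hook_dir, args):
    """Run every executable file in hook_dir and collect exit codes.

    Returns a list of (script name, exit code) pairs. A failing hook
    does not stop the remaining hooks from running.
    """
    results = []
    for name in sorted(os.listdir(hook_dir)):
        path = os.path.join(hook_dir, name)
        # Skip subdirectories and files without the executable bit.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            results.append((name, subprocess.call([path] + list(args))))
    return results
```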
+
+.. _config-package-dev: http://debathena.mit.edu/config-packages
+.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
+.. _Invirt: http://invirt.mit.edu
+.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
+.. _SIPB: http://sipb.mit.edu
+.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
+.. _YAML: http://yaml.org/
+
+.. [#] http://lwn.net/Articles/246381/
+.. [#] A Git submodule is a second Git repository embedded at a
+ particular path within the superproject and fixed at a
+ particular commit.
+.. [#] Because we don't force any sort of version consistency for
+ pockets with ``allow_backtracking`` enabled, we don't create new
+ tags for builds on those pockets either.