============================
Design of The Invirtibuilder
============================

Introduction
============

The Invirtibuilder is an automated Debian package builder, APT
repository manager, and Git repository hosting tool. It is intended
for projects that consist of a series of Debian packages each tracked
as a separate Git repository, and designed to keep the Git and APT
repositories in sync with each other. The Invirtibuilder supports
having multiple threads, or "pockets" of development, and can enforce
different access control and repository consistency rules for each
pocket.

Background and Goals
====================

The Invirtibuilder was originally developed for Invirt_, a project of
the MIT SIPB_. When we went to develop a tool for managing our APT and
Git repositories, we had several goals, each of which informed the
design of the Invirtibuilder:

* One Git repository per Debian package. Because of how Git tracks
  history, it's better suited for tracking a series of small
  repositories, as opposed to one large one [#]_. Furthermore, most
  pre-existing tools and techniques for dealing with Debian packages
  in Git repositories (such as git-buildpackage_ or
  `VCS location information`_) are designed exclusively for this case.

* Synchronization between Git and APT repositories. In our previous
  development models, we would frequently merge development into
  trunk without necessarily being ready to deploy it to our APT
  repository (and by extension, our servers) yet. However, once the
  changes had been merged in, it was no longer possible to see the
  current state of the APT repository purely from inspection of the
  source control repository.

* Support for multiple *pockets* of development. For the Invirt_
  project, we maintain separate production and development
  environments. Initially, they each shared the same APT repository.
  To test changes, we had to install them into the APT repository and
  install the update on our development cluster, and simply wait to
  take the update on our production cluster until testing was
  completed. When designing the Invirtibuilder, we wanted the set of
  packages available to our development cluster to be separate from
  the packages in the production cluster.

* Different ACLs for different pockets. Access to our development
  cluster is relatively unrestricted: we freely grant access to
  interested developers to encourage contributions to the project.
  Our production cluster, on the other hand, has a much higher
  standard of security, and access is limited to the core maintainers
  of the service. The Invirtibuilder needed to support that
  separation of privilege.

* Tool-enforced version number restrictions. Keeping our packages in
  APT repositories adds a few restrictions to the version numbers of
  packages. First, version numbers in the APT repository must be
  unique. That is, you cannot have two different packages of the same
  name and version number. Second, version numbers are expected to be
  monotonically increasing. If a newer version of a package had a
  lower version number than the older version, dpkg would consider
  this a downgrade. Downgrades are not supported by dpkg, and will
  not even be attempted by APT.

  In order to avoid proliferation of version numbers used only for
  testing purposes, we opted to bend the latter rule for our
  development pocket.

* Tool-enforced consistent history. In order for the Git history to
  be meaningful, we chose to require that each version of a package
  that is uploaded into the APT repository be a fast-forward of the
  previous version.

  Again, to simplify and encourage testing, we bend this rule for the
  development pocket as well.

Design
======

Configuration
-------------

For the Invirt_ project's use of the Invirtibuilder, we adapted our
existing configuration mechanism. Our configuration file consists of a
single YAML_ file.
Here is the snippet of configuration we use for our build
configuration::

  build:
    pockets:
      prod:
        acl: system:xvm-root
        apt: stable
      dev:
        acl: system:xvm-dev
        apt: unstable
        allow_backtracking: yes
    tagger:
      name: Invirt Build Server
      email: invirt@mit.edu

The Invirtibuilder allows naming Invirtibuilder pockets separately
from their corresponding Git branches or APT components. However, if
either the ``git`` or ``apt`` properties of the pocket are
unspecified, they are assumed to be the same as the name of the
pocket.

The ``acl`` attributes for each pocket are interpreted within our
authorization modules to determine who is allowed to request builds on
a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
names of AFS groups, which we use for authorization.

The ``tagger`` attribute indicates the name and e-mail address to be
used whenever the Invirtibuilder generates new Git repository objects,
such as commits or tags.

Finally, it was mentioned in `Background and Goals`_ that we wanted
the ability to not force version number consistency or Git
fast-forwards for our development pocket. The ``allow_backtracking``
attribute was introduced to indicate that preference. When it is set
to ``yes`` (i.e. YAML's "true" value), neither fast-forwards nor
increasing version numbers are enforced when validating builds. The
attribute is assumed to be false if undefined.

Git Repositories
----------------

In order to make it easy to check out all packages at once, and to
version control the state of the APT repository, we create a
"superproject" using Git submodules [#]_.

There is one Git branch in the superproject corresponding to each
pocket of development. Each branch contains a submodule for each
package in the corresponding component of the APT repository, and the
submodule commit referred to by the head of the Git branch matches the
revision of the package currently in the corresponding component of
the APT repository.
Thus, the heads of the Git superproject match the state of the
components in the APT repository.

Each of the submodules also has a branch for each pocket. The head of
that branch points to the revision of the package that is currently in
the corresponding component of the APT repository. This provides a
convenient branching point for new development. Additionally, there is
a Git tag for every version of the package that has ever been uploaded
to the APT repository.

Because the Invirtibuilder and its associated infrastructure are
responsible for keeping the superproject in sync with the state of the
APT repository, an update hook disallows all pushes to the
superproject.

Pushes to the submodules, on the other hand, are almost entirely
unrestricted. As with the superproject, the Git branches for each
pocket and the Git tags are maintained by the build infrastructure, so
pushes to them are disallowed. Outside of that, we make no
restrictions on the creation or deletion of branches, nor are pushes
required to be fast-forwards.

The Build Queue
---------------

We considered several ways to trigger builds of new package versions
using Git directly. However, we realized that what we actually wanted
was a separate build queue where each build request was handled and
processed independently of any requests before or after it. It's not
possible to have these semantics using Git as a signalling mechanism
without breaking standard assumptions about how remote Git
repositories work.

In order to trigger builds, then, we needed a side-channel. Since it
was already widely used in the Invirt_ project, we chose to use
remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.

To trigger a new build, a developer calls remctl against the build
server with a pocket, a package, and a commit ID from that package's
Git repository. The remctl daemon then calls a script which validates
the build and adds it to the build queue.
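
As a rough illustration, the request described above can be modelled
as a small record with a few basic sanity checks. The class name, the
field names, the ``KNOWN_POCKETS`` set, and the specific checks are
all assumptions made for this sketch; the actual validation script is
not specified here::

  import re
  from dataclasses import dataclass

  # Hypothetical: pocket names taken from the example configuration.
  KNOWN_POCKETS = {"prod", "dev"}

  # Assumption: commit IDs are given as full 40-character SHA-1 hashes.
  _COMMIT_RE = re.compile(r"^[0-9a-f]{40}$")


  @dataclass(frozen=True)
  class BuildRequest:
      """The three parameters a developer passes when requesting a build."""
      pocket: str
      package: str
      commit: str

      def problems(self):
          """Return a list of problems; an empty list means acceptable."""
          found = []
          if self.pocket not in KNOWN_POCKETS:
              found.append("unknown pocket: %s" % self.pocket)
          if not self.package:
              found.append("package name is empty")
          if not _COMMIT_RE.match(self.commit):
              found.append("commit ID is not a full SHA-1 hash")
          return found

For instance, ``BuildRequest("dev", "invirt-base", "a" * 40).problems()``
yields an empty list, where ``invirt-base`` is only an illustrative
package name.
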
Because of the structure of remctl's ACLs, we are able to have
different ACLs depending on which pocket the build is destined for.
This allows us to fulfil our design goal of having different ACLs for
different pockets.

For simplicity, the queue itself is maintained as a directory of
files, where each file is a queue entry. To maintain order in the
queue, the file names for queue entries are of the form
``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
indicates a random hexadecimal digit. Each file contains the
parameters passed in over remctl (pocket, package, and commit ID to
build), as well as the Kerberos principal of the user that requested
the build, for logging.

The Build Daemon
----------------

To actually execute builds, we run a separate daemon to monitor for
new build requests in the build queue. The daemon uses inotify so that
it's triggered whenever a new item is added to the build queue.

Whenever an item in the build queue triggers the build daemon, the
daemon first validates the build, then executes the build, and finally
updates both the APT repository and Git superproject with the results
of the build.

The results of all attempted builds are recorded in a database table
for future reference.

Build Validation
````````````````

The first stage of processing a new build request is validating the
build. First, the build daemon checks the version number of the
requested package in each pocket of the repository. If the package is
present in any other pocket with the same version number, but the Git
commit for the package is different, the build errors out, because it
is not possible for an APT repository to contain two different
packages with the same name and version number. Next, the build daemon
checks to make sure that the version number of the new package is a
higher version number than the version currently in the APT
repository, as version numbers must be monotonically increasing.
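
The two version checks described so far can be sketched as follows.
This is only an illustration: the function names, the shape of the
``published`` mapping, and the ``is_newer`` callback are assumptions.
A real implementation would need a proper Debian version comparison
(for example via ``dpkg --compare-versions``) rather than the naive
stand-in used here::

  def naive_is_newer(new, old):
      """Stand-in for dotted numeric versions only; not Debian rules."""
      def as_tuple(version):
          return tuple(int(part) for part in version.split("."))
      return as_tuple(new) > as_tuple(old)


  def check_versions(pocket, version, commit, published, is_newer,
                     allow_backtracking=False):
      """Return None if the build may proceed, else an error message.

      ``published`` is a hypothetical mapping from pocket name to the
      (version, commit) pair currently in that pocket for this package.
      """
      # Check 1: the same version in another pocket must come from the
      # same Git commit, since APT cannot hold two different packages
      # with one name and version.
      for other, (other_version, other_commit) in published.items():
          if (other != pocket and other_version == version
                  and other_commit != commit):
              return ("version %s is already in pocket %s from a "
                      "different commit" % (version, other))

      # Check 2: the version must increase, unless the pocket allows
      # backtracking.
      current = published.get(pocket)
      if current is not None and not allow_backtracking:
          if not is_newer(version, current[0]):
              return ("version %s is not newer than %s"
                      % (version, current[0]))
      return None

Passing the comparison in as a callback keeps the sketch independent
of any particular version-comparison library.
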
Finally, we require new packages to be fast-forwards in Git of the
previous version of the package. This is verified as well. As
mentioned above, the ``allow_backtracking`` attribute can be set for a
pocket to bypass the latter two checks in development environments.

When the same package with the same version is inserted into multiple
places in the same APT repository, the MD5 hash of the package is used
to validate that it hasn't changed. Because rebuilding the same
package causes the MD5 hash to change, when a version of a package
identical to a version already in the APT repository is added to
another pocket, we need to copy it directly. Since the validation
stage already has all of the necessary information to detect this
case, if the same version of a package is already present in another
pocket, the validation stage returns this information.

Build Execution
```````````````

Once the build has been validated, it can be executed. The requested
version of the package is exported from Git, and then a Debian source
package is generated. Next, the package itself is built using sbuild.
sbuild creates an ephemeral build chroot for each build that has only
essential build packages and the build dependencies for the package
being built installed.

We use sbuild for building packages for several reasons. First, it
helps us verify that all necessary build dependencies have been
included in our packages. Second, it helps us ensure that
configuration files haven't been modified from their upstream defaults
(which could cause problems for packages using config-package-dev_).

The build daemon keeps the build logs from all attempted builds on the
filesystem for later inspection.

Repository Updates
``````````````````

Once the build has been successfully completed, the APT and Git
repositories are updated to match the new state. First, a new tag is
added to the package's Git repository for the current version [#]_.
Next, the pocket tracking branch in the submodule is also updated with
the new version of the package. Then a new commit is created on the
superproject which updates the package's submodule to point to the new
version of the package. Finally, the new version of the package is
included in the appropriate component of the APT repository.

Because the Git superproject, the Git submodules, and the APT
repository are all updated simultaneously to reflect the new package
version, the Git repositories and the APT repository always stay in
sync.

Build Failures
``````````````

If any of the above stages of executing a build fail, that failure is
trapped and recorded along with the build record in the database for
later inspection.

Regardless of success or failure, the build daemon runs any scripts in
a hook directory. The hook directory could contain scripts to publish
the results of the build in whatever way is deemed useful by the
developers.

.. _config-package-dev: http://debathena.mit.edu/config-packages
.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
.. _Invirt: http://invirt.mit.edu
.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
.. _SIPB: http://sipb.mit.edu
.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
.. _YAML: http://yaml.org/

.. [#] http://lwn.net/Articles/246381/
.. [#] A Git submodule is a second Git repository embedded at a
       particular path within the superproject and fixed at a
       particular commit.
.. [#] Because we don't force any sort of version consistency for
       pockets with ``allow_backtracking`` set to ``True``, we don't
       create new tags for builds on those pockets either.