From: Evan Broder
Date: Sun, 3 Jan 2010 20:28:15 +0000 (-0500)
Subject: Add documentation on the Invirtibuilder.
X-Git-Tag: 0.1.5~26
X-Git-Url: http://xvm.mit.edu/gitweb/invirt/packages/invirt-dev.git/commitdiff_plain/ab69fa47de5126ea8147c31fd98d995672661131

Add documentation on the Invirtibuilder.

svn path=/trunk/packages/invirt-dev/; revision=2858
---

diff --git a/README.invirtibuilder b/README.invirtibuilder
new file mode 100644
index 0000000..d364399
--- /dev/null
+++ b/README.invirtibuilder
@@ -0,0 +1,307 @@
+============================
+Design of The Invirtibuilder
+============================
+
+Introduction
+============
+
+The Invirtibuilder is an automated Debian package builder, APT
+repository manager, and Git repository hosting tool. It is intended
+for projects that consist of a series of Debian packages each tracked
+as a separate Git repository, and designed to keep the Git and APT
+repositories in sync with each other. The Invirtibuilder supports
+having multiple threads, or "pockets" of development, and can enforce
+different access control and repository consistency rules for each
+pocket.
+
+Background and Goals
+====================
+
+The Invirtibuilder was originally developed for Invirt_, a project of
+the MIT SIPB_. When we went to develop a tool for managing our APT and
+Git repositories, we had several goals, each of which informed the
+design of the Invirtibuilder:
+
+* One Git repository per Debian package.
+
+  Because of how Git tracks history, it's better suited for tracking a
+  series of small repositories, as opposed to one large one
+  [#]_. Furthermore, most pre-existing tools and techniques for
+  dealing with Debian packages in Git repositories (such as
+  git-buildpackage_ or `VCS location information`_) are designed
+  exclusively for this case.
+
+* Synchronization between Git and APT repositories.
+
+  In our previous development models, we would frequently merge
+  development into trunk without necessarily being ready to deploy it
+  to our APT repository (and by extension, our servers) yet. However,
+  once the changes had been merged in, it was no longer possible to
+  see the current state of the APT repository purely from inspection
+  of the source control repository.
+
+* Support for multiple *pockets* of development.
+
+  For the Invirt_ project, we maintain separate production and
+  development environments. Initially, they each shared the same APT
+  repository. To test changes, we had to install them into the APT
+  repository and install the update on our development cluster, and
+  simply wait to take the update on our production cluster until
+  testing was completed. When designing the Invirtibuilder, we wanted
+  the set of packages available to our development cluster to be
+  separate from the packages in the production cluster.
+
+* Different ACLs for different pockets.
+
+  Access to our development cluster is relatively unrestricted; we
+  freely grant access to interested developers to encourage
+  contributions to the project. Our production cluster, on the other
+  hand, has a much higher standard of security, and access is limited
+  to the core maintainers of the service. The Invirtibuilder needed to
+  support that separation of privilege.
+
+* Tool-enforced version number restrictions.
+
+  Keeping our packages in APT repositories adds a few restrictions to
+  the version numbers of packages. First, version numbers in the APT
+  repository must be unique. That is, you cannot have two different
+  packages of the same name and version number. Second, version
+  numbers are expected to be monotonically increasing. If a newer
+  version of a package had a lower version number than the older
+  version, dpkg would consider this a downgrade. Downgrades are not
+  supported by dpkg, and will not even be attempted by APT.
+
+  In order to avoid proliferation of version numbers used only for
+  testing purposes, we opted to bend the latter rule for our
+  development pocket.
+
+* Tool-enforced consistent history.
+
+  In order for the Git history to be meaningful, we chose to require
+  that each version of a package that is uploaded into the APT
+  repository be a fast-forward of the previous version.
+
+  Again, to simplify and encourage testing, we bend this rule for the
+  development pocket as well.
+
+Design
+======
+
+Configuration
+-------------
+
+For the Invirt_ project's use of the Invirtibuilder, we adapted our
+existing configuration mechanism. Our configuration file consists of a
+single YAML_ file. Here is the snippet of configuration we use for our
+build configuration::
+
+  build:
+    pockets:
+      prod:
+        acl: system:xvm-root
+        apt: stable
+      dev:
+        acl: system:xvm-dev
+        apt: unstable
+        allow_backtracking: yes
+    tagger:
+      name: Invirt Build Server
+      email: invirt@mit.edu
+
+The Invirtibuilder allows naming Invirtibuilder pockets separately
+from their corresponding Git branches or APT components. However, if
+either the ``git`` or ``apt`` properties of the pocket are
+unspecified, they are assumed to be the same as the name of the
+pocket.
+
+The ``acl`` attributes for each pocket are interpreted within our
+authorization modules to determine who is allowed to request builds on
+a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
+names of AFS groups, which we use for authorization.
+
+The ``tagger`` attribute indicates the name and e-mail address to be
+used whenever the Invirtibuilder generates new Git repository objects,
+such as commits or tags.
+
+Finally, it was mentioned in `Background and Goals`_ that we wanted
+the ability to not force version number consistency or Git
+fast-forwards for our development pocket. The ``allow_backtracking``
+attribute was introduced to indicate that preference. When it is set
+to ``yes`` (i.e. YAML's "true" value), then neither fast-forwards nor
+increasing-version-numbers are enforced when validating builds. The
+attribute is assumed to be false if undefined.
+
+Git Repositories
+----------------
+
+In order to make it easy to check out all packages at once, and for
+version controlling the state of the APT repository, we create a
+"superproject" using Git submodules [#]_.
+
+There is one Git branch in the superproject corresponding to each
+pocket of development. Each branch contains a submodule for each
+package in the corresponding component of the APT repository, and the
+submodule commit referred to by the head of the Git branch matches the
+revision of the package currently in the corresponding component of
+the APT repository. Thus, the heads of the Git superproject match the
+state of the components in the APT repository.
+
+Each of the submodules also has a branch for each pocket. The head of
+that branch points to the revision of the package that is currently in
+the corresponding component of the APT repository. This provides a
+convenient branching point for new development. Additionally, there is
+a Git tag for every version of the package that has ever been uploaded
+to the APT repository.
+
+Because the Invirtibuilder and its associated infrastructure are
+responsible for keeping the superproject in sync with the state of the
+APT repository, an update hook disallows all pushes to the
+superproject.
+
+Pushes to the submodules, on the other hand, are almost entirely
+unrestricted. As with the superproject, the Git branches for each
+pocket and Git tags are maintained by the build infrastructure, so
+pushes to them are disallowed. Outside of that, we make no
+restrictions on the creation or deletion of branches, nor are pushes
+required to be fast-forwards.
+
+The Build Queue
+---------------
+
+We considered several ways to trigger builds of new package versions
+using Git directly. However, we realized that what we actually wanted
+was a separate build queue where each build request was handled and
+processed independently of any requests before or after it. It's not
+possible to have these semantics using Git as a signalling mechanism
+without breaking standard assumptions about how remote Git
+repositories work.
+
+In order to trigger builds, then, we needed a side-channel. Since it
+was already widely used in the Invirt_ project, we chose to use
+remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.
+
+To trigger a new build, a developer calls remctl against the build
+server with a pocket, a package, and a commit ID from that package's
+Git repository. The remctl daemon then calls a script which validates
+the build and adds it to the build queue. Because of the structure of
+remctl's ACLs, we are able to have different ACLs depending on which
+pocket the build is destined for. This allows us to fulfil our design
+goal of having different ACLs for different pockets.
+
+For simplicity, the queue itself is maintained as a directory of
+files, where each file is a queue entry. To maintain order in the
+queue, the file names for queue entries are of the form
+``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
+indicates a random hexadecimal digit. Each file contains the
+parameters passed in over remctl (pocket, package, and commit ID to
+build), as well as the Kerberos principal of the user that requested
+the build, for logging.
+
+The Build Daemon
+----------------
+
+To actually execute builds, we run a separate daemon to monitor for
+new build requests in the build queue. The daemon uses inotify so that
+it's triggered whenever a new item is added to the build
+queue. Whenever an item in the build queue triggers the build daemon,
+the daemon first validates the build, then executes the build, and
+finally updates both the APT repository and Git superproject with the
+results of the build. The results of all attempted builds are recorded
+in a database table for future reference.
+
+Build Validation
+````````````````
+
+The first stage of processing a new build request is validating the
+build. First, the build daemon checks the version number of the
+requested package in each pocket of the repository. If the package is
+present in any other pocket with the same version number, but the Git
+commit for the package is different, the build errors out, because it
+is not possible for an APT repository to contain two different
+packages with the same name and version number.
+
+Next, the build daemon checks to make sure that the version number of
+the new package is a higher version number than the version currently
+in the APT repository, as version numbers must be monotonically
+increasing.
+
+Finally, we require new packages to be fast-forwards in Git of the
+previous version of the package. This is verified as well.
+
+As mentioned above, the ``allow_backtracking`` attribute can be set
+for a pocket to bypass the latter two checks in development
+environments.
+
+When the same package with the same version is inserted into multiple
+places in the same APT repository, the MD5 hash of the package is used
+to validate that it hasn't changed. Because rebuilding the same
+package causes the MD5 hash to change, when a version of a package
+identical to a version already in the APT repository is added to
+another pocket, we need to copy it directly. Since the validation
+stage already has all of the necessary information to detect this
+case, if the same version of a package is already present in another
+pocket, the validation stage returns this information.
+
+Build Execution
+```````````````
+
+Once the build has been validated, it can be executed. The requested
+version of the package is exported from Git, and then a Debian source
+package is generated. Next, the package itself is built using sbuild.
+
+sbuild creates an ephemeral build chroot for each build that has only
+essential build packages and the build dependencies for the package
+being built installed. We use sbuild for building packages for several
+reasons. First, it helps us verify that all necessary build
+dependencies have been included in our packages. Second, it helps us
+ensure that configuration files haven't been modified from their
+upstream defaults (which could cause problems for packages using
+config-package-dev_).
+
+The build daemon keeps the build logs from all attempted builds on the
+filesystem for later inspection.
+
+Repository Updates
+``````````````````
+
+Once the build has been successfully completed, the APT and Git
+repositories are updated to match the new state. First, a new tag is
+added to the package's Git repository for the current version
+[#]_. Next, the pocket tracking branch in the submodule is also
+updated with the new version of the package. Then a new commit is
+created on the superproject which updates the package's submodule to
+point to the new version of the package. Finally, the new version of
+the package is included in the appropriate component of the APT
+repository.
+
+Because the Git superproject, the Git submodules, and the APT
+repository are all updated simultaneously to reflect the new package
+version, the Git repositories and the APT repository always stay in
+sync.
+
+Build Failures
+``````````````
+
+If any of the above stages of executing a build fails, the failure is
+trapped and stored with the build record in the database for later
+inspection. Regardless of whether the build succeeds or fails, the
+build daemon runs any scripts in a hook directory. The hook directory
+could contain scripts to publish the results of the build in whatever
+way is deemed useful by the developers.
+
+.. _config-package-dev: http://debathena.mit.edu/config-packages
+.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
+.. _Invirt: http://invirt.mit.edu
+.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
+.. _SIPB: http://sipb.mit.edu
+.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
+.. _YAML: http://yaml.org/
+
+.. [#] http://lwn.net/Articles/246381/
+.. [#] A Git submodule is a second Git repository embedded at a
+       particular path within the superproject and fixed at a
+       particular commit.
+.. [#] Because we don't force any sort of version consistency for
+       pockets with ``allow_backtracking`` set to ``True``, we don't
+       create new tags for builds on pockets with
+       ``allow_backtracking`` set to ``True`` either.
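
The queue layout described under "The Build Queue" in the README can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Invirtibuilder's actual code: the names ``enqueue``, ``pending`` and ``ENTRY_NAME`` are hypothetical, and the JSON file body and the write-then-rename step are choices made for this sketch. The README itself specifies only the file name format and which fields each entry carries.

```python
import json
import re
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical pattern for the documented entry name:
# YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
ENTRY_NAME = re.compile(r"^\d{14}_[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def enqueue(queue_dir, pocket, package, commit, principal, now=None):
    """Write one build request as a new file in queue_dir; return its path."""
    now = now or datetime.now(timezone.utc)
    # Timestamp prefix gives ordering; the random UUID keeps names unique.
    name = "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4())
    # Assumption: the on-disk format is not given in the README; JSON is
    # used here only because it is in the standard library.
    entry = {"pocket": pocket, "package": package,
             "commit": commit, "principal": principal}
    path = Path(queue_dir) / name
    # Assumption: write under a hidden temporary name and rename, so that a
    # reader watching the directory never sees a half-written entry.
    tmp = path.with_name("." + name + ".tmp")
    tmp.write_text(json.dumps(entry))
    tmp.rename(path)
    return path


def pending(queue_dir):
    """Return queue entries oldest first; valid names sort chronologically."""
    return sorted(p for p in Path(queue_dir).iterdir()
                  if ENTRY_NAME.match(p.name))
```

Because the timestamp has one-second resolution, two requests that arrive within the same second are ordered by their random suffix, so the order inside a single second is arbitrary in this sketch.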