Wednesday, December 21, 2011

Techniques and lessons for improvement of deployment processes

So far, all my software deployment related blog posts were mostly about techniques implemented in Nix, NixOS and Disnix and some general deployment information.
In my career as a Masters student and PhD student, I have written quite a number of Nix expressions for many packages and I have also "Nixified" several large code bases of commercial companies to make it possible to use the distinct features of Nix to make software deployment processes reproducible and reliable.

However, for many systems it turns out that implementing Nix support (for instance, to allow it to be deployed by our Nix-based deployment tools) is more difficult, tedious and laborious than expected. In this blog post, I'm going to describe a number of interesting techniques and lessons that could improve the deployment process of a system, based on my experience with Nix and Disnix. Second, by taking these lessons into account, it also becomes easier to adopt Nix as underlying deployment system.

  1. Deal with the deployment complexity from the beginning in your development process. Although this lesson isn't technical and may sound very obvious to you, I can ensure you that if you have a codebase which grows rapidly, without properly thinking about deployment, it becomes a big burden to implement the techniques described in this blog post later in the development process. In some cases, you may even have to re-engineer specific parts of your system, which can be very expensive to do, especially if your codebase consists of millions of lines of code.

  2. Make your deployment process scriptable. This lesson is also an obvious one, especially if you have adopted an agile software development process, in which you want to create value as soon as possible (without a deployed system there is no value).

    Although it's obvious, I have seen many cases where developers still build and package components of a software system by manually performing build and package tasks in the IDE and by doing all the installation steps by hand (with some deployment documentation). Such deployment processes typically take a lot of time and are subject to errors (because they are performed manually and humans make mistakes).

    Quite often these developers, who perform these manual/ad-hoc deployment processes, tell me that it's much too complicated to learn how a build system works (such as GNU Make or Apache Ant) and to maintain these build specifications.

    The only thing I can recommend to them is that it really is worth the effort, because it significantly reduces the time to make a new release of the product and also allows you test your software sooner and more frequently.

  3. Decompose your system and their build processes. Although I have seen many software projects using some kind of automation, I have also encountered a lot of build tools which treat the the complete system as a single monolithic blob, which must be built and deployed as a whole.

    In my opinion it's better to decompose your system in parts which can be built separately, for the following reasons:

    • It may significantly reduce build times, because only components that have been changed have to be rebuilt. There is no need to reexamine the complete codebase for each change.
    • It increases flexibility, because it becomes easier to replace specific parts of a system with other variants.
    • It allows a better means of sharing components across systems and better integration in different environments. As I have stressed out in an earlier blog post, software nowadays is rarely self-contained and run in many types of environments.
    • It allows you to perform builds faster, because they can be performed in parallel.

    In order to decompose a system properly, you have to think early about a good architecture of your system and keep reflecting about it. As a general rule, the build process of a sub component should only depend on the source code, build script and the build results of the dependencies. It should not be necessary to have cross references between source files among components.

    Typically, a bad architecture for a system also implies a very complex and inefficient build process.

    I have encountered some extreme cases, in which the source code system is one big directory of files, containing hundreds of dependencies including third-party libraries in binary form, infrastructure components (such as the database and web servers) containing many tweaks and custom configuration files.

    Although such code bases can be automatically deployed, it offers almost no flexibility, because it is a big burden to replace a specific part (such as a third party library) and still ensure that the system works as expected. Also it requires the system to be deployed in a clean environment because it offers no integration. Typically these kind of systems aren't updated that frequently either.

  4. Make build-time and run-time properties of components configurable. Another obvious lession, but I quite frequently encounter build processes which make implicit assumptions about the locations where specific dependencies can be found.

    For example, in some Linux packages, I have seen a number of Makefiles which have hardcoded references to the /usr/bin directory to find specific build tools, which I have to replace myself. Although most Linux systems have these build-time dependencies stored in this folder, there are some exceptions such as NixOS and GoboLinux.

    In my opinion it's better to make all locations and settings configurable, either by allowing it to be specified as a build parameter, environment variable (e.g. PATH) or stored in a configuration file, which can be easily edited.

  5. Isolate your build artifacts. I have noticed that some development environments store the output of a build in separate folders (for example Eclipse Java projects) whereas others store all the build artifacts of all projects in a single folder (such as Visual Studio C# projects). Similarly, with end user installations in production environments, libraries are also stored in global directories, such as /usr/lib on Linux environments, or C:\Windows\System32 on Windows.

    Based on my experience with Nix, I think it's better to store the build output of every component in isolation, by storing them in separate folders. Although this makes it harder to find components, it also offers some huge benefits, such as the ability to store multiple versions next to each other. Also upgrades can be made more reliable, because you can install another version next to an existing one. Furthermore, builds can also be made more pure and reproducible, because it's required to provide the locations of these components explicitly. In global folders such as /usr/lib a dependency may still be implicitly found even if it's not specified, which cannot happen if all the dependencies are stored in separate locations. More information about isolation can be found in my blog post about NixOS and the FHS.

  6. Think about a component naming convention. Apart from storing components in separate directories which offers some benefits, it's also important to think about how to name them. Typically, the most common naming scheme that's being used consists of a name and version number. This naming scheme can be insufficient in some cases, for example:

    • It does not take the architecture of the component into account. In a name-version naming scheme a 32-bit variant and a 64-bit variant are considered the same, while they may not be compatible with another component.
    • It does not take the optional features into account. For example, another component may require certain optional features enabled in order to work. It may be possible that a program incorrectly uses a component which does not support a required optional feature.

    I recommend you to also take relevant build-time properties into account when naming a component. In Nix you get such a naming convention for free, although the naming convention is also very strict. Nix hashes all the build-time dependencies, such as the source code, libraries, compilers and build scripts and makes this hash code part of the component name, for example: /nix/store/r8vvq9kq18pz08v24918my6r9vs7s0n3-firefox-8.0.1. Because of this naming scheme, different variants of a component, which are (for example) compiled with another version of a compiler of for a different system architecture do not share the same name.

    In some cases, it may not be necessary to adopt such a strict naming scheme, because the underlying component technology used is robust enough to offer enough flexibility. In one of my case studies covering Visual Studio projects, we only distinguish between variants of component by taking the name, version, interface and architectures into account. because they have no optional features and we assume the .NET framework is always compatible. We organize the filesystem in such a way that 32-bit and 64-bit variants are stored in separate directories. Although a naming scheme like this is less powerful than Nix's, it already offers huge benefits compared to traditional deployment approaches.

    Another example of a more strictly named component store is the .NET Global Assembly Cache (GAC), which makes a distinction between components based on the name, version, culture and a cryptographic key. However, the GAC is only suitable for .NET library assemblies and not for other types of components and cannot take other build properties into account.

  7. Think about component composition. A system must be able to find its dependencies at build-time and run-time. If the components are stored in a separate folders, you need to take extra effort in order to compose them. Some methods of addressing dependencies are:

    • Modify the environment. Most platforms use environment variables to address components, such as PATH for looking up executables, CLASSPATH for looking up Java class files, PERL5LIB for addressing Perl libraries. In order to perform a build these environment variables must be adapted to contain all the dependencies. In order to run the executable, the executable needs to be wrapped in a process, which sets these environment variables. Essentially, this method binds dependencies statically to another component.
    • Composing symlink trees. You can also provide a means of looking up components from a single location by composing the contents of the required dependencies in a symlink tree and by referring to its contents. In conjunction with a chroot environment, and a bind mount of the component directory, you can make builds pure. Because of this dynamic binding approach, you can replace dependencies, without a rebuild/reconfiguration, but you cannot easily run one executable that uses a particular variant of a library, while another runs another variant. This method of addressing components is used by GoboLinux.
    • Copy everything into a single folder. This is a solution which always work, however it is also a very inefficient as you cannot share components.

    Each composition approach has its own pros and cons. The Nix package manager supports all three approaches, however the first approach is mostly used for builds as its the most reliable one, because static binding always ensures that the dependencies are present and correct and that you can safely run multiple variants of compositions simultaneously. The second approach is used by Nix for creating Nix profiles, which end users can use to conveniently access their installed programs.

  8. Granularity of dependencies. This lesson is very important while deploying service-oriented systems in a distributed environment. If service components have dependencies on components with a large level of granularity, upgrades may become very expensive because some components are unnecessarily reconfigured/rebuilt.

    I have encountered a case in which a single configuration file was used to specify the locations of all service components of a system. When the location of a particular service component changes, every service component had to be reconfigured (even services that did not need access to that particular service component) which is very expensive.

    It's better to design these configuration files in such a way that they only contain properties that a service components need to know.


Conclusion


In this blog post I have listed some interesting lessons to improve the deployment process of complex systems based on my experience with Nix and related tools. These can be implemented in various deployment processes and tools. Furthermore, it becomes easier to adopt Nix related tooling by taking these lessons into account.

I also have to remark that it is not always obvious to implement all these lessons. For example, it is hard to integrate Visual Studio C# projects in a stricter deployment model supporting isolation, because the .NET runtime has no convenient way to address run-time dependencies in arbitrary locations. However, there is a solution for this particular problem, which I have described in an earlier blog post about .NET deployment with Nix.

UPDATE: I have given a presentation about this subject at Philips recently. As usual, the slides can be obtained from the talks page of my homepage.

No comments:

Post a Comment