Even more continuous integration via dependency management tools

January 27th 2015 Ian Buchanan in CI

When centralized version control systems were state-of-the-art, it made sense for agile thought-leaders to promote storing project dependencies in a code repository as a prerequisite to continuous integration. The goal was to version every configuration element, including external libraries, and to make sure every developer could easily obtain everything necessary to build. While those goals remain relevant, it is also important to keep current with upstream changes from third-party libraries. Since the early days of continuous integration, new dependency management tools have become popular to keep up with changes in third-party libraries, making integration even more continuous. If you are still committing libraries to your version control system, it is time to make dependency management tools an integral part of your continuous integration practice.

Integrating third-party libraries

When writing about continuous integration in 2006, Martin Fowler admitted, "The current open source repository of choice is Subversion." With that bias, he recommended:

Although many teams use repositories a common mistake I see is that they don't put everything in the repository. If people use one they'll put code in there, but everything you need to do a build should be in there including: test scripts, properties files, database schema, install scripts, and third party libraries.

When all files, including third-party libraries, are in the source code repository, the procedure for obtaining the specific files and versions of third-party libraries is as simple as checking out the code base.

Unfortunately, this leaves some aspects of integrating with those third-party libraries unanswered. For the purposes of continuous integration, the most important unanswered question is, "Are we using the latest version of this library?" The Agile Alliance asserts one goal of continuous integration is to "minimize the duration and effort required by each integration episode". To elaborate, Extreme Programming explains:

Continuous integration avoids or detects compatibility problems early. Integration is a "pay me now or pay me more later" kind of activity. That is, if you integrate throughout the project in small amounts you will not find your self trying to integrate the system for weeks at the project's end while the deadline slips by. Always work in the context of the latest version of the system.

Neither calls out external dependencies; however, software projects are increasingly built on external libraries, frameworks, and platforms. Avoiding "integration episodes" and "compatibility problems" is just as important for these external dependencies, and they should be integrated as frequently as internal source code changes.

Dependency management tools

Nearly a decade after Fowler's advice, there are dependency management tools for all major programming languages (for a list of appropriate dependency management tools by language, see Nicola Paolucci's blog on Git and project dependencies). Despite the availability of dependency management tools, many developers still believe there is only a simple dichotomy of "inside your project, or installed on your build server." Instead of manually managing multiple versions of libraries in either place, dependency management tools provide an explicit, repeatable, and reliable procedure for obtaining any given version. In short, dependency management tools move the responsibility of managing third-party libraries from the code repository to the automated build.

Typically, dependency management tools use a single file to declare all library dependencies, making it much easier to see all libraries and their versions at once. That means it is sufficient to check in references to the libraries, without storing the files themselves. This is convenient for newer distributed version control systems (DVCS) because some do not handle large binary files efficiently. It also has the surprising side effect of making the build process more transparent: developers can read which versions of external libraries are needed to run a build, instead of needing to inspect file names or internal properties. Even without reading the set of declared dependencies, dependency management tools typically automate answering, "Is this the latest version?" Thus, dependency management tools enable more rapid integration of changes from third-party sources.
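To make the idea of a single declaration file concrete, here is a minimal sketch in Python. It is not how any particular tool works: the manifest format (one "name==version" entry per line), the library names, and the table of latest versions are all assumptions made for illustration. A real tool would query a package index rather than a hard-coded dictionary.

```python
def parse_manifest(text):
    """Parse lines such as 'libalpha==1.4.0' into a {name: version} dict.

    The 'name==version' format is an assumption for this sketch.
    """
    pinned = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("expected name==version, got: %r" % raw)
        pinned[name.strip().lower()] = version.strip()
    return pinned


def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically.

    Assumes purely numeric, dot-separated versions.
    """
    return tuple(int(part) for part in version.split("."))


def outdated(pinned, latest):
    """Return {name: (declared, newest)} for libraries with a newer release."""
    report = {}
    for name, current in sorted(pinned.items()):
        newest = latest.get(name)
        if newest is not None and version_key(newest) > version_key(current):
            report[name] = (current, newest)
    return report


# Hypothetical manifest and hypothetical "latest release" data.
MANIFEST = """
libalpha==1.4.0   # hypothetical library
libbeta==0.9.2
"""
LATEST = {"libalpha": "1.10.0", "libbeta": "0.9.2"}

if __name__ == "__main__":
    for name, (have, want) in outdated(parse_manifest(MANIFEST), LATEST).items():
        print("%s: declared %s, latest %s" % (name, have, want))
```

Comparing versions as tuples of integers rather than as strings matters here: as plain text, "1.10.0" would sort before "1.4.0".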

The biggest drawback of dependency management tools may be that there are so many to choose from. Unlike version control and continuous integration tooling, dependency management tools are specific to a programming language. Compounding the differences by language, each tool works slightly differently and has its own quirks. Despite these flaws, it is worth taking the time to learn the tools and integrate them into version control and continuous integration flows. Together, these tools and the practice of declaring dependencies instead of storing them provide the important advantage of rapidly revealing integration problems from third-party libraries.

Updating third-party libraries

The practices and tooling are still young. As recently as 2013, Prezi engineers Ryan Lane and Peter Neumark wrote:

Dependency hell is an unsolved problem in Computer Science right up there with P ≠ NP.

Many teams that have struggled with dependency hell react by doing third-party integration less frequently. Each time they weigh the cost of updating against the benefit of new capabilities in the library, the cost has increased. Just as with source code, the longer integration with the newer version has been deferred, the greater the pain of updating. Hence, each decision to defer third-party integration leads toward the ultimate state of never updating third-party libraries at all.

At the other extreme is automatically updating all libraries with every build. Given the current state of dependency management tools, this can introduce so much instability that builds become unreliable. The problem is not so much frequency but isolation of change. It can be confusing for developers trying to understand why their check-in caused a build break if the cause might be a change in a third-party library.
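One way to picture isolation of change is to compare the resolved library versions of two consecutive builds. The sketch below assumes each build records a snapshot as a simple {library: version} dictionary. That snapshot format and the library names are hypothetical, not a feature of any specific tool.

```python
def diff_snapshots(previous, current):
    """Describe how resolved library versions changed between two builds.

    Both arguments are {library: version} dicts (an assumed format).
    """
    changes = []
    for name in sorted(set(previous) | set(current)):
        before = previous.get(name)
        after = current.get(name)
        if before == after:
            continue  # unchanged libraries are not reported
        if before is None:
            changes.append("added %s %s" % (name, after))
        elif after is None:
            changes.append("removed %s %s" % (name, before))
        else:
            changes.append("changed %s %s -> %s" % (name, before, after))
    return changes


# Hypothetical snapshots from the last green build and the broken build.
last_green = {"libalpha": "1.4.0", "libbeta": "0.9.2"}
broken = {"libalpha": "1.4.1", "libgamma": "2.0.0"}

if __name__ == "__main__":
    for change in diff_snapshots(last_green, broken):
        print(change)
```

If the list is empty, the break almost certainly came from the check-in itself. If it is not, the third-party changes are at least visible next to the source changes.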

The problem with both extremes is being blind to change, either ignoring the change itself or ignoring its impact. Max Lincoln explains:

If you don't have a good dependency report that shows what you're using and what upgrades are available then you are uninformed. You could try to manually assemble a report, but that is impractical on large projects (most Java projects), or projects with lots of small, frequently released libraries (most Ruby projects).

The solution is to use dependency management tools in the build process to generate dependency reports and signal when updates to third-party libraries are available. Even for dependency management tools with some auto-updating capabilities, the process of introducing those updates to the code base should be consistent with any other code changes. For DVCS tools, that often means the changes, even if they only bump version number references, are reviewed on a pull request by members of the team, who ensure a passing build before the change is merged into master. This formality keeps team members aware of when dependencies are changing and prevents breaking the build. Dependency management tools do not yet (and may never) prevent the anti-patterns that lead to dependency hell, such as too many dependencies, long dependency chains, conflicting dependencies, and circular dependencies.
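Two of those anti-patterns, circular and conflicting dependencies, are easy to check for once the dependency graph is available as data. The following Python sketch assumes the graph has already been extracted into plain dictionaries and tuples. The data shapes and library names are hypothetical, and real tools resolve version ranges rather than the exact versions used here.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None.

    graph maps each library to the libraries it depends on.
    Uses a depth-first search that tracks the current path.
    """
    in_progress, done = set(), set()
    path = []

    def visit(node):
        in_progress.add(node)
        path.append(node)
        for dep in graph.get(node, ()):
            if dep in in_progress:
                # dep is already on the current path: that is a cycle
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in graph:
        if node not in done:
            found = visit(node)
            if found:
                return found
    return None


def find_conflicts(requirements):
    """Group (requirer, library, version) tuples and keep libraries
    that are requested at more than one exact version."""
    wanted = {}
    for requirer, library, version in requirements:
        wanted.setdefault(library, {}).setdefault(version, []).append(requirer)
    return {lib: vers for lib, vers in wanted.items() if len(vers) > 1}


# Hypothetical project data.
GRAPH = {"app": ["liba"], "liba": ["libb"], "libb": ["liba"]}
REQUIREMENTS = [
    ("app", "libc", "1.0"),
    ("liba", "libc", "2.0"),
    ("libb", "libd", "1.0"),
]

if __name__ == "__main__":
    print("cycle:", find_cycle(GRAPH))
    print("conflicts:", find_conflicts(REQUIREMENTS))
```

Checks like these only report the problem. Deciding which dependency to drop or upgrade is still a job for the team reviewing the pull request.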

Looking to the future

Since dependency management tools are already language specific, it is likely they will be around for a long time to come. However, the back-end storage of the libraries themselves may benefit from a more general approach to binary repositories. Tools like Nexus and Artifactory can be the back end for multiple dependency management tools as well as package management tools. With a recent resurgence of functional programming, there is some fresh thinking about binary repositories and dependency management tools that may accelerate adoption even further. For example, Mark Hibberd recently described his not-yet-released, open-source project annex in a video from StrangeLoop. Annex would capture richer declarations of dependency, smarter updating of dependencies, and earlier detection of dependency problems, all in a way that is cross-platform and more reliable.