Puppet, the configuration management and automation tool for datacenters, forms a major portion of the infrastructure supporting code-hosting service GitHub.
GitHub has now open-sourced one of its key Puppet tools, a system for previewing the effects that changes in Puppet configurations will have across thousands of machines.
Octocatalog-diff is a Ruby application that lets developers preview, in detail, how a modification to Puppet code will change the catalogs Puppet compiles. As GitHub put it in its blog post about the software, the goal is to "ensure not only that it serves the intended purpose for the role at hand, but also to avoid causing unexpected side effects on other roles."
Bigger changes require better tools
GitHub had three major use cases that called for a single, unified tool like Octocatalog-diff: testing deployments, automated testing of code, and catalog testing ("comparison of catalogs produced by two different Puppet versions or between two environments").
In each case, Octocatalog-diff helps verify that major changes to Puppet roles across GitHub's environment can be deployed without breaking other roles.
Puppet's own tooling could do this as well, but that would have meant granting Puppet master access to engineers who didn't otherwise need it. Octocatalog-diff therefore runs on its own, without access to the Puppet master.
GitHub says Octocatalog-diff has helped the company carry out several major Puppet-powered infrastructure changes that could otherwise have caused serious problems, such as the upgrade from Puppet 3.4 to 4.5. "Using octocatalog-diff to predict changes across the fleet," wrote GitHub, "a relatively small number of developers accomplished these substantial initiatives quickly and without their Puppet changes causing outages."
One-up on GitHub? Not so fast
GitHub hosts a sizable share of the world's open source projects, yet GitHub itself is a closed-source product, and the company has traditionally been reluctant to release software from its own codebase. A few projects have emerged over time, though, such as its internally developed software load-balancing system, which has been cited as one of the company's most complex components.
To an outsider, that load balancer also looks like a key source of GitHub's competitive edge: it is an integral part of how the service delivers code repositories to such a large audience of users without falling over. The same might be said of Octocatalog-diff, since being able to roll out Puppet-automated changes without breakage is itself an advantage.
Still, if those technologies grant an edge, it lies in improving your own projects rather than in competing with GitHub directly. Like many other services that release pieces of their infrastructure as open source projects, GitHub isn't banking its future on any one feature, but on using its position as a de facto standard to stay on top. It will take more than a load balancer or the ability to upgrade Puppet infrastructure effortlessly to beat GitHub at its own game.