How to do open source right: LinkedIn shows the way

LinkedIn is a model for producing open source code that really matters to a community, and it shows how doing so makes developers want to work for you

If you want to know how to do open source the smart way, pay attention to LinkedIn. It has delivered some of the industry’s most impressive open source software, most recently Cruise Control, a load-balancing tool for Apache Kafka. Kafka, a distributed streaming platform that also originated at LinkedIn, is used to build real-time data pipelines and streaming applications.

Cruise Control’s extensibility and generality exemplify LinkedIn’s serious open source savvy.

Although meant for general consumption, Cruise Control didn’t have a real community around it; it had been developed by and for LinkedIn. But LinkedIn built Cruise Control in a way that would translate beyond LinkedIn’s needs. Many such projects make the rookie mistake of solving only their creators’ needs; LinkedIn didn’t make that mistake.

LinkedIn engineer Jiangjie Qin says that he and his team deliberately built Cruise Control outside Kafka core, choosing not to bind it tightly. This let the team make Cruise Control both extensible and generalizable.

By “extensible,” Qin means that developers outside LinkedIn can improve it to satisfy other requirements. But even more important is the principle of generality, he says:

We realized early on that other distributed systems can also benefit from similar operational automation that requires such application-aware monitor-analysis-action cycles. While there are existing products that help balance resource utilization in a cluster, most of them are application-agnostic and perform the rebalance by migrating the entire application process. While this works well for stateless systems, it usually falls short when it comes to stateful systems (such as Kafka) due to the large amount of state associated with the process. Therefore, we wanted Cruise Control to be a general framework that could understand the application and migrate only a partial state and be used in any stateful distributed system.

So, in principle, the framework is not limited to Kafka: its design is meant to carry over to other stateful distributed systems.
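To make the monitor-analysis-action cycle Qin describes more concrete, here is a minimal Python sketch of the idea. It is not Cruise Control's code or API: the `Broker` class, the `monitor`, `analyze`, `act`, and `rebalance` functions, the single "load" number per partition, and the greedy selection rule are all assumptions made for illustration. What it tries to show is the application-aware part, where the balancer moves individual partitions (a slice of state) rather than migrating a whole process.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical node that hosts partitions, each with a load figure."""
    name: str
    partitions: dict[str, float] = field(default_factory=dict)

    @property
    def load(self) -> float:
        return sum(self.partitions.values())


def monitor(brokers):
    """Monitor step: sample the current load of every broker."""
    return {b.name: b.load for b in brokers}


def analyze(brokers, loads, tolerance):
    """Analysis step: propose one partition move, or None if balanced."""
    hot = max(brokers, key=lambda b: loads[b.name])
    cold = min(brokers, key=lambda b: loads[b.name])
    gap = loads[hot.name] - loads[cold.name]
    if gap <= tolerance:
        return None
    # Only a partition with 0 < load < gap narrows the imbalance when moved.
    candidates = {p: l for p, l in hot.partitions.items() if 0 < l < gap}
    if not candidates:
        return None
    # Greedy choice (an assumption): the load closest to half the gap.
    partition = min(candidates, key=lambda p: abs(candidates[p] - gap / 2))
    return partition, hot, cold


def act(move):
    """Action step: migrate one partition, not the whole broker."""
    partition, source, target = move
    target.partitions[partition] = source.partitions.pop(partition)


def rebalance(brokers, tolerance=1.0, max_rounds=100):
    """Repeat the cycle until balanced; return the moves that were made."""
    moves = []
    for _ in range(max_rounds):
        loads = monitor(brokers)
        move = analyze(brokers, loads, tolerance)
        if move is None:
            break
        act(move)
        moves.append((move[0], move[1].name, move[2].name))
    return moves


if __name__ == "__main__":
    cluster = [
        Broker("a", {"p0": 4.0, "p1": 3.0, "p2": 2.0}),
        Broker("b", {"p3": 1.0}),
        Broker("c"),
    ]
    print(rebalance(cluster))
    print(monitor(cluster))
```

A real balancer would weigh several resources at once and respect constraints such as replica placement, so treat this only as a picture of the loop's shape: observe, decide on a small and targeted move, apply it, and repeat.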

It takes a lot of effort to effectively open-source code like Cruise Control. Writing the software is easy compared to code cleanup, documentation, and other work that prepares code for maximum impact as an open source project.

But that approach is baked into the LinkedIn engineering DNA: “Open source is key to how LinkedIn engineering works,” Qin says. Indeed, everything (except member data, which is expressly off-limits) is a candidate for open-sourcing at LinkedIn. “If it’s not open-sourced it’s because it’s not general enough to be useful beyond LinkedIn, or we just haven’t done the work yet to open-source it.” Yet.

This mentality of opening everything that could be useful within and beyond LinkedIn, and of doing the “dirt work” necessary to make it useful to external developers, is what makes LinkedIn such an impressive organization to watch. Not only does this result in better software for LinkedIn and everyone else, it also helps LinkedIn recruit the best engineers. As Qin says, “Many companies claim to build great software, but the best way to prove it is to open-source it. If you open-source something, that means you’re proud of it, and it’s useful not just for you but for others as well.”

Decades after open source launched in earnest, too many corporations mistakenly view open source as a way to source software for zero dollars, rather than as a way to change how they build software and do business. But such companies will be left behind by LinkedIn, Facebook, and others that take an open-source-first approach to code, and are willing to invest the labor to make it pay off.

Copyright © 2017 IDG Communications, Inc.