Harvard census identifies most commonly used open source packages

Researchers hope that by raising awareness of the most widely used open source packages, they can help prevent the next Log4j or Heartbleed exploit from happening.

Researchers at the Laboratory for Innovation Science at Harvard University (LISH) have published the most comprehensive census of free and open source software (FOSS) packages to date, with the aim of helping the industry better protect against high-profile vulnerabilities like Heartbleed and Log4Shell, which affected popular open source projects.

The census comes at a time when the technology industry is being forced to contend with the risks posed by the widespread use of open source technology within critical enterprise and public sector applications.

The research focuses on software packages at the application library level by aggregating data from over half a million observations of FOSS libraries used in production applications at thousands of companies in 2020.

“FOSS has become a critical part of the modern economy. There are tens of millions of FOSS projects, many of which are built into software and products we use every day. However, it is difficult to fully understand the health, economic value, and security of FOSS because it is produced in a decentralized and distributed manner,” the census authors noted in their report.

What’s in the report?

The census is broken down into eight ranked lists. Four include version numbers and four are version agnostic. Packages distributed through npm, the default JavaScript package manager, have been split out from non-npm packages.

There are also separate lists for packages that are directly called by developers versus those that are indirectly called as dependencies, bringing attention to the kinds of deeper dependencies that are more difficult for developers to observe within their environments.
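To make the direct versus indirect distinction concrete, here is a minimal sketch in Python. It is not the census authors' method; the function name `split_dependencies`, the graph format, and the package names are all hypothetical. It assumes a dependency graph is available as a mapping from each package to the packages it declares.

```python
from collections import deque


def split_dependencies(graph, app):
    """Return (direct, indirect) dependency sets for `app`.

    `graph` is a hypothetical mapping: package name -> list of declared
    dependencies. Direct dependencies are the ones `app` declares itself;
    indirect ones are reachable only through other packages.
    """
    direct = set(graph.get(app, ()))
    indirect = set()
    seen = set(direct) | {app}   # guards against cycles and repeats
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                indirect.add(dep)
                queue.append(dep)
    return direct, indirect


# Illustrative, made-up graph: "app" declares two packages, which pull in more.
example_graph = {
    "app": ["web-framework", "logger"],
    "web-framework": ["http-parser"],
    "http-parser": ["string-utils", "logger"],
    "logger": [],
}
direct, indirect = split_dependencies(example_graph, "app")
```

In this made-up graph, `http-parser` and `string-utils` never appear in the application's own manifest, which is why such packages are harder for developers to observe.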

These lists “represent our best estimate of which FOSS packages are the most widely used by different applications, given the limits of time and the broad, but not exhaustive, data we have aggregated,” the report notes.

While the census does not attempt to identify the riskiest FOSS projects, it does note that “measuring risk profiles is a separable task, and it’s easier to do it once the most widely used software is identified.” That work will require cross-industry effort and will depend on the individual risk profile of the consuming organization.

For organizations that have already started to put together their software bills of materials, these lists can provide a useful reference point as to which open source packages are the most common, helping them decide where to dedicate resources to ensure those projects are secure.
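As a rough illustration of that cross-check, the sketch below orders the components in a software bill of materials by their position in a ranked list. It assumes the component names have already been extracted from the SBOM and that one census list is available as a plain list, most widely used first. The function name and every package name here are placeholders, not entries from the actual census.

```python
def rank_sbom_against_census(sbom_components, census_ranking):
    """Return the SBOM components that appear in the ranked list,
    ordered from most to least widely used.

    Names are compared case-insensitively; real package naming is messier
    than this, so treat the matching as a simplifying assumption.
    """
    rank = {name.lower(): i for i, name in enumerate(census_ranking, start=1)}
    hits = [(rank[c.lower()], c) for c in sbom_components if c.lower() in rank]
    return [name for _, name in sorted(hits)]


# Placeholder data: neither list reflects real census results.
census_ranking = ["pkg-a", "pkg-b", "pkg-c"]
sbom_components = ["pkg-c", "internal-lib", "PKG-A"]

priorities = rank_sbom_against_census(sbom_components, census_ranking)
```

Components that do not appear in the list, such as `internal-lib` here, are simply left out. The result is only a starting order for review, not a measure of risk.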

Preventing the next Log4j

The researchers hope that by raising awareness of the most commonly used open source packages, they can help prevent the next Log4j or Heartbleed exploit from happening.

“Hopefully the next Log4j is on our list and we get to it before serious problems arrive,” Frank Nagle, an author of the report and assistant professor at Harvard Business School, told InfoWorld.

The report authors hope that by identifying “critical FOSS packages” they can help spur developers and end users to share data, invest, and coordinate efforts to secure key open source projects, which are often maintained by small groups of volunteer developers.

Back in 2014, following the discovery of the Heartbleed flaw, the Linux Foundation founded the Core Infrastructure Initiative (CII) to provide better funding and support for critical FOSS projects by paying maintainers, identifying critical projects, and setting out security best practices. In 2020, many of these efforts were folded into the newly created Open Source Security Foundation (OpenSSF), which supported this research project.

Open source security has caught the attention of governments around the world. The White House recently met with public and private sector representatives to discuss how to prevent security defects and vulnerabilities in open source code and packages, improve the process of finding and remediating vulnerabilities, and shorten the response time for fixing issues.

In 2014 the European Commission put into place a FOSS Strategy of its own, and a few years later it started sponsoring FOSS auditing by setting up bug bounty programs, hackathons and conferences.

Other lessons learned

The report also made five broad observations about the state of enterprise usage of open source software today. These are:

  • There is a need for a more standardized naming schema for software components.
  • There remain serious complexities associated with package versioning.
  • Much of the most widely used FOSS is developed by only a handful of contributors.
  • Individual developer account security is of growing importance.
  • Legacy software in the open source space persists.

“Far from being the final word on critical FOSS projects, this census effort represents the beginning of a larger dialogue on how to identify vital packages and ensure they receive adequate resources and support,” the report concluded.

Copyright © 2022 IDG Communications, Inc.
