Prometheus 2.0 shoots for Kubernetes scale

The next version of the open source cloud-native monitoring system will feature a totally rewritten data storage layer to meet the demands of a Kubernetes-powered future

Prometheus, the cloud monitoring service under the aegis of the Cloud Native Computing Foundation (CNCF), has a major makeover coming for its 2.0 revision.

A blog post at the CNCF discusses the original 1.x design for Prometheus's data storage layer and its bottlenecks. Those bottlenecks are bad news for a monitoring system that needs to work at the scale of products like Kubernetes, where many short-lived monitored objects (for example, hundreds of containers spinning up and down) create large amounts of data.

The required changes, according to the CNCF, "are so fundamental that merging them will trigger a new major release." That major release, Prometheus 2.0, is now available in a very early alpha preview version.

Another blog post, by Kubernetes and Prometheus dev Fabian Reinartz, dives into the technical details behind the changes.

Under the hood, Prometheus is a time-series database, which means it records lots of timestamped values that come in at high speed. The original storage layer for Prometheus used a separate file for each time series or monitored object, resulting in many small files stored on disk. Both SSDs and conventional hard disks have trouble with this design, in part because it makes certain operations (such as deleting old data) extremely expensive.
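To get a rough feel for why deleting old data is costly in a file-per-series layout, here is a minimal Python sketch that counts how many files a retention pass would have to touch under each layout. It is a toy model, not Prometheus code: the function names, the 1,000 series, the 10-second sample interval, and the 10-minute window are all assumptions chosen for illustration.

```python
from collections import defaultdict


def files_touched_per_series(samples, cutoff):
    """One file per series: every series holding expired samples must be rewritten."""
    by_series = defaultdict(list)
    for series, ts in samples:
        by_series[series].append(ts)
    return sum(1 for stamps in by_series.values() if any(t < cutoff for t in stamps))


def files_touched_per_window(samples, cutoff, window):
    """One file per time window: windows ending at or before the cutoff are deleted whole."""
    windows = {ts // window for _, ts in samples}
    return sum(1 for w in windows if (w + 1) * window <= cutoff)


# Hypothetical workload: 1,000 series, one sample every 10 seconds for one hour.
samples = [(f"series_{i}", t) for i in range(1000) for t in range(0, 3600, 10)]

print(files_touched_per_series(samples, cutoff=1800))              # 1000
print(files_touched_per_window(samples, cutoff=1800, window=600))  # 3
```

In this model, expiring the first half hour means rewriting all 1,000 per-series files, while a time-partitioned layout only has to remove three whole files.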

Prometheus 2.0 creates files that are partitioned by time range, rather than by time series. The most recent time window is stored as an in-memory table, but also copied to disk in a write-ahead log. When that time window is filled, all Prometheus has to do is finalize that write-ahead log and open a new one. It's also easier to create files that respect the data alignment for the particular disk technology in use and easier to delete old data.
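The flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual Prometheus 2.0 storage engine: the `TinyTSDB` class, its method names, the file naming scheme, and the JSON-lines log format are all hypothetical, and "finalizing" the log is modeled as a simple rename, whereas a real engine would also build indexes and sync data to disk.

```python
import json
import os
import tempfile


class TinyTSDB:
    """Toy time-partitioned store; every name here is hypothetical."""

    def __init__(self, directory, window):
        self.directory = directory
        self.window = window      # width of one time window, in seconds
        self.head = {}            # in-memory table: series -> [(ts, value)]
        self.head_start = None    # start timestamp of the current window
        self.wal = None           # open write-ahead log backing the head

    def _path(self, kind, start):
        return os.path.join(self.directory, f"{kind}-{start:012d}.log")

    def append(self, series, ts, value):
        start = ts - ts % self.window
        if self.head_start is not None and start < self.head_start:
            raise ValueError("sample is older than the current window")
        if self.head_start is None or start > self.head_start:
            self._roll(start)
        # Write to the log first, then update the in-memory table.
        self.wal.write(json.dumps([series, ts, value]) + "\n")
        self.wal.flush()
        self.head.setdefault(series, []).append((ts, value))

    def _roll(self, new_start):
        # Finalize the filled window (modeled as a rename) and open a new log.
        if self.wal is not None:
            self.wal.close()
            os.rename(self._path("wal", self.head_start),
                      self._path("block", self.head_start))
        self.head = {}
        self.head_start = new_start
        self.wal = open(self._path("wal", new_start), "w")

    def drop_before(self, cutoff):
        # Retention: delete whole block files that end at or before the cutoff.
        removed = 0
        for name in sorted(os.listdir(self.directory)):
            if name.startswith("block-"):
                start = int(name[len("block-"):-len(".log")])
                if start + self.window <= cutoff:
                    os.remove(os.path.join(self.directory, name))
                    removed += 1
        return removed

    def close(self):
        if self.wal is not None:
            self.wal.close()


with tempfile.TemporaryDirectory() as tmp:
    db = TinyTSDB(tmp, window=60)
    for ts in range(0, 180, 15):          # three 60-second windows of samples
        db.append("cpu_seconds", ts, ts * 0.5)
    files_before = sorted(os.listdir(tmp))
    removed = db.drop_before(cutoff=120)
    files_after = sorted(os.listdir(tmp))
    db.close()

print(files_before)  # two finalized blocks plus the open log
print(removed)       # 2
print(files_after)   # ['wal-000000000120.log']
```

Rolling to a new window and expiring old data both reduce to a handful of whole-file operations, regardless of how many series were written during the window.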

The changes in question, according to Reinartz, are not innovations—they're best-practice concepts borrowed from databases like LevelDB, Cassandra, InfluxDB, or HBase. "The key takeaway is to avoid reinventing an inferior wheel, researching proven methods, and applying them with the right twist," he wrote.

Some preliminary benchmarks using the new storage layer show modest reductions in memory usage, but dramatic drops in both CPU and disk I/O, and far more manageable latency per query as data is added.

Two other big improvements are planned for Prometheus 2.0 and beyond. The first is allowing anyone to build custom remote storage for Prometheus, a feature that is currently experimental and isn't available in the Prometheus 2.0 alpha. The other is to "make Prometheus’s metrics exchange format an IETF standard. There is early work going on around this, but no clear outcome yet."