How to run stateful applications on Kubernetes

Take advantage of Portworx PX-Enterprise to simplify management of data-rich workloads on Kubernetes

Because PX-Enterprise runs as a Pod, it can be installed directly via a container scheduler, like Kubernetes, using the standard kubectl apply -f "[configuration spec]" command. Portworx has a spec generator that customers can use to automatically generate the configuration based on their own environment.

Figure: The Portworx spec generator for Kubernetes (image: Portworx)
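
As a rough illustration of that install step, the Python sketch below wraps the standard kubectl apply -f command. The spec name px-spec.yaml is hypothetical and stands in for whatever file or URL the spec generator produces. The dry_run switch belongs to this sketch, not to kubectl. By default it only prints the command, so running it never touches a cluster.

```python
from __future__ import annotations

import shutil
import subprocess


def kubectl_apply_command(spec: str) -> list[str]:
    """Build the standard `kubectl apply -f <spec>` invocation.

    `spec` may be a local file path or a URL; kubectl accepts both.
    """
    return ["kubectl", "apply", "-f", spec]


def install_portworx(spec: str, dry_run: bool = True) -> list[str]:
    """Apply a generated Portworx spec to the current kubectl context.

    dry_run is a parameter of this sketch (not a kubectl flag): when True,
    the command is only printed and nothing is sent to a cluster.
    """
    cmd = kubectl_apply_command(spec)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    if shutil.which("kubectl") is None:
        raise RuntimeError("kubectl was not found on PATH")
    # check=True raises CalledProcessError if kubectl exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical spec file name; substitute the spec generator's output.
    install_portworx("px-spec.yaml")
```

Run as a script, it prints `kubectl apply -f px-spec.yaml`. Passing dry_run=False actually applies the spec through whatever cluster the local kubectl context points at.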


Because Portworx is a software-only storage and data management solution, it must be able to run in any environment on any hardware. Our customers run in the public cloud, on premises, and in hybrid deployments, and Portworx supports all of these configurations because enterprises require this flexibility.


Historically, storage products focused on providing storage capacity and performance (such as bandwidth and IOPS) from a centralized set of storage hardware, without getting involved in understanding the application. Container schedulers radically change the demands on storage.

We now need to understand how the data for dozens to thousands of application Pods should be prioritized, managed, and snapshotted, all on a Kubernetes-based infrastructure. The solution also needs to provide automation and data protection to be usable in production. One challenge is that a microservices architecture encourages applications to operate independently in some cases and as a group in others.

As an example of operating independently, a relational database like PostgreSQL or MySQL will often be deployed as a single application Pod. Before upgrading the database version, teams need to take a snapshot and a backup so that a failsafe exists. One concern is how to make these operations fast, safe, and automatable.

Portworx handles this by making sure the application flushes its in-memory contents to the data volume before the snapshot is taken. (See left panel below.)

Figure: MySQL snapshots with Portworx on Kubernetes (image: Portworx)

Without such application-level tooling, a snapshot of a data volume is only crash-consistent, not application-consistent. This means that the application (MySQL in this example) must run its recovery steps when it is restored from that snapshot, and admins often must verify the data manually before allowing the app to serve workloads again. All of this takes more time and keeps us from realizing the automation that we seek from schedulers.
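
The difference between the two kinds of snapshot can be illustrated with a toy Python model. Volume and Database below are hypothetical stand-ins, not Portworx or MySQL APIs. The database buffers writes in memory, so a snapshot of the raw volume misses them. The application-consistent path pauses writes, flushes the buffer, takes the snapshot, and then resumes. A real crash-consistent snapshot is usually messier (it can capture half-written pages that need recovery), and this sketch does not model that.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical data volume: only data written here is captured by a snapshot."""
    blocks: list[str] = field(default_factory=list)

    def snapshot(self) -> tuple[str, ...]:
        # An immutable point-in-time copy of what is on the volume right now.
        return tuple(self.blocks)


@dataclass
class Database:
    """Toy stand-in for MySQL: writes land in an in-memory buffer first."""
    volume: Volume
    buffer: list[str] = field(default_factory=list)
    frozen: bool = False

    def write(self, record: str) -> None:
        if self.frozen:
            raise RuntimeError("writes are paused while a snapshot is taken")
        self.buffer.append(record)

    def flush(self) -> None:
        # Persist every buffered write to the data volume.
        self.volume.blocks.extend(self.buffer)
        self.buffer.clear()


def crash_consistent_snapshot(db: Database) -> tuple[str, ...]:
    # Snapshot the volume as-is: anything still buffered in memory is missing.
    return db.volume.snapshot()


def app_consistent_snapshot(db: Database) -> tuple[str, ...]:
    # Pause writes, flush memory to the volume, snapshot, then resume.
    db.frozen = True
    try:
        db.flush()
        return db.volume.snapshot()
    finally:
        db.frozen = False


if __name__ == "__main__":
    db = Database(Volume())
    db.write("row-1")
    db.write("row-2")
    print(crash_consistent_snapshot(db))  # ()
    print(app_consistent_snapshot(db))    # ('row-1', 'row-2')
```

The try/finally ensures that writes resume even if the flush or snapshot fails, so a failed backup attempt cannot leave the database stuck.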

In other cases, scale-out applications like Cassandra run Pods across many servers. Together, the Pods form a single Cassandra ring and work together to provide higher throughput. It becomes important in these scale-out cases to be able to handle all of the Cassandra data volumes as a single group. Acting on each volume independently would otherwise introduce unwanted rebalancing that reduces predictability in production.

In this scale-out case, the steps start the same way (flushing in-memory data to the volumes) but end with a snapshot of all the data volumes taken as a group, as shown below.

Figure: Cassandra group snapshots with Portworx on Kubernetes (image: Portworx)
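
A similar toy model extends the idea to the scale-out case. CassandraNode and group_snapshot are hypothetical names used only for illustration. Every node in the ring is paused first. Each node then flushes its in-memory writes (in real Cassandra, nodetool flush performs this memtable-to-disk step). The volumes are captured while all nodes are still paused, so the snapshots describe the same moment across the whole ring. Pausing writes is a simplification of whatever quiescing a production system would actually use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CassandraNode:
    """Hypothetical model of one Cassandra Pod and its data volume."""
    name: str
    memtable: list[str] = field(default_factory=list)  # writes held in memory
    volume: list[str] = field(default_factory=list)    # data persisted on disk
    paused: bool = False

    def write(self, row: str) -> None:
        if self.paused:
            raise RuntimeError(f"{self.name}: writes are paused for a group snapshot")
        self.memtable.append(row)

    def flush(self) -> None:
        # Analogous to `nodetool flush`: move memtable contents onto the volume.
        self.volume.extend(self.memtable)
        self.memtable.clear()


def group_snapshot(ring: list[CassandraNode]) -> dict[str, tuple[str, ...]]:
    """Snapshot every node's volume as one group at a single point in time."""
    # 1. Quiesce the whole ring before touching any volume.
    for node in ring:
        node.paused = True
    try:
        # 2. Flush each node so its volume holds every write accepted so far.
        for node in ring:
            node.flush()
        # 3. Capture all volumes while every node is still paused.
        return {node.name: tuple(node.volume) for node in ring}
    finally:
        # 4. Resume writes on every node, even if a step above failed.
        for node in ring:
            node.paused = False


if __name__ == "__main__":
    ring = [CassandraNode(f"cassandra-{i}") for i in range(3)]
    ring[0].write("a")
    ring[1].write("b")
    print(group_snapshot(ring))
    # {'cassandra-0': ('a',), 'cassandra-1': ('b',), 'cassandra-2': ()}
```

Snapshotting the nodes one at a time, each with its own pause and resume, would let writes land on some nodes between snapshots. Pausing the whole ring around a single capture step avoids that skew.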

Unlike legacy storage approaches, this distributed set of operations represents new data management functionality: it must distinguish between use cases (MySQL, Cassandra) while running on shared infrastructure, just as Kubernetes does. At the same time, the experience needs to be integrated with Kubernetes to provide both automation and the intended data protection. Portworx provides this functionality, with a Kubernetes-native experience, by integrating through Kubernetes scheduler extensions and a set of storage custom resources.

Multi-cloud and hybrid-cloud ready

Because Portworx installs itself as a Pod, can be managed by Kubernetes, deploys on almost any hardware, and is application-aware, it is a natural fit for multi-cloud and hybrid-cloud workloads. The key to multi-cloud operations for stateful services is overcoming data gravity: stateless components such as load balancers and app containers are trivial to “move,” while stateful components such as data volumes are difficult to move because data has mass (figuratively).

Portworx overcomes data gravity, in part, by giving users the ability to snapshot application data, with full application consistency, even across multiple nodes, and to move that data to a secondary environment along with its configuration. With the ability to move data and configuration, Portworx supports multi-environment workloads such as burst-to-cloud, blue-green deployments of stateful applications, and copy-data management for the purposes of reproducibility and debugging, as well as more traditional backup and recovery.
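
To make "moving data along with its configuration" concrete, here is a minimal Python sketch that rests on clearly stated assumptions. Cluster, MigrationBundle, export_app, and import_app are hypothetical names. The application's configuration is reduced to a dictionary of manifest text, and the volume snapshots are assumed to be application-consistent already. The point it illustrates is that snapshots and configuration travel as one unit, so the secondary environment can start the same application against the same data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MigrationBundle:
    """Hypothetical unit of transfer: consistent snapshots plus the app's config."""
    app: str
    config: dict[str, str]               # e.g. manifest text keyed by object kind
    volumes: dict[str, tuple[str, ...]]  # snapshot contents keyed by volume name


@dataclass
class Cluster:
    """Toy model of one environment (on premises, or a public cloud)."""
    name: str
    configs: dict[str, dict[str, str]] = field(default_factory=dict)
    volumes: dict[str, list[str]] = field(default_factory=dict)


def export_app(src: Cluster, app: str, volume_names: list[str]) -> MigrationBundle:
    # Copy the named volumes and the app's configuration together.
    return MigrationBundle(
        app=app,
        config=dict(src.configs[app]),
        volumes={v: tuple(src.volumes[v]) for v in volume_names},
    )


def import_app(dst: Cluster, bundle: MigrationBundle) -> None:
    # Restore the data first, then the configuration that refers to it.
    for name, contents in bundle.volumes.items():
        dst.volumes[name] = list(contents)
    dst.configs[bundle.app] = dict(bundle.config)


if __name__ == "__main__":
    on_prem = Cluster(
        "on-prem",
        configs={"orders-db": {"StatefulSet": "replicas: 1"}},
        volumes={"orders-data": ["row-1", "row-2"]},
    )
    cloud = Cluster("cloud")
    import_app(cloud, export_app(on_prem, "orders-db", ["orders-data"]))
    print(cloud.volumes)  # {'orders-data': ['row-1', 'row-2']}
```

Because the bundle holds copies rather than references, later writes in the source environment do not leak into the copy. That isolation is what makes the same pattern usable for blue-green deployments or for reproducing a bug against a fixed data set.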

Data is as important as ever. If containers are to become as popular in the enterprise as VMs have been in the previous decade, then a solid storage and data management solution will be a requirement. Just as I couldn’t imagine a world in which VMware couldn’t run a database, I can’t imagine a world in which databases and other stateful services don’t run on Kubernetes. But containers, which are more dynamic and numerous than VMs by an order of magnitude, create problems for stateful services that traditional storage and data management solutions don’t solve. I’m excited to be working at Portworx to tackle these problems head-on. It’s an important mission.

Eric Han is vice president of product management at Portworx, the cloud-native storage company. He previously worked at Google, where he was the first product manager for Kubernetes.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to

Copyright © 2019 IDG Communications, Inc.
