As Docker containers come into wider use, their shortcomings also become clearer. How, for instance, do you migrate a running container to another server and preserve its data in the process? Typically, you don't.
ClusterHQ, a startup founded in part by core contributors to the Python Twisted network engine, has proposed a solution. Flocker, an open source (Apache) data volume manager for Dockerized applications that's now in its 1.0 release, allows volumes of data (aka datasets) to be associated with containers and moved with them.
Keeping it all together
Flocker bundles containers and datasets, ensuring they move together whenever a Dockerized application is shuttled between hosts on a given cluster. The one limitation is that storage for the data has to be provided by a shared storage back end accessible to all the nodes in the cluster.
Only a few types of storage back ends, mostly cloud-oriented, are supported right now: Amazon EBS, Rackspace Cloud Block Storage, and EMC ScaleIO. ZFS-based storage is also supported, albeit only via a back end that's currently experimental.
"Anything you'd use VMware vMotion for," said Mark Davis, CEO of ClusterHQ, "are the same reasons you might want to move a container around. And if a container has data in it, you need something like Flocker."
That said, one vaunted feature of vMotion -- live migration of running apps -- isn't quite there yet in Flocker. Its migrations are "minimal downtime," rather than zero downtime, meaning there is a small window of unavailability during the migration process. Luke Marsden, CTO and co-founder of ClusterHQ, stated in a phone call that the downtime "depends on the speed with which the back end can have a volume detached from one VM and attached to another VM. But we're very interested in minimizing that downtime."
ClusterHQ already has experimental features in the works to speed up the process by way of volume snapshots, although that approach is viable only if the back end supports snapshots.
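The migration sequence described above can be pictured with a small Python model. This is only a sketch under stated assumptions, not Flocker's API: the `Node` class, the `migrate` function, and the detach and attach timings are all hypothetical, chosen to show why the downtime window tracks how fast the back end can hand a volume from one host to another.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A host in the cluster (hypothetical model, not a Flocker type)."""
    name: str
    containers: set = field(default_factory=set)
    volumes: set = field(default_factory=set)


def migrate(container, volume, source, target, detach_secs, attach_secs):
    """Move a container and its volume together; return the downtime in seconds."""
    if container not in source.containers or volume not in source.volumes:
        raise ValueError("container and volume must both be on the source node")

    # 1. Stop the container; the application is unavailable from here on.
    source.containers.remove(container)

    # 2. Detach the volume from the source and attach it to the target.
    #    The data stays on the shared back end; only the attachment moves.
    source.volumes.remove(volume)
    target.volumes.add(volume)

    # 3. Start the container on the target, next to its data.
    target.containers.add(container)

    # In this toy model, downtime is just the volume handoff time.
    return detach_secs + attach_secs


# Illustrative values only; real timings depend on the storage back end.
node_a = Node("node-a", {"db"}, {"db-data"})
node_b = Node("node-b")
downtime = migrate("db", "db-data", node_a, node_b, detach_secs=4.0, attach_secs=6.0)
print(f"{node_b.name} now runs {node_b.containers} with {node_b.volumes}")
print(f"downtime: {downtime:.1f}s")
```

In this model the container never runs on a node that lacks its volume, which is the guarantee the article attributes to Flocker, and shrinking the detach and attach times is the only way to shrink the outage.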
Docker's missing pieces
Docker has traditionally worked with data by way of data volumes, but they come with their own limitations. Manually copying data between containers hasn't been simple (a problem reportedly addressed in Docker 1.7), but the biggest obstacle remains the poor state of management for data shared by Docker containers running in different locations.
One current proposal for Docker involves making a new type of storage available to containers, with third parties able to provide drivers for their own storage types. If such a feature were implemented, it wouldn't be difficult for ClusterHQ to rework its support through its dataset back-end plug-in architecture -- and stay a step ahead of whatever functionality rolls into Docker's own core over time.