It's a conundrum: You've got deep learning software, which benefits greatly from GPU acceleration, wrapped up in a Docker container and ready to go across thousands of nodes. But wait -- apps in Docker containers can't access the GPU because they're, well, containerized.
Well, now they can.
Nvidia, developer of the CUDA standard for GPU-accelerated programming, is releasing a plugin for the Docker ecosystem that makes GPU-accelerated computing possible in containers. With the plugin, applications running in a Docker container get controlled access to the GPU on the underlying hardware via Docker's own plugin system.
Plug me right in
As Nvidia notes in a blog post, one of the early ways developers tried to work around the problem was to install Nvidia's GPU drivers inside the container and map them to the drivers on the outside. Clever as this solution was, it didn't work very well because the drivers on the inside and the outside had to be the exact same build. "This requirement drastically reduced the portability of these early containers, undermining one of Docker’s more important features," said Nvidia.
Nvidia's new approach -- an open source Docker plugin named nvidia-docker -- provides a set of driver-agnostic CUDA images for a container's contents, along with a command-line wrapper that mounts the user-mode components of CUDA when the container is launched. The Docker images that use the GPU have to be built against Nvidia's CUDA toolkit, but Nvidia provides those in Docker containers as well. Nvidia even provides an Ansible role for provisioning the pieces automatically.
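In practice, launching a CUDA container through the wrapper looks something like the sketch below (a rough illustration; exact image tags and CLI details may vary between releases):

```shell
# Run Nvidia's CUDA base image and query the GPU from inside the
# container; the nvidia-docker wrapper mounts the user-mode driver
# components from the host automatically at launch.
nvidia-docker run --rm nvidia/cuda nvidia-smi
```

Because the driver bits are mounted at run time rather than baked into the image, the same image can move between hosts running different driver builds.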
By default, CUDA-enabled containers use all the available GPUs, but nvidia-docker provides ways to restrict an app to specific GPUs. This comes in handy if you've built a system with an array of GPUs and want to assign specific processors to specific jobs. It also gives cloud providers a native way to limit the number of GPUs a container can use, which will matter as GPU access becomes a standard feature of container hosting in the cloud.
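For instance, GPU selection is handled through an environment variable read by the wrapper (a sketch based on nvidia-docker's documented conventions around its 1.0 release; check the current docs for the exact syntax):

```shell
# Expose only GPUs 0 and 1 to this container; any other GPUs on
# the host remain invisible to the containerized application.
NV_GPU=0,1 nvidia-docker run --rm nvidia/cuda nvidia-smi
```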
CUDA and its discontents
A small number of machine learning projects have already started offering Dockerfiles of their applications outfitted with Nvidia CUDA support, in advance of the plugin's 1.0 release. Many of these packages are familiar to machine learning users: Google's TensorFlow, Microsoft's CNTK, and longtime industry-standard projects Caffe and Theano.
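A GPU-enabled Dockerfile for one of these frameworks typically starts from one of Nvidia's CUDA base images. The sketch below is illustrative only; the image tag and the install step are assumptions, not taken from any particular project's Dockerfile:

```dockerfile
# Start from Nvidia's CUDA development image so the toolkit and
# libraries are already in place (tag shown is hypothetical).
FROM nvidia/cuda:7.5-devel

# Layer the framework's GPU-enabled build on top (package name
# shown for illustration).
RUN apt-get update && apt-get install -y python-pip
RUN pip install tensorflow-gpu
```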
The biggest drawback of nvidia-docker is its dependence on CUDA, a proprietary standard, though the overwhelming majority of GPU-accelerated computing today is done with CUDA. Longtime Nvidia competitor AMD has proposed and promoted its own GPUOpen standard, which is intended not only to provide an open source set of methodologies for GPU-based computing but also to make it possible to write software that runs on both CPUs and GPUs simply by recompiling the same source.
Right now there don't appear to be any GPUOpen efforts that involve Docker. Given GPUOpen's open source bent, it might behoove AMD to create something similar for its own toolchain.