Faster machine learning is coming to the Linux kernel

The addition of heterogeneous memory management to the Linux kernel will unlock new ways to speed up GPUs, and potentially other kinds of machine learning hardware


It's been a long time in the works, but a memory management feature intended to give machine learning or other GPU-powered applications a major performance boost is close to making it into one of the next revisions of the kernel.

Heterogeneous memory management (HMM) allows a device’s driver to mirror a process’s address space under its own memory management. As Red Hat developer Jérôme Glisse explains, this makes it easier for hardware devices like GPUs to access a process’s memory directly, without the extra overhead of copying data. It also doesn't violate the memory protection features afforded by modern OSes.
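To make the mirroring idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the kernel's HMM interface: the class names (ToyProcess, ToyDeviceMirror) and their methods are hypothetical, and a dictionary stands in for a page table. What it shows is that the device works on the same memory the process owns, with no copy, and that the mirror is invalidated as soon as the process drops a mapping.

```python
class ToyProcess:
    """Toy stand-in for a process's page table: virtual address -> page data."""

    def __init__(self):
        self.pages = {}
        self.mirrors = []  # device mirrors to notify when mappings change

    def map_page(self, vaddr, data):
        self.pages[vaddr] = data

    def unmap_page(self, vaddr):
        # Dropping a mapping also invalidates it in every device mirror,
        # so the device cannot keep touching memory the process gave up.
        del self.pages[vaddr]
        for mirror in self.mirrors:
            mirror.invalidate(vaddr)


class ToyDeviceMirror:
    """Toy device-side mirror of a process's mappings (hypothetical names)."""

    def __init__(self, process):
        self.process = process
        self.entries = {}
        process.mirrors.append(self)

    def populate(self, vaddr):
        # Only pages the process has mapped can be mirrored (KeyError otherwise).
        # The mirror stores a reference to the same object, not a copy.
        self.entries[vaddr] = self.process.pages[vaddr]

    def invalidate(self, vaddr):
        self.entries.pop(vaddr, None)

    def write(self, vaddr, offset, value):
        # Raises KeyError if the mapping was never mirrored or was invalidated.
        self.entries[vaddr][offset] = value


proc = ToyProcess()
buf = bytearray(4)
proc.map_page(0x1000, buf)

gpu = ToyDeviceMirror(proc)
gpu.populate(0x1000)
gpu.write(0x1000, 0, 42)   # the "device" writes straight into process memory
```

After the last line, the process sees the new value in buf without any data having been copied back, and calling proc.unmap_page(0x1000) removes the entry from the mirror as well.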

One class of application that stands to benefit most from HMM is GPU-based machine learning. Libraries like OpenCL and CUDA would be able to get a speed boost from HMM. It achieves this in much the same way as other speedups to GPU-based machine learning: by leaving data in place near the GPU, operating on it directly there, and moving it around as little as possible.
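The cost being avoided can be illustrated with a small Python analogy. This does not use a GPU, OpenCL, CUDA, or HMM. The two function names are made up for the example, and a bytearray stands in for process memory. The first function stages data into a separate buffer and copies the result back. The second operates through a memoryview, which shares the underlying memory.

```python
def offload_with_copies(host_data):
    """Copy data to a staging buffer, work on it, copy it back."""
    staged = bytearray(host_data)            # "host to device" copy
    for i in range(len(staged)):
        staged[i] = (staged[i] * 2) % 256
    host_data[:] = staged                    # "device to host" copy
    return 2 * len(host_data)                # bytes moved in total


def offload_in_place(host_data):
    """Work on the data where it already lives, through a shared view."""
    view = memoryview(host_data)             # no copy: same underlying memory
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256
    return 0                                 # no bytes moved


data = bytearray([1, 2, 3])
copied_first = offload_with_copies(data)     # doubles every element
copied_second = offload_in_place(data)       # doubles every element again
```

Both functions produce the same kind of result. The difference is only in how many bytes had to be moved to get it, which is the overhead that grows with the size of a machine learning data set.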

These kinds of speed-ups for CUDA, Nvidia’s library for GPU-based processing, would only benefit operations on Nvidia GPUs, but those GPUs currently constitute the vast majority of the hardware used to accelerate number crunching. OpenCL, however, was devised so that code could target multiple kinds of hardware (CPUs, GPUs, FPGAs, and so on), so HMM could provide much broader benefits as that hardware matures.

There are a few obstacles to getting HMM into a usable state in Linux. First is kernel support, which has been in development for quite some time. HMM was first proposed as a Linux kernel patchset back in 2014, with Red Hat and Nvidia both involved as key developers. The amount of work involved wasn’t trivial, but the developers believe code could be submitted for potential inclusion within the next couple of kernel releases.

The second obstacle is video driver support, which Nvidia has been working on separately. According to Glisse’s notes, AMD GPUs are likely to support HMM as well, so this particular optimization won’t be limited to Nvidia GPUs. AMD has been trying to ramp up its presence in the GPU market, potentially by merging GPU and CPU processing on the same die. However, the software ecosystem still plainly favors Nvidia; there would need to be a few more vendor-neutral projects like HMM, and OpenCL performance on a par with what CUDA can provide, to make real choice possible.

The third obstacle is hardware support, since HMM requires hardware that supports replayable page faults. Only Nvidia’s Pascal line of high-end GPUs supports this feature. In a way that’s good news, since it means Nvidia will only need to provide driver support for one line of hardware, which requires less work on its part, to get HMM up and running.
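A replayable page fault means the device can stall on a missing page, let software fix up the mapping, and then retry the same access instead of aborting. The following Python sketch models only that control flow. PageFault, ToyDevice, and run_with_replay are hypothetical names invented for the example, and nothing here reflects how Pascal hardware or Nvidia's driver actually implements the feature.

```python
class PageFault(Exception):
    """Raised by the toy device when it touches a page it has no mapping for."""

    def __init__(self, vpage):
        super().__init__(f"fault on page {vpage}")
        self.vpage = vpage


class ToyDevice:
    def __init__(self):
        self.mapped = {}   # pages currently visible to the device
        self.faults = 0

    def load(self, vpage, offset):
        if vpage not in self.mapped:
            self.faults += 1
            raise PageFault(vpage)
        return self.mapped[vpage][offset]


def run_with_replay(device, process_pages, vpage, offset):
    """Service a fault from the process's own mappings, then replay the access."""
    while True:
        try:
            return device.load(vpage, offset)
        except PageFault as fault:
            if fault.vpage not in process_pages:
                # The process never mapped this page, so the access is invalid.
                raise MemoryError(f"invalid access to page {fault.vpage}") from None
            # Map the process's page for the device (no copy), then loop to retry.
            device.mapped[fault.vpage] = process_pages[fault.vpage]


process_pages = {7: bytearray(b"abc")}
device = ToyDevice()
first = run_with_replay(device, process_pages, 7, 1)    # faults once, then succeeds
second = run_with_replay(device, process_pages, 7, 2)   # already mapped, no fault
```

Without the ability to replay, the first access would simply fail, and the driver would have to make sure every page was mapped for the device before any work started.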

Once HMM is in place, there will be pressure on public cloud providers with GPU instances to support the latest-and-greatest generation of GPU. That means more than just swapping old-school Nvidia Kepler cards for bleeding-edge Pascal GPUs: as each succeeding generation of GPU pulls further away from the pack, support for optimizations like HMM will provide strategic advantages.