IBM adds cutting-edge GPUs to Bluemix on bare metal

The latest generation of Nvidia GPUs for machine learning and number crunching will be offered via IBM Bluemix, but only on bare metal, not on more malleable VM types

Most major cloud vendors have, or are preparing, support for GPU-enabled compute instances. But IBM hopes its next step will keep it ahead of the pack—or at least abreast with it.

IBM plans to make available instances of Nvidia’s current-generation GPU, the Tesla P100, via its Bluemix cloud service. But initially the instances will only be available on bare-metal machines, not via more malleable VM types as offered by some of the competition.

Tesla on metal

The Tesla P100 is regarded as the leader of Nvidia’s GPU pack. It uses the Pascal GPU architecture, which is not only speedier overall than the Kepler-based processors introduced in 2012, but also adds new types of GPU instructions to accelerate certain calculations. Software that takes advantage of the Pascal instruction set, like the Torch deep learning framework, runs even faster.

Other clouds offer Nvidia GPUs as well, although many only offer previous-generation silicon. Google Cloud Platform, for instance, plans to offer Tesla P100s but currently only offers the Kepler-powered Tesla K80 line. Microsoft Azure’s GPU-powered offerings also use the K80, but GPU instances in general are currently available only as a technology preview. And Amazon Web Services' highest-end GPU-powered instance type, the P2, uses the K80 as well.

What IBM offers is both more and less than other clouds. You can attach up to two Tesla P100s to a given Bluemix machine instance. But the P100s will be available only on Bluemix bare metal servers, not on virtual machines—at least initially. IBM may be betting that a GPU-enabled system performs best on a bare-metal server rather than a virtualized instance, but the end result is less flexibility in the specs.

Google Cloud, for instance, has a more flexible design for its GPU offerings: instead of a separate GPU-powered instance type, up to eight GPUs can be attached to or detached from a given virtual machine.
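On Google Cloud, that attach-as-needed model surfaces as a flag on ordinary instance creation. A minimal sketch (the instance name, zone, and machine type below are placeholders, and depending on your Cloud SDK version the command may need to be run through the `beta` component):

```shell
# Create an ordinary VM and attach K80 GPUs to it via --accelerator;
# GPU instances must use a TERMINATE maintenance policy because they
# cannot be live-migrated.
gcloud compute instances create my-gpu-vm \
    --zone us-east1-d \
    --machine-type n1-standard-8 \
    --accelerator type=nvidia-tesla-k80,count=8 \
    --maintenance-policy TERMINATE \
    --restart-on-failure
```

The point of the design is that the GPU count is just another parameter of the VM, rather than a property of a fixed instance family.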

Pascal’s wager

The big reason to offer any hardware configuration in the cloud is convenience, but IBM has other P100-powered projects for those who want complete control over their compute. Earlier this year it unveiled PowerAI, a line of standalone servers that employ the Tesla P100 and are built mainly for running AI applications.

It’s not clear when IBM will provide support for the P100 outside of bare metal instances, but there’s a growing list of reasons to do so. For one, clouds are becoming increasingly synonymous with container-powered workloads, and the latest container technologies are better equipped to run workloads on GPUs. Docker has an Nvidia plugin that allows it, and Google’s container-orchestration project Kubernetes added support for managing GPU-powered workloads last year.
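In Kubernetes, GPU support shows up as a schedulable resource that a pod can request. A hypothetical pod spec along these lines (the `nvidia.com/gpu` resource name assumes a cluster with Nvidia’s device plugin installed; earlier Kubernetes releases exposed GPUs under the `alpha.kubernetes.io/nvidia-gpu` name instead, and the image here is just an illustrative choice):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job
spec:
  containers:
  - name: trainer
    image: nvidia/cuda:8.0-runtime   # placeholder CUDA base image
    resources:
      limits:
        nvidia.com/gpu: 1            # ask the scheduler for one GPU
  restartPolicy: Never
```

The scheduler then places the pod only on a node with a free GPU—exactly the kind of orchestration that favors GPU-enabled VMs over dedicated bare-metal boxes.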

IBM is already offering Kubernetes as a cloud service, so the next logical step seems to be enabling top-of-the-line GPU support for workloads running somewhere other than bare metal—all the more so since Google already has a leg up on IBM.

Copyright © 2017 IDG Communications, Inc.