First came Amazon, offering GPU-powered instances in its cloud back in 2010. Then, more recently, Microsoft Azure and IBM SoftLayer each provided their versions of the same, albeit with different pricing structures and instance types.
Who’s left? Take a guess.
Starting next year, Google will offer GPU instances for both Google Compute Engine and Google Cloud Machine Learning users, with GPU profiles that suit both high-end number-crunching and more modest remote-workstation workloads.
Google’s plan to stand apart from the competition is to be more granular. Amazon’s machine-learning-oriented GPU instances are rented by the hour and come only in fixed instance types. Google, however, plans to allow users to “attach up to 8 GPU dies to any non-shared-core machine,” regardless of instance type.
More critical still, Google’s GPU pricing will follow its existing model: billed by the minute, the same as Google’s VMs. (Azure’s billing is also rounded up to the nearest minute, making Google more directly competitive with that service.)
This isn’t about consistency alone; it also reflects how GPU-powered machine learning is actually used. If a machine learning application needs GPUs only for training, it makes sense to be able to toggle the GPU off when it isn’t needed rather than switching instance types.
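The savings from finer-grained billing are easy to quantify. The sketch below compares a short training job under hourly versus per-minute billing; the $0.90-per-hour GPU rate and the 20-minute runtime are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

def billed_cost(runtime_min, rate_per_hour, granularity_min):
    """Cost of a job when usage is rounded up to the billing granularity."""
    billed_units = math.ceil(runtime_min / granularity_min)
    return billed_units * granularity_min * (rate_per_hour / 60)

# Hypothetical 20-minute GPU training job at a hypothetical $0.90/hour:
hourly = billed_cost(20, 0.90, granularity_min=60)   # rounded up to a full hour
per_minute = billed_cost(20, 0.90, granularity_min=1)

print(f"hourly billing:     ${hourly:.2f}")      # $0.90
print(f"per-minute billing: ${per_minute:.2f}")  # $0.30
```

For short, bursty jobs, hourly rounding triples the cost in this example; the gap shrinks as runtimes approach whole-hour multiples.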
“Whether you need one or dozens of instances,” says Google, “you only pay for what you use.”
Google is also trying to get a leg up on the competition by providing three types of GPU, depending on your needs or application profile. The AMD FirePro S9300 x2 is meant to power remote graphics workstations. For deep learning, Google has two Nvidia Tesla GPUs: the Kepler-generation K80, introduced in 2014, and the new, Pascal-architecture-powered P100.
Amazon’s cloud currently offers only the K80, but the P100 provides better performance with machine learning frameworks designed to take advantage of its newer Pascal architecture. The GPUs in Microsoft Azure are likewise the older K80, partly because it is stable and well-understood.
Google’s pricing structure might make more sense when coupled with so-called serverless (“lambda”) architectures, where the application and the hardware are as decoupled from each other as possible. Machine learning applications lend themselves to implementation as microservices (ingest, train, report), and the serverless architecture was built to elevate microservices to first-class status.
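The ingest/train/report split described above can be sketched as three stateless functions, each a candidate for deployment as a separate serverless function. The stage names and the trivial "model" (just a mean) are illustrative assumptions, not any cloud provider's actual API:

```python
# Each stage is stateless: it takes its input, returns its output, and keeps
# nothing in between -- the property serverless platforms are built around.

def ingest(raw_records):
    """Parse raw string records into numeric features."""
    return [float(r) for r in raw_records]

def train(features):
    """'Train' a trivial stand-in model: here, just the mean of the features."""
    return sum(features) / len(features)

def report(model):
    """Format the trained model parameter for downstream consumers."""
    return f"model parameter: {model:.2f}"

# The stages compose into a pipeline, but each could scale independently:
result = report(train(ingest(["1.0", "2.0", "3.0"])))
print(result)  # model parameter: 2.00
```

Because no stage holds state between calls, each could be billed only for the minutes (or invocations) it actually runs.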
That said, such applications aren’t likely to be built on Google’s systems in great numbers yet. Google’s native lambda architecture, Google Cloud Functions, is still in alpha, but once it becomes production-ready, it’ll have at least one major, GPU-powered use case waiting for it.
[Edited to clarify that Azure has per-minute billing for its services.]