Microsoft has published details about a new architecture for datacenters that uses reconfigurable hardware -- field-programmable gate arrays (FPGAs) -- to work as a datacenter-wide distributed computing system that can speed up many kinds of operations.
The Configurable Cloud architecture isn't simply theory; Microsoft claims it's running such a setup right now in its own production datacenters to deliver search results faster -- and to pave the way toward speeding up other kinds of operations at scale.
The paper, titled "A Cloud-Scale Acceleration Architecture," was written for the 49th Annual IEEE/ACM International Symposium on Microarchitecture, held this week in Taipei. It details how Microsoft has been using FPGAs in its datacenters as a "compute or network accelerator" system.
The newest iteration of the idea, detailed in the paper, places FPGAs in line with the networking hardware -- what Microsoft calls a "bump in the wire" arrangement. Each FPGA can talk directly to every other FPGA in the datacenter, so the FPGAs become a flexible resource pool that can be partitioned as needed for different jobs. This change also means FPGAs aren't tied to working with any one specific node or CPU, as was the case in Microsoft's earlier implementation of the idea.
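To make the "bump in the wire" idea concrete, here is a toy Python sketch (not Microsoft's implementation; all names are illustrative) of an FPGA that sits inline between a server's NIC and the network, so every packet either passes through untouched or gets transformed on the way:

```python
# Illustrative sketch: a "bump in the wire" FPGA sits on the path between
# the NIC and the network, so every packet can be inspected, transformed,
# or simply passed through. Class and method names are hypothetical.

class BumpInTheWireFPGA:
    """Toy model of an inline FPGA on the NIC-to-network path."""

    def __init__(self, fpga_id, transform=None):
        self.fpga_id = fpga_id
        # The loaded "role" decides what the FPGA does to traffic;
        # None means it behaves as a transparent wire.
        self.transform = transform

    def on_packet(self, packet):
        # Either process the packet (e.g. encrypt it) or forward unchanged.
        return self.transform(packet) if self.transform else packet


# Because each FPGA is reachable over the network itself, any FPGA can
# talk to any other without involving its host CPU.
wire = BumpInTheWireFPGA("rack1-node7", transform=lambda p: p.upper())
print(wire.on_packet("payload"))  # -> "PAYLOAD"
```

The key property the sketch captures is that the FPGA is on the datapath rather than hanging off a single CPU, which is what lets the FPGAs form a pool independent of their hosts.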
Microsoft detailed two major ways the Configurable Cloud architecture delivers speedups. One is compute acceleration, where the FPGAs offload certain kinds of computations from CPUs; Microsoft's main example was speeding up Bing searches by moving some of the most expensive work off the CPU and into the FPGA. The other is network acceleration, offloading the encryption and decryption of network traffic onto the FPGAs.
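The compute-acceleration pattern can be sketched in a few lines of Python. This is purely illustrative (the function names `score_on_fpga` and `score_on_cpu` are hypothetical, not a real API): the CPU keeps the control flow, while the hot inner computation is dispatched to an accelerator when one is available:

```python
# Hypothetical sketch of compute offload: the CPU keeps the control flow
# while the most expensive inner step (here, toy document scoring standing
# in for a search ranking stage) is dispatched to an accelerator.

def score_on_cpu(doc, query):
    # Baseline software path: count query-term occurrences.
    return sum(doc.count(term) for term in query)

def score_on_fpga(doc, query):
    # Stand-in for work handed to the FPGA; in hardware this would be a
    # pipelined scoring circuit, but the result must match the CPU path.
    return score_on_cpu(doc, query)

def rank(docs, query, accelerator_available=True):
    # Pick the fast path when an FPGA is available, else fall back.
    scorer = score_on_fpga if accelerator_available else score_on_cpu
    return sorted(docs, key=lambda d: scorer(d, query), reverse=True)

docs = ["fpga mesh fpga", "cpu only", "fpga"]
print(rank(docs, ["fpga"]))  # -> ['fpga mesh fpga', 'fpga', 'cpu only']
```

The design point is that offload is transparent to callers of `rank`; only the scoring primitive moves between hardware and software.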
FPGAs of the datacenter, unite!
The most ambitious idea Microsoft has in mind for Configurable Cloud involves using this mesh of FPGAs as an elastic compute resource.
Much of the paper describes how Microsoft stitched together such a mesh -- for instance, by creating its own networking protocol with extremely low round-trip latency and by devising specialized crossbar-routing hardware to connect multiple FPGAs.
Microsoft also dropped hints about what else might be possible with an FPGA mesh network -- what the company calls a hardware-as-a-service (HaaS) platform, which "manages FPGAs in a manner similar to YARN and other job schedulers."
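The scheduler analogy works roughly like this sketch, which treats the datacenter's FPGAs as one pool and carves off groups of them per job, much as YARN hands out containers. All the names here are hypothetical, not from Microsoft's paper:

```python
# Sketch of the hardware-as-a-service idea: FPGAs across the datacenter
# form a single pool, and a scheduler grants slices of it to jobs.
# The FPGAPool class and its methods are illustrative only.

class FPGAPool:
    def __init__(self, fpga_ids):
        self.free = set(fpga_ids)
        self.jobs = {}  # job name -> set of FPGAs it currently holds

    def allocate(self, job, count):
        # Grant `count` FPGAs to a job, like a scheduler granting containers.
        if count > len(self.free):
            raise RuntimeError("not enough free FPGAs")
        grant = {self.free.pop() for _ in range(count)}
        self.jobs[job] = grant
        return grant

    def release(self, job):
        # Return a finished job's FPGAs to the shared pool.
        self.free |= self.jobs.pop(job)


pool = FPGAPool({f"fpga-{i}" for i in range(8)})
pool.allocate("search-ranking", 5)
pool.allocate("crypto-offload", 2)
print(len(pool.free))  # -> 1
pool.release("search-ranking")
print(len(pool.free))  # -> 6
```

Because the FPGAs are decoupled from any particular host, the pool can be repartitioned as workloads come and go, which is the elasticity the paper is pointing at.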
Many distributed computations could run on such a setup, but machine learning comes readily to mind -- not only because many of those workloads partition easily across multiple nodes, but because certain portions of those problems can be accelerated directly on FPGAs.
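Why ML partitions well can be shown with a toy pipeline sketch: consecutive layers of a model are assigned to different nodes, and each stage streams its output to the next. This is an illustration of the general idea, not anything from Microsoft's paper; each "layer" is just a scalar multiply standing in for a real matrix operation:

```python
# Illustrative sketch of model partitioning across a mesh of accelerators:
# consecutive layers are grouped into stages, one stage per node.

def make_layer(weight):
    # A stand-in for a real layer (e.g. a matrix multiply) that an FPGA
    # could implement as a pipelined circuit.
    return lambda x: x * weight

def partition(layers, num_nodes):
    # Assign consecutive layers to nodes in equal-sized chunks.
    chunk = -(-len(layers) // num_nodes)  # ceiling division
    return [layers[i:i + chunk] for i in range(0, len(layers), chunk)]

def run_pipeline(stages, x):
    # Each stage would run on its own FPGA, streaming results onward.
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x


layers = [make_layer(w) for w in (2, 3, 5)]
stages = partition(layers, 2)    # e.g. split across two FPGAs
print(run_pipeline(stages, 1))   # -> 30
```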
Microsoft further notes in the paper that its FPGA setup has some advantages over GPUs (among other kinds of hardware) for offloading work. The biggest is that GPUs can't really be linked in a mesh that's independent of the CPUs they're paired with. GPUs can talk to each other by way of technologies like NVLink, but those only enable GPU-to-GPU communication on the same box, not a mesh that spans multiple nodes.
Machine learning? Maybe later
As promising as Configurable Cloud sounds as a way to perform high-speed distributed work, don't expect FPGA-powered machine learning to catch on quickly.
For one thing, the vast majority of machine learning still gets its biggest boost from GPU acceleration, thanks to the huge amount of industry support for GPUs in that line of work. FPGAs are still far more difficult to program for such applications, and new generations of GPUs specifically tailored to accelerate machine learning tasks are already showing up in the marketplace.
That said, the rising interest in turning FPGAs into first-class citizens in datacenters could change that. Intel, for instance, has started experimenting with fusing FPGAs and conventional processors.
The biggest takeaway from Microsoft's vision of an FPGA datacenter mesh is that it's intended to be a complementary technology -- not only for conventional datacenter technologies, but for other acceleration methods. GPUs and FPGAs could work side by side, each contributing its own kinds of speedups.
One possible goal is to provide a consistent way to program all the elements that could be woven into such a fabric -- and to make it something that could work outside Microsoft's datacenters.