MXNet review: Amazon's scalable deep learning

Amazon’s favorite deep learning framework scales across multiple GPUs and hosts, but it’s rough around the edges

Deep learning, which is basically neural network machine learning with multiple hidden layers, is all the rage—both for problems that justify the complexity and high computational cost of deep learning, such as image recognition and natural language parsing, and for problems that might be better served by careful data preparation and simple algorithms, such as forecasting the next quarter’s sales. If you actually need deep learning, there are many packages that could serve your needs: Google TensorFlow, Microsoft Cognitive Toolkit, Caffe, Theano, Torch, and MXNet, for starters.

I confess that I had never heard of MXNet (pronounced “mix-net”) before Amazon CTO Werner Vogels noted it in his blog. There he announced that in addition to supporting all of the deep learning packages I mentioned above, Amazon decided to contribute significantly to one in particular, MXNet, which it selected as its deep learning framework of choice. Vogels went on to explain why: MXNet combines the ability to scale to multiple GPUs (across multiple hosts) with good programmability and good portability.

MXNet originated at Carnegie Mellon University and the University of Washington. It is now developed by collaborators from multiple universities and many companies, including the likes of Amazon, Baidu, Intel, Microsoft, Nvidia, and Wolfram. MXNet allows you to mix symbolic programming (declaration of the computation graph) and imperative programming (straight tensor operations) to maximize both efficiency and productivity.
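To make the symbolic/imperative distinction concrete, here is a toy sketch in plain Python. It deliberately avoids MXNet itself: the `Node` class, the `variable` helper, and the `scale` method are hypothetical names invented for illustration, and the lists stand in for tensors. The imperative function computes each step immediately, while the symbolic version first declares a graph and only does arithmetic when inputs are bound to it.

```python
# Toy illustration only: these names are hypothetical and are NOT MXNet's API.

# Imperative style: each operation runs immediately on concrete values.
def imperative_example(a, b):
    c = [x + y for x, y in zip(a, b)]   # computed right away
    d = [x * 2 for x in c]              # computed right away
    return d


# Symbolic style: first declare a computation graph, then evaluate it later.
class Node:
    """A node in a deferred computation graph."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "var", "add", or "scale"
        self.inputs = inputs  # upstream Node objects
        self.name = name      # only used by "var" nodes
        self.factor = None    # only used by "scale" nodes

    def __add__(self, other):
        return Node("add", (self, other))

    def scale(self, factor):
        node = Node("scale", (self,))
        node.factor = factor
        return node

    def evaluate(self, bindings):
        """Walk the graph and compute a result from the bound inputs."""
        if self.op == "var":
            return bindings[self.name]
        if self.op == "add":
            left, right = (n.evaluate(bindings) for n in self.inputs)
            return [x + y for x, y in zip(left, right)]
        if self.op == "scale":
            (value,) = (n.evaluate(bindings) for n in self.inputs)
            return [x * self.factor for x in value]
        raise ValueError(f"unknown op: {self.op}")


def variable(name):
    return Node("var", name=name)


# Declaring the graph performs no arithmetic yet.
graph = (variable("a") + variable("b")).scale(2)

imperative_result = imperative_example([1, 2, 3], [10, 20, 30])
symbolic_result = graph.evaluate({"a": [1, 2, 3], "b": [10, 20, 30]})
print(imperative_result)  # [22, 44, 66]
print(symbolic_result)    # [22, 44, 66]
```

Because the symbolic graph exists as a data structure before anything runs, a framework has the chance to optimize it as a whole; the imperative version is easier to write and debug step by step. That trade-off is what mixing the two styles is meant to balance.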

The MXNet platform is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly, although you have to tell it which GPU and CPU cores to use. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient.
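The idea behind dependency-based scheduling can be sketched in a few lines of plain Python. This is a toy, not MXNet's engine: `run_with_dependencies` and the task table are hypothetical, and the sketch launches work in simple "waves" of ready tasks on a thread pool, whereas a truly dynamic scheduler would dispatch each operation the moment its inputs are available.

```python
# Toy illustration only: a hypothetical scheduler, NOT MXNet's engine or API.
from concurrent.futures import ThreadPoolExecutor


def run_with_dependencies(tasks, max_workers=4):
    """Run tasks in parallel once everything they depend on has finished.

    tasks maps a name to (function, list of dependency names); each function
    receives the results of its dependencies as positional arguments.
    """
    results = {}
    waves = []  # records which tasks were launched together
    pending = dict(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A task is ready when all of its dependencies have results.
            ready = sorted(
                name for name, (_, deps) in pending.items()
                if all(d in results for d in deps)
            )
            if not ready:
                raise ValueError("cyclic or missing dependency")
            # Submit every ready task before waiting on any of them.
            futures = {
                name: pool.submit(pending[name][0],
                                  *(results[d] for d in pending[name][1]))
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
            waves.append(ready)
    return results, waves


tasks = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "c": (lambda a, b: a * b, ["a", "b"]),  # waits for a and b
    "d": (lambda a: a + 1, ["a"]),          # waits only for a
}
results, waves = run_with_dependencies(tasks)
print(results["c"], results["d"])  # 6 3
print(waves)                       # [['a', 'b'], ['c', 'd']]
```

Here `a` and `b` have no dependencies, so they are submitted together; `c` and `d` become eligible only after their inputs exist. The caller never spells out the execution order, only the dependencies, which is the property that lets a scheduler find parallelism automatically.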

MXNet currently supports building and training models in Python, R, Scala, Julia, and C++; trained MXNet models can also be used for prediction in MATLAB and JavaScript. No matter what language you use for building your model, MXNet calls an optimized C++ back-end engine.

An overview of the MXNet architecture: NDArrays are representations of tensors. The KVStore is a distributed key-value store for data synchronization over multiple devices.