PyTorch review: A deep learning framework built for speed

PyTorch 1.0 shines for rapid prototyping with dynamic neural networks, auto-differentiation, deep Python integration, and strong support for GPUs

Become An Insider

Sign up now and get FREE access to hundreds of Insider articles, guides, reviews, interviews, blogs, and other premium content. Learn more.
At a Glance

Deep learning is an important part of the business of Google, Amazon, Microsoft, and Facebook, as well as countless smaller companies. It has been responsible for many of the recent advances in areas such as automatic language translation, image classification, and conversational interfaces.

We haven’t gotten to the point where there is a single dominant deep learning framework. TensorFlow (Google) is very good, but has been hard to learn and use. Also TensorFlow’s dataflow graphs have been difficult to debug, which is why the TensorFlow project has been working on eager execution and the TensorFlow debugger. TensorFlow used to lack a decent high-level API for creating models; now it has three of them, including a bespoke version of Keras.

CNTK (Microsoft) and Apache MXNet (Amazon) have been the principal competitors to TensorFlow, but there are other framework lineages to consider. Caffe (Berkeley Artificial Intelligence Research Lab), originally for image classification, was expanded and updated to Caffe2 (Facebook and others) and given strong production capabilities. Torch (Facebook, Twitter, Google, and others) uses Lua scripting and a CUDA (Compute Unified Device Architecture) C/C++ back end to efficiently solve problems in machine learning, computer vision, signal processing, and other fields. Despite its strengths as a scripting language, Lua became a liability to Torch when the bulk of the deep learning community adopted Python.

CUDA is Nvidia’s API for its general purpose GPUs. GPUs are much faster than CPUs for training and making predictions from deep neural networks; so are Google’s TPUs (tensor processing units) and FPGAs (field programmable gate arrays), which are available for use on AWS, Microsoft Azure, and elsewhere. In some cases, the use of advanced chips (GPUs, TPUs, or FPGAs) can speed up computations over CPUs by 50x per chip used, reducing training times from weeks to hours or from hours to minutes.

To continue reading this article register now