The best machine learning and deep learning libraries

TensorFlow, Spark MLlib, Scikit-learn, PyTorch, MXNet, and Keras shine for building and training machine learning and deep learning models


If you’re starting a new machine learning or deep learning project, you may be confused about which framework to choose. As we’ll discuss, there are several good options for both kinds of projects.

There is a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and may or may not include neural network methods.

A deep learning or deep neural network framework covers a variety of neural network topologies with many hidden layers. Keras, MXNet, PyTorch, and TensorFlow are deep learning frameworks. Scikit-learn and Spark MLlib are machine learning frameworks.

In general, deep neural network computations run much faster on a GPU (specifically an Nvidia CUDA general-purpose GPU), TPU, or FPGA than on a CPU. Simpler machine learning methods, by contrast, generally don’t benefit from a GPU.

While you can train deep neural networks on one or more CPUs, the training tends to be slow, and by slow I’m not talking about seconds or minutes. The more neurons and layers that need to be trained, and the more data available for training, the longer it takes. When the Google Brain team trained its language translation models for the new version of Google Translate in 2016, they ran their training sessions for a week at a time, on multiple GPUs. Without GPUs, each model training experiment would have taken months.

Since then, the Intel Math Kernel Library (MKL) has made it possible to train some neural networks on CPUs in a reasonable amount of time. Meanwhile GPUs, TPUs, and FPGAs have gotten even faster.

The training speed of all of the deep learning packages running on the same GPUs is nearly identical. That’s because the training inner loops spend most of their time in the Nvidia cuDNN package.

Apart from training speed, each of the deep learning libraries has its own set of pros and cons, and the same is true of Scikit-learn and Spark MLlib. Let’s dive in.

Keras


Keras is a high-level, front-end specification and implementation for building neural network models that ships with support for three back-end deep learning frameworks: TensorFlow, CNTK, and Theano. Amazon is currently working on developing an MXNet back-end for Keras. It’s also possible to use PlaidML (an independent project) as a back-end for Keras to take advantage of PlaidML’s OpenCL support for all GPUs.

TensorFlow is the default back-end for Keras, and the one recommended for many use cases involving GPU acceleration on Nvidia hardware via CUDA and cuDNN, as well as for TPU acceleration in Google Cloud. TensorFlow also contains an internal tf.keras module, separate from an external Keras installation.

Keras has a high-level environment that makes adding a layer to a neural network as easy as one line of code in its Sequential model, and requires only one function call each for compiling and training a model. Keras lets you work at a lower level if you want, with its Model or functional API.
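To give a feel for how compact that is, here is a minimal sketch, assuming TensorFlow 2.x and its bundled tf.keras. The synthetic data, the layer sizes, and the training settings are my own illustrative choices, not anything Keras prescribes.

```python
import numpy as np
from tensorflow import keras

# Synthetic data (an assumption for illustration): 200 samples, 8 features,
# and a binary label that is 1 when the features sum to a positive number.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 8)).astype("float32")
y = (x.sum(axis=1) > 0).astype("float32")

# Sequential model: each layer is added with a single line.
model = keras.Sequential()
model.add(keras.Input(shape=(8,)))
model.add(keras.layers.Dense(16, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))

# One call to compile the model, one call to train it.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(x, y, epochs=5, batch_size=32, verbose=0)

# Predictions come back as one probability per input row.
predictions = model.predict(x[:3], verbose=0)
```

The returned history object records the loss and metrics per epoch, which is usually the first thing to inspect when deciding whether the model is learning at all.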

Keras allows you to drop down even further, to the Python coding level, by subclassing keras.Model, but recommends the functional API when possible. Keras also has a Scikit-learn API, so that you can use the Scikit-learn grid search to perform hyperparameter optimization in Keras models.

Cost: Free open source. 

Platform: Linux, MacOS, Windows, or Raspbian; TensorFlow, Theano, or CNTK back-end. 

Read my review of Keras

MXNet


MXNet has evolved and improved quite a bit since moving under the Apache Software Foundation umbrella early in 2017. While there has been work on Keras with an MXNet back-end, a different high-level interface has become much more important: Gluon. Prior to the incorporation of Gluon, you could either write easy imperative code or fast symbolic code in MXNet, but not both at once. With Gluon, you can combine the best of both worlds, in a way that competes with both Keras and PyTorch.

The advantages claimed for Gluon include:

  • Simple, easy-to-understand code: Gluon offers a full set of plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers.
  • Flexible, imperative structure: Gluon does not require the neural network model to be rigidly defined, but rather brings the training algorithm and model closer together to provide flexibility in the development process.
  • Dynamic graphs: Gluon enables developers to define neural network models that are dynamic, meaning they can be built on the fly, with any structure, and using any of Python’s native control flow.
  • High performance: Gluon provides all of the above benefits without impacting the training speed that the underlying engine provides.

These four advantages, along with a vastly expanded collection of model examples, bring Gluon/MXNet to rough parity with Keras/TensorFlow and PyTorch for ease of development and training speed. You can see code examples for each of these on the main Gluon page, repeated on the overview page for the Gluon API.

The Gluon API includes functionality for neural network layers, recurrent neural networks, loss functions, dataset methods and vision datasets, a model zoo, and a set of contributed experimental neural network methods. You can freely combine Gluon with standard MXNet and NumPy modules, for example module, autograd, and ndarray, as well as with Python control flows.

Gluon has a good selection of layers for building models, including basic layers (Dense, Dropout, etc.), convolutional layers, pooling layers, and activation layers. Each of these is a one-line call. These can be used, among other places, inside of network containers such as gluon.nn.Sequential().

Cost: Free open source. 

Platform: Linux, MacOS, Windows, Docker, Raspbian, and Nvidia Jetson; Python, R, Scala, Julia, Perl, C++, and Clojure (experimental). MXNet is included in the AWS Deep Learning AMI.

Read my review of MXNet

PyTorch


PyTorch builds on the old Torch and the new Caffe2 framework. As you might guess from the name, PyTorch uses Python as its scripting language, and it uses an evolved Torch C/CUDA back-end. The production features of Caffe2 are being incorporated into the PyTorch project.

PyTorch is billed as “Tensors and dynamic neural networks in Python with strong GPU acceleration.” What does that mean?

Tensors are a mathematical construct that is used heavily in physics and engineering. A tensor of rank two is a special kind of matrix; taking the inner product of a vector with the tensor yields another vector with a new magnitude and a new direction. TensorFlow takes its name from the way tensors (of synapse weights) flow around its network model. NumPy also uses tensors, but calls them ndarrays.
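The rank-two case is easy to try in NumPy. In this small sketch the tensor values are my own choice, picked so that the product both rotates the vector by 90 degrees and doubles its length.

```python
import numpy as np

# A rank-two tensor stored as a 2x2 ndarray (illustrative values).
T = np.array([[0.0, -2.0],
              [2.0,  0.0]])
v = np.array([1.0, 0.0])   # unit vector along the x axis

# The inner product of the tensor with the vector yields a new vector.
w = T @ v

magnitude_before = np.linalg.norm(v)
magnitude_after = np.linalg.norm(w)
```

Here `w` points along the y axis and is twice as long as `v`, so both the direction and the magnitude have changed.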

GPU acceleration is a given for most modern deep neural network frameworks. A dynamic neural network is one that can change from iteration to iteration, for example allowing a PyTorch model to add and remove hidden layers during training to improve its accuracy and generality. PyTorch recreates the graph on the fly at each iteration step. In contrast, TensorFlow by default creates a single dataflow graph, optimizes the graph code for performance, and then trains the model.

While eager execution mode is a fairly new option in TensorFlow, it’s the only way PyTorch runs: API calls execute when invoked, rather than being added to a graph to be run later. That might seem like it would be less computationally efficient, but PyTorch was designed to work that way, and it is no slouch when it comes to training or prediction speed.
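Here is a minimal sketch of what that looks like in practice. The DynamicNet class, its layer sizes, and the random data are hypothetical names and values of mine; the point is only that the number of hidden-layer applications is decided by plain Python on each iteration.

```python
import torch
from torch import nn


class DynamicNet(nn.Module):
    """Hypothetical model whose depth is chosen at call time."""

    def __init__(self):
        super().__init__()
        self.inp = nn.Linear(4, 8)
        self.hidden = nn.Linear(8, 8)
        self.out = nn.Linear(8, 1)

    def forward(self, x, depth):
        h = torch.relu(self.inp(x))
        # Ordinary Python loop: the graph is rebuilt on every call,
        # so the depth can differ from one iteration to the next.
        for _ in range(depth):
            h = torch.relu(self.hidden(h))
        return self.out(h)


torch.manual_seed(0)
model = DynamicNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(16, 4)   # random inputs, for illustration only
y = torch.randn(16, 1)   # random targets, for illustration only

losses = []
for step in range(3):
    pred = model(x, depth=step + 1)   # executes immediately, no deferred graph
    loss = nn.functional.mse_loss(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Each call to the model runs right away and can be inspected with a normal debugger or a print statement, which is much of the appeal of the eager style.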

PyTorch integrates acceleration libraries such as Intel MKL and Nvidia cuDNN and NCCL (Nvidia Collective Communications Library) to maximize speed. Its core CPU and GPU Tensor and neural network back-ends—TH (Torch), THC (Torch CUDA), THNN (Torch Neural Network), and THCUNN (Torch CUDA Neural Network)—are written as independent libraries with a C99 API. At the same time, PyTorch is not a Python binding into a monolithic C++ framework—the intention is for it to be deeply integrated with Python and to allow the use of other Python libraries.

Cost: Free open source. 

Platform: Linux, MacOS, Windows; CPUs and Nvidia GPUs. 

Read my review of PyTorch

Scikit-learn


The Scikit-learn Python framework has a wide selection of robust machine learning algorithms, but no deep learning. If you’re a Python fan, Scikit-learn may well be the best option for you among the plain machine learning libraries.

Scikit-learn is a robust and well-proven machine learning library for Python with a wide assortment of well-established algorithms and integrated graphics. It is relatively easy to install, learn, and use, and it has good examples and tutorials.

On the con side, Scikit-learn does not cover deep learning or reinforcement learning, lacks graphical models and sequence prediction, and can’t really be used from languages other than Python. It supports neither PyPy (the Python just-in-time compiler) nor GPUs. That said, except for its minor foray into neural networks, it doesn’t really have speed problems. It uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.

Scikit-learn has a good selection of algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It has good documentation and examples for all of these, but lacks any kind of guided workflow for accomplishing these tasks.

Scikit-learn earns top marks for ease of development, mostly because the algorithms all work as documented, the APIs are consistent and well-designed, and there are few “impedance mismatches” between data structures. It’s a pleasure to work with a library whose features have been thoroughly fleshed out and whose bugs have been thoroughly flushed out.
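The consistency is easiest to see when you swap one estimator for another. The sketch below uses the bundled iris dataset; the two models and their settings are arbitrary choices of mine, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two very different models, both driven through the same fit/score calls.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # mean accuracy on held-out data
```

Note that the preprocessing step lives inside the pipeline, so the scaler is fitted only on the training split and the same object can be passed to a grid search unchanged.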

On the other hand, the library does not cover deep learning or reinforcement learning, which leaves out the current hard but important problems, such as accurate image classification and reliable real-time language parsing and translation. Clearly, if you’re interested in deep learning, you should look elsewhere.

Nevertheless, there are many problems—ranging from building a prediction function linking different observations, to classifying observations, to learning the structure of an unlabeled dataset—that lend themselves to plain old machine learning without needing dozens of layers of neurons, and for those areas Scikit-learn is very good indeed.

Cost: Free open source. 

Platform: Requires Python, NumPy, SciPy, and Matplotlib. Releases are available for MacOS, Linux, and Windows.

Read my review of Scikit-learn

Spark MLlib

Spark MLlib, the open source machine learning library for Apache Spark, provides common machine learning algorithms such as classification, regression, clustering, and collaborative filtering (but not deep neural networks). It also includes tools for feature extraction, transformation, dimensionality reduction, and selection; tools for constructing, evaluating, and tuning machine learning pipelines; and utilities for saving and loading algorithms, models, and pipelines, for data handling, and for doing linear algebra and statistics.

Spark MLlib is written in Scala, and uses the linear algebra package Breeze. Breeze depends on netlib-java for optimized numerical processing, although in the open source distribution that means optimized use of the CPU. Databricks offers customized Spark clusters that use GPUs, which can potentially get you another 10x speed improvement for training complex machine learning models with big data.

Spark MLlib implements a truckload of common algorithms and models for classification and regression, to the point where a novice could become confused, but an expert would be likely to find a good choice of model for the data to be analyzed, eventually. To this plethora of models Spark 2.x adds the important feature of hyperparameter tuning, also known as model selection. Hyperparameter tuning allows the analyst to set up a parameter grid, an estimator, and an evaluator, and then let the cross-validation method (time-consuming but accurate) or the train-validation split method (faster but less accurate) find the best model for the data.

Spark MLlib has full APIs for Scala and Java, mostly full APIs for Python, and sketchy partial APIs for R. You can get a good feel for the coverage by counting the samples: 54 Java and 60 Scala machine learning examples, 52 Python machine learning examples, and only five R examples. In my experience Spark MLlib is easiest to work with using Jupyter notebooks, but you can certainly run it in a console if you tame the verbose Spark status messages.

Spark MLlib supplies pretty much anything you’d want in the way of basic machine learning, feature selection, pipelines, and persistence. It does a pretty good job with classification, regression, clustering, and filtering. Given that it is part of Spark, it has great access to databases, streams, and other data sources. On the other hand, Spark MLlib is not really set up to model and train deep neural networks in the same way as TensorFlow, PyTorch, MXNet, and Keras.

Cost: Free open source.

Platform: Spark runs on both Windows and Unix-like systems (e.g. Linux, MacOS), with Java 7 or later, Python 2.6/3.4 or later, and R 3.1 or later. For the Scala API, Spark 2.0.1 uses Scala 2.11. Spark requires Hadoop/HDFS.

Read my review of Spark MLlib
