Machine learning couldn’t be hotter, with several heavy hitters offering platforms aimed at seasoned data scientists and newcomers interested in working with neural networks. Among the more popular options is TensorFlow, a machine learning library that Google open-sourced a year ago.

In my recent review of TensorFlow, I described the library and discussed its advantages, but only had about 300 words to devote to how to begin using Google’s “secret sauce” for machine learning. That isn’t enough to get you started.

In this article, I’ll give you a very quick gloss on machine learning, introduce you to the basics of TensorFlow, and walk you through a few TensorFlow models in the area of image classification. Then I’ll point you to additional resources for learning and using TensorFlow.

## Prerequisites

You need a few prerequisites to fully understand the material I’ll cover. First, you should be able to read Python code. If you don’t know how, the book “Learning Python” by Mark Lutz is excellent; for a quicker, free introduction on the web, try Python for Beginners or Learn Python.

Second, you should know something about calculus and basic statistics. Most programmers learn these in college or even high school, but if you’re rusty on any of the concepts I’ll be using, there are plenty of resources on the web, such as Calculus for Beginners and Usable Stats.

It would also help if you understood gradient-based optimization methods. If you don’t, you can treat the optimizers we’ll be using as black boxes.

## Quick intro: Machine learning, neural networks, and deep learning

In traditional programming we explicitly tell the computer what to do with its input data ahead of time, including various program branches that respond to conditions in the data. In machine learning, on the other hand, we give the computer some data, a model for the data, weights and biases for the terms of the model, a function to define the “loss” or “cost” of a model, and an optimization algorithm to “train” the model by adjusting the weights and biases to find the minimum loss.
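To make those ingredients concrete, here is a minimal sketch in plain Python rather than TensorFlow. The toy data, the learning rate, and the function names (`predict`, `loss`, `train`) are all my own assumptions for illustration: the model is a straight line, the loss is mean squared error, and the optimizer is ordinary gradient descent.

```python
# Toy data generated from y = 2x + 1 (hypothetical values, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(w, b, x):
    """The model: a line with weight w and bias b."""
    return w * x + b


def loss(w, b):
    """The loss: mean squared error of the model over the data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, learning_rate=0.05):
    """The optimizer: plain gradient descent on the weight and bias."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(w, b, loss(w, b))
```

With these settings the weight and bias settle very close to 2 and 1, the values used to generate the data, and the loss drops to nearly zero.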

Once the computer finds the best model from training on the initial data, we can use that model to predict values for new data. If the data tends to change over time, we may have to retrain the model periodically to keep it accurate.

We typically divide the initial data into two or three groups: training data, test data, and, optionally, validation data. The data may be continuous (real numbers), in which case we will be solving a regression problem to predict a response, or it may be discrete (integers or class labels), in which case we will be solving a classification problem.
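As a rough sketch of that division, the following plain-Python helper shuffles a data set and cuts it into three groups. The function name `split_dataset` and the 70/15/15 proportions are assumptions chosen for illustration, not a recommendation from any particular library.

```python
import random


def split_dataset(examples, train_frac=0.7, validation_frac=0.15, seed=0):
    """Shuffle the examples, then cut them into training, validation, and test groups."""
    shuffled = list(examples)
    # A seeded generator makes the split repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever is left over
    return train, validation, test


train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before cutting matters when the original data is ordered, for example sorted by label; otherwise one group could end up with examples the others never see.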

An artificial neural network, one of the many ways to implement machine learning, is a model consisting of an interconnected group of nodes, typically with an input layer, an output layer, and one or more hidden layers in between. These days each node is often a “sigmoid” neuron, meaning that its activation function varies smoothly between 0 and 1 in an “S”-shaped curve, which gives more stable behavior than the binary step function of the older “perceptron.”
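The difference between the two activation functions is easy to see numerically. This is a small plain-Python sketch; the function names `step` and `sigmoid` and the sample inputs are my own choices.

```python
import math


def step(z):
    """Perceptron-style binary step: the output jumps from 0 to 1 at z = 0."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Sigmoid neuron: the output varies smoothly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Compare the two activations on inputs around zero.
for z in (-4.0, -0.1, 0.0, 0.1, 4.0):
    print(z, step(z), round(sigmoid(z), 3))
```

A small change in input near zero flips the step function from 0 to 1, while the sigmoid output moves only slightly. That smoothness is what lets small adjustments to weights produce small changes in output during training.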

Deep learning is, at its core, a neural network with multiple hidden layers—that is, a deep neural network. There are many types of deep networks; one of the most commonly used is the deep convolutional network, which works well for image recognition. As you explore TensorFlow you will read about this and other kinds of deep neural networks, such as recurrent neural networks (RNNs), which are handy for speech recognition. There are many ways to minimize the loss in deep neural networks, and we’ll discuss several of them as we try some examples.

You can learn much more about neural networks and deep learning at an introductory level from a free e-book on the subject by Michael Nielsen. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville offers an even more technical overview.

To get a feel for neural networks, try the TensorFlow Playground.

The playground lets you try to solve four classification problems and one regression problem using your own choices of features (the properties used to create your predictive model), neuron activation functions (which define the output of your nodes), and the number of hidden layers and neurons in each layer (which define how deep your network should be). You can also adjust the batch size for each iteration of training data, the ratio of training to test data, the learning rate for training your model, the type of regularization, and the regularization rate. Try various strategies and see how low you can get the loss for each problem and how long each one takes to converge. As you play with methodologies, pay attention to the way your intuition begins to develop.

Once you think you’re getting a feel for neural networks from the Playground (which is not actually based on TensorFlow even though it lives in the TensorFlow repository), it’s time to check out the TensorFlow source code from GitHub.

The README.md file at the bottom of this GitHub page has a good overview and useful links.

I like to clone repos with GitHub Desktop, but any Git client will work, as will any of the other methods suggested.

TensorFlow supports machine learning, neural networks, and deep learning in the larger context of data flow graphs. These graphs describe the computational network for models in a more complicated but more flexible, generalized, and efficient way than the Playground. The code for a TensorFlow solution first loads the data and builds the graph, then establishes a session and runs the training of the model against the data.

As you’ll see when you open your TensorFlow repository in a programming editor or browse the code on GitHub, the core of TensorFlow is implemented in C++ with optional GPU support. It uses a domain-specific compiler for linear algebra (XLA) to JIT-compile subgraphs of TensorFlow computations (data flow graphs). A version of XLA that supports Google Tensor Processing Units (TPUs), which is not open-sourced at this time, uses custom code generation; the open source CPU and GPU back ends currently use LLVM.

Higher layers of TensorFlow and the primary TensorFlow API are implemented in Python. While there is a C++ API, it implements only a subset of what’s supported in the Python API.

As you browse through the TensorFlow repository, pay special attention to the `examples` directory. We’ll be coming back to the examples to understand specific TensorFlow applications.

## Installing TensorFlow

You can install TensorFlow locally or use the cloud. Perhaps the most *powerful* way you can use TensorFlow is to set up a Google Cloud Platform project enabled for the Cloud Machine Learning, Cloud Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Cloud Storage JSON, and BigQuery APIs, then set up an environment for Cloud ML either in a Cloud Shell, in a Docker container (on Mac OS, Linux, or Windows), or locally (on Mac OS or Linux).

But for the purposes of getting started, I think you should install TensorFlow locally first. I’d suggest the `pip` installation if you’re using Mac OS or Linux; for Windows, you have to use one of the Docker images to install locally. When using the Docker image to train models, you should allocate most of your computer’s RAM and cores to Docker, then close Docker to release the resources when you’re done.

During installation, you have a choice of CPU-only or GPU versions. The GPU version will train models much faster on machines with an Nvidia chip, but it’s much harder to install correctly; I’d suggest installing a CPU-only binary at first.

You may run into a permission error uninstalling old versions on the `pip` installation step:

`$ sudo pip install --upgrade $TF_BINARY_URL`

If that happens to you, add the switch to ignore the installed versions:

`$ sudo pip install --upgrade --ignore-installed $TF_BINARY_URL`

Your next step is to validate your installation. I recommend copying the Python code for this line by line; if there’s going to be a problem, it most likely will happen while Python attempts to import `tensorflow`:

`>>> import tensorflow as tf`

When you’re through with the Python session, `exit()` will get you back to the shell.

If you want to fully test your installation, run the `convolutional.py` demo from the TensorFlow repository, which will spend about half an hour training a moderately simple convolutional neural network model for identifying handwritten digits from the standard MNIST data set:

`$ python -m tensorflow.models.image.mnist.convolutional`

While that’s grinding away and making your computer fan spin, you might want to read more about what’s going on. Let’s start with data flow graphs, which underpin TensorFlow’s architecture.

## Understand data flow graphs

A data flow graph is a kind of directed graph describing a mathematical computation. If you’re not familiar with directed graphs, all you really need to know is that they are graphs with nodes and edges, and the edges flow in one direction (are directed) from node to node.

In a data flow graph, the nodes represent mathematical operations, or endpoints to feed in data, push out results, or read/write persistent variables. The edges represent the input/output relationships between nodes and carry dynamically sized multidimensional data arrays, which are also known as tensors.

In TensorFlow (named for the flow of tensors along the edges) you can assign each node to a computational device, and the nodes execute asynchronously and in parallel once all the tensors on their incoming edges become available. As I mentioned earlier, a TensorFlow model loads the data, creates the data flow graph, establishes a session, and runs the training within the session. The session invokes a just-in-time compiler (XLA) to generate code from the data flow graph.
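If the idea of building a graph first and running it later feels abstract, this toy sketch may help. It is plain Python, not the TensorFlow API: the `Node` class and `run` function are hypothetical names of my own, the values are single numbers rather than tensors, and evaluation is sequential rather than asynchronous and parallel.

```python
class Node:
    """A graph node: either an input endpoint (op is None) or an operation."""

    def __init__(self, name, op=None, inputs=()):
        self.name = name
        self.op = op
        self.inputs = list(inputs)


def run(node, feed, cache=None):
    """Evaluate a node by first evaluating everything on its incoming edges."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.op is None:
        # Input endpoint: its value is fed in from outside the graph.
        value = feed[node.name]
    else:
        # Operation: runs only once all of its inputs are available.
        args = [run(upstream, feed, cache) for upstream in node.inputs]
        value = node.op(*args)
    cache[node.name] = value
    return value


# Build the graph first; nothing is computed at this point.
x = Node("x")
w = Node("w")
b = Node("b")
mul = Node("mul", lambda left, right: left * right, [w, x])
add = Node("add", lambda left, right: left + right, [mul, b])
relu = Node("relu", lambda value: max(0.0, value), [add])

# Then run it by feeding values into the input endpoints.
result = run(relu, {"x": 3.0, "w": 2.0, "b": -1.0})
print(result)  # 5.0
```

The same graph can be run again with different fed values without being rebuilt, which is the separation between graph construction and execution described above.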

As you can see in the graph above, the neural weights `W` and offsets `b` appear in multiple places: in the rectified linear unit (ReLu) and log of probability (Logit) neuron layers as inputs, and in the stochastic gradient descent (SGD) training layer as outputs. That’s cyclic, so the network needs to be solved iteratively. The Softmax and Cross entropy nodes calculate the loss, and the Gradients node automatically calculates the partial derivatives of the loss with respect to the weights and offsets, to feed into the SGD trainer.

To make this clearer, let’s look at a concrete example in Python.

## Understand how to use TensorFlow

The tutorial that the TensorFlow authors offer for beginners is a slow-paced introduction to a very simple Softmax Regression classifier for the MNIST handwritten-digit data set. It’s a different—and worse—way of classifying digits than the convolutional model that we ran earlier to validate your TensorFlow installation. It’s much easier to understand, however.

I’d suggest that you read my explanation, then run the model yourself while reading the official tutorial. The Python program we’re discussing is at `tensorflow/examples/tutorials/mnist/mnist_softmax.py` in your TensorFlow repository.

You can safely skip over the first few imports, which are basically housekeeping. The data-reading code import comes next:

This actually pulls in several other program files, which will download the official MNIST training, test, and validation image data when we call it. The tutorial explains the data in detail.

The next code imports the `tensorflow` library module, gives it the name `tf`, and clears all flags. You always need to import `tensorflow` before you can use it.

Now we come to the executable code. First, we read in the data using the code we imported on line 28:

This will take a few seconds, then output:

`Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.`

`Extracting /tmp/data/train-images-idx3-ubyte.gz`

`Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.`

`Extracting /tmp/data/train-labels-idx1-ubyte.gz`

`Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.`

`Extracting /tmp/data/t10k-images-idx3-ubyte.gz`

`Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.`

`Extracting /tmp/data/t10k-labels-idx1-ubyte.gz`

Now that we have the data loaded, we create a model using TensorFlow variables, placeholders, and functions. It is really nothing more than the matrix equation `y = Wx + b`, plus some setup to hold the tensors (784 is 28 by 28, to hold the pixels of the images; 10 is the number of categories, for the digits 0-9).
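To see how those shapes fit together, here is a plain-Python sketch of the same equation using nested lists instead of TensorFlow tensors. The helper name `linear_scores`, the zero starting values, and the all-gray test image are assumptions for illustration; this is not the tutorial’s own code.

```python
NUM_PIXELS = 28 * 28   # 784 inputs, one per pixel of a flattened image
NUM_CLASSES = 10       # one output score per digit, 0 through 9


def linear_scores(W, x, b):
    """Compute y = Wx + b for one image.

    W has NUM_CLASSES rows of NUM_PIXELS weights, x holds NUM_PIXELS pixel
    values, and b holds NUM_CLASSES offsets.
    """
    return [
        sum(weight * pixel for weight, pixel in zip(row, x)) + offset
        for row, offset in zip(W, b)
    ]


# Weights and offsets start at zero here, purely as a simple starting point.
W = [[0.0] * NUM_PIXELS for _ in range(NUM_CLASSES)]
b = [0.0] * NUM_CLASSES
x = [0.5] * NUM_PIXELS  # a hypothetical all-gray image

scores = linear_scores(W, x, b)
print(len(scores))  # 10
```

Each of the 10 scores is a weighted sum over all 784 pixels plus an offset. Training is the process of finding weights and offsets that make the score for the correct digit the largest.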

Then we define a loss function—the way we define “goodness of fit,” or rather “badness of fit” for the model—and choose the optimizer to use for training.

As you can read in the code and comments, the loss function is an averaged cross-entropy based on the Softmax (normalized exponential) function, and the training method is Gradient Descent set to minimize the cross-entropy loss function.
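The arithmetic behind that loss is short enough to write out by hand. This is a plain-Python sketch, not the TensorFlow implementation; the function names and the sample scores are my own, and I subtract the maximum score before exponentiating as a common numerical-stability precaution.

```python
import math


def softmax(scores):
    """Normalized exponential: turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_label):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probs[true_label])


def mean_cross_entropy(batch_scores, labels):
    """Average the cross-entropy over a batch of examples."""
    losses = [cross_entropy(softmax(s), y) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


# A model that scores all 10 digits equally is maximally unsure.
uniform_loss = mean_cross_entropy([[0.0] * 10], [3])
# A model that strongly favors the correct digit has a much lower loss.
confident_loss = mean_cross_entropy([[0.0] * 3 + [8.0] + [0.0] * 6], [3])
print(uniform_loss, confident_loss)
```

The uniform case gives a loss equal to the natural log of 10, about 2.3, which is a useful reference point: a freshly initialized 10-class classifier should start near that value and fall as gradient descent adjusts the weights.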