Review: Caffe deep learning conquers image classification

Caffe offers a strong brew for image processing, but the project shows signs of stalling


Like superheroes, deep learning packages usually have origin stories. Yangqing Jia created the Caffe project while earning his doctorate at U.C. Berkeley. The project continues as open source under the auspices of the Berkeley Vision and Learning Center (BVLC), with community contributions. The BVLC is now part of the broader Berkeley Artificial Intelligence Research (BAIR) Lab. Similarly, the scope of Caffe has been expanded beyond vision to include nonvisual deep learning problems, although the published models for Caffe are still overwhelmingly related to images and video.

Caffe is a deep learning framework made with expression, speed, and modularity in mind. Among the promised strengths are the way Caffe’s models and optimization are defined by configuration without hard-coding, as well as the option to switch between CPU and GPU by setting a single flag to train on a GPU machine, then deploy to commodity clusters or mobile devices.

Meanwhile, as we enter 2017, Caffe has been stuck at version 1.0.0 RC 3 for almost a year. While there have been code check-ins and visible progress, the project is still not stable. My experience was marred by installation problems, inability to run Jupyter notebooks, and unanswered requests for help. An outsider might get the impression that the project stalled while the deep learning community moved on to TensorFlow, CNTK, and MXNet.


Among the Caffe demos is a web-based example of image classification using a convolutional neural network, one of Caffe’s strong suits. The demo works fine on the provided examples, but unfortunately not on any of my own images, even when I reduced them to the expected input dimensions.

Caffe features and use cases

In the slide deck DIY Deep Learning for Vision: A Hands-On Tutorial with Caffe, Jia and the core Caffe maintainers lay out the how and why of Caffe along with a “highlight reel” of Caffe examples and applications. They describe Caffe as an open framework based on fast, well-tested code, with models and worked examples for deep learning; with a core library written in pure C++ with CUDA; and with command-line, Python, and MATLAB interfaces.

Among the commercial users cited is Facebook, which employs Caffe models for objectionable content detection in uploaded images, an important function whose prudish implementation is the subject of considerable scorn from photographers. I can’t really blame Caffe for that. The Facebook engineers chose to train their filters on nipples, for example, without taking artistic context into consideration. Less controversial are the Facebook “on this day” photo feature for surfacing memories and automatic “alt text” generation to describe images for the blind.

One of Caffe’s more novel techniques is “fine-tuning.” This is the process of taking a model trained on lots of data, such as image keyword tagging from ImageNet, editing the neural network parameters for a different purpose, and using the pretrained parameters as a starting point for learning a new skill, such as image style recognition. The fine-tuning technique can sometimes reduce the time for training on the new classes.
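In practice, fine-tuning in Caffe is driven from the command line by initializing a new training run from pretrained weights. A sketch along the lines of the Flickr style example shipped with the Caffe source (paths relative to the source root, and illustrative):

```shell
# Initialize training from the pretrained CaffeNet weights. The --weights
# flag copies parameters into layers with matching names; layers that have
# been renamed for the new task (e.g. the final classifier) start fresh.
./build/tools/caffe train \
    --solver=models/finetune_flickr_style/solver.prototxt \
    --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
```

Because most of the network starts from useful parameters, the solver typically uses a lower base learning rate than training from scratch would.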

Installing Caffe

When I first tried to review Caffe a couple of months ago, I was unable to build the Caffe executables on MacOS Sierra, which I had just installed. I tracked down the problem to a line in the makefile that explicitly referenced the frameworks for an older version of the OS by number, which is always a red flag, but I decided to wait for the maintainers to start building for Sierra before continuing the review process. I also hoped, in vain, that Nvidia would soon start supporting Xcode 8 so that I could build Caffe with CUDA GPU support without impacting my other projects.


Training an MNIST LeNet model on the 2.6 GHz Intel Core i7 CPU on a MacBook Pro took about 7.5 minutes for 10,000 iterations. That’s fast enough to be usable, and it’s comparable to other frameworks running LeNet. But on a CUDA GPU it would take under a minute, if it fit in the GPU’s RAM.
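For reference, that MNIST LeNet run uses the data scripts and solver bundled with the Caffe source. A sketch of the steps, run from the source root with the paths as shipped:

```shell
# Fetch the MNIST data, convert it to Caffe's database format,
# then train LeNet with the bundled solver definition
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
```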

I picked up Caffe again in December. After updating my repository, I was able to build and test the executables for the CPU, along with configuring the Python libraries well enough to start executing a sample Jupyter notebook. When the notebook got to a cell with a shell script, however, Python crashed.


On MacOS Sierra, my installation of Caffe and Python was good enough to import Caffe into a Jupyter notebook, but it could not shell out to download a pretrained CaffeNet model without crashing.

I tried installing Caffe again on Windows 10, for which there is support in a new branch of the Caffe repository. The new CMake build process claimed it was working, but didn’t seem to create executables any place I could find; the older Visual Studio build process did work once I converted the projects from Visual Studio 2013 to Visual Studio 2015. Again, however, I had trouble with the Python library installations, and this time couldn’t even start the Jupyter notebooks.

Since Caffe’s “home” system is Ubuntu, I fired up an Ubuntu “Trusty” virtual machine and tried to build Caffe there based on the documentation. As before, sadly, I was able to build and test the executables but not run the Python notebooks.

That left me only two more options before going back to troubleshoot the failed installations: building and running in Docker, or running a preconfigured machine image in the cloud. Reading the Docker file in the repository made me think that the vague installation documentation might have been at fault for my three failed attempts. The Docker script installs items on Ubuntu in a different and perhaps more sensible order than called out by the documentation. On the other hand, I’d spent several days on Caffe installations that I should have been able to complete in less than an hour, and I’d had enough.
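For those who do want to try the Docker route, the idea is to build an image from a Dockerfile in the repository and run the caffe tool inside the container. A sketch, with the Dockerfile path and image tag assumed rather than guaranteed (they may differ by branch):

```shell
# Build a CPU-only Caffe image from the repository's Docker scripts
docker build -t caffe:cpu docker/standalone/cpu
# Sanity-check the build by asking for the version inside the container
docker run -ti caffe:cpu caffe --version
```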

Running Caffe

As I mentioned earlier, Caffe has command-line, Python, and MATLAB interfaces. As I currently lack a copy of MATLAB, I did not try to test that interface.

The command-line executables and libraries compile and build in C++ either with or without GPU support. I built them for CPU-only, as the one Nvidia GPU I have that is powerful enough to use with Caffe is on a MacBook Pro that has the latest version of Xcode installed, and the latest CUDA SDK still requires an older version of Xcode. Using xcode-select to switch to the older Xcode version doesn’t help, at least on my machine.

Caffe relies on ProtoText files to define its models and solvers. For example, the figures below show the model and solver configurations for the reference “CaffeNet” image classifier.

This ProtoText file defines the reference CaffeNet (modified AlexNet) convolutional model for classification of ImageNet images.


This ProtoText file defines the reference CaffeNet (modified AlexNet) solver for the classification of ImageNet images. Note that setting GPU or CPU mode can be done in this file.
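A Caffe solver definition is just a handful of key-value settings in ProtoText. The sketch below is illustrative, loosely following the published CaffeNet solver; the exact values vary by model:

```
# Sketch of a minimal Caffe solver definition (values illustrative)
net: "models/bvlc_reference_caffenet/train_val.prototxt"
base_lr: 0.01            # starting learning rate
lr_policy: "step"        # drop the learning rate in steps
gamma: 0.1               # multiply the rate by this at each step
stepsize: 100000         # iterations between learning-rate drops
momentum: 0.9
weight_decay: 0.0005
max_iter: 450000         # total training iterations
snapshot: 10000          # checkpoint interval
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU         # switch to CPU with a one-word change
```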

Caffe defines a network layer by layer in its own model schema. The network defines the entire model bottom to top from input data to loss. As data and derivatives flow through the network in the forward and backward passes, Caffe stores, communicates, and manipulates the information as blobs (binary large objects) that internally are N-dimensional arrays stored in a C-contiguous fashion (meaning the rows of the array are stored in contiguous blocks of memory, as in the C language). Blobs are to Caffe as tensors are to TensorFlow.
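The blob layout is easy to picture with NumPy, whose default arrays are likewise C-contiguous. A sketch, with shapes illustrative, using the conventional batch × channels × height × width order:

```python
import numpy as np

# A Caffe blob for a batch of images uses the N x channels x height x width
# layout; here, 10 RGB images at 227x227, the CaffeNet input size.
blob = np.zeros((10, 3, 227, 227), dtype=np.float32)

# NumPy arrays are C-contiguous by default: the last axis (width) varies
# fastest in memory, matching how Caffe stores blob data.
print(blob.flags['C_CONTIGUOUS'])  # True
print(blob.shape)                  # (10, 3, 227, 227)
```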

Layers perform operations on blobs, and they constitute the components of a Caffe model. Layers convolve filters, perform pooling, take inner products, apply nonlinearities such as rectified-linear and sigmoid and other element-wise transformations, normalize, load data, and compute losses such as softmax and hinge.
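A couple of the element-wise and loss-related operations named above are simple to express in NumPy terms. This is an illustration of the math, not Caffe’s C++ implementation:

```python
import numpy as np

def relu(x):
    # Rectified-linear nonlinearity: element-wise max(0, x)
    return np.maximum(0, x)

def softmax(scores):
    # Subtract the max for numerical stability, then normalize
    # the exponentials into a probability distribution
    e = np.exp(scores - scores.max())
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])
probs = softmax(relu(scores))  # probabilities summing to 1
```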

Once you’ve built PyCaffe, you can run Python scripts for Caffe and should also be able to run Jupyter notebooks. As mentioned above, I had some trouble running Jupyter notebooks, although I could view published notebooks in NBViewer, such as the one shown below.


A precomputed Caffe Jupyter notebook displayed in NBViewer. This notebook explains doing “surgery” on Caffe networks using a cute kitten.

I felt a little better about my GPU issues once I read a thread in the Caffe support Google Group about someone else with a MacBook Pro who got Caffe to compile with GPU support but couldn’t actually run the CaffeNet reference model. The suggestion given was that the model takes more than the 1GB of GPU memory available on a GeForce GT 650M, and the original poster should consider training his model on Amazon EC2 using a g2.2xlarge or better instance and paying the hourly charges.

That the discussion took place at all was refreshing. In most cases, requests for help in that group (including my requests) are greeted with stony silence. Lack of support, frankly, is a bad sign about the health of the Caffe project and community.

AWS not only has instances available with GPUs, but also offers AMI images that have Caffe prebuilt with GPU support. Amazon is no longer the only cloud with such support. Also consider the Azure Batch Shipyard and its deep learning recipes using the NC series of instances. GPUs in the Google cloud should roll out soon.

Caffe tutorials and models

The official introductory Caffe tutorial is the slide deck I mentioned earlier. In addition, a tutorial page on the Caffe site includes references (in the Tour section) to more detailed material.

The slide deck also contains many references, such as the Model Zoo, the image classification demo, and various model repositories. Of immediate interest to Caffe newbies are the Caffe example Jupyter notebooks, which are worth a read whether or not you can actually run them locally. Not surprisingly, these are almost all about some aspect of image processing.

As we’ve seen, Caffe is a deep learning framework that grew out of vision and learning research at Berkeley and still emphasizes image processing despite having broadened its scope somewhat. The Caffe developers make attractive claims about the project’s maturity, portability, and speed compared to other deep learning frameworks. After working with Caffe myself and delving into the community, I suggest taking those claims with a big grain of salt.

If an existing Caffe model fits your needs or could be fine-tuned to your purposes, it might be worthwhile to pursue it. Otherwise, I recommend using TensorFlow, MXNet, or CNTK instead.

---

Cost: Free open source. Platforms: Linux, MacOS, Windows, Docker. 

InfoWorld Scorecard: Caffe 1.0 RC3
Models and algorithms (25%): 8
Ease of development (25%): 8
Documentation (20%): 7
Performance (20%): 9
Ease of deployment (10%): 8
Overall Score (100%): 8.0
At a Glance
  • Caffe is a free open source deep learning framework that emphasizes image processing.

    Pros

    • Strong convolutional networks for image recognition
    • Good support for CUDA GPUs
    • Straightforward network description format
    • Fairly portable

    Cons

    • Models often need substantial (>1GB) amounts of GPU memory
    • Not quite baked
    • Documentation is problematic and support is hard to obtain

Copyright © 2017 IDG Communications, Inc.
