Easier, faster: The next steps for deep learning

Rapidly advancing software frameworks, dedicated silicon, Spark integrations, and higher level APIs aim to put deep learning within reach


If there is one subset of machine learning that spurs the most excitement, that seems most like the intelligence in artificial intelligence, it’s deep learning. Deep learning systems, which are built on deep neural networks, power complex pattern-recognition applications that provide everything from automated language translation to image identification.

Deep learning holds enormous promise for analyzing unstructured data. There are just three problems: It’s hard to do, it requires large amounts of data, and it uses lots of processing power. Naturally, great minds are at work to overcome these challenges.  

What’s now brewing in this space isn’t just a clash for supremacy between competing deep learning frameworks, such as Google’s TensorFlow versus projects like Baidu’s Paddle. Rivalry between multiple software frameworks is a given in almost any part of IT.

The newest part of the story is about hardware versus software. Will the next big advances in deep learning come by way of dedicated hardware designed for training models and serving predictions? Or will better, smarter, and more efficient algorithms put that power into many more hands without the need for a hardware assist? Finally, will deep learning become accessible to the rest of us, or will we always need computer science PhDs to put this technology to work?

Microsoft Cognitive Toolkit: More tension with TensorFlow

Any time a major technology comes along to show the world a better way, you can count on the biggest names in tech to try to seize a slice of the pie. It happened with NoSQL, with Hadoop, and with Spark, and now it’s happening with deep learning frameworks. Google’s TensorFlow has been promoted as a powerful, general solution, but also as a way to tie deep learning apps to Google’s cloud and to Google’s proprietary hardware acceleration.

Leave it to Microsoft to assume the role of rival. Its pushback against Google on the deep learning front comes in the form of the Cognitive Toolkit, or CNTK for short. The 2.0 revision of CNTK challenges TensorFlow on multiple fronts. CNTK now provides a Java API, allowing more direct integration with the likes of the Spark processing framework, and supports code written for the popular neural network library Keras, which has served mainly as a front end for TensorFlow and Theano. Thus Keras users may transition gracefully away from Google’s solution and towards Microsoft’s.
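To give a sense of how small that transition can be, here is a minimal sketch of the backend switch. It assumes multi-backend Keras 2.x, which reads its settings from a JSON file (typically ~/.keras/keras.json) containing a "backend" key, and which also honors a KERAS_BACKEND environment variable. The helper name with_backend is hypothetical and exists only for illustration; the sketch builds the settings in memory rather than touching any real file.

```python
import json

# Typical default settings for multi-backend Keras 2.x (assumed values,
# mirroring what is usually found in ~/.keras/keras.json).
DEFAULT_KERAS_CONFIG = {
    "backend": "tensorflow",
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
}

# Backends that multi-backend Keras 2.x could target.
SUPPORTED_BACKENDS = ("tensorflow", "theano", "cntk")


def with_backend(config, backend):
    """Return a copy of a Keras settings dict that selects another backend.

    Hypothetical helper for illustration; it does not modify the input.
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    updated = dict(config)
    updated["backend"] = backend
    return updated


# Model code written against the Keras API stays the same; only this
# setting changes when moving from TensorFlow to CNTK.
cntk_config = with_backend(DEFAULT_KERAS_CONFIG, "cntk")
print(json.dumps(cntk_config, indent=2))
```

In practice, the resulting JSON would be saved as the Keras settings file, or the same effect could be had by setting KERAS_BACKEND=cntk before launching Python. Whether a given model runs unchanged still depends on which operations the chosen backend supports.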

But Microsoft’s most direct and meaningful challenge to TensorFlow was making CNTK faster and more accurate, and providing Python APIs that expose both low-level and high-level functionality. Microsoft even went so far as to draw up a list of reasons to switch from TensorFlow to CNTK, with those benefits at the top.

Speed and accuracy aren’t just bragging points. If Microsoft’s system is faster than TensorFlow by default, it means people have more options than just to throw more hardware at the problem—e.g., hardware acceleration of TensorFlow, via Google’s custom (and proprietary) TPU processors. It also means third-party projects that interface with both TensorFlow and CNTK, such as Spark, will gain a boost. TensorFlow and Spark already work together, courtesy of Yahoo, but if CNTK and Spark offer more payoff for less work, CNTK becomes an appealing option in all of those places that Spark has already conquered.

Graphcore and Wave Computing: The hardware’s the thing

One of the downsides to Google’s TPUs is that they’re only available in the Google cloud. For those already invested in GCP, that might not be an issue, but for everyone else, and there’s a lot of “everyone else,” it’s a potential blocker. Other silicon for deep learning, such as general-purpose GPUs from Nvidia, is available with fewer strings attached.

Several companies have recently unveiled specialized silicon that outperforms GPUs for deep learning applications. Startup Graphcore has a deep learning processor, a specialized piece of silicon designed to process the graph data used in neural networks. The challenge, according to the company, is to create hardware optimized to run networks that recur or feed into each other and into other networks.

One of the ways Graphcore has sped things up is by keeping the model for the network as close to the silicon as possible, and avoiding round trips to external memory. Avoiding data movement whenever possible is a common approach to speeding up machine learning, but Graphcore is taking that approach to another level.

Wave Computing is another startup offering special-purpose hardware for deep learning. Like Graphcore, the company believes GPUs can be pushed only so far for such applications before their inherent limitations reveal themselves. Wave Computing’s plan is to build “dataflow appliances,” rackmount systems using custom silicon that can deliver 2.9 petaops of compute (note that’s “petaops” for fixed-point operations, not “petaflops” for floating-point operations). That is roughly 30 times the 92 teraops provided by Google’s TPU.
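Because the two figures use different prefixes, a quick conversion makes the gap easier to judge. The sketch below simply takes the quoted numbers at face value (a claimed 2.9 petaops for Wave's appliance and 92 teraops for the TPU) and compares them; the variable names are illustrative, and the calculation says nothing about real-world throughput.

```python
import math

TERA = 10**12  # operations per second in one teraop
PETA = 10**15  # operations per second in one petaop

# Figures as quoted in the article (vendor claims, not benchmarks).
wave_ops_per_sec = 2.9 * PETA  # Wave Computing dataflow appliance
tpu_ops_per_sec = 92 * TERA    # Google TPU

# How many times larger the Wave figure is than the TPU figure.
ratio = wave_ops_per_sec / tpu_ops_per_sec

# The same gap expressed in orders of magnitude (powers of ten).
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

On these numbers the gap works out to a bit more than 30 times, or about one and a half orders of magnitude. Note also that the comparison sets a rackmount system against a single accelerator, so it is not a chip-to-chip measure.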

Claims like that will need independent benchmarks to bear them out, and it isn’t yet clear if the price-per-petaop will be competitive with other solutions. But Wave is ensuring that, price aside, prospective users will be well supported. TensorFlow is to be the first framework supported by the product, with CNTK, Amazon’s MXNet, and others to follow.

Brodmann17: Less model, more speed

Whereas Graphcore and Wave Computing are out to one-up TPUs with better hardware, other third parties are out to demonstrate how better frameworks and better algorithms can deliver more powerful machine learning. Some are addressing environments that lack ready access to gobs of processing power, such as smartphones.

Google has made some noises about optimizing TensorFlow to work well on mobile devices. A startup named Brodmann17 is also looking at ways to deliver deep learning applications on smartphone-grade hardware using “5% of the resources (compute, memory, and training data)” of other solutions.

The company’s approach, according to CEO and co-founder Adi Pinhas, is to take existing, standard neural network modules, and use them to create a much smaller model. Pinhas said the smaller models need “less than 10% of the data for the training, compared to other popular deep learning architectures,” but with around the same amount of time needed for the training. The end result is a slight trade-off of accuracy for speed: faster prediction time, but also lower power consumption and less memory needed.

Don’t expect to see any of this delivered as an open source offering, at least not at first. Brodmann17’s business model is to provide an API for cloud solutions and an SDK for local computing. That said, Pinhas did say “We hope to widen our offering in the future,” so commercial-only offerings may well just be the initial step.

Sparking a new fire

Earlier this year, InfoWorld contributor James Kobielus predicted the rise of native support for Spark among deep learning frameworks. Yahoo has already brought TensorFlow to Spark, as described above, but Spark’s main commercial provider, Databricks, is now offering its own open source package to integrate deep learning frameworks with Spark.

Deep Learning Pipelines, as the project is called, approaches the integration of deep learning and Spark from the perspective of Spark’s own ML Pipelines. Spark workflows can call into libraries like TensorFlow and Keras (and, presumably, CNTK as well now). Models for those frameworks can be trained at scale in the same way Spark does other things at scale, and by way of Spark’s own metaphors for handling both data and deep learning models.

Many data wranglers are already familiar with Spark and working with it. To put deep learning in their hands, Databricks is allowing them to start where they already are, rather than having to figure out TensorFlow on their own.

Deep learning for all?

A common thread through many of these announcements and initiatives is how they are meant to, as Databricks put it in its own press release, “democratize artificial intelligence and data science.” Microsoft’s own line about CNTK 2.0 is that it is “part of Microsoft’s broader initiative to make AI technology accessible to everyone, everywhere.”

The inherent complexity of deep learning isn’t the only hurdle to be overcome. The entire workflow for deep learning remains an ad hoc creation. There is a vacuum to be filled, and the commercial outfits behind all of the platforms, frameworks, and clouds are vying to fill it with something that resembles an end-to-end solution.

The next vital step won’t just be about finding the one true deep learning framework. From the look of it, there is room for plenty of them. It will be about finding a single consistent workflow that many deep learning frameworks can be a part of—wherever they may run, and whoever may be behind them.