What comes after “Big Data”? I’d say “Faster Big Data.” And it’s going to be a game changer well beyond what Big Data has done so far.
Fast and efficient Big Data applications will change our lives. Some of them will drive our cars, move packages from warehouses to us, speed drugs from theory to reality, even predict crime. The applications seem limitless, and high performance will be revolutionary.
Intel is helping this trend along with its acceleration library, called DAAL (Data Analytics Acceleration Library), for speeding up Big Data problems.
When I code, I like to use the highest-performance libraries available to give me an edge. That’s why DAAL is really worth knowing. Both the DAAL open source project started by Intel and the closely related Intel DAAL product include support for Hadoop, Spark, R, and MATLAB, with language bindings for C++, Java, and Python.
DAAL handles BIG data much better than in-core libraries can
If you know about Intel’s Math Kernel Library (MKL), you might immediately wonder, “Why DAAL?” Data scientists have been using MKL for Big Data problems for some time; MKL is well respected by high performance programmers, and it’s the math library gold standard in accuracy and performance for x86.
However, most of Intel MKL was designed assuming that the data to be operated upon fits in memory all at once. Intel DAAL is designed for situations where the data is too big to fit in memory all at once, and it deals with that data in chunks. This is where DAAL offers significant advantages over in-core routines like those that make up most of MKL. Specifically, DAAL is built to handle streaming data and MapReduce-style processing.
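To make the "chunks" idea concrete, here is a minimal sketch in plain Python. It is not DAAL's API; the names `PartialStats` and `read_chunks` are hypothetical and exist only for illustration. It assumes we want the mean and variance of a data set too large to hold in memory, and it shows the two styles mentioned above: a streaming pass that folds in one chunk at a time, and a MapReduce-style pass that summarizes pieces independently and then merges the summaries.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


@dataclass
class PartialStats:
    """Running summary of the values seen so far (hypothetical helper)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, chunk: Sequence[float]) -> None:
        """Streaming step: fold one in-memory chunk into the summary."""
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def merge(self, other: "PartialStats") -> "PartialStats":
        """Reduce step: combine two summaries built on disjoint data."""
        if self.n == 0:
            return PartialStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return PartialStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return PartialStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n else 0.0


def read_chunks(source: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield lists of at most `size` items; stands in for a real data source."""
    it = iter(source)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Streaming style: only one small chunk is in memory at any time.
stream = PartialStats()
for chunk in read_chunks(range(1, 11), 3):
    stream.update(chunk)

# MapReduce style: summarize each piece independently, then merge.
left, right = PartialStats(), PartialStats()
left.update([1, 2, 3, 4, 5])
right.update([6, 7, 8, 9, 10])
combined = left.merge(right)
```

Both routes arrive at the same mean and variance, and neither ever needs the full data set in memory. The point of the sketch is the shape of the computation: a small partial result that can be updated or merged, which is the property that lets an algorithm work out-of-core or across nodes.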
DAAL includes routines for the various stages of data analytics: data acquisition from a data source, data preparation and preprocessing, transformation, data mining, modeling and machine learning, validation, and decision making.
Intel helps give DAAL a jump on new and upcoming processors
Intel DAAL offers strong support for Intel architectures including the Xeon, Core, Atom, and Xeon Phi processors. It’s updated regularly and is tuned to take advantage of next-generation processors even before they’re available (which helps our software be ready for them when they arrive instead of always being “behind”). Each time we build, if we link to the current version of DAAL, our code is ready for today’s processors and for the next ones to hit the market.
Free downloads, open source, more to read
For more information and free downloads, be sure to visit the Intel DAAL product web site at https://software.intel.com/intel-daal or the DAAL open source repository at https://01.org/daal. Intel DAAL has really grown in the past few years, and there are additional updates in the works for 2017. Some additional interesting pieces on using DAAL are here:
- DAAL Programming Guide
- Improving Support Vector Machine with Intel Data Analytics Acceleration Library
- Video: Using Intel Data Analytics Acceleration Library (DAAL) for Regression Training
- Intel Data Analytics Acceleration Library (Intel® DAAL): How to Add User-Defined Algorithm
I know several of the engineers working on DAAL, and they value our feedback – so try it out, and let them know how it goes. I’ve also talked with a number of DAAL users who told Intel about specific optimized analytics algorithms they wished were in DAAL, and Intel in turn added them in 2016 as a result. Maybe you can suggest what 2017 should bring!