A Quick Introduction to ‘daal4py’ for Data Scientists

Accelerating scikit-learn with Intel's accelerated Python requires no code changes at all, giving us a nearly effortless way to boost performance. However, scikit-learn is designed for machine learning operations on in-memory, homogeneous data. Fortunately, there is good news for extending beyond those limitations: daal4py. Think of it as "scikit-learn meets MPI (Message Passing Interface)" without requiring us to actually program in MPI. We get the benefits of MPI, and our programs gain performance by exploiting parallelism across multiple CPU nodes.

You might want to read my previous piece about accelerating Python in “How Does a 20X Speed-Up in Python Grab You?” Although I wrote it a couple of years ago, it’s still valid today. (And the efforts to accelerate Python have only gotten better in the meantime.)

When the need arises to handle non-homogeneous streams or distributed data, we can turn to daal4py, which is as simple to use as scikit-learn. Its MPI-based engine under the hood scales machine learning algorithms to cluster-level performance with only a few simple calls added to our code.
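As a minimal, hedged sketch of what the streaming mode looks like: the snippet below fits a linear regression on data fed in chunks via daal4py's `streaming=True` option. The synthetic data and the NumPy least-squares fallback (used when daal4py is not installed) are illustrative assumptions of mine, not from the article.

```python
import numpy as np

# Synthetic regression problem (illustrative data)
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.01 * rng.standard_normal(1000)

try:
    import daal4py as d4p
    # streaming=True lets us feed the training data in chunks
    train = d4p.linear_regression_training(streaming=True)
    for part in np.array_split(np.arange(len(X)), 4):
        train.compute(X[part], y[part].reshape(-1, 1))
    model = train.finalize().model
    pred = d4p.linear_regression_prediction().compute(X, model).prediction.ravel()
except ImportError:
    # Fallback when daal4py is absent: ordinary least squares in NumPy
    A = np.c_[np.ones(len(X)), X]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef

rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"RMSE: {rmse:.4f}")
```

Either path produces essentially the same fit; the point is that the daal4py version never needs the full data set in memory at once.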

For data scientists working on large problem sizes with machine learning frameworks, daal4py is well worth a try: it makes machine learning algorithms dramatically faster. It's also open source and free to use.

Data distributed across multiple nodes in a cluster offers the ability to tackle larger data sets and enjoy the benefits of increased performance from higher degrees of parallelism. Increased performance offers us the opportunity to process more data, and to increase model accuracy through faster and more frequent deployments.
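In code, the distributed (SPMD) mode looks almost identical to single-node use. The sketch below computes column means with daal4py's `distributed=True` flag; launched under MPI, each rank would pass only its own shard of the data. The synthetic shard and the NumPy fallback (for when daal4py is unavailable) are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
local = rng.standard_normal((1000, 3))  # this process's shard of the data

try:
    import daal4py as d4p
    d4p.daalinit()                                  # start the MPI-based engine
    algo = d4p.low_order_moments(distributed=True)  # SPMD mode
    mean = algo.compute(local).mean.ravel()         # pass only the local shard
    d4p.daalfini()
except ImportError:
    # Fallback when daal4py is absent: single-process column means
    mean = local.mean(axis=0)

print(mean.round(2))
```

Launched as, say, `mpirun -n 4 python means.py` (a hypothetical script name), the same code computes the moments across all four ranks' shards; run serially, it behaves like the fallback.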

daal4py is an open source effort from Intel that supports Python on Linux, macOS, and Windows, and it promises to accelerate Python data science toolchains with no new tools to learn and only minimal code changes.

Hidden within the design of daal4py are several technologies that deliver performance to data scientists and framework designers in a flexible package. It uses Jinja templates to generate Cython wrappers around the DAAL C++ headers, with Cython serving as the bridge between the generated DAAL code and the Python layer. This design allows for quicker development cycles and acts as a reference if we want to tailor our own build of daal4py. Cython also ensures well-behaved Python objects, both for compatibility with different frameworks and for pickling and serialization. All of this comes from simply using the scikit-learn-like interfaces of daal4py.
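Here is what those scikit-learn-like interfaces look like in practice: a minimal k-means sketch using daal4py's batch API. The synthetic blobs and the plain NumPy Lloyd-iteration fallback (used when daal4py is not installed) are illustrative assumptions on my part.

```python
import numpy as np

# Two well-separated synthetic blobs (illustrative data)
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
                  rng.normal(5.0, 0.3, (100, 2))])

try:
    import daal4py as d4p
    # k-means++ initialization, then 10 iterations of k-means
    start = d4p.kmeans_init(nClusters=2, method="plusPlusDense").compute(data).centroids
    centers = d4p.kmeans(nClusters=2, maxIterations=10).compute(data, start).centroids
except ImportError:
    # Fallback when daal4py is absent: a few plain Lloyd iterations
    centers = data[[0, -1]].astype(float)
    for _ in range(10):
        labels = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([data[labels == k].mean(axis=0) for k in (0, 1)])

centers = centers[np.argsort(centers[:, 0])]  # sort for a stable report
print(centers.round(1))
```

The daal4py branch is a couple of calls on a plain NumPy array, and the resulting objects pickle like ordinary Python objects, which is what makes the wrappers easy to drop into existing toolchains.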

Getting Started

daal4py is easy to build from source, with most prerequisites available on conda. The instructions detail how to gather the prerequisites, set up a build environment, and finally build and install the completed package. The daal4py project's GitHub site has step-by-step instructions for setting up.

Seeing Higher Performance

daal4py shines on large problem sizes, so find a substantial piece of work you want to get done. The folks at Intel also have a step-by-step tutorial, which is an excellent place to start for a hands-on feel for the power of daal4py.

Useful Links for More Information

· Download Intel® Distribution for Python* (several options: Intel direct, Conda, Pip, Docker, AMI)

· daal4py home page – for downloads, instructions, etc.

· daal4py documentation showing the many supported algorithms

· Step-by-step set-up instructions – a great place to follow to get daal4py on your machine

· Step-by-step tutorial using daal4py – a great place to get first "hands-on" experience