Python: High Performance or Not? You Might Be Surprised


The concept of an “accelerated Python” is relatively new, and it’s made Python worth another look for Big Data and High Performance Computing (HPC) applications.

Thanks to some Python aficionados at Intel, who have utilized the well-known Intel Math Kernel Library (MKL) under the covers, we can all use an accelerated Python that yields big returns for Python performance without requiring that we change our Python code!

Sure, Python is amazing. But Python is relatively slow because it’s an interpreted (not a compiled) language.  The payoff of an interpreter is that we can learn and explore interactively, from a “Hello, World” program to a quick plot:

 % python

>>> print("Hello, World.")

Hello, World.

>>> import matplotlib.pyplot as myplt

>>> myplt.plot([3,14,15,92])

>>> myplt.ylabel('hello numbers')

>>> myplt.show()


Why It Works

The reason an “accelerated Python” can be so effective comes from a combination of three factors:

  • Python has mature and widely used packages and libraries
  • Computation in Big Data and HPC applications is focused in small parts of the code
  • MKL is fast

Python has mature and widely used packages and libraries: These libraries can be accelerated without changing our Python code at all.  All we have to do is install an accelerated Python. Under the covers, Intel has accelerated NumPy, SciPy, pandas, scikit-learn, Jupyter, matplotlib, and mpi4py. NumPy is a library of routines for operating on N-dimensional arrays. SciPy is a library of fundamental routines for scientific computing, including numerical integration and optimization. pandas provides data structures and tools for data analysis, scikit-learn provides machine learning algorithms, and Jupyter provides interactive notebooks; together they are mainstays of Big Data and Machine Learning work.  matplotlib provides data plotting, and mpi4py provides Python bindings for MPI.

Computation in Big Data and HPC applications is focused in small parts of the code: Big Data and High Performance Computing (HPC) generally focus most “work” in a few key algorithms, which have been widely studied and supported in libraries – notably NumPy, SciPy, pandas, scikit-learn, Jupyter, matplotlib, and mpi4py.
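One way to see how concentrated the work is in a given program is to profile it. The following is a minimal sketch using the standard library's cProfile module; the functions hot_kernel and bookkeeping are hypothetical stand-ins invented for illustration, not part of any Intel or NumPy API.

```python
import cProfile
import io
import pstats


def hot_kernel(n):
    # Hypothetical stand-in for the numerical core of an application.
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def bookkeeping(n):
    # Hypothetical stand-in for the cheap setup work around the core.
    return [str(i) for i in range(n)]


def main():
    bookkeeping(1_000)
    return hot_kernel(200_000)


# Profile one run of main() and collect the statistics.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In a report like this, the function doing the numerical work typically dominates the run time. That is the part an optimized library can replace.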

MKL is fast: Intel’s Math Kernel Library is highly tuned for math, and perfect to accelerate NumPy, SciPy, and other libraries that are already used by many Python applications. Additional capabilities for acceleration come from the Intel Data Analytics Acceleration Library (DAAL) and Intel Threading Building Blocks (TBB).
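To check what a particular NumPy installation is built against, NumPy can print its own build configuration. The sketch below does that and then times a single matrix multiply, which NumPy hands off to whatever BLAS library it was linked with. The matrix size of 500 is an arbitrary choice for illustration, and the timing will vary by machine and by distribution.

```python
import time

import numpy as np

# Print NumPy's build information, including the BLAS/LAPACK libraries
# it was linked against (MKL would be listed here if it is in use).
np.show_config()

# Build two random square matrices; n = 500 is an arbitrary size.
rng = np.random.default_rng(0)
n = 500
a = rng.random((n, n))
b = rng.random((n, n))

# Time one matrix multiply, which is dispatched to the linked BLAS.
start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

print(f"{n}x{n} matrix multiply took {elapsed:.4f} s")
```

Running the same script under a stock Python and under an accelerated one is a simple way to compare the two on your own hardware, with no change to the code.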

Eat Our Cake and Have It Too

Python is a simple language with a straightforward syntax. It’s known for its expressiveness, easy-to-read syntax, large community of users, and an impressive range of libraries. It encourages innovative and incremental programming, which makes it a natural for the sort of trailblazing that new work entails. Data scientists seeking to squeeze information from Big Data have found Python a perfect fit as a result.  Now we can have high performance too.

When thinking of the user-friendly nature of Python, we have to ask, “Does user friendly always mean slow?” It turns out that it does not. Because we most often express our heavy computations in forms (like matrix algebra) that call high-speed libraries (compiled code that our Python code uses), we can have the best of both worlds: easy to use and fast.  Acceleration is a big step forward, and it’s automatic when we install the accelerated distributions from Intel or the Anaconda Cloud.  Look for big speedups in scikit-learn and in basic NumPy operations already. If you stay current with the updates, expect by March 2017 the opportunity for very large speedups (10X or more) for NumPy universal functions (elementwise operations), further speedups in scikit-learn, big speedups (10X or more) for FFT, optimized memory operations for NumPy, and the Caffe and Theano deep learning packages.  There will be even more as time goes on.
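As a rough illustration of expressing work in a form that a compiled library can execute, the sketch below computes the same elementwise result twice: once with an interpreted Python loop and once with NumPy universal functions. It then applies an FFT to the result. The array length and the choice of sin squared are arbitrary assumptions for the example, and the measured times depend on the machine and the installed distribution.

```python
import math
import time

import numpy as np

# 200,000 sample points; the size is an arbitrary choice.
x = np.linspace(0.0, 10.0, 200_000)

# Interpreted version: one Python-level call per element.
start = time.perf_counter()
loop_result = [math.sin(v) * math.sin(v) for v in x]
loop_time = time.perf_counter() - start

# Vectorized version: NumPy universal functions run over the whole
# array inside compiled code.
start = time.perf_counter()
vec_result = np.sin(x) ** 2
vec_time = time.perf_counter() - start

# FFT of the real-valued result, also executed by compiled code.
spectrum = np.fft.rfft(vec_result)

print(f"Python loop: {loop_time:.4f} s")
print(f"NumPy ufunc: {vec_time:.4f} s")
print(f"FFT output length: {spectrum.shape[0]}")
```

The vectorized lines are the ones an accelerated distribution can speed up further, since the Python code that calls them stays exactly the same.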

Free and Easy Downloads

You can learn more about, and download, the Intel Distribution for Python at https://software.intel.com/intel-distribution-for-python.  It’s free (but not completely open source), and it has gained considerable popularity with Python users because of its speed.  The Intel packages for accelerating Python performance are also available on the Anaconda Cloud. The packages unique to the Intel channel there are distarray, tbb, and pydaal (see https://www.continuum.io/sites/default/files/AnacondaIntelFAQFINAL.pdf).
