Intel’s ‘2018 Model Year’ Developer Tools Are Now Available


When I want the best performance on x86/x86-64 processors while using C, C++, Fortran, or Python, Intel tools have no equal. Intel tools can also help with C# and Java optimization. Starting this week, we can kick the tires of Intel’s “2018 Model Year” developer tools, known as Intel Parallel Studio XE.

Intel Parallel Studio XE is Intel’s flagship product for software development, debugging, and tuning on x86/x86-64 processors for HPC, enterprise, and cloud computing. It’s a comprehensive tool suite that contains compilers, libraries, and debugging and performance tuning analysis tools to help developers create high performance, scalable, reliable parallel code—faster.

The full suite of tools is offered to commercial users at a reasonable price, but many of the tools are available for free. (The high-performance libraries are available freely with very liberal redistribution rights.) To coincide with their release, Intel is kicking off a series of webinars to highlight tuning for performance.

In this article, I’ll cover:

  • new features in the 2018 lineup
  • keeping up with key hardware and new standards
  • more than compilers and libraries: tuning, debugging, and advice
  • how to evaluate the tools for free
  • how to get many of the tools for free (some restrictions apply)
  • where to learn more – webinars, web links, etc.

A personal note: In my experience, Intel tools are the tools of choice when seeking top performance. I’ve contributed to eight books about achieving top performance, and the vast majority of the work in those books was done using Intel tools (including the dozens of chapters in our High Performance Parallelism Pearls books).

New features in the 2018 lineup

Intel tools have long stood for the highest performance in C, C++, and Fortran. A relative newcomer is high-performance Python. Intel’s tools include an accelerated Python distribution, which is bundled into all of Intel’s tool suites and is also available separately, for free, via conda, apt-get, or yum.
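Part of the appeal is that it’s a drop-in replacement: ordinary NumPy code like the sketch below runs unchanged, but in Intel’s distribution the heavy lifting (here, a matrix multiply) dispatches to MKL-optimized BLAS. This is a minimal illustration, not a benchmark; the actual speedup depends on your hardware and NumPy build.

```python
import numpy as np

# Ordinary NumPy code -- it simply runs faster when NumPy is built
# against Intel MKL, as it is in Intel's Python distribution.
np.random.seed(0)
a = np.random.rand(500, 500)
b = np.random.rand(500, 500)

# The matrix product below is a single BLAS (dgemm) call under the hood;
# with an MKL-backed NumPy it runs multithreaded and vectorized.
c = a @ b

# Sanity check: (A @ B)[i, j] is the dot product of row i of A
# and column j of B.
assert np.allclose(c[0, 0], np.dot(a[0, :], b[:, 0]))
```

No source changes, no new APIs to learn – you swap the interpreter underneath and keep your code.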

Obtaining top performance isn’t just about coding; it’s about tuning. Once I’ve coded up my application, I need all the help I can get to understand what is actually going on when I run it. There are two real gems here for helping me:

  • Intel® Advisor’s roofline analysis helps me know what it will take to make my program run faster (tuning, rewriting, or faster hardware – and why). The Intel Advisor roofline analysis also helps me find high-impact opportunities for optimization in under-optimized loops, understand cache effects, look at instruction mixes, and much more.
  • Intel® VTune™ Amplifier’s performance snapshots – a growing collection of snapshots that helps me see what is going on from different perspectives. I wrote about these back in February, and the collection in the 2018 model year has grown in depth and breadth. (Look for “snapshot” throughout Intel’s descriptions of its tools.) The 2018 version of Application Performance Snapshot further unifies MPI analysis (Intel, MPICH, or Cray versions) with application data, enabling richer metrics for understanding computational efficiency – MPI and OpenMP parallelism, memory stalls, FPU utilization, and I/O efficiency – along with recommendations for further in-depth analysis.
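The core idea behind a roofline analysis can be sketched with a little arithmetic: compare a loop’s arithmetic intensity (flops per byte of data moved) against the machine’s balance point (peak flops divided by peak memory bandwidth). The peak numbers below are illustrative assumptions, not measurements of any particular processor – Intel Advisor measures the real values for you.

```python
# Back-of-the-envelope roofline reasoning for a STREAM-style triad:
#   a[i] = b[i] + scalar * c[i]
# Per element: 2 flops (one multiply, one add); 24 bytes moved
# (read b and c, write a, at 8 bytes per double).
flops_per_elem = 2.0
bytes_per_elem = 24.0
arithmetic_intensity = flops_per_elem / bytes_per_elem  # ~0.083 flops/byte

# Illustrative machine peaks (assumed numbers, not a real processor):
peak_gflops = 3000.0   # GFLOP/s of compute
peak_gbps = 100.0      # GB/s of memory bandwidth
machine_balance = peak_gflops / peak_gbps  # 30 flops/byte

# Attainable performance is capped by whichever "roof" we hit first.
attainable_gflops = min(peak_gflops, arithmetic_intensity * peak_gbps)

# Far below the balance point: this loop is memory-bandwidth bound,
# so vectorizing harder won't help -- reducing data movement will.
memory_bound = arithmetic_intensity < machine_balance
print(f"AI = {arithmetic_intensity:.3f} flops/byte, "
      f"attainable = {attainable_gflops:.1f} GFLOP/s, "
      f"memory bound: {memory_bound}")
```

That one comparison – am I under the slanted memory roof or the flat compute roof? – is what tells you whether to chase vectorization or data locality first.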

Keeping up with key hardware and new standards

Of course, support for new hardware is key to making sure our applications continue to perform well. The 2018 tool updates include support for the latest processors, including features such as vectorization tuned for Intel AVX-512 instructions. These 512-bit-wide vector instructions debuted with Intel Xeon Phi processors last year, and were given expanded capabilities in new Intel Xeon processors this year. Intel tools support both, as well as the ability to build binaries that automatically select the best-performing code path for the processor they run on.

Intel also continues to be aggressive about staying up to date with the latest standards and IDEs, including full support for C++14, support for the initial C++17 draft, full support for Fortran 2008, and initial Fortran 2015 draft language support. In addition, Intel supports the initial OpenMP 5.0 draft. For Windows developers, Intel offers Microsoft Visual Studio 2017 integration.

More than compilers and libraries: tuning, debugging, and advice

The ultimate in performance tuning tools is the Intel VTune Amplifier. It can do performance analysis on any program (C, C++, Fortran, Java, C#, etc. – it doesn’t care what language it’s written in). New for 2018, VTune Amplifier adds support for tuning code inside Docker and Mesos containers – a nice little addition.

Eliminating deadlocks and data races in a parallel program is a thankless task. This is made much easier with the advanced memory and threading debugger known as the Intel Inspector. This tool really has no equal.

I also encourage people to try out Intel Advisor. This is a truly unique tool for giving advice based on an application’s actual characteristics. (It’s become even better with the roofline analysis that I mentioned previously.) This summer, I advised a friend working in machine learning to take a look at Intel Advisor. Before I knew it, he told me that he had downloaded it on his own, followed its advice, and found some very effective optimizations (parallelization) as a result. He was very happy.

How to evaluate the tools for free

Free trial copies of everything are available from the Intel Parallel Studio web page.

How to get many of the tools for free (some restrictions apply)

Students, educators, and open source developers can access many of the tools for free – details are on the Qualify for Free Tools web page.

Everyone can get Intel’s highest-performance libraries—MKL, DAAL, IPP, TBB, and MPI—for free. I use all five in my x86/x86-64 development work, and they truly do let me create much faster software. Download links are on the Intel Performance Libraries web page. (Intel® MKL, DAAL, IPP, and TBB may also be installed for free via the yum, apt-get, and conda repositories.) The Intel® Performance Libraries (except Intel® MPI) are now offered under a Simplified End-User License that allows broader redistribution rights.

  • MKL: Math Kernel Library is the highly optimized library for all things mathematical – including BLAS, FFTs, and solvers. It’s the key library for fast, mathematically intensive applications.
  • DAAL: Data Analytics Acceleration Library is an open source, highly optimized library of key routines for accelerating data analytics. This is a key component in the fastest versions of Caffe, TensorFlow, Torch, and Theano.
  • IPP: Integrated Performance Primitives Library is an impressive array of accelerated routines for image processing, signal processing, data compression/decompression, cryptography, and other data processing needs.
  • TBB: Threading Building Blocks is the most popular template library for multithreading in C++. TBB quickly rose to popularity a decade ago, and continues to dominate the C++ development scene by offering an open source solution to parallel algorithms and parallel flow graph programming. TBB includes a scalable memory allocator – an absolute must for parallel programming.
  • MPI: Message Passing Interface Library is Intel’s high-performance version of this industry standard library, which forms the backbone of any high-performance application spanning multiple “nodes,” such as those found in clusters, servers, and supercomputers.
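From Python, these libraries are often used without ever calling them directly: in Intel’s Python distribution, NumPy’s FFT and linear algebra routines route to MKL. The snippet below is ordinary NumPy and runs on any build – only the MKL dispatch underneath is specific to Intel’s packages.

```python
import numpy as np

# A 4-point FFT of a unit impulse: every frequency bin has amplitude 1.
# With an MKL-backed NumPy, this call is serviced by MKL's FFT engine.
signal = np.array([1.0, 0.0, 0.0, 0.0])
spectrum = np.fft.fft(signal)
assert np.allclose(spectrum, np.ones(4))

# Round trip: the inverse transform recovers the original signal.
recovered = np.fft.ifft(spectrum)
assert np.allclose(recovered, signal)
```

The same "free acceleration" story applies to BLAS-backed operations such as matrix multiplication: the code stays portable, and the library underneath does the work.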

Where to learn more – webinars, web links, etc.

Click here to download your free 30-day trial of Intel Parallel Studio XE.