Beating Moore’s Law: Scaling Performance for Another Half-Century

As his 90th birthday approaches, one wonders what Intel co-founder Gordon Moore makes of the fact that his famous ‘law’, first postulated in 1965, still pretty much holds true. As revised in 1975, it forecast that the number of transistors in an integrated circuit (read CPU) would double about every two years (a figure often quoted as 18 months), while the cost halved over the same period. That’s why the processing power on a single Xeon CPU today is roughly equivalent to all the mainframes that existed in the world in the early 1960s, which would have occupied a space roughly equivalent to several floors of the Empire State Building.

But there’s a rub. Many experts, physicists and engineers alike, now say we are running out of room for Moore’s law to continue as it has in years past. The problem is physics: it is increasingly difficult to keep shrinking the features on an IC die to fit in more transistors, and the processes used to etch them are approaching atomic-scale limits that can’t be exceeded with our current knowledge of the universe. Sad.

So, what’s the answer to continue giving our users the horsepower that the ever-growing app explosion demands? It’s simple: Don’t scale out, scale up.

That need is demonstrated by modern CPUs, which sport a dozen or more cores on a single piece of silicon. For example, Xeon CPUs have up to 18 cores, each one waiting to execute a code segment from the operating system, hypervisor, or application that will deliver the productivity that IT and users have grown to expect over the past five decades. That CPU power is multiplied in two other ways – servers with multiple CPUs, and clustering that connects multiple multi-core, multi-CPU servers to serve the increasing compute demands of machine learning, data analytics, and the IoT onslaught.

Now, we can exceed Moore’s law from a compute standpoint by deploying parallel computing clusters – if we know how to take advantage of all those cores. The most effective way to do so is to develop code that is both threaded and vectorized. Fortunately for the development community, Intel is furthering its own cause by offering Intel® Parallel Studio XE, a set of industry-leading tools (compilers, performance libraries, profilers, debuggers, and a performance Python distribution) that provide parallel processing support for C++, Fortran, and Python developers. It is a comprehensive toolset for parallelization and more, and it is completely free for full-time students and educators. Intel Parallel Studio XE includes the Intel® Math Kernel Library (MKL), which accelerates math functions for several compilers; the Intel® Data Analytics Acceleration Library (DAAL), which helps data scientists train and tune their models faster and analyze larger data sets; and Intel® Threading Building Blocks (TBB), which lets programmers think in terms of tasks rather than manipulating threads directly. Intel® TBB also supports nested parallelism and handles load balancing, all resulting in faster execution that scales across an increasing number of cores and processors.
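To make the "tasks rather than threads" idea concrete, here is a minimal sketch in Python. It uses the standard library's concurrent.futures module as a stand-in, not Intel® TBB or any other Parallel Studio XE component, and the function names (sum_of_squares, split, parallel_sum_of_squares) are hypothetical, chosen only for illustration. The program describes the work as independent chunks and leaves it to the executor to decide which thread runs each one.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def sum_of_squares(chunk):
    """One independent task: reduce a slice of the input to a partial result."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Cut data into at most n_chunks contiguous, near-equal slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=None):
    """Describe the work as tasks; the executor assigns them to threads."""
    workers = workers or os.cpu_count() or 1
    # Assumption for illustration: creating more tasks than workers gives
    # the pool some room to even out the load across threads.
    chunks = split(data, workers * 4)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


data = list(range(1_000))
total = parallel_sum_of_squares(data, workers=4)
print(total)
```

Note that in standard CPython, threads do not speed up pure-Python arithmetic like this because of the global interpreter lock, so the sketch shows only the decomposition pattern. Real gains come from running the per-chunk work in native, vectorized code or in separate processes.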

Of course, writing parallelized and threaded code is just the start, so Intel’s Parallel Studio XE also includes a set of performance profilers, a memory and thread debugger, and design tools that simplify both coding and deployment of newly parallelized code: Intel® VTune™ Amplifier for profiling, Advisor for vectorizing and threading, and Inspector, which locates the root cause of memory and threading errors before code is published. The total package is a comprehensive set of tools that lets developers exceed Moore’s law by accelerating code performance well beyond a single piece of silicon.

Click here for a free trial of Intel® Parallel Studio XE for yourself!