sponsored

FPGAs and OpenCL: What’s Up?

They really do deliver a customizable hardware accelerator for everyone


Everywhere I turn these days, someone is touting Field Programmable Gate Arrays (FPGAs) and programming them with OpenCL. I hear it from the two leaders in the field, Intel/Altera and Xilinx. With both leaders in agreement, I felt I had to dive in to understand what all the hype is about.

Here’s what I learned:

OpenCL is really C (or C++, if you prefer) with two things added:

  • APIs (OpenCL library functions) for the host to call (find FPGA, download kernel to FPGA, copy data to/from FPGA memory, start kernel, stop kernel, etc.)
  • Syntactic magic around functions (kernels) that we want to run on the FPGA.

An FPGA provides a reconfigurable sea of gates on which anyone can design their own custom hardware accelerator, deploy it for a single application, and then quickly reconfigure the device as a new accelerator for a different application. I have verified to my own satisfaction that new tools centered on OpenCL really do bring the benefits of FPGA hardware platforms to software developers. We aren’t limited to the data types (or operations) that are hard-baked into CPU or GPU designs. An FPGA is much more versatile, so I can match the hardware to my application’s needs.

If you know C, OpenCL is no big deal to learn

OpenCL isn’t really a new programming language, just as CUDA and OpenMP aren’t really new programming languages. They’re extensions to an existing and familiar language with a purpose. In the case of OpenCL for FPGAs, the purpose is to use the FPGA as an accelerator of specific functions in our program. OpenCL (and CUDA) programmers call these kernels. For efficiency, kernels can be configured to pass data directly to each other to avoid the inefficiencies of copying data to/from the host (CPU) for intermediate steps in a calculation.

High performance? It is!

I’ve actually been doing a fair amount of programming with OpenCL for both Intel/Altera and Xilinx FPGAs lately. For computational work, I’ve used the Intel FPGA SDK for OpenCL on my own system and in the cloud on Nimbix for Intel/Altera, and the SDAccel environment (with OpenCL) targeting the Amazon Web Services F1 instances for Xilinx.

I say “computational work” because I’ve also been playing around with small FPGA boards from Lattice, Intel, and Xilinx. These have blinking lights and many connectors ready to interface to the real world. While some of these boards do support being programmed by OpenCL, none of these sub-$300 boards is really interesting for high-performance algorithm acceleration. For that, you need an FPGA accelerator card costing $5,000-$11,000 (hence my advice below to use the cloud for learning).

I can say without a doubt that OpenCL is ready for prime time. I found the programming to be straightforward and the performance quite acceptable. The biggest annoyance is the huge amount of time it takes to “compile” (FPGA programmers say “synthesize, place, and route”). A simple vector add kernel (the FPGA computational “Hello, World!” program) took close to three hours to compile on various machines. To be fair, more complex programs can take similar amounts of time because the compiler can do things in parallel, although a very complex FPGA design can take days to build. With the latest Intel tools, my three-hour build now appears to finish in well under an hour, thanks to the new “fast compile” option. Intel says fast compile typically costs only 10-20% in performance while cutting compile time to 20-40% of a regular compile.

Intel vs. Xilinx

Each alternative (Intel vs. Xilinx) had its pros and cons, but I ended up preferring the Intel tools and running Intel in the cloud (on Nimbix) for two reasons:

  • Unlike SDAccel on AWS, the same tools for Intel/Altera can be used freely on my own system as well as for a nominal charge in the cloud. The tools on AWS are locked to produce FPGA bitstreams (binaries) that run only on AWS. This complicates developing and debugging code on my own unless I pay to do it on AWS. That is not very expensive, about 10 cents/hour, so maybe this isn’t such a problem until you factor in my second issue.
  • Unlike AWS, Nimbix charges by the second instead of rounding up to the next hour. Of course, I can build on my own machine for free with Intel. But when it comes time to run my FPGA program, it’s nice to pay only for the few seconds it takes. (I have logged on, run my code, and logged off with a charge for only one or two minutes. AWS would have charged me for 60 minutes.)

When you consider that FPGA time starts at $3.00/hour on Nimbix, or $1.65-$13.20/hour on AWS, the difference in each run costing me 2 minutes vs. 60 minutes of time is huge. Since a “recompile” can take several hours for an FPGA program, I don’t usually get the chance to run my FPGA code multiple times in an hour.

Getting started

The Intel/Altera OpenCL website is a great place to start: it has tutorials, sample code, and the latest SDK as a free download. I recommend compiling and running code without an FPGA while learning OpenCL. The “-march=emulator” switch on the Intel compiler builds the kernel to run on the CPU, which is fast enough and easy to debug. Only after I have my program debugged do I compile for the FPGA. At that point, Nimbix is worth a look for getting time on an actual FPGA. I will write about my experiences running FPGA code in the cloud on Nimbix in an article that will appear in Parallel Universe Magazine Issue 31, with enough detail to replicate my experience easily. I was able to compile and run the FPGA vector add example, along with stumbling around and learning, for under $5. I like that. You can get started for free today with the Intel/Altera OpenCL website, and then try running FPGA code in the cloud next.
