Adding more processing cores has emerged as the primary way of boosting performance of server and PC chips, but the benefits will be greatly diminished if the industry can't overcome certain hardware and programming challenges, participants at the Multicore Expo in Santa Clara, California, said this week.
Most software today is still written for single-core chips and will need to be rewritten or updated to take advantage of the increasing number of cores that Intel, Sun Microsystems and other chip makers are adding to their products, said Linley Gwennap, president and principal analyst at The Linley Group.
Off-the-shelf applications will often run faster on CPUs with up to four processor cores, but beyond that performance levels off and may even deteriorate as more cores are added, he said. A recent report from Gartner also highlighted the problem.
Chip makers and system builders have begun efforts to educate developers and provide them with better tools for multicore programming. A year ago, Intel and Microsoft said they would invest $20 million to open two research centers at U.S. universities devoted to tackling the problem. The lack of multicore programming tools for mainstream developers is perhaps the biggest challenge the industry faces today, Gwennap said.
Writing applications in a way that lets different parts of a computing task, such as solving a math problem or rendering an image, be divided up and executed simultaneously across multiple cores is not new. But this model, often called parallel computing, has been limited so far mainly to specialized, high-performance computing environments.
But in recent years, Intel and AMD have been adding cores as a more power-efficient way to boost chip performance, a marked change from their traditional practice of increasing clock speed. Intel is building eight cores into its upcoming Nehalem-EX chips, and AMD is designing 12-core chips for servers. They are also adding multi-threading capabilities, which allow each of the cores to work on multiple lines of code at the same time.
That means mainstream applications have to be written in a different way to take advantage of the additional cores available. The work is hard to do and creates the potential for new types for software bugs. One of the most common is "race conditions," where the output of a calculation depends on the various elements of a task being completed in a certain order. If they are not, errors can result.
A few parallel programming tools are available, such as Intel's Parallel Studio for C and C++. Other vendors in the space are Codeplay, Polycore Software and Clik Arts. There is also a new C-based parallel programming model called OpenCL, being developed by The Khronos Group and backed by Apple, Intel, AMD, Nvidia and others.
But many of the tools available are still works in progress, participants at the Multicore Expo said. Software compilers need to be able to identify code that can be parallelized, and then do the job of parallelizing it without manual intervention from programmers, said Shay Gal-on, director of software engineering at EEMBC, a nonprofit organization that develops benchmarks for embedded chips.
Despite the lack of tools, some software vendors have found it relatively easy to create parallel code for simple computing jobs, like image and video processing, Gwennapp said. Adobe has rewritten Photoshop in a way that can assign duties like magnification and image filtering to specific x86 cores, improving performance by three to four times, he said.
"If you are doing video or graphics, you can take different sets of pixels and assign them to different CPUs. You can get a lot of parallelism that way," he said. But for more complex tasks, it is difficult to find a single approach for identifying a sequence of computations that can be parallelized and then dividing them up.
While the programming side may present the biggest challenge, there are also hardware changes that need to be made, to overcome issues such as memory latency and slow bus speeds. "As you add more and more CPUs on the chip, you need the memory bandwidth to back it up," Gwennap said.
Sharing a single memory cache or data bus among multiple cores can create a bottleneck, meaning the extra cores will be largely wasted. "By the time you get to six or eight CPUs, they spend all their time talking to each other and not moving forward to getting any work done," he said.
The onus may ultimately lie with developers to bridge the gap between hardware and software to write better parallel programs. Many coders are not up to speed on the latest developments in hardware design, Gal-on said. They should open up data sheets and study chip architectures to understand how their code can perform better, he said.