But many of the tools available are still works in progress, participants at the Multicore Expo said. Software compilers need to be able to identify code that can be parallelized, and then do the job of parallelizing it without manual intervention from programmers, said Shay Gal-on, director of software engineering at EEMBC, a nonprofit organization that develops benchmarks for embedded chips.
Despite the lack of tools, some software vendors have found it relatively easy to create parallel code for simple computing jobs, like image and video processing, Gwennapp said. Adobe has rewritten Photoshop in a way that can assign duties like magnification and image filtering to specific x86 cores, improving performance by three to four times, he said.
"If you are doing video or graphics, you can take different sets of pixels and assign them to different CPUs. You can get a lot of parallelism that way," he said. But for more complex tasks, it is difficult to find a single approach for identifying a sequence of computations that can be parallelized and then dividing them up.
While the programming side may present the biggest challenge, there are also hardware changes that need to be made, to overcome issues such as memory latency and slow bus speeds. "As you add more and more CPUs on the chip, you need the memory bandwidth to back it up," Gwennap said.
Sharing a single memory cache or data bus among multiple cores can create a bottleneck, meaning the extra cores will be largely wasted. "By the time you get to six or eight CPUs, they spend all their time talking to each other and not moving forward to getting any work done," he said.
The onus may ultimately lie with developers to bridge the gap between hardware and software to write better parallel programs. Many coders are not up to speed on the latest developments in hardware design, Gal-on said. They should open up data sheets and study chip architectures to understand how their code can perform better, he said.