InfoWorld review: Intel's Westmere struts its stuff
Fast AES encryption, better scalability, and consistent per-core performance make the new six-core Xeon a worthy successor to Nehalem
Also in the realm of reducing power consumption, the Westmere CPUs can use low-voltage DDR3 RAM running at 1.35 volts as well as standard DDR3 1.5-volt DIMMs. In addition to the relatively small reduction in power draw, low-voltage DIMMs generate less heat, thereby reducing overall cooling requirements, which is especially significant in servers and blades with high RAM counts.
I had the opportunity to run a series of benchmarks on two sets of Westmere chips, the X5670s and X5680s. Both are six-core parts; the X5670s run at 2.93GHz per core, while the X5680s run at 3.33GHz per core. The tests were my standard array of real-world workloads rather than mainline benchmarking tools. They are composed of LAME MP3 audio conversion tests, gzip and bzip2 compression tests, MD5 calculation tests, and MP4-to-FLV video conversion tests. Each of these tests is a single-threaded process, but they are run concurrently at increasing levels to measure performance of the processors under various loads. I start at a 1:1 physical-core-to-process level, then ramp up the number of concurrent processes well past the core count, topping out at 96.
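The methodology described above (many copies of a single-threaded job started at once, timed by wall clock, at increasing concurrency levels) can be sketched roughly as follows. This is a minimal illustration and not the harness used for this review: the function names `run_concurrent` and `sweep`, the placeholder workload, and the concurrency levels in the example are all assumptions.

```python
import subprocess
import sys
import time


def run_concurrent(command, copies):
    """Start `copies` instances of `command` at once and return the
    wall-clock seconds until the last one exits."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for _ in range(copies)
    ]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


def sweep(command, levels):
    """Run the same single-threaded command at each concurrency level
    and map level -> elapsed seconds."""
    return {level: run_concurrent(command, level) for level in levels}


if __name__ == "__main__":
    # Hypothetical stand-in workload: a short CPU-bound Python one-liner.
    # A real run would substitute a LAME, gzip, bzip2, or MD5 command
    # reading from and writing to a RAM disk.
    workload = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    # Assumed levels for illustration only.
    for level, seconds in sweep(workload, [1, 2, 4]).items():
        print(f"{level:3d} processes: {seconds:.2f} s")
```

Because each job is single-threaded, elapsed time should stay roughly flat until the number of processes exceeds what the hardware can run at once, then climb as the cores become oversubscribed.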
For these tests, I compared a two-CPU, 8-core 3.20GHz Nehalem W5580 system with 24GB of DDR3 RAM running at 1,333MHz to a two-CPU, 12-core 3.33GHz Westmere X5680 system with 24GB of DDR3 RAM running at 1,333MHz. Aside from the slight difference in clock speed and the core count, these are essentially the same chip, but one generation apart. All tests were run from RAM disks to keep disk I/O from interfering with the raw CPU tests, and Hyper-Threading was enabled.
The results are pretty much what you'd expect from a Nehalem CPU with two additional cores. At the lowest concurrency level of eight processes, the processors proved essentially equal, with the slight edge to the Westmere X5680 due to the slightly higher clock speed. The LAME test showed the Nehalem running 27 seconds where the Westmere hit 26 seconds.
The next iteration was 12 concurrent processes, and here the Westmere began to pull away, matching its eight-process runtime of 26 seconds. The eight-core Nehalem was oversubscribed at this point and turned in a 37-second time. After that, the Westmere ran away with the test, culminating at a runtime of 149 seconds on the 96-process test, where the Nehalem came in at 234 seconds. Basically, on a per-core basis, the Westmere isn't much faster than the Nehalem, but there are simply more cores and it scales far better because of that.
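To put the LAME numbers in perspective, the short calculation below compares the measured Nehalem-to-Westmere runtime ratios with the ratios of core count and clock speed between the two test systems. The runtimes and clock speeds are the ones quoted in this review; the helper name `speedup` and the comparison against a simple cores-times-clock figure are illustrative assumptions, not part of the original test procedure.

```python
# LAME runtimes in seconds from the review: processes -> (Nehalem, Westmere)
lame_seconds = {
    8: (27, 26),
    12: (37, 26),
    96: (234, 149),
}


def speedup(nehalem_seconds, westmere_seconds):
    """How many times faster the Westmere system finished the run."""
    return nehalem_seconds / westmere_seconds


core_ratio = 12 / 8          # 12 Westmere cores vs. 8 Nehalem cores
clock_ratio = 3.33 / 3.20    # X5680 vs. W5580 clock speed

for processes, (nehalem, westmere) in lame_seconds.items():
    print(f"{processes:3d} processes: {speedup(nehalem, westmere):.2f}x")

print(f"core ratio:    {core_ratio:.2f}x")
print(f"clock ratio:   {clock_ratio:.2f}x")
print(f"cores x clock: {core_ratio * clock_ratio:.2f}x")
```

At eight processes the gap is about 1.04x, in line with the clock difference alone. At 96 processes it is about 1.57x, close to what the extra cores and the higher clock together would suggest, which is consistent with the per-core performance being largely unchanged.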
The other tests reflected the same results in a slightly less spectacular fashion, except for the video conversion test, in which the Westmere finished the 96-process run a full 102 seconds faster than the Nehalem.
I ran the same test suite on a set of 2.93GHz Westmere X5670s, and the results showed the same scaling benefit, but with slower times due to the reduced clock speed compared with the 3.33GHz X5680s. However, even at a 1:1 process-to-core ratio, the X5670 was roughly on par with the 3.20GHz Nehalem W5580, probably due to the larger L3 cache.
All this points to the fact that if your workloads are single-threaded and single-process, then Westmere CPUs aren't going to buy you very much over their older counterparts. However, if you run highly threaded applications or many iterations of single-threaded applications at once, Westmere will provide significant benefit. Of course, one of the biggest multithreaded workloads is virtualization, and that's where Westmere will likely find a very happy home. Any sufficiently threaded virtualization platform will make great use of the extra cores in Westmere, on the same socket and using the same RAM.
LAME MP3 audio conversion tests, 8 to 96 concurrent processes (times in seconds)
Bzip2 compression tests, 8 to 96 concurrent processes (times in seconds)
MP4 to FLV video conversion tests, 8 to 96 concurrent processes (times in seconds)