The tests I ran are based on common operations found in many applications. The LAME tests convert a 152MB WAV file to MP3 at a 256Kbps bit rate. The compression tests use gzip and bzip2 to compress and decompress a 55MB MP3 file. The MD5 tests calculate MD5 sums on 152MB files, and the MP4-to-FLV tests transcode a 24MB MP4 file to FLV. Each test is single-threaded, but I ran multiple instances at once, raising the number of concurrent processes step by step to stress physical and logical cores, memory bandwidth, and memory interconnects, as well as disk I/O.
On the Nehalem-EX, I ran these tests with Hyper-Threading enabled and disabled. For comparison, I'll reference the results with Hyper-Threading disabled so that the figures represent the same number of logical CPUs. All tests were run on CentOS 5.4. The reported figures were drawn from tests run from ramdisk to eliminate disk I/O as a bottleneck.
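The method described above, many copies of a single-threaded job started together and timed as a batch, can be sketched roughly as follows. This is a minimal illustration and not the harness used for this review: the function names `run_concurrent` and `sweep` are hypothetical, and the command is a stand-in for the real LAME, gzip, bzip2, MD5, or transcoding command line.

```python
import subprocess
import sys
import time


def run_concurrent(cmd, n):
    """Start n copies of cmd together; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(n)
    ]
    exit_codes = [p.wait() for p in procs]  # block until every copy has finished
    elapsed = time.perf_counter() - start
    if any(exit_codes):
        raise RuntimeError(f"{sum(1 for c in exit_codes if c)} of {n} processes failed")
    return elapsed


def sweep(cmd, levels):
    """Time the same command at each concurrency level, one batch at a time."""
    return {n: run_concurrent(cmd, n) for n in levels}


if __name__ == "__main__":
    # Placeholder CPU-bound workload (an assumption); a real run would
    # substitute the actual command and give each copy its own output file.
    placeholder = [sys.executable, "-c", "sum(range(10**6))"]
    for n, secs in sweep(placeholder, [1, 2, 4]).items():
        print(f"{n:3d} concurrent processes: {secs:.2f} s")
```

Measuring the time for the whole batch, not for each process, matches how the results here are reported: one completion time per concurrency level.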
The results start out somewhat unimpressively. With eight concurrent processes, the four X7350 CPUs in the DL580 were evenly matched against the two Nehalem-EX CPUs in the R810 in the LAME and gzip tests, but were significantly behind in the other tests. At a concurrency level of 16, the gap widened substantially on all tests, with the older system slightly ahead of the Nehalem-EX in the LAME and gzip tests, but running far behind in the remainder. Once the testing started to significantly oversubscribe the number of logical CPUs on each server, the Nehalem-EX pulled far into the lead and stayed there across all tests.
In fact, I ran many test passes at the 48, 64, and 96 concurrent process levels to verify the results because the performance differences were so huge. For example, at 64 concurrent processes, it took 2 minutes, 12 seconds for the two-CPU Nehalem-EX system to complete the MP4-to-FLV test. The four-CPU X7350 system took over 30 minutes to complete the same task. That's a massive performance difference. The performance delta between the two servers only grew wider as the concurrency increased. Not only was I able to ramp the Nehalem-EX up to 768 concurrent processes, but it was still running the tests about 50 percent faster than the X7350 could run 64 concurrent processes.
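To put the 64-process MP4-to-FLV result in proportion, the two times can be reduced to a single ratio. The snippet below is only a worked calculation from the figures quoted above. Because the X7350 time is given as "over 30 minutes," the ratio is a lower bound, and the helper name `to_seconds` is hypothetical.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes-and-seconds reading to total seconds."""
    return minutes * 60 + seconds


nehalem_ex = to_seconds(2, 12)   # 2 minutes, 12 seconds
x7350_floor = to_seconds(30)     # "over 30 minutes", so treat as a minimum

ratio = x7350_floor / nehalem_ex
print(f"The X7350 system took at least {ratio:.1f} times as long")
```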
This extreme performance increase has several causes. The older X7350 system might have had two additional CPUs and a 670MHz clock rate bump per core, but it had only 4MB of L3 cache compared to the 24MB L3 cache on the Nehalem-EX. The X7350 also lacked the benefit of QuickPath, and the memory bus became a bottleneck. Thus, in the heavier workload tests, the Nehalem-EX blew the X7350 out of the water, even with a reduced clock rate per core and the same number of cores. In the lighter workloads, the difference was not nearly as significant.
LAME MP3 audio conversion tests, 8 to 96 concurrent processes (times in seconds)
MP4 to FLV transcoding tests, 8 to 96 concurrent processes (times in seconds)