Intel's Nehalem simply sizzles
In a range of tests, the new quad-core Xeon processor shows huuuuge performance gains
Intel's new Nehalem Xeon CPUs, which are being introduced in countless one- and two-socket servers and workstations today, have already generated a lot of heat. While introducing the new processors to technical journalists in February, Nick Knupffer, Intel's global communications manager, boasted that "Nehalem represents the biggest performance jump we've made since the introduction of the Pentium Pro."
This claim was met with outright skepticism by nearly everyone in the room, and certainly by me. But after running a two-socket, eight-core Nehalem system in my lab for the past few weeks, it would appear that Knupffer is right. Intel has built a better mousetrap. And it used part of AMD's blueprints to do it.
Back when AMD's Opteron was ruling the performance roost, Intel was busy gluing two separate cores onto a single die and calling it a dual-core CPU. Memory bandwidth lagged due to the central off-die memory controller, and while the overall performance of the processor was acceptable, it lacked the NUMA (Non-Uniform Memory Access) punch that was the Opteron's claim to fame. Nehalem is based on a NUMA architecture, much like the Opteron, and its performance is miles ahead of anything else Intel has released to date. Color me impressed.
The Nehalem chips (Xeon 3500 series for single-socket systems and Xeon 5500 series for two-socket systems) feature a quad-core layout with 731 million transistors, 256KB of L2 cache per core, 8MB of L3 cache, deeper and faster caching, and better branch prediction. Essentially, Nehalem blends the strengths of Intel's legacy Xeon processors with a fundamental architectural change: the incorporation of NUMA.
With NUMA, each CPU has its own memory controller. This ties DIMM ranks to a specific CPU, while the QuickPath links that connect the CPUs to each other and to the chip set run at up to 6.4GT (gigatransfers) per second, or 25.6GBps per link. The DDR3 RAM itself runs at 800MHz, 1,066MHz, or 1,333MHz, depending on how the memory channels are populated. If each channel is populated with a single RDIMM (Registered DIMM), the highest speed of 1,333MHz is possible. As RDIMMs are added to those channels, the overall speed drops to 1,066MHz or 800MHz. However, with 4GB RDIMMs, a dual-socket system can run 24GB of RAM at 1,333MHz using only six RDIMMs. Using the Tylersburg chip set, it's possible to bring the RAM total up to 144GB -- 72GB per CPU -- running at 800MHz.
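To make the capacity-versus-speed trade-off concrete, here is a rough Python sketch of the arithmetic. It is an illustration, not an Intel sizing tool: the function name `memory_config` is made up for this example, the three-channels-per-socket figure is inferred from the six-RDIMM configuration above, and the exact mapping of one, two, or three RDIMMs per channel to 1,333MHz, 1,066MHz, and 800MHz is an assumption (the article says only that speed drops as RDIMMs are added). The 8GB RDIMM size used for the 144GB case is likewise inferred from the totals.

```python
# Illustrative sketch only. The constants and names below are assumptions
# made for this example, not figures published by Intel.

SOCKETS = 2               # dual-socket Xeon 5500 system
CHANNELS_PER_SOCKET = 3   # inferred: six RDIMMs at one per channel across two CPUs

# Assumed mapping from RDIMMs per channel to DDR3 speed in MHz.
SPEED_BY_DIMMS_PER_CHANNEL = {1: 1333, 2: 1066, 3: 800}


def memory_config(dimms_per_channel: int, dimm_size_gb: int) -> dict:
    """Return RDIMM count, capacity, and expected RAM speed for a population."""
    if dimms_per_channel not in SPEED_BY_DIMMS_PER_CHANNEL:
        raise ValueError("this sketch only models 1 to 3 RDIMMs per channel")
    dimm_count = SOCKETS * CHANNELS_PER_SOCKET * dimms_per_channel
    total_gb = dimm_count * dimm_size_gb
    return {
        "dimm_count": dimm_count,
        "total_gb": total_gb,
        "gb_per_cpu": total_gb // SOCKETS,
        "speed_mhz": SPEED_BY_DIMMS_PER_CHANNEL[dimms_per_channel],
    }


if __name__ == "__main__":
    # One 4GB RDIMM per channel: the 24GB configuration at full speed.
    print(memory_config(dimms_per_channel=1, dimm_size_gb=4))
    # Three 8GB RDIMMs per channel (assumed size): the 144GB configuration.
    print(memory_config(dimms_per_channel=3, dimm_size_gb=8))
```

Under those assumptions, the first call corresponds to the six-RDIMM, 24GB setup and the second to the fully loaded 144GB setup, with 72GB hanging off each CPU.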
There's more to Nehalem than just NUMA, however. A raft of supporting players also enters into the mix, including updated Virtualization Technology extensions to assist in virtualization use cases; support for DDR3 memory, which can provide double the data rate of DDR2; and SSE 4.2 instructions, a relatively minor update aimed at accelerating text processing. The major updates remain the significantly increased memory bandwidth and the advent of QuickPath, the new processor interconnect that replaces the aged front-side bus, but these additions are quite welcome and round out the package.