If you think AMD’s Opteron and Intel’s Nocona -- or more formally, “Xeon Processor with 800MHz System Bus” -- are cut from the same 64-bit cloth, look closer. Yes, they’re compatible at the instruction-set and register levels; as they should be, since both are based on AMD’s x86-64 specification. But the total system architecture surrounding these chips -- the pathways to other CPUs, memory, and peripherals -- exhibits several differences that factor into buying decisions and developers’ platform targeting.
At its core, Nocona is a NetBurst Xeon DP, a Pentium 4 equipped for dual-processor operation. It has 1MB of Level 2 cache and a top clock speed of 3.6GHz. All memory and I/O data, interrupts, interprocessor communication, and address requests flow over a fast shared bus with a maximum bandwidth of 6.4GBps. It’s a highly evolved design: leading-edge, yet faithful to the legacy design principles Intel is expected to maintain.
The 64-bit technology common to both processors is easy to explain: more memory and more registers. When you’re running a 64-bit OS, standard PC caps on physical and virtual memory go away. (Well, almost: Opteron has a larger total address space than Nocona, but Nocona can accommodate twice as much physical memory as a current dual-CPU Opteron system: 32GB vs. 16GB.) Registers are the fastest storage a CPU has; the more registers you have, and the more bits each holds, the more compilers can optimize application performance. Having more registers and using them well can also improve the speed and smoothness of task switching, which has an effect similar to that of Intel’s Hyper-Threading technology.
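To make the address-space point concrete, here is a quick arithmetic sketch. The 4GB figure is the classic 32-bit cap; the 32GB and 16GB figures are the dual-CPU platform limits cited above, not per-chip architectural maximums.

```python
# Bytes addressable with an n-bit address: 2**n
def addressable(bits):
    return 2 ** bits

GB = 2 ** 30

# Classic 32-bit x86: a 4GB virtual address space per process
assert addressable(32) // GB == 4

# A full 64-bit address space dwarfs any installable memory
print(addressable(64) // GB)  # 17179869184 GB (16 exabytes)

# The dual-CPU platform limits cited in the article, in bytes
nocona_max_physical = 32 * GB
opteron_max_physical = 16 * GB
print(nocona_max_physical // opteron_max_physical)  # 2, i.e. twice as much
```

In practice neither chip decodes all 64 address bits, which is why platform limits like these, rather than the theoretical 16-exabyte ceiling, are the numbers that matter to buyers.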
Beyond the instruction set and address space, however, these two processors have nothing in common. And where they diverge most is in their total system architectures.
As with all Xeons, Nocona’s shared bus is the Achilles’ heel of Intel’s architecture. That only gets worse with SMP systems in which multiple CPUs must funnel their data, I/O, addresses, memory access, and interprocessor communication through a single bus and compete for access to a single pool of memory.
There are two ways to improve a shared bus: Make it faster or divide it up into independent buses. Intel sped things up, raising the bus speed from 533MHz to 800MHz, and tossed out a hint that it’s going after the independent bus design. The new touch is PCI Express. The chip that directs traffic on Xeon’s shared bus now has three onboard serial communications channels, each of which has a theoretical maximum throughput of 4GBps. Nocona can’t touch the channels’ aggregate potential of 12GBps with a 6.4GBps shared bus, but faster buses will inch it closer to that limit.
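The bus math above reduces to a few lines. This sketch uses only the article’s figures; real throughput would be lower than these theoretical maximums.

```python
# Aggregate PCI Express potential vs. the shared front-side bus,
# using the article's figures (GBps = gigabytes per second)
pcie_channels = 3
pcie_per_channel = 4.0   # theoretical max per serial channel
fsb_bandwidth = 6.4      # 800MHz shared bus

pcie_aggregate = pcie_channels * pcie_per_channel
print(pcie_aggregate)    # 12.0 GBps of aggregate channel potential

# Whatever the channels can carry, traffic still funnels through
# the shared bus, so the effective ceiling is the smaller number
effective = min(pcie_aggregate, fsb_bandwidth)
print(effective)         # 6.4 GBps
```

The `min()` is the whole story: until the shared bus gets faster, the PCI Express channels can never run at their combined potential.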
By contrast, Opteron implements as many as four independent high-speed buses on each processor, depending on the model of the CPU. One bus on each processor is dedicated to memory traffic, with a maximum bandwidth of 6.4GBps. The Opteron architecture gives each processor its own bank of memory, so theoretically, bandwidth rises and contention decreases as more processors are added to a server.
Communication with nonmemory system components, including other processors and peripherals, is handled by HyperTransport bus controllers built into the CPU. This parallel bus, developed by AMD and licensed by others, including Apple and Transmeta, has a bandwidth of 6.4GBps (3.2GBps each way), for a total potential system bandwidth of 19.2GBps, independent of memory traffic. Direct HyperTransport links between CPUs allow all processors to share all the system’s memory, split though it is across processors, at full speed.
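The scaling argument in the two paragraphs above is simple arithmetic. The figures below come from the article; real-world contention and remote-memory hops would pull the numbers down.

```python
# Opteron gives each CPU its own memory bus, so aggregate memory
# bandwidth grows with the processor count (article's figures)
mem_bw_per_cpu = 6.4  # GBps, dedicated memory bus per processor

def aggregate_memory_bw(cpus):
    return round(cpus * mem_bw_per_cpu, 1)

print(aggregate_memory_bw(2))  # 12.8 GBps on a dual-CPU system
print(aggregate_memory_bw(4))  # 25.6 GBps on a four-way system

# HyperTransport links carry everything else: 3.2 GBps each way
ht_link_bw = 3.2 * 2                 # 6.4 GBps per bidirectional link
print(round(3 * ht_link_bw, 1))      # 19.2 GBps total system potential
```

Contrast this with the shared-bus model, where adding CPUs adds contention but no bandwidth: that is the architectural divide the article is describing.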
The bottom line is that the Opteron architecture with HyperTransport sets the stage for blazing multiprocessing performance. For now, Intel’s Xeon line has nothing to match it.