On the surface, Nocona’s improvements appear typical of Intel’s dual-processor Xeon evolutionary tradition: a higher clock speed and a faster front-side bus. But this time, in a departure from its usual formula, Intel has added several noteworthy twists intended to stem customer and OEM defections from Xeon to AMD’s fast-tracked Opteron.
As a 32-bit x86 processor, Nocona is a killer that instantly obsoletes its predecessor, Xeon DP. Nocona is manufactured using a 90-nanometer process rather than Xeon DP’s 130-nanometer process, allowing Intel to pack more transistors into a smaller space and to drive the chip at a lower voltage. This helps to offset the heat and power draw associated with higher clock speeds. Intel also exploited the extra real estate by raising Level 2 cache size to 1MB from Xeon DP’s 512KB. Bumping up the size of the Level 2 cache allowed Intel to remove the Level 3 cache it had incorporated in late-model Xeon DP processors. It is the doubling of Nocona’s Level 2 cache -- which runs at the CPU’s full clock speed -- that will have the greatest impact on the performance of Xeon-optimized applications.
At 16KB, Nocona’s Level 1 data cache is also twice the size of Xeon DP’s. Although seemingly small, this cache is vital because it sits closest to the chip’s execution units. The Level 1 cache is critical to Nocona’s ability to perform multiple operations in parallel per clock cycle, so its larger size will also have a noticeable impact on performance.
Nocona’s execution pipeline -- that is, its queue of operations awaiting execution -- is 31 stages long, up from Xeon DP’s 20. Critics point to Xeon’s long pipelines as evidence of the inefficiency of Intel’s x86 designs. The pipeline holds not only operations that are certain to be executed but also those operations that the processor predicts will be executed as the result of conditional instructions -- for example, a branch taken when a register’s value is greater than a specific number. Operations sitting in the pipeline are executed very rapidly, so when Nocona predicts the execution path correctly, the processor’s performance is astounding. But when its predictions fail, the pipeline has to be flushed and refilled from scratch, a process that hinders the chip’s performance.
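The cost of a flush can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes that every mispredicted branch costs a full pipeline's worth of cycles (real penalties depend on where in the pipeline the branch resolves), and the base cycles-per-instruction figure, the branch fraction, and the misprediction rate are all illustrative placeholders. Only the pipeline depths of 20 and 31 stages come from the text above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once mispredicted branches are charged.

    Each mispredicted branch is assumed to cost `flush_penalty_cycles`
    extra cycles while the pipeline is flushed and refilled.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Illustrative workload assumptions (not figures from the article).
BASE_CPI = 1.0           # cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.20   # share of instructions that are branches
MISPREDICT_RATE = 0.05   # share of branches predicted incorrectly

# Pipeline depths from the article, used here as the flush penalty.
xeon_dp_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
nocona_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)

print(f"Xeon DP (20 stages): {xeon_dp_cpi:.2f} cycles per instruction")
print(f"Nocona  (31 stages): {nocona_cpi:.2f} cycles per instruction")
```

Under these assumptions the deeper pipeline pays more for the same misprediction rate, which is why a long pipeline leans so heavily on accurate branch prediction and on the higher clock speed it makes possible.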
The Nocona feature that has grabbed the most attention, Intel’s new EM64T (Extended Memory 64 Technology), might be the least interesting. EM64T breaks the 4GB RAM barrier associated with all x86 processors (except Nocona, Prescott, and AMD’s Athlon 64, Athlon 64 FX, and Opteron chips). Basically, Intel created a hack for using chunks of memory above the 4GB mark that will effectively reduce the system’s contiguous address space. Yes, developers will be happy to see a big address space for their new 64-bit apps, but Intel’s system architecture will likely hamper, not improve, performance as more RAM is added to the system.
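The 4GB figure follows directly from pointer width: a 32-bit address can name 2^32 distinct bytes. A minimal sketch of that arithmetic is below. The helper name is hypothetical, and the 64-bit result is the theoretical ceiling of a 64-bit pointer, not a claim about how many address lines any particular chip actually implements.

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)


def addressable_bytes(address_bits):
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2 ** address_bits


limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(f"32-bit: {limit_32 // GIB} GB")
print(f"64-bit: {limit_64 // GIB} GB (theoretical ceiling)")
```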
In the end, Nocona is a Xeon killer, not an Opteron killer. The fine points of chip architecture aside, differences at the system level relegate Nocona to the dual-processor bush league. For example, Intel could not match Opteron’s glueless I/O system. Processors in an Opteron server talk directly to one another without going through the silicon intermediaries forced by Intel’s design. Likewise, Opteron links each CPU directly to memory, and Opteron systems’ memory bandwidth scales upward with the number of CPUs. Nocona’s bandwidth remains static.
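The scaling difference can be illustrated with a toy model. This is only a sketch of the two topologies as the paragraph above describes them: the function names are hypothetical, bandwidth is expressed in relative units of 1.0 per bus or per memory link, and none of the numbers are measured figures for either processor.

```python
def shared_bus_total(bus_bandwidth, cpus):
    """All CPUs reach memory over one shared path, so the total is fixed."""
    return bus_bandwidth


def per_cpu_link_total(link_bandwidth, cpus):
    """Each CPU has its own memory link, so the total grows with CPU count."""
    return link_bandwidth * cpus


UNIT = 1.0  # placeholder relative bandwidth, not a real figure

for cpus in (1, 2, 4):
    shared = shared_bus_total(UNIT, cpus)
    direct = per_cpu_link_total(UNIT, cpus)
    print(f"{cpus} CPU(s): shared bus total {shared:.1f}, "
          f"per CPU {shared / cpus:.2f}; direct links total {direct:.1f}, "
          f"per CPU {direct / cpus:.2f}")
```

In the shared-bus case each added processor shrinks every processor's share; in the direct-link case the share per processor holds steady as the system grows.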
Nocona’s design is Intel’s best yet. It’s a beautiful Xeon. But to catch Opteron, Intel would have to ditch its time-honored system architecture. Given where Itanium has (or hasn’t) gone, and the incredible pace at which AMD is advancing its technology, Nocona looks like a well-manicured rest stop alongside the very bumpy road that Intel faces.