AMD answers Nehalem

AMD gives Intel its due for finally getting its x86 server platform right... and for validating so many Opteron innovations

At IDF, Intel laid out the last remaining secrets of Nehalem, a remade x86 platform built around a highly integrated Core 2 microprocessor with on-board memory and point-to-point bus controllers. I fairly raved about it, which surprised some readers who had pegged me as an AMD die-hard.

Being a chiphead, but well shy of a chipmaker, I needed an independent perspective on Intel's first substantial effort to carry the x86 platform beyond the standards and boundaries set by IBM's PC-AT. Intel finally dumped the shared bus. That doesn't set a new high for the industry, but it certainly redefines Intel and lays down a new road for Intel-based servers and workstations.

While I had a head full of Nehalem facts, what I lacked was balance. Intel's IDF sessions on Nehalem compared the platform and CPU architecture only to Intel's work to date which, while I wouldn't call it sub-par, had been out-engineered by AMD.

For a valid contrast, I need to weigh Nehalem against AMD's own 45nm quad-core technology, dubbed Shanghai. That's a platform/architecture shoot-out for the ages, and I'm all over it. But since neither technology is shipping yet, all I can compare is detailed specs and higher-altitude rhetoric. The specs will take some digging. Today I'll tackle the rhetoric, the rationale behind Intel's design decisions, and the packaging of those decisions as competitive advantages. How much of what Intel's done with Nehalem is actually unique, and will IT feel the difference?

I brought a slate of questions to AMD and invited their best batter, a lead in AMD server architecture and platforms, to address them. We covered an enormous amount of ground. The upshot is this: AMD celebrates Intel's validation of innovations that AMD designed into Opteron closer to the turn of the century. The x86-64 instruction set, non-uniform memory access (NUMA, which gives each processor socket its own RAM), the Direct Connect socket-to-socket bus (Intel calls its incarnation QuickPath), on-chip independent memory and bus controllers, independent power control for each core, internal power and thermal management, multiple processors on a single contiguous mask, dedicated Level 2 cache for each core, and shorter pipelines are features that AMD claims as firsts in the x86 domain. Intel once blew off each of these features as irrelevant. Now Nehalem and related platforms adopt all of them.

This is a good thing. When Nehalem goes up against Shanghai, it's apples-to-apples on platforms, or at least it will be treated as such. Even the savviest server buyers, who could grasp the scalability advantages of AMD's NUMA and Direct Connect over Intel's shared bus, will be a hard audience for the subtler differences between AMD's and Intel's implementations of the same ideas. When platform differences get small enough to require debate among gearheads, there is little chance of translating platform engineering variations into criteria relevant to mainstream IT.

Intel hopes to get some traction with a feature that AMD lacks, an integrated power microcontroller. AMD had no specific observations to offer on a feature that Intel kept secret until a couple of weeks ago. On servers, AMD believes that power conservation is done most effectively at the wall: If a server isn't working hard enough, turn it off. That's my line, but it's a long slog to get IT to take up this idea. In the meantime, servers should be at least as clever as desktops at using less energy when they have less work to do.

I understand that AMD is frustrated by the very roadblock I've called out: No matter how ingenious chip engineers are, if Microsoft doesn't pick a feature up, it's as good as wasted effort. This is especially true of power management. If server BIOSes and Windows Server OSes don't leverage Intel's power management microcontroller any better than they do quad-core Opteron's designed-in efficiencies, then Intel's bragging point will be lost on all but notebook users. If Intel has enough sway to make Microsoft twiddle Nehalem's power knobs and dials, or better still, let the microcontroller manage them itself, then Microsoft will have to answer for failing to invest as much care in exploiting architectural and platform features unique to quad-core Opteron.

Intel did more than catch up in cache design for the Nehalem architecture. The huge, shared Level 2 cache has given way to a much smaller Level 2 cache dedicated to each core. Like AMD (and like some previous Intel Xeon designs), Nehalem adopts a three-level cache. Intel uses the Level 3 cache to implement cache probe filtering, a technique that cuts down on core-to-core bus traffic. The handling of cache is a major and palpable differentiator between CPU architectures, especially as other engineering gaps tighten. There is a lot of room for innovation here.

Intel makes marketing hay with the sort of esoteric innovations that grab my attention, but which AMD asserts won't be felt on the server side. One such feature is Hyper-Threading (HT). This NetBurst feature got the axe when Intel went to Core. I always considered HT one of Intel's bolder engineering moves, and now we'll see how it fares in a modern setting.

AMD is betting that instead of delivering a performance increase of up to 30 percent with ideally tuned workloads, HT will bring single-digit boosts. In AMD's view, HT came about as a means of giving the chip something to do while it was waiting for memory. Now that on-chip controllers and faster RAM knock memory latency down to a fraction of what it was in the Pentium 4's heyday, there isn't that much waiting time to fill. I have higher expectations in the long run. I think that multithreading will become the smartest way to squeeze more performance out of a socket, especially as programmers and compilers get smarter about parallelization of code.
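AMD's argument can be put on the back of an envelope. The Python sketch below is my own toy model, not anything AMD or Intel supplied: it assumes each hardware thread sits stalled on memory for some independent fraction of cycles, and that the core gets useful work done whenever at least one of its two threads is ready. The function name and the stall fractions are illustrative assumptions, not measurements.

```python
def smt_speedup(stall_fraction):
    """Toy estimate of the two-thread speedup on a single core.

    ASSUMPTION (not from AMD or Intel): each hardware thread is stalled on
    memory for an independent fraction `stall_fraction` of cycles, and the
    core does useful work whenever at least one thread is ready to run.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy_one_thread = 1.0 - stall_fraction        # core idles during every stall
    busy_two_threads = 1.0 - stall_fraction ** 2  # idle only if both threads stall
    return busy_two_threads / busy_one_thread


if __name__ == "__main__":
    # Hypothetical stall fractions, chosen only to show the trend.
    for stall in (0.30, 0.15, 0.05):
        gain = (smt_speedup(stall) - 1.0) * 100.0
        print(f"stalled {stall:.0%} of the time: about {gain:.0f}% more throughput")
```

Under these assumptions the gain works out to the stall fraction itself, since (1 - s^2) / (1 - s) = 1 + s. Shrink the time spent waiting on memory and the payoff from a second thread shrinks with it, which is the heart of AMD's bet. Real cores are messier than this, so treat the numbers as a way to see the shape of the argument rather than a forecast.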

AMD considers it unlikely that server applications will feel other Nehalem platform and architecture enhancements, such as Version 4.2 of the Streaming SIMD Extensions (SSE) and Intel's Application-Targeted Accelerators. Both require recoding, perhaps hand-coding, to put to use. I see tremendous potential benefit, but it's only reachable where developers are willing to risk incompatibility with other types of systems.

There's the rub. At the platform level, Nehalem's advances will be felt by IT without requiring any change in software. To feel Nehalem's power management and architectural (CPU) performance tweaks requires new code. That's effort that high-performance computing and specialized verticals like medical imaging will shoulder, but as a rule, major ISVs shy away from forking code to bring advantages to a small fraction of the x86 server installed base. And as much as I enjoy handwritten in-line assembly code, it's not your average in-house coder's cup of tea.
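To make the cost of forking concrete, here is a minimal sketch of the runtime check an application needs before it can take a CPU-specific path. It assumes a Linux system, where /proc/cpuinfo carries a "flags" line that lists features such as "sse4_2" and "popcnt". The function names and the two path labels are hypothetical, and the sketch only picks a path; the accelerated routine itself would still have to be written and tested separately.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of
    Linux-style /proc/cpuinfo text, or an empty set if there is none."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def choose_code_path(flags):
    """Pick a (hypothetical) code path name based on the CPU's feature flags.

    The accelerated path is taken only when both features it relies on are
    present; every other machine gets the generic path.
    """
    if {"sse4_2", "popcnt"} <= flags:
        return "sse4.2"
    return "generic"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable: fall back to generic
    print("selected path:", choose_code_path(parse_cpu_flags(text)))
```

The check is the easy part. Every branch it selects is another body of code to build, test, and support, and that is the bill ISVs are reluctant to pay for a fraction of the installed base.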

AMD hopes to call attention to the fact that Nehalem requires new servers. AMD's message is one of platform longevity. Between serious platform revisions, AMD lets customers and OEMs do system upgrades with CPU swaps and BIOS updates. AMD's OEMs aren't consistent in enabling this -- I don't know how many IBM server customers were able to get dual-core to quad-core Opteron chip upgrades. Longevity should matter a great deal as budgets tighten. If Intel feels it finally has a platform with some headroom to it, it might relax the projected expectation of full system replacement every two years.

Since AMD had the floor to itself, it couldn't resist taking one last shot. Now that Intel has shed much of its legacy system platform, it faces a legacy core architecture. Nehalem is still a modified Pentium III core. During the time that Intel has spent cleaning up its platform, AMD, whose server platform has neither had nor needed overhauling since gen-one Opteron, has been working on CPU architectures. Its most anticipated project is Fusion, a CPU that brings AMD's microprocessor, embedded, chipset, and GPU chops, along with some time-proven big iron ideals, to bear on a single socket. Until AMD can blow Intel away again, it's comfortable with a share of the x86 server market.

The fact that Intel's platform has caught up to AMD's doesn't automatically put AMD at a disadvantage. What it does is level the field, which breeds aggressive innovation and pricing as close competitors fight to differentiate their products. Apples to apples in x86 servers is good for IT.

Copyright © 2008 IDG Communications, Inc.
