IW: You say that Galaxy will do the same work with 10 percent fewer servers today. Will that ratio improve dramatically when
Pacifica lands?
AB: That depends on the number of application-kernel interactions. There's no single way to quantify that. The Pacifica architecture
makes life easier. It will be easier for the open source Xen or Microsoft Virtual Server to be fully functional and perhaps
have some performance advantages [over VMware] in some cases. But I don't think the quantity of performance improvement is
what's driving this. Today, or historically, there was a huge cost premium for virtualization. And still a lot of people chose
that route because they could save as much on the hardware. But going forward, I think we could assume that a year from now
everybody is going to ship virtualization as part of the basic offering.
Virtualization is a very important topic. What's really happening in the market of course is this transition from the two-socket
single-core to the two-socket dual-core. AMD has it now; Intel will have it next year. Historically the two-socket market
-- the two-core market -- was the sweet spot. But [now] the two-socket dual-core is actually the most cost-effective system,
which is really a four-way. To take advantage of the four-way, you want to consolidate more workloads on it. But again, this
is a very significant transition in the market. Just look at the percentage market share today: four-way systems are less
than 10 percent of all the systems shipped. The rest are the two-ways and the one-ways. Whereas a year from now, you would
expect that 90 percent of all systems will be four-way systems.
IW: Is Sun going to continue the message it had for Sparc, that it really doesn't matter how fast you toggle the clock?
AB: Clock rate [alone] is completely meaningless. What matters is the amount of work that is accomplished. For example, Opteron
has three integer pipelines internally; Xeon only has two. So there's a two-to-three conversion in terms of productivity.
On top of that, the lower clock rate helps tremendously to lower power consumption. Power consumption is linearly related
to clock rate, and the 30-stage pipeline on Xeon was really, really bad from a power-consumption standpoint. Both the Opteron
pipeline and AMD's future pipeline are much shorter than that. So I think that Intel simply went the wrong way on these microfine
pipelines that tried to maximize clock speed.
But let me go back to clock rate. For an architecture like Opteron, the scaling we see with increasing clock rate is pretty
linear. The memory controller is on chip, so there's no other element like the front side bus that's at a certain speed that
doesn't improve.
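
The argument above, that work done per clock matters more than clock rate, can be sketched roughly as follows. This is only an illustration: the clock rates and function names are hypothetical, peak issue rate is a crude stand-in for real throughput, and the linear power relation is taken from the statement above with voltage and everything else held fixed.

```python
def peak_integer_ops_per_sec(clock_ghz: float, integer_pipelines: int) -> float:
    """Crude upper bound: one integer operation per pipeline per clock cycle."""
    return clock_ghz * 1e9 * integer_pipelines


def relative_power(clock_ghz: float, reference_clock_ghz: float) -> float:
    """Power relative to a reference part, assuming power scales linearly with clock."""
    return clock_ghz / reference_clock_ghz


# Hypothetical clock rates, chosen only to illustrate the two-to-three ratio.
three_pipe = peak_integer_ops_per_sec(clock_ghz=2.4, integer_pipelines=3)
two_pipe = peak_integer_ops_per_sec(clock_ghz=3.6, integer_pipelines=2)
power_ratio = relative_power(clock_ghz=2.4, reference_clock_ghz=3.6)

print(f"three-pipeline peak: {three_pipe:.3g} ops/s")
print(f"two-pipeline peak:   {two_pipe:.3g} ops/s")
print(f"relative power at the lower clock: {power_ratio:.2f}")
```

Under these assumptions the three-pipeline part reaches the same peak at two-thirds of the clock rate, and with it two-thirds of the clock-related power.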
IW: And the memory controller always runs at the CPU clock speed.
AB: Yes, and that has been a surprisingly major improvement for AMD. We were comparing some benchmarks from the Xeon MP with
the 8MB cache and even there the larger cache does not make up for the fact that the memory is that far away from the CPU.
IW: Or that communication with other CPUs and all I/O devices has to run through a single northbridge.
AB: Exactly. The I/O performance on Opteron is also very good. Meaning we can support a large number of Fibre Channel adapters
or any other kind of I/O -- InfiniBand going forward, 10 Gigabit Ethernet -- at wire speed given the memory bandwidth, which
is more difficult, let's just say, on an Intel system. Again, Intel is working overtime to correct things, so I don't want
to turn this into an AMD-versus-Intel discussion. But we're very happy with the performance we're getting out of Opteron.
Certainly in technical markets, or within any market where the primary decision criterion is performance, Opteron wins hands
down.
IW: Tell me about the throughput of the system. What's the speed of the HyperTransport in this box?
AB: The system we're shipping runs HyperTransport at the full gigahertz. And that makes a difference, by the way. We saw a
significant increase from some earlier systems that were not running at the full speed. The memory speed, 400MHz, is also
very important, particularly on floating-point and memory-intensive applications.
On the enterprise systems we always use two HyperTransports for I/O, and again this made a difference in terms of the total
I/O capacity we can get. It's tough to talk about peak performance here -- peak I/O bandwidth -- because I/O bandwidth is
limited by the PCI slots, but we are not limited by the internal HyperTransport bandwidth in any scenario.
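
The point that the slots, not the internal links, set the I/O ceiling can be illustrated with a small bottleneck calculation. Every figure here is an assumption rather than something stated in the interview: 16-bit HyperTransport links at 1GHz with two transfers per clock, 64-bit 133MHz PCI-X slots that each have their own bus, and a hypothetical count of four slots.

```python
def link_bandwidth_gb_s(clock_ghz: float, width_bits: int, transfers_per_clock: int) -> float:
    """Peak one-direction bandwidth of a parallel link in GB/s."""
    return clock_ghz * transfers_per_clock * width_bits / 8


# Assumed: two HyperTransport links dedicated to I/O, 16 bits wide, 1GHz, double data rate.
ht_total = 2 * link_bandwidth_gb_s(clock_ghz=1.0, width_bits=16, transfers_per_clock=2)

# Assumed: four independent 64-bit PCI-X slots at 133MHz, one transfer per clock.
slot_count = 4  # hypothetical slot count
slots_total = slot_count * link_bandwidth_gb_s(clock_ghz=0.133, width_bits=64, transfers_per_clock=1)

# The achievable peak is set by whichever stage is narrower.
effective = min(ht_total, slots_total)
limiting = "PCI slots" if slots_total < ht_total else "HyperTransport"

print(f"HyperTransport I/O links: {ht_total:.2f} GB/s")
print(f"PCI slots combined:       {slots_total:.2f} GB/s")
print(f"limited by {limiting} at {effective:.2f} GB/s")
```

With these assumed numbers the slots saturate well before the two HyperTransport links do, which matches the claim that the internal links are not the constraint.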