Over the past year or so, there's been plenty of head-scratching as to how to meaningfully measure a server's power performance: that is, how efficiently it uses energy to do its work. This kind of metric is important as datacenter operators struggle to keep energy costs down and free up floor space -- without sacrificing service quality.
Plenty of folks have invested resources and brainpower in the task, from independent analysts such as Neal Nelson and Associates and InfoWorld's chief technologist Tom Yager to large-scale organizations such as The Green Grid and even the EPA.
Thus, I was rather intrigued to learn this week that SPEC (Standard Performance Evaluation Corporation) has announced what it deems "the first industry-standard benchmark that measures power consumption in relation to performance for server-class computers." It's called SPECpower_ssj2008, a name that doesn't so much roll off the tongue as ooze -- but what's in a name, anyway?
Driving toward meaningful metrics
Before digging into the nitty-gritty of SPECpower_ssj2008, I want to provide some context as to why a server power-performance benchmark has proven elusive. As I said, plenty of smart people have been trying to devise one, and at first blush, it may seem a simple task.
I like to compare it to the MPG measurement used to assess vehicles' fuel efficiency. You simply divide the number of miles you've traveled by the number of gallons used, and voilà, you have a meaningful measurement with which you can easily compare vehicular fuel efficiency. A high MPG, such as you might get from a hybrid sedan, is deemed good. The low MPG you might get from an SUV is bad. Easy.
But wait: Perhaps it's not quite so cut and dried. Is it really meaningful to compare the gas mileage of a hybrid to that of an SUV if you don't factor in how the respective vehicles are being used? If you're comparing the two when the application is carrying three passengers and light baggage down Highway 5 from San Francisco to Los Angeles, the hybrid wins, hands down.
But what if the task at hand is schlepping five passengers and their camping gear along some rough terrain toward an off-the-beaten-path destination? There, the hybrid sedan can't really compete; it's not built to. What the SUV lacks in fuel efficiency, it compensates for with superior muscle and all-terrain features. Score one for the big machine.
Thus, it really makes the most sense to aim for more apples-to-apples comparisons, matching up like vehicles based on their form factor and the tasks for which they're being used.
In the world of servers, comparing power-performance is even more complicated. Servers vary in terms of the types of applications they run, their form factors, the number of processors they have, the speed of the processors, the amount and type of memory, and storage -- as well as how much heat they produce. Cooling, after all, ain't free. In fact, cooling a server can cost as much as running one.
Meet the benchmark
All of that brings us back to SPEC's benchmark, which was developed with assistance from big-name tech companies such as AMD, Dell, Fujitsu Siemens, HP, Intel, IBM, and Sun. The fact that so many companies -- all of which have a clear interest in seeing a benchmark that puts their respective wares in the best light -- are behind SPECpower_ssj2008 gives it all the more weight.
So let's get to the meat of the benchmark. SPECpower_ssj2008 measures server power consumption at different performance levels -- from 100 percent to idle -- in 10 percent segments over a set period of time. This graduated workload is important: It recognizes the fact that processing loads and power consumption on servers vary substantially over the course of days or weeks. Tests by Neal Nelson certainly bore that out. For example, he found that in idle mode, a server running an AMD chip used far less energy than did the machine running an Intel chip.
In order to calculate the power-performance metric, the benchmark measures and adds together the transaction throughputs at the various performance-level segments, then divides the resulting figure by the sum of the average power -- that is, the wattage -- consumed at each segment. The more work a system does at a given CPU utilization, and the less power it uses, the higher it scores on the benchmark. Thus, in essence, an SUV can be compared to a hybrid.
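That arithmetic is simple enough to sketch out. The following Python snippet walks through the calculation as the article describes it: sum the throughput measured at each load segment, sum the average wattage at each segment (including idle, which does no work but still draws power), and divide. The per-segment numbers here are entirely made up for illustration; they are not results from any real system or from SPEC's published data.

```python
# Sketch of the SPECpower_ssj2008 scoring arithmetic, as described above.
# The (throughput, watts) pairs below are hypothetical, not real results.

# Target load levels run from 100 percent down to idle in 10 percent steps.
target_loads = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0]

# Hypothetical measurements per segment: (transaction throughput, average watts).
# Note the idle segment contributes zero throughput but nonzero power draw.
segments = [
    (300_000, 250), (270_000, 235), (240_000, 220), (210_000, 205),
    (180_000, 190), (150_000, 175), (120_000, 160), (90_000, 148),
    (60_000, 138), (30_000, 130), (0, 120),
]

# Sum the throughputs across all segments...
total_ops = sum(ops for ops, _ in segments)

# ...sum the average power consumed at each segment...
total_watts = sum(watts for _, watts in segments)

# ...and divide: more work per watt means a higher score.
score = total_ops / total_watts

print(f"Overall score: {score:.0f} operations per watt")
```

With these invented numbers, the hypothetical server would score roughly 837 operations per watt. The key property of the formula is visible in the code: a machine that idles efficiently is rewarded, because the idle segment adds watts to the denominator without adding any work to the numerator.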
For the benchmark workload, SPEC selected server-side Java, representing Java business apps. According to the organization, "the workload is scalable, multi-threaded, portable across a wide range of operating environments, and economical to run."
Moreover, SPEC says the Java-based benchmark "exercises CPUs, caches, memory hierarchy, and the scalability of shared memory processors, as well as implementations of the Java Virtual Machine (JVM), JIT (just in time) compiler, garbage collection, threads, and some aspects of the operating system."
A spec in the right direction
The benchmark is certainly a good start as datacenter operators struggle to make sense of their machines' energy efficiency, as well as the power-performance claims that hardware vendors make.
In fact, SPEC is reviewing SPECpower_ssj2008 benchmark results submitted by vendors, then posting them for public consumption on the SPEC site. Thus far, HP's ProLiant DL160 G5 tops the heap with a score of 698, followed by Dell's PowerEdge 2950 III, which scored a 682.
Although a good start, SPECpower_ssj2008 is just a first step toward measuring servers' energy efficiency. "SPECpower will work with other SPEC benchmarking groups to help them adopt the methodology used in this first benchmark. The intention is that there will be a wide range of SPEC benchmarks that incorporate power measurement in a consistent, repeatable way," writes Greg Darnell, vice chair of the SPECpower committee.
"The methodology can also be used by other benchmark developers interested in measuring power, in the hope that there will be a common set of practices in this new area of benchmarking," Darnell adds.
Additionally, SPEC will be looking at workloads and applications other than server-side Java. "No definitive decisions have yet been made, however, on those workloads or applications," according to Darnell.
Indeed, running other types of workloads will certainly yield different results -- the way a hybrid's MPG on the highway will differ from its MPG on city streets.
IBM's Elisabeth Stahl, manager of performance marketing for the IBM Systems and Technology Group, expressed similar sentiments about the SPECpower_ssj2008 benchmark. (Big Blue was one of the companies that participated in its development.) "We believe this benchmark is a good first step in helping people to understand the relationship between systems performance and energy use," says Stahl. "We look forward to continued work to create benchmarks that broaden the spectrum of environments represented by the benchmark and to ensure that the data shown is representative of the many computing environments that exist."
The minimum equipment for SPEC-compliant testing is two networked computers, plus a power analyzer and a temperature sensor. One computer is the system under test; the other is the controller system where power, performance, and temperature are captured for reporting. A typical test run for SPECpower_ssj2008 takes about 70 minutes using default settings.
SPECpower_ssj2008 is available immediately from SPEC for $1,600. For more information, go to SPEC's Web site.