Powering down servers is a calculated risk

Vendors cite minimal risk in shutting down servers, and some companies already practice it to save energy

Given the high costs of electricity, datacenter operators are feeling increased pressure to rein in energy consumption. Server virtualization and consolidation are proving effective, but another straightforward approach to energy cost-cutting seems to be met with reluctance: the practice of powering down servers that aren't in use.

In a recent InfoWorld article, Contributing Analyst Logan Harbaugh proposed that powering down servers during off-hours would cut energy waste without any adverse effect on the hardware. He even suggested that companies could turn the wait for servers to power up into a bragging point with users: "This is taking some time because we're being energy conscious and environmentally friendly."

Harbaugh's article met with resistance from a number of readers. In some quarters, shutting down servers is tantamount to heresy. Several readers protested that pulling servers offline is simply bad customer service; in their shops, servers never stop. Others were concerned that powering down a server could be harmful to the machinery. I thought it would be prudent to go directly to some of the vendors and ask them about the practice of shutting down servers that aren't being used.

I discovered that even the experts don't quite see eye to eye on the issue. Ken Baker, datacenter infrastructure technologist at HP, said powering down servers is completely safe. "It's not at all bad for the server. It's something we do to electronic devices all the time. It can handle it from a hardware perspective," said Baker.

But Brad McCredie, an IBM fellow for the Systems and Technology Group, wasn't quite so sanguine. He explained that, technically speaking, powering off and on any kind of computer can have a detrimental effect over time.

"[Temperature cycling] is a well-established failure mechanism and a stress on components," McCredie pointed out. "What it really comes down to is all these things -- chips soldered on modules, soldered on boards and connectors -- that expand and contract when they heat and cool.... When they all contract and expand at different rates, they can fail. That's ultimately the bad thing with power cycling," he said.

Mark Monroe, director of sustainable computing at Sun, suggested that machines can handle being shut down a finite number of times. Arguably, the number is large enough for regular power cycling over an extended period of time. "Most server vendors today say they'll support a certain number of cycles of powering things on and off," Monroe said. "I believe most of the server vendors would say [the number] is in the hundreds as opposed to the thousands."

Both Monroe and McCredie noted that their respective companies already offer software capable of powering down components of a server in the interest of boosting energy efficiency. "We just released a new product that has the capability of turning different parts of the server off: disk, CPUs, fan [and] memory DIMMs will be powered down," said Monroe.

Similarly, IBM offers a product called Active Energy Manager, an extension to IBM Systems Director, that features advanced energy control options designed to boost performance per watt by slowing processor clock speed or even putting processors in "nap" mode when not in use.

Software surprises

Beyond the hardware concerns associated with powering down servers, powering a server back on can cause operational headaches, according to Sun's Monroe. "There are concerns about changes being made to a server config file but it not being recorded in the log. It will take effect at the next reboot, and lo and behold, the system won't come back up," he said.

HP's Baker shared similar sentiments about the difficulty of getting a server back to an operational state after it's turned back on, which is why powering down might be disconcerting for some organizations. "In an enterprise operating system, or depending on the application you're running, it may not be appropriate for it to be controlled in that manner," he said. "Can you automate that process? Certainly. There's a lot to some of these applications, though, in terms of complexity. Can you automate the functionality that allows the server to be turned back on, log in, load all the necessary software ... and get it on the network to do some work? It's certainly possible. It could be a time-consuming process."

Some companies say they're already powering down servers that aren't in use. One company that strongly advocates the practice is Cassatt. The company offers a software solution, called Cassatt Active Response, that automatically powers servers off and on in response to preset conditions, whether time-based (e.g., powering down servers at the end of the workday and on weekends) or related to application availability (if service levels for a given app reach a certain threshold, a new server would fire up).
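As a rough illustration of a time-based condition like the one described above, here is a minimal Python sketch. It is not Cassatt's software or API; the function name should_be_powered_on and the 8 a.m. to 6 p.m. workday are assumptions made purely for the example.

```python
from datetime import datetime, time

# Assumed working hours; a real shop would set these to suit its own schedule.
WORKDAY_START = time(8, 0)
WORKDAY_END = time(18, 0)


def should_be_powered_on(now: datetime) -> bool:
    """Return True if a server on this hypothetical schedule should be running."""
    # weekday() returns 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
    if now.weekday() >= 5:
        return False
    # On weekdays, keep the server on only during the assumed working hours.
    return WORKDAY_START <= now.time() < WORKDAY_END
```

A scheduler could call this check every few minutes and power a server off or on whenever the answer changes. The demand-based condition (firing up a server when service levels cross a threshold) would need live load data and is not covered by this sketch.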

Notably, VMware offers "experimental" functionality called Distributed Power Management in VMware Infrastructure 3.5 for powering servers off and on. It's designed to automatically power off servers that aren't currently needed to meet service levels, and to automatically power servers back on as demand for compute resources increases.

Powering down

According to Cassatt's director of product management Ken Oestreich, powering down servers can be a safe, viable activity. Moreover, the company practices what it preaches. "We've got several hundred servers here that we're power managing that are turned on and off several times a day. We've had no failures for three or four years," he said.

He argued that servers have the resilience to be powered up and down on a daily basis in their useful life. "If you assume that the average piece of equipment has a three- or four-year depreciation cycle, and you cycle it once a day, you're talking about one or two thousand power cycles for the lifecycle of the machines. That's actually not a lot," said Oestreich.
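Oestreich's arithmetic is easy to check. The short Python sketch below multiplies years of service by days per year and cycles per day; the function name lifetime_power_cycles is made up for this illustration, and a 365-day year is assumed.

```python
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def lifetime_power_cycles(service_years: float, cycles_per_day: float = 1.0) -> int:
    """Estimate the total number of on/off cycles over a server's service life."""
    return round(service_years * DAYS_PER_YEAR * cycles_per_day)


for years in (3, 4):
    print(f"{years}-year life, one cycle a day: {lifetime_power_cycles(years)} cycles")
```

One cycle a day works out to 1,095 cycles over three years and 1,460 over four, which matches the "one or two thousand" figure. At a hypothetical three cycles a day, the four-year total would be 4,380.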

Cassatt isn't the only company dabbling in shutting down servers. Cisco is running a pilot program at its two datacenters for an in-house application called V-Frame, a management app designed (among other things) to gracefully shut down servers in batches remotely, similar to the way a remote server management system shuts down a single server.

According to Rob Aldrich, director of datacenter solutions at Cisco, V-Frame has the capability to address the problem of a server not shutting down completely when the OS hangs: Admins can cut off power to the server from the power rails at the back of the rack, via IP. "It's nice to have the added benefit to know you can cycle the power physically ... and do a hard reboot if you need to," he said.

Cisco is testing the application on only 200 of its 10,000-plus systems, targeting production servers that don't need to be used all the time. "For CRM, we have certain groups of servers that deal with network attached storage archiving which get shut down after a period of inactivity," said Aldrich. "On our file and print servers ... we shut those down after 8 p.m. in the evenings and on weekends. For batch payroll processing, we only spin those up during our processing periods for payroll."

The company has been pilot testing V-Frame for three months, and according to Aldrich, there've been no major problems or complaints thus far from IT.

The company isn't too worried about the impact of temperature cycling on the hardware it's powering down. "When you're refreshing servers every five years at most, it's a risk we're willing to take. We don't think the hardware is going to be adversely affected within the constraints of our refresh cycle," said Aldrich.

Aldrich noted that one of the biggest barriers to shutting down servers is getting buy-in from all parties concerned. After all, if your job depends on application reliability, you might be hesitant to take chances with servers being powered up and down. But the potential payback of lower energy bills and even, possibly, more capacity might make it more palatable.

In fact, all the parties with whom I spoke noted the potential benefits of reining in energy waste by powering down servers. Cisco, for example, has found that servers in idle state still consume 40 percent of their power. Those wasted watts can add up over time.
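To get a feel for how those watts add up, here is a back-of-the-envelope Python sketch. It reads the 40 percent figure as a share of a server's peak draw, which is an assumption, and the 300-watt peak and 12 idle hours a day in the usage note are hypothetical numbers, not figures from Cisco.

```python
IDLE_FRACTION = 0.40  # idle draw as a share of peak draw (assumed reading of the 40 percent figure)


def idle_energy_kwh(peak_watts: float, idle_hours_per_day: float, days: int = 365) -> float:
    """Estimate the energy, in kilowatt-hours, a server uses while sitting idle."""
    idle_watts = peak_watts * IDLE_FRACTION
    watt_hours = idle_watts * idle_hours_per_day * days
    return watt_hours / 1000  # convert watt-hours to kilowatt-hours
```

Under those assumptions, calling idle_energy_kwh(300, 12) gives roughly 526 kilowatt-hours a year for a single server that idles half of every day. Multiply by a few hundred machines and the case for powering down becomes clearer.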

Timing is everything

Servers that sit in idle state for long periods of time are the top candidates for powering down between uses. Cassatt's Oestreich pointed to some specific applications that fall into that category. The biggest area for potential savings, he said, is "dev and test where you have a test running for a while, and then the project's over and you leave the server on and it's doing nothing."

The second biggest area, he said, is failover, "where machines are plugged in and on their entire lifetime, and in theory doing nothing their entire lifetime." A third application could be staging servers, where applications are tested before going live.

Cisco's Aldrich said servers that perform only a single function that can be readily predicted (e.g., one scheduled well in advance) are prime candidates for periodic shutdown. That includes batch payroll processing functions, which only take place during pay periods.

IBM's McCredie noted that shutting down servers could even make sense for applications such as Web hosting. "When everybody is at work, Web hosting is fairly modest. When everyone goes home and starts shopping and surfing the Web, Web hosting kicks up.... During the day, you might want to power down some of your server farm, then power them back up when you have heavy use at the end of the day."


Copyright © 2008 IDG Communications, Inc.
