Lights out: When to power down the data center

If a natural disaster looms, should you pull the plug or play the hero? When Irene hit, shutting down two data centers proved a wise choice

You may have noticed that the East Coast has been slammed by Mother Nature recently. We've seen an earthquake, a hurricane, and more than our fair share of rain and wind these past few weeks. If we could ship some of it to East Texas, we would, but for some reason the skies wanted to wash out half the roads in New England and leave Texas to burn.

One effect of all this geologic and atmospheric turmoil is that maintaining a stable data center becomes quite the challenge. Power and data connection failures make it nearly impossible to ensure full uptime, and if the weather is bad enough, operations personnel may need to stay home.


Under these circumstances, if the data center is not providing 24/7/365 facilities (such as hosting or colocation), it may be the better part of valor to power down the whole thing before the storm and the inevitable electrical and data loss. I realize many will recoil from this idea. But there's merit to it, especially if the cataclysm is expected to hit during the weekend.

Even in large corporations, weekend resource utilization is relatively low. Assuming that the public presence is hosted elsewhere, internal services are generally consumed by folks checking email -- or those with a crushing deadline, no social lives, or both. On a weekend when a large hurricane is bearing down on the area, it's safe to say overall data center utilization will be even lower.

But what about potential damage to servers and storage, you ask? It's true: Even with beefy UPS and generator backup, there can be problems with, say, climate control units that poke through the roof and are exposed to damage. Plus, downed communication lines mean that unless the facility is staffed throughout the outage, admins can't get into the site remotely to check on things or even to organize a post-power-loss shutdown. Your decision depends on the details of your facility, but people's safety always comes first, so you may not have a choice.

In the case of Hurricane Irene, I opted to remotely shut down two data centers in two different states that lay in the path of the storm, leaving only the switching and VPN gear running. Naturally, almost every element of these data centers can be controlled remotely, from turning servers on and off to gaining console access to every relevant device on the network, including storage controllers, core switching, and so forth. Shutting down the first data center was the work of only half an hour, with scripted tools to turn off every Linux server in a specific order -- and the widespread use of virtualization made it absurdly simple to deactivate all the VMs gracefully.
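To make that "specific order" concrete, here's a minimal sketch of what such a shutdown script might look like. It assumes passwordless SSH as root, made-up host names, and KVM/libvirt as the hypervisor -- an illustration of the approach, not the actual tooling used at these sites:

    #!/bin/bash
    # Hypothetical ordered shutdown: application servers first, then databases,
    # then the VM hosts, so nothing is still writing when the storage goes dark.
    # Host names and the KVM/virsh assumption are placeholders.
    APP_HOSTS="app01 app02 app03"
    DB_HOSTS="db01 db02"
    VM_HOSTS="kvm01 kvm02"

    for h in $APP_HOSTS; do
        echo "Shutting down $h"
        ssh root@"$h" 'shutdown -h now' &
    done
    wait

    for h in $DB_HOSTS; do
        echo "Shutting down $h"
        ssh root@"$h" 'shutdown -h now' &
    done
    wait

    # Ask each running guest to shut down gracefully, give them a couple of
    # minutes, then halt the host itself.
    for h in $VM_HOSTS; do
        ssh root@"$h" 'for vm in $(virsh list --name); do virsh shutdown "$vm"; done; sleep 120; shutdown -h now'
    done

The ordering is the whole point: anything that writes to shared storage goes down before the storage and the hosts serving it.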

Unfortunately, the other site didn't fare quite as well. The shutdowns were planned for 3 p.m., but that site magically lost power at 11:45 a.m., well before the storm hit, and it lacked generator backup due to regulations and site issues. I ended up feverishly shutting down servers from my iPhone in the middle of a parking lot. I got to about half the servers with the shutdown scripts, but the Windows boxes were left to fend for themselves, as was the storage. The last I saw of that data center was a truncated SMS warning that the monster UPS was running out of battery. Then it was gone. Poof. This particular site was 250 miles away, so reviving it would have to wait until after the storm blew through.

When the lights came back on, the second data center brought itself back up. With the exception of the boxes I'd managed to shut down cleanly, the servers powered themselves on automatically when power was restored, as they were configured to do. The networking gear came up normally, as did all the storage. In fact, aside from a few problems caused by the out-of-order power-up, the site performed admirably. I had to turn on a few servers manually, remount NFS shares that had failed because the storage wasn't yet available when other servers booted, and kick over some VMs, but that was it.
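For the NFS cleanup, a small script can compare what /etc/fstab says should be mounted against what actually is, and retry the stragglers. A rough sketch, assuming the shares are defined in fstab with type nfs (adjust for nfs4 or automounted shares):

    #!/bin/bash
    # Hypothetical post-power-restore check: remount any fstab-defined NFS share
    # that failed because the storage came up after the server did.
    awk '$3 == "nfs" {print $2}' /etc/fstab | while read -r mnt; do
        if ! mountpoint -q "$mnt"; then
            echo "Remounting $mnt"
            mount "$mnt" || echo "Still cannot mount $mnt -- check the filer"
        fi
    done

Running something like this from the monitoring or management host after power returns turns a manual scramble into a one-liner.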

The data center that was shut down in an orderly fashion came back up just as nicely, with only a smattering of minor issues. I obviously hadn't planned on running a true shutdown test that weekend, but that's effectively what I got, and both sites passed with flying colors. This little exercise also highlighted a few small gaps in the monitoring framework that were easily found and fixed.

If you run a data center that can be forced down completely without causing significant negative impact on normal business operations, you should probably plan a complete power-off exercise sooner rather than later. I always do this when building out a new facility, but after that it's a rare event, usually caused by outside elements. All said, this particular forced power-down increased my confidence in the resiliency of both sites. For me, that was the slim silver lining to Hurricane Irene's clouds.

This story, "Lights out: When to power down the data center," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.
