Last week I talked a bit about how important it is to use and maintain whatever remote monitoring and control systems you have available -- and why it's a good idea to drop a few extra bucks here and there to add control and visibility to your data center.
But monitoring and control systems are useless if they're inaccessible or rendered moot when trouble strikes. You can configure email alerts from your UPS and AC units all you want -- if the mail relay they use is offline, those notifications won't go anywhere. The same is true for data outages. There's nothing worse than an emergency that also takes down the data connection, leaving you with no idea how the data center is weathering the storm because you can't see anything.
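One way to catch the dead-relay problem before an emergency is to test, on a schedule and from a machine that does not itself depend on that relay, whether the relay still accepts connections. The following is a minimal sketch, not a finished monitoring tool. The function names and the host in the usage note are hypothetical, and it only checks that a TCP connection opens on the SMTP port, not that mail is actually delivered.

```python
import socket


def relay_reachable(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the mail relay can be opened.

    This only proves the port answers; it does not prove mail gets delivered.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_relays(relays, timeout: float = 5.0):
    """Given (host, port) pairs, return the ones that did not answer."""
    return [(host, port) for host, port in relays
            if not relay_reachable(host, port, timeout)]
```

Calling `unreachable_relays([("mail.example.com", 25)])` (a placeholder hostname) returns an empty list when the relay answers. Whatever reports a non-empty result should use a channel other than that same relay, or the warning goes nowhere too.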
[ For best practices on how to set up remote monitoring and control systems to begin with, see Paul Venezia's "Troubleshoot your data center from the easy chair." ]
There are several key points to inspect when thinking about how to eliminate this problem. The first may be outside your control: how your data circuits are prepared for power outages and circuit cuts. I've seen many situations where a company has full power protection for all the gear in its data center, but down in the basement, a fiber transceiver or carrier termination unit is plugged into a $5 power strip that connects directly into mains power. If the building loses power, the data center may have generator backup, but that doesn't matter because the critical component has no juice, and the bits do not flow. This could easily be prevented with even a small UPS. You might be surprised at how many hours a 500VA UPS can run a fiber transceiver, and in this situation, you'd be very grateful.
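To get a feel for why a small UPS goes such a long way with a tiny load, here is a back-of-the-envelope estimate. It's a rough sketch under loudly stated assumptions: the battery capacity, transceiver wattage, and inverter efficiency below are illustrative numbers, not specifications for any particular product. Real runtime will be shorter because the UPS itself draws power and batteries age.

```python
def estimated_runtime_hours(battery_wh: float,
                            load_watts: float,
                            inverter_efficiency: float = 0.8) -> float:
    """Rough runtime estimate: usable battery energy divided by load.

    Ignores the UPS's own idle draw and battery aging, both of which
    shorten real-world runtime, especially at very light loads.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return battery_wh * inverter_efficiency / load_watts


# Assumed figures for illustration only:
#   a single 12 V, 7 Ah battery  -> 12 * 7 = 84 Wh
#   a fiber transceiver drawing  -> 5 W
hours = estimated_runtime_hours(battery_wh=12 * 7, load_watts=5)
print(f"Estimated runtime: {hours:.1f} hours")
```

Plug in the numbers from your own UPS and transceiver data sheets, then verify with an actual pull-the-plug test, since the estimate is only as good as the assumptions behind it.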
The next area of preparation involves multiple data circuits and internal routing. It's a fantastic idea to add a business-class cable or DSL connection to your Internet connectivity. While not as reliable as a fiber drop, it will at least stand a chance of being operational when the main connection evaporates due to carrier problems or Big Joe with the backhoe. I've seen many cases where this supplemental circuit is brought into the data center and source-based policy routing at the core is used to push basic Internet browsing across that pipe, leaving the more expensive and reliable circuits to handle VPN and business-critical communications. That's fine, but if the main circuit disappears at 2 a.m., how can you remotely access the data center through the secondary circuit?
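A useful first step is simply knowing, ahead of time, whether each circuit accepts inbound management connections at all. The sketch below probes one entry point per circuit from an outside host. It assumes each circuit exposes some reachable service, such as a VPN or SSH listener, on its own public address. The circuit names, addresses (taken from the reserved documentation ranges), and ports are hypothetical placeholders.

```python
import socket
from typing import Callable, Dict, List, Tuple

Endpoint = Tuple[str, int]


def tcp_probe(endpoint: Endpoint, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def circuits_down(entry_points: Dict[str, Endpoint],
                  probe: Callable[[Endpoint], bool] = tcp_probe) -> List[str]:
    """Return the names of circuits whose entry point did not answer."""
    return sorted(name for name, endpoint in entry_points.items()
                  if not probe(endpoint))


# Hypothetical entry points; replace with your own addresses and ports.
ENTRY_POINTS = {
    "primary-fiber": ("192.0.2.10", 22),
    "secondary-cable": ("198.51.100.10", 22),
}
```

Running `circuits_down(ENTRY_POINTS)` from outside the building on a regular schedule would show whether the secondary path is usable for remote access before the 2 a.m. outage, rather than during it.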