The ugly truth about disaster recovery

High availability, disaster recovery, and business continuity often fail due to poor design. Here's how to do them right -- even in the cloud


BC for the SAN-based virtualization cluster is again similar to BC for the stand-alone SQL server: you'll want redundant compute (server) capacity located at a remote site. As in the DR example, you'll probably also want a second SAN running asynchronous replication. This time, however, the secondary should sit at a different site and have enough transactional performance to keep up with a full production workload -- probably a mirror of the primary site's configuration rather than a stripped-down, low-performance one.
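To see why the secondary's performance matters, here is a toy model of asynchronous replication. It is a sketch, not how any particular SAN tracks replication lag: it assumes the primary commits writes locally without waiting for the secondary, that the secondary can apply a fixed number of megabytes per second, and the workload figures are made up. Whatever is still in the backlog when the primary site fails is data you lose.

```python
from dataclasses import dataclass


@dataclass
class ReplicationResult:
    backlog_mb: float       # written at the primary but not yet applied at the secondary
    peak_backlog_mb: float  # worst backlog seen during the run


def simulate_async_replication(write_mb_per_s: list[float],
                               apply_mb_per_s: float) -> ReplicationResult:
    """Step through a workload one second at a time.

    Each second the primary commits write_mb_per_s[t] locally (asynchronous:
    it does not wait for the secondary), and the secondary drains up to
    apply_mb_per_s of the queued changes.
    """
    backlog = 0.0
    peak = 0.0
    for written in write_mb_per_s:
        backlog += written
        backlog -= min(backlog, apply_mb_per_s)  # can't apply more than is queued
        peak = max(peak, backlog)
    return ReplicationResult(backlog_mb=backlog, peak_backlog_mb=peak)


# A mirrored secondary keeps up; an undersized one falls further behind every second.
mirrored = simulate_async_replication([100.0] * 10, apply_mb_per_s=120.0)
undersized = simulate_async_replication([100.0] * 10, apply_mb_per_s=60.0)
print(mirrored.backlog_mb, undersized.backlog_mb)  # 0.0 400.0
```

A short burst that the secondary can absorb leaves no lasting backlog, but a sustained write rate above the apply rate grows the backlog without bound. That is the argument for mirroring the primary configuration rather than stripping it down.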

A solid cloud-based business continuity design requires a thorough understanding of how your cloud provider works. Using Amazon AWS as an example, you'd want a second EC2 instance, and this one should be located in a different AWS region from the first, not merely in a different availability zone within the same region. So if the first instance is in US East, you'd want the second to be at least in US West (if not in one of the more expensive international regions). Then you'd do some scripting on the primary server to have it periodically ship incremental live-state backups to the secondary. You could even have the script start the secondary instance before each replication run and stop it afterward to save some cash. If the primary EC2 instance failed, you'd then redirect traffic to the backup. Keep in mind that Elastic IP addresses are scoped to a single region: reassigning an Elastic IP can move traffic between instances in the same region, but failing over to a different region generally means updating DNS instead.
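The replication cycle described above (start the standby, ship only what changed, stop the standby) can be sketched as follows. This is a minimal illustration under stated assumptions: "incremental" here means file-level changes detected by SHA-256 digests, and `Standby`, `ship_incremental`, and `FakeStandby` are hypothetical names. No real EC2 calls are shown; you would implement the `Standby` methods with your cloud SDK.

```python
import hashlib
from typing import Protocol


class Standby(Protocol):
    """Hypothetical interface to the secondary instance; implement it with your cloud SDK."""
    def start(self) -> None: ...
    def stop(self) -> None: ...
    def upload(self, path: str, data: bytes) -> None: ...
    def delete(self, path: str) -> None: ...


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def ship_incremental(files: dict[str, bytes], last: dict[str, str],
                     standby: Standby) -> dict[str, str]:
    """Send only what changed since the last run; return the new manifest.

    If a transfer raises, the exception propagates and the caller keeps the
    old manifest, so the next run retries the same changes.
    """
    current = manifest(files)
    changed = [p for p, digest in current.items() if last.get(p) != digest]
    deleted = [p for p in last if p not in current]
    if not changed and not deleted:
        return current  # nothing to ship, so don't pay to boot the standby
    standby.start()
    try:
        for path in changed:
            standby.upload(path, files[path])
        for path in deleted:
            standby.delete(path)
    finally:
        standby.stop()  # power it down even if a transfer failed
    return current


class FakeStandby:
    """In-memory stand-in used to exercise the cycle without a cloud account."""
    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.running = False
        self.boots = 0

    def start(self) -> None:
        self.running = True
        self.boots += 1

    def stop(self) -> None:
        self.running = False

    def upload(self, path: str, data: bytes) -> None:
        if not self.running:
            raise RuntimeError("standby is powered off")
        self.files[path] = data

    def delete(self, path: str) -> None:
        if not self.running:
            raise RuntimeError("standby is powered off")
        self.files.pop(path, None)


standby = FakeStandby()
m = ship_incremental({"db/a": b"1", "db/b": b"2"}, {}, standby)
m = ship_incremental({"db/a": b"1", "db/b": b"3"}, m, standby)  # only db/b is resent
m = ship_incremental({"db/a": b"1", "db/b": b"3"}, m, standby)  # no changes, no boot
print(standby.boots, standby.files)
```

A real script would also need to wait for the instance to finish booting before uploading. Keep in mind the trade-off of the on/off trick: a stopped standby has to be started before it can take over, which adds to your recovery time.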

Some might even question whether that approach goes far enough, especially given that there has been at least one instance where a failure in one AWS availability zone hurt services in others. If you're not comfortable relying on a single provider, you could always replicate your data to a completely different cloud provider or to on-premises hardware. However, you'd then need to design your own address-redundancy mechanism to replace Amazon's Elastic IP (whether that's simply modifying DNS records or something more elaborate).
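For the "simply modifying DNS" option, the core decision logic might look like the sketch below. It assumes you already have a health check for the primary; `DnsFailover` and its `update_dns` hook are hypothetical names, and the hook stands in for whatever API your DNS provider exposes. The addresses come from the reserved documentation ranges.

```python
from typing import Callable


class DnsFailover:
    """Repoint a DNS record at the standby after N consecutive failed health checks.

    update_dns is a hypothetical hook (record name, new IP) that you'd wire to
    your DNS provider's API. Failback is deliberately manual: the old primary
    needs to be re-synced before it takes traffic again.
    """

    def __init__(self, record: str, primary_ip: str, secondary_ip: str,
                 update_dns: Callable[[str, str], None],
                 failure_threshold: int = 3) -> None:
        self.record = record
        self.secondary_ip = secondary_ip
        self.update_dns = update_dns
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active_ip = primary_ip

    def observe(self, primary_healthy: bool) -> str:
        """Feed one health-check result; return the address DNS should point at."""
        if primary_healthy:
            self.consecutive_failures = 0  # a single success resets the count
            return self.active_ip
        self.consecutive_failures += 1
        if (self.consecutive_failures >= self.failure_threshold
                and self.active_ip != self.secondary_ip):
            self.update_dns(self.record, self.secondary_ip)
            self.active_ip = self.secondary_ip
        return self.active_ip


changes = []
failover = DnsFailover("app.example.com", "192.0.2.10", "198.51.100.20",
                       update_dns=lambda name, ip: changes.append((name, ip)))
for healthy in [True, False, False, True, False, False, False, False]:
    failover.observe(healthy)
print(changes)  # [('app.example.com', '198.51.100.20')]
```

Requiring several consecutive failures avoids flapping on a single missed check. Because resolvers cache records for their TTL, some clients will keep reaching the failed address until it expires, which is one reason a DNS-only approach may not be seamless.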

Putting it all together

Whatever approach you end up using to satisfy your HA, DR, and BC requirements, make sure that both you and your stakeholders use the correct terminology and understand what the investments being made will actually buy. Business stakeholders, no matter how nontechnical they are, should understand how quickly you'll be able to recover from the entire range of failures that might occur and what it would cost to improve those numbers.
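One way to make uptime expectations concrete for stakeholders is to translate an availability percentage into a yearly downtime budget. The sketch below assumes a 365-day year and counts all downtime alike. It says nothing about how long any single outage may last, which is what a recovery time objective covers, and nothing about cost.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


for target in (99.0, 99.9, 99.99, 99.999, 100.0):
    hours = allowed_downtime_hours(target)
    print(f"{target:>7}%  {hours:8.2f} h/year  ({hours * 60:8.1f} min)")
```

Even 99.9 percent still allows roughly 8.8 hours of downtime a year, each additional nine cuts the budget by a factor of ten, and 100 percent leaves no budget at all.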

The last thing you want to be dealing with in the midst of a disaster is an army of suits with overblown expectations of 100-percent uptime wondering why you didn't manage to live up to them.

This article, "The ugly truth about disaster recovery," originally appeared in Matt Prigge's Information Overload blog.

Copyright © 2011 IDG Communications, Inc.
