Replicate your data or lose your company

The core purpose of disaster recovery for an IT environment is protecting the data. This means making a copy and keeping it beyond the reaches of a regional disaster. Are you prepared?

The year was 1993 and I was crossing before the Statue of Liberty on the Staten Island Ferry heading to work in Manhattan. Something wasn't quite right, and I soon noticed there was smoke coming from downtown. It was the first bombing of the World Trade Center, and I was about to disembark to a small nightmare (my office was across the street). Although the actual damage was minimal to the towers, entire companies went out of business as a result. Smoke and water damage combined poorly with backup tapes kept on-site (a huge no-no in the world of data protection). The idea was, "What could be safer than keeping our data here?"

Times have changed, and methods for backup and recovery have become much more advanced over the years, largely due to serious catastrophes that we have seen. But these destructive elements (both man-made and otherwise) have taught us that data recovery is the key to a company's survival. I had the privilege of carrying on a dialog with Norm R. Smith, a consulting engineer with Unisys, who drove the point home while expanding on the pros and cons of different approaches.

[ What's the best disaster recovery product? Find out in the InfoWorld Test Center's comparison of disaster recovery tools for virtualized environments. ]

InfoWorld: Norm, can you explain a bit more why data replication is such an important subject for enterprise admins to consider today?

Smith: Hurricane Katrina. A rolling power failure that encompassed the entire northeast United States. Flooding of the Red River in Fargo, N.D. Just a few of the natural disasters that can ruin an IT manager's day. These kinds of unexpected and unavoidable occurrences underscore the importance of having a disaster recovery strategy for a company's IT infrastructure. And if that weren't enough, there's the Sarbanes-Oxley Act contributing requirements for a disaster recovery plan. Yet the typical IT infrastructure is extensive, complicated, and already tuned to the job it's doing. How do you approach disaster recovery without disrupting the very production workload you're looking to protect?

The core purpose of disaster recovery for an IT environment is protecting the data. Protecting the data involves making a copy and keeping it beyond the reach of a regional disaster. One typical technique is to shut down production on a slow weekend night, copy the data to tape, and store the tapes at another, distant location. That can work, but recovery in the case of an actual disaster is usually a matter of weeks, not minutes. Increasingly, IT infrastructures demand faster recovery, and that means keeping backup data online and keeping it up to date in near real time. There are two typical approaches to providing that capability.

InfoWorld: Can you explain for us first the two typical approaches, which I believe are host-based software and storage-based data replication?

Smith: Yes. First is host-based software. In this case, software runs on each host and uses the host's network to copy data (and changes to the data) to another location. But host-based products must be separately deployed to each host -- a potentially daunting task. Host-based solutions consume host processing and network resources, potentially destabilizing an otherwise finely tuned application environment. And no single host-based product covers all of the application and operating system environments typically found in an IT infrastructure.
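To make the host-based idea concrete, here is a minimal sketch of block-level change detection and replication in Python. It is purely illustrative -- the function names, block size, and in-memory "replica" are assumptions for the example, not part of any real product: the point is that the host itself does the hashing and copying work, which is exactly the CPU and network overhead Smith describes.

```python
import hashlib

BLOCK = 4096  # illustrative block size

def changed_blocks(old: bytes, new: bytes):
    """Yield (offset, data) for each fixed-size block that differs.
    This comparison work runs on the production host -- the overhead
    host-based replication imposes."""
    for off in range(0, max(len(old), len(new)), BLOCK):
        a = old[off:off + BLOCK]
        b = new[off:off + BLOCK]
        if hashlib.sha256(a).digest() != hashlib.sha256(b).digest():
            yield off, b

def apply_changes(replica: bytearray, changes):
    """Apply the changed blocks to a replica copy (standing in for
    the remote disaster-recovery site)."""
    for off, data in changes:
        replica[off:off + len(data)] = data
    return replica
```

Only the blocks that actually changed cross the network, which is why even a simple host-based scheme beats re-sending full copies -- at the price of the host doing the bookkeeping.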

Second is storage-based data replication. In this case, software runs in storage subsystems and uses the storage processing capacity to copy data (and changes to the data) to another location. But storage-based products must be deployed separately to each storage environment. There is no coordination between data in different storage subsystems, even if applications require coherency of that data. Many storage system deployments require expensive edge devices to convert replicated data traffic from SAN transports to IP transports. Storage-based products require matching storage at both production and disaster sites; you can't use storage-based replication between, say, an EMC disk subsystem and an HP disk subsystem. And storage-based deployments consume storage resources (such as disk access and cache memory), potentially impacting production workloads.

InfoWorld: Those are typical approaches, but you believe that another approach -- out-of-band replication -- may be a better option for some. Can you explain that?

Smith: This approach installs a low-profile appliance into an IT storage infrastructure (the SAN) to capture and transmit copies of production data. Data is captured as it traverses the SAN from a host to storage, eliminating both host and storage workload impacts. (One example of this is the Unisys SafeGuard solution, which you can learn more about here.) A single deployment can protect a large number of hosts and a large number of disk storage subsystems. This minimizes the production impact of installing and maintaining data replication. This approach also allows different (mismatched) storage subsystems at production and disaster sites. The availability of out-of-band processing also allows the appliance to provide a rich feature set to manage the data replication flow. For example, Unisys's out-of-band appliance reduces the amount of wide-area network bandwidth required between a production site and the data replica at a disaster site -- typically by a factor of 10 or more -- and it does so without consuming host or storage cycles. It also retains before and after data images with the replica, enabling selected data rollback to any point in the past -- a real boon to recovering from unexpected data corruption at the production site.
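The before-and-after images Smith mentions amount to a write journal kept alongside the replica. The following is a minimal sketch of that idea, assuming a simple key-value store as the replicated data; the class and method names are hypothetical, not drawn from any Unisys product. Because every write records the prior value, the replica can be unwound to any earlier sequence number -- for instance, to just before a corruption event hit production.

```python
class ReplicationJournal:
    """Replica that logs a before-image and after-image for every
    write, so it can be rolled back to any earlier point in time."""

    def __init__(self, data: dict):
        self.data = dict(data)
        self.log = []   # entries of (seq, key, before, after)
        self.seq = 0

    def write(self, key, value):
        """Apply a replicated write and journal the prior value."""
        self.seq += 1
        self.log.append((self.seq, key, self.data.get(key), value))
        self.data[key] = value
        return self.seq

    def rollback_to(self, seq):
        """Undo writes newer than `seq` using the before-images."""
        while self.log and self.log[-1][0] > seq:
            _, key, before, _ = self.log.pop()
            if before is None:
                self.data.pop(key, None)  # key did not exist before
            else:
                self.data[key] = before
```

A real appliance journals disk blocks rather than dictionary keys and bounds the journal's retention window, but the rollback mechanism -- replaying before-images in reverse order -- is the same idea.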

Here are the chief advantages to an out-of-band data replication strategy:

  • A single deployment to cover many hosts and many storage subsystems.
  • The ability to mix and match different kinds of storage subsystems.
  • Data protection with virtually no impact to production workloads.
  • The ability to coordinate sync points in the data stream between different hosts and different storage subsystems.
  • Superior bandwidth reduction to minimize wide area network link costs.

Out-of-band replication can be a very attractive approach to protecting a complex environment without having to disrupt that environment along the way.

InfoWorld: Thanks, Norm. Needless to say, there are many factors to weigh when considering your data replication strategy, and each environment will ultimately drive its own choices. Hopefully it's clear that choices must be made to ensure your company can continue to operate under the most extreme circumstances.

Are you prepared? I would like to know what you are doing to address these issues and which solutions you find beneficial for your particular environment.