How to avoid downtime and disruption when moving data

Increasingly, organizations need to move data between data centers and/or the cloud, but the risk of downtime during the transition is a sticking point.

Business Continuity Awareness Week 2017 is here, and hopefully it will offer a fresh opportunity to review some of the cloud’s limitations where business continuity is concerned.

Some 60 percent of all enterprise IT workloads will run in some form of public or private cloud as soon as next year, according to 451 Research’s latest estimate. It projects particularly strong growth in critical categories, including data analytics and core business applications. Findings from IDC, Gartner and Forrester paint broadly the same picture: the cloud is rapidly becoming central, rather than peripheral, to general IT provision.

It is little wonder, then, that IT leaders are concerned about the risks of moving data and the downtime that can come with it. Typical data sets are now anywhere from 1,000 to 1 million times the size of the average enterprise database of 10 to 20 years ago, so the potential downtime associated with moving them is many times greater, too. It’s no longer a case of winging it for 15 minutes; this could mean hours of downtime while the data is resettled.
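A back-of-the-envelope calculation shows how data volume turns into downtime when systems must stay offline while the data moves. The minimal sketch below simply divides data set size by link bandwidth; the 10 Gbps link, the data set sizes and the `efficiency` factor are illustrative assumptions, not figures from the research cited here.

```python
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Rough time, in hours, to copy a data set over a network link.

    data_terabytes: size in decimal terabytes (10**12 bytes).
    link_gbps: nominal link speed in gigabits per second (10**9 bits/s).
    efficiency: assumed fraction of the nominal speed actually achieved;
        1.0 is a best case that ignores protocol overhead and contention.
    """
    bits = data_terabytes * 10**12 * 8
    bits_per_second = link_gbps * 10**9 * efficiency
    return bits / bits_per_second / 3600


# Hypothetical data set sizes, all over an assumed 10 Gbps link.
for size_tb in (1, 100, 1000):
    print(f"{size_tb} TB at 10 Gbps: about {transfer_hours(size_tb, 10):.1f} hours")
```

Even at full nominal speed, 1 TB takes roughly 13 minutes, while 100 TB takes more than 22 hours; any real-world efficiency below 1.0 stretches these figures further. If systems cannot keep running during the copy, that entire period is downtime.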

Businesses know they need to make more use of the cloud, especially for the more strategic and clever stuff: high-speed, high-volume data crunching to support real-time decision making and sophisticated automation. The volumes of data generated today have also made establishing secondary data centers cost-prohibitive, a further factor driving companies to the cloud.

But the pain of getting from here to there still feels daunting. What might happen to their data in transit? What if they can no longer access it? And how can they keep working with live data if it is simultaneously being used somewhere else?

Latency is also an issue. Data centers used to be built close to one another to avoid the performance degradation that comes with network transit. With the cloud, however, the distances between physical server farms aren’t within companies’ control, so performance issues, which could slow data availability and reconciliation, are an important consideration.
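Distance sets a hard floor on latency that no amount of hardware removes. The sketch below estimates that floor, assuming signals travel through optical fiber at roughly 200,000 km/s (about two-thirds the speed of light in a vacuum) along a straight path; real routes are longer and add switching and queuing delays. The helper names are hypothetical.

```python
FIBER_KM_PER_SECOND = 200_000  # assumed: light in fiber travels at roughly 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    Routing detours, queuing and processing are ignored; they only add to it.
    """
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000


def max_sequential_sync_writes_per_second(distance_km: float) -> float:
    """Upper bound for one client issuing writes one after another, where each
    write must be acknowledged by the remote site before the next one starts."""
    return 1000 / min_round_trip_ms(distance_km)


for km in (10, 1_000, 5_000):
    print(f"{km} km: at least {min_round_trip_ms(km):.2f} ms round trip, "
          f"at most {max_sequential_sync_writes_per_second(km):.0f} sequential synced writes/s")
```

At 1,000 km the round trip is at least 10 ms, so a client that waits for the remote site to confirm each write before sending the next can manage at most about 100 such writes per second. This is one reason keeping distant sites tightly in sync is hard to do without hurting performance.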

Concerns about downtime are valid in disaster recovery scenarios, too. Where remote data centers may be called upon to get live systems back up and running quickly, CIOs are right to fear downtime or data loss caused by, for instance, inadequate synchronization between near and remote systems. It’s something IBM will be discussing in a webinar as part of Business Continuity Awareness Week.
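To see why inadequate synchronization translates into data loss, consider asynchronous replication, where the primary site accepts a write before the remote site has received it. The following minimal sketch assumes a simple in-memory model; the `AsyncReplicatedPair` class and its `ship` and `fail_over` methods are hypothetical names for illustration only. Whatever is still queued when the primary fails is lost, so the replication lag at the moment of failure is effectively the data-loss window.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    records: dict = field(default_factory=dict)


class AsyncReplicatedPair:
    """Hypothetical model: the primary applies writes immediately, and the remote
    replica receives them later, in order, whenever ship() runs."""

    def __init__(self) -> None:
        self.primary = Site("primary")
        self.replica = Site("replica")
        self.pending = []  # writes applied on the primary but not yet on the replica

    def write(self, key: str, value: str) -> None:
        self.primary.records[key] = value
        self.pending.append((key, value))

    def ship(self, max_writes: int) -> None:
        # Deliver up to max_writes queued changes to the replica, oldest first.
        batch, self.pending = self.pending[:max_writes], self.pending[max_writes:]
        for key, value in batch:
            self.replica.records[key] = value

    def fail_over(self):
        # The primary is lost: anything not yet shipped never reaches the replica.
        lost, self.pending = self.pending, []
        return self.replica, lost


pair = AsyncReplicatedPair()
for i in range(5):
    pair.write(f"order-{i}", "paid")
pair.ship(max_writes=3)              # replication is two writes behind
survivor, lost = pair.fail_over()
print(sorted(survivor.records))      # order-0 to order-2 survive
print(lost)                          # order-3 and order-4 were never replicated
```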

The future is now

Whether it’s everyday back-office systems or those underpinning ambitious new projects associated with artificial intelligence or the Internet of Things, organizations need to be able to count on the availability and integrity of the data they’re processing at all times. 

For driverless cars to take off, for instance, all parties (passengers, the car manufacturer, insurance companies and third-party service providers) need absolute assurance that the vehicle’s instruments and sensors, and the cloud-based platforms they connect to, will be able to send, receive, interpret and process data continuously and in real time. It’s estimated that a single autonomous vehicle, with its sensors, cameras and laser measuring (LiDAR surveying), can produce 100Gb of data per second.

The only way to provide a viable service using continuously changing data sets, with no downtime and no disruption, is through something we call active data replication. This allows live data to exist in more than one place at the same time, without the risk of falling out of sync and without interruption as each endpoint is updated. It is this capability that will allow car manufacturers and service partners to analyze and respond to live data about how vehicles are performing, identify anomalies in real time, and pre-emptively determine what remedial action may be needed.
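The article does not describe how its active data replication works internally. One well-known way to get the property described here, live data in several places that never falls out of sync, is state machine replication: every site may accept writes, but all sites apply them in one agreed global order. The minimal sketch below illustrates that general technique and is not the author’s method. The `Sequencer` class is a hypothetical, single-process stand-in for a consensus protocol such as Paxos, which a real system would use to agree on the order across sites; the `Replica` class and the sensor keys are likewise invented for the example.

```python
import itertools
from typing import Dict, List, Tuple

Write = Tuple[int, str, str]  # (sequence number, key, value)


class Sequencer:
    """Hypothetical stand-in for a consensus protocol: gives every write,
    from any site, exactly one position in a single global order."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.log: List[Write] = []

    def submit(self, key: str, value: str) -> Write:
        entry = (next(self._counter), key, value)
        self.log.append(entry)
        return entry


class Replica:
    """A site that accepts writes locally but applies them only in log order."""

    def __init__(self, name: str, sequencer: Sequencer) -> None:
        self.name = name
        self.sequencer = sequencer
        self.state: Dict[str, str] = {}
        self.applied = 0  # highest sequence number applied at this site

    def write(self, key: str, value: str) -> None:
        # Local writes are not applied directly; they go through the shared order.
        self.sequencer.submit(key, value)

    def catch_up(self) -> None:
        # Apply, in order, every agreed entry this site has not applied yet.
        for seq, key, value in self.sequencer.log[self.applied:]:
            self.state[key] = value
            self.applied = seq


seq = Sequencer()
on_prem, cloud = Replica("on-prem", seq), Replica("cloud", seq)
on_prem.write("sensor-42", "ok")
cloud.write("sensor-42", "anomaly")  # conflicting update from the other site
on_prem.write("sensor-7", "ok")
for site in (on_prem, cloud):
    site.catch_up()
assert on_prem.state == cloud.state  # both sites end up with identical data
print(on_prem.state)
```

The design choice that matters is that no site applies its own writes out of turn: because every replica replays the same log in the same order, conflicting updates resolve identically everywhere, and both sites keep accepting writes throughout.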

Companies don’t have to be reaching for the stars to come up against this kind of data integrity challenge. Many organizations are turning to Hadoop-based analytics (a particular way of doing large-scale data crunching at speed) to turn big data into something meaningful and actionable they can use in their everyday activities. Plenty of businesses use Hadoop to analyze and respond to Twitter activity, for instance. But again, this typically means putting data into the cloud, where the required processing capacity is readily available.

Unless they are working with historical data, companies will still need access to the data in their core business systems, where records continue to be updated. In this kind of scenario, using the cloud to do the processing isn’t simply a case of shipping a batch load of complete data off to a destination where something clever happens to it, then getting it back once the magic has happened and the results are in.
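One common way to avoid re-shipping a complete batch every time is incremental synchronization: after an initial load, copy only the records that have changed since the last pass. The sketch below is a minimal, hypothetical illustration, assuming every record carries a version number drawn from a single increasing counter; the table layout and the `changes_since` and `sync` helpers are invented for the example.

```python
from typing import Dict, Tuple

# Hypothetical table layout: key -> (version, value). Versions are assumed to
# come from one counter that increases on every insert or update.
Table = Dict[str, Tuple[int, str]]


def changes_since(source: Table, high_water_mark: int) -> Table:
    """Records inserted or updated after the last version already copied."""
    return {key: row for key, row in source.items() if row[0] > high_water_mark}


def sync(source: Table, destination: Table, high_water_mark: int) -> int:
    """Copy only the changed records and return the new high-water mark."""
    delta = changes_since(source, high_water_mark)
    destination.update(delta)
    return max((version for version, _ in delta.values()), default=high_water_mark)


source: Table = {"cust-1": (1, "Leeds"), "cust-2": (2, "York")}
cloud: Table = {}
mark = sync(source, cloud, 0)             # initial load copies both records
source["cust-2"] = (3, "Hull")            # the core system keeps changing
source["cust-3"] = (4, "Bath")
print(len(changes_since(source, mark)))   # prints 2: only the changed records ship
mark = sync(source, cloud, mark)
assert cloud == source and mark == 4
```

This simple scheme does not handle deletions, and between passes the destination still lags behind the source; shrinking that gap toward zero is what continuous replication aims to do.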

Pressing pause isn’t an option

When analysis takes place on live production data, companies can’t afford for the site where the data originates and the site where it is crunched to be out of sync. Nor can they wait several days for data to move, be analyzed and come back before anything new can happen to it. This isn’t just downtime: it’s paralysis. And that’s without factoring in any corruption that might occur in transit or when data is reconciled after the Hadoop analytics run.

Again, the only way to avoid the downtime and disruption associated with data movement is to find a way to continuously update and sync data between locations. The likes of Google achieve this via an elaborate satellite set-up, but you can also do it using clever algorithms, as we do.

This article is published as part of the IDG Contributor Network.