Hadoop's growth opens up demand for data migration tools

As more companies adopt Hadoop, they need help getting their data onto the platform -- and a new field is born


Hadoop's explosion over the last few years has been phenomenal. One estimate puts its growth at nearly 60 percent year-over-year and projects a market worth $50 billion by 2020. The furious uptake has created demand not only for Hadoop vendors but also for vendors selling tools and services to migrate data onto Hadoop.

In theory, getting data into and out of Hadoop is well within the capacity of both the software and its users. Apache Sqoop was created to handle bulk import and export between Hadoop and relational databases, with native support for the usual suspects: MySQL, Oracle, PostgreSQL, and HSQLDB. But not everyone is comfortable doing the work themselves, so vendors are offering polished import/export solutions that require less manual labor.

Companies with data migration solutions for other, pre-existing platforms are a natural fit for this space. For example, Attunity, maker of a variety of data-movement solutions, offers Attunity Replicate, which handles many data sources and targets besides Hadoop -- such as Oracle, SQL Server, DB2, and Teradata. Attunity includes optimizations specifically for transfers over wide-area networks, clearly intended to appeal to those attempting to move multi-terabyte jobs off-premises.

In the same vein, Diyotta DataMover also supports Hadoop as either a source or a target, with an equally large roster of data formats and repositories.

Syncsort specifically targets mainframes, working in conjunction with Cloudera to create a system that harvests data directly from existing mainframes and loads it into Hadoop. Syncsort CEO Lonne Jaffe describes it as "a button you can push to suck in the expensive workloads."

With these offerings, the main attraction isn't the number of supported data sources, but rather the convenience and the expertise-in-a-box approach. Hadoop vendors like Hortonworks compete by offering their own support and migration services, so there may be less incentive on their part to make Sqoop into a full-blown replacement for third-party products.

One more detail vital for any Hadoop data migration product is future-proofing -- specifically, being able to work well with the changes coming down the pike for Hadoop. This involves more than the shift from MapReduce to YARN; it also means supporting projects like Apache Argus, Hadoop's forthcoming data security framework.

The best long-term investment for dealing with Hadoop data migrations may be in understanding the existing toolsets and making the most of them. You might not want to roll your own Sqoop import connector for a mission-critical job, but the work could pay off in the long run for a future inward migration -- or if an option even bigger and more ambitious than Hadoop comes along.


Copyright © 2014 IDG Communications, Inc.