Next Hadoop confirms data as a platform

The Hadoop hype focuses on size and speed, but the added support for non-MapReduce apps could have much bigger implications

You may have heard the news out of last week's Strata: the next version of Hadoop is going to be bigger and badder than ever. But hidden within the hype about size and speed is a new feature that could radically shift the way Hadoop is used.

I hit the talk given by Hortonworks' co-founder and Apache Hadoop VP Arun Murthy at the conference and heard all of the flashy stuff: 6,000-node support, high-availability HDFS (Hadoop Distributed File System), a next-generation form of MapReduce that will enable the support of non-MapReduce applications, and the separation of block and namespace management of HDFS that will enable significant scaling of data.

All important stuff, but the mention of support for apps other than MapReduce jobs immediately grabbed my attention. The capability to plug in MapReduce jobs is, of course, the whole reason for the MapReduce framework within Hadoop. But the new YARN framework, according to the Hadoop wiki, "will split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global RM (ResourceManager) and per-application AM (ApplicationMaster)."

If I understood Murthy's description of YARN during his Strata talk, the architecture would allow each application that needs to connect to MapReduce to have its own AM, which would enable greater flexibility in the types of applications with which MapReduce would be able to work.
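To make that split concrete, here is a toy sketch in Python of the division of labor the wiki describes: one global resource manager that only hands out capacity, and one application master per application that does its own scheduling and monitoring. Every name in it (the classes, `free_slots`, `allocate`, `run`) is made up for illustration and is not the actual YARN API; real YARN is written in Java and is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the global RM: it only tracks free slots."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are available, up to the request.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, count: int) -> None:
        self.free_slots += count


@dataclass
class ApplicationMaster:
    """Hypothetical stand-in for a per-application AM: schedules one app."""
    name: str
    tasks: list
    rm: ResourceManager
    finished: list = field(default_factory=list)

    def run(self) -> list:
        pending = list(self.tasks)
        while pending:
            granted = self.rm.allocate(len(pending))
            if granted == 0:
                raise RuntimeError("no free slots in the cluster")
            # Run one batch of tasks, then hand the slots back to the RM.
            batch, pending = pending[:granted], pending[granted:]
            self.finished.extend(task() for task in batch)
            self.rm.release(granted)
        return self.finished


# One shared RM, two unrelated applications, each with its own AM.
rm = ResourceManager(free_slots=2)
mapreduce_app = ApplicationMaster(
    "mapreduce", [lambda: "map", lambda: "map", lambda: "reduce"], rm
)
graph_app = ApplicationMaster("graph", [lambda: "superstep"], rm)

print(mapreduce_app.run())
print(graph_app.run())
```

The point of the sketch is that the resource manager knows nothing about what a task is. Only the application master does, which is why, at least in this simplified picture, a non-MapReduce application can share the same cluster by bringing its own master.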

Now, I'm no Super-Genius like Wile E. Coyote, but it occurs to me that if you have an application framework that developers could point to in order to Get Stuff Done, then that seems suspiciously like a platform.

It was an observation I ran past Murthy when I bumped into him in the lobby at Strata. Given that Hadoop's MapReduce framework was about to become quite a bit more robust, isn't this just a setup for taking Hadoop down the path to becoming an operating system?

Murthy was receptive to the question -- in that he didn't call me a blithering idiot for suggesting it -- but he didn't think that's where Hadoop necessarily has to go, at least in the short term. If the application framework is robust enough, there will be plenty of platform in Hadoop for developers to code their applications. It doesn't necessarily have to supplant the underlying operating system.

Instead, Hadoop would act as a layer between the apps and the operating system, just like browsers sit on top of the operating system and serve as a platform for Web applications.

Seen in this light, Hadoop suddenly expands from a storage system and specialized MapReduce job machine into a system on which big data apps can be run directly -- no matter what operating system is running Hadoop.

Now the implications of what Hadoop is becoming should be very clear: such an expanded infrastructure has huge potential for building data-driven enterprise and SaaS applications -- which is pretty much all of them.

Is it any wonder that so many companies are trying to jump on the big data bandwagon and Hadoop specifically?

And Murthy wasn't entirely dismissive of the idea of Hadoop as a full-fledged OS. It would be a long way off, but some day Hadoop's application framework and filesystem could indeed be expanded to talk to bare metal and peripherals, effectively becoming a completely data-oriented operating system.

Something to wonder about.

This story, "Next Hadoop confirms data as a platform," was originally published by ITworld.


Copyright © 2012 IDG Communications, Inc.