If you've got a lot of data, then Hadoop either is, or should be, on your radar.
Once reserved for Internet empires like Google and Yahoo, the most popular and well-known big data management system is now creeping into the enterprise. There are two big reasons for that: 1) Businesses have a lot more data to manage, and Hadoop is a great platform, especially for combining legacy data with new, unstructured data; and 2) a lot of vendors are jumping into the game of offering support and services around Hadoop, making it more palatable for enterprises.
"Hadoop is unstoppable as its open source roots grow wildly and deeply into enterprise data management architectures," Forrester analysts Mike Gualtieri and Noel Yuhanna wrote recently in the company's Wave Report on the Hadoop marketplace. "Forrester believes that Hadoop is a must-have data platform for large enterprises, forming the cornerstone of any flexible future data management platform. If you have lots of structured, unstructured, and/or binary data, there is a sweet spot for Hadoop in your organization."
So where do you start? Forrester says there are a variety of places to go, and it evaluated nine vendors offering Hadoop services to find the pros and cons of each. Forrester concluded that there is no clear market leader at this point, with relatively young companies in this market offering compelling services alongside the tech titans.
First, some background: Hadoop is an open source Apache project whose core components anyone can freely download. These include Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. Many companies, from IBM to Amazon Web Services, Microsoft and Teradata, have packaged Hadoop into more easily consumable distributions or services. Each company takes a slightly different strategy, but the key draw in every case is Hadoop's ability to distribute workloads across potentially thousands of servers, making big data manageable.
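For readers who want a feel for what "distributing a workload" means in the MapReduce model, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration. In a real cluster, each chunk would live on a different server and the map and reduce steps would run in parallel across them.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run the three steps in sequence over a list of text chunks.

    Each chunk stands in for a block of data that, in a cluster,
    would be processed on a separate server.
    """
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

Because each call to `map_phase` depends only on its own chunk, the work can in principle be split across as many machines as there are chunks, which is the property the vendors below build on.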
Note: This list is based on vendors listed in Forrester's Wave report and is not meant to be an all-encompassing list of Hadoop and big data management platforms. Vendors are listed in alphabetical order.
Amazon Web Services
Customers looking for a public cloud-hosted Hadoop platform needn't look much further than the company Forrester calls the "King of the cloud" - Amazon Web Services. The company's Hadoop product is named Elastic MapReduce (EMR), which AWS says uses Hadoop to offer big data management services. It is not pure open source Hadoop, though; it has been tweaked to run specifically on AWS's cloud.