InfoWorld's own Pete Babb provided some good coverage around the "analytics cloud" recently debuted by IBM, called Blue Insight. You can think of Blue Insight as a system that gathers data from those who use it and externalizes the data to those who need it, doing so on a cloud -- a private cloud.
However, IBM clearly does not have a lock on "big data." There has been movement in this direction for some time now, including some innovative approaches to leveraging data such as MapReduce. For those of you unfamiliar with the concept, MapReduce is a software framework brought to us by Google to support distributed processing of large data sets on clusters of computers. What's unique about MapReduce is that it can process both structured and unstructured data and, by spreading the query processing across a distributed "shared nothing" system, return result sets very quickly.
In the map step, the master node accepts the request, divides it into smaller pieces, and hands those pieces to any number of worker nodes. In the reduce step, the master node collects the results from the worker nodes and combines them to determine the answer to the request. Simply put, each mapping operation is independent of the others, and thus the maps can be performed in parallel. The reason Google developed this is rather obvious, as is its use within Facebook and Yahoo.
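To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python that counts words across several chunks of text. It is only an illustration of the pattern, not Google's or Hadoop's implementation: the function names (map_chunk, shuffle, reduce_counts, word_count) are hypothetical, and a thread pool stands in for the worker nodes of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk is mapped independently of the others, so the map calls
    # can run in parallel (threads here stand in for worker nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data in the cloud", "the cloud and big data", "data data data"]
    print(word_count(chunks))
```

Because no map call depends on another, adding more chunks and more workers is all it takes to scale the map side, which is the property that lets real MapReduce systems spread work across large clusters.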
There are open source implementations of MapReduce, such as Hadoop, which has caught fire in the last year or so as a very clever approach to managing large data sets. Typically this means many terabytes, but it could easily go significantly higher. Cloud providers are either leveraging, or looking to leverage, Hadoop as a mechanism to manage data, as the big search engines and social networking sites do today.
This indicates a trend: Much of the innovation around leveraging larger amounts of structured and unstructured data for business intelligence, or general business operations, is coming from the cloud, and not from traditional on-premises software moving up to the cloud. This is a shift. This trend, combined with the fact that cloud providers offer scalability on demand, could be the one-two punch that sends much of our business data to cloud computing platforms.
This article, "The cloud will finally solve the 'big data' problem," was originally published at InfoWorld.com. Follow the latest developments on cloud computing at InfoWorld.com.