Big data in the cloud: It's time to experiment

Big data is not data warehousing or BI, so there are no proven paths --which is why you need data scientists

The lure of big data has many people in enterprise IT moving quickly to consolidate and mash up their data assets with other relevant information. The tools are here right now, including big data engines based on Hadoop, public clouds that provide rental access to a huge number of servers, and external cloud-delivered data resources to make better sense of your info.

Take, for example, a manufacturing company that can -- thanks to the use of cloud-based big data -- not only establish the output of its factories for the last 10 years but determine how that output compared with others in its industry, as well as the effects of the weather and other external factors. Moreover, it can predict future factory output through the use of proven algorithms and other relevant data models applied to that big data.

[ InfoWorld's Peter Wayner reviews the top enterprise Hadoop tools. | In the data center today, the action is in the private cloud. InfoWorld's experts take you through what you need to know to do it right in our "Private Cloud Deep Dive" PDF special report. | Stay up on the cloud with InfoWorld's Cloud Computing Report newsletter. ]

Big data is good. The cloud is good. Now, how do we actually make the whole thing work?

The truth is not many best practices have emerged on how to move to big data. We have the migration to data warehousing and business intelligence as an existing model, but as I look at what big data really is, it's clear that big data adoption is a different type of problem. Much of that experience in data warehousing and BI isn't relevant, and it may even lead to some dead ends.

The art of big data is that it consolidates many types of data resources with different structures and data models, all in a massive, distributed storage system. Big data systems may not enforce a structure, though structure can be layered into the data after the migration. But there are trade-offs in going this route, including migrating unneeded and redundant data that takes up space in the big data system.

For now, the proper path is more through trial and error than following proven concepts. The answer to how to best do big data is the classic consultant's response: It depends on what you're trying to do.

The bottom line is that you have to experiment. But you need not do so blindly. The emerging role of data scientist can help direct those experiments within an appropriate framework, in the manner of research scientists in any field. Data scientists can get you the answers to big data, as long as you understand that a scientist must run a lot of experiments.

At this point, experimentation is the best practice in moving to big data. Get a data scientist or two to design and run these trials.

This article, "Big data in the cloud: It's time to experiment," originally appeared at InfoWorld.com. Read more of David Linthicum's Cloud Computing blog and track the latest developments in cloud computing at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Copyright © 2012 IDG Communications, Inc.

How to choose a low-code development platform