The concept of big data is simple, as most good ideas are. Big data gives us the ability to use commodity computing to process distributed queries across multiple data sets and return result sets in record time. Cloud computing provides the underlying engine, typically through the use of Hadoop. Because these commodity server instances can be rented as needed, big data becomes affordable for most enterprises.
We always make discoveries, both good and bad, as we use new technology. In the case of big data, the path to success will come with key lessons. But because big data is still new in real-world deployments, there are major questions for which we don't yet have answers -- so be extra careful in these areas.
The ability to manage both structured and unstructured data is an advantage of using a nonrelational database, but it could also mean that the unstructured data becomes much harder to deal with in the longer term. At some point, we'll have to make tough calls about converting unstructured data to structured form. The trouble is that many of the initial database designs will be difficult to change once they're in production.
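To make the conversion problem concrete, here is a minimal Python sketch of turning free-text records into structured ones. The record layout (a date, a `customer=` field, and a trailing note) and the names `LINE_PATTERN`, `to_structured`, and `convert` are assumptions made up for illustration; real unstructured data is rarely this regular, which is exactly why the conversion gets expensive.

```python
import re

# Hypothetical free-text layout, assumed purely for illustration:
#   "<YYYY-MM-DD> customer=<name> <free-form note>"
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+customer=(?P<customer>\w+)\s+(?P<note>.*)"
)


def to_structured(line):
    """Return a dict of named fields, or None if the line does not fit the layout."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def convert(lines):
    """Split raw lines into structured records and lines that could not be parsed."""
    structured, rejected = [], []
    for line in lines:
        record = to_structured(line)
        if record is None:
            rejected.append(line)  # keep the original text for later review
        else:
            structured.append(record)
    return structured, rejected
```

The rejected list is the point of the sketch: every line that does not match the assumed layout is work that someone has to do by hand or with yet another rule, and that backlog grows the longer the data stays unstructured in production.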
The cost of using local servers is going to be high for those who won't, or can't, move to cloud-based platforms. We're talking hundreds to thousands of servers that have to be loaded, powered, and maintained. Although you can avoid the cost of traditional enterprise software licensing, the raw processing power required will still drive many big data implementations over budget. I suspect many big data efforts will initially occur within data centers, where the big data expenses are intermingled with the overall data center costs; count on the final tallies to be a surprise.
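One way to keep the final tally from being a surprise is to break out the big data cluster's costs on their own rather than letting them blend into the data center total. The Python sketch below is a back-of-the-envelope model only; the class `ClusterAssumptions`, its fields, and any figures you feed it are hypothetical placeholders, not real prices.

```python
from dataclasses import dataclass


@dataclass
class ClusterAssumptions:
    """Placeholder inputs; every figure here is an assumption you supply yourself."""
    servers: int
    hardware_per_server_per_month: float     # amortized purchase cost
    power_per_server_per_month: float        # electricity and cooling
    maintenance_per_server_per_month: float  # staff time, parts, loading software


def monthly_cost(cluster):
    """Total monthly cost of the cluster under the stated assumptions."""
    per_server = (
        cluster.hardware_per_server_per_month
        + cluster.power_per_server_per_month
        + cluster.maintenance_per_server_per_month
    )
    return cluster.servers * per_server


def over_budget(cluster, monthly_budget):
    """True if the cluster's monthly cost exceeds the budget set aside for it."""
    return monthly_cost(cluster) > monthly_budget
```

Because the per-server figure is multiplied by hundreds or thousands of machines, small errors in any one assumption compound quickly, so it is worth rerunning the model as real bills arrive.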
Cloud-based big data platforms are not all the same. Amazon Web Services' offerings are very different from Google's, for example, and capabilities differ between any pair of platforms you compare. Thus, the amount of time, effort, and talent required to get a big data project to its end state also varies with the technology you choose. I suspect one or two platforms will emerge as the clear paths to success, but we're not there yet.
This article, "Big data and cloud computing: Watch out for these unknowns," originally appeared at InfoWorld.com. Read more of David Linthicum's Cloud Computing blog at InfoWorld.com.