The dirty truth about big data and NoSQL

You thought big data was exclusive to social media empires and search engines? Think again

If I asked you for the defining characteristic of a big data customer, you'd probably say they're sitting on large amounts of data. If I asked for the defining characteristic of a NoSQL customer, you might answer they require high levels of concurrency.

Well, if that's the total market for NoSQL and big data, then MongoDB, Inc., and the various companies supporting Hadoop should probably shut their doors and call it a day.

In truth, opting for Hadoop is in many ways an economic decision. If a company has deep pockets and daunting amounts of data, then it can throw money at a high-end MPP solution from IBM, SAP, or Teradata -- in fact, most large companies have already made that sort of investment. But not all of us hang out with the 1 percent and light our cigars with $100 bills. Even those who do still have to make business decisions "up front" about whether keeping data, and deciding later what to do with it, justifies the exorbitant cost.

For the rest of us, Hadoop provides analytics capabilities we couldn't access before. Even the cost of commercially supported "enterprise" distributions of Hadoop amounts to nickels on the dollar compared to, say, IBM Netezza.

NoSQL technologies like MongoDB or Neo4j are also, in effect, economic decisions. If you buy a fat enough server and pay for enough developer time, you can indeed run nearly any document or graph database job in your favorite RDBMS. But developer time is not cheap and server licenses get expensive -- plus, the infrastructure to scale up an RDBMS so that it supports high availability and disaster recovery costs a bundle. No wonder the brighter operations folks like the sound of the NoSQL alternatives: Save money by using commodity hardware, and snap on more servers as needed.

In all but the tiniest companies, it's a myth that your data is "small" and your concurrency requirements are light. If Hadoop and MongoDB were aimed only at the companies that are already capturing massive amounts of data and have millions of users, the market would be much smaller than MongoDB's valuation alone implies.

The dirty secret is that big data and NoSQL vendors aren't just targeting gigantic, consumer-facing companies like Facebook or Google. The technology applies much more broadly, and as the supply of high-concurrency, low-cost, flexible data storage increases, so will demand. If you can hoard all that data cheaply, why not mine it cheaply as well and compete with the big names?

Moreover, as organizations mature, RDBMS schema design -- and more important, schema updates -- demand massive coordination. Anything that requires massive coordination had better happen rarely. By contrast, flexible technologies such as NoSQL meet the needs of highly competitive companies seeking to adapt to changing customer demands and shifting, expanding markets.
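
To make the schema point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: plain dictionaries stand in for documents in a schemaless store, no real database API is used, and the records, field names, and the `revenue_by` helper are all hypothetical.

```python
# Hypothetical event records captured at different times. Plain dicts stand in
# for documents in a schemaless store; no real database API is used here.
events = [
    {"user": "ann", "action": "view", "sku": "A1"},
    {"user": "bob", "action": "buy", "sku": "A1", "price": 19.99},
    # A field introduced later ("referrer") requires no schema migration.
    {"user": "ann", "action": "buy", "sku": "B2", "price": 5.00, "referrer": "email"},
]


def revenue_by(field, docs):
    """Sum purchase prices grouped by a field that may be missing from some documents."""
    totals = {}
    for doc in docs:
        if doc.get("action") != "buy":
            continue  # only purchases contribute to revenue
        key = doc.get(field, "unknown")  # older documents may lack the field
        totals[key] = totals.get(key, 0.0) + doc.get("price", 0.0)
    return totals


# The questions are decided after the data was captured.
by_referrer = revenue_by("referrer", events)
by_sku = revenue_by("sku", events)
print(by_referrer)
print(by_sku)
```

The older records never needed updating when the new field appeared, and the grouping question was chosen only at analysis time. In a relational design, the same change would typically mean a coordinated schema update first.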

Don't get me wrong -- massively scaled companies with massive amounts of data can and do deploy NoSQL databases and big data tools. However, your IT department has trained itself to throw away data it doesn't think is relevant today. This stuff is useful to all of us and changes the way we think about data: capture first, analyze later. It's a much bigger market than you think.

This article, "The dirty truth about big data and NoSQL," is from Andrew Oliver's Strategic Developer blog.
