I'm often told that the use of big data systems will kill the now very old world of data warehousing. Why? It's hugely expensive to build data warehouses. Consider the cost of the technology, including very pricey hardware and software. The minimum buy-in is well over $1 million -- and I'm being kind with that number.
Enter big data on cloud platforms. Now you can access other people's hardware to build massive data-storage systems. These data-storage systems can use highly distributed query processing systems for a divide-and-conquer approach to gain answers from massive amounts of data in mere minutes or even seconds.
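To make the divide-and-conquer idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular big data product works: the log records are made up, the function names are hypothetical, and threads on one machine stand in for the separate storage nodes a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts chunks (stand-ins for storage nodes)."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_by_status(chunk):
    """Per-node work: count the records in one chunk by status code."""
    return Counter(status for _path, status in chunk)


def distributed_count(records, n_parts=4):
    """Fan the query out to every chunk, then merge the partial counts."""
    chunks = partition(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_by_status, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical web-log records: (path, HTTP status)
logs = [
    ("/home", 200),
    ("/cart", 500),
    ("/home", 200),
    ("/login", 404),
    ("/cart", 200),
]
result = distributed_count(logs)
print(result)
```

Each chunk is counted independently, so adding more nodes shortens the per-node work; only the small partial counts have to be merged at the end.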
Traditional data warehouses typically work with abstracted data that's been rolled up (cleansed and transformed, in data warehousing lingo) into a separate database (the data warehouse or data mart), for which specific analytics are known in advance (such as compliance reporting or sales trend analysis). That database is updated incrementally with the same type of rolled-up data, typically on a weekly or monthly schedule. By contrast, big data systems tend to hold raw data, whether from operations (log reports), user activity (website tracking), or other real-world usage (census surveys). That raw data is left as is because its usage is not predetermined, so there's no known target to transform it to.
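The contrast can be sketched in a few lines of Python. This is a toy example, assuming a made-up sales feed; the field layout and the helper names `weekly_rollup` and `ad_hoc_query` are hypothetical and do not come from any warehouse or big data product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw events: (day, region, product, amount)
raw_events = [
    (date(2024, 1, 1), "east", "widget", 120.0),
    (date(2024, 1, 2), "east", "gadget", 80.0),
    (date(2024, 1, 3), "west", "widget", 200.0),
    (date(2024, 1, 9), "east", "widget", 50.0),
]


def weekly_rollup(events):
    """Warehouse-style transform: total sales per (ISO week, region)."""
    totals = defaultdict(float)
    for day, region, _product, amount in events:
        week = day.isocalendar()[1]
        totals[(week, region)] += amount
    return dict(totals)


def ad_hoc_query(events, predicate):
    """Big-data-style access: filter the untouched raw events on demand."""
    return [event for event in events if predicate(event)]


warehouse = weekly_rollup(raw_events)
widget_sales = ad_hoc_query(raw_events, lambda event: event[2] == "widget")
print(warehouse)
print(len(widget_sales))
```

The rolled-up table answers the question it was built for (weekly sales by region) quickly, but the product detail is gone. A question nobody planned for, such as widget sales only, can still be answered from the raw events.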
It's clear that using big data systems means you have more current, original-context information that can better support line managers and executives. What's more, the cost is about a third or less of what a traditional data warehouse costs. And getting a big data system up and running on a public cloud takes about one-tenth of the time, if that.
Given the huge differences and the obvious benefits of big data on public clouds, what's the future of traditional data warehousing?
The reality is that those who use data warehousing technology will continue to do so. Although tasks are moving quickly to big data, the systems I've seen deployed are more operationally focused. Big data systems are typically used to understand tactical issues, such as when inventory likely needs to be replenished or who's not meeting their sales quota.
Enterprises still use data warehousing for the reports and visualizations that go to executives and regulatory agencies to show the holistic performance of the company. Those reports are generated by traditional data warehouse systems that cost millions of dollars to build, and those systems are not going anywhere anytime soon. No matter how good and cost-effective big data on the cloud is or becomes, data warehousing will still be a fact of life in many enterprises, and that fact will last for the lifecycle of existing systems. It's strange, but that's the way I see it.
This article, "The cloud and big data are no threat to data warehouses," originally appeared at InfoWorld.com. Read more of David Linthicum's Cloud Computing blog at InfoWorld.com.