When a topic is complex, people tend to indulge in sloppy analysis, inadvertently conflating concepts and making apples-to-oranges comparisons.
A glaring example of such muddled thinking is the absurd meme that the data warehouse is “dead.” This line of argument tends to confuse several distinct conceptions of the data warehouse. Here's why, no matter how you approach the topic, the data warehouse is actually very much alive and thriving.
Data warehouses as discrete analytic platforms. Some people might declare the data warehouse dead if they or their organizations don’t own, manage, use, or need a data warehouse as a discrete stand-alone platform. If, say, all your business analytics needs are satisfied by an OLTP database -- as opposed to a separate business-analytics platform specifically engineered for data warehousing -- you might be lulled into the false impression that data warehouses are unnecessary, hence moribund.
Data warehouse platforms as a specific type of data storage, processing, and governance node. Some people might declare data warehousing kaput simply because they’re familiar with (or are starting to use) a nonrelational platform to satisfy all or most of their data warehousing requirements. The people who profess this view tend to be users (or providers) of pure-play solutions in the Hadoop, NoSQL, and “NewSQL” arenas. The fact that Hadoop, for example, is starting to assume data warehousing infrastructure roles (refinery, archiving, exploration) doesn’t mean that relational databases, which have been the heart of this space from the start, have grown less relevant. In fact, more IBM customers are moving toward a “logical data warehouse” architecture in which relational platforms are increasingly supplemented, but not supplanted, by Hadoop platforms.
Data warehousing as a range of use cases, deployment roles, development and integration skills, domain knowledge, and best practices. Some people might declare data warehousing obsolete simply because they’re in thrall to some new metaphor called a “data lake,” “data reservoir,” or whatever. But that perspective ignores the fact that no one is suggesting that the so-called lake or reservoir alone can support the core data warehousing use case: aggregation, retention, and governance of officially sanctioned, “single-version-of-the-truth” data records, such as those related to customers, finance, HR, and so forth.
Also, the data lake perspective tends to peg data warehousing to the management of structured data records; in fact, the paradigm is agnostic to the underlying formats and schemas of the data being aggregated and governed. It is certainly possible -- and in some use cases, increasingly practical -- to do data warehousing on Hadoop and NoSQL platforms.
As Dennis Duckworth notes in a recent blog post, the hardline “data warehousing is dead” mentality is starting to recede from industry discussions. At the recent Strata conference, says Duckworth, there was general consensus, even among the Hadoop pure plays, that data warehousing as a practice is thriving. As Duckworth observes, data warehousing is now perceived as an important new addressable growth frontier for the Hadoop industry.
Contrary to popular misconceptions, the data warehouse isn’t dead, nor is Hadoop killing it. In fact, Hadoop is accentuating the critical importance of a platform for centralized data governance and master data management within big data environments. Moreover, through the power of fluid interfaces among Hadoop, RDBMSes, and other data platforms within hybrid “logical data warehouses,” all of these data platform markets are doing rather well.