We all knew and loved Gartner in the '90s as Microsoft's favorite analyst firm, before it bit the hand that fed it somewhere in the early 2000s. I don't know exactly when Gartner regained its respectability, but its latest diatribe (I suggest you thumb through the summary rather than enduring the Alan Greenspan-like Gartnerese of backtracks and doublespeak) attacks the concept of a data lake without offering any credible alternative. Instead, Gartner suggests you try even harder with data warehousing.
This is tried-and-true advice that has worked so well throughout human history: Be extra careful, plan really hard, coordinate well with large groups of people, and don't mess up. This great plan was brought to us by the buffer overflow, buffer underrun, privilege escalation, and the fine people from the White Star Line's Olympic class of luxurious sea vessels, because one out of three ain't bad.
The data lake strategy is part of a greater movement toward data liberalization. It started with the printing press and moving the books out of the monastery. Sure, there was confusion and a schism, but did we really want to wait for the monks to decide who gets the handwritten books?
It continues with the Internet. Granted, it's sad that bookstores are toast, but I really hate to wait in line. Yes, Wikipedia has its problems, but in comparison, Encyclopedia Britannica (now on disc) delivers only slightly less erroneous material -- and one-tenth the coverage.
Now Gartner has aligned itself with the data monks who sit over the data and hoard it in usually expensive, proprietary technologies. It may be more secure (don't bet on it), and if only those trained (or who have sufficient clout) can access it, then the interpretation may be more accurate -- or the distortions more deliberate.
By that same argument, proprietary software is more secure because only "experts" have access to the source, right?
Gartner critiques vendor marketing concepts of data lakes, along with the intuitive meaning of the name, rather than basing its analysis on any real practice of how data lakes are implemented. Of course you can drown in a data lake! But that's why you build safety nets like security procedures (for example, access is allowed only via Knox), documentation (what goes where in what directory and what roles you need to find it), and (yes, Gartner) governance.
But this needn't involve convening a massive integration project every time someone wants to pull data out in a way that hadn't been thought of before or draw new correlations between data from disparate systems. Sure, people will make mistakes and draw wrong conclusions, but having more people who are well informed is generally better than hoping that some (often technical rather than business-aware) data czar sitting over a data warehouse, as gatekeeper, is going to save you from all this.
Data lakes are based on new technology. This is a new methodology. Of course there's a risk, but no real progress is ever made without taking some risk.
Understand what data liberalization means for your organization and how you can use it, along with the new class of tools surrounding it, to make better, more informed decisions. Understand the new technologies and their capabilities (like streaming, and finally no longer being stuck with canned reports). Don't be scared because an analyst firm wants $200 for a five-page FUD report with an eye-popping title.
This article, "Gartner gets the 'data lake' concept all wrong," was originally published at InfoWorld.com. Read more of Andrew Oliver's Strategic Developer blog at InfoWorld.com.