We live in an era of astounding advances in storage technology. In just the past three years, we've seen consecutive releases of 500GB, 1TB, and 2TB hard disks. Meanwhile, the cost per gigabyte has skidded lower and lower. How fortunate -- considering the volume of data has grown at breakneck speed. My fear, however, is that this exponential growth may be creating a data bubble we're not really prepared to deal with.
IDC has estimated that enterprise data doubles every 18 months. That's a scary statistic, but also somewhat difficult to wrap your head around. Sometimes a good analogy can help.
Let's say you're an avid movie buff, and when the AFI's top 100 DVD collection came out in November 1998, you were one of the first to go out and buy it. A collection of 100 DVDs is large enough to be impressive looking, but not so overbearing that you couldn't easily browse it to find something you wanted to watch. It weighs in at around 28 pounds and takes up about 4 feet of space on your bookcase, so even the most cramped NYC loft is likely to have space for it somewhere. Best of all, "Apocalypse Now" is only a quick 30-second visual search away from your DVD player.
Now, let's apply IDC's enterprise data growth stat to your primo DVD collection. Doubling every 18 months, it would have doubled about eight times by September 2010, leaving you with 25,600 discs. It would weigh about 3.5 tons and occupy roughly 1,000 feet of space on your bookcase. Finding a single DVD in that mess could take hours even if you had the foresight to alphabetize it. If you didn't, just forget about being able to sort it without taking a week off work. Because it has grown to such a massive size, your DVD collection is now almost useless. It's a ball and chain you'll drag behind you until you just give up and get rid of most of it.
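If you want to check the math, here is a minimal sketch in Python. It assumes the only inputs are the figures from the analogy (100 discs, 28 pounds, 4 feet, one doubling every 18 months) and that "tons" means 2,000-pound short tons. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the DVD analogy.
# All inputs come from the analogy itself; the names are illustrative.

MONTHS_PER_DOUBLING = 18
START_DISCS = 100
POUNDS_PER_100_DISCS = 28
FEET_PER_100_DISCS = 4
POUNDS_PER_SHORT_TON = 2000  # assumption: "tons" means US short tons


def months_between(start_year, start_month, end_year, end_month):
    """Whole months from one year/month to another."""
    return (end_year - start_year) * 12 + (end_month - start_month)


def project(doublings):
    """Return (discs, pounds, shelf_feet) after a number of doublings."""
    discs = START_DISCS * 2 ** doublings
    pounds = discs * POUNDS_PER_100_DISCS / START_DISCS
    shelf_feet = discs * FEET_PER_100_DISCS / START_DISCS
    return discs, pounds, shelf_feet


# November 1998 to September 2010 is 142 months, just short of 8 doublings.
elapsed = months_between(1998, 11, 2010, 9)
doublings = round(elapsed / MONTHS_PER_DOUBLING)

discs, pounds, shelf_feet = project(doublings)
tons = pounds / POUNDS_PER_SHORT_TON

print(f"{elapsed} months, about {doublings} doublings")
print(f"{discs:,} discs, {tons:.1f} tons, {shelf_feet:,.0f} feet of shelf")
```

Rounding to eight doublings gives 25,600 discs, 7,168 pounds, and 1,024 feet of shelf, which matches the rounded figures above.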
That's precisely what's happening with our data. Personal, corporate, governmental -- it doesn't matter. We're keeping and maintaining way more of it than we can possibly ever use. The fact that an 18GB disk available in 1998 is roughly the same size, weight, and cost as a 2,000GB disk you can buy today is only serving to hide this problem and make us lazy about policing our data growth.
If this problem is such a big deal, what exactly are we supposed to do about it? In our movie analogy, you would eventually be forced to cap your DVD collection before it got completely out of hand and rely on renting or a Netflix subscription. In other words, you've effectively moved your data from your datacenter to the cloud. But if you've done that, you've also married yourself to an often ambiguous set of licensing, access, reliability, and ownership problems. You don't own the movies you're watching, and there's literally no guarantee you'll be able to watch any given movie tomorrow night.
Obviously, that's not the kind of risk enterprises can take with their data. The solution, though, is remarkably simple: Keep less data and do a better job of organizing the information you retain.
True, that quite reasonable approach will require a massive end-user retraining effort culminating in a cultural shift away from data hoarding. Structured collaboration and data management tools will need to be implemented and fully utilized to replace piles of unstructured data. Technologies such as deduplication and automated archiving will also go a long way toward controlling mountains of data, but these measures often serve to mask the underlying problem.
The sad truth is that no technological silver bullet exists today that will solve this problem for you. You'd never completely trust an automated system to decide what data you no longer need. In the end, you need to do it yourself.
That idea is incredibly distasteful. Everyone has better things to do than dig through e-mail or departmental file shares and delete things that are no longer needed. However, my fear is that if IDC's statistics prove to be accurate over the next five or 10 years, we eventually won't have a choice. By then, it will be a far, far larger problem and take significantly more resources to correct.
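A script can at least shorten the digging, even if the decisions stay with you. The sketch below is one hypothetical way to list files on a share that haven't been modified in a given number of days, so a person can review them. It deliberately deletes nothing. The function name, the one-year threshold, and the example path are assumptions for illustration, and last-modified time is only a rough stand-in for "no longer needed."

```python
# Review aid only: lists candidates for cleanup, never deletes anything.
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days, now=None):
    """Return (path, size_bytes, age_days) for files under root whose
    last-modified time is older than max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    results = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_mtime < cutoff:
            age_days = (now - info.st_mtime) / SECONDS_PER_DAY
            results.append((path, info.st_size, age_days))
    # Oldest files first, so the likeliest candidates lead the report.
    results.sort(key=lambda item: item[2], reverse=True)
    return results


def report(root, max_age_days=365):
    """Print a simple listing and the total space the stale files occupy."""
    found = stale_files(root, max_age_days)
    for path, size, age_days in found:
        print(f"{age_days:7.0f} days  {size:12,d} bytes  {path}")
    total = sum(size for _, size, _ in found)
    print(f"{len(found)} files, {total:,d} bytes not modified in {max_age_days} days")
```

Calling `report("/path/to/share")`, with a placeholder path, would print the listing. On a real file share you would likely also need to handle permission errors and files that vanish mid-scan, which this sketch does not.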
I truly hope that advanced archiving, data analysis, and storage tools will continue to evolve in such a way that they can allow us to be blasé about what we keep and still be able to find what we need when we need it. But if I were you, I'd start taking a hard look at constricting data growth and enforcing organizational standards. Just because you can buy huge storage resources cheaply doesn't mean you should. It may not be very much fun, but it's better than being buried alive.