Storage is bigger and faster than ever, with 1.5TB drives shipping and 8Gbps Fibre Channel, 10Gbps iSCSI, and InfiniBand becoming affordable. The data to fill those disks and pipes is growing just as quickly: archives kept for e-discovery and legislative requirements, audio and video data for surveillance, teleconference archives, video blog posts, Webcasts, and simply more business processes being digitized. By contrast, a unified approach to protecting and managing that data is not much further along than it was ten years ago, when 10TB was a large amount of data for even big enterprises.
Now that petabytes are becoming commonplace, the problem is much more urgent. If the indexing software that builds metadata about all the files stored across an enterprise requires a cluster of servers to run, and still takes days to complete an index, the utility of that metadata is limited. We keep getting hints of potential solutions to this sort of problem, such as Microsoft's promise of a new file system based on a relational database (Windows Future Storage, or WinFS), originally promised as part of Windows Server 2008 but now pushed out indefinitely.
Don't blame Microsoft for failing to pull the rabbit out of the hat; it's a difficult problem to solve. Automatically classifying and indexing data requires a high degree of artificial intelligence. Indexing engines that can run across a LAN and index data on multiple disparate systems are extremely processor- and bandwidth-intensive.
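To make the cost of indexing concrete, here is a minimal sketch in Python of the simplest possible metadata crawl. It is an illustration only, not how any vendor's indexing engine works: the function names build_index and summarize and the record fields are hypothetical, and it collects only what the file system already knows (path, size, modification time, extension). Even this version must touch every file once, which is why a crawl across petabytes on many systems is so expensive; real classification would also have to read and interpret file contents.

```python
import os
from collections import Counter


def build_index(root):
    """Walk the tree under root and return one metadata record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # one metadata lookup per file
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append({
                "path": path,
                "size": st.st_size,
                "mtime": st.st_mtime,
                "ext": os.path.splitext(name)[1].lower(),
            })
    return records


def summarize(records):
    """Return total bytes stored per file extension."""
    totals = Counter()
    for rec in records:
        totals[rec["ext"]] += rec["size"]
    return dict(totals)
```

Calling summarize(build_index("/some/share")) on a mounted share (the path here is a placeholder) gives a crude view of what kinds of files are consuming space. Note that the work grows with the number of files, not just the number of bytes.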
While some of today's data management applications do a good job, they tend to be isolated silos, tied to a specific vendor's storage or to an application running on a specific platform. An enterprise-wide, multi-platform system that can handle all aspects of data management, including indexing, metadata creation, virtualization, migration, data tiering, replication, and so forth, does not yet exist.
For such a data management system to become a reality, three key pieces must come together: widely adopted standards for data management, which should come from SNIA, the Storage Networking Industry Association; methods for automatically classifying and finding data, which should come from the file system; and cooperation between storage and OS vendors to facilitate single-console management of data across multiple storage platforms, operating systems, and networks.
Will these pieces fall into place before we're swimming in exabytes? It depends mostly on you. Ask your vendors for these features, and keep asking. Nearly all storage and operating system vendors are members of SNIA. The infrastructure is there to create the standards necessary, but it has taken much longer to make any progress than one might hope.