One of the bigger problems databases face is the space crunch. As individual databases edge toward the petabyte range, appropriate storage is getting much harder to find. And because the number of tables isn't growing in proportion, a large table has gone from a few million rows to a few billion. Soon enough, few of us will be surprised to encounter tables in the tens of billions of rows.
Not to mention that storing all this new data is only a third of the problem. Making it accessible is the second third: Sure, disk space keeps getting cheaper, but coaxing decent performance out of a database in the petabyte range may require thousands of drives.
The last third of the problem is simply the space required to maintain backups of all this data. It's already too expensive to back up a database of several terabytes – without compressing it, that is.
That's why I feel the next big achievement in databases will be better, more efficient compression algorithms and surrounding structures for table data – and possibly even for databases as a whole. The compression technologies currently in use are neither widespread nor high-performance enough to withstand the rigors of a demanding DSS (decision support system) or OLTP (online transaction processing) system.
As for backups, a few vendors currently sell compression solutions for SQL Server, but no solution of much consequence exists for the other relational DBMSes. And all of them rely on the usual open APIs and standard compression algorithms. To get the level of compression and performance that the future of databases will demand, there's going to have to be a technology breakthrough.
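To see why standard algorithms fall short, consider the basic tradeoff they impose. A minimal sketch, using Python's zlib as a stand-in for the generic compression libraries these tools rely on (the sample data and row format are invented for illustration): even on highly redundant, table-like data, squeezing out a better ratio costs more CPU time – and that cost is paid on every write or backup pass.

```python
import time
import zlib

# Synthetic stand-in for table data: repetitive rows, the kind of
# redundancy a column of dates or status codes exhibits.
row = b"2007-01-15,order,shipped,00042,warehouse-7\n"
data = row * 50_000  # roughly 2 MB of sample data

for level in (1, 6, 9):  # fast, default, maximum compression
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: {ratio:6.1f}x smaller in {elapsed * 1000:.1f} ms")
```

The higher levels buy a somewhat better ratio at a disproportionate time cost – tolerable for a nightly backup, but not for live table data in a busy OLTP system, which is exactly the gap a breakthrough would need to close.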
That's why I believe that live data compression and backup compression will be the next big frontiers to conquer.