However, applications also need to access data through secondary indexes and to query data that is likely distributed across the cluster. A map-reduce technique is overkill for such queries, since it requires every node to participate in query execution.
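To make the cost difference concrete, here is a minimal Python sketch that contrasts a scatter-gather query (every shard scans its own data, as in a map-reduce style job) with a lookup through a secondary index that is itself partitioned by the indexed value. Everything in it is an assumption made for illustration: the `Cluster` class, the `city` field, the four-shard layout, and the CRC32 placement rule are hypothetical and do not describe any particular product.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed cluster size, chosen only for the illustration


def shard_of(key: str) -> int:
    """Stable hash placement (hypothetical rule; real systems differ)."""
    return zlib.crc32(key.encode()) % NUM_SHARDS


class Cluster:
    """Toy sharded store with a global secondary index on the 'city' field."""

    def __init__(self):
        # Primary data: one dict per shard, keyed by primary key.
        self.shards = [dict() for _ in range(NUM_SHARDS)]
        # Secondary index, partitioned by indexed value: city -> primary keys.
        self.index = [defaultdict(set) for _ in range(NUM_SHARDS)]

    def put(self, key, record):
        home = self.shards[shard_of(key)]
        old = home.get(key)
        if old is not None:
            # Drop the stale index entry before re-indexing an updated record.
            self.index[shard_of(old["city"])][old["city"]].discard(key)
        home[key] = record
        self.index[shard_of(record["city"])][record["city"]].add(key)

    def scatter_gather(self, city):
        """Ask every shard to scan its local data for matches."""
        hits, touched = [], set()
        for shard_id, shard in enumerate(self.shards):
            touched.add(shard_id)
            hits.extend(k for k, r in shard.items() if r["city"] == city)
        return sorted(hits), touched

    def indexed_lookup(self, city):
        """Visit one index partition, then only the shards holding matches."""
        index_shard = shard_of(city)
        keys = self.index[index_shard].get(city, set())
        touched = {index_shard} | {shard_of(k) for k in keys}
        return sorted(keys), touched


cluster = Cluster()
cluster.put("user:1", {"city": "Oslo"})
cluster.put("user:2", {"city": "Lima"})
cluster.put("user:3", {"city": "Oslo"})

print(cluster.scatter_gather("Oslo"))   # same keys, all shards touched
print(cluster.indexed_lookup("Oslo"))   # same keys, fewer shards touched
```

Both calls return the same keys, but the scatter-gather path always touches all `NUM_SHARDS` shards, while the indexed path touches at most one index partition plus the shards that actually hold matching records. The sketch leaves out the hard parts, such as keeping the index consistent with the data across nodes when writes fail or race.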
As an industry built on sharded scale-out architectures, we need to support access patterns that retrieve related objects together, as well as queries that go through secondary indexes. While map-reduce techniques have been useful in building the first generation of solutions, interesting challenges arise in building the next generation.
A large body of existing work on distributed algorithms is relevant to building distributed query processors and query optimizers for large scale-out architectures. The future of managing large data sets is likely to see significant innovations in indexing schemes and query optimization.
Despite such challenges, scale-out architectures are gaining more traction every day. Scaling up RDBMS-style becomes less and less practical as the volume of data increases. And when a scale-up architecture underlying a very large data set fails, it'll likely be one very large central point of failure.
The Internet is a classic example of ultrascale, shared-nothing clusters -- that is, very large distributed systems working in tandem. In many ways, database systems must learn from this example and build out ultrascale, distributed database systems on similar principles.
New Tech Forum provides a means to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all enquiries to firstname.lastname@example.org.