- An advanced-analytic resource of elastic, fluid topology
- An all-consuming resource that ingests information from any source, in any format or schema
- A latency-agile resource that persists, aggregates, and processes any dynamic mix of at-rest and in-motion information
- A federated resource that sprawls within and across value chains, spanning both private and public clouds
- A seamlessly interoperable resource that lets you change, scale, and evolve back-end data platforms without breaking existing tools and applications
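To make the last point concrete, here's a deliberately minimal sketch of the core idea behind a virtualization layer: tools query one stable interface, and back-end platforms can be swapped behind it. All class and function names here are hypothetical, not any vendor's API.

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """The stable contract the virtualization layer exposes.
    Tools code against this, not against any specific platform."""
    @abstractmethod
    def query(self, predicate):
        ...

class InMemorySource(DataSource):
    """Stand-in for any real back end (Hadoop, NoSQL, RDBMS, cloud store)."""
    def __init__(self, rows):
        self.rows = rows

    def query(self, predicate):
        return [row for row in self.rows if predicate(row)]

class VirtualizedCatalog:
    """Federates a query across every registered source, so callers
    never know (or care) which platform holds which data."""
    def __init__(self):
        self.sources = {}

    def register(self, name, source):
        self.sources[name] = source

    def query_all(self, predicate):
        results = []
        for source in self.sources.values():
            results.extend(source.query(predicate))
        return results

catalog = VirtualizedCatalog()
catalog.register("clickstream", InMemorySource([{"user": "a", "n": 3}]))
catalog.register("warehouse", InMemorySource([{"user": "b", "n": 7}]))
print(catalog.query_all(lambda row: row["n"] > 1))
```

Swapping `InMemorySource` for a different `DataSource` implementation changes nothing for the tools calling `query_all` -- which is the whole point, and also why the abstraction is so much harder to build at real-world scale than to sketch.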
Yes, that's a tall order. Clearly, data virtualization and its virtualized underpinnings are much easier to talk about than to do. Plus, it is not cheap to implement, administer, or optimize.
Cloud-based big data will require virtualized infrastructures of growing complexity. It's no surprise that most data professionals approach this messy topic in much the same way that astronomers attempt to map the universe's dark matter. They know it's an essential, albeit tedious, chore. Truth be told, big data professionals would much prefer to point their strategic telescopes toward the sexy orbs -- Hadoop, NoSQL, and so on -- that shine brightest in the new technology firmament.
As the range of your cloudy big data applications grows, you'll almost certainly have to go further down the virtualization path. The stubborn heterogeneity of hybridized big data clouds will push you in that direction. Within your private clouds, constant big data platform churn will require a virtualization fabric that bridges new approaches with your legacy investments. Churn will stem from your ongoing platform modernization and migration efforts, from your need to incorporate innovative, fit-for-purpose platforms into your cloud, and from vendors' product-enhancement cycles. Unless you put all of your big data initiatives on a "one size fits all" public cloud service, you'll need to virtualize access to public, private, and hybrid cloud architectures in various shifting combinations.
Clearly, the extent to which you'll go the data-virtualization route will depend on the complexity of your business requirements and big data environment. It will also depend on your tolerance for risk, complexity, and headaches.
In the coming years, as more complex analytic models, rules, and information converge on the big data cloud, that platform will become a centerpiece of virtualized access, execution, and administration. Within this new world, MapReduce will be a key (but not the only) development framework; it will form one part of a broader, still largely undefined virtualization architecture for inline analytics and transactional computing.
Nobody yet has stepped forward to outline the layers, interfaces, and abstractions that will glue the cloud big data universe together from end to end. That's yet another tall order.