Big data in the cloud has so many potential functional service layers sprawling across so many nodes, clusters, and tiers that it's easy to feel overwhelmed.
Take a deep breath. Your first step should be to plan a comprehensive cloud data virtualization infrastructure. Virtualized cloud analytics is the emerging paradigm for big data in the cloud. As an integration approach, it provides unified access, modeling, deployment, optimization, and management of big data as a heterogeneous resource.
Data virtualization, like any form of virtualization, lets you access, administer, and optimize a heterogeneous infrastructure as if it were a single, logically unified resource. It does this by abstracting the external interface of a service, function, or other resource from its internal implementation.
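To make that interface-versus-implementation split concrete, here's a minimal Python sketch. All of the names in it are hypothetical, invented for illustration; the point is simply that callers are written against one abstract contract while the back-end implementations vary freely behind it:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """External interface: callers see one contract, whatever the back end."""
    @abstractmethod
    def fetch(self, entity: str) -> list:
        ...

class WarehouseSource(DataSource):
    """Internal implementation #1: stand-in for a relational warehouse."""
    def fetch(self, entity: str) -> list:
        return [{"source": "warehouse", "entity": entity}]

class DocumentStoreSource(DataSource):
    """Internal implementation #2: stand-in for a NoSQL document store."""
    def fetch(self, entity: str) -> list:
        return [{"source": "document_store", "entity": entity}]

def report(source: DataSource) -> None:
    # Written against the abstract interface only; swapping the
    # underlying implementation requires no change here.
    print(source.fetch("customers"))

report(WarehouseSource())
report(DocumentStoreSource())
```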
Data virtualization's centerpiece is an abstraction layer, such as one of the SQL virtualization approaches that support logically unified access, query, reporting, predictive analytics, and other applications against disparate back-end data repositories: relational databases, Hadoop, NoSQL stores, and so on. Of course, data virtualization may in turn rely on other layers of infrastructure virtualization, such as storage and server platforms, in some cases spread across geographic locations and multiple cloud environments.
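As a toy illustration of such an abstraction layer, the sketch below is entirely hypothetical: an in-memory SQLite database stands in for a relational warehouse, a list of JSON records stands in for a NoSQL document store, and a made-up VirtualOrders class exposes one logical view that hides which repository served each row:

```python
import json
import sqlite3

# Back end 1: a relational store (in-memory SQLite as a warehouse stand-in).
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
rdbms.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 12.0)])

# Back end 2: JSON documents, as a NoSQL document store might hold them.
documents = [json.dumps({"id": 3, "amount": 7.25})]

class VirtualOrders:
    """Hypothetical abstraction layer: one logical 'orders' view over both stores."""
    def rows(self):
        # Normalize each back end's native format into one logical schema.
        for oid, amount in rdbms.execute("SELECT id, amount FROM orders"):
            yield {"id": oid, "amount": amount}
        for doc in documents:
            record = json.loads(doc)
            yield {"id": record["id"], "amount": record["amount"]}

# Callers query the unified view, unaware of which repository served each row.
view = VirtualOrders()
print(sum(r["amount"] for r in view.rows()))  # 28.75
```

In a real deployment, a federated query engine would push filters and joins down to each repository rather than pulling every row to the client as this toy does, but the contract is the same: one logical view, many physical stores.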
However many layers you're discussing, virtualization is the epitome of unsexy data topics. But it's fundamental if you want your big data cloud platform to address the following business imperatives: