Fueled by a $105 million cash infusion from GE, EMC spin-off Pivotal today drew back the curtain on its next-generation PaaS, Pivotal One. The offering aims to give companies a degree of cloud independence by knitting together data, application, and cloud fabrics into a single platform that runs on third-party IaaS, such as Amazon Web Services, OpenStack, vSphere, vCloud Director, or private brands.
Pivotal's goal is to provide organizations with a platform on which they can roll out homegrown or custom-built cloud-based big data applications that pull in vast quantities of data in real time from various streams and turn that data into immediate, useful information. One opportunity here is to advance the vision of the Internet of things, in which networked items -- corporate assets or consumer goods such as fleets of trucks, medical equipment, vending machines, construction equipment, gas and electric meters, and thermostats -- become "smart objects" that join the Internet and take an active part in business processes.
Those last two examples, by the way, illustrate why GE has plunked down $105 million on a startup that won't deliver its flagship Pivotal One offering until Q4 of this year.
Flexibility is central to Pivotal One, which is why its underlying components are built on open source projects. Pivotal HD, the data-fabric component, is based on an "enterprise-hardened" version of Apache Hadoop, and the Pivotal Cloud and Application Platform is based on Cloud Foundry and the Spring application developer framework.
Digging into those layers reveals a dizzying but necessary array of components, given the company's ambition to build a PaaS that integrates new data fabrics, modern programming frameworks, and cloud portability while still supporting legacy systems. Conveniently, Pivotal has a services arm, Pivotal Labs, which provides application design, development, and management services.
To appreciate the magnitude of Pivotal's PaaS ambition, it's useful to look closely at some of the platform's core components.
HAWQ: SQL meets Hadoop
Core to the Pivotal HD data-fabric layer is HAWQ, a parallel SQL query engine that marries Pivotal Analytic Database (Greenplum) and Hadoop 2.0 and is optimized for analytics, with full transaction support. Designed for a high degree of linear scalability, HAWQ reads data from and writes data to HDFS natively. HAWQ delivers tools for interacting with petabyte-range data sets, and it includes a standards-compliant SQL interface.
According to Pivotal, HAWQ uses a technology dubbed dynamic pipelining, a parallel data flow framework, to orchestrate query execution. HAWQ breaks complex queries into small tasks and distributes them to query processing units for execution. The system's basic unit of parallelism is the segment instance.
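To make the idea of splitting a query into small tasks concrete, here is a minimal Python sketch of the general scatter-and-gather pattern. It is not HAWQ code and says nothing about how dynamic pipelining is actually implemented; the data, the `segment_partial_sum` and `run_query` functions, and the use of threads to stand in for segment instances are all assumptions made purely for illustration. The toy query is roughly "sum of amount grouped by region."

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one
# segment instance, as (region, amount) pairs.
SEGMENTS = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 2)],
]


def segment_partial_sum(rows):
    """Task run on one segment: scan local rows, sum amount per region."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def run_query(segments):
    """Toy coordinator: send the same task to every segment, then merge."""
    workers = max(1, len(segments))  # ThreadPoolExecutor needs at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(segment_partial_sum, segments))

    # Gather step: combine per-segment partial sums into one result set.
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(sorted(total.items()))


if __name__ == "__main__":
    print(run_query(SEGMENTS))
```

Each segment only ever touches its own rows, so adding segments spreads the scan and the partial aggregation across more workers; only the small per-region totals travel back to be merged. A real engine would also have to handle joins, sorts, and data movement between segments, which this sketch leaves out.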
"Multiple segment instances work together on commodity servers to form a single parallel query processing system. When a query is submitted to the HAWQ master server, it is optimized and broken into smaller components and dispatched to segments that work together to deliver a single result set," according to Pivotal. "All operations -- such as table scans, joins, aggregations, and sorts -- execute in parallel across the segments simultaneously."