Hewlett-Packard's Vertica has updated its flagship parallel columnar database so other programs can instigate analysis directly on its database clusters.
For Vertica Analytic Database 5.0, HP has included an SDK (software development kit) that developers can use to have their programs make direct method calls to the Vertica database.
"We've opened up our platform so you can not only execute the SQL you want to execute, but you can write your own custom methods," said Colin Mahony, HP Vertica vice president of product and business development. "We're exposing the same environment that our developers get to work with when they add new features and functions. You get the ease of use and flexibility with SQL with the performance and extensibility of a programming language."
Initially, the SDK supports C++, though support for other languages will be added in the future.
"An SDK is an essential feature of an analytic platform," said Curt Monash of Monash Research. He said this feature is valuable because typically, in order to perform complex analysis, the data must be moved out of the database and into a data warehouse. With the SDK, programmers can have their programs probe the data directly within the database itself.
The SDK could also minimize the headache of programming for parallel environments, where data is scattered across multiple servers. The Vertica database is a grid-based, column-oriented database, developed specifically for large-scale data analysis carried out across a cluster of servers.
"Vertica's optimizer and execution engine automatically parallelizes jobs," Mahony said. "All [developers] have to do is write their method, and we handle how it gets broken up and parallelized."
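The division of labor Mahony describes can be illustrated with a minimal C++ sketch. It does not use the Vertica SDK; `to_fahrenheit` and `apply_parallel` are hypothetical names invented for illustration, and the "engine" here is just a toy that splits one column into chunks and runs each chunk on its own task. The point is only that the developer's method contains no parallel code of its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical user-written method: one value in, one value out.
// Nothing in it knows about threads, partitions, or servers.
double to_fahrenheit(double celsius) { return celsius * 9.0 / 5.0 + 32.0; }

// Toy stand-in for an execution engine (an assumption, not Vertica's design):
// it splits the column into at most `partitions` contiguous chunks and
// applies `fn` to each chunk on a separate asynchronous task.
std::vector<double> apply_parallel(const std::vector<double>& column,
                                   const std::function<double(double)>& fn,
                                   std::size_t partitions) {
    std::vector<double> out(column.size());
    if (column.empty()) return out;

    // Clamp the partition count to the range [1, column.size()].
    partitions = std::max<std::size_t>(1, std::min(partitions, column.size()));
    const std::size_t chunk = (column.size() + partitions - 1) / partitions;

    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < column.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, column.size());
        // Each task writes to a disjoint slice of `out`, so no locking is needed.
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = fn(column[i]);
        }));
    }
    for (auto& task : tasks) task.get();  // wait for every chunk to finish
    return out;
}
```

Calling `apply_parallel(values, to_fahrenheit, 4)` would convert a column of readings using up to four tasks. A real cluster would distribute the chunks across servers rather than threads, but the user-written function would look the same.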
This feature could be particularly valuable in running jobs coded in MapReduce against the structured data within Vertica. MapReduce is a framework increasingly used for processing unstructured data, usually in Hadoop clusters, across multiple servers. By speaking the MapReduce language, Vertica offers users the ability to query both structured and unstructured data within a single operation.
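For readers unfamiliar with the MapReduce pattern, the following C++ sketch shows its three steps (map, group by key, reduce) applied to structured rows. It is a single-process illustration under stated assumptions, not Hadoop or Vertica code; the `Sale` record and the functions `map_sale`, `reduce_sum`, and `map_reduce` are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical structured row, standing in for a record in a database table.
struct Sale {
    std::string region;
    double amount;
};

// Map step: turn one row into one (key, value) pair.
std::pair<std::string, double> map_sale(const Sale& sale) {
    return {sale.region, sale.amount};
}

// Reduce step: collapse all values that share a key into a single result.
double reduce_sum(const std::vector<double>& values) {
    double total = 0.0;
    for (double value : values) total += value;
    return total;
}

// Driver: map every row, group the pairs by key (the "shuffle"),
// then reduce each group. A real framework would spread the map and
// reduce calls across many servers; this version runs them in one loop.
std::map<std::string, double> map_reduce(const std::vector<Sale>& rows) {
    std::map<std::string, std::vector<double>> grouped;
    for (const Sale& row : rows) {
        const std::pair<std::string, double> kv = map_sale(row);
        grouped[kv.first].push_back(kv.second);
    }

    std::map<std::string, double> result;
    for (const auto& entry : grouped) {
        result[entry.first] = reduce_sum(entry.second);
    }
    return result;
}
```

Because the map and reduce functions touch only one row or one key's values at a time, the framework is free to run them wherever the data lives, which is what makes the pattern a natural fit for a clustered database.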
"A lot of our customers have taken fairly large MapReduce libraries and converted them to run inside Vertica without much effort," Mahony said. "People can seamlessly move data and analytics back and forth between Vertica and Hadoop."
The new version of Vertica has a number of other improvements and enhancements as well. It has an expanded set of SQL analytic functions, including the abilities to execute basic geospatial queries, event-series pattern matching, event-series joins, and advanced aggregate statistical and regression algorithms.
The company has also tightened the core code base to speed up subqueries, database statistics and other routine database operations. The backup capabilities have been expanded. This release also features cluster cloning, the ability to break off a piece of the database and run it in its own sandboxed environment.
"So if you have a 200-node cluster and want to spin off a sandbox for a separate data mart, you can point Vertica at the new server cluster, hit a button, and it will automatically ship the data off" to this new cluster, Mahony said.