Ex-Microsofter Muglia melds SQL analytics, data warehouse as a service

Snowflake's new data warehouse is designed from the ground up for the cloud and is aimed at analytics users who need high elasticity

A startup called Snowflake is setting out to create the "data warehouse as a service" market -- so analytics users can do self-service, low-to-no-maintenance data warehousing in the cloud.

The Snowflake Elastic Data Warehouse is aimed at analytics users with multiple requirements. They want to work with masses of cloud-hosted data and are looking for an alternative to legacy warehouse solutions like Teradata, but don't want to deal with large-scale database maintenance or wrangle the complexities of a solution like Hadoop.

In a phone conference, CEO (and Microsoft vet) Bob Muglia and VP of marketing Jon Bock explained that the problems Snowflake is meant to solve can't be addressed by wrapping, repackaging, or repurposing existing open source solutions. To that end, Snowflake was built from the ground up as an elastic cloud-first service that works with existing SQL-powered analytics and requires minimal maintenance.

Muglia and Bock noted that Snowflake provides an alternative to both conventional data warehousing and Hadoop by addressing certain weaknesses in each. Conventional data warehousing technology is used to analyze transactional business information. While it's powerful, it's also complex to maintain and runs mostly on premises. Hadoop is used both on premises and in the cloud to store and analyze the new flood of semi-structured, machine-generated data, but it too is complex to maintain, and working with it requires a different skill set. Efforts to merge the behaviors of the two -- such as giving Hadoop SQL querying functionality -- typically compromise one in favor of the other.

Snowflake's answer was to create an entirely new relational SQL-compatible database system. It can supply analytics software with a cloud-hosted source for its data, and it's built with "the key attributes of the cloud" in mind, as Muglia put it. The newness of the architecture is a major point. "We're not just putting infrastructure glue around things to make them work in a cloud environment," Muglia said.

The software supports both structured and semi-structured, JSON-style data, and it sports what Snowflake describes as "multidimensional elasticity" -- the ability to scale up and down, as well as to scale automatically based on usage.
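To make "scale automatically based on usage" concrete, here is a minimal Python sketch of one generic usage-based sizing rule. It is not Snowflake's mechanism, which Muglia and Bock did not detail; the function name, the per-node capacity, and the node limits are all assumptions chosen for illustration.

```python
import math

def target_nodes(queued_queries, per_node_capacity=4, min_nodes=1, max_nodes=16):
    """Pick a cluster size from current demand (hypothetical policy).

    per_node_capacity, min_nodes, and max_nodes are illustrative
    assumptions, not figures from Snowflake.
    """
    # Nodes needed to serve the current queue, rounded up.
    needed = math.ceil(queued_queries / per_node_capacity)
    # Clamp to the allowed range so the cluster neither vanishes nor grows unbounded.
    return max(min_nodes, min(max_nodes, needed))

if __name__ == "__main__":
    for load in (0, 9, 1000):
        print(load, "queued ->", target_nodes(load), "nodes")
```

A rule like this only decides how large the cluster should be; whether the resize is quick or takes hours depends on how the system carries it out.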

Data warehouses, as Muglia noted, have not been built to be elastic. "Even the ones in cloud infrastructure don't really have true elasticity," he said. He cited an example of one cloud-hosted data warehouse product that didn't scale on the fly; rather, changing sizes simply triggered the creation of a new instance, with all the data then copied over to it -- a process that could take hours on end.

One side effect of this lack of elasticity is the amount of effort poured into capacity planning and into what Muglia described as "making the data fit what the systems are" -- the transformation and ETL steps that slow down the analytics process. Hadoop is sometimes used for this kind of data transformation, but the results still have to be loaded into an analytics database for further work. Snowflake, by contrast, is meant to let the analyst focus on what to ask of the data, rather than on massaging the data into a form that can be queried properly.
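For readers unfamiliar with the transformation step Muglia is describing, the following Python sketch shows the general shape of it: nested, JSON-style events are flattened into fixed columns before a conventional SQL database can query them. The event fields, table name, and helper functions are hypothetical, and SQLite stands in for the analytics database purely for illustration.

```python
import json
import sqlite3

# Hypothetical machine-generated events; the field names are illustrative only.
RAW_EVENTS = [
    '{"device": "sensor-1", "reading": {"temp_c": 21.5}, "tags": ["lab"]}',
    '{"device": "sensor-2", "reading": {"temp_c": 19.0}}',
    '{"device": "sensor-1", "reading": {"temp_c": 22.5}, "tags": ["lab", "hot"]}',
]

def flatten(raw):
    """Transform one nested JSON event into a flat (device, temp_c, tag_count) row."""
    event = json.loads(raw)
    return (
        event["device"],
        event["reading"]["temp_c"],
        len(event.get("tags", [])),  # the optional field needs a default
    )

def load_and_query(raw_events):
    """Load flattened rows into an in-memory SQL table, then run an analytic query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, temp_c REAL, tag_count INTEGER)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [flatten(r) for r in raw_events],
    )
    rows = conn.execute(
        "SELECT device, AVG(temp_c) FROM readings GROUP BY device ORDER BY device"
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for device, avg_temp in load_and_query(RAW_EVENTS):
        print(device, avg_temp)
```

Every new field or change in nesting means revisiting `flatten` and the table definition, which is the kind of upkeep the company says it wants analysts to avoid.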

Eclipsing or replacing BI itself isn't the goal, though. "We're not trying to put together a BI layer [itself], which is what people like Birst are focused on," Muglia said. "We're trying to be a data warehouse in the same sense as traditional on-premise layers like Teradata." The company's aim is to satisfy those who still leverage SQL as an analytics tool: "There's a huge market where SQL is still the language of data, but it's kind of a neglected market."

Muglia feels there's "an untapped opportunity of people who don't have a true enterprise data warehouse." According to him, Amazon Redshift -- the cloud giant's own data warehousing service -- gained more customers in its first two years than Teradata has had in an entire decade.
