At Build 2015 yesterday, Microsoft unveiled three new ways it's making Azure into a haven for big data, whether it was born in one's data center or out in the cloud. Microsoft's strategy is clear: It's betting heavily on its customers generating and storing data natively in Azure, and on creating a hybrid local/remote architecture for working with that data.
Nearly everyone expected big Azure announcements at Build 2015, and got them -- and each reflected how Azure is being developed along multiple dimensions at once.
A bottomless Lake
With Azure Data Lake, for instance, Microsoft is betting that the growing interest in Hadoop, and in its HDFS storage system, merits offering a comparable service of its own.
While Azure Data Lake is only being prepared for a public preview right now -- no pricing has been announced -- what it promises raises eyebrows. The service claims to have been designed for raw data storage at high volume and with high throughput, but most striking is how Data Lake "has no fixed limits on account size or file size," according to its product page. (It's safe to guess pricing will be based on usage -- storage, throughput, or both.)
Some of this is in the vein of Microsoft's other announcements around the Internet of things, where Data Lake is meant to provide a default target for data collected from intelligent connected devices. But most of it is clearly aimed at those who are already familiar with what HDFS offers and who want to leverage it as a service without sinking too much into setup and maintenance.
A stretchier SQL
With Microsoft's enhancements to SQL Database, another crucial part of Microsoft's ambitions for these data-centric solutions becomes clear: They are meant to live either in the cloud or on-premises, with as few barriers between the two as possible. Azure SQL Data Warehouse, a scalable data-warehousing system, can run either on-premises or in the cloud, with the customer's needs dictating the choice.
The third item, elastic pools for SQL Database, is more of a management tool for in-cloud resources than something intended to span cloud and local environments. But there's nothing stopping Microsoft from extending its reach to encompass local resources in time as well.
The technology Microsoft is using to stitch together local resources and these new big-data features, Azure Service Fabric, has already drawn attention. Aside from being a cornerstone of how Microsoft is using Azure to build out its cloud business, Service Fabric is being positioned as the infrastructure for uniting local and remote resources.
Start anywhere, build out anywhere
With projects like Data Lake emerging, it's clear how that infrastructure will be built out. Big data projects like Hadoop don't have to be erected manually on-premises; rather, they can be set up in the cloud first, then synced back on-prem if needed. Or they can be left entirely in the cloud, depending on where the majority of the enterprise's infrastructure will live.
Al Hilwa, program director of software development research at IDC, noted that these announcements "show an evolution of the SQL Server technology toward a cloud-first approach. A lot of these capabilities like elastic query are geared for cloud approaches, but Microsoft will differentiate from Amazon by also offering them for on-premises deployment." (So far, Amazon's hybrid cloud strategy is minimal.)
Hilwa also said that Data Lake and SQL Data Warehouse "are focused on larger data sets that are typically born in the cloud. The volumes of data supported here builds on Microsoft’s persistent investments in data centers." Based on the overall scope of Microsoft's ambitions, though, those born-in-the-cloud data sets could become data sets that span a hybrid architecture only just now taking shape.