Microsoft revamps Data Lake with SQL analytics, Visual Studio tooling

Azure Data Lake Store sports the Microsoft-devised U-SQL query language, which allows a mixture of conventional SQL and C# programming metaphors

When Azure Data Lake was first revealed, it was plain that Microsoft was set on making Azure a welcoming environment for enterprise big data applications. Today the service gets a new name, Azure Data Lake Store, and will be outfitted with the analytical tools that Hadoop users have come to expect -- and a few new ones of Microsoft's invention.

The analytics system for Data Lake Store -- named, appropriately enough, Azure Data Lake Analytics -- stores data in the same HDFS format as Hadoop, but can also pull in data from other Azure sources, such as Azure SQL Database and Azure SQL Data Warehouse.

Data processing in Data Lake Store is done by jobs running under Apache YARN, another familiar Hadoop technology. As previously announced, there are no hard limits on the amount of data that can be stored (although you pay for what you use).

Microsoft has enriched both in-cloud functions and the client toolset with new features. Among the processing tools available in Data Lake Analytics is a Microsoft-specific SQL query language, U-SQL (its name is a nod to T-SQL, used in SQL Server), which allows a mixture of conventional SQL and C# programming metaphors. SQL is still one of the most common ways to perform self-service data requests in Hadoop, so U-SQL extends that capability, akin to the way Microsoft juiced up T-SQL with the .NET CLR.

Microsoft's Azure Data Lake allows for conventional YARN-based analytics like Spark, but also sports the Microsoft-devised U-SQL query language. Queries built with U-SQL can leverage the .NET runtime as well as conventional SQL expressions.
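
The general idea of mixing declarative SQL with expressions from a general-purpose language can be sketched outside Azure. The example below is not U-SQL and does not use any Azure API; it is a rough analogy that uses Python's standard sqlite3 module, in which a host-language function is registered and then called from inside an ordinary SQL query. The table name, the column names, and the helper functions `domain_of` and `top_domains` are all hypothetical, chosen only for illustration.

```python
import sqlite3


def domain_of(email):
    """Host-language logic that would be awkward to express in plain SQL.

    Returns the lowercased domain part of an email address, or None
    when the value is missing or has no "@".
    """
    if email is None or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()


def top_domains(rows):
    """Aggregate hit counts per email domain.

    `rows` is an iterable of (email, hits) tuples. The SQL handles the
    filtering, grouping, and ordering; the Python function handles the
    string manipulation.
    """
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name domain_of.
    conn.create_function("domain_of", 1, domain_of)
    conn.execute("CREATE TABLE visits (email TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    query = """
        SELECT domain_of(email) AS domain, SUM(hits) AS total
        FROM visits
        WHERE domain_of(email) IS NOT NULL
        GROUP BY domain_of(email)
        ORDER BY total DESC, domain ASC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("a@Example.com", 3),
        ("b@example.com", 2),
        ("c@contoso.com", 4),
        ("not-an-address", 9),
    ]
    for domain, total in top_domains(sample):
        print(domain, total)
```

The division of labor is the point of the analogy: set-oriented work stays in SQL, while per-value logic lives in the host language. U-SQL applies the same split with C# in place of Python.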

The new tooling also encompasses the client side. Microsoft is providing Azure Data Lake Tools for Visual Studio so that developers can use Visual Studio's toolset for writing Data Lake applications. Most ad hoc data work using Microsoft applications for the front end involves Excel and is focused on reporting rather than application design, so the two toolsets are likely to complement each other.

None of this is to say that Microsoft isn't interested in having Azure host more conventional Hadoop offerings. For those who favor using Azure as a base for big data and are comfortable with Hadoop, but aren't as enthusiastic about managing Hadoop, Azure Data Lake also includes Microsoft's managed Hadoop distribution, HDInsight. (Linux-based versions of the Hortonworks HDP distribution of Hadoop are now available as well.)

When Microsoft first started offering Hadoop on Azure in 2013, it did little more than offer managed instances of the Hortonworks-based HDInsight. Since then, Hadoop has become less a product than a loose, ever-expanding cluster of technologies (Spark, HDFS) out of which any number of other items can spring. Microsoft's plan with Azure Data Lake Store may find an elegant way to honor both those visions without compromising either.