Making sense of Microsoft’s graph database strategy

From the Microsoft Graph to LinkedIn and beyond, Microsoft is betting on a graph database future

It’s taken some time, but Microsoft’s $26 billion purchase of LinkedIn is finally starting to show some interesting results, with LinkedIn data starting to show up in tools like Outlook. It’s the first sign of Microsoft using the social network’s relationship graph, the complex data set that was the reason for one of Microsoft’s biggest Silicon Valley acquisitions.

Under the hood, a social network like LinkedIn is nothing more than a huge NoSQL graph database, using a schema-less approach to managing semistructured data. Each node in the graph is an individual, with all his or her profile data. Each node is linked to others, tens or hundreds for people with a few connections, thousands for highly connected individuals. Queries traverse those connections, letting you find all the people you know working on AI, or who are based in Ontario, or who used to work at LinkedIn.
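
To make the traversal idea concrete, here is a minimal in-memory sketch in Python. It is not how LinkedIn stores or queries its data; the names, profile fields, and helper functions are all hypothetical, chosen only to mirror the example queries above.

```python
from collections import deque

# Hypothetical sample data: each node is a person with profile properties.
people = {
    "ana": {"skills": {"AI"}, "location": "Ontario", "past_employers": set()},
    "ben": {"skills": {"sales"}, "location": "Texas", "past_employers": {"LinkedIn"}},
    "cho": {"skills": {"AI", "data"}, "location": "Oregon", "past_employers": set()},
    "dev": {"skills": {"AI"}, "location": "Ontario", "past_employers": {"LinkedIn"}},
}

# Undirected connections stored as adjacency sets ("me" is the person searching).
connections = {
    "me": {"ana", "ben"},
    "ana": {"me", "cho"},
    "ben": {"me"},
    "cho": {"ana", "dev"},
    "dev": {"cho"},
}


def reachable(start, max_hops):
    """Breadth-first traversal: everyone within max_hops connections of start."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in connections.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, hops + 1))
    return found


def find(start, max_hops, predicate):
    """Filter the people reachable from start by a test on their profile."""
    return sorted(p for p in reachable(start, max_hops) if predicate(people[p]))
```

Calling find("me", 2, lambda p: "AI" in p["skills"]) walks two hops out and keeps only the profiles that list AI. Swapping the predicate gives the Ontario or former-LinkedIn queries without changing the traversal.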

Graph databases everywhere: Microsoft Graph, Common Data Service, Cosmos DB, and Security Graph

Microsoft’s interest in graph-based data is clear. CEO Satya Nadella described the Office 365 APIs, the foundation of what’s now called the Microsoft Graph, as the company’s “most important” bet. It’s certainly a very powerful tool, and opening it up to everyone lets organizations explore how their internal teams evolve and how corporate knowledge is stored in documents and conversations, and gives them the tools to expose that information and make it usable.

There’s a lot of data in the Microsoft Graph, with tools both for consumer information and for business information. Elements associated with Microsoft accounts, like the new Activity Stream and the Device Graph, are the basis for device-roaming features like the Continue on My PC tools recently released for iOS and Android (similar to Apple’s iCloud account-based Handoff capability in iOS). Microsoft is encouraging Universal Windows Platform (UWP) developers to build those features into their code as part of Project Rome and the upcoming Windows Timeline feature.

But the Microsoft Graph and LinkedIn aren’t Microsoft’s only graphs with APIs:

  • Dynamics 365 has the Common Data Service, a way of describing standard items in a business. With the Common Data Service, you can extend a standard schema with your model of a customer or your products.
  • Then there’s the cloud-spanning Cosmos DB, which builds on a JSON document database with different API sets, including one for developing and managing your own graph databases at scale.
  • Although not completely public, Microsoft’s Security Graph is used to assess and manage threats, and it is exposed to your apps through tools like Azure Active Directory’s conditional-access feature.

Microsoft’s different approach: Querying multiple graphs

Where things get interesting is using graph queries across multiple graphs and using them to extract insights that can help drive business decisions. I’ve often talked about the idea of “right-time information”: the right information at the right time delivered to the right people so they can make the right decision for the right business outcome. Being able to query the edges of a graph, rather than just its nodes, lets you understand the relationships between items, a key factor in delivering the type of information support a modern business needs.
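
As a rough illustration of what querying edges means, the Python sketch below treats each relationship as a record with its own label and properties. The edge labels, the weight field, and the helper names are hypothetical and are not taken from any Microsoft schema.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    label: str
    target: str
    weight: int  # hypothetical strength, e.g. a count of messages or shared edits


# Hypothetical relationship data between three colleagues.
edges = [
    Edge("ana", "emails", "ben", 42),
    Edge("ana", "emails", "cho", 3),
    Edge("ben", "coauthored", "cho", 5),
    Edge("cho", "emails", "ana", 17),
]


def edges_where(label=None, min_weight=0):
    """Select relationships by their own properties, not by node properties."""
    return [
        e for e in edges
        if (label is None or e.label == label) and e.weight >= min_weight
    ]


def strongest_ties(label, top=2):
    """Rank relationships of one kind by weight, strongest first."""
    return sorted(edges_where(label), key=lambda e: e.weight, reverse=True)[:top]
```

Here strongest_ties("emails") ranks the relationships themselves, which is the kind of question that is awkward to ask when only the nodes carry data.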

By supporting multiple graphs, Microsoft is offering an alternative to traditional database-driven decision-support tools. By mixing internal staff and document data on the Microsoft Graph, external relationships via LinkedIn, core business information in the Dynamics 365 Common Data Service, and custom schema in the cloud-hosted Cosmos DB, you can make complex cross-graph queries focusing not just on individual nodes in those graphs but also on the links between nodes. That lets you work with much more complex relationships than those exposed in relational databases.

One way this is being exposed is in the new Bing for Business tool that adds information from a corporate Active Directory and other sources to Bing searches when a user is logged in to an Azure Active Directory account. Results are dynamically generated from Microsoft Graph queries that return details of, for example, where someone is in the organization chart, along with related content from the wider web and from documents they’ve shared internally.

It’s a different way of exposing the information that’s been available inside Microsoft’s Delve tool, taking it from an application that had to be launched before you could make a query to the browser that’s always open. As an industry, we’ve baked search into the browser, so it’s logical to make it one of the tools we use to explore the graphs that underlie our businesses.

The initial release of Bing for Business focuses on the Microsoft Graph, along with tools that let administrators add specific intranet links for specific queries. So, when you search for the current expense policy, you’re directed to the appropriate self-service tools. Future releases will bring in more of Microsoft’s graphs, locking down searches based on the conditional-access feature and exposing external relationships via LinkedIn.

The Microsoft graphs’ flaw: They use different query grammars

Although the overall vision for Microsoft’s various graph-based properties is starting to become clear, there are still some issues with querying across multiple sources. They all offer REST APIs, but the underlying query languages can differ. For example, the Microsoft Graph uses its own query grammar in its APIs, while Cosmos DB builds on the widely used Apache Gremlin graph query language.
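
To show how far apart the two grammars are, the Python sketch below builds the same logical question, the names of people in one department, in both styles. The Microsoft Graph form uses its OData-style query parameters on the users endpoint; the Gremlin form assumes a custom Cosmos DB graph with a hypothetical 'person' vertex label and hypothetical property names. Nothing here sends a request; the functions only construct query text.

```python
from urllib.parse import quote, urlencode


def microsoft_graph_url(department):
    """Build a Microsoft Graph REST URL using $filter and $select parameters."""
    escaped = department.replace("'", "''")  # OData escapes a quote by doubling it
    params = {
        "$filter": f"department eq '{escaped}'",
        "$select": "displayName,jobTitle",
    }
    # Keep $, quotes, and commas readable; spaces become %20.
    query = urlencode(params, quote_via=quote, safe="$',")
    return "https://graph.microsoft.com/v1.0/users?" + query


def gremlin_traversal(department):
    """Build an equivalent Gremlin traversal for a hypothetical custom graph.

    The 'person' label and the property names are assumptions. Real code
    should pass values as parameter bindings instead of formatting strings.
    """
    return (
        "g.V().hasLabel('person')"
        f".has('department', '{department}')"
        ".values('displayName')"
    )
```

The same intent ends up as a filtered resource URL in one case and a step-by-step traversal in the other, which is why a query written for one service cannot simply be replayed against the other.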

API-based queries tend to be relatively simple, focused on specific searches. More complex queries tend to be handled using domain-specific languages like Gremlin that are designed for use with graph databases. One of Gremlin’s more interesting features is its ability to generate new maps from the underlying data that you can parse and use in your applications. Gremlin can also handle pattern matching and can work with large-scale data analytics tools such as Hadoop, so you can use it to deliver queries from Azure’s HDInsight big data tool alongside your Cosmos DB-hosted graphs.
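
Gremlin’s group() and groupCount() steps are two ways a traversal can return a map instead of a list of vertices. The Python sketch below keeps two such traversals as plain strings and mimics the shape of their results over in-memory sample data. The vertex label, property names, and sample people are hypothetical, and the helper functions are a stand-in for a graph server, not a Gremlin client.

```python
from collections import Counter, defaultdict

# Gremlin text you might send to a graph endpoint (label and keys are assumptions).
GROUP_COUNT_QUERY = "g.V().hasLabel('person').groupCount().by('location')"
GROUP_QUERY = "g.V().hasLabel('person').group().by('location').by('name')"

# Hypothetical vertices standing in for stored graph data.
vertices = [
    {"label": "person", "name": "Ana", "location": "Ontario"},
    {"label": "person", "name": "Ben", "location": "Texas"},
    {"label": "person", "name": "Dev", "location": "Ontario"},
    {"label": "document", "name": "Expense policy", "location": "intranet"},
]


def group_count(items, label, key):
    """Mimic groupCount().by(key): map each key value to a count."""
    return dict(Counter(v[key] for v in items if v["label"] == label))


def group(items, label, key, value):
    """Mimic group().by(key).by(value): map each key value to a list of values."""
    result = defaultdict(list)
    for v in items:
        if v["label"] == label:
            result[v[key]].append(v[value])
    return dict(result)
```

Either map can be handed straight to application code, for example to drive a chart of staff by location, without a second pass over raw vertices.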

If we’re to get the benefit of all the various Microsoft graph properties, we’re going to need a common query platform that can take queries and fan them out across various sources, asynchronously handling responses and ensuring that the queries are appropriately constructed to target specific APIs.
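
A minimal sketch of that fan-out pattern, using Python’s asyncio, might look like the following. The three backend functions are hypothetical stubs that return canned data (one fails on purpose); in a real engine each adapter would translate the shared request into its service’s own grammar and call the actual API.

```python
import asyncio


# Hypothetical adapters. Each one shows the backend-specific query it would send.
async def query_microsoft_graph(person):
    await asyncio.sleep(0.01)  # simulates network latency
    return {"query": f"/users/{person}/manager", "manager": "ben"}


async def query_cosmos_gremlin(person):
    await asyncio.sleep(0.02)
    return {
        "query": f"g.V().has('person', 'id', '{person}').out('owns')",
        "accounts": ["contoso"],
    }


async def query_linkedin(person):
    await asyncio.sleep(0.01)
    raise ConnectionError("service unavailable")  # simulated outage


BACKENDS = {
    "microsoft_graph": query_microsoft_graph,
    "cosmos_db": query_cosmos_gremlin,
    "linkedin": query_linkedin,
}


async def fan_out(person, backends=BACKENDS, timeout=1.0):
    """Send one request to every backend concurrently and merge the answers.

    A failure or timeout in one source is recorded under that source's name
    and does not prevent the other results from being returned.
    """
    names = list(backends)
    calls = [asyncio.wait_for(backends[name](person), timeout) for name in names]
    outcomes = await asyncio.gather(*calls, return_exceptions=True)
    merged = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            merged[name] = {"error": str(outcome) or type(outcome).__name__}
        else:
            merged[name] = outcome
    return merged
```

Running asyncio.run(fan_out("ana")) returns one entry per source. Collecting exceptions instead of raising them is a design choice for this sketch: a partial answer from two graphs is usually more useful to a caller than no answer at all.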

You could build your own multigraph query engine, but this really is something Microsoft needs to deliver, perhaps as an Azure service. That way, it can be integrated with existing subscriptions and with familiar authentication methods, either for users or for apps.