Why you should use a graph database

Graph databases excel for apps that explore many-to-many relationships, such as recommendation systems. Let's look at an example.

There has been a lot of hype recently about graph databases. While graph databases such as DataStax Enterprise Graph (based on Titan DB), Neo4j, and IBM Graph have been around for several years, recent announcements of managed cloud services like Amazon Neptune and Microsoft’s addition of graph capability to Azure Cosmos DB indicate that graph databases have entered the mainstream. With all of this interest, how do you determine whether a graph database is right for your application?

What is a graph database?

Before we go any further, let’s define some terminology. What is a graph database? Think of it in terms of the data model. A graph data model consists of vertices that represent the entities in a domain, and edges that represent the relationships between these entities. Because both vertices and edges can have additional name-value pairs called properties, this data model is formally known as a property graph. Some graph databases require you to define a schema for your graph (that is, labels or names for your vertices, edges, and properties) before you populate any data, while other databases allow you to operate without a fixed schema.
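To make the terminology concrete, here is a minimal in-memory sketch of a property graph in Python. It is an illustration only, not the API of any particular graph database; the class names (Vertex, Edge, PropertyGraph), the method names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """An entity in the domain, e.g. a person or a movie."""
    id: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship between two vertices."""
    label: str
    source_id: str
    target_id: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """Hypothetical container: vertices keyed by id, plus a list of edges."""
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vertex_id: str, label: str, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, label, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, label: str, source_id: str, target_id: str,
                 **properties: Any) -> Edge:
        # Both endpoints must already exist, so an edge can never dangle.
        for endpoint in (source_id, target_id):
            if endpoint not in self.vertices:
                raise KeyError(f"unknown vertex: {endpoint}")
        edge = Edge(label, source_id, target_id, dict(properties))
        self.edges.append(edge)
        return edge


# Sample data (made up): one person who rated one movie.
graph = PropertyGraph()
graph.add_vertex("u1", "person", name="Ada")
graph.add_vertex("m1", "movie", title="Alien", year=1979)
graph.add_edge("rated", "u1", "m1", stars=5)
```

Note that the "rated" edge carries its own property (stars), which is the part of the model that a relational design would typically push into a join table.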

As you might have noticed, there isn’t any new information in the graph data model that we couldn’t capture in a traditional relational data model. After all, it’s simple to describe relationships between tables using foreign keys, or we can describe properties of a relationship with a join table. The key difference between these data models is the way data is organized and accessed. The recognition of edges as “first-class citizens” alongside vertices in the graph data model enables the underlying database engine to iterate very quickly in any direction through networks of vertices and edges to satisfy application queries, a process known as traversal.
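One way to picture traversal is as a walk over adjacency lists that are indexed in both directions, so that following an edge forward or backward is a direct lookup rather than a join. The sketch below assumes that simplified storage layout; real graph engines differ in how they store and index edges, and the names AdjacencyGraph and within_hops are invented for this example.

```python
from collections import defaultdict, deque


class AdjacencyGraph:
    """Hypothetical edge store indexed by both source and target vertex."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # vertex -> [(label, target)]
        self.in_edges = defaultdict(list)   # vertex -> [(label, source)]

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def neighbors(self, vertex, label, direction="out"):
        # Pick the index that matches the direction of travel.
        index = self.out_edges if direction == "out" else self.in_edges
        return [other for edge_label, other in index.get(vertex, [])
                if edge_label == label]


def within_hops(graph, start, label, max_hops):
    """Breadth-first traversal along outgoing `label` edges.

    Returns a dict mapping each reached vertex to its hop distance.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.neighbors(current, label):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


# Sample data (made up): a small chain of acquaintances.
social = AdjacencyGraph()
social.add_edge("alice", "knows", "bob")
social.add_edge("bob", "knows", "carol")
social.add_edge("carol", "knows", "dave")

two_hops = within_hops(social, "alice", "knows", 2)
who_knows_bob = social.neighbors("bob", "knows", direction="in")
```

Here two_hops holds everyone reachable from alice in at most two "knows" steps, and who_knows_bob walks the same edges in the reverse direction.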

The flexibility of the graph data model is a key factor driving the recent surge in graph database popularity. The same requirements for availability and massive scale that drove the development and adoption of various NoSQL offerings over the past 10 or so years are now driving the graph trend as well.

How to know when you need a graph database

However, as with any popular technology, there can be a tendency to apply graph databases to every problem. It’s important to make sure that you have a use case that is a good fit. For example, graphs are often applied to problem domains like:

  • Social networks
  • Recommendation and personalization
  • Customer 360, including entity resolution (correlating user data from multiple sources)
  • Fraud detection
  • Asset management

Whether your use case fits within one of those domains or not, there are some other factors that you should consider that can help determine if a graph database is right for you:

  • Many-to-many relationships. In his book “Designing Data-Intensive Applications” (O’Reilly), Martin Kleppmann suggests that frequent many-to-many relationships in your problem domain are a good indicator for graph usage, since relational databases tend to struggle to navigate these relationships efficiently.
  • High value of relationships. Another heuristic I’ve frequently heard: if the relationships between your data elements are just as important as, or more important than, the elements themselves, you should consider using a graph database.
  • Low latency at large scale. Adding another database to your application also adds complexity. The ability of graph databases to navigate through the relationships represented in large data sets more quickly than other types of databases is what justifies this additional complexity. This is especially true in cases where a complex relational join query no longer performs acceptably and there are no further optimization gains to be made to the query or relational structure.
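To illustrate the many-to-many point, the sketch below answers one recommendation-style question ("which items do people who share my likes also like?") in two ways: with self-joins over a relational join table, using Python's built-in sqlite3 module, and with a two-hop walk over in-memory adjacency sets. The table name, column names, function names, and sample data are all assumptions made for this example, and it shows only the shape of each query, not a performance comparison.

```python
import sqlite3
from collections import Counter, defaultdict

# Sample data (made up): (user, item) pairs in a many-to-many relationship.
LIKES = [
    ("alice", "A"), ("alice", "B"),
    ("bob", "A"), ("bob", "B"), ("bob", "C"),
    ("carol", "A"), ("carol", "D"),
]


def recommend_with_joins(likes, user):
    """Relational version: the join table is joined to itself twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE likes (user_id TEXT, item TEXT, PRIMARY KEY (user_id, item))"
    )
    conn.executemany("INSERT INTO likes VALUES (?, ?)", likes)
    rows = conn.execute(
        """
        SELECT other.item, COUNT(*) AS score
        FROM likes AS mine
        JOIN likes AS shared
          ON shared.item = mine.item AND shared.user_id <> mine.user_id
        JOIN likes AS other
          ON other.user_id = shared.user_id
        WHERE mine.user_id = ?
          AND other.item NOT IN (SELECT item FROM likes WHERE user_id = ?)
        GROUP BY other.item
        ORDER BY score DESC, other.item
        """,
        (user, user),
    ).fetchall()
    conn.close()
    return rows


def recommend_by_traversal(likes, user):
    """Graph-style version: user -> liked items -> other users -> their items."""
    items_of = defaultdict(set)   # user -> items they like
    liked_by = defaultdict(set)   # item -> users who like it
    for liker, item in likes:
        items_of[liker].add(item)
        liked_by[item].add(liker)

    scores = Counter()
    for item in items_of[user]:
        for peer in liked_by[item] - {user}:
            for candidate in items_of[peer] - items_of[user]:
                scores[candidate] += 1
    # Highest score first, ties broken alphabetically (same order as the SQL).
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


sql_result = recommend_with_joins(LIKES, "alice")
graph_result = recommend_by_traversal(LIKES, "alice")
```

Both functions score a candidate item once for every path that connects the user to it through a shared like. Each extra hop in the question adds another self-join to the relational query, whereas the traversal version simply adds another nested step.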