When we talk about the advantages of GraphQL, we often hear one-liners such as "only fetch what you need," "only requires one generic endpoint," "data source agnostic," or "more flexibility for developers." But, as always, things are more subtle than a one-liner could ever describe.
Generic and flexible are the key words here, and it's important to realize that it's hard to keep generic APIs performant. Performance is the number one reason someone would write a highly customized endpoint in REST (e.g., to join specific data together), and that is exactly what GraphQL tries to eliminate. In other words, it's a tradeoff, which typically means we can't have our cake and eat it too. But is that true? Can't we get both the generality of GraphQL and the performance of custom endpoints? It depends!
Let me first explain what GraphQL is, and what it does really well. Then I’ll discuss how this awesomeness moves problems toward the back-end implementation. Finally, we’ll zoom into different solutions that boost the performance while keeping the generality, and how that compares to what we at Fauna call “native” GraphQL, a solution that offers an out-of-the-box GraphQL layer on top of a database while keeping the performance and advantages of the underlying database.
GraphQL is a specification
Before we can explain what makes a GraphQL API “native,” we need to explain GraphQL. After all, GraphQL is a multi-headed beast in the sense that it can be used for many different things. First things first: GraphQL is, in essence, a specification that defines three things: schema syntax, query syntax, and a query execution reference.
Schema syntax: Describes your data and API (the schema)
The schema simply defines what your data looks like (attributes and types) and how it can be queried (query name, parameters, and return types). For example, a todo application could have Todos and a List of todos. If it only provides one way to read this data — e.g., getting a list of todos via its id — then the schema would look like this:
// todo-schema.gql
type Todo {
  title: String!
  completed: Boolean!
  list: List
}

type List {
  title: String!
  todos: [Todo] @relation
}

type Query {
  getList(id: ID): List!
}
Query syntax: Specifies how you can query
Once you have a schema that defines how your data and queries look, it's super easy to retrieve data from your GraphQL endpoint. Do you want a list? Just call the getList query and specify which attributes of the list you want returned.
query {
  getList(id: "<some id>") {
    title
  }
}
Do you want to join that data with todos? No problem! Just add a small snippet to the query.
query {
  getList(id: "<some id>") {
    title
    todos {
      title
      completed
    }
  }
}
But how does this join happen? Of course, we have not yet defined how this query actually maps to data that comes from our data source.
Query execution reference: A reference implementation for execution
It’s relatively easy to understand the schema and see how you can query. It’s harder to understand what the performance implications are since there are so many different implementations out there. GraphQL is not an implementation to retrieve your data. Rather, GraphQL provides guidelines on how a request query should be broken down into multiple “resolvers” and turned into a response.
Resolvers define how one element of the query (what GraphQL calls a field) can be turned into data. Then, depending on the framework or GraphQL provider, you either implement these resolvers yourself or they are provided automagically. When resolvers are provided for you, the implementation will determine whether we can talk about native GraphQL or not. In essence, the resolvers are just functions that have a certain signature. In JavaScript, such a resolver could look like this:
function someresolver(obj, args, context, info) {
  return // do something to get your data
}
And each of our “fields” will have a corresponding resolver. Fields are just the attributes in your schema.
// todo-schema.gql
type Todo {
  title: String!
  completed: Boolean!
  list: List
}

type List {
  title: String!
  todos: [Todo] @relation
}

type Query {
  getList(id: ID): List!
}
Each of these fields will have a resolver function (either generated by the library or implemented manually). The execution of a GraphQL query starts at a root field resolver, in this case, getList. Since getList promises to return a List, we will also need to fill in the fields of the List; therefore we need to call the resolvers for these fields and so on. In essence, it’s a process of recursive function calls. Let’s look at an example:
query {
  getList(id: "<some id>") {
    todos {
      title
    }
  }
}
For the above query, we would traverse three fields, each with a resolver function:
- getList returns a List with only the field todos
- todos receives the List item from getList and returns a list of Todos related to that list.
- title receives a Todo item from the todos resolver and returns a string from the title.
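The recursive process above can be sketched with a toy executor. This is purely illustrative: the db object, the schema map, and the selection format are all invented here, and real libraries such as graphql-js are far more complete.

```javascript
// A toy recursive executor, purely for illustration.
// The selection object mirrors the shape of the query:
//   query { getList(id: "l1") { todos { title } } }
// becomes { getList: { todos: { title: true } } }.
const db = {
  lists: { l1: { id: "l1", title: "Chores" } },
  todos: [{ title: "Dishes", completed: false, listId: "l1" }],
};

// Each field gets a resolver plus its return type, so the executor
// knows where to find the resolvers of child fields.
const schema = {
  Query: {
    getList: { type: "List", resolve: (obj, args) => db.lists[args.id] },
  },
  List: {
    todos: { type: "Todo", resolve: (list) => db.todos.filter((t) => t.listId === list.id) },
  },
  Todo: {
    title: { type: null, resolve: (todo) => todo.title },
  },
};

function execute(type, obj, selection, args = {}) {
  const out = {};
  for (const [field, sub] of Object.entries(selection)) {
    const { type: childType, resolve } = schema[type][field];
    const value = resolve(obj, args);
    if (sub === true) {
      out[field] = value; // leaf field: just return the resolved value
    } else if (Array.isArray(value)) {
      out[field] = value.map((item) => execute(childType, item, sub));
    } else {
      out[field] = execute(childType, value, sub);
    }
  }
  return out;
}

const result = execute("Query", null, { getList: { todos: { title: true } } }, { id: "l1" });
// result: { getList: { todos: [{ title: "Dishes" }] } }
```

Note how the executor never talks to the data source directly; it only calls resolvers, which is exactly what makes the model so flexible — and what makes its performance depend entirely on how those resolvers are implemented.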
This is in itself a very elegant recursive way to answer the query. However, we will see that the choices made in the actual implementation will have a huge impact on performance, scalability, and the behavior of your API.
Patterns for writing resolvers
In order to understand the different approaches, we need to learn how to get started building a GraphQL execution engine. The syntax of that depends on the server library we choose, but each library adheres to the resolver guidelines described above.
Resolvers are just functions
As we have explained, implementing a GraphQL engine is all about implementing functions called resolvers with a specific signature.
function someresolver(obj, args, context, info) {
  return // do something to get your data
}
The arguments serve different purposes, which are described in detail in the GraphQL documentation. The ones that are most important for this implementation are:
- obj is the previous object. In the previous example, we mentioned that the todos resolver receives the List object that was resolved by getList. The object parameter is meant to pass on the result of the previous resolver.
- args contains the arguments. In getList, we pass an ID, so args will be { id: "<some id>" }.
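For example, here is a sketch of how obj and args flow between two resolvers (the db object and its contents are invented for illustration):

```javascript
// A sketch of how obj and args flow between two resolvers
// (db and its contents are invented for illustration).
const db = { lists: { "some-id": { id: "some-id", title: "Groceries" } } };

// Root resolver: obj is undefined at the root; args carries the
// arguments from the query, e.g. { id: "some-id" }.
function getList(obj, args, context, info) {
  return db.lists[args.id];
}

// Field resolver: obj is the List object that getList just returned.
function title(obj, args, context, info) {
  return obj.title;
}

// getList(undefined, { id: "some-id" }) resolves the List, and its
// result is then handed to title() as obj.
```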
And these functions form a resolver chain
We can’t do much with one resolver function, and our GraphQL server library will need to know how to map a query to the different resolvers and how we delegate work from one resolver to the next resolver. Each library has a slightly different syntax to specify this, but the general idea remains the same. For example, one syntax to specify the mapping of queries to resolvers could look as follows:
{
  Todo: {
    title(obj, args, context, info) { ... },
    completed(obj, args, context, info) { ... },
    list(obj, args, context, info) { ... },
  },
  List: {
    title(obj, args, context, info) { ... },
    todos(obj, args, context, info) { ... },
  },
  Query: {
    getList(obj, args, context, info) { ... },
  },
}
If we write the query below, we can match it to resolvers by first looking at the root resolvers (the ones in Query), where we will find the getList resolver. Since the schema declares that getList returns a List, the execution engine knows to search in List for the resolver of the todos field.
query {
  getList(id: "<some id>") {
    todos {
      title
    }
  }
}
The way this resolves is called the resolver chain.
Now that we know how GraphQL libraries want us to write resolvers, we can start thinking about how it affects performance.
The resolver chain is more like a resolver tree
The above explanation might make you think that resolving a GraphQL query is quite linear, but in fact resolver chains are more like chains that keep on splitting... Oh wait, that’s just called a tree!
GraphQL approach #1: The naive implementation
When we implement resolvers naively in this rather elegant recursive system, we can easily end up with a slow API. By playing human interpreter on the previous query, we can see how a naive implementation results in a tree-like execution plan that sends many queries to the database.
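A naive implementation could look like the sketch below (db, getList, and getTodo are hypothetical database calls; the dbCalls counter just makes the problem visible):

```javascript
// A sketch of a naive implementation. getList and getTodo stand in for
// real database round trips; dbCalls counts them.
let dbCalls = 0;

const db = {
  lists: { l1: { id: "l1", title: "Chores", todoIds: ["t1", "t2", "t3"] } },
  todos: { t1: { title: "Dishes" }, t2: { title: "Laundry" }, t3: { title: "Vacuum" } },
};

async function getList(id) { dbCalls++; return db.lists[id]; }
async function getTodo(id) { dbCalls++; return db.todos[id]; }

const resolvers = {
  Query: {
    // One call to fetch the list...
    getList: (obj, args) => getList(args.id),
  },
  List: {
    // ...and one call per todo in that list: the "N" in N+1.
    todos: (list) => Promise.all(list.todoIds.map((id) => getTodo(id))),
  },
};

// Resolving the query by hand: 1 call for the list, then N for its todos.
async function run() {
  const list = await resolvers.Query.getList(null, { id: "l1" });
  const todos = await resolvers.List.todos(list);
  return { todos, dbCalls };
}
```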
Although there are only two database calls in the above implementation, one of them, getTodo(id), is called once for each Todo associated with the List. That means we hammer our database with 1 (getting the list) + N (getting each todo) calls and join all this data in the back end. This is known as the N+1 problem.
Remember that, in contrast, a clean REST API would allow us to call each of these resources separately from the front end, requiring us either to join the data somewhere (e.g., in the front end) or to build custom endpoints. There were two things we wanted to improve upon by using GraphQL:
- Multiple calls from the front end, requiring that the data also has to be joined in the front end.
- Workarounds that require you to build a custom endpoint for performance or for each new front end or app requirement. For example, if you have a new UI for your mobile application, it might have different requirements. It might require other joins or fewer attributes.
An example that depicts the number of calls in a REST back end, both when we follow REST practice by separating endpoints per entity type and when we optimize by writing custom endpoints to join.
Although our naive GraphQL endpoint effectively solves these issues, it creates another problem further down the road.
The introduction of GraphQL in combination with a naive implementation moves a problem that was visible to the API user into the back end, where it might still be present but is hidden from the API user. In essence, we replaced the problem of multiple REST calls with the N+1 problem. Expressing generic queries is not the hard part — similar things can be done in REST with OData. The difficulty is providing an efficient implementation for such a generic endpoint.
With the naive implementation, we now have an even higher number of queries between the back end and the database, and the results have to be merged in the back end. Joining many small requests requires memory and might cripple our database as we scale. This is clearly not ideal.
GraphQL approach #2: Batching with DataLoader
Facebook has created a great piece of software, called DataLoader, that solves part of the problem. If you plug in DataLoader, similar queries are essentially batched per frame or per tick. Consider the following execution, without DataLoader.
DataLoader takes in all of these queries, waits for the end of the tick, then combines them into one batched query.
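The batching idea can be sketched with a toy loader. This is a simplified stand-in, not the real DataLoader API; makeLoader and batchFetch are invented names.

```javascript
// A toy version of the batching idea behind DataLoader (simplified;
// makeLoader and batchFetch are invented names, not the real API).
function makeLoader(batchFetch) {
  let pending = null;
  return function load(id) {
    if (!pending) {
      // First load of this tick: start a batch and schedule it to run
      // at the end of the tick as a single database call.
      pending = { ids: [], promise: null };
      pending.promise = Promise.resolve().then(() => {
        const batch = pending;
        pending = null;
        return batchFetch(batch.ids);
      });
    }
    const index = pending.ids.length;
    pending.ids.push(id);
    return pending.promise.then((results) => results[index]);
  };
}

// Usage: two loads in the same tick result in one batched call.
let dbCalls = 0;
const todosById = { t1: { title: "Dishes" }, t2: { title: "Laundry" } };
const loadTodo = makeLoader(async (ids) => {
  dbCalls++; // a stand-in for one "SELECT ... WHERE id IN (...)" query
  return ids.map((id) => todosById[id]);
});

Promise.all([loadTodo("t1"), loadTodo("t2")]).then(() => {
  // dbCalls is 1 here: both loads were combined into a single call
});
```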
Besides batching, DataLoader also does in-memory caching of certain query results. With DataLoader, we have now significantly reduced the number of database calls. However, it only helps for similar queries that fetch the same entity type, which means we are still joining data in memory and still making multiple calls. If we look at the queries that are sent to the database, the approach now looks like this:
This is already great, but we can do much better. After all, joining in memory in the back end is just not scalable as data grows and our schema becomes more complex, requiring that multiple entities be fetched. For example, consider the following model from a Twitter-based example application called Fwitter.
We need to retrieve not only the Fweets (which are like Twitter’s tweets), but their comments, hashtags, statistics, author, original fweets in case of a refweet, etc. We face a complex in-memory join, and again, many database calls.
GraphQL approach #3: Generating the query
Ideally, if possible, we want a GraphQL query to translate into one database query. Sounds simple, right? However, we'll see that generating such a query can be hard or even impossible, depending on the underlying data storage and query language.
For starters, this approach can raise many questions. Do we generate one query or multiple queries? Can we even generate one query that expresses a complex GraphQL statement? If we generate a big query with many joins, is it still efficient?
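To make the idea concrete, here is a sketch that turns a selection on our todo schema into a single SQL statement. The table and column names are invented, and a real implementation would need to handle arbitrary schemas, arguments, and deeper nesting.

```javascript
// A sketch of translating a GraphQL selection on our todo schema into
// one SQL statement with a join, instead of 1 + N separate calls.
// Table and column names (lists, todos, list_id) are invented.
function generateSql(selection) {
  // selection mirrors: { getList: { title: true, todos: { title: true } } }
  const listFields = [];
  const todoFields = [];
  for (const [field, sub] of Object.entries(selection.getList)) {
    if (field === "todos") {
      for (const todoField of Object.keys(sub)) {
        todoFields.push(`todos.${todoField}`);
      }
    } else {
      listFields.push(`lists.${field}`);
    }
  }
  const columns = [...listFields, ...todoFields].join(", ");
  let sql = `SELECT ${columns} FROM lists`;
  if (todoFields.length > 0) {
    sql += " JOIN todos ON todos.list_id = lists.id";
  }
  sql += " WHERE lists.id = ?";
  return sql;
}

// generateSql({ getList: { title: true, todos: { title: true } } })
// → "SELECT lists.title, todos.title FROM lists JOIN todos ON todos.list_id = lists.id WHERE lists.id = ?"
```

The database now does the join, which is exactly what it is optimized for; the hard part, as the questions above suggest, is doing this for arbitrary schemas and query languages.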
Since native GraphQL is a special case of “generating the query,” let’s first (and finally) define what we at Fauna call “native” GraphQL. We’ll answer these questions in the subsequent section, “Why is native GraphQL so difficult to achieve?”
GraphQL approach #4: Native GraphQL
We can take the previous approach one step further and run this translation layer as part of our infrastructure, close to the database. That eliminates an extra hop. And if our database is multi-region, our GraphQL API retains the latency benefits of that.
If the GraphQL layer is a part of the database and allows you to do highly efficient queries straight from your application, then your response latencies will be much better. You’ll be fetching all the entities you need in one hop and only one call.
At Fauna, we consider a GraphQL implementation native when the GraphQL layer lives on the infrastructure of the database and adheres to the following conditions:
- One GraphQL query = one database query = one transaction
- GraphQL queries offer the same ACID guarantees as the underlying database
- The underlying database allows efficient execution of such queries