Understanding GraphQL engine implementations

Among the many ways of implementing a GraphQL engine, only one approach offers the same performance, scalability, and ACID guarantees as the underlying database.


In FaunaDB, native GraphQL means that your queries come with the same consistency guarantees as the underlying database, because one GraphQL query results in exactly one transaction. Moreover, the same scalability and multi-region guarantees apply, because there is no additional system, service, or back end in between that must be scaled separately. Queries written in GraphQL are just as fast and scalable as queries written in our own Fauna Query Language (FQL).

The last question that remains is, why doesn’t every approach that generates queries offer native GraphQL? That question requires us to investigate the difficulties of mapping GraphQL to a query. 

Native GraphQL challenges

If the main ingredient is code generation close to the data, then there should be many ways to do that, right? As I noted above, the problem is that code generation is not always simple. How difficult it is and whether it’s even feasible depends on the underlying database and query language. Let’s look at the challenges. 

Absence of relations

The most obvious problem is the lack of relations. Some newer scalable databases, such as Firebase, Firestore, MongoDB, and DynamoDB, do not focus on providing relations, which makes it infeasible to transform a GraphQL query into a single database query. The only way out is to run multiple queries and perform the joins in memory somewhere else, such as in the back end.
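To make that cost concrete, here is a minimal sketch, with hypothetical data and fetch functions, of the in-memory join a back end ends up doing when the database cannot join for you:

```javascript
// Hypothetical in-memory data standing in for two unrelated collections.
const listsDb = { list1: { id: "list1", title: "Chores", todoIds: ["t1", "t2"] } };
const todosDb = {
  t1: { id: "t1", title: "Dishes" },
  t2: { id: "t2", title: "Laundry" },
};

// Each call stands in for a separate round trip to the database.
async function fetchList(id) { return listsDb[id]; }
async function fetchTodo(id) { return todosDb[id]; }

// Resolving one GraphQL query costs 1 + N database calls, and the
// "join" happens in application memory rather than in the database.
async function getListWithTodos(id) {
  const list = await fetchList(id); // query 1
  const todos = await Promise.all(list.todoIds.map(fetchTodo)); // queries 2..N+1
  return { ...list, todos };
}
```

Every extra level of nesting in the GraphQL query adds another round of queries, which is exactly the N+1 problem that naive resolvers are known for.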

Generation capabilities of the query language

Query languages are often created to provide easy ad hoc data exploration. They are therefore usually declarative and rarely intended to be generated. In many query languages, it’s therefore extremely difficult to generate an optimal query or even a complex query. For example, there is an impressive project called Join Monster that generates SQL queries from GraphQL. Writing something like that is a significant endeavor that requires arcane skills in string concatenation.

But wait, doesn’t an Object Relational Mapper (ORM) solve this problem? Well, not exactly. An ORM helps us map a GraphQL query to objects. But these objects will also have the same tree-like relations as the resolvers. Hence, although we’ve delegated the complex generation to the ORM tool, the ORM still has to do the work of translating your ORM queries to the underlying query language.

It’s true that ORM tools are great at that and do perform intelligent optimizations, but at some point they will typically decide to break up the query into multiple subqueries, either because of the complexity of the generation or, more importantly, because of performance implications. That brings us to the next and probably most important point.

Performance of the execution model

In traditional SQL databases, a query with many joins can become a performance issue, and one big query is not always faster than multiple small queries. As a result, a query is typically broken down into multiple queries, which brings us back to the realm of multiple queries and joins in memory. Granted, keeping the splitting of queries to a minimum can yield a huge performance boost, but such a solution will not be able to deliver consistency guarantees and will be harder to scale.

The problem lies in the implementation of these joins and boils down to the low-level details of join implementations. It’s the same reason why a graph database is better at certain workloads (and worse at others) than a traditional RDBMS. In an RDBMS, a join typically works on indexes by using an algorithm (nested loops or hash joins). Long story short: Such algorithms are great for joining two giant sets of data together efficiently, but they become less efficient when many joins are in play. In essence, there is a mismatch between the GraphQL join and the SQL join; the RDBMS is not made for the “tree-walking strategy” required by GraphQL.
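As an illustration (a toy sketch, not how any particular RDBMS implements it), a hash join builds a hash table over one entire side of the join before probing it, so its cost is tied to the size of the tables rather than to the handful of rows a tree-walking query actually touches:

```javascript
// Toy hash join over hypothetical table shapes.
const lists = [{ id: 1, title: "Chores" }];
const todos = [
  { id: 10, listId: 1, title: "Dishes" },
  { id: 11, listId: 1, title: "Laundry" },
];

function hashJoin(lists, todos) {
  // Build phase: touches every row of the todos table, even if the
  // query only needs the todos of a single list.
  const byListId = new Map();
  for (const t of todos) {
    if (!byListId.has(t.listId)) byListId.set(t.listId, []);
    byListId.get(t.listId).push(t);
  }
  // Probe phase: look up each list's todos in the hash table.
  return lists.map(l => ({ ...l, todos: byListId.get(l.id) ?? [] }));
}
```

Joining two large sets this way is efficient, but a deeply nested GraphQL query repeats this build-and-probe work once per level of nesting.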

What is required for native GraphQL?

Although FaunaDB is not a pure graph database, it shares a few similarities with a graph database. One of those similarities is something that graph databases call index-free adjacency. In simple terms, this means that you can directly link different objects together in storage using references.


Instead of looping through an index or building a hash (nested loops or hash joins), FaunaDB simply walks through the list of references and dereferences them as it encounters them. In other words, FaunaDB does not need to do joins in this scenario because it offers an alternative solution.  
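A rough sketch of the difference: with index-free adjacency, a document carries direct references to its neighbors, so resolving a relation is just one lookup per reference, with no hash table to build and no index to probe. The storage layout below is a simplification for illustration, not FaunaDB's actual format:

```javascript
// Simplified storage where documents hold direct references to each other.
const storage = new Map([
  ["todo/1", { title: "Dishes" }],
  ["todo/2", { title: "Laundry" }],
  ["list/1", { title: "Chores", todos: ["todo/1", "todo/2"] }],
]);

// Dereferencing is a single lookup; no join algorithm is involved.
const deref = ref => storage.get(ref);

// Walking the tree costs one lookup per reference actually followed,
// independent of how many lists or todos exist in total.
function resolveList(ref) {
  const list = deref(ref);
  return { ...list, todos: list.todos.map(deref) };
}
```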

Walking the tree and dereferencing the references

To walk through a list of references we have a convenient Map function, and to dereference a reference we have a Get function. The pattern to loop through a list and dereference would look as follows.

Map(
   ListOfReferences,
   Lambda('ref', Get(Var('ref')))
)

Imagine that the result of this Get is also a list. Well, no problem. We can just Map and Get over that list too. It’s just like a regular programming language.
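For example, if each list document holds a field of todo references, the nested lookup is just another Map and Get inside the outer Lambda. This is a sketch; the ['data', 'todos'] field path is an assumption for illustration:

```
Map(
   ListOfListReferences,
   Lambda('listRef',
      Map(
         Select(['data', 'todos'], Get(Var('listRef'))),
         Lambda('todoRef', Get(Var('todoRef')))
      )
   )
)
```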

Due to the way that this works and the fact that pagination is baked into FQL and therefore also into our GraphQL, we do not have problems executing big joins. The Map and Get pattern works similarly to how a graph database would execute the query (simply by following the references).


If we look at this pattern and then remember how the resolvers in GraphQL work recursively, we can see that this is very similar. So, it’s not surprising that FQL maps very well onto GraphQL. The only difference is that this is not happening in memory or in your back end. It’s all happening in the database in one query and one transaction, which dramatically reduces the round trips between your database and the GraphQL client.

In FaunaDB’s native GraphQL implementation, each resolver function resolves a field into an FQL expression. Then, once all the fields are resolved, all these snippets of FQL form one bigger FQL query that can be executed as a whole. In our case, this is made easy thanks to a functional and composable query language.
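The idea can be sketched in plain JavaScript. The builder functions below are simplified stand-ins for FQL constructors, not FaunaDB's actual implementation; the point is that each resolver returns an expression fragment, and the fragments compose into one query value:

```javascript
// Simplified stand-ins for FQL constructors: each just builds a plain
// object describing the expression instead of executing anything.
const Get = ref => ({ get: ref });
const Var = name => ({ var: name });
const MapExpr = (collection, lambda) => ({ map: { collection, lambda } });

// "Resolvers" that return expression fragments rather than data.
const resolveList = id => Get({ ref: `list/${id}` });
const resolveTodos = todosExpr => MapExpr(todosExpr, Get(Var("todoRef")));

// Composing the fragments yields ONE query object; a real engine would
// now send this single query to the database as one transaction.
const fullQuery = {
  list: resolveList("123"),
  todos: resolveTodos(Var("todoRefs")),
};
```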

A functional and composable query language

The previous example might look verbose at first and it’s no doubt different from what you are used to. When the Fauna Query Language was designed, the vision was for it to be a language that could do more than just querying. We wanted to use FQL in user-defined functions (stored procedures) as well as complex conditional transactions because we believe that it makes sense to keep your business logic as close to where the data lives as possible. Traits like this make FQL more of a general purpose programming language than other query languages.

A second requirement was to empower our users to write their own tools on top of FQL, which required the language to be highly composable. The perfect language for that is an expression-oriented functional language. In contrast to many query languages, FQL is not declarative. That means that you do not just write what you want to retrieve and let the database figure it out. Instead, just as in a normal programming language, you write exactly how you want to retrieve the data.

Finally, more and more databases today are being accessed directly from the front end or from a serverless function, and serving clients all over the world. Due to the potential distance between the application and the database, it was important for FQL to be able to fetch data and execute complex logic in one query. That sounds familiar, right? It’s exactly one of the reasons why you would use GraphQL as well.  

The first important insight is that the components of our queries are just functions in the language that we are currently using (in our case JavaScript). This means that we can easily put snippets of FQL in functions or assign them to variables and compose them using the constructs of our host language. The only way to see how easy it is to generate complex queries in FQL is to implement one ourselves, so let’s take the previous query and start implementing it. 

query {
   getList(id: "<some id>"){
      todos {
         title
      }
   }
}

We want to fetch lists by ID so we start off by getting the index. In FaunaDB, indexes are mandatory, which makes it incredibly difficult to write an inefficient GraphQL query. Further, FaunaDB generates all of these indexes for you behind the scenes.

We start off with the index function to get the Index.

Index('list_by_id')

And then place it in a Match function to get the list reference.

Match(Index('list_by_id'), "list id")

This returns a set of Fauna references. From those, we can get the actual List documents by mapping over the references and calling Get on each one. The fact that FQL supports language constructs such as Map, which you often see in regular programming languages, already gives a sneak preview of why it’s so easy to map a GraphQL query to FQL.

Map(
 Paginate(
   Match(Index('list_by_id'), "list id")
 ),
 Lambda('id', Get(Var('id')))
)

If we want to get the Todos as well, no problem. We will use Let to structure our query and delegate the retrieval of the todos to another function. And this is where it becomes interesting: all we are doing here is defining the query; we are not executing anything yet. If we were implementing this in JavaScript (for which we have an FQL driver), we could now just start using JavaScript variables to compose our query.

const query = Map(
 Paginate(
   Match(Index('list_by_id'), "list id")
 ),
 Lambda('id',
   Let({
       list: Get(Var('id')),
       todos: getTodos(Select(['todos'], Var('list')))
     },
     // Return the variables
     {
       todos: Var('todos'),
       list: Var('list')
     }
   )
 )
)

This means that getTodos could be implemented similarly and will just be placed into the already existing query.

function getTodos(todos){
 return Map(
   todos,
   Lambda('todoId', Get(Var('todoId')))
 )
}

Since we are not constructing a string but just placing functions in other functions, we can now break up this query much more easily. Generating a generic query for a complex model becomes trivial. The root resolver can then just call the complete query. So, in our native GraphQL, what is actually happening behind the scenes is that each GraphQL query is translated into one query in our native language (FQL).

Native GraphQL guarantees

GraphQL is so easy to use that it is quickly becoming the language of choice to query a database. The fact that a familiar language can be used to retrieve data from many different databases and APIs is a very positive development. However, the success of GraphQL came at the price of giving up knowledge of how our data is retrieved.

Before GraphQL, the REST API that we accessed, or the SQL and/or the query plans the SQL generated, gave us a clear indication of how our API would perform, scale, and behave (including which guarantees, e.g. ACID, it provided). Because GraphQL is a single endpoint and one familiar language with many different implementations, it unintentionally obscures this. When we use a library/framework/GraphQL provider, the implementation often becomes a black box.

The term native GraphQL indicates that the provided GraphQL API has some desired properties. We found it important to clearly indicate what these properties are and how we implemented them in FaunaDB. We’ve explained which issues you might encounter in do-it-yourself approaches or approaches based on ORMs or SQL generation, and we have shown why we did not encounter these problems and why we are able to offer something that we call native GraphQL.

To us at Fauna, native GraphQL is an API that adheres to the same guarantees as the underlying database. In that sense, native GraphQL is indistinguishable from the database query language in terms of performance, scalability, and ACID guarantees.

Brecht De Rooms is senior developer advocate at Fauna. He is a programmer who has worked extensively in IT as a full-stack developer and researcher in both the startup and IT consultancy worlds. It is his mission to shed light on emerging and powerful technologies that make it easier for developers to build apps and services that will captivate users.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2020 IDG Communications, Inc.
