
Using DataLoader in GraphQL

In my previous post on how to set up a GraphQL server, we briefly discussed the issue of duplicate requests to a data source, also known as the N+1 problem, which can occur when you define resolvers for each field in a schema.

This issue is common with GraphQL, and there are several useful articles that give more insight into it.

This post will focus on practical examples of how to use the DataLoader package to mitigate this issue.

What is DataLoader

DataLoader is a JavaScript library that helps to limit duplicate requests to an application’s data source through batching and caching. Although the initial implementation released by Facebook was written in JavaScript, there are implementations in other languages such as Golang, Java, PHP, Python, Ruby and more. In this article, we will be using the JavaScript library.
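To make "batching and caching" concrete before we touch the real package, here is a minimal sketch of the two ideas, written from scratch for illustration (this is not the dataloader package itself; `TinyLoader` is a made-up name):

```typescript
// A minimal sketch of the two ideas behind DataLoader: calls to load() made in
// the same tick are collected and handed to the batch function together, and
// repeated keys share one cached promise.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (value: V) => void }[] = [];
  private cache = new Map<K, Promise<V>>();

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached) return cached; // caching: a repeated key reuses its promise

    const promise = new Promise<V>((resolve) => {
      const shouldSchedule = this.queue.length === 0;
      this.queue.push({ key, resolve });
      // flush after the current tick, once all load() calls have been queued
      if (shouldSchedule) Promise.resolve().then(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    const values = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}
```

With this sketch, `Promise.all([loader.load(1), loader.load(2), loader.load(1)])` triggers a single call to the batch function with the keys `[1, 2]` — which is exactly the behaviour we want from the real library.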

Queries without DataLoader

From the previous article, we saw that the number of requests to our datasource is directly proportional to the number of fields defined in a GraphQL schema.

In our specific use case, we had this user schema:

const userSchema = gql`
  type Query {
    user(id: Int!): User
    allUsers: [User]
  }

  type User {
    age: Int!
    email: String!
    hobbies: [String!]
    id: Int!
    name: String!
  }
`;

The query to get the information for a single user needed to access the datasource (usually a database) 5 times: once for each field of the User type.

Additionally, the query to return a list of 10 users had to access the datasource 50 times. 

You can see how this can quickly become a performance issue for larger datasets.
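The arithmetic above can be demonstrated with a small counter. The snippet below is purely illustrative: the sample data and this naive `getUserByIndex` stand in for the real datastore, and each "resolver" fetches the whole user just to read one field.

```typescript
// Hypothetical illustration of the N+1 pattern: a per-field resolver fetches
// the whole user each time, so every field costs one datastore hit.
let datastoreHits = 0;

const users: Record<string, any>[] = [
  { age: 36, email: 'ada@example.com', hobbies: ['math'], id: 1, name: 'Ada' },
];

function getUserByIndex(id: number): Record<string, any> | undefined {
  datastoreHits += 1; // one datastore access per call
  return users.find((user) => user.id === id);
}

// One naive resolver per schema field, each hitting the datastore on its own:
const fields = ['age', 'email', 'hobbies', 'id', 'name'];
const fieldValues = fields.map((field) => getUserByIndex(1)?.[field]);

// Resolving all 5 fields of one user costs 5 hits; a list of 10 users
// resolved the same way would cost 10 * 5 = 50 hits.
```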

Let’s see how we can improve on this.

Add DataLoader to our project

Clone code from github

To get up and running quickly, we will build off our codebase from when we set up a GraphQL server.

In your terminal, enter the following to clone the repo

git clone

This will create a folder called gql-koa-typescript

New branch

Change your working directory to the new folder

cd gql-koa-typescript

Next create a new branch

git checkout -b dataloader

All our work will be saved in this new git branch.

Install dataloader

First, run npm install to get all the dependencies listed in package.json

Next, install the dataloader package

npm install --save dataloader

Using DataLoader


For our resolvers to make use of the dataloader, we will make the dataloader available in the context.

A context is an object that is shared across all resolvers that are executing for a particular operation.

Apollo documentation

We will create this context below, in a file called context.ts

import DataLoader from 'dataloader';
import { getUserByIndex } from './datastore';

const loaders = (): any => ({
  getUserByIndex: new DataLoader((ids) => {
    return Promise.all(ids.map((id) => getUserByIndex(Number(id))));
  }),
});

const getContext = (ctx: any) => {
  return {
    loaders: loaders(),
  };
};

export default getContext;

First, we import the dataloader package and the getUserByIndex method that is used to query the data store. 

Next, we create our loaders function. It returns a map with a batch loading function under the getUserByIndex key.

  getUserByIndex: new DataLoader((ids) => {
    return Promise.all(ids.map((id) => getUserByIndex(Number(id))));
  }),

This is where the main work happens. We are defining a batch function that will be used to load data from our datasource. A batch loading function accepts an Array of keys and returns a Promise which resolves to an Array of values or Error instances.

Notice that we are still using the getUserByIndex internally. The batch function just helps to group requests that occur within the same event loop together.

Finally, we export a getContext method which simply returns our context object:

const getContext = (ctx: any) => {
  return {
    loaders: loaders(),
  };
};


Next, we will update the server.ts file by importing the context and providing it to the Apollo server instance:

import context from './context';

const server = new ApolloServer({
  context: ({ ctx }: any) => context(ctx),
  debug: true,
  playground: true,
  tracing: true,
});

Now we are going to update the resolver functions to use the loaders we defined in context.ts. To achieve this, we will:

  • Update the function parameters to include the context.
  • Replace the getUserByIndex method with our loader method.

We will do this for all the resolvers. For example, once we are done, the age resolver will look like this:

  age: async (id: number, _: any, ctx: any) => {
    const user = await ctx.loaders.getUserByIndex.load(id);
    return user.age;
  },

Notice that the context (ctx) is the third parameter in the function definition, as outlined in the Apollo Server resolvers documentation.

Furthermore, we now use ctx.loaders.getUserByIndex.load(id) instead of getUserByIndex(id) to access the datastore.

Make the same changes for the other resolvers.
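For reference, a sketch of what the remaining User field resolvers could look like after those changes (field names come from the schema above; the shape of ctx.loaders.getUserByIndex is assumed to match the DataLoader we created in context.ts):

```typescript
// The remaining User field resolvers, following the same pattern as the age
// resolver: load the user through the batching loader, then return one field.
const userFieldResolvers = {
  email: async (id: number, _: any, ctx: any) =>
    (await ctx.loaders.getUserByIndex.load(id)).email,
  hobbies: async (id: number, _: any, ctx: any) =>
    (await ctx.loaders.getUserByIndex.load(id)).hobbies,
  id: async (id: number, _: any, ctx: any) =>
    (await ctx.loaders.getUserByIndex.load(id)).id,
  name: async (id: number, _: any, ctx: any) =>
    (await ctx.loaders.getUserByIndex.load(id)).name,
};
```

Because every resolver goes through the same loader, all the `.load(id)` calls made while resolving one query end up in a single batch.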

Queries and results

Those are all the changes we need to make. We are ready to test out our improvements.

Start the server

npm run start

Querying for a single user

In a browser, go to http://localhost:8080/graphql. This will open the Apollo GraphQL Playground. In the query editor, enter the following:

query getUser($id: Int!){
  user(id: $id){
    age
    email
    hobbies
    id
    name
  }
}

In the Query Variables tab, enter the following:

{
  "id": 1
}

In the datastore, we log each invocation of getUserByIndex so we can count the number of accesses.
Taking a look at your terminal, you will see that the count is 1.

server listening at port 8080
getUserByIndex: 1

This means that the datastore was only accessed once. 

Querying for all users

Next let us run the query for all users.

Open a new tab in the playground, enter the query below and click on send. 

query getAllUsers{
  allUsers{
    age
    email
    hobbies
    id
    name
  }
}
This should return the information for all your users.

Again looking at our terminal, we should see the following output.

server listening at port 8080
getUserByIndex: 1
getUserByIndex: 2
getUserByIndex: 3
getUserByIndex: 4
getUserByIndex: 5
getUserByIndex: 6
getUserByIndex: 7
getUserByIndex: 8
getUserByIndex: 9
getUserByIndex: 10
getUserByIndex: 11

This means that to get our list of 10 users, we only accessed the datastore 10 more times (invocations 2 through 11; invocation 1 was the single-user query from earlier).

That is it! Our DataLoader batch function is working as expected, and we can see a significant improvement over accessing the datastore 5 times for a single user and 50 times for a list of 10 users.

We have seen how we can use the DataLoader package to reduce duplicate requests to our datasource by batching them.

Have any questions, want to share your thoughts or just say Hi? I’m always excited to connect! Follow me on Twitter or LinkedIn for more insights and discussions. If you’ve found this valuable, please consider sharing it on your social media. Your support through shares and follows means a lot to me! 
