In my previous post on how to set up a GraphQL server, we briefly discussed the issue of duplicate requests to a data source, also known as the N+1 problem, which can occur when you define resolvers for each field in a schema.
This issue is common with GraphQL, and there are some useful articles that give more insight into it, such as:
- PayPal’s GraphQL Resolvers: Best Practices
- Shopify’s Solving the N+1 Problem for GraphQL through batching
This post will focus on practical examples of how to use the DataLoader package to mitigate this issue.
What is DataLoader
DataLoader is a JavaScript library that helps limit duplicate requests to an application’s data source through batching and caching. Although the initial implementation released by Facebook was written in JavaScript, there are implementations in other languages such as Golang, Java, PHP, Python, Ruby and more. In this article, we will be using the JavaScript library.
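Before we wire DataLoader into our server, here is a minimal, standalone sketch of the batching and caching behaviour. The fetchUserById function and the user objects below are invented purely for illustration:
import DataLoader from 'dataloader';

// Hypothetical datasource call, used only for this illustration.
const fetchUserById = async (id: number) => {
  console.log(`datasource hit for id ${id}`);
  return { id, name: `User ${id}` };
};

const userLoader = new DataLoader(async (ids: readonly number[]) => {
  // Called once per batch, with every key requested in the same tick.
  console.log('batch keys:', ids);
  return Promise.all(ids.map((id) => fetchUserById(id)));
});

(async () => {
  // Three load calls in the same tick produce a single batch of [1, 2];
  // the duplicate key 1 is served from DataLoader's cache.
  await Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(1)]);
})();
Running this logs one batch of keys and two datasource hits instead of three separate ones.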
Queries without DataLoader
From the previous article, we saw that the number of requests to our datasource is directly proportional to the number of fields resolved for a query against the GraphQL schema.
In our specific use case, we had this user schema
const userSchema = gql`
  type Query {
    user(id: Int!): User
    allUsers: [User]
  }
  type User {
    age: Int!
    email: String!
    hobbies: [String!]
    id: Int!
    name: String!
  }
`;
The query to get the information for a single user needed to access the datasource (usually a database) 5 times, once for each field of the User type.
Additionally, the query to return a list of 10 users had to access the datasource 50 times.
You can see how this can quickly become a performance issue for larger datasets.
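As a rough reminder of why, the per-field resolvers from that setup each call the datastore on their own. A simplified sketch (not the exact code from the previous post) looks like this:
import { getUserByIndex } from './datastore';

// Simplified sketch of the User field resolvers without DataLoader.
// Each resolver calls getUserByIndex independently, so resolving a
// single user touches the datastore five times.
const resolvers = {
  User: {
    age: async (id: number) => (await getUserByIndex(id)).age,
    email: async (id: number) => (await getUserByIndex(id)).email,
    hobbies: async (id: number) => (await getUserByIndex(id)).hobbies,
    id: async (id: number) => (await getUserByIndex(id)).id,
    name: async (id: number) => (await getUserByIndex(id)).name,
  },
};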
Let’s see how we can improve on this.
Add DataLoader to our project
Clone code from GitHub
To get up and running quickly, we will build off our codebase from when we set up a GraphQL server.
In your terminal, enter the following to clone the repo
git clone https://github.com/poliha/gql-koa-typescript.git
This will create a folder called gql-koa-typescript
New branch
Change your working directory to the new folder
cd gql-koa-typescript
Next create a new branch
git checkout -b dataloader
All our work will be saved in this new git branch.
Install dataloader
First, run npm install to get all the dependencies listed in package.json.
Next, install the dataloader package
npm install --save dataloader
Using DataLoader
context.ts
For our resolvers to make use of DataLoader, we will make the loaders available in the context.
"A context is an object that is shared across all resolvers that are executing for a particular operation." (Apollo documentation)
We will create this context below
import DataLoader from 'dataloader';
import { getUserByIndex } from './datastore';

const loaders = (): any => ({
  getUserByIndex: new DataLoader((ids) => {
    return Promise.all(ids.map((id) => getUserByIndex(Number(id))));
  }),
});

const getContext = (ctx: any) => {
  return {
    loaders: loaders(),
  };
};

export default getContext;
First, we import the dataloader package and the getUserByIndex method that is used to query the data store.
Next, we create our loaders function, which returns a map with a new batch-loading DataLoader instance for the getUserByIndex key.
getUserByIndex: new DataLoader((ids) => {
  return Promise.all(ids.map((id) => getUserByIndex(Number(id))));
})
This is where the main work happens. We are just defining a batch function that will be used to load data from our datasource. A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values or Error instances.
Notice that we are still using getUserByIndex internally. The batch function just groups requests that occur within the same tick of the event loop.
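If the datastore exposed a bulk lookup, the batch function could go a step further and resolve the whole batch in a single round trip. Here is a sketch under that assumption; getUsersByIndexes is hypothetical and does not exist in our datastore, which is why we keep the Promise.all version above:
// Hypothetical variant: assumes a bulk getUsersByIndexes(ids) lookup
// that returns user objects carrying an id field.
const loaders = (): any => ({
  getUserByIndex: new DataLoader(async (ids: readonly number[]) => {
    const users = await getUsersByIndexes([...ids]);
    // DataLoader expects results in the same order as the input keys.
    return ids.map((id) => users.find((user: any) => user.id === id));
  }),
});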
Finally, we export a getContext method which just returns our context object. Because Apollo Server calls this function for every incoming request, each request gets a fresh set of loaders, so DataLoader’s cache is scoped to a single request.
const getContext = (ctx: any) => {
  return {
    loaders: loaders(),
  };
};
server.ts
Next, we will update the server.ts file by importing the context and providing it to the Apollo Server instance.
import context from './context';

const server = new ApolloServer({
  debug: true,
  playground: true,
  tracing: true,
  resolvers,
  typeDefs,
  context,
});
resolver.ts
Here we are going to update the resolver functions to use the loaders we defined in context.ts. To achieve this, we will:
- Update the function parameters to include the context.
- Replace the getUserByIndex method with our loader method.
We will do this for all the resolvers. For example, once we are done, the age resolver will look like this.
age: async (id: number, _: any, ctx: any) => {
  const user = await ctx.loaders.getUserByIndex.load(id);
  return user.age;
}
Notice that the context (ctx) is the third parameter in the function definition, as outlined in the Apollo Server resolvers documentation.
Furthermore, we now use ctx.loaders.getUserByIndex.load(id) instead of getUserByIndex(id) to access the datastore.
Make the same changes for the other resolvers.
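For example, following the same pattern as the age resolver, the email and name resolvers become:
email: async (id: number, _: any, ctx: any) => {
  // Same loader as before; repeated keys within this request are cached.
  const user = await ctx.loaders.getUserByIndex.load(id);
  return user.email;
},
name: async (id: number, _: any, ctx: any) => {
  const user = await ctx.loaders.getUserByIndex.load(id);
  return user.name;
},
The hobbies and id resolvers follow the same shape.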
Queries and results
That is all the changes we have to make. We are ready to test out our improvements.
Start the server
npm run start
Querying for a single user
In a browser, go to http://localhost:8080/graphql. This will open the Apollo GraphQL Playground. In the query editor, enter the following
query getUser($id: Int!){
  user(id: $id){
    age
    email
    hobbies
    id
    name
  }
}
In the Query Variables tab, enter the following
{
  "id": 1
}
In the datastore, we count the number of invocations with this line of code.
console.count('getUserByIndex');
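For context, that counter sits inside the getUserByIndex function in datastore.ts. Roughly (the exact lookup logic comes from the previous post, so treat this as an approximation):
// Approximate sketch of getUserByIndex in datastore.ts.
export const getUserByIndex = async (index: number) => {
  console.count('getUserByIndex'); // logs a running count of datastore accesses
  return users[index]; // users: the in-memory list of user objects from the previous post
};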
Taking a look at your terminal, you will see that the count is 1.
server listening at port 8080
getUserByIndex: 1
This means that the datastore was only accessed once, even though all five field resolvers called ctx.loaders.getUserByIndex.load(1): DataLoader deduplicated the repeated key and cached the result for the rest of the request.
Querying for all users
Next let us run the query for all users.
Open a new tab in the playground, enter the query below and click on send.
query getAllUsers{
  allUsers{
    age
    email
    hobbies
    id
    name
  }
}
This should return the information for all the users.
Looking at our terminal again, we should see the following output.
server listening at port 8080
getUserByIndex: 1
getUserByIndex: 2
getUserByIndex: 3
getUserByIndex: 4
getUserByIndex: 5
getUserByIndex: 6
getUserByIndex: 7
getUserByIndex: 8
getUserByIndex: 9
getUserByIndex: 10
getUserByIndex: 11
Note that console.count keeps counting from the previous query, so the first line (getUserByIndex: 1) is left over from the single-user request. The ten additional invocations, counts 2 through 11, mean that to get our list of 10 users we only accessed the datastore 10 times, once per user.
That is it: our DataLoader batch function is working as expected, and we can see a significant improvement over accessing the datastore 5 times for a single user and 50 times for a list of 10 users.
We have seen how we can use the DataLoader package to eliminate duplicate requests to our datasource by batching them.
Have any questions, want to share your thoughts or just say Hi? I’m always excited to connect! Follow me on Twitter or LinkedIn for more insights and discussions. If you’ve found this valuable, please consider sharing it on your social media. Your support through shares and follows means a lot to me!