08Jan
Optimizing Graphql Data Queries with Data Loader
Optimizing Graphql Data Queries with Data Loader

Introduction

When building any scalable application, performance, as well as speed, are important factors to consider. Hence optimization is crucial. There are numerous optimization techniques you should consider when working with GraphQL servers, but the most common is the caching technique which returns persistent data, and also the batching technique which reduces the number of round trips your server makes to the database. In this tutorial, I will walk you through building a demo GraphQl server to demonstrate the actual use case of these optimization techniques.

Prerequisite

  • Basic knowledge of GraphQl
  • Basic knowledge of JavaScript ES6

What we will build

We will build a demo GraphQl server for a book app with the following features:

  • Create a new library
  • Add books to the library
  • Fetch all books
  • Fetch all libraries

Let’s get started by installing typescript globally with the following command

npm install -g typescript

Create a folder called graphql-optimization. Inside the folder, run npm init to initialize a Node.js project then run tsc --init to create a tsconfig.json file. Then uncomment and update the following in the tsconfig.json file:

“lib”: [“dom”,”es6″],
“outDir”: “dist”,
“rootDir”: “src”,
“removeComments”: true, 
“strict”: true, 
“esModuleInterop”: true,
“forceConsistentCasingInFileNames”: true

Next, inside the GraphQl-optimization folder create a new folder called src.

The src directory will contain the following files:

  • database.ts
  • index.ts
  • resolver.ts
  • schema.ts

Dependency Installation

Our demo GraphQl server requires the following dependencies:

@types/sqlite3: provides built-in type declarations for sqlite3
apollo-server-express: It allows us to create GraphQL servers with Node.js express framework
dataloader: A generic utility for optimizing data fetch via batching and caching.
express: A Node.js framework for building applications
graphql-tag: A JavaScript template literal tag that parses GraphQL query strings into the standard GraphQL AST
knex: SQL query builder for Postgres, MSSQL, and other relational databases
nodemon: automatically restarts node application when file content changes
sqlite3: A simple database wrapper that makes SQLite database interactions much easier. SQLite is an open-source SQL database that stores data to a text file on a device.

dev dependencies:

ts-node: allows TypeScript to run seamlessly in a Node.js environment
typescript: speeds up development experience by catching errors and providing fixes at runtime

Update package.json with the following:

{
  "name": "graphql-optimization",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "start": "nodemon --exec \"ts-node\" src/index.ts"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@types/sqlite3": "^3.1.6",
    "apollo-server-express": "^2.10.1",
    "dataloader": "^2.0.0",
    "express": "^4.17.1",
    "graphql-tag": "^2.10.3",
    "knex": "^0.21.1",
    "nodemon": "^2.0.6",
    "sqlite3": "^4.2.0"
  },
  "devDependencies": {
    "ts-node": "^8.6.2",
    "typescript": "^3.8.2"
  }
}

Run the following command to install the above dependencies:

npm install

Navigate to src/database.ts in the src directory, and add the following code:

let knex: any = null;
 
async function startDatabase() {
  if (!knex) {
    knex = require('knex')({
      client: 'sqlite3',
      connection: {
        filename: ':memory:',
      },
    });
 
    await createDatabase(knex);
 
    console.log('database initialized');
  }
 
  return knex;
}
 
async function createDatabase(knex: any) {
  await knex.schema
    .createTable('authors', (table: any) => {
      table.increments('id');
      table.string('first_name');
      table.string('last_name');
    })
    .createTable('books', (table: any) => {
      table.increments('id');
      table.string('title');
      table.integer('author_id').unsigned().references('authors.id');
    })
    .createTable('libraries', (table: any) => {
      table.increments('id');
      table.string('name');
      table.string('description');
    })
    .createTable('libraries_books', (table: any) => {
      table.increments('id');
      table.integer('library_id').unsigned().references('libraries.id');
      table.integer('book_id').unsigned().references('books.id');
    });
 
  await knex('authors').insert([
    { id: 1, first_name: 'Mister', last_name: 'Roro' },
    { id: 2, first_name: 'Styled', last_name: 'West' },
    { id: 3, first_name: 'Daddy', last_name: 'Ice' },
  ]);
 
  await knex('books').insert([
    { title: 'Awesome tunes', author_id: 1 },
    { title: 'Starry Window', author_id: 2 },
    { title: 'Upbeat vocals', author_id: 2 },
    { title: 'Rotten', author_id: 3 },
  ]);
 
  await knex('libraries').insert({
    id: 1,
    name: 'My favorite songs',
    description: 'Lorem ipsum',
  });
 
  await knex('libraries_books').insert([{ library_id: 1, book_id: 1 }]);
 
  return true;
}
 
export default startDatabase;

Here, the startDatabase function is responsible for the database configuration, thereafter,  the createDatabase function gets triggered. createDatabase function creates all the tables and seeds them with some dummy data.

Navigate to src/schema.ts and add the following code:

import { gql } from 'apollo-server-express';
 
export default gql`
  type Book {
    id: ID!
    title: String!
    author: Author
  }
 
  type Author {
    id: ID!
    name: String!
    books: [Book]
  }
 
  type Library {
    id: ID!
    name: String!
    description: String
    books: [Book]
  }
  
  type Query {
    books: [Book]
    library(id: ID!): Library
  }
 
`;

In the code block above, we define a type Book that consists of:

  •  a required integer field id,
  • required string field title
  • an optional object author that establishes a relationship between a book and its author.

Next, we define a type Author that consists of:

  •  a required integer field id,
  • required string field name
  • an optional object books which establish a relationship between an author and its books.

Next, we define a type Library that consists of:

  • a required integer field id,
  • required string fields name and description
  • an optional object books which establish a relationship between a Library and its books.

Finally, we create the Query operation type that defines the queries that the server will
accept and resolve. The books and library queries return an array of
Books and a single Library by its id respectively.

Navigate to src/resolvers.ts and add the following code:

type Context = {
  knex: any;
};
 
type LibraryInput = {
  id: string;
};
 
async function getBooks(
  _: any,
  {},
  { knex }: Context,
): Promise<any> {
  if (!knex) throw new Error('Not connected to the database');
 
  return await knex('books').select();
}
 
async function getAuthor(
  { author_id: authorId }: any,
  {},
  { knex }: Context,
): Promise<any> {
  if (!knex) throw new Error('Not connected to the database');
 
  console.log('database called', authorId);
  return await knex('authors')
  .where('id', authorId)
  .select()
  .first()
}
 
async function getLibrary(
  _: any,
  { id }: LibraryInput,
  { knex }: Context,
): Promise<any> {
  if (!knex) throw new Error('Not connected to the database');
 
  const library = await knex('libraries').where('id', id).select();
 
  if (!library.length) throw new Error('Library not found');
 
  return library[0];
}
 
async function getBooksByLibrary(
  { id }: any,
  {}: any,
  { knex }: Context,
): Promise<any> {
  return await knex('libraries_books')
    .where('library_id', id)
    .join('books', 'libraries_books.book_id', 'books.id')
    .select('books.*');
}
 
async function getBooksByAuthor(
  { id }: any,
  {}: any,
  { knex }: Context,
): Promise<any> {
  return await knex('books').where('author_id', id);
}
 
 
 
const resolvers = {
  Book: {
    author: getAuthor,
  },
  Library: {
    books: getBooksByLibrary,
  },
  Author: {
    books: getBooksByAuthor,
  },
  Query: {
    books: getBooks,
    library: getLibrary,
  },
};
 
export default resolvers;

In the code block above we create a resolver for the getBooks query which returns all the books in the database.
Next, we create resolver functions for the getAuthor and getLibrary queries which return an author and a Library by their ids respectively.
Next, we create a resolver function for the getBooksByLibrary query which returns all books in a particular library.
Next, we create a resolver function for the getBooksByAuthor query which returns all books owned by a particular author.
Next, we create a resolvers object containing all the resolver functions we created earlier.
Finally, we export the resolvers object.

Setting up the Graphql server

To get the GraphQl server up and running. You should add the following code to src/index.ts

import express from 'express';
import { ApolloServer } from 'apollo-server-express';
 
import typeDefs from './schema';
import resolvers from './resolvers';
import startDatabase from './database';
 
const server = new ApolloServer({
  typeDefs,
  resolvers,
  context: async () => {
    const knex = await startDatabase();
 
    return { knex };
  },
});
 
const app = express();
server.applyMiddleware({ app });
 
const PORT = 4000;
 
app.listen(PORT, () => {
  console.log(
    `GraphQL endpoint and playground available at http://localhost:${PORT}${server.graphqlPath}`,
  );
});

Here we create an instance of ApolloServer which requires the resolvers, schema, and context. The database is initiated in the context making it accessible to the resolvers. Here, we use the Knex.js package to reduce the lines of code that we need to write in order to query or mutate the database. The asynchronous callback in the context field awaits the initialization of the database and adds the database object to the context.

Next, we instantiate the Express server and extends it with the GraphQL server functionality.

Finally, the app.listen method starts the GraphQl server at the provided port 4000.

Run the npm start command to launch our GraphQl server and open http://localhost:4000/graphql in your browser to access the GraphQl playground.

Optimizing Graphql Queries

Right now, our demo application is working as expected but then, some extra requests are being made to the database which is quite expensive for a production-ready app.

Head over to your GraphQl playground and run the following query:

query {
  books {
    title
    author {
      id
      
    }
  }
}

Notice the following logs in the terminal:

database query id 1
database query id 2
database query id 2
database query id 3

Have you seen that the author with the id of 2 is requested from the database twice?

One way to prevent this unnecessary database request is to introduce the batching technique.

In GraphQl, Batching is an optimization technique that involves sending multiple queries to the server in one request, thereby preventing unnecessary database requests.

We will use the dataloader package to optimize our queries via the batching technique.

Install dataloader package with the following command:

npm install --save dataloader

Head over to src/index.ts and add the following:

...

import DataLoader from 'dataloader';
 
const dataLoaders = async () => {
  const db = await startDatabase();
 
  return {
    author: new DataLoader((ids) => {
      console.log('database queried', ids);
 
      return db('artists').whereIn('id', ids).select();
    }),
  };
};
 
const server = new ApolloServer({
  typeDefs,
  resolvers,
  context: async () => {
    const knex = await startDatabase();
    const loaders = await dataLoaders();
 
    return { knex, loaders };
  },
});
 
...

Here, we create data loaders to collect all the ids needed for our queries and use these ids to query the database at once. The dataloader package deletes any duplicate ids collected so as to reduce the number of requests sent to the database.
Next, we add the data loaders to the context object.

Also, we need to pass down the dataLoaderinstance for the author to our getAuthor resolver in order to complete the optimization.

In src/resolver.ts add loaders to the Context type and replace the getAuthor resolver function as follows:

type Context = {
  knex: any;
  loaders: any;
};
...
 
async function getAuthor(
  { author_id: authorId }: any,
  {},
  { loaders }: Context,
): Promise<any> {
 
  return await loaders.author.load(authorId);
}

Instead of querying the database from the resolver, we allow the dataLoader to handle them so as to eliminate duplicate ids.

Now if you run the same query again, you’ll see the following logged in your terminal:

database queried [1, 2, 3]

From this response, you can see that the duplicate request we had earlier is eliminated making our queries more optimized.
Now, instead of sending four queries to the database, a single query is made with unique ids.

Caching

query {
  books {
    title
    author {
      id
    }
  }
  library(id: 1) {
    books {
      id
      title
      author {
        id
      }
    }
  }
}

Consider the above code block where we query for all books, but we also want to query books in a library and their author Here, we query the database for all books first followed by a query for all the books in a library. Since we are already using the dataloader package, per-request caching is enabled by default hence preventing fetching of duplicate data.

Any author that is fetched in the first query is cached and if the second query tries fetching an author that is cached, instead of sending another request to the database, it will simply fetch the cached author thereby increasing the speed of the application.

Conclusion

In this tutorial, you have created an optimized GraphQL server that used batching and per-request caching optimization techniques to improve the performance of our GraphQl server.

I hope you have learned a great deal from this tutorial. Do reach out in the comment section below if you have any questions or suggestions.

Here is the Github repo for the final version of our project.

Leave a Reply