Automated PostgreSQL Backups with NodeJS and Bash

The database is the holy grail of your application, but, given the unpredictable nature of software, you should always be prepared for media, hardware and software failures. If any of these occurs, the main objective is to get the database back up and running as fast as possible, with minimal disruption to users and no data loss.

The best way to achieve this is to have a detailed backup-and-restore procedure documented in case anything goes wrong with the production database. Sprinkle a bit of automation so the process takes care of itself and you will have saved yourself from some frustrating future headaches.

It’s entirely possible to accomplish this with bash alone, but bash scripting can get complicated and boring pretty fast – the syntax is unintuitive and it’s not the most beginner-friendly language out there, especially for people more used to modern languages like JavaScript. Our philosophy is to reserve those bash muscles for when we really need them.

Backing up a Postgres database

Postgres ships with two handy utilities that help us easily back up and restore our databases – pg_dump and pg_restore.

pg_dump is a built-in way of making consistent database backups (also referred to as ‘dumps’) in a number of different formats and across different platforms. It also provides a schema-only option for people looking to preserve their database schemas and not the data.

If you need to restore the data, Postgres provides the pg_restore utility. Internally, it issues SQL commands such as DROP DATABASE and CREATE DATABASE, bypassing normal restrictions present when using commands such as dropdb. To use pg_restore, however, your dump must not be a plain-text file; pg_restore only works with the archive formats (custom, directory or tar).

It’s also worth noting that pg_dump and pg_restore were created to be as platform-agnostic as possible. You’ll need to do a bit of work if you want to move a Postgres backup into, say, an OracleDB database, but backing up on one operating system and restoring on another works just as well.

To back up a database, we run the following command:

pg_dump -U cray -d my_database -f database-backup-04.09.20.tar -F t
  • -U specifies the database user to run the command as.
  • -d specifies the database to be dumped.
  • -f indicates the file we want to output all the data to.
  • -F specifies the output format, here ‘t’ for ‘.tar’.
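
If all you need are the schema-only dumps mentioned earlier, or you prefer pg_dump’s compressed custom format, the same command only needs a couple of different flags. For example (the file names here are just placeholders):

pg_dump -U cray -d my_database --schema-only -f schema-only-backup.sql
pg_dump -U cray -d my_database -F c -f database-backup-04.09.20.dump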

The resulting output is an archive file containing multiple ‘.dat’ files, which hold your data, and a ‘restore.sql’ file. The SQL file is only included for our benefit.

To restore a backup like this using pg_restore, we run:

pg_restore -cC -d t25_dev bootstrap.tar
  • -c – drop any existing database objects before restoring them. If the tables in the backup already exist in your target database, they will be dropped before being re-created.
    Note: using this option might result in a lot of ‘errors’. These can be safely ignored, as stated at the end of the output (see the example below for a way to silence them).
  • -C – create the database before restoring into it. When this option is used together with -c as above, the database is dropped and recreated.

pg_restore runs SQL statements internally and isn’t limited by connected clients in the same way. Destructive commands like dropdb warn you when there are connected clients; pg_restore won’t.
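
If the ‘errors’ produced by -c bother you, pg_restore also accepts an --if-exists flag (valid only together with -c), which turns the generated DROP statements into DROP ... IF EXISTS and keeps the output quiet:

pg_restore -cC --if-exists -d t25_dev bootstrap.tar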

Running bash commands in NodeJS

Our weapon of choice for running bash commands is @getvim/execute. The library goes to a lot of effort to abstract away the pitfalls that come with bash, like having to learn ‘awk’ for simple string manipulation and the aforementioned cumbersome syntax. It also has some interesting perks, like being able to switch context between JavaScript and bash quite easily.

We also need to install dotenv to take advantage of environment variables, which help us keep sensitive information like passwords out of version control systems like git.

To install both dependencies, run:

npm install @getvim/execute dotenv

Then create a file named ‘.env’ in the root folder and enter the following data:

DB_USER=<your database username>
DB_NAME=<your database name>
PGPASSWORD=<your postgres password> #read directly by pg_dump and pg_restore

Our code for dumping the database and creating a backup file then looks like this:

const {execute} = require('@getvim/execute');
const dotenv = require('dotenv');
dotenv.config();
const username = process.env.DB_USER;
const database = process.env.DB_NAME;
const date = new Date();
const currentDate = `${date.getFullYear()}.${date.getMonth() + 1}.${date.getDate()}.${date.getHours()}.${date.getMinutes()}`;
const fileName = `database-backup-${currentDate}.tar`;

function backup() {
    return execute(`pg_dump -U ${username} -d ${database} -f ${fileName} -F t`).then(async () => {
        console.log("Finito");
    }).catch(err => {
        console.log(err);
    })
}

backup();

The restore function looks similar:

function restore() {
    execute(`pg_restore -cC -d ${database} ${fileName}`).then(() => {
        console.log("Restored");
    }).catch(err => {
        console.log(err);
    })
}

Of course, in practice, the restoration command has to be fine-tuned to your preferences. You definitely don’t want to drop and recreate a live production database, and the potential for damaging or losing data is very real during this step. To mitigate this, preferably do the restoration on an empty database rather than one that already contains objects. When in doubt, consult the pg_restore documentation.
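
One low-risk pattern, for example, is to restore into a freshly created scratch database first and only switch your application over once you have verified the contents (the database name below is just a placeholder):

createdb my_database_restore_check
pg_restore -d my_database_restore_check database-backup-04.09.20.tar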

This would be enough if not for the fact that a backup stored on the same machine isn’t much of a backup. We still need to send it to the backup server.

Compressing the archive file

But first, we need to compress the archive file. This results in a much smaller payload for when we have to do the actual uploading.

While it’s possible to implement a gzipper on your own (the NodeJS documentation on compression is quite comprehensive), it’s a lot more trouble than it’s worth to do from scratch.

Instead, let’s use the handy gzipme package to take care of that.

To install it, run

npm install gzipme -S

And we’ll only change the backup function slightly:

const compress = require('gzipme');
const fs = require('fs');
const fileName = `database-backup-${currentDate}.tar`;

function backup() {
    return execute(`pg_dump -U ${username} -d ${database} -f ${fileName} -F t`).then(async () => {
        //added line
        await compress(fileName);
        fs.unlinkSync(fileName);
        console.log("Finito");
    }).catch(err => {
        console.log(err);
    })
}

This will create a new ‘.tar.gz’ file in your root directory that’s much smaller than the original. While this step isn’t strictly necessary, it makes the upload more efficient, and if you have a lot of data to back up it might be crucial. Once the file has been compressed, we also delete the original file.
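
One thing to keep in mind: pg_restore can’t read the gzipped archive directly, so before restoring you’ll want to decompress it back to the plain ‘.tar’. Here’s a minimal sketch using Node’s built-in zlib module (the decompress helper is our own, not part of gzipme):

const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

function decompress(gzipPath, tarPath) {
    return new Promise((resolve, reject) => {
        pipeline(
            fs.createReadStream(gzipPath),   // read the compressed archive
            zlib.createGunzip(),             // strip the gzip layer
            fs.createWriteStream(tarPath),   // write the plain .tar back out
            (err) => (err ? reject(err) : resolve())
        );
    });
}

// usage, with the variables from the full listing below:
// decompress(fileNameGzip, fileName).then(restore);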

Uploading the backup file to a new machine

At this point, you’re free to let your imagination run wild. Once the archive file is created on your machine, you can transport it to the next server in whichever fashion is most convenient. For example, you could run another bash script that relies on a program such as ‘rsync’ to transfer the file to another server, write a program that sends it to your email address, or upload it to your preferred cloud provider.
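
For instance, a minimal rsync transfer might look like the line below (the remote host and path are placeholders):

rsync -avz database-backup-*.tar.gz user@backup.example.com:/var/backups/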

To keep things simple, a request using axios looks like this:

function sendToBackupServer(fileName = fileNameGzip) {
    const form = new FormData();
    // attach the file's contents as a stream, not just its name
    form.append('file', fs.createReadStream(fileName));
    axios.post('http://my.backupserver.org/private', form, {headers: form.getHeaders(),}).then(result => {
        // Handle result…
        fs.unlinkSync(fileName);
        console.log(result.data);
    }).catch(err => {
        // log error, send it to sentry... etc
        console.error(err);
    });
}
    

Once the upload process has been completed successfully, the backup files are deleted from the current machine so they don’t take up too much space.
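
For completeness, the receiving end could be as small as an Express route that accepts the multipart upload. The sketch below is purely illustrative and assumes Express and multer on the backup server; neither is needed by the backup script itself:

// Hypothetical receiver for the POST above; the field name 'file' must match the form.
const express = require('express');
const multer = require('multer');

const upload = multer({ dest: 'backups/' }); // store uploads under ./backups
const app = express();

app.post('/private', upload.single('file'), (req, res) => {
    // req.file describes the stored archive (path, size, original name)
    res.json({ stored: req.file.originalname, size: req.file.size });
});

app.listen(3000, () => console.log('backup receiver listening on :3000'));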

Automating the process with a cron job

Cron is a useful utility on Linux that is used to schedule when tasks should run. These tasks, referred to as ‘cron jobs’, can be run periodically to perform work like backing up a database, deleting log files and other system maintenance.

The cron syntax consists of five fields separated by whitespace – i.e. “* * * * *” – which are explained below:

* * * * *
| | | | |
| | | | +---- Day of the Week   (range: 0-6, Sunday to Saturday; 7 also means Sunday)
| | | +------ Month of the Year (range: 1-12)
| | +-------- Day of the Month  (range: 1-31)
| +---------- Hour              (range: 0-23)
+------------ Minute            (range: 0-59)

Depending on the cron implementation your system relies on, it could accept finer levels of granularity. For instance, the finest level supported by the node-cron package is seconds, via an optional extra field (there’s an example of this further down).

To install it, run

npm install node-cron -S

If we wanted to run the backup every two hours, we’d use the following expression – 0 */2 * * * (as borrowed from crontab.guru).

The function to achieve the desired result would then be

function startSchedule() {
    cron.schedule('0 */2 * * *', () => {
        // wait for the dump and compression to finish before uploading
        backup().then(() => sendToBackupServer());
    }, {});
}
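
As mentioned earlier, node-cron also understands an optional extra field for seconds at the front of the expression, which is handy when you want to test the schedule without waiting two hours. A minimal sketch (the 30-second interval is just an example):

const cron = require('node-cron');

// runs every 30 seconds; note the additional seconds field at the front
cron.schedule('*/30 * * * * *', () => {
    console.log('tick', new Date().toISOString());
});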

Ultimately, all our code is contained in a single file

const { execute } = require('@getvim/execute');
const compress = require('gzipme');
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
const cron = require('node-cron');

const dotenv = require('dotenv');
dotenv.config();

const username = process.env.DB_USER;
const database = process.env.DB_NAME;
const date = new Date();

const currentDate = `${date.getFullYear()}.${date.getMonth() + 1}.${date.getDate()}.${date.getHours()}.${date.getMinutes()}`;
const fileName = `database-backup-${currentDate}.tar`;
// assumes gzipme names the compressed copy <original>.gz
const fileNameGzip = `${fileName}.gz`;

function backup() {
    // dump the database to a tar archive, compress it, then remove the uncompressed copy
    return execute(
        `pg_dump -U ${username} -d ${database} -f ${fileName} -F t`,
    ).then(async () => {
        await compress(fileName);
        fs.unlinkSync(fileName);
        console.log("Finito");
    }).catch(err => {
        console.log(err);
    });
}

function restore() {
    // pg_restore can't read the gzipped archive directly; decompress it back to
    // the plain .tar (see the zlib sketch above) before calling this function.
    return execute(`pg_restore -cC -d ${database} ${fileName}`)
        .then(() => {
            console.log("Restored");
        }).catch(err => {
            console.log(err);
        });
}

function sendToBackupServer(fileName = fileNameGzip) {
    const form = new FormData();
    // attach the file's contents as a stream, not just its name
    form.append('file', fs.createReadStream(fileName));
    axios.post('http://my.backupserver.org/private', form, {
        headers: form.getHeaders(),
    }).then(result => {
        // Handle result…
        console.log(result.data);
        fs.unlinkSync(fileName);
    }).catch(err => {
        // log error, send it to sentry... etc
        console.error(err);
    });
}

function startSchedule() {
    cron.schedule('0 */2 * * *', () => {
        // wait for the dump and compression to finish before uploading
        backup().then(() => sendToBackupServer());
    }, {});
}

module.exports = {
    startSchedule
}

That can then be used like so

// index.js

const {startSchedule} = require('./backup-database');

startSchedule();

Link to Github project: https://github.com/Bradleykingz/automated-postgres-backups-with-node

Conclusion

In this article, we explored how NodeJS can be used in combination with pg_dump, pg_restore and cron jobs to automatically back up your data and send it to a separate server. The process works the same wherever Postgres runs, but there are caveats to transferring data between different database systems.

Depending on your needs, your backup process will be more or less involved than what we’ve outlined here. You may need to back up all your databases, not just one, as well as ensure database roles can be restored, or maybe all you need is the database schema, not the data. pg_dump is quite powerful and will probably suit your needs with a bit more tinkering.
