21Oct
Verifying an Email Address Without Sending an Email in NodeJS

Almost every platform on the internet needs to be able to uniquely identify its users, and email addresses are the most common mechanism for achieving this. However, they are fraught with all manner of issues – from spam accounts to blatant trolls, perpetrated by bots and other malicious actors.

All user-provided input should be validated before being saved to the database, and doubly so for email addresses, considering the important role they serve.

Most systems implement this functionality by running a simple regex check against the email address for syntax validation and sending an email to the user-provided address. However, this might not be enough for a lot of systems.

Why isn’t validation enough?

The simplest way to weed out bad email addresses is by validating them. Validation only ensures that the email address provided by the client is syntactically correct. In the strictest terms, a valid email address should conform to the RFC 2822 specification (since superseded by RFC 5322). Most applications can, however, get away with being a bit stricter than the specification requires in order to weed out bad actors.

Email validation is a relatively simple problem. All you need is a simple regex and you’re fine. If the requirements for your application are less prohibitive, you could even get away with simply detecting whether there’s an ‘@’ sign in the provided email address.
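As a minimal sketch of the kind of check described above, the following pattern only requires some text, an ‘@’ sign and a dotted domain. The names SIMPLE_EMAIL_PATTERN and looksLikeEmail are illustrative, and the pattern is deliberately loose rather than a faithful implementation of the RFC grammar.

```javascript
// Deliberately loose syntax check: a non-empty local part, one '@',
// and a domain containing at least one dot. No whitespace anywhere.
const SIMPLE_EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function looksLikeEmail(input) {
  return typeof input === 'string' && SIMPLE_EMAIL_PATTERN.test(input);
}

console.log(looksLikeEmail('user@example.com')); // true
console.log(looksLikeEmail('not-an-email'));     // false
```

A check like this only says the input is shaped like an email address; it says nothing about whether the mailbox exists.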

Email verification is a different problem on its own. Applications need to be able to verify email addresses in order to:

i) Ensure that the email exists.
ii) Ensure that the email belongs to that particular user.

This guide is going to focus on the first part of the problem (making sure that the given email address exists). If your application needs to verify ownership of an email address, you probably won’t be able to escape sending an actual email to the user.

Verifying an email address

A lot of different factors can go into verifying an email address, the number and scope of which are best left to individual applications. However, some common issues you want to be on the lookout for include:

  • Non-existent email addresses
  • Temporary email addresses
  • Common typos

Non-existent email addresses

Non-existent email addresses come in two forms – those with valid domain names and those with invalid ones.

The latter type of email addresses are relatively easy to weed out – we can perform a simple DNS check to make sure the domain actually exists. If it doesn’t, the email address is obviously fake.

Fake email addresses with a valid domain name are comparatively more difficult to detect. A crude way of spotting them would be sending an actual email and waiting for the message to either be delivered or bounce. A bounced email will clearly indicate an invalid email address.

We’ll explore a more sophisticated way of achieving the same result later on.

Temporary email addresses

Temporary email addresses are ephemeral – they exist for a short time and will most likely never be assigned to another user in the system. They aren’t a problem by themselves – there are a lot of legitimate uses for them – but they tend to be quite popular within spammer circles.

If your platform receives a high incidence of such email addresses, you should probably start filtering them out at the point of registration.

Detecting common typos

Lastly, you should also be on the lookout for common typos. This will mostly be useful for correcting common spelling mistakes by legitimate users, but could also come in handy for rejecting accounts by people trying to cheat your verification checks.

Bootstrapping the project

We are going to create a simple endpoint that accepts an email and password in the request body and returns a 200 status code if the email is valid or a 400 error code with a message explaining why the request failed otherwise.

We are going to need just one dependency: Deep Email Validator

To install it, run:

npm install deep-email-validator

This dependency is going to take care of all the heavy lifting for us. It carries out the following tests:

  • Validates the email address against a regex.
  • Checks for common typos.
  • Determines whether the address was generated by a disposable email service.
  • Checks for MX records on the DNS server.
  • Determines whether the SMTP server is running.
  • Validates the existence of the mailbox on the SMTP server.

Our logic looks like this:

const express = require('express');
const router = express.Router();
const emailValidator = require('deep-email-validator');

// Runs the library's checks and resolves with the combined result.
async function isEmailValid(email) {
  return emailValidator.validate(email);
}

When this function is run, it returns a response similar to:

{
  valid: false,
  validators: {
    regex: { valid: true },
    typo: { valid: true },
    disposable: { valid: true },
    mx: { valid: true },
    smtp: { valid: false, reason: 'Mailbox not found.' }
  },
  reason: 'smtp'
}

The validator runs five checks (regex, typo, disposable, mx and smtp) and reports the outcome of each one. The top-level ‘valid’ flag is true only if all of them pass; otherwise ‘reason’ names the check that failed.
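To show how such a result might be consumed, here is a small sketch that lists the checks that did not pass. It assumes the result has exactly the shape shown above; failedChecks is a hypothetical helper, not part of the library.

```javascript
// Sample result copied from the response shown above.
const sampleResult = {
  valid: false,
  validators: {
    regex: { valid: true },
    typo: { valid: true },
    disposable: { valid: true },
    mx: { valid: true },
    smtp: { valid: false, reason: 'Mailbox not found.' }
  },
  reason: 'smtp'
};

// Returns one { check, reason } entry per validator that did not pass.
function failedChecks(result) {
  return Object.entries(result.validators || {})
    .filter(([, outcome]) => outcome && outcome.valid === false)
    .map(([check, outcome]) => ({ check, reason: outcome.reason || 'unknown' }));
}

console.log(failedChecks(sampleResult));
// [ { check: 'smtp', reason: 'Mailbox not found.' } ]
```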

Let’s create a simple route that makes use of this logic:

// Assumes the app parses JSON bodies, e.g. with app.use(express.json()).
router.post('/register', async function(req, res, next) {
  const {email, password} = req.body;

  if (!email || !password){
    return res.status(400).send({
      message: "Email or password missing."
    })
  }

  const {valid, reason, validators} = await isEmailValid(email);

  if (valid) return res.send({message: "OK"});

  // `reason` names the failed check; fall back to it if no detail exists.
  return res.status(400).send({
    message: "Please provide a valid email address.",
    reason: validators[reason] ? validators[reason].reason : reason
  })

});

module.exports = router;

That marks the end of the actual code we will need to verify an email address without sending an email in NodeJS.

But as it stands, everything is done inside a black box. It helps to understand what’s going on inside the library so that we can set realistic expectations of our system and know how to deal with issues when they arise.

How it Works Internally

Detecting common typos

A simple way users might try to circumvent email validation checks is by slightly changing the spelling of the domain part of their address. (Alternatively, of course, this might be a completely innocent error on the user’s part.)

Deep Email Validator relies on Mailcheck internally, a simple email typo-detection library. Mailcheck contains a list of common domain names and uses the Sift4 algorithm to measure the distance between the domain of the provided email and each entry in that list.

We won’t go deep into string manipulation, but it helps to be familiar with the Levenshtein Distance (LD) in order to understand how such algorithms work. This is the minimum number of single-character edits (insertions, deletions or substitutions) needed to transform one string into another.

For instance, ‘[email protected]’ and ‘[email protected]’ have an LD of 1: it takes just one edit (a deletion of the ‘o’ character) to transform the first string into the second. However, ‘[email protected]’ and ‘[email protected]’ have an LD of 6: it takes 6 deletions (removing the ‘yandex’ part) to make the two strings equal. Therefore, it’s very likely the user meant to input ‘[email protected]’.
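The idea can be sketched with a plain Levenshtein implementation. Note that this is not the Sift4 algorithm Mailcheck uses, only the simpler distance described above; KNOWN_DOMAINS, suggestDomain and the threshold of 2 are assumptions made for illustration.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, substitution);
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical list; a real one would be much longer.
const KNOWN_DOMAINS = ['gmail.com', 'yahoo.com', 'outlook.com', 'hotmail.com'];

// Suggests the closest known domain within maxDistance edits, or null if
// the domain is already known or nothing is close enough.
function suggestDomain(domain, maxDistance = 2) {
  const lower = domain.toLowerCase();
  if (KNOWN_DOMAINS.includes(lower)) return null;
  let best = null;
  for (const candidate of KNOWN_DOMAINS) {
    const distance = levenshtein(lower, candidate);
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { domain: candidate, distance };
    }
  }
  return best ? best.domain : null;
}

console.log(levenshtein('gmal.com', 'gmail.com')); // 1
console.log(suggestDomain('gmal.com'));            // gmail.com
```

A small threshold keeps the suggestions conservative: domains that are merely unfamiliar are left alone, and only near-misses of a known domain trigger a suggestion.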

Detecting a temporary email address

Temporary email addresses are detected using disposable-email-addresses. It maintains a list of common disposable email address hosts as scraped from Wikipedia. These are matched and verified against the provided email address.
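In essence, this kind of check is a set lookup on the domain part of the address. The sketch below assumes a tiny hard-coded list for illustration; DISPOSABLE_DOMAINS and isDisposable are hypothetical names, and a real blocklist would contain thousands of entries.

```javascript
// Illustrative entries only; real blocklists are far longer.
const DISPOSABLE_DOMAINS = new Set([
  'mailinator.com',
  'guerrillamail.com',
  '10minutemail.com'
]);

function isDisposable(email) {
  const at = email.lastIndexOf('@');
  if (at === -1) return false; // no domain part to look up
  const domain = email.slice(at + 1).trim().toLowerCase();
  return DISPOSABLE_DOMAINS.has(domain);
}

console.log(isDisposable('someone@mailinator.com')); // true
console.log(isDisposable('someone@example.com'));    // false
```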

Checking for MX records

Every email address is composed of a local part and a domain, separated by the ‘@’ sign. The MX check concerns only the domain.

A Mail Exchanger (MX) record specifies the email server responsible for accepting email messages on behalf of a domain name. Therefore, we can query the DNS servers using Node’s built-in ‘dns’ module to look up the MX records for a domain. Domains that don’t exist or have no MX records will produce an error.

This is what the library does internally.
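A rough sketch of such a lookup, using the promise-based API of Node’s ‘dns’ module, might look like the following. This is not the library’s own code; findMailServers and sortByPriority are hypothetical helpers, and the resolver is passed in as a parameter only so the function can be exercised without network access.

```javascript
const dns = require('dns').promises;

// Lower priority values are preferred, so sort ascending.
function sortByPriority(records) {
  return [...records].sort((a, b) => a.priority - b.priority);
}

// Resolves to the mail server host names for a domain, most preferred first.
// Any lookup error (unknown domain, no MX records, network failure)
// is treated as "no mail servers" and yields an empty array.
async function findMailServers(domain, resolveMx = (name) => dns.resolveMx(name)) {
  try {
    const records = await resolveMx(domain);
    return sortByPriority(records).map((record) => record.exchange);
  } catch (error) {
    return [];
  }
}

// Usage (performs a real DNS query):
// findMailServers('example.com').then((servers) => console.log(servers));
```

Treating every error as an empty result keeps the sketch short, but a production check would probably want to distinguish a missing domain from a temporary DNS failure.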

Determining whether an SMTP server is online

When an email is actually sent, the Mail Transfer Agent (MTA) is responsible for connecting to the recipient’s SMTP server, querying for the mailbox and delivering the message if the address exists.

We can use Node’s built-in ‘net’ module to connect to the default SMTP port, 25, and ask whether or not the given mailbox exists. If it does, the server responds with a 250 OK message. The above functionality has been implemented in this file.
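The exchange described above can be sketched roughly as follows: open a socket, send HELO, MAIL FROM and RCPT TO, and read the status code of the final reply. This is an illustration under stated assumptions rather than the library’s implementation; probeMailbox and replyCode are hypothetical names, and the HELO host and sender address are placeholders. Many servers accept every recipient or refuse such probes, and many networks block outbound port 25, so an inconclusive answer is common.

```javascript
const net = require('net');

// Pulls the three-digit status code out of an SMTP reply line like "250 OK".
function replyCode(line) {
  const match = /^(\d{3})(?:[ -]|$)/.exec(line);
  return match ? Number(match[1]) : null;
}

// Resolves to { exists, reason } where exists is true, false,
// or null when the server gave no definite answer.
function probeMailbox(exchange, address, options = {}) {
  const { port = 25, timeoutMs = 10000, sender = 'probe@example.com' } = options;
  // Placeholder HELO host and sender; replace with values you control.
  const commands = ['HELO example.com', `MAIL FROM:<${sender}>`, `RCPT TO:<${address}>`];

  return new Promise((resolve) => {
    const socket = net.createConnection(port, exchange);
    let sent = 0; // how many commands have been written so far
    let buffer = '';
    let done = false;

    const finish = (exists, reason) => {
      if (done) return;
      done = true;
      socket.destroy();
      resolve({ exists, reason });
    };

    socket.setEncoding('utf8');
    socket.setTimeout(timeoutMs, () => finish(null, 'timeout'));
    socket.on('error', (error) => finish(null, error.code || error.message));
    socket.on('close', () => finish(null, 'connection closed'));
    socket.on('data', (chunk) => {
      buffer += chunk;
      let end;
      while (!done && (end = buffer.indexOf('\n')) !== -1) {
        const line = buffer.slice(0, end).trim();
        buffer = buffer.slice(end + 1);
        if (/^\d{3}-/.test(line)) continue; // continuation of a multi-line reply
        const code = replyCode(line);
        if (sent === commands.length) {
          // This line is the reply to RCPT TO.
          if (code === 250) finish(true, line);
          else if (code === 550) finish(false, line);
          else finish(null, line);
        } else if (code !== null && code >= 200 && code < 300) {
          socket.write(commands[sent++] + '\r\n'); // greeting or previous command accepted
        } else {
          finish(null, line);
        }
      }
    });
  });
}
```

The first argument would typically be one of the hosts returned by the MX lookup for the address’s domain.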

Potential pitfalls

While this solution works pretty well, it’s not perfect. If your system receives a lot of fake email registrations, pinging the mail server to determine whether a mailbox exists could dramatically cut down on the amount of bad data in your database.

However, it’s worth considering whether adding this functionality to your app is worth the time and effort that comes with maintaining the code.

For most applications, in fact, this validation method alone won’t be enough. You will probably want to verify ownership of the email address rather than its existence alone. The simplest and most efficient way of doing that today is by sending a link or code that the user can click or enter into your system in order to uniquely identify them.
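As a rough sketch of the token side of such a flow, the following uses Node’s built-in ‘crypto’ module to generate a random code and check it later. The function names, the 24-hour lifetime and the record shape are assumptions for illustration; how the token is stored and emailed is left to the application.

```javascript
const crypto = require('crypto');

// Creates a random hex token and the time at which it stops being valid.
function createVerificationToken(ttlMs = 24 * 60 * 60 * 1000, now = Date.now()) {
  return {
    token: crypto.randomBytes(32).toString('hex'),
    expiresAt: now + ttlMs
  };
}

// Compares the presented token with the stored one in constant time
// and rejects it once it has expired.
function isTokenValid(stored, presented, now = Date.now()) {
  if (typeof presented !== 'string' || now > stored.expiresAt) return false;
  const expected = Buffer.from(stored.token);
  const actual = Buffer.from(presented);
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const record = createVerificationToken();
console.log(isTokenValid(record, record.token)); // true
console.log(isTokenValid(record, 'wrong'));      // false
```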

Consider an email address such as ‘[email protected]’, for example. While it’s perfectly valid syntactically and will pass all of our tests, it obviously can’t belong to a genuine user.

Since it’s impossible to keep all the bad actors out, make sure your system implements a robust form of account activation and deactivation. Only activated (verified) users should be able to access crucial parts of your application. Storage is cheap. A single email address and other randomly-filled columns won’t do much to dent your storage, unless your application has serious spam issues.

Lastly, the code for this application can be found here.

Conclusion

Verifying email addresses is a crucial part of keeping invalid accounts out of your system. You can do so without sending the customary email, but extra precautions will probably have to be taken to make sure that account ownership is also verified, in addition to syntax validation.
