24Jan
Advanced NodeJs

What makes Node.js so performant and scalable? Why is Node the technology of choice for so many companies? In this article, we will answer these questions and look at some of the advanced concepts that make Node.js unique. We will discuss:

  1. Event Loop ➰
  2. Concurrency Model 🚈
  3. Child Process 🎛️
  4. Threads and Worker Threads 🧵

JavaScript developers with a deeper understanding of Node.js reportedly earn 20% ~ 30% more than their peers. If you are looking to grow your knowledge of Node.js then this blog post is for you. Let’s dive in 🤿!!

What happens when you run a Node.js Program?

When we run our Node.js app, it creates:

  • 1 Process 🤖
  • 1 Thread 🧵
  • 1 Event Loop ➰

A process is an executing program or a part of an executing program. An application can be made out of many processes. Node.js runtime, however, initiates only one process.

A thread is a basic unit to which the operating system allocates processor time. Think of threads as a unit that lets you use part of your processor.

An event loop is a continuously running loop (just like a while loop). It executes one command at a time (more on this later). For now, let’s think of it as a while loop that will run until Node has executed every line of code.

Now, let’s take a look at how our code runs inside a Node.js instance.

console.log('Task 1');
console.log('Task 2');
// some time consuming for loop
for(let i = 0; i < 1000000000; i++) {
}
console.log('Task 3');

What happens when we run this code? It will first print out `Task 1` then `Task 2` and then it will run the time consuming for loop (we won’t see anything in the terminal for a couple seconds) and finally it will print out `Task 3`. Let’s look at a diagram of what’s actually happening.

[Diagram: Component 1]

Node puts all our tasks into an events queue and sends them one by one to the event loop. The event loop is single-threaded and can only run one thing at a time. So it goes through Task 1 and Task 2, then the very big for loop, and then Task 3. This is why we see a pause in the terminal after Task 2: the event loop is busy running the for loop.
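To see that pause in numbers, here is a minimal sketch. The helper names `busyWait` and `measureTimerDelay` are made up for this illustration. It schedules a timer for 10 ms, then keeps the thread busy for 200 ms, and reports how late the timer callback actually ran.

```javascript
// Hypothetical helper: keep the thread busy for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread in the meantime
  }
}

// Schedule a timer, then block the thread.
// Resolves with the number of milliseconds until the timer callback ran.
function measureTimerDelay(timerMs = 10, blockMs = 200) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), timerMs);
    busyWait(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDelay().then((elapsed) => {
  console.log(`timer requested after 10 ms, fired after about ${elapsed} ms`);
});
```

The timer cannot fire while the loop is running, so the reported delay is at least as long as the blocking loop.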

Now let’s do something different. Let’s replace that for loop with an I/O event.

const fs = require('fs');

console.log('Task 1');
console.log('Task 2');
fs.readFile('./ridiculously_large_file.txt', (err, data) => {
    if (err) throw err;
    console.log('done reading file');
    process.exit();
});
console.log('Task 3');

Pro tip: you can generate a 100 MB file on Linux or macOS by running this command: `dd if=/dev/urandom of=ridiculously_large_file.txt bs=1048576 count=100`

We would naturally assume that this will output something similar. Just like the for loop, reading a big file takes time, so we would expect execution on the event loop to pause. However, we get something totally different.

Task 1
Task 2
Task 3
done reading file

But what caused this? How did Task 3 get executed before the file was read? Let’s take a look at the visuals below to see what’s happening.

[Diagram: Component 2]

I/O tasks, network requests and database operations are classified as blocking tasks in Node.js. Whenever the event loop encounters one of these tasks, it sends it off to a different thread and moves on to the next task in the events queue. A thread from the thread pool handles each blocking task and, when it is done, puts the result in a callback queue. When the event loop is done executing everything in the events queue, it starts executing the tasks in the callback queue. That’s why we see `done reading file` at the end.
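The ordering described above can be checked with a small sketch. The function name `runTasks` is made up for this illustration, and instead of the generated file it reads the Node binary itself (`process.execPath`), simply because that is a reasonably large file that always exists.

```javascript
const fs = require('fs');

// Records the order in which the steps run.
function runTasks() {
  return new Promise((resolve, reject) => {
    const order = [];
    order.push('Task 1');
    order.push('Task 2');
    // The read is handed off; its callback runs only after the current
    // synchronous code has finished.
    fs.readFile(process.execPath, (err) => {
      if (err) return reject(err);
      order.push('done reading file');
      resolve(order);
    });
    order.push('Task 3');
  });
}

runTasks().then((order) => console.log(order.join(' -> ')));
// prints: Task 1 -> Task 2 -> Task 3 -> done reading file
```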

What makes the Single Threaded Event Loop Model Efficient? ⚙️

JavaScript was created to do just simple things in web browsers, such as form validation or simple animations. This is why it was built with the single-threaded event loop model. Running everything in one thread is often considered a disadvantage.

However, in 2009 Ryan Dahl, the creator of Node, saw this simple event loop model as an opportunity to build a lightweight web server.

To better understand what problem Node.js solves, we should look at what typical web servers were like before Node.js came into play.

This is how a traditional multi-threaded web application model handles requests:

  1. It maintains a thread pool (a collection of available threads)
  2. When a client request comes in, a thread is assigned to it
  3. This thread takes care of reading the client request, processing it, performing any blocking I/O operations (if required) and preparing the response
  4. This thread is not free until a response is sent back

The main drawback of this model is handling concurrent users. If more users visit our site than there are available threads, some users have to wait until a thread frees up before they get a response. If a lot of users are performing blocking I/O tasks, this wait time increases further. The model is also very resource-heavy: if we expect one million concurrent users, we had better make sure we have enough threads to handle those requests.

Moreover, the server itself starts to slow down as the load increases. There is also the overhead of context switching between threads, and writing applications that share resources safely across threads can be painful.

Because of its single-threaded model, Node.js doesn’t need to spin off a new thread for every single request. Node.js also delegates blocking tasks to other components, as we saw earlier. Since we don’t have to manage many threads, Node.js is very lightweight and well suited to microservice-based architectures.
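As a rough illustration of the point, the sketch below simulates many requests that each wait on I/O. `handleRequest` and `serveConcurrently` are hypothetical names, and a timer stands in for a real database or network call. All the waits overlap on the single thread, so the total time stays close to the duration of one request rather than the sum of all of them.

```javascript
// Hypothetical stand-in for a request handler that waits on I/O
// (a database query, an HTTP call, ...) for `ms` milliseconds.
function handleRequest(id, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(id), ms));
}

// Start `count` simulated requests at once and time how long they all take.
async function serveConcurrently(count = 1000, ms = 100) {
  const startedAt = Date.now();
  const requests = [];
  for (let id = 0; id < count; id++) {
    requests.push(handleRequest(id, ms));
  }
  const results = await Promise.all(requests);
  return { handled: results.length, elapsedMs: Date.now() - startedAt };
}

serveConcurrently().then(({ handled, elapsedMs }) => {
  console.log(`${handled} requests finished in about ${elapsedMs} ms on one thread`);
});
```

This only holds while the handlers are waiting rather than computing. A CPU-heavy handler would block every other request.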

Drawbacks of Node’s Single Threaded Model !!!

The single-threaded event loop architecture uses resources efficiently, but it does have some drawbacks. A Node.js instance cannot immediately benefit from multiple cores in your CPU. A Java application can have immediate access to more resources as we upgrade our hardware, but Node runs on a single thread.

This is 2020 😄 and we are seeing more and more complicated web applications. What if our application needs to do complex computation or run a machine learning algorithm? Or what if we want to run a complicated crypto algorithm? In these cases, we have to harness the power of multiple cores to increase performance.

Languages like Java and C# can programmatically initiate threads and harness the power of multiple cores. In Node.js that is not an option, as we saw earlier. Node’s way of solving this problem is child_process.

Child Process in Node

The child_process module gives Node the ability to spawn child processes by accessing operating system commands.
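As a small sketch of running an operating system command from Node, the example below uses `execFile` from child_process. The wrapper name `run` is made up for this illustration. It launches the Node binary itself (`process.execPath`) with `--version`, so it does not assume any other program is installed.

```javascript
const { execFile } = require('child_process');

// Run a command in a child process and resolve with its trimmed stdout.
function run(command, args) {
  return new Promise((resolve, reject) => {
    execFile(command, args, (error, stdout) => {
      if (error) return reject(error);
      resolve(stdout.trim());
    });
  });
}

// process.execPath is the path of the node binary that is currently running.
run(process.execPath, ['--version']).then((version) => {
  console.log(`child process reported ${version}; parent is ${process.version}`);
});
```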

Let’s assume we have a REST endpoint that has a long-running function and we would like to use multiple cores in our processor to execute this function.

Here’s our code

// server.js (parent process)
// `app` is assumed to be an Express app with a JSON body parser configured
const { fork } = require('child_process');

app.get('/endpoint', (request, response) => {
   // fork another process
   const process_ml_algo = fork('./process_data.js');
   const data = request.body.data;
   // send the data to the forked process
   process_ml_algo.send({ data });
   // listen for the result from the forked process
   process_ml_algo.on('message', ({ result }) => {
     console.log(`ml_algo executed with ${result}`);
     process_ml_algo.kill(); // the child is no longer needed
   });
   return response.json({ status: true, sent: true });
});

// process_data.js (forked child process)
// receive message from the parent process
process.on('message', async (message) => {
    // runMachineLearningProcess stands in for your own long-running function
    const result = await runMachineLearningProcess(message.data);

    // send response to the parent process
    process.send({ result });
});

In the example above, we demonstrate how we can spin off a new process and pass data between the parent and the child. Using forked processes, we can take advantage of multiple CPU cores.
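If you want to try the parent and child message flow without setting up a server, here is a self-contained sketch. The names `childSource` and `sumInChildProcess` are made up, and the child script is written to a temporary file only so that everything fits in one block. Normally the child would live in its own file, and the sum stands in for a real long-running task.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Source of the child script (hypothetical example workload: summing numbers).
const childSource = `
process.on('message', ({ numbers }) => {
  const sum = numbers.reduce((total, n) => total + n, 0);
  process.send({ sum });
});
`;

function sumInChildProcess(numbers) {
  // Write the child script to a temporary directory so it can be forked.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
  const childFile = path.join(dir, 'child.js');
  fs.writeFileSync(childFile, childSource);

  return new Promise((resolve, reject) => {
    const child = fork(childFile);
    child.once('error', reject);
    child.once('message', ({ sum }) => {
      child.kill(); // the open IPC channel would otherwise keep the child alive
      fs.rmSync(dir, { recursive: true, force: true });
      resolve(sum);
    });
    child.send({ numbers });
  });
}

sumInChildProcess([1, 2, 3, 4]).then((sum) => console.log(`sum from child: ${sum}`));
// prints: sum from child: 10
```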

You can take a look at all the methods of child processes in the official node docs.

Here is a diagram of how child processes work:

[Diagram: Component 3]

child_process is a good solution, but there’s another option. The child_process module spins off new instances of Node to distribute the workload, and each of these instances has 1 process, 1 thread and 1 event loop. In 2018, Node.js introduced the worker_threads module. This module gives Node the ability to have:

  • 1 Process
  • Multiple threads
  • 1 Event Loop per thread

Yes!! You read that right 😄.

[Diagram: Component 4]

const { Worker, workerData, isMainThread, parentPort } = require('worker_threads');
 
if (isMainThread) {
  const worker1 = new Worker(__filename, { workerData: 'Worker Data 1'});
  worker1.once('message', message => console.log(message));
  const worker2 = new Worker(__filename, { workerData: 'Worker Data 2' });
  worker2.once('message', message => console.log(message));
} else {
  parentPort.postMessage('I am ' + workerData);
}

We check whether we are on the main thread and, if so, create two workers and pass data to each of them through `workerData`. On the worker thread, the `else` branch runs and sends a message back to the main thread through the `postMessage` method, which the main thread then logs.
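As a sketch of how this helps with the time-consuming loop from the beginning of the article, the example below moves a loop into a worker so the main thread stays free. The names `workerSource` and `sumUpTo` are made up, and the worker code is passed as a string with the `eval: true` option only to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker source, passed as a string together with { eval: true }.
const workerSource = `
const { parentPort, workerData } = require('worker_threads');
let total = 0;
for (let i = 1; i <= workerData.limit; i++) total += i; // the slow loop
parentPort.postMessage(total);
`;

// Run the loop on a worker thread and resolve with its result.
function sumUpTo(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

console.log('Task 1');
sumUpTo(1e6).then((total) => console.log(`worker finished: ${total}`));
console.log('Task 2'); // printed before the worker's result arrives
```

Unlike the first example in this article, `Task 2` is printed right away, because the loop no longer runs on the main thread.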

Since worker_threads creates new threads inside the same process, it requires fewer resources than spawning new processes. We can also pass data between these threads efficiently, because threads in the same process can share memory (for example, through a SharedArrayBuffer).
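Here is a minimal sketch of that shared memory, under the assumption that a simple shared counter is enough to make the point. The names `workerSource` and `countWithWorkers` are made up. Several workers increment the same `SharedArrayBuffer`, and `Atomics.add` is used so that concurrent increments are not lost.

```javascript
const { Worker } = require('worker_threads');

// Each worker increments the first slot of the shared buffer.
const workerSource = `
const { parentPort, workerData } = require('worker_threads');
const counter = new Int32Array(workerData.shared);
for (let i = 0; i < workerData.increments; i++) {
  Atomics.add(counter, 0, 1); // atomic, so concurrent workers do not lose updates
}
parentPort.postMessage('done');
`;

function countWithWorkers(workerCount, increments) {
  // A SharedArrayBuffer passed via workerData is shared, not copied.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const jobs = [];
  for (let i = 0; i < workerCount; i++) {
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  }
  // Once every worker has reported back, read the final value.
  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

countWithWorkers(2, 1000).then((total) => console.log(`shared counter: ${total}`));
// prints: shared counter: 2000
```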

As of January 2020, worker_threads is fully supported in the Node LTS version 12. I highly recommend reading the following post if you want to learn more about worker_threads.

Node.js multithreading

And that’s it!!!

In this article, we looked at how the event loop model works in Node.js, discussed some of the pros and cons of the single-threaded model and looked at a couple of solutions. We didn’t go over all the functionality of child_process and worker_threads. However, I hope that this article provided you with a brief introduction to these concepts and why they exist. Please let me know if you have any feedback. Until next time 👋👋

7 Replies to “Advanced Node.Js: A Hands on Guide to Event Loop, Child Process and Worker Threads in Node.Js”

  1. alinaitova91 5 years ago

    So is the Concurrency Model in Nodejs pretty much the same as parallelism in Go?

    1. Shadid Haque 5 years ago

      Goroutines follow a similar idea for handling parallelism, but I wouldn’t call it the same. Then again, I am no expert in Go. I would definitely do some more research on Go and perhaps update this article or write a new one comparing them.

  2. cameleo.mm 5 years ago

    Hello,
    congratulations for your post 🙂
    I have three questions :
    – Why it requires less resources, can you develop more on that? : “Since worker_threads makes new threads inside the same process it requires less resources”
    – If i well understood, NodeJS create a new thread for each task among I/O, network and database requests. So why we say that Node is single-threaded ?
    – When we create a child process we have 1 process, 1 thread and 1 event loop. But if there is an I/O task (for example) Node will create a separate thread, so do we have 2 threads now in the child process ?

    1. Shadid Haque 5 years ago

      Excellent questions.
      1. Why threads requires less resources?

      Thread workers are more lightweight and share the same process ID as their parent threads. They can also share memory with their parent threads, which allows them to avoid serializing big payloads of data and, as a result, send the data back and forth much more efficiently. But you have to be careful while using threads. Use them for really intensive CPU tasks otherwise just use clusters because they are easy to implement and maintain.

      Here’s an example of sharing memory between worker threads and parent threads. Processes don’t share memory at all. They are entirely separate from one another, like totally different programs that communicate with each other.

      ```
      import { parentPort } from 'worker_threads';
      parentPort.on('message', () => {
        const numberOfElements = 100;
        const sharedBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * numberOfElements);
        const arr = new Int32Array(sharedBuffer);
        for (let i = 0; i < numberOfElements; i += 1) {
          arr[i] = Math.round(Math.random() * 30);
        }
        parentPort.postMessage({ arr });
      });
      ```

      2. Is Node single-threaded?

      Not quite. As you have already noticed, we were creating new threads. What happens here is this: Node uses a library called libuv, which is written in C and handles those external threads. libuv has a number of available threads. However, Node’s JS environment is single-threaded. The event loop runs on one thread only.

  3. hi! process.on(‘ml_algo’ …. what is “ml_algo” ? Is it not “message”?

  4. Shadid, what an informative article! Thank you for teaching me all of this knowledge! <3

  5. maybe you misunderstood, because the event queue is the place where the callback function of the event is stored, it does not contain the usual tasks, the normal tasks will be put directly into the callstack.
