The more we think about the term “development”, the more aspects come to mind: front-end and back-end, commercial and free, proprietary and open-source, specific frameworks and fundamentals. We cover all of these topics in our blog, but talking about fundamentals always feels special.
Data structures are a staple of many technical interviews: together with sorting algorithms interview questions, they appear in every “Learn this to become a better developer” bullet list. Their importance, therefore, should not be underestimated: they’re an essential skill which can really make or break the entire development process.
There are a plethora of various data structures available, so we’ll focus on the most fundamental and important ones — arrays, linked lists, stacks, and binary trees. In this article, we’ll visualize how they work, examine their strengths and weaknesses, and provide you with some awesome learning resources.
Why is it important to learn data structures?
Many seasoned developers, when asked about the single tip they could give to beginners, say: Learn the fundamentals. Modern technologies and frameworks, in particular, are so alluring because they’re, well, modern: people are always talking about them, showcasing cool new projects made with Framework X or Framework L, and encouraging others to adopt the new technologies.
Thankfully, various tech publications try to offer more diverse content. Our blog, for instance, aims to cover different topics — from fundamentals including sorting algorithms visualization to tech-specific including materials like the recent JavaScript interview questions. Dev.to is another great example.
Fundamental knowledge, therefore, means that you don’t only know a single “advanced concept”; rather, you understand how the entire computer system functions. This knowledge encompasses such aspects as:
- Data structures.
- Algorithms.
- Software quality and performance.
- Writing readable code (a rather simple tip which is often overlooked)
- Design patterns.
- Networking.
- Test writing.
- Caching and memory hierarchies.
- Object-oriented design and design patterns(or equivalents in your favorite paradigm).
- Managing complexity and abstraction effectively.
For some aspiring developers, the list above may seem overwhelming or exaggerated. This raises an important question: Should all developers be required to have proficiency in, say, data structures? This question would probably make a great article in and of itself, but we can address this problem from a different perspective.
When discussing concepts like data structures (and their importance in the job search process as well), the general consensus of the front-end community seems to follow this logic: ”My skills are more oriented towards the visual part of the projects: design, experience, interface, and so forth. I do not interact with the projects’ under-the-hood logic in any way. Why am I forced to learn data structures and algorithms for the technical interview, only to never actually use these skills on the job?”
This is a perfectly valid criticism: data structures are heavily tied to the project’s business logic — and front-end web development has a different function. This job requirement can often stem from companies who don’t really have a clear vision/understanding of each team member’s function — and in those cases, you’ll just have to roll with it and accept the fact that not all companies read our blog.
So what are data structures?
Data structures are formats that specify how data should be organized, managed, and stored in order to be accessed and modified efficiently. This brief definition holds several key points which can help us digest it better, so let’s break them down:
- The format is a defined “mini-system” of arranging something.
- Organizing, storing, and managing data are the main data-related operations which are heavily dependent on one another.
- “Efficiently” is a crucial requirement because efficient data structures defeat its own purpose completely.
Renowned computer scientists Peter Wegner and Edwin D.Reilly in their Encyclopedia of Computer Science provide another definition which is a bit more technical and precise: A data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
This definition has another important term: data value, which is the information that essentially characterizes the data variable. For instance, a database titled “Remote Team Alpha” will probably contain various employee attributes like name, age, address, role, and so forth; these data variables only describe the entities. Data values, on the other hand, provide the real meaning.
So how do data structures function?
One of the most important abilities of any electronic device is working with memory. To find the given data piece, they use pointers which are strings denoting a memory address. Different data structures, therefore, define the algorithms which manage how the device should store and fetch the data: an array, for instance, bases its workflow around arithmetic operations, while a list utilizes address storage.
Arrays
Organization: In this structure, elements are stored in a specific order in a contiguous memory block.
Accessing elements: Let’s take an array A of size N. A unique index ‘i’ is given to each memory location. This index is usually referenced as A[i] (another notation is Ai)
Remember the data values definition we’ve discussed earlier? In this case, ‘C’, ‘L’, ‘A’, ‘H’, ‘S’, and ‘E’ are all data values. It should be noted that they must usually be of the same type, but different programming languages enforce different rules, with some of them allowing for “multi-type” arrays.
Downsides and caveats: The inability to mix data types inside an array (in C, for instance) can be a major problem — for one, it really tests your attentiveness; it is also a suboptimal design decision even if some programming languages actually allow to mix them.
Another problem has to do with the fact that the array size cannot change. Let’s say we’re creating an array which with various operating systems: there’s Windows, there’s Linux… and so we naively create an array for two values: int x[2];
. But wait — we forgot macOS! In this case, in C-like languages, the only option that would allow us to add the third element is creating a new array and copying the data values from the old array to the new one. Phew! (Of course, there’s still the realloc
command which partially solves the problem via, as the name suggests, re-allocating the memory).
# courtesy of geeksforgeeks.org # Python code to demonstrate the working of # array(), append(), insert() # importing "array" for array operations import array # initializing array with array values # initializes array with signed integers arr = array.array('i', [1, 2, 3]) # printing original array print ("The new created array is : ",end=" ") for i in range (0, 3): print (arr[i], end=" ") print("r") # using append() to insert new value at end arr.append(4); # printing appended array print("The appended array is : ", end="") for i in range (0, 4): print (arr[i], end=" ") # using insert() to insert value at specific position # inserts 5 at 2nd position arr.insert(2, 5) print("r") # printing array after insertion print ("The array after insertion is : ", end="") for i in range (0, 5): print (arr[i], end=" ")
All in all, an array’s efficiency can be hindered if you don’t want to just store the elements, but also organize and interact with them. Sorting elements, for instance, is an important task — but it also makes element insertion much slower. As for dynamic resizeability, we can use another data structure which is…
Linked lists
Organization: In this data structure, data elements are stored as a linear collection. This collection is divided into nodes which contain data and a link; the link serves as a reference to the next element. The list’s end is signified by a node with a terminator.
Accessing elements: The elements’ order in a list doesn’t correlate with their placement in memory; instead, each element points to its neighbor (i.e. the subsequent element).
The most obvious advantage over a simple array is the extended functionality: thanks to the way linked lists are organized, managing elements (e.g. inserting or removing them) can be done more efficiently.
Downsides and caveats: The most glaring issue with linked lists lies in the random memory access — lack thereof, that is. Random access is a powerful method because it allows, as the name suggests, to locate and interact with arbitrary elements with constant execution speed. Along with efficient indexing and node operations (e.g. locating the last node, locating a node with certain data), random access isn’t possible.
Additionally, linked lists use much more memory than arrays — this is caused by pointers (which contain links as seen in the scheme above). Another caveat has to do with traversing through elements: due to linked lists’ sequential nature (i.e. organizing its flow strictly from Element A to Element B to Element C…), non-sequential navigation (e.g. backwards) is considerably slower and costlier.
# courtesy of codefellows.org class Node(object): def __init__(self, data=None, next_node=None): self.data = data self.next_node = next_node def get_data(self): return self.data def get_next(self): return self.next_node def set_next(self, new_next): self.next_node = new_next class LinkedList(object): def __init__(self, head=None): self.head = head def insert(self, data): new_node = Node(data) new_node.set_next(self.head) self.head = new_node def size(self): current = self.head count = 0 while current: count += 1 current = current.get_next() return count def search(self, data): current = self.head found = False while current and found is False: if current.get_data() == data: found = True else: current = current.get_next() if current is None: raise ValueError("Data not in list") return current
Stacks
Organization: Just like real-world stacks of physical objects, stacks as a data structure follow the same principle: data elements are placed on top of one another. This structure, therefore, is also known as the Last In — First Out (LIFO)
, while the structure’s end is called top of the stack.
Accessing elements: In a stack, only the last element can be accessed directly. The LIFO approach showcases how it’s organized: The main stack-related operations are push (add the element to the group), pull (remove the most recent element), and peek (learn about the stack’s top without changing it) .
Downsides and caveats: Obviously, accessing elements beneath the top becomes a hurdle. In a stack of 10 elements which are sorted from 1 to 10, accessing the Element 10 is trivial because it’s located on the top. To access Element 1, however, we first have to traverse through elements 8-2 — and this is both time- and resource-consuming.
# courtesy of dbader.org # How to use collections.deque as a stack (LIFO): from collections import deque q = deque() q.append('eat') q.append('sleep') q.append('code') >>> q deque(['eat', 'sleep', 'code']) >>> q.pop() 'code' >>> q.pop() 'sleep' >>> q.pop() 'eat'
Binary trees
Organization: Similar to the principle utilized by linked lists, this data structure is organized into nodes; each node contains the data element itself alongside references to the right and left elements. Each node has only a single parent (a single node connected above it) and can have any number of children (various nodes connected below it). The only exception is the topmost node which is also called root.
Accessing elements: Due to its nonlinear structure, elements in a binary tree can be accessed either via depth-first or breadth-first traversals — these depend on whether we’re accessing parents or children (e.g. visit left child → right child → parent or parent → left child → right child).
Downsides and caveats: The main disadvantage stems from the binary tree’s complexity: it requires O(logn) time to modify and access elements from a known location. In other data structures, on the other hand, this process can be achieved in a constant time frame.
# courtesy of @djra #!/usr/bin/python class Node: def __init__(self, val): self.l = None self.r = None self.v = val class Tree: def __init__(self): self.root = None def getRoot(self): return self.root def add(self, val): if(self.root == None): self.root = Node(val) else: self._add(val, self.root) def _add(self, val, node): if(val < node.v): if(node.l != None): self._add(val, node.l) else: node.l = Node(val) else: if(node.r != None): self._add(val, node.r) else: node.r = Node(val) def find(self, val): if(self.root != None): return self._find(val, self.root) else: return None def _find(self, val, node): if(val == node.v): return node elif(val < node.v and node.l != None): self._find(val, node.l) elif(val > node.v and node.r != None): self._find(val, node.r) def deleteTree(self): # garbage collector will do this for us. self.root = None def printTree(self): if(self.root != None): self._printTree(self.root) def _printTree(self, node): if(node != None): self._printTree(node.l) print str(node.v) + ' ' self._printTree(node.r) # 3 # 0 4 # 2 8 tree = Tree() tree.add(3) tree.add(4) tree.add(0) tree.add(8) tree.add(2) tree.printTree() print (tree.find(3)).v print tree.find(10) tree.deleteTree() tree.printTree()
How to learn data structures
We cannot recommend Harvard University’s CS50: Introduction to Computer Science course enough as many of these lectures focus on data structures. The teaching method of David J. Malan (CS50’s main lecturer), however, is special: David excels to blending complex concepts with engaging metaphors. Just how interesting is this course? Well, here’s a screenshot from the data structures lecture to get you interested:
To practice your knowledge, you can utilize various tech prep platforms. Leetcode is arguably the best platform to prepare for technical interviews on; HackerRank and CodeWars also have dedicated data structure categories. The learning process can feel overwhelming, so here are some tips you can utilize:
- Use your preferred language: It’s tempting to jump into C or C++ head-first because ”it’s the right way” or ” all real programmers do it”; data structures, however, are mostly language-agnostic, so learning them in Python is equal to learning them in C.
- Learn how to solve easier exercises quickly and without mistakes — they form a crucial basis for more complex algorithms and structures.
- Don’t overuse the “Run” button because most technical interviews are “whiteboard interviews” (i.e. the interviewee writes code on a whiteboard without using an IDE ) — this way, you’ll be better prepared for this kind of stress.
(We actually have a more extensive list of such resources, wink-wink)
Conclusion
Data structures are essential for most programmers because they’re the backbone of how all software work. Information rules the world — and the ability to organize the information in an efficient manner is the ultimate prerequisite for building quality software. In this journey of mastering computer science fundamentals, you can always count on our blog — we’ve got you covered with even more awesome articles in the making. 🙂