PostgreSQL vs MySQL: Is There a Clear Winner?
Data is all around us: events, dates, thoughts, ideas, and so on (plus the awesome articles we publish on our blog). Nowadays, more and more aspects of our lives have been digitized: the things we buy and the things we enjoy can be quantified, accounted for, and stored in a database. Therefore, databases — vessels for storing these precious assets — are crucial; managing and working with databases is an essential skill for many developers.
The world of relational (i.e. SQL-based) databases is rich with various database solutions. In this article, however, we’ll focus on two of them: PostgreSQL and MySQL. Although the topic of this article follows the “Technology A vs. Technology B” structure, we’ll try and stay objective. Without declaring that either PostgreSQL or MySQL reign supreme, we’ll instead analyze their differences, benefits they offer, and the problems they can introduce. After this article, do make sure to check our SQL interview questions out! 🙂
The Bigger Picture: Importance of Databases
Craig Kerstiens, head of the Cloud team at a Microsoft-owned company, provides a great write-up on the reasons why SQL is such an essential skill for almost every developer. The article is titled SQL: One of the Most Valuable Skills and details the language’s importance in three key points:
- SQL is a tool you can use anywhere. A product manager, for instance, can track key business metrics; web developer can manage the user base; data scientist can look up and analyze data.
- SQL is underappreciated and underlearned. This statement is echoed by a machine learning expert Vicki Boykis in her blog post titled Data science is different now. SQL is often viewed as an “obscure back-end-only technology” which the hip new generation doesn’t really need. Still, SQL is as in-demand as ever — and it becomes a real advantage when applying for jobs.
How MySQL Evolved in the Last Years
Database architecture is always easy to criticize because companies are usually reluctant to change it. Just like many websites are still running outdated technologies like JQuery, many companies are still using outdated MySQL versions. This problem has been highlighted by an SQL expert Markus Winand who often starts his conference speeches with a (somewhat) humorous question:
If you’re still using SQL-92, why aren’t you using Windows 3.1? It was released in 1992 as well!
Our experience tells us that this problem of “database conservativeness” is rooted in the costs. Switching to a different database system (or upgrading to a new version, for that matter) is too costly, so business/users are stuck with older software. Of course, this is a more typical scenario for businesses rather than individual users — the IT departments often cannot explain the benefits of investing X thousand dollars and Y man-hours into upgrading, and so all in-house developers of this company are stuck with, say, MySQL 3.0.
In this environment, it’s easy to miss all the changes that happened to MySQL over the years. A good example is the
check constraints which are ignored in the
create table statement. MySQL has always been criticized for this odd behavior — after all, other relational databases (Db2, Oracle DB, PostgreSQL, SQLite, etc.) don’t have this problem. Here’s the good news: MySQL fixed it! However, it only happened in April 2019, so it’s easy for most developers to end up out of the loop.
Upon acquisition by Oracle, MySQL has improved tremendously. The positive effects of this acquisition is outlined by Markus Winand in his blog post titled One Giant Leap For SQL: MySQL 8.0 Released:
- MySQL is one of the world’s most popular databases. →
- This means that MySQL is the de facto industry standard. MySQL has a plethora of problems and shortcomings, but it’s gradually improving (akin to PHP with the introduction of PHP 7) →
- As the industry standard improves, the whole industry follows in its wake — relational databases as a class become more viable choice for users. →
- Novice SQL programmers are typically exposed to the world of databases via MySQL (in large part due to its ubiquity). Their learning experience gets better as well; furthermore, as MySQL catches up, they learn that relational databases aren’t severely limited in their functionality and performance. →
- As MySQL implements new features, database-related software vendors (i.e. companies that produce content management systems, e-commerce software, etc.) are finally incentivized to do so as well.
Adopting Window Functions
A major new feature of modern MySQL is window functions. Its functionality is crucial because it allows you, among other things, to find the best N rows per group, move averages, and group consecutive events. While other databases embraced window functions a long time ago, MySQL only implemented it in version 8 (released in 2018).
But wait — MySQL’s implementation actually turned out to be the best one! Markus provides a neat visualization of how relational databases adopted these features:
Although MySQL is arguably the most popular database (at least out of free databases), its functionality has been lagging behind. The improvements made by Oracle show that the company is determined to prove that MySQL’s popularity is earned.
How PostgreSQL Evolved in the Last Years
The same logic we outlined in the previous section applies to PostgreSQL: 2019 saw a release of PostgreSQL 11. The first feature we should focus on is the
over clause, which determines the rows’ visibility to the given window function. PostgreSQL was the first database to support this clause; in version 11, it added support for the frame unit
group and frame exclusion. As of October 2019, PostgreSQL’s implementation remains the most up-to-date.
Speaking of frame exclusion, let’s see what this feature can offer. We can use it to delete selected rows from the given frame; the frames we delete must be related to the current row. This function provides several options to fine-tune its behavior:
EXCLUDE CURRENT ROWexcludes the row from the frame.
EXCLUDE GROUPremoves the current row along with all its peers from the frame.
EXCLUDE TIESremoves the peers of the current row, but not the current row itself.
A Few Words about Performance
The database’s performance is arguably the most crucial factor — as explained in the “Database as a Bottleneck” section below, inefficient queries can stall the performance of the whole project. A simpler answer would be “PostgreSQL performs better, what’s there to discuss?” This would be an oversimplification — and it can be easily countered by the fact that Facebook uses MySQL (albeit a modified version of it called MyRocks) to handle petabytes of data.
To sum it up, the database’s performance doesn’t really depend on the specific database itself — both PostgreSQL and MySQL can handle “heavy data” well. It all comes down to the rather simple optimization practices — be ready to invest some time into implementing them:
- Avoid overoptimization/premature optimization. It’s tempting to jump head-first into optimizing the database from day one. However, it would be more efficient to postpone any improvements: as project grows in size, user base, and data, you’ll keep tweaking your database architecture over and over — and this may render all of your optimization useless.
- Ensure proper indexing. A good example would be indexing the product’s ID instead of its name.
Database as a Bottleneck
In the IT world, one of the key metrics we like to use is performance — how fast does the software run? How well is it optimized? How can it be optimized even further? The drive to make the software perform well often turns into an unhealthy obsession, forcing the developers to compare various technologies solely on the basis of its performance.
Therefore, when discussing a topic like “Python vs. Java”, people ignore all other aspects that make the given programming language worth learning or using. “Python is slow! It’s a subpar choice for web development! Or any computation-heavy development, for that matter! Use C!”
If we take web development as an example, the not-so-obvious truth is this: Python is indeed not the fastest language, but the real bottleneck frequently lies in the database. This is why we often see the following scenario in numerous tech companies: instead of changing, say, their main programming language, they invest lots and lots of resources into switching to a different database. Let’s use Uber as an example: they made the switch in 2016 and explained it in their blog post titled Why Uber Engineering Switched from Postgres to MySQL.
When an article’s called “Technology A vs. Technology B”, it’s usually expected that the authors provide a definitive answer. Is Technology A ultimately better? Does Technology B actually reign supreme? In case of PostgreSQL and MySQL, there is no clear winner — before MySQL 8, we’d probably choose PostgreSQL for its rich functionality. Nowadays, however, MySQL is starting to catch up and even outpace other relational databases in some regards.
Another question you may be asking is “Which one should I learn?” To answer it, let’s look at these databases in perspective:
- PostgreSQL is still superior… for now.
- MySQL is catching up… but it still has a handful of problems to sort out. Its vendor, Oracle, is investing time and resources into making it
It’s tempting to try to guess which database will be better in, say, 5 years, but stating “Learn Database B — it’ll outpace Database A in just two years” would simply be fortune-telling. Our advice, therefore, is as follows: pick any of the two, learn it, and have no regrets.