You are on page 1of 17

PostgreSQL vs.

MySQL: Which is
Better for Analytics?
Looking into differences (and some similarities) between PostgreSQL and
MySQL to help you decide which one suits your needs best.

There are three main similarities between PostgreSQL and MySQL. They are highly popular, they’re
open-source, and they both use their own dialects to communicate with the databases.

They are for years among the most popular RDBMSes (or databases, as we’re going to call them
colloquially from now on). Data from Statista shows both databases in the four most popular databases
as of January 2022.
Of those four, only PostgreSQL and MySQL are open-source databases. Being open-source means these
two are completely free, which also has a significant (positive!) impact on database popularity. It’s not
the only benefit. The possibility that anyone can modify the software leads to flexibility and a community
providing quality support. It’s only multiplied by the popularity of the two databases and the sheer
number of its users and contributors.

All this means both PostgreSQL and MySQL dialects are almost unavoidable in the data science world.

But how do you decide which one you should use? Nobody has ever decided between two choices based
on their similarities. The differences are what make, well, the difference. As a data scientist, you’re
probably not that concerned about database architecture, performance, integrity, and all other stuff
RDBMS is for.

What will make you give a pass (or not) to a certain database are the possibilities of the SQL dialect and
how well it serves you to do your job.

That’s why we’re going to look into the syntax differences between PostgreSQL and MySQL and their
possibilities.

Here’s the overview of the differences we’ll cover, with detailed explanations following.
Postgres vs MySQL: Data Types
Speaking of possibilities, when working with data, one of the starting points to consider should be data
types. Knowing which data types a database allows give you information on how easy you’ll be able to
complete your project. Or if you’ll be able!

Difference Overview
Difference Category MySQL PostgreSQL

Data Types Supports fewer datatypes Supports more datatypes

MySQL Data Types


MySQL has five data type categories:
● Numeric Data Types
● Date and Time Data Types
● String Data Types
● Spatial Data Types
● The JSON Data Types
PostgreSQL Data Types
MySQL’s five data types seem rather basic compared to PostgreSQL:
● Numeric Data Types
● Monetary Data Types
● Character Data Types
● Binary Data Types
● Date and Time Data Types
● Boolean Data Type
● Enumerated Data Types
● Geometric Data Types
● Network Address Data Types
● Bit String Data Types
● Text Search Data Types
● UUID Data Types
● XML Data Types
● JSON Data Types
● Arrays
● Composite Data Types
● Range Data Types
● Object Identifier Data Types
● pg_lsn Data Type
● Pseudo-Types

Verdict: PostgreSQL Wins!


Basically, it doesn’t matter which dialect you use if you’re working with basic data types, such as
numeric, date-and-time, or text data. However, if you require a wider range of data types, some atypical
data types such as geometric data types, or working with unstructured data, learning PostgreSQL might
be a go-to. It’s also interesting to note that MySQL, unlike PostgreSQL, doesn’t have the Boolean data
type. Instead, the TINYINT(1) is used for such data.

Postgres vs MySQL: Case Sensitivity


Case sensitivity influences the ease of SQL code writing. If the dialect is case-sensitive, it means you have
to be very careful about getting this right. If the code is case-insensitive, you can relax and write
whatever you want in regards to the upper or lower case.
Difference Overview
Difference Category MySQL PostgreSQL

Case Sensitivity No Yes

MySQL Case Sensitivity


MySQL is case insensitive, which means you don’t have to think about the upper and lower case when
writing a query.

Let’s take this interview question by the City of Los Angeles to showcase this:

“Find all inspections which are part of an inactive program.”

Link to the question:


https://platform.stratascratch.com/coding/10277-find-all-inspections-which-are-part-of-an-inactive-prog
ram

For the sake of readability, we’ll only select two columns in the query otherwise the same as the official
solution. If your MySQL code looks like this, you’d get a correct answer.

SELECT serial_number,
program_status
FROM los_angeles_restaurant_health_inspections
WHERE program_status = 'inactive'

Don't worry if you check this solution and find the answer is not correct. It's only because we selected two
instead of all columns, like in the official solution.

Here are the first several rows of the output:


Even though the values in the column program_status are written as ‘INACTIVE’, you could write
‘inactive’ in the WHERE clause. In MySQL, this doesn’t matter. You could write ‘INACTIVE’, ‘inactive’ or
even go wild and write ‘iNaCtIvE’, the output will contain all the inspections that are part of an inactive
program, no matter the case.

The above code would cover all possibilities, even if the table was named
Los_Angeles_restaurant_health_inspections.

PostgreSQL Case Sensitivity


PostgreSQL is case sensitive. Here’s what that means in practical terms. To get the correct answer to the
above question, your code must be written like shown below.

SELECT serial_number,
program_status
FROM los_angeles_restaurant_health_inspections
WHERE program_status = 'INACTIVE'

What if you write it this way?

SELECT serial_number,
program_status
FROM los_angeles_restaurant_health_inspections
WHERE program_status = 'inactive'

The output is here. Sort of.

There’s nothing because your string in the WHERE clause doesn’t match the case of data in the column
program_status.

PostgreSQL is also case-sensitive when it comes to column and table names. Or, to be more specific,
case-sensitivity becomes important if your column and/or table name includes upper case. If the above
table was named Los_Angeles_restaurant_health_inspections and the columns are Serial_Number and
Program_Status, your PostgreSQL code should be:
SELECT "Serial_Number",
"Program_Status"
FROM Los_Angeles_restaurant_health_inspections
WHERE "Program_Status" = 'INACTIVE'

Names with the uppercase should be double-quoted and match the table or column case.

Some workarounds to make PostgreSQL case-insensitive include using citext extension, ILIKE operator, or
LOWER() function.

When the names are all in lowercase, and you write your query in uppercase, this query would still work:

SELECT SERIAL_NUMBER,
PROGRAM_STATUS
FROM LOS_ANGELES_RESTAURANT_HEALTH_INSPECTIONS
WHERE PROGRAM_STATUS = 'INACTIVE'

Verdict: MySQL Wins!


It’s simply more practical to write a code when you don’t have to think about the upper or lower case.
Working on any serious project would become painstakingly difficult otherwise. To get everything right,
you would have to know how exactly every table or column is named in terms of cases. The same goes
for the data in tables.

Postgres vs MySQL: Converting Character Sets


and Strings to UTF-8
UTF-8 is an encoding system that allows storing every possible character and symbol. Within this system,
you’re not limited to standard Latin letters. You could store any letter from any alphabet and various
other symbols. This even includes emojis.

Difference Overview
Difference Category MySQL PostgreSQL

UTF-8 Conversion required Conversion not required


MySQL Conversion to UTF-8
Depending on the version you’re working on, you might be required to convert to UTF-8.

PostgreSQL Conversion to UTF-8


PostgreSQL works completely the opposite. There’s no need to convert character sets and strings to
UTF-8. Not only that but the UTF-8 syntax is not allowed at all.

Verdict: PostgreSQL Wins!


Both dialects support the UTF-8 encoding system, which is great. This means you can work with all data
outside the typical Latin letters.

But, there’s a but! To do that in PostgreSQL, you don’t have to do anything. You simply use the characters
you want, and the database does the rest. In MySQL, you have one additional step, which is a
conversion. That’s why PostgreSQL is a winner here.

Postgres vs MySQL: Conditional Statement


The IF-THEN-ELSE or a conditional statement is common in programming languages. It is used to
compare data, and based on that comparison, the query returns (different) results. In other words, the
output depends on whether the set condition is met or not.

Difference Overview
Difference Category MySQL PostgreSQL

Conditional Statement IF() and NULLIF() functions CASE WHEN statement


MySQL Conditional Statement
MySQL allows you to use the IF() and NULLIF() functions when writing a conditional statement. The IF()
function is a classic IF-THEN-ELSE statement whose arguments specify the condition, the output if the
condition is true, and the output if the condition is false.

The question by ESPN will be our showcase here:

“Add a column to each row which will classify Olympics that athlete is
competing in as 'European' or 'NonEuropean' based on the city it was hosted.
Output all details along with the corresponding city classification.
European cities are Athina, Berlin, London, Paris, and Lillehammer.”

Link to the question:


https://platform.stratascratch.com/coding/10185-european-and-non-european-olympics

If you want to solve it in MySQL, you’d use the IF() function:

SELECT *,
IF (city = 'Athina'
OR city = 'Berlin'
OR city = 'London'
OR city = 'Paris'
OR city = 'Lillehammer',
'European',
'NonEuropean') AS city_classification
FROM olympics_athletes_events

PostgreSQL Conditional Statement


In PostgreSQL, the IF() and NULLIF() functions don’t exist. However, that doesn’t mean you can’t write
conditional statements. To do that, you’ll need to use the CASE WHEN statements.

Here’s how it works in a code:


SELECT *,
(CASE
WHEN city IN ('Athina',
'Berlin',
'London',
'Paris',
'Lillehammer') THEN 'European'
ELSE 'NonEuropean'
END) AS city_classification
FROM olympics_athletes_events

Both codes return the same result (which we’ll show only partially).

Verdict: PostgreSQL Wins!


The PostgreSQL wins here, too. While in the above example, both IF() function and the CASE WHEN
statement do the same thing in an equally easy way, conditions can get much more complicated. When
they do, the CASE WHEN then becomes much easier to write and read. It allows you to set out all the
conditions in a very tidy way. In the IF() statement, you’d probably have to nest multiple IF() functions,
which can get quite messy and is prone to errors.

Additionally, the CASE WHEN statement is standard SQL, so it will work on any other SQL-standard
database.

Postgres vs MySQL: Deleting Data and Tables


As you would probably expect, both SQL dialects support DELETE, DROP, and TRUNCATE commands. Still,
there are some differences in the allowed extensions of these commands.
Difference Overview
Difference Category MySQL PostgreSQL

DROP CASCADE No Yes

DROP TEMPORARY TABLE Yes No

TRUNCATE Yes, only TRUNCATE Yes, with enhancing options

MySQL DROP and TRUNCATE


When you want to DROP a table together with its dependent objects, such as other tables and views,
using the CASCADE keyword along with the DROP command would be very useful. Unfortunately, MySQL
doesn’t support it but is probably to be introduced in some future versions.

MySQL supports temporary tables. It also has the TEMPORARY keyword used in the DROP command to
delete only the temporary tables.

Regarding TRUNCATE, MySQL is really basic here. It allows you to TRUNCATE the table, and that’s it. No
additional possibilities like in PostgreSQL.

PostgreSQL DROP and TRUNCATE


Here, you can use DROP CASCADE to drop a table and its dependent objects.

While PostgreSQL supports temporary tables, it doesn’t have a special keyword used for deleting them.
To do that, you’ll use the DROP TABLE statement and specify the temporary table you want to delete as
you would do with any other table.

However, when it comes to the TRUNCATE statement, PostgreSQL offers much more flexibility. It has
features such as CASCADE (you can truncate dependent objects), RESTART IDENTITY (automatically
restarts sequences associated with the truncated table’s columns), CONTINUE IDENTITY (the default
argument that doesn’t change the values of sequences), and RESTRICT (the default argument not
allowing truncate if any tables are referenced by the other tables’ foreign key).

Verdict: PostgreSQL Wins!


The only MySQL benefit here is having the DROP TEMPORARY TABLE command. PostgreSQL allows you
to DROP CASCADE and offers several very useful options to go with TRUNCATE.
Postgres vs MySQL: Joining Tables and
Combining Data
Being a data scientist, your work heavily depends on the dialect’s ability to join two (or multiple) tables
and combine outputs of different queries into one dataset. Let’s see how MySQL and PostgreSQL fare in
this area!

Difference Overview

Difference Category MySQL PostgreSQL

FULL OUTER JOIN No Yes

INTERSECT and EXCEPT No Yes

MySQL Joining and Combining Data


In regards to JOINs, MySQL supports four types:
● INNER JOIN
● LEFT JOIN
● RIGHT JOIN
● CROSS JOIN

When it comes to combining data from two or more SELECT statements, it supports UNION and UNION
ALL. However, unlike in PostgreSQL, INTERSECT and EXCEPT operators are not supported.

PostgreSQL Joining and Combining Data


It offers all MySQL’s joins, with the additional one: a FULL (OUTER) JOIN.

To find out how all these joins work in practice, we prepared a guide with the interview question
examples.

PostgreSQL, too, supports UNION and UNION ALL. On top of that, there are also INTERSECT and EXCEPT
operators.
The INTERSECT operator returns all the rows appearing in both output data sets. The below image shows
a visualization of what this operator does.

The EXCEPT operator outputs rows from the first query that are not found in the second query. The
following picture shows how it works.
Verdict: PostgreSQL Wins!
With one more JOIN than MySQL and two more options for merging data from two or more SELECT
statements, no wonder PostgreSQL wins again.

Postgres vs MySQL: Window Functions


Window functions or analytical functions are used to achieve a new level of data analysis. While they are
similar to the aggregate functions in that they also aggregate data, the window functions do not collapse
individual rows. This translates to being able to show aggregate along with non-aggregate data.

Difference Overview
Difference Category MySQL PostgreSQL

Window Functions Yes, including some aggregate functions Yes, including all aggregate functions
MySQL Window Functions
MySQL offers the following window or analytical functions:
● ROW_NUMBER() – numbers the rows
● RANK() – ranks the rows, skips the rank sequence in case of ties
● DENSE_RANK() – ranks the rows, doesn’t skip the rank sequence in case of ties
● PERCENT_RANK() – returns the relative rank of a row
● CUME_DIST() – returns the cumulative distribution
● NTILE() – divides rows into groups
● LAG() – returns data from the previous number of rows
● LEAD() – returns data from the following number of rows
● FIRST_VALUE() – returns the first value
● LAST_VALUE() – returns the last value
● NTH_VALUE() – returns the nth value

The aggregate functions can also be used as window functions. These are the ones MySQL offers, and
the majority of them can be used as window functions.

PostgreSQL Window Functions


While PostgreSQL offers the same window functions as MySQL, it offers much more aggregate functions.
Not only that, but it allows all of them to be used as window functions, unlike MySQL.

To easier deal with the window functions in both SQL dialects, we prepared the ultimate guide to the
window functions in our blog. If you’re interested only in ranking window functions, you should look at
this introduction talking about RANK() and DENSE_RANK().

Verdict: PostgreSQL Wins!


There’s a win for PostgreSQL in this last category, too. Yes, there are the same window functions in both
MySQL and PostgreSQL. However, by offering more aggregate functions and allowing them all to be used
as window functions, PostgreSQL simply offers more possibilities when it comes to data analysis.
Should You Choose PostgreSQL or MySQL?
The answer is: choose PostgreSQL!

As you saw from our analysis, the only area where MySQL takes the win is its case insensitivity. Other
than that, PostgreSQL is much more suitable for the demands of a data scientist’s job. It’s no wonder
since PostgreSQL was built with analytics in mind.

While PostgreSQL is much more suitable for data science, it doesn’t mean you must use it. There are
some additional criteria you should consider.

Postgres vs MySQL: Syntax


Both PostgreSQL and MySQL are the SQL dialects. It’s no wonder that learning one dialect will allow you
to use the other dialect to some degree.

What makes these two dialects’ syntax differ is the level of compliance with the SQL standards.
PostgreSQL has a much higher compliance level (more than 150 SQL standards), while MySQL is partially
compliant.

If you learned standard SQL or simply want to stay closer to the standard (and maybe use some other
SQL standard compliant databases), it’s better you learn PostgreSQL. Also, if there’s a chance that you’ll
be working with other SQL standard compliant databases, PostgreSQL is a better choice.

In case you’re just starting, MySQL could be a better choice due to its fewer features, making it easier to
learn. However, due to its lower SQL standard compliance, you could be tied to this dialect, making it
harder to work with other databases that are more true to the SQL standard.

Postgres vs MySQL: Ease of Use


MySQL is more straightforward and user-friendly, making it especially beneficial for beginners. The
learning curve is not very steep, which makes it easier to start building your data science projects much
faster as you learn.

PostgreSQL is very robust and features-heavy. The learning curve could be pretty steep for beginners. If
your project is heavy and requires some sophisticated calculations, PostgreSQL with its features could be
a better choice.
Of course, popularity also plays its part. MySQL is much more ubiquitous, so there’s a better chance that
your employer will require MySQL, not PostgreSQL. So what your desired employers use could also
influence your decision between two dialects.

Postgres vs MySQL: Performance


Generally, MySQL is faster when dealing with read-only commands. If most of your projects are of such
kind and MySQL satisfies all your requirements, then there really isn’t necessary to learn PostgreSQL. You
might be better off with MySQL.

However, if your projects include (or you want them to include in the future!) working with big data sets
and complex queries that are both read and write-intensive, PostgreSQL performs better. Therefore, it’s a
better choice.

Conclusion
Learning any of these two SQL dialects is a wise decision. It could become even wiser by choosing which
suits you best.

As you saw here, there are a lot of similarities but also plenty of differences between PostgreSQL and
MySQL.

Some of the main differences are data types, case sensitivity, need for a conversion to UTF-8, functions
used for IF-THEN-ELSE statements, deleting and joining tables and data, and the range of the window
functions offered.

In general, MySQL is more suitable for users who want to learn fast, are mainly interested in read-only
intensive projects, and want to start those projects as soon as possible.

PostgreSQL is better for data scientists because they often work on big projects requiring complex (read
and write) queries, have to invest more time to learn how to handle such projects, and are or want to be
familiar with the standard SQL.

But it’s not always that simple. So you should carefully go through your current and possible future
needs, compare them to all the details we talked about here and only then make a choice. If you do that,
chances are the choice will be right.

You might also like