4 Types of SQL JOIN Every Data Scientist Should Know
Motivation
Relational databases model real-life entities and their relationships. As a data
practitioner, you will usually deal with more than one table when interacting with
those databases. Working efficiently with those tables requires a good understanding of
the JOIN statements, because each one gives a different result.
This article first builds your understanding of each JOIN clause and then walks you
through hands-on practice. Before that, we will create the relevant tables from
scratch.
Input Data
To better understand the process, we will need the following two tables: StudentTable
and TeachingAssistantTable. This section creates those tables and populates them.
Create Tables
Creating a table is straightforward, and both tables are created as follows in our database.
This section is not mandatory for understanding the rest of the article.
→ Student Table
-- TABLE 1
CREATE TABLE `StudentTable` (
  `Student` varchar(100),
  `Gender` varchar(100),
  `Age` int(8),
  `Email` varchar(100),
  PRIMARY KEY (`Email`)
);
Populate Tables
Now that our tables are created, we can finally populate them with new data using the
INSERT INTO [table name] VALUES statement.
→ Student Table
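As a minimal sketch of the INSERT INTO [table name] VALUES statement, the snippet below creates StudentTable in an in-memory SQLite database through Python's sqlite3 module and populates it. SQLite is an assumption made only so the example runs without a server (the backtick-quoted CREATE TABLE above suggests MySQL). The rows are hypothetical: only the names Ibrahim, Mamadou, and Fatim appear in this article, and every other value is made up.

```python
import sqlite3

# In-memory SQLite database (an assumption; the article appears to target MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE StudentTable (
        Student VARCHAR(100),
        Gender  VARCHAR(100),
        Age     INTEGER,
        Email   VARCHAR(100),
        PRIMARY KEY (Email)
    )
    """
)

# Hypothetical rows: genders, ages, emails and the student "Aziz" are invented.
students = [
    ("Ibrahim", "M", 24, "ibrahim@example.com"),
    ("Mamadou", "M", 26, "mamadou@example.com"),
    ("Fatim", "F", 23, "fatim@example.com"),
    ("Aziz", "M", 25, "aziz@example.com"),
]

# One INSERT INTO ... VALUES statement is executed per row;
# the ? placeholders are filled from each tuple.
conn.executemany("INSERT INTO StudentTable VALUES (?, ?, ?, ?)", students)
conn.commit()

student_count = conn.execute("SELECT COUNT(*) FROM StudentTable").fetchone()[0]
print(student_count)
```

Because Email is the primary key, inserting a second row with an existing email would raise an integrity error rather than create a duplicate.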
1. Inner Join
Let’s start with a simple example. We want to know which Students are also Teachers.
In other words, which rows match in both the Student and Teaching Assistant tables,
that is, their intersection? We can observe that both of our tables contain Ibrahim,
Mamadou, and Fatim.
This is where INNER JOIN comes in handy. It keeps only the rows that match in both
tables on the column specified in the ON clause. For instance, the following
statement returns all the matching rows based on the Email column (JOIN on its own is
shorthand for INNER JOIN).
SELECT *
FROM StudentTable st
JOIN TeachingAssistantTable tat ON st.Email = tat.Email;
SELECT * means “get all the columns” from all the tables.
JOIN TeachingAssistantTable tat ON st.Email = tat.Email means only get the rows
having the same Email from both tables.
This is the graphical result we get from the previous SQL command.
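As a rough, runnable sketch of the same query, the snippet below uses Python's sqlite3 module with an in-memory database. Several things here are assumptions: SQLite stands in for the article's database, the TeachingAssistantTable schema (Teacher, Email, Degree) is inferred from column names mentioned in this article, and all rows are hypothetical apart from the three shared names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentTable (
        Student TEXT, Gender TEXT, Age INTEGER, Email TEXT PRIMARY KEY
    );
    -- Assumed schema: the article does not show this table's definition.
    CREATE TABLE TeachingAssistantTable (
        Teacher TEXT, Email TEXT PRIMARY KEY, Degree TEXT
    );

    -- Hypothetical rows; only Ibrahim, Mamadou and Fatim come from the article.
    INSERT INTO StudentTable VALUES
        ('Ibrahim', 'M', 24, 'ibrahim@example.com'),
        ('Mamadou', 'M', 26, 'mamadou@example.com'),
        ('Fatim',   'F', 23, 'fatim@example.com'),
        ('Aziz',    'M', 25, 'aziz@example.com');
    INSERT INTO TeachingAssistantTable VALUES
        ('Ibrahim', 'ibrahim@example.com', 'MSc'),
        ('Mamadou', 'mamadou@example.com', 'PhD'),
        ('Fatim',   'fatim@example.com',   'MSc'),
        ('Nancy',   'nancy@example.com',   'PhD');
    """
)

# Same join as in the article, with an ORDER BY added for a stable row order.
inner_rows = conn.execute(
    """
    SELECT *
    FROM StudentTable st
    JOIN TeachingAssistantTable tat ON st.Email = tat.Email
    ORDER BY st.Student
    """
).fetchall()

for row in inner_rows:
    print(row)
```

Only the three people present in both tables come back. Each row carries seven columns, with Email appearing twice (once per table), because of SELECT *.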
The join does not only apply to key columns such as Email; it works on any column the
user puts in the ON clause. For instance, it could be: ON st.Student = tat.Teacher
which would return the rows where the student's name equals the teacher's name.
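To illustrate how the choice of column in the ON clause changes the result, here is a small sketch using Python's sqlite3 module. The data is deliberately hypothetical: a student and a teaching assistant both named Aziz are given different emails, so the name-based join matches them while the email-based join does not. The TeachingAssistantTable schema is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentTable (
        Student TEXT, Gender TEXT, Age INTEGER, Email TEXT PRIMARY KEY
    );
    -- Assumed schema: not shown in the article.
    CREATE TABLE TeachingAssistantTable (
        Teacher TEXT, Email TEXT PRIMARY KEY, Degree TEXT
    );

    -- Hypothetical rows: the two "Aziz" entries share a name but not an email.
    INSERT INTO StudentTable VALUES
        ('Ibrahim', 'M', 24, 'ibrahim@example.com'),
        ('Aziz',    'M', 25, 'aziz@example.com');
    INSERT INTO TeachingAssistantTable VALUES
        ('Ibrahim', 'ibrahim@example.com', 'MSc'),
        ('Aziz',    'aziz.ta@example.com', 'PhD');
    """
)

# Join on the key column.
by_email = [
    name
    for (name,) in conn.execute(
        """
        SELECT st.Student
        FROM StudentTable st
        JOIN TeachingAssistantTable tat ON st.Email = tat.Email
        ORDER BY st.Student
        """
    )
]

# Join on a non-key column: the names.
by_name = [
    name
    for (name,) in conn.execute(
        """
        SELECT st.Student
        FROM StudentTable st
        JOIN TeachingAssistantTable tat ON st.Student = tat.Teacher
        ORDER BY st.Student
        """
    )
]

print(by_email)
print(by_name)
```

Names are not guaranteed to be unique, so a join on a non-key column like this can pair rows that belong to different people; joining on a key column avoids that ambiguity.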
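2. Left Join
LEFT JOIN works as follows:
Take all the rows from the primary (left) table, whether or not they have a match.
Rows in the secondary (right) table that do not match the primary table on the
column in the ON clause are discarded, and primary rows without a match get NULL in
the secondary table's columns.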
SELECT *
FROM StudentTable st LEFT JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;
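The LEFT JOIN above can be tried out with the following sketch, which runs it through Python's sqlite3 module on an in-memory database. As assumptions: SQLite stands in for the article's database, the TeachingAssistantTable schema is inferred, and the rows are hypothetical apart from the names Ibrahim, Mamadou, and Fatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentTable (
        Student TEXT, Gender TEXT, Age INTEGER, Email TEXT PRIMARY KEY
    );
    -- Assumed schema: not shown in the article.
    CREATE TABLE TeachingAssistantTable (
        Teacher TEXT, Email TEXT PRIMARY KEY, Degree TEXT
    );

    -- Hypothetical rows: Aziz is a student only, Nancy a teaching assistant only.
    INSERT INTO StudentTable VALUES
        ('Ibrahim', 'M', 24, 'ibrahim@example.com'),
        ('Mamadou', 'M', 26, 'mamadou@example.com'),
        ('Fatim',   'F', 23, 'fatim@example.com'),
        ('Aziz',    'M', 25, 'aziz@example.com');
    INSERT INTO TeachingAssistantTable VALUES
        ('Ibrahim', 'ibrahim@example.com', 'MSc'),
        ('Mamadou', 'mamadou@example.com', 'PhD'),
        ('Fatim',   'fatim@example.com',   'MSc'),
        ('Nancy',   'nancy@example.com',   'PhD');
    """
)

# Every student is kept; teaching-assistant columns are NULL when nothing matches.
left_rows = conn.execute(
    """
    SELECT *
    FROM StudentTable st LEFT JOIN TeachingAssistantTable tat
    ON st.Email = tat.Email
    ORDER BY st.Student
    """
).fetchall()

for row in left_rows:
    print(row)
```

In this made-up data, Aziz has no matching teaching-assistant row, so his last three columns come back as None (SQL NULL), and Nancy does not appear at all because she exists only in the secondary table.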
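3. Right Join
RIGHT JOIN mirrors LEFT JOIN: it takes all the rows from the secondary (right) table,
and the primary table's columns are filled with NULL where there is no match.
SELECT *
FROM StudentTable st RIGHT JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;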
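The sketch below reproduces the rows of this RIGHT JOIN using Python's sqlite3 module. Because SQLite only gained the RIGHT JOIN keyword in version 3.39.0, the query is written as a LEFT JOIN with the two tables swapped, which returns the same rows; that rewrite is a substitution made for portability, not the article's statement. The TeachingAssistantTable schema and all rows other than the three shared names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentTable (
        Student TEXT, Gender TEXT, Age INTEGER, Email TEXT PRIMARY KEY
    );
    -- Assumed schema: not shown in the article.
    CREATE TABLE TeachingAssistantTable (
        Teacher TEXT, Email TEXT PRIMARY KEY, Degree TEXT
    );

    -- Hypothetical rows: Aziz is a student only, Nancy a teaching assistant only.
    INSERT INTO StudentTable VALUES
        ('Ibrahim', 'M', 24, 'ibrahim@example.com'),
        ('Mamadou', 'M', 26, 'mamadou@example.com'),
        ('Fatim',   'F', 23, 'fatim@example.com'),
        ('Aziz',    'M', 25, 'aziz@example.com');
    INSERT INTO TeachingAssistantTable VALUES
        ('Ibrahim', 'ibrahim@example.com', 'MSc'),
        ('Mamadou', 'mamadou@example.com', 'PhD'),
        ('Fatim',   'fatim@example.com',   'MSc'),
        ('Nancy',   'nancy@example.com',   'PhD');
    """
)

# "st RIGHT JOIN tat" expressed as "tat LEFT JOIN st".
# Selecting st.* before tat.* keeps the column order of the original query.
right_rows = conn.execute(
    """
    SELECT st.*, tat.*
    FROM TeachingAssistantTable tat LEFT JOIN StudentTable st
    ON st.Email = tat.Email
    ORDER BY tat.Teacher
    """
).fetchall()

for row in right_rows:
    print(row)
```

Here every teaching assistant is kept. Nancy has no matching student, so her four student columns are None, while Aziz, who exists only in StudentTable, is absent.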
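4. Full Outer Join
FULL OUTER JOIN keeps the rows from both tables, matched or not. It can be thought of
in three steps:
Perform a right outer join on the original tables and consider the result as temporary
table 1.
Run a left outer join on the original tables and consider the result as temporary table
2.
Combine temporary tables 1 and 2, keeping each matching row only once.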
SELECT *
FROM StudentTable st FULL OUTER JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;
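The two temporary tables described above can be combined with UNION, as in the following sketch run through Python's sqlite3 module. This emulation is useful because MySQL, which the CREATE TABLE syntax earlier suggests, does not support FULL OUTER JOIN directly, and SQLite only added it in version 3.39.0. The TeachingAssistantTable schema and the rows are assumptions, and the right outer join is written as a LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentTable (
        Student TEXT, Gender TEXT, Age INTEGER, Email TEXT PRIMARY KEY
    );
    -- Assumed schema: not shown in the article.
    CREATE TABLE TeachingAssistantTable (
        Teacher TEXT, Email TEXT PRIMARY KEY, Degree TEXT
    );

    -- Hypothetical rows: Aziz is a student only, Nancy a teaching assistant only.
    INSERT INTO StudentTable VALUES
        ('Ibrahim', 'M', 24, 'ibrahim@example.com'),
        ('Mamadou', 'M', 26, 'mamadou@example.com'),
        ('Fatim',   'F', 23, 'fatim@example.com'),
        ('Aziz',    'M', 25, 'aziz@example.com');
    INSERT INTO TeachingAssistantTable VALUES
        ('Ibrahim', 'ibrahim@example.com', 'MSc'),
        ('Mamadou', 'mamadou@example.com', 'PhD'),
        ('Fatim',   'fatim@example.com',   'MSc'),
        ('Nancy',   'nancy@example.com',   'PhD');
    """
)

full_rows = conn.execute(
    """
    -- Temporary table 1: right outer join (tables swapped around a LEFT JOIN).
    SELECT st.*, tat.*
    FROM TeachingAssistantTable tat LEFT JOIN StudentTable st
    ON st.Email = tat.Email
    UNION
    -- Temporary table 2: left outer join.
    SELECT st.*, tat.*
    FROM StudentTable st LEFT JOIN TeachingAssistantTable tat
    ON st.Email = tat.Email
    """
).fetchall()

# (Student, Teacher) pairs make the matched and unmatched rows easy to see.
pairs = {(row[0], row[4]) for row in full_rows}
print(len(full_rows))
print(pairs)
```

UNION (rather than UNION ALL) removes the duplicate copies of the matched rows, so each matching person appears once, alongside the student-only and teaching-assistant-only rows.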
In all the previous results, we selected all the columns from all the tables, which
creates duplicate values for columns such as Email, Student, and Teacher. However, we
can specify in the SELECT clause the columns we want in the final result. For
instance, we can return only the student's name, Email, Gender, Age, and Degree.
INNER JOIN Applied to StudentTable and TeachingAssistantTable with column selection (Image by Author)
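A query with column selection might look like the sketch below, again run through Python's sqlite3 module. The exact statement is not taken from the author; it is reconstructed from the column list above, and it assumes that Degree lives in TeachingAssistantTable. The schema of that table and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentTable (
        Student TEXT, Gender TEXT, Age INTEGER, Email TEXT PRIMARY KEY
    );
    -- Assumed schema: not shown in the article.
    CREATE TABLE TeachingAssistantTable (
        Teacher TEXT, Email TEXT PRIMARY KEY, Degree TEXT
    );

    -- Hypothetical rows; only Ibrahim, Mamadou and Fatim come from the article.
    INSERT INTO StudentTable VALUES
        ('Ibrahim', 'M', 24, 'ibrahim@example.com'),
        ('Mamadou', 'M', 26, 'mamadou@example.com'),
        ('Fatim',   'F', 23, 'fatim@example.com'),
        ('Aziz',    'M', 25, 'aziz@example.com');
    INSERT INTO TeachingAssistantTable VALUES
        ('Ibrahim', 'ibrahim@example.com', 'MSc'),
        ('Mamadou', 'mamadou@example.com', 'PhD'),
        ('Fatim',   'fatim@example.com',   'MSc'),
        ('Nancy',   'nancy@example.com',   'PhD');
    """
)

# Name each wanted column with its table alias instead of using SELECT *.
selected_rows = conn.execute(
    """
    SELECT st.Student, st.Email, st.Gender, st.Age, tat.Degree
    FROM StudentTable st
    JOIN TeachingAssistantTable tat ON st.Email = tat.Email
    ORDER BY st.Student
    """
).fetchall()

for row in selected_rows:
    print(row)
```

Qualifying each column with its table alias (st or tat) also removes any ambiguity for Email, which exists in both tables.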
Conclusion
This article has covered the four main join cases in SQL. The versatility of SQL can help
you assist businesses in analyzing data and making smart decisions.