You are on page 1of 11

Open in app Get started

Published in Towards Data Science

You have 2 free member-only stories left this month. Sign up for Medium and get an extra one

Zoumana Keita Follow

Aug 2 · 5 min read · · Listen

Save

4 Types of SQL JOIN Every Data Scientist Should


Know: Visual Representation
Complete guide to creating tables, and running all four types of SQL
JOINS.
Photo by Caspar Camille Rubin on Unsplash
Open in app Get started

Motivation
Relational databases try to model real-life entities and their relationships. However, as a
data practitioner, you will be dealing with more than one table when interacting with
those databases. Efficient interaction with those tables requires a better understanding of
the joint statements because each one gives a different result.

This article will first build your understanding of each JOIN clause before walking you
through hands-on practice. Before that, we will be creating from scratch the relevant
tables.

Input Data
To better understand the process, we will need the following two tables: StudentTable

and TeachingAssistantTable. This section aims to create those tables and populate them

with some sample data.

Create Tables
Creating a table is straightforward, and both tables are created as follows in our database.
This section is not mandatory for understanding the rest of the article.

→ Student Table

Each row StudentTable will represent a single student’s information.

1 -- TABLE 1
2 CREATE TABLE `StudentTable` (
3 `Student` varchar(100),
4 `Gender` varchar(100),
5 `Age` int(8),
6 `Email` varchar(100),
7 PRIMARY KEY (`Email`)
8 );

create_student_table.SQL hosted with ❤ by GitHub view raw


Open in app Get started

SQL script to Create Student Table

→ Teaching Assistant Table

Each row of the TeachingAssistantTable will represent a single teacher’s information.

1 CREATE TABLE `TeachingAssistantTable` (


2 `Teacher` varchar(100),
3 `Email` varchar(100),
4 `Degree`varchar(100),
5 `Department` varchar(100),
6 PRIMARY KEY (`Email`)
7 );

create_teaching_assistant_table.SQL hosted with ❤ by GitHub view raw

SQL script to Create Teaching Assistant Table

Populate Tables
Now that our tables are created, we can finally populate them with new data using the
INSERT INTO [table name] VALUES statement.

→ Student Table

1 INSERT INTO `StudentTable` VALUES ('Sira' 'Female' 18 'sira@info com')


1 INSERT INTO StudentTable VALUES ( Sira , Female , 18, sira@info.com ),
2 ('Moussa', 'Male', 19,Open
'mouss@info.com'),
in app Get started
3 ('Mamadou', 'Male', 22, 'mam@info.com'),
4 ('Nabintou', 'Female', 21, 'nab@info.com');

insert_data_into_student_table.SQL hosted with ❤ by GitHub view raw

SQL script to upload data into Student Table

→ Teaching Assistant Table

1 INSERT INTO `TeachingAssistantTable` VALUES ('Ibrahim', 'ib@info.com', 'M.S in Data Scienc


2 ('Nabintou', 'nab@info.com', 'B.S in St
3 ('Mamadou', 'mam@info.com', 'B. Comp Sc
4 ('Fatim', 'fat@info.com', 'M.S Architec
5 ('Aziz', 'aziz@info.com', 'B.S in Accou

insert_data_into_teaching_assistant_table.SQL hosted with ❤ by GitHub view raw

Below are the final representations of the two tables.


Student (StudentTable) and TA (TeachingAssistantTable) tablesOpen
(Image
in by
appAuthor)Get started

Why Do We Join Tables


If you have been on the SQL journey, you might have already heard of Joins, but why do
we use them? The JOIN statement is an important clause in relational databases such as
Oracle, Microsoft SQL, MySQL, PostgreSQL, just to name a few. It is used to create a single
table that is the combination of two or more tables, based on some logical relationships
between those tables.

What are the different types of JOIN?


Now that you know what it is used for, let’s explore the [X] types of JOIN.

1. Inner Join
Let’s start with this simple example. We want to know which Students are also Teachers.
In other words, where are the matching rows in both Student and Teacher tables,
meaning their intersections. We can observe that both of our tables contain Ibrahim,
Mamadou, and Fatim.

Here is where INNER JOIN comes in handy. It only looks at the intersections between two
tables based on the column specified in the ON clause. For instance, the following
instruction gets all the intersected rows based on the Email column.

SELECT *
FROM StudentTable st
JOIN TeachingAssistantTable tat ON st.Email = tat.Email;

SELECT * means “get all the columns” from all the tables.

FROM StudentTable st creates st as an alias of the StudentTable, meaning that


whenever we want to use TeacherTable, can use st instead. The same applies to
TeachingAssistantTable tat

JOIN TeachingAssistantTable tat ON st.Email = tat.Email means only get the rows
having the same Email from both tables.

Thi i th hi l lt tf th i SQL d
This is the graphical result we get from the previous SQL command.
Open in app Get started

INNER JOIN Applied to StudentTable and TeachingAssistantTable (Image by Author)

The join does not only apply to key columns as we can see for Email but any column the
user decides to use in the ON clause. For instance, it could be: ON st.Student = tat.Teacher

which would generate the table where student name equals teacher’s name.

2. Left Outer Join / Left Join


This type of join deals with joining two types of table: the primary table and the
secondary table. The primary table is the one on the LEFT of the equality sign. It works
like this:

Identify the column specified in the ON clause.

Take all the rows from the primary table, without any distinction.

All the rows in the secondary table that do not match the primary table based on the
column in the ON clause are discarded (replaced with NULL).

SELECT *
FROM StudentTable st LEFT JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;
Open in app Get started

This is the graphical result we get from the LEFT JOIN.

123

LEFT JOIN Applied to StudentTable and TeachingAssistantTable (Image by Author)

3. Right Outer Join / Right Join


This join is very similar to the previous one. The only difference is that it considers the
table on the right to be the primary table, and the one on the left to be the secondary
table.

SELECT *
FROM StudentTable st RIGHT JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;

This is the graphical result we get from the RIGHT JOIN.


Open in app Get started

RIGHT JOIN Applied to StudentTable and TeachingAssistantTable (Image by Author)

4. Full outer join


A full outer join is the combination of the left and right joins. In this statement, there is no
room for discarding rows from any of the two tables, they are all taken into consideration.
It works as follows:

Perform a right outer join on the original tables and consider the result as temporary
table 1.

Run a left outer join on the original tables and consider the result as temporary table
2.

Concatenate temporary tables 1 and 2 as the final result.

SELECT *
FROM StudentTable st FULL OUTER JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;
Open in app Get started

FULL OUTER JOIN Applied to StudentTable and TeachingAssistantTable (Image by Author)

From all the previous final results, we notice that we selected all the columns from all the
tables, which creates duplicate values for columns such as Email, Student, and Teacher.
However, we can specify in the SELECT clause the columns we want in the final result. For
instance, the following instruction returns all the columns in the Student Name, its Email,
Gender, Age, and Degree.

SELECT st.Student, st.Gender, st.Email, st.Age, tat.Degree


FROM StudentTable st INNER JOIN TeachingAssistantTable tat
ON st.Email = tat.Email;
Open in app Get started

INNER JOIN Applied to StudentTable and TeachingAssistantTable with column selection (Image by Author)

Conclusion
This article has covered the four main join cases in SQL. The versatility of SQL can provide
you with the skills to assist businesses in analyzing data and taking smart decisions.

If you like reading my stories and wish to support my writing, consider becoming a
Medium member to unlock unlimited access to stories on Medium.

Feel free to follow me on Medium, Twitter, or say Hi on LinkedIn. It is always a pleasure to


discuss AI, ML, Data Science, NLP, and MLOps stuff!
Sign up for The Variable Open in app Get started

By Towards Data Science

Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials and cutting-edge
research to original features you don't want to miss. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review
our Privacy Policy for more information about our privacy practices.

Get this newsletter

About Help Terms Privacy

Get the Medium app

You might also like