Joining Data in PostgreSQL

Talking to Databases
Part 1:
JOINS

Joining Data in SQL


JOINS
Relational database to flat table:

To avoid data redundancy and save storage space, a relational database
stores interrelated data in separate tables.

Joining tables gathers related data from distinct tables in one place
(a flat table).
JOINS
JOINS are used to retrieve data from multiple tables.

Types of JOINS:
• INNER JOIN
• OUTER JOIN:
    • LEFT JOIN
    • RIGHT JOIN
    • FULL JOIN
• CROSS JOIN
JOINS
General syntax:

table1 join_type table2 [ join_condition ]

The join condition is specified in the ON or USING clause.


JOIN TYPES
Left table      Right table
id  col_a       id  col_b
1   a           1   w
2   b           2   x
3   c           4   y
                5   z

INNER JOIN         LEFT OUTER JOIN    RIGHT OUTER JOIN   FULL OUTER JOIN
id  col_a  col_b   id  col_a  col_b   id  col_a  col_b   id  col_a  col_b
1   a      w       1   a      w       1   a      w       1   a      w
2   b      x       2   b      x       2   b      x       2   b      x
                   3   c              4          y       3   c
                                      5          z       4          y
                                                         5          z
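
The four results in the JOIN TYPES figure can be reproduced with a short script. This is a minimal sketch: the table names left_t and right_t are hypothetical and chosen only for illustration, and the data is copied from the figure.

```sql
-- Hypothetical temp tables holding the figure's data.
CREATE TEMP TABLE left_t  (id int, col_a text);
CREATE TEMP TABLE right_t (id int, col_b text);
INSERT INTO left_t  VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO right_t VALUES (1, 'w'), (2, 'x'), (4, 'y'), (5, 'z');

-- INNER JOIN: only ids present in both tables (2 rows: ids 1, 2).
SELECT id, col_a, col_b FROM left_t INNER JOIN right_t USING (id) ORDER BY id;

-- LEFT JOIN: every left row; col_b is NULL for id 3 (3 rows).
SELECT id, col_a, col_b FROM left_t LEFT JOIN right_t USING (id) ORDER BY id;

-- RIGHT JOIN: every right row; col_a is NULL for ids 4 and 5 (4 rows).
SELECT id, col_a, col_b FROM left_t RIGHT JOIN right_t USING (id) ORDER BY id;

-- FULL JOIN: every row from both sides (5 rows: ids 1 to 5).
SELECT id, col_a, col_b FROM left_t FULL JOIN right_t USING (id) ORDER BY id;
```

USING (id) is used here so that the output has a single id column, as in the figure.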
INNER JOIN

An INNER JOIN returns only the rows that have a match in both tables. It
is the most common type of join and the default type of JOIN.

The keyword INNER is optional:

INNER JOIN = JOIN


INNER JOIN
Use the film database below for practice.

films          people     roles      reviews
id             id         id         id
title          name       film_id    film_id
release_year   birthdate  person_id  num_user
country        deathdate  role       num_critic
duration                             imdb_score
language                             num_votes
certification                        facebook_likes
gross
budget
INNER JOIN
Q1. Query for title, release year from films table together with
imdb_score from reviews table.
INNER JOIN
A1. Query for title, release year from films table together with
imdb_score from reviews table.
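
One possible answer, as a sketch. It assumes that reviews.film_id references films.id, which the schema suggests but does not state.

```sql
-- Pair each film with its review row via films.id = reviews.film_id (assumed key).
SELECT f.title, f.release_year, r.imdb_score
FROM films AS f
INNER JOIN reviews AS r
    ON f.id = r.film_id;
```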
INNER JOIN
Q2. Query for title, release year, country, and imdb_score of films which
were released in 2012 and had scores greater than 7.5.
INNER JOIN
A2. Query for title, release year, country, and imdb_score of films which
were released in 2012 and had scores greater than 7.5.
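
A sketch of one way to answer this, again assuming reviews.film_id references films.id. The filters go in WHERE, after the join.

```sql
SELECT f.title, f.release_year, f.country, r.imdb_score
FROM films AS f
INNER JOIN reviews AS r
    ON f.id = r.film_id          -- assumed key relationship
WHERE f.release_year = 2012      -- released in 2012
  AND r.imdb_score > 7.5;        -- score greater than 7.5
```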
INNER JOIN
Q3. Query for the number of total films, max and min imdb_scores and
average imdb_scores of each country after 2010.
INNER JOIN
A3. Query for the number of total films, max and min imdb_scores and
average imdb_scores of each country after 2010.
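
A possible answer, sketched under two assumptions: "after 2010" is read as release_year > 2010, and reviews.film_id references films.id.

```sql
SELECT f.country,
       COUNT(DISTINCT f.id) AS num_films,   -- DISTINCT guards against several review rows per film
       MAX(r.imdb_score)    AS max_score,
       MIN(r.imdb_score)    AS min_score,
       AVG(r.imdb_score)    AS avg_score
FROM films AS f
INNER JOIN reviews AS r
    ON f.id = r.film_id
WHERE f.release_year > 2010                 -- assumed reading of "after 2010"
GROUP BY f.country;
```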
USING
When the join columns have the same name in both tables, the ON clause
can be replaced by USING (column):

ON table1.id_x = table2.id_x is equivalent to USING (id_x)
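
As an illustration on the film database, roles and reviews both have a film_id column, so the two queries below should return the same rows. The choice of columns is only an example.

```sql
-- Join condition written with ON.
SELECT ro.person_id, rv.imdb_score
FROM roles AS ro
INNER JOIN reviews AS rv
    ON ro.film_id = rv.film_id;

-- Same join written with USING; the shared column name is given once.
SELECT ro.person_id, rv.imdb_score
FROM roles AS ro
INNER JOIN reviews AS rv
    USING (film_id);
```

One visible difference: with SELECT *, the USING form outputs film_id once, while the ON form outputs it once per table.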


INNER JOIN
Q4. Whose films got the most Facebook likes? Query for person_id, name
and total Facebook likes for each person. Sort the results by total likes in
descending order.
INNER JOIN
A4. Whose films got the most Facebook likes? Query for person_id, name
and total Facebook likes for each person. Sort the results by total likes in
descending order.
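
One way to write this, as a sketch. It assumes roles.person_id references people.id and that roles and reviews are linked through film_id.

```sql
SELECT ro.person_id,
       p.name,
       SUM(rv.facebook_likes) AS total_likes
FROM roles AS ro
INNER JOIN people AS p
    ON ro.person_id = p.id         -- assumed key relationship
INNER JOIN reviews AS rv
    USING (film_id)                -- film_id exists in both roles and reviews
GROUP BY ro.person_id, p.name
ORDER BY total_likes DESC NULLS LAST;  -- keep people with no recorded likes at the bottom
```

A person with several roles in the same film has that film's likes counted once per role; whether that is wanted depends on how the question is read.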
OUTER JOIN

LEFT JOIN

Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
RIGHT JOIN

Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
FULL JOIN

Syntax:
SELECT columns
FROM table1
FULL JOIN table2
ON table1.column = table2.column;
OUTER JOIN
Here is a more complicated country database:

countries     languages  cities           economies          populations
code          lang_id    name             econ_id            pop_id
name          code       country_code     code               country_code
continent     name       city_proper_pop  year               year
region        percent    metroarea_pop    income_group       fertility_rate
surface_area  official   urbanarea_pop    gdp_percapita
indep_year                                gross_savings
local_name                                inflation_rate
gov_form                                  total_investment
capital                                   unemployment_rate
OUTER JOIN
Q1. Query for country code, name from countries table as country, and
name from languages table as language. Use countries as the left table
and languages as the right table.

Try all 4 joins and check the number of rows in the results.
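
A sketch of how this could be tried, assuming languages.code holds the country code that matches countries.code. The first query is the LEFT JOIN version; the second counts the rows produced by each of the four join types in one pass.

```sql
-- LEFT JOIN version; swap LEFT for INNER, RIGHT or FULL to try the others.
SELECT c.code, c.name AS country, l.name AS language
FROM countries AS c
LEFT JOIN languages AS l
    ON c.code = l.code;

-- Row counts for all four join types side by side.
SELECT
    (SELECT COUNT(*) FROM countries AS c INNER JOIN languages AS l ON c.code = l.code) AS inner_rows,
    (SELECT COUNT(*) FROM countries AS c LEFT JOIN  languages AS l ON c.code = l.code) AS left_rows,
    (SELECT COUNT(*) FROM countries AS c RIGHT JOIN languages AS l ON c.code = l.code) AS right_rows,
    (SELECT COUNT(*) FROM countries AS c FULL JOIN  languages AS l ON c.code = l.code) AS full_rows;
```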
OUTER JOIN
Q2. How many different languages are spoken in each country? Query
for country name from countries, and number of languages as num_lan.
Sort the resulting data by number of languages in descending order.
OUTER JOIN
A2. How many different languages are spoken in each country? Query
for country name from countries, and number of languages as num_lan.
Sort the resulting data by number of languages in descending order.
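
A possible answer, sketched with the assumption that languages.code matches countries.code. A LEFT JOIN is used so that countries with no language rows still appear, with a count of 0.

```sql
SELECT c.name AS country,
       COUNT(DISTINCT l.name) AS num_lan   -- NULLs from unmatched countries are not counted
FROM countries AS c
LEFT JOIN languages AS l
    ON c.code = l.code
GROUP BY c.code, c.name
ORDER BY num_lan DESC;
```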
OUTER JOIN
Q3. How many countries and how many distinct languages exist on each
continent? Sort results by continent in alphabetical order.
OUTER JOIN
A3. How many countries and how many distinct languages exist on each
continent? Sort results by continent in alphabetical order.
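
One way this might be written, assuming languages.code matches countries.code. DISTINCT is needed in both counts because the join repeats each country once per language.

```sql
SELECT c.continent,
       COUNT(DISTINCT c.code) AS num_countries,
       COUNT(DISTINCT l.name) AS num_languages
FROM countries AS c
LEFT JOIN languages AS l
    ON c.code = l.code
GROUP BY c.continent
ORDER BY c.continent;    -- alphabetical order
```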
CROSS JOIN

table1  table2
1       1
2       2
3       3

CROSS JOIN
Syntax:
SELECT columns
FROM table1
CROSS JOIN table2;
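
A small self-contained sketch: two inline three-row tables (the aliases t1, t2 and the column name n are illustrative) are cross joined, so every row of the first is paired with every row of the second.

```sql
SELECT t1.n AS n1, t2.n AS n2
FROM (VALUES (1), (2), (3)) AS t1 (n)
CROSS JOIN (VALUES (1), (2), (3)) AS t2 (n)
ORDER BY n1, n2;
-- 3 x 3 = 9 rows: (1,1), (1,2), (1,3), (2,1), ... , (3,3)
```

No join condition is given; the result size is always the product of the two row counts.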
CROSS JOIN
Q. Compare the results of using CROSS JOIN and INNER JOIN to query
for country name and language name pairs in the Americas.
CROSS JOIN
A. CROSS JOIN
CROSS JOIN
A. INNER JOIN
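
A sketch of the comparison. Two things are assumed here: languages.code matches countries.code, and the continent values for the Americas end in "America" (for example 'North America'); adjust the filter to the actual data.

```sql
-- CROSS JOIN: each matching country is paired with every row of languages,
-- whether or not the language is spoken there.
SELECT c.name AS country, l.name AS language
FROM countries AS c
CROSS JOIN languages AS l
WHERE c.continent LIKE '%America';   -- assumed continent naming

-- INNER JOIN: each matching country is paired only with its own languages.
SELECT c.name AS country, l.name AS language
FROM countries AS c
INNER JOIN languages AS l
    ON c.code = l.code
WHERE c.continent LIKE '%America';   -- assumed continent naming
```

The CROSS JOIN result should be far larger, since it ignores the relationship between the two tables.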
Part 2:
Set Clauses

Set clauses combine the results of multiple SELECT statements into a
single result set.

Syntax:
query1 set_operator [ALL] query2

Set operators (q1 and q2 denote the results of query1 and query2):
• UNION [ALL]: q1 ∪ q2
• INTERSECT [ALL]: q1 ∩ q2
• EXCEPT [ALL]: q1 − q2

By default, set operators remove duplicate rows. Add ALL after the
operator to keep duplicates.


UNION vs JOIN
JOIN (combines columns):
Left input    Right input   Result
Id  col_a     Id  col_b     Id  col_a  col_b
1   A         1   X         1   A      X
2   B         2   Y         2   B      Y
3   C         3   Z         3   C      Z

UNION (combines rows):
First input   Second input  Result
Id  col_a     Id  col_a     Id  col_a
1   A         3   C         1   A
2   B         4   D         2   B
3   C         5   E         3   C
                            4   D
                            5   E
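
The UNION half of the figure, together with the other set operators, can be tried on two small tables. This is a minimal sketch; the table names set_a and set_b are hypothetical and the data is taken from the figure.

```sql
-- Hypothetical temp tables holding the two UNION inputs from the figure.
CREATE TEMP TABLE set_a (id int, col_a text);
CREATE TEMP TABLE set_b (id int, col_a text);
INSERT INTO set_a VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO set_b VALUES (3, 'C'), (4, 'D'), (5, 'E');

-- UNION: duplicates removed, 5 rows (ids 1 to 5).
SELECT id, col_a FROM set_a UNION SELECT id, col_a FROM set_b ORDER BY id;

-- UNION ALL: duplicates kept, 6 rows (id 3 appears twice).
SELECT id, col_a FROM set_a UNION ALL SELECT id, col_a FROM set_b ORDER BY id;

-- INTERSECT: rows present in both inputs, 1 row (3, 'C').
SELECT id, col_a FROM set_a INTERSECT SELECT id, col_a FROM set_b;

-- EXCEPT: rows in set_a that are not in set_b, 2 rows (ids 1 and 2).
SELECT id, col_a FROM set_a EXCEPT SELECT id, col_a FROM set_b ORDER BY id;
```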
Set Clauses
Q1. How many different country codes are returned when using UNION
to combine the countries table and the languages table?

What about with INTERSECT?


Set Clauses
A1. FULL JOIN
Set Clauses
A1. INNER JOIN
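
With set operators, as the question asks, the two counts could be obtained as follows. This is a sketch that assumes both tables store the country code in a column named code.

```sql
-- Codes found in either table (UNION removes duplicates).
SELECT COUNT(*) AS union_codes
FROM (
    SELECT code FROM countries
    UNION
    SELECT code FROM languages
) AS u;

-- Codes found in both tables.
SELECT COUNT(*) AS intersect_codes
FROM (
    SELECT code FROM countries
    INTERSECT
    SELECT code FROM languages
) AS i;
```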
Set Clauses
Q2. List the language codes from the languages table which are not
listed in the countries table.
Set Clauses
A2. List the language codes from the languages table which are not
listed in the countries table.
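
A possible answer using EXCEPT, assuming the "language codes" are the values of languages.code.

```sql
-- Codes that appear in languages but have no matching row in countries.
SELECT code FROM languages
EXCEPT
SELECT code FROM countries;
```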
Set Clauses
Q3. Query for the economy codes which have non-null unemployment
rate records in both 2010 and 2015.
Set Clauses
A3. Query for the economy codes which have non-null unemployment
rate records in both 2010 and 2015.
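
One way to express "in both years" is INTERSECT over two filtered queries on economies. A sketch:

```sql
-- Codes with a non-null unemployment rate in 2010 ...
SELECT code
FROM economies
WHERE year = 2010
  AND unemployment_rate IS NOT NULL
INTERSECT
-- ... that also have one in 2015.
SELECT code
FROM economies
WHERE year = 2015
  AND unemployment_rate IS NOT NULL;
```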
Set Clauses
Q4. Which continents have populations greater than 1 billion? Filter out
the continents with total area greater than 25000000.
Set Clauses
A4. Which continents have populations greater than 1 billion? Filter out
the continents with total area greater than 25000000.
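
A sketch under several assumptions that the question leaves open: the populations table has a size column (it is not in the schema listing above), population is taken for a single year (2015 here), and area is the sum of countries.surface_area.

```sql
-- Continents whose total population exceeds 1 billion (assumed year: 2015) ...
SELECT c.continent
FROM countries AS c
INNER JOIN populations AS p
    ON c.code = p.country_code
WHERE p.year = 2015
GROUP BY c.continent
HAVING SUM(p.size) > 1000000000      -- size is an assumed column
EXCEPT
-- ... minus the continents whose total area exceeds 25000000.
SELECT continent
FROM countries
GROUP BY continent
HAVING SUM(surface_area) > 25000000;
```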
Part 3:
Subqueries

SQL statements can have multiple SELECT queries embedded within
them.
A query inside the main query is called a Subquery or an Inner Query.
The main query is called the Outer Query.

A subquery can reside in:
• the WHERE or HAVING clause
• the FROM clause
• the SELECT clause
In WHERE
Q1. In 2015, which countries had populations greater than the average
population in that year? Query for these countries’ country code and
size.
In WHERE
A1. In 2015, which countries had populations greater than the average
population in that year? Query for these countries’ country code and
size.
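
A possible answer with a subquery in WHERE. The question mentions size, so the sketch assumes populations has a size column even though the schema listing above does not show it.

```sql
SELECT country_code, size
FROM populations
WHERE year = 2015
  AND size > (
      -- average population in the same year
      SELECT AVG(size)
      FROM populations
      WHERE year = 2015
  );
```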
In WHERE
Q2. In 2015, what was the income group distribution of countries whose
fertility rate was lower than 2? Sort results by the number of countries in
each income group in descending order.
In WHERE
A2. In 2015, what was the income group distribution of countries whose
fertility rate was lower than 2? Sort results by the number of countries in
each income group in descending order.
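
A sketch with an IN subquery, assuming economies.code and populations.country_code hold the same country codes and that both filters apply to 2015.

```sql
SELECT income_group,
       COUNT(*) AS num_countries
FROM economies
WHERE year = 2015
  AND code IN (
      -- countries with a fertility rate below 2 in 2015
      SELECT country_code
      FROM populations
      WHERE year = 2015
        AND fertility_rate < 2
  )
GROUP BY income_group
ORDER BY num_countries DESC;
```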
In FROM
Q3. Query for country name, area per capita using
surface_area/population, and gdp_percapita.
In FROM
A3. Query for country name, area per capita using
surface_area/population, and gdp_percapita.
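
One possible shape for this, with subqueries in FROM. Assumptions: populations has a size column, a single year is wanted (2015 is picked here), and the numeric types allow fractional division.

```sql
SELECT c.name,
       c.surface_area / NULLIF(p.size, 0) AS area_per_capita,  -- NULLIF avoids division by zero
       e.gdp_percapita
FROM countries AS c
INNER JOIN (
    SELECT country_code, size          -- size is an assumed column
    FROM populations
    WHERE year = 2015                  -- assumed year
) AS p
    ON c.code = p.country_code
INNER JOIN (
    SELECT code, gdp_percapita
    FROM economies
    WHERE year = 2015                  -- assumed year
) AS e
    ON c.code = e.code;
```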
In FROM
Q4. Query for names of people along with the average imdb score of
any film they ever took part in. Sort by average score in descending
order.
In FROM
A4. Query for names of people along with the average imdb score of
any film they ever took part in. Sort by average score in descending
order.
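
A sketch that computes the averages in a FROM subquery and then attaches names. It assumes roles.person_id references people.id and that roles and reviews share film_id.

```sql
SELECT p.name, s.avg_score
FROM people AS p
INNER JOIN (
    -- average score over all films each person has a role in
    SELECT ro.person_id, AVG(rv.imdb_score) AS avg_score
    FROM roles AS ro
    INNER JOIN reviews AS rv
        USING (film_id)
    GROUP BY ro.person_id
) AS s
    ON p.id = s.person_id
ORDER BY s.avg_score DESC NULLS LAST;
```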
In SELECT
Q5. Query for the name of each person, their birthdate and the number of
films this person was a part of. Try using a subquery in SELECT instead of
GROUP BY.
In SELECT
A5. Query for the name of each person, their birthdate and the number of
films this person was a part of. Try using a subquery in SELECT instead of
GROUP BY.
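
A possible answer with a correlated subquery in SELECT, assuming roles.person_id references people.id. The subquery runs once per person, so no GROUP BY is needed.

```sql
SELECT p.name,
       p.birthdate,
       (
           -- films this person appears in; DISTINCT avoids counting two roles in one film twice
           SELECT COUNT(DISTINCT ro.film_id)
           FROM roles AS ro
           WHERE ro.person_id = p.id
       ) AS num_films
FROM people AS p;
```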