
State of the UNION

JOINING DATA IN SQL

Chester Ismay
Data Science Evangelist, DataRobot
Set Theory Venn Diagrams

UNION: does not double count records that are in both tables.
UNION ALL: includes every record in both tables and replicates those that are in both tables.
INTERSECT: results in only the common subset of data from both tables.
EXCEPT: results in only those records in one table (the first) but not the other.
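
The four set operations can be compared side by side on a toy example. This is a minimal sketch using Python's built-in sqlite3 module; the table names left_one and right_one follow the slides, while the id values and the helper run() are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with two tiny tables (the id values are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_one (id INTEGER);
    CREATE TABLE right_one (id INTEGER);
    INSERT INTO left_one VALUES (1), (2), (3);
    INSERT INTO right_one VALUES (2), (3), (4);
""")

def run(op):
    """Combine the two id columns with the given set operator (hypothetical helper)."""
    sql = f"SELECT id FROM left_one {op} SELECT id FROM right_one ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

union_ids = run("UNION")          # every id once
union_all_ids = run("UNION ALL")  # shared ids appear twice
intersect_ids = run("INTERSECT")  # only ids present in both tables
except_ids = run("EXCEPT")        # ids in left_one that are not in right_one

print(union_ids, union_all_ids, intersect_ids, except_ids)
```

Note that EXCEPT depends on the order of the tables: swapping left_one and right_one would return the ids found only in right_one.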

monarchs table
SELECT *
FROM monarchs;

+-----------+-------------+-------------------------+
| country | continent | monarch |
|-----------+-------------+-------------------------|
| Brunei | Asia | Hassanal Bolkiah |
| Oman | Asia | Qaboos bin Said al Said |
| Norway | Europe | Harald V |
| Spain | Europe | Felipe VI |
+-----------+-------------+-------------------------+



All prime ministers and monarchs
SELECT prime_minister AS leader, country
FROM prime_ministers
UNION
SELECT monarch, country
FROM monarchs
ORDER BY country;

Note: the fields included in the operation must be of the same data type.



Resulting table from UNION
+-------------------------+-----------+
| leader | country |
|-------------------------+-----------|
| Malcolm Turnbull | Australia |
| Hassanal Bolkiah | Brunei |
| Sherif Ismail | Egypt |
| Jack Guy Lafontant | Haiti |
| Narendra Modi | India |
| Harald V | Norway |
| Erna Solberg | Norway |
| Qaboos bin Said al Said | Oman |
| Antonio Costa | Portugal |
| Mariano Rajoy | Spain |
| Felipe VI | Spain |
| Nguyen Xuan Phuc | Vietnam |
+-------------------------+-----------+



UNION ALL with leaders
SELECT prime_minister AS leader, country
FROM prime_ministers
UNION ALL
SELECT monarch, country
FROM monarchs
ORDER BY country
LIMIT 10;

+-------------------------+-----------+
| leader | country |
|-------------------------+-----------|
| Malcolm Turnbull | Australia |
| Hassanal Bolkiah | Brunei |
| Hassanal Bolkiah | Brunei |
| Sherif Ismail | Egypt |
| Jack Guy Lafontant | Haiti |
| Narendra Modi | India |
| Erna Solberg | Norway |
| Harald V | Norway |
| Qaboos bin Said al Said | Oman |
| Qaboos bin Said al Said | Oman |
+-------------------------+-----------+
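
To see why Brunei and Oman appear twice under UNION ALL but once under UNION, the two queries can be run on a small subset of the data. This is a sketch with Python's sqlite3; the prime_ministers rows are an assumed subset inferred from the result tables on the slides, not the full course table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE monarchs (country TEXT, monarch TEXT);
    -- Assumed subset: in Brunei and Oman the monarch is also listed as prime minister.
    INSERT INTO prime_ministers VALUES
        ('Brunei', 'Hassanal Bolkiah'),
        ('Norway', 'Erna Solberg'),
        ('Oman', 'Qaboos bin Said al Said'),
        ('Spain', 'Mariano Rajoy');
    INSERT INTO monarchs VALUES
        ('Brunei', 'Hassanal Bolkiah'),
        ('Norway', 'Harald V'),
        ('Oman', 'Qaboos bin Said al Said'),
        ('Spain', 'Felipe VI');
""")

query = """
    SELECT prime_minister AS leader, country FROM prime_ministers
    {op}
    SELECT monarch, country FROM monarchs
    ORDER BY country, leader
"""
union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

# UNION drops the two duplicated (leader, country) records; UNION ALL keeps them.
print(len(union_rows), len(union_all_rows))
```

The column names of the result come from the first SELECT, which is why aliasing prime_minister AS leader is enough to name the combined column.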



Let's practice!
INTERSECTional data science
INTERSECT diagram and SQL code
SELECT id
FROM left_one
INTERSECT
SELECT id
FROM right_one;



Prime minister and president countries
SELECT country
FROM prime_ministers
INTERSECT
SELECT country
FROM presidents;

+-----------+
| country |
|-----------|
| Portugal |
| Vietnam |
| Haiti |
| Egypt |
+-----------+



INTERSECT on two fields
SELECT country, prime_minister AS leader
FROM prime_ministers
INTERSECT
SELECT country, president
FROM presidents;

+-----------+----------+
| country | leader |
|-----------+----------|
+-----------+----------+

The result is empty: no country has a prime minister and a president with the same name.

Important: INTERSECT looks for whole records in common, not for matches on individual key fields the way a join does.
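
The difference between matching whole records (INTERSECT) and matching on a key field (a join) can be checked directly. The sketch below uses Python's sqlite3 with a few rows taken from the slides; which rows to include is an assumption made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE presidents (country TEXT, president TEXT);
    INSERT INTO prime_ministers VALUES
        ('Egypt', 'Sherif Ismail'),
        ('Portugal', 'Antonio Costa'),
        ('India', 'Narendra Modi');
    INSERT INTO presidents VALUES
        ('Egypt', 'Abdel Fattah el-Sisi'),
        ('Portugal', 'Marcelo Rebelo de Sousa'),
        ('Chile', 'Michelle Bachelet');
""")

# INTERSECT on one field: countries that appear in both tables.
one_field = conn.execute("""
    SELECT country FROM prime_ministers
    INTERSECT
    SELECT country FROM presidents
    ORDER BY country
""").fetchall()

# INTERSECT on two fields: the whole (country, leader) record must match.
two_fields = conn.execute("""
    SELECT country, prime_minister AS leader FROM prime_ministers
    INTERSECT
    SELECT country, president FROM presidents
""").fetchall()

# An inner join matches on the key only, so it still finds both countries.
joined = conn.execute("""
    SELECT p1.country, p1.prime_minister, p2.president
    FROM prime_ministers AS p1
    INNER JOIN presidents AS p2 ON p1.country = p2.country
    ORDER BY p1.country
""").fetchall()

print(one_field, two_fields, joined)
```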



Which countries also have a city with the same name as their country name?

-- Select fields
SELECT name
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name
-- From cities
FROM cities;

Noted result: Singapore

Let's practice!
EXCEPTional
Monarchs that aren't prime ministers
SELECT monarch, country
FROM monarchs
EXCEPT
SELECT prime_minister, country
FROM prime_ministers;

+-----------+-----------+
| monarch | country |
|-----------+-----------|
| Harald V | Norway |
| Felipe VI | Spain |
+-----------+-----------+
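
The EXCEPT query can be reproduced on the monarchs table shown earlier. This sketch uses Python's sqlite3; the prime_ministers rows are an assumed subset inferred from the UNION results, and the reversed query is added only to illustrate that EXCEPT is not symmetric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monarchs (country TEXT, continent TEXT, monarch TEXT);
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    INSERT INTO monarchs VALUES
        ('Brunei', 'Asia', 'Hassanal Bolkiah'),
        ('Oman', 'Asia', 'Qaboos bin Said al Said'),
        ('Norway', 'Europe', 'Harald V'),
        ('Spain', 'Europe', 'Felipe VI');
    -- Assumed subset of prime_ministers, inferred from the UNION results.
    INSERT INTO prime_ministers VALUES
        ('Brunei', 'Hassanal Bolkiah'),
        ('Oman', 'Qaboos bin Said al Said'),
        ('Norway', 'Erna Solberg'),
        ('Spain', 'Mariano Rajoy');
""")

# Monarchs whose (name, country) record does not appear among prime ministers.
only_monarchs = conn.execute("""
    SELECT monarch, country FROM monarchs
    EXCEPT
    SELECT prime_minister, country FROM prime_ministers
    ORDER BY country
""").fetchall()

# Reversing the operands answers a different question.
only_prime_ministers = conn.execute("""
    SELECT prime_minister, country FROM prime_ministers
    EXCEPT
    SELECT monarch, country FROM monarchs
    ORDER BY country
""").fetchall()

print(only_monarchs, only_prime_ministers)
```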

Let's practice!
Semi-joins and Anti-joins
The joins so far are all additive, in that they add columns to the original left table.

Semi-joins and anti-joins use a right table to determine which records to keep in the left table, similar to a WHERE clause that depends on the values of the second table.

Building up to a semi-join
SELECT name
FROM states
WHERE indep_year < 1800;

+----------+
| name |
|----------|
| Portugal |
| Spain |
+----------+



Another step towards the semi-join
SELECT president, country, continent
FROM presidents;

+-------------------------+-----------+---------------+
| president | country | continent |
|-------------------------+-----------+---------------|
| Abdel Fattah el-Sisi | Egypt | Africa |
| Marcelo Rebelo de Sousa | Portugal | Europe |
| Jovenel Moise | Haiti | North America |
| Jose Mujica | Uruguay | South America |
| Ellen Johnson Sirleaf | Liberia | Africa |
| Michelle Bachelet | Chile | South America |
| Tran Dai Quang | Vietnam | Asia |
+-------------------------+-----------+---------------+



Finish the semi-join (an intro to subqueries)
SELECT president, country, continent
FROM presidents
WHERE country IN
  (SELECT name
   FROM states
   WHERE indep_year < 1800);

Semi-join: chooses records in the first table where a condition is met in a second table.

+-------------------------+-----------+-------------+
| president | country | continent |
|-------------------------+-----------+-------------|
| Marcelo Rebelo de Sousa | Portugal | Europe |
+-------------------------+-----------+-------------+

Exercise: identify languages spoken in the Middle East.

-- Select distinct fields
SELECT DISTINCT name
-- From languages
FROM languages
-- Where in statement
WHERE code IN
  -- Subquery
  (SELECT code
   FROM countries
   WHERE region = 'Middle East')
-- Order by name
ORDER BY name;
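
The semi-join pattern (filtering one table with WHERE ... IN over a subquery on another) can be tried end to end. This is a sketch with Python's sqlite3: the presidents rows come from the slides, but the states rows and their indep_year values are illustrative assumptions, chosen only so that Portugal and Spain fall before 1800 as on the slides. The EXISTS variant is an equivalent formulation added for comparison, not something the slides show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE presidents (president TEXT, country TEXT, continent TEXT);
    CREATE TABLE states (name TEXT, indep_year INTEGER);
    INSERT INTO presidents VALUES
        ('Abdel Fattah el-Sisi', 'Egypt', 'Africa'),
        ('Marcelo Rebelo de Sousa', 'Portugal', 'Europe'),
        ('Jovenel Moise', 'Haiti', 'North America'),
        ('Jose Mujica', 'Uruguay', 'South America'),
        ('Ellen Johnson Sirleaf', 'Liberia', 'Africa'),
        ('Michelle Bachelet', 'Chile', 'South America'),
        ('Tran Dai Quang', 'Vietnam', 'Asia');
    -- Illustrative years (assumed); only "before 1800 or not" matters here.
    INSERT INTO states VALUES
        ('Portugal', 1143), ('Spain', 1492), ('Haiti', 1804),
        ('Egypt', 1922), ('Vietnam', 1945);
""")

# Semi-join written with IN: keep presidents whose country is in the subquery result.
semi_in = conn.execute("""
    SELECT president, country, continent
    FROM presidents
    WHERE country IN
      (SELECT name FROM states WHERE indep_year < 1800)
""").fetchall()

# The same semi-join written with a correlated EXISTS subquery.
semi_exists = conn.execute("""
    SELECT president, country, continent
    FROM presidents AS p
    WHERE EXISTS
      (SELECT 1 FROM states AS s
       WHERE s.indep_year < 1800 AND s.name = p.country)
""").fetchall()

print(semi_in)
```

Unlike an inner join, neither form adds columns from states or duplicates rows of presidents; the second table only decides which rows survive.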



An anti-join
SELECT president, country, continent
FROM presidents
WHERE ___ LIKE '___'
AND country ___ IN
(SELECT name
FROM states
WHERE indep_year < 1800);

SELECT president, country, continent
FROM presidents
WHERE continent LIKE '%America'
  AND country NOT IN
    (SELECT name
     FROM states
     WHERE indep_year < 1800);

Anti-join: chooses records in the first table where a condition is NOT met in the second table.



The result of the anti-join
+-------------------+-----------+---------------+
| president | country | continent |
|-------------------+-----------+---------------|
| Jovenel Moise | Haiti | North America |
| Jose Mujica | Uruguay | South America |
| Michelle Bachelet | Chile | South America |
+-------------------+-----------+---------------+
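
The anti-join can be reproduced the same way. In this sqlite3 sketch the presidents rows come from the slides, while the states rows and years are illustrative assumptions. The second half goes slightly beyond the slides: it shows a general SQL behaviour worth knowing, namely that NOT IN returns no rows once the subquery yields a NULL, whereas a NOT EXISTS formulation is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE presidents (president TEXT, country TEXT, continent TEXT);
    CREATE TABLE states (name TEXT, indep_year INTEGER);
    INSERT INTO presidents VALUES
        ('Abdel Fattah el-Sisi', 'Egypt', 'Africa'),
        ('Marcelo Rebelo de Sousa', 'Portugal', 'Europe'),
        ('Jovenel Moise', 'Haiti', 'North America'),
        ('Jose Mujica', 'Uruguay', 'South America'),
        ('Ellen Johnson Sirleaf', 'Liberia', 'Africa'),
        ('Michelle Bachelet', 'Chile', 'South America'),
        ('Tran Dai Quang', 'Vietnam', 'Asia');
    -- Illustrative years (assumed).
    INSERT INTO states VALUES ('Portugal', 1143), ('Spain', 1492), ('Haiti', 1804);
""")

NOT_IN_SQL = """
    SELECT president, country, continent
    FROM presidents
    WHERE continent LIKE '%America'
      AND country NOT IN
        (SELECT name FROM states WHERE indep_year < 1800)
    ORDER BY country
"""
NOT_EXISTS_SQL = """
    SELECT president, country, continent
    FROM presidents AS p
    WHERE continent LIKE '%America'
      AND NOT EXISTS
        (SELECT 1 FROM states AS s
         WHERE s.indep_year < 1800 AND s.name = p.country)
    ORDER BY country
"""

anti_join = conn.execute(NOT_IN_SQL).fetchall()

# Add a state with a missing name (hypothetical row) and rerun both forms.
conn.execute("INSERT INTO states VALUES (NULL, 1700)")
not_in_with_null = conn.execute(NOT_IN_SQL).fetchall()
not_exists_with_null = conn.execute(NOT_EXISTS_SQL).fetchall()

print(anti_join, not_in_with_null, not_exists_with_null)
```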



Semi-join and anti-join diagrams
Identify the country codes that are included in either economies or currencies but not in populations, then return the names of the cities in those countries.

-- Select the city name


SELECT name
-- Alias the table where city name resides
FROM cities AS c1
-- Choose only records matching the result of multiple set theory clauses
WHERE country_code IN
(
-- Select appropriate field from economies AS e
SELECT e.code
FROM economies AS e
-- Get all additional (unique) values of the field from currencies AS c2
UNION
SELECT c2.code
FROM currencies AS c2
-- Exclude those appearing in populations AS p
EXCEPT
SELECT p.country_code
FROM populations AS p
);
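
How the set operations chain inside the subquery can be checked on toy data. In this sqlite3 sketch every row (the three-letter codes and the city names) is hypothetical; only the table and column names follow the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT, country_code TEXT);
    CREATE TABLE economies (code TEXT);
    CREATE TABLE currencies (code TEXT);
    CREATE TABLE populations (country_code TEXT);
    -- Hypothetical rows for illustration only.
    INSERT INTO cities VALUES
        ('Alpha City', 'AAA'), ('Beta Town', 'BBB'),
        ('Gamma Port', 'CCC'), ('Delta Bay', 'DDD');
    INSERT INTO economies VALUES ('AAA'), ('BBB');
    INSERT INTO currencies VALUES ('BBB'), ('CCC');
    INSERT INTO populations VALUES ('AAA'), ('BBB');
""")

# Codes in economies or currencies, minus those in populations.
codes = conn.execute("""
    SELECT e.code FROM economies AS e
    UNION
    SELECT c2.code FROM currencies AS c2
    EXCEPT
    SELECT p.country_code FROM populations AS p
""").fetchall()

# Semi-join: cities whose country_code is in that set.
city_names = conn.execute("""
    SELECT name
    FROM cities AS c1
    WHERE country_code IN
    (
        SELECT e.code FROM economies AS e
        UNION
        SELECT c2.code FROM currencies AS c2
        EXCEPT
        SELECT p.country_code FROM populations AS p
    )
""").fetchall()

print(codes, city_names)
```

Here UNION first produces AAA, BBB, CCC, and EXCEPT then removes AAA and BBB, so only the city in CCC remains; DDD never enters the set because it is in neither economies nor currencies.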



Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.

You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!

Begin by determining the number of countries in the countries table that are listed in Oceania.

-- Select statement
select count(name)
-- From countries
from countries
-- Where continent is Oceania
where continent='Oceania';

Complete an inner join with countries AS c1 on the left and currencies AS c2 on the right to get the different currencies used in the countries of Oceania.

-- 5. Select fields (with aliases)
SELECT c1.name, c1.code, c2.basic_unit AS currency
-- 1. From countries (alias as c1)
FROM countries AS c1
-- 2. Join with currencies (alias as c2)
INNER JOIN currencies AS c2
-- 3. Match on code
USING (code)
-- 4. Where continent is Oceania
WHERE continent = 'Oceania';

Note that not all countries in Oceania were listed in the resulting inner join with currencies. Use an anti-join to determine which countries were not included!

-- 3. Select fields
SELECT code, name
-- 4. From countries
FROM countries
-- 5. Where continent is Oceania
WHERE continent = 'Oceania'
-- 1. And code not in
  AND code NOT IN
  -- 2. Subquery
  (SELECT code
   FROM currencies);
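
The three steps of this exercise (count, inner join, anti-join) can be run together to see how the anti-join explains the missing rows. This is a sketch with Python's sqlite3; the rows are a small assumed sample, and 'Example Island' with code 'XOC' is a hypothetical country added so that one Oceanian row has no currency record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
    CREATE TABLE currencies (code TEXT, basic_unit TEXT);
    -- Assumed sample rows; 'XOC' is hypothetical and has no currency record.
    INSERT INTO countries VALUES
        ('AUS', 'Australia', 'Oceania'),
        ('NZL', 'New Zealand', 'Oceania'),
        ('XOC', 'Example Island', 'Oceania'),
        ('NOR', 'Norway', 'Europe');
    INSERT INTO currencies VALUES
        ('AUS', 'Australian dollar'),
        ('NZL', 'New Zealand dollar'),
        ('NOR', 'Norwegian krone');
""")

# Step 1: how many countries are listed in Oceania?
oceania_count = conn.execute(
    "SELECT COUNT(name) FROM countries WHERE continent = 'Oceania'"
).fetchone()[0]

# Step 2: the inner join silently drops countries without a currency row.
joined = conn.execute("""
    SELECT c1.name, c1.code, c2.basic_unit AS currency
    FROM countries AS c1
    INNER JOIN currencies AS c2 USING (code)
    WHERE continent = 'Oceania'
    ORDER BY c1.name
""").fetchall()

# Step 3: the anti-join lists exactly the countries the join left out.
missing = conn.execute("""
    SELECT code, name
    FROM countries
    WHERE continent = 'Oceania'
      AND code NOT IN (SELECT code FROM currencies)
""").fetchall()

print(oceania_count, len(joined), missing)
```

The count from step 1 equals the number of joined rows plus the number of anti-join rows, which is the check that makes the anti-join useful for diagnosing an unexpected row count.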
