JOINING DATA IN SQL
Chester Ismay
Data Science Evangelist, DataRobot
Set Theory Venn Diagrams
UNION: doesn't double count records that are in both tables.
UNION ALL: includes every record in both tables and replicates those that are in both tables.
+-----------+-------------+-------------------------+
| country | continent | monarch |
|-----------+-------------+-------------------------|
| Brunei | Asia | Hassanal Bolkiah |
| Oman | Asia | Qaboos bin Said al Said |
| Norway | Europe | Harald V |
| Spain | Europe | Felipe VI |
+-----------+-------------+-------------------------+
FROM monarchs
ORDER BY country;
+-------------------------+-----------+
| leader | country |
|-------------------------+-----------|
| Malcolm Turnbull | Australia |
| Hassanal Bolkiah | Brunei |
| Hassanal Bolkiah | Brunei |
| Sherif Ismail | Egypt |
| Jack Guy Lafontant | Haiti |
| Narendra Modi | India |
| Erna Solberg | Norway |
| Harald V | Norway |
| Qaboos bin Said al Said | Oman |
| Qaboos bin Said al Said | Oman |
+-------------------------+-----------+
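The difference between UNION and UNION ALL can be checked on a small in-memory database. This is a minimal sketch using Python's built-in sqlite3 module; the prime_ministers rows are an assumed subset taken from the result shown above, not the full course table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monarchs (country TEXT, continent TEXT, monarch TEXT);
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    INSERT INTO monarchs VALUES
        ('Brunei', 'Asia', 'Hassanal Bolkiah'),
        ('Oman', 'Asia', 'Qaboos bin Said al Said'),
        ('Norway', 'Europe', 'Harald V'),
        ('Spain', 'Europe', 'Felipe VI');
    -- Assumed subset of prime_ministers, for illustration only
    INSERT INTO prime_ministers VALUES
        ('Brunei', 'Hassanal Bolkiah'),
        ('Oman', 'Qaboos bin Said al Said'),
        ('Norway', 'Erna Solberg');
""")

# Same two SELECTs, combined with a different set operator each time
query = """
    SELECT prime_minister AS leader, country FROM prime_ministers
    {op}
    SELECT monarch, country FROM monarchs
    ORDER BY country
"""
union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

# UNION keeps one copy of each record; UNION ALL keeps every copy
print(len(union_rows), len(union_all_rows))
```

With this data, the leaders of Brunei and Oman appear once under UNION and twice under UNION ALL, because each holds both offices.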
INTERSECT diagram and SQL code
SELECT id
FROM left_one
INTERSECT
SELECT id
FROM right_one;
+-----------+
| country |
|-----------|
| Portugal |
| Vietnam |
| Haiti |
| Egypt |
+-----------+
+-----------+----------+
| country | leader |
|-----------+----------|
+-----------+----------+
No countries have a prime minister and a president with the same name, so the result is empty.
Important: INTERSECT looks for whole records in common, not for matches on individual key fields the way a join does.
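The record-level behaviour of INTERSECT can be seen by running it once on one column and once on two. This is a sketch with Python's built-in sqlite3 module; the table name presidents and the reduced row sets are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, reduced tables for illustration only
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE presidents (country TEXT, president TEXT);
    INSERT INTO prime_ministers VALUES
        ('Egypt', 'Sherif Ismail'),
        ('Haiti', 'Jack Guy Lafontant'),
        ('India', 'Narendra Modi');
    INSERT INTO presidents VALUES
        ('Egypt', 'Abdel Fattah el-Sisi'),
        ('Haiti', 'Jovenel Moise'),
        ('Uruguay', 'Jose Mujica');
""")

# One column: a record is just the country, so shared countries match
one_column = conn.execute("""
    SELECT country FROM prime_ministers
    INTERSECT
    SELECT country FROM presidents
    ORDER BY country
""").fetchall()

# Two columns: the whole (country, leader) record must match in both tables
two_columns = conn.execute("""
    SELECT country, prime_minister AS leader FROM prime_ministers
    INTERSECT
    SELECT country, president FROM presidents
""").fetchall()

print(one_column)
print(two_columns)
```

A join on country would pair Egypt with Egypt regardless of the leader names; INTERSECT only returns a row when every selected column agrees.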
-- Select fields
SELECT name
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name
-- From cities
FROM cities;
-- Result noted: Singapore
Let's practice!
EXCEPTional
Monarchs that aren't prime ministers
SELECT monarch, country
FROM monarchs
EXCEPT
SELECT prime_minister, country
FROM prime_ministers;
+-----------+-----------+
| monarch | country |
|-----------+-----------|
| Harald V | Norway |
| Felipe VI | Spain |
+-----------+-----------+
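The EXCEPT query above can be reproduced on a small in-memory database. This is a sketch with Python's built-in sqlite3 module; the monarchs rows come from the table shown earlier, while the prime_ministers rows are an assumed subset for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monarchs (country TEXT, continent TEXT, monarch TEXT);
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    INSERT INTO monarchs VALUES
        ('Brunei', 'Asia', 'Hassanal Bolkiah'),
        ('Oman', 'Asia', 'Qaboos bin Said al Said'),
        ('Norway', 'Europe', 'Harald V'),
        ('Spain', 'Europe', 'Felipe VI');
    -- Assumed subset of prime_ministers, for illustration only
    INSERT INTO prime_ministers VALUES
        ('Brunei', 'Hassanal Bolkiah'),
        ('Oman', 'Qaboos bin Said al Said'),
        ('Norway', 'Erna Solberg');
""")

# Keep the (monarch, country) records that have no identical
# (prime_minister, country) record in the second query
only_monarchs = conn.execute("""
    SELECT monarch, country FROM monarchs
    EXCEPT
    SELECT prime_minister, country FROM prime_ministers
    ORDER BY country
""").fetchall()

print(only_monarchs)
```

Norway stays in the result even though it has a prime minister, because EXCEPT compares whole records and Harald V is not the prime minister.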
The joins so far are all additive, in that they add columns to the original left table.
Semi-joins and anti-joins use a right table to determine which records to keep in the left table,
similar to a WHERE clause that depends on the values of the second table.
Building up to a semi-join
SELECT name
FROM states
WHERE indep_year < 1800;
+----------+
| name |
|----------|
| Portugal |
| Spain |
+----------+
+-------------------------+-----------+---------------+
| president | country | continent |
|-------------------------+-----------+---------------|
| Abdel Fattah el-Sisi | Egypt | Africa |
| Marcelo Rebelo de Sousa | Portugal | Europe |
| Jovenel Moise | Haiti | North America |
| Jose Mujica | Uruguay | South America |
| Ellen Johnson Sirleaf | Liberia | Africa |
| Michelle Bachelet | Chile | South America |
| Tran Dai Quang | Vietnam | Asia |
+-------------------------+-----------+---------------+
WHERE country IN
(SELECT name
FROM states
WHERE indep_year < 1800);
Identify languages spoken in the Middle East.
FROM languages
-- Where in statement
WHERE code IN
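The semi-join built up from the states subquery can be run end to end as follows. This is a sketch with Python's built-in sqlite3 module; the table name presidents is an assumption, and the indep_year values are placeholders chosen only so that Portugal and Spain fall before 1800, as in the result above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- indep_year values are illustrative placeholders, not course data
    CREATE TABLE states (name TEXT, indep_year INTEGER);
    -- Table name 'presidents' is assumed
    CREATE TABLE presidents (president TEXT, country TEXT, continent TEXT);
    INSERT INTO states VALUES
        ('Portugal', 1143),
        ('Spain', 1492),
        ('Egypt', 1922),
        ('Haiti', 1804);
    INSERT INTO presidents VALUES
        ('Abdel Fattah el-Sisi', 'Egypt', 'Africa'),
        ('Marcelo Rebelo de Sousa', 'Portugal', 'Europe'),
        ('Jovenel Moise', 'Haiti', 'North America');
""")

# Semi-join: the subquery on states only decides which rows of
# presidents survive; it contributes no columns to the result
semi_join = conn.execute("""
    SELECT president, country, continent
    FROM presidents
    WHERE country IN
        (SELECT name
         FROM states
         WHERE indep_year < 1800)
""").fetchall()

print(semi_join)
```

Spain passes the subquery filter but has no row in presidents, so only Portugal's president is returned.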
You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!
Begin by determining the number of countries in the countries table that are listed in Oceania.
-- Select statement
SELECT COUNT(name)
-- From countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania';
Note that not all countries in Oceania were listed in the resulting inner join with currencies. Use an anti-join to determine which countries were not included!
-- Select fields
SELECT c1.code, c1.name
-- From countries (alias as c1)
FROM countries AS c1
-- Where continent is Oceania
WHERE c1.continent = 'Oceania'
-- And code not in
AND c1.code NOT IN
-- Subquery
(SELECT code
FROM currencies);
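The anti-join above can be tried on a toy database. This is a sketch with Python's built-in sqlite3 module; the rows and the basic_unit column are made up for illustration and do not reproduce the course data. The NOT EXISTS variant is an added alternative, not part of the original exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Illustrative rows only
    CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
    CREATE TABLE currencies (code TEXT, basic_unit TEXT);
    INSERT INTO countries VALUES
        ('AUS', 'Australia', 'Oceania'),
        ('FJI', 'Fiji', 'Oceania'),
        ('ASM', 'American Samoa', 'Oceania'),
        ('NOR', 'Norway', 'Europe');
    INSERT INTO currencies VALUES
        ('AUS', 'Australian dollar'),
        ('FJI', 'Fijian dollar'),
        ('NOR', 'Norwegian krone');
""")

# Anti-join with NOT IN: keep Oceanian countries whose code
# does not appear in currencies
not_in_rows = conn.execute("""
    SELECT c1.code, c1.name
    FROM countries AS c1
    WHERE c1.continent = 'Oceania'
      AND c1.code NOT IN (SELECT code FROM currencies)
""").fetchall()

# Same anti-join written with a correlated NOT EXISTS
not_exists_rows = conn.execute("""
    SELECT c1.code, c1.name
    FROM countries AS c1
    WHERE c1.continent = 'Oceania'
      AND NOT EXISTS
          (SELECT 1 FROM currencies AS c2 WHERE c2.code = c1.code)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

Both forms agree here. If the subquery column can contain NULL, NOT IN returns no rows at all, whereas NOT EXISTS is unaffected, so the second form may be the safer choice on real data.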