
POLLUTION IN

SEOUL, KOREA

Lexi Hanna
Intro
• After WWII, South Korea began to industrialize
• During the 1980s and 1990s, the economy grew 10% every year
• Government in the 1970s focused on economic development
• New environmentally focused legislation followed
• Still the second worst for air quality among advanced nations
• Goal: Identify the most prevalent pollutants in Seoul’s air
DATA
DESCRIPTION
General
• Source: Kaggle
• Collected between 2017 and 2019
• Provides hourly readings of multiple pollutant levels from 25 different districts in Seoul
Variables (descriptive)
Measurement Date: Provides the date and time the measurement was taken.
Station Code: Identifies which of the 25 stations is being sampled.
Address: Identifies the location of the station being sampled.
Latitude: In degrees, exact latitude of the address.
Longitude: In degrees, exact longitude of the address.
Variables (pollutants)
SO2: In ppm, the average value of sulfur dioxide over the hour. Blue on the legend is 0.020, green is 0.050, yellow is 0.150, and red is 1.000. Data is given to 3 decimal places.
NO2: In ppm, the average value of nitrogen dioxide over the hour. Blue on the legend is 0.030, green is 0.060, yellow is 0.200, and red is 2.000. Data is given to 3 decimal places.
O3: In ppm, the average value of ozone over the hour. Blue on the legend is 0.030, green is 0.090, yellow is 0.150, and red is 0.500. Data is given to 3 decimal places.
CO: In ppm, the average value of carbon monoxide over the hour. Blue on the legend is 2.000, green is 9.000, yellow is 15.000, and red is 50.000. Data is given to 1 decimal place.
PM10: In micrograms/m3, the average value of particulate matter less than 10 μm over the hour. Blue on the legend is 30.000, green is 80.000, yellow is 150.000, and red is 600.000. Data is given with no decimal places.
PM2.5: In micrograms/m3, the average value of particulate matter less than 2.5 μm over the hour. Blue on the legend is 15.000, green is 35.000, yellow is 75.000, and red is 500.000. Data is given with no decimal places.
Observations
• 647,511 rows
• Each row is a different measurement
• Each measurement was taken an hour apart
• 25 different locations
• From 2017 to 2019
FILTERS AND
SUBSETS
Nonsense Variables
• Needed positive observations
• Non-positive readings accounted for less than 4% of the data
• Thrown out
Missing Values
• None were found!
Subsetting Into Variables
• Done before outlier exclusion
• Then excluded outliers

Random Sampling
• Large dataset
• 10,000 observations chosen
• 6 separate datasets with 10,000 observations each
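
The filter, subset, outlier, and sampling steps above can be sketched in Python using only the standard library. This is a sketch under stated assumptions, not the code used for the slides: the function names are hypothetical, a row is assumed to be dropped when any pollutant reading is not positive, and the 1.5 × IQR rule is an assumed stand-in because the slides do not name the outlier rule.

```python
import random
import statistics

# Pollutant column names as listed in the variable description.
POLLUTANTS = ["SO2", "NO2", "O3", "CO", "PM10", "PM2.5"]


def keep_positive(rows):
    """Drop nonsense rows. ASSUMPTION: a row is dropped if any reading is <= 0."""
    return [r for r in rows if all(r[p] > 0 for p in POLLUTANTS)]


def subset_by_pollutant(rows):
    """Split the cleaned rows into one list of readings per pollutant."""
    return {p: [r[p] for r in rows] for p in POLLUTANTS}


def drop_outliers(values, k=1.5):
    """ASSUMPTION: 1.5 * IQR rule; the slides do not say which rule was used."""
    if len(values) < 2:
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


def sample_values(values, n=10_000, seed=0):
    """Draw up to n readings without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(values, min(n, len(values)))


def prepare(rows, n=10_000, seed=0):
    """Filter, subset, exclude outliers, then sample: one dataset per pollutant."""
    subsets = subset_by_pollutant(keep_positive(rows))
    return {p: sample_values(drop_outliers(v), n, seed) for p, v in subsets.items()}
```

Calling `prepare(rows)` on a list of row dictionaries would return six lists, one per pollutant, each with at most 10,000 readings. Subsetting happens before outlier exclusion, matching the order given on the slide.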
New Variable - Color

The data set provided a legend indicating the severity of pollution. I assigned a corresponding color to each reading so the pollutants could be compared with each other.

Pollutant   Unit of measurement   Good (Blue)   Normal (Green)   Bad (Yellow)   Very bad (Red)
SO2         ppm                   0.02          0.05             0.15           1
NO2         ppm                   0.03          0.06             0.2            2
CO          ppm                   2             9                15             50
O3          ppm                   0.03          0.09             0.15           0.5
PM10        Microgram/m3          30            80               150            600
PM2.5       Microgram/m3          15            35               75             500
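
The color variable can be sketched in Python as a threshold lookup. The threshold values come from the legend table; the names `THRESHOLDS`, `color_for`, and `color_counts` are hypothetical, and treating each legend value as the inclusive upper bound of its band (with anything above the last value still counted as Red) is an assumption, since the slides do not say how boundaries were handled.

```python
from collections import Counter

# Legend values per pollutant: (Blue, Green, Yellow, Red).
THRESHOLDS = {
    "SO2": (0.02, 0.05, 0.15, 1.0),
    "NO2": (0.03, 0.06, 0.2, 2.0),
    "CO": (2.0, 9.0, 15.0, 50.0),
    "O3": (0.03, 0.09, 0.15, 0.5),
    "PM10": (30.0, 80.0, 150.0, 600.0),
    "PM2.5": (15.0, 35.0, 75.0, 500.0),
}
COLORS = ("Blue", "Green", "Yellow", "Red")


def color_for(pollutant, value):
    """Return the legend color for one reading.

    ASSUMPTION: each legend value is the inclusive upper bound of its band,
    and readings above the Red value are still labeled Red.
    """
    for color, upper in zip(COLORS, THRESHOLDS[pollutant]):
        if value <= upper:
            return color
    return COLORS[-1]


def color_counts(pollutant, values):
    """Count how many readings of one pollutant fall into each color."""
    return Counter(color_for(pollutant, v) for v in values)
```

Because every pollutant maps onto the same four colors, the counts from `color_counts` can be compared across pollutants even though the units differ.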


Main Pollutants

PM10
• Second worst!
• Caused by combustion
• Difference is size: PM10 is larger
• A couple of yellow readings, lots of green

PM2.5
• Worst!
• PM2.5 is smaller
• Mostly yellow readings and a few green
Conclusions and Further Studies
• Both types of particulate matter were the worst pollutants
• Created a new variable, took random samples, and filtered the data
• Frequency box plots
• Could be furthered by analyzing the pollutant levels in relation to temperature
• Could use another set of temperature data from the same places and times
Scatterplot Showing Relationship
ANY
QUESTIONS?
