You are on page 1of 7

Reproducible Research Course Project 2

Data Analysis: Effects of Severe Weather Events


Synopsis
This data analysis uses data from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm
database, collected from 1950 to November 2011.
The goal of the data analysis is to investigate the effects of severe weather events on population health and
economy.
We found that the weather events that cause the most harm to population health are tornados, as they
caused the most fatalities, deaths and overall casualties. The weather events that caused the most severe
agricultural damages are drought. The weather events that caused the most property damages and combined
monetary damages are floods.

Data Processing
Load Data
From the course website, we downloaded the dataset provided by U.S. National Oceanic and Atmospheric
Administration’s (NOAA) storm database, and loaded the raw data into R.
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(file_url, destfile="StormData.csv.bz2")
stormraw <- read.csv("StormData.csv.bz2")

After reading the raw dataset, we check the dataset dimension and column headers.
dim(stormraw)

## [1] 902297 37
names(stormraw)

## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"


## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"

Analysis for Population Health


To analyze the impact of weather events on population health, we create a new variable casualties which is
the sum of fatalities and injury.
##subset for Fatalities and Injuries
storm_population <- stormraw[, c("EVTYPE", "FATALITIES", "INJURIES")]
##calculate Casualties= FATALITEIS+INJURIES

1
storm_population$Casualties <- storm_population$FATALITIES + storm_population$INJURIES
##confirm that there is no missing values in this subset
sum(is.na(storm_population))

## [1] 0
We then grouped the data by event types and evaluated the total deaths, injuries and casualties for each type
of events.
##group data by event types
library(dplyr)

##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
storm_casualties_bytype <- storm_population %>%
group_by(EVTYPE) %>%
summarise(occurence=n(),
fatalities_total=sum(FATALITIES),
injuries_total=sum(INJURIES),
casualties_total=sum(Casualties),
)
##acquire top 5 event types that causes the highest deaths, injuries and casualties
top5fatalities_total <- storm_casualties_bytype %>%
slice_max(order_by = fatalities_total, n=5)
top5injuries_total <- storm_casualties_bytype %>%
slice_max(order_by = injuries_total, n=5)
top5casualties_total <- storm_casualties_bytype %>%
slice_max(order_by = casualties_total, n=5)

print(top5fatalities_total)

## # A tibble: 5 x 5
## EVTYPE occurence fatalities_total injuries_total casualties_total
## <chr> <int> <dbl> <dbl> <dbl>
## 1 TORNADO 60652 5633 91346 96979
## 2 EXCESSIVE HEAT 1678 1903 6525 8428
## 3 FLASH FLOOD 54277 978 1777 2755
## 4 HEAT 767 937 2100 3037
## 5 LIGHTNING 15754 816 5230 6046
print(top5injuries_total)

## # A tibble: 5 x 5
## EVTYPE occurence fatalities_total injuries_total casualties_total
## <chr> <int> <dbl> <dbl> <dbl>
## 1 TORNADO 60652 5633 91346 96979
## 2 TSTM WIND 219940 504 6957 7461
## 3 FLOOD 25326 470 6789 7259

2
## 4 EXCESSIVE HEAT 1678 1903 6525 8428
## 5 LIGHTNING 15754 816 5230 6046
print(top5casualties_total)

## # A tibble: 5 x 5
## EVTYPE occurence fatalities_total injuries_total casualties_total
## <chr> <int> <dbl> <dbl> <dbl>
## 1 TORNADO 60652 5633 91346 96979
## 2 EXCESSIVE HEAT 1678 1903 6525 8428
## 3 TSTM WIND 219940 504 6957 7461
## 4 FLOOD 25326 470 6789 7259
## 5 LIGHTNING 15754 816 5230 6046
##The event types that are in any of the above three top 5 rankings are:
TopEvents_population <- union(
union(top5injuries_total$EVTYP, top5fatalities_total$EVTYPE),
top5casualties_total$EVTYPE)

## Warning: Unknown or uninitialised column: `EVTYP`.


print(TopEvents_population)

## [1] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT"


## [5] "LIGHTNING" "TSTM WIND" "FLOOD"
We see that the top 5 rankings of events that caused the most deaths, injuries and overall casualties mostly
overlap, and we focus on the 7 event types and appear in these rankings.
##Prepare the dataset for plotting
##subset for the above 7 top harmful events
top_population_harm <- subset(storm_casualties_bytype, EVTYPE %in% TopEvents_population)
##reshape to long format
library(tidyr)
top_population_long <- top_population_harm %>%
pivot_longer(cols=c("fatalities_total", "injuries_total", "casualties_total"),
names_to = "harm",
values_to = "total_number")

Analysis for Economic Consequences


To analyze the economic consequences of severe weather events, we look at the property damages, crop
damages and total moneytary damages (sum of property and crop damages).
##subset for event types, property damages and crop damages.
storm_econ <- stormraw[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

In this dataset, each property damages are represented as a numeric value (“PROPDMG” and “CROPDMG)
and a multiplier (”PROPDMGEXP” and “CROPDMGEXP”). We calculated the actual dollar amount is
calculated by multiplying the numeric values by their multiplier. The total monetary damages are the sum of
the property and crop damages after multiplier conversions.
##Create a function that read and convert the multiplier columns
unique(storm_econ$PROPDMGEXP)

## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm_econ$CROPDMGEXP)

## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"

3
multiplier_conversion <- function(multiplier) {
case_when(
multiplier == "K" ~ 1e3,
multiplier == "k" ~ 1e3,
multiplier == "M" ~ 1e6,
multiplier == "m" ~ 1e3,
multiplier == "B" ~ 1e9,
multiplier == "b" ~ 1e3,
multiplier == "H" ~ 1e2,
multiplier == "h" ~ 1e2,
multiplier == "1" ~ 10,
multiplier == "2" ~ 1e2,
multiplier == "3" ~ 1e3,
multiplier == "4" ~ 1e4,
multiplier == "5" ~ 1e5,
multiplier == "6" ~ 1e6,
multiplier == "7" ~ 1e7,
multiplier == "8" ~ 1e8,
TRUE ~ 1 # Default if multiplier is not one of the above
)
}
##Multiply the numeric values by their corresponding multipliers
storm_econ$property_dollar <- storm_econ$PROPDMG *
multiplier_conversion(storm_econ$PROPDMGEXP)
storm_econ$crop_dollar <- storm_econ$CROPDMG *
multiplier_conversion(storm_econ$CROPDMGEXP)
##create add up the property and crop damages
storm_econ$total_dollar <- storm_econ$property_dollar+storm_econ$crop_dollar
##confirm that there is no missing data
sum(is.na(storm_econ))

## [1] 0
We then grouped the data by event types and evaluated the total dollar amount of property damages, crop
damages and combined monetary damages.
##group data by event types
storm_econ_bytype <- storm_econ %>%
group_by(EVTYPE) %>%
summarise(occurence = n(),
property_total = sum(property_dollar),
crop_total = sum(crop_dollar),
money_total = sum(total_dollar)
)
##accquire top 5 events types that caused the highest property, crop and combined damages.
top5prop_total <- storm_econ_bytype %>%
slice_max(order_by = property_total, n=5)
top5crop_total <- storm_econ_bytype %>%
slice_max(order_by = crop_total, n=5)
top5money <- storm_econ_bytype %>%
slice_max(order_by = money_total, n=5)
print(top5prop_total)

## # A tibble: 5 x 5
## EVTYPE occurence property_total crop_total money_total

4
## <chr> <int> <dbl> <dbl> <dbl>
## 1 FLOOD 25326 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHOON 88 69305840000 2607872800 71913712800
## 3 TORNADO 60652 56935892176. 414953270 57350845446.
## 4 STORM SURGE 261 43323536000 5000 43323541000
## 5 FLASH FLOOD 54277 16822673978. 1421317100 18243991078.
print(top5crop_total)

## # A tibble: 5 x 5
## EVTYPE occurence property_total crop_total money_total
## <chr> <int> <dbl> <dbl> <dbl>
## 1 DROUGHT 2488 1046106000 13972566000 15018672000
## 2 FLOOD 25326 144657709807 5661968450 150319678257
## 3 RIVER FLOOD 173 5118945500 5029459000 10148404500
## 4 ICE STORM 2006 3944927860 5022113500 8967041360
## 5 HAIL 288661 15730372413. 3025954473 18756326886.
print(top5money)

## # A tibble: 5 x 5
## EVTYPE occurence property_total crop_total money_total
## <chr> <int> <dbl> <dbl> <dbl>
## 1 FLOOD 25326 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHOON 88 69305840000 2607872800 71913712800
## 3 TORNADO 60652 56935892176. 414953270 57350845446.
## 4 STORM SURGE 261 43323536000 5000 43323541000
## 5 HAIL 288661 15730372413. 3025954473 18756326886.
##The event types that are in any of the above three top 5 rankings are:
TopEvents_econ <- union(union(
top5prop_total$EVTYPE, top5crop_total$EVTYPE),
top5money$EVTYPE)
print(TopEvents_econ)

## [1] "FLOOD" "HURRICANE/TYPHOON" "TORNADO"


## [4] "STORM SURGE" "FLASH FLOOD" "DROUGHT"
## [7] "RIVER FLOOD" "ICE STORM" "HAIL"
We see that the top 5 rankings by the three criteria have some overlaps and there are nine types of weather
events that appear on these rankings.
##prepare the dataset for plotting
##subset for the nine focus event types
top_econ_damage <- subset(storm_econ_bytype, EVTYPE %in% TopEvents_econ)
##convert the dollar amount to million
top_econ_damage$property_total_M <- top_econ_damage$property_total/1000000
top_econ_damage$crop_total_M <- top_econ_damage$crop_total/1000000
top_econ_damage$money_total_M <- top_econ_damage$money_total/1000000
##reshape dataset to long format
top_econ_long <- top_econ_damage %>%
pivot_longer(cols=c("property_total_M", "crop_total_M", "money_total_M"),
names_to = "damage",
values_to = "million")

5
Results
Most harmful event types with respect to population health
To show the harms to population health by severe weather events, we can make a clustered bar plots of the
top event types that caused the most death, injuries or combined casualties.
##arrange order of clustered bars to display in order of
##1.fatalities, 2.injuries, 3.combined casualties
top_population_long$harm <- factor(
top_population_long$harm,
levels = c("fatalities_total", "injuries_total", "casualties_total"))
##create clustered barplot
library(ggplot2)
g1 <- ggplot(top_population_long, aes(x=EVTYPE, y=total_number, fill=harm))
g1+geom_bar(stat="identity", position="dodge") +
labs(x="Events", y="Total Number", fill="Color Legend",
title = "Top Harmful Weather Events for Population Health") +
scale_fill_discrete(labels = c("Fatalities", "Injuries", "Combined Casualties")) +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.key.size = unit(0.3,"cm"))
Top Harmful Weather Events for Population Health
100000

75000
Total Number

Color Legend
50000 Fatalities
Injuries
Combined Casualties

25000

0
D

O
T

D
EA
EA

O
O

IN
IN

AD
O
O

W
TN
H
H

N
FL
FL
VE

TM
H

TO
H

G
SI

TS
AS

LI
ES

FL
C
EX

Events

From this barplot, it is apparent that Tornados caused overwhelming the most deaths and injuries among all
severe weather events, and therefore is the most harmful to population health.

Most damaging events with respect to economic consequences


To show the economic damages by severe weather events, we can make a clustered bar plots of the top event
types that caused the most properties, crops and combined damages. The damage amount is shown in million
dollars.
##arrange order of clustered bars to display in order of
##1.property damages, 2.crop damages, 3.combined monetary damages
top_econ_long$damage <- factor(

6
top_econ_long$damage,
levels = c("property_total_M", "crop_total_M", "money_total_M"))
##create clustered barplot
g2 <- ggplot(top_econ_long, aes(x=EVTYPE, y=million, fill=damage))
g2 + geom_bar(stat="identity", position="dodge") +
labs(x="Events", y="Total Damage (million dollars)", fill="Damages",
title="Top Damaging Weather Events for Economic Loss ") +
scale_fill_discrete(labels = c("Properties", "Crops","Combined Loss" )) +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.key.size = unit(0.3,"cm"))
Top Damaging Weather Events for Economic Loss
150000
Total Damage (million dollars)

100000
Damages
Properties
Crops
Combined Loss
50000

0
T

L
D

O
N

D
AI
H

G
O
O

AD
R
G

R
O

H
O

O
O
U

SU

N
FL
FL

PH

FL
ST
RO

R
TO
TY
H

ER

M
E
D

AS

IC

R
E/

IV

O
FL

AN

ST
IC
R
R
U
H

Events

From this barplot, it is apparent that floods caused caused the heaviest economic impact among all severe
weather events, and this was mostly driven by the enormous property damages. Droughts, while caused less
damages to properties, are the most severe events for agricultural losses.

You might also like