
C3-Assignment

PRAKHAR AGRAWAL

IDS2021907
Question
6. The following table gives the number of commercial airline
accidents and fatalities in the United States in the years from
1980 to 1995.
(a) Represent the number of yearly airline accidents in a frequency
table.
(b) Give a frequency polygon graph of the number of yearly airline
accidents.
(c) Give a cumulative relative frequency plot of the number of
yearly airline accidents.
(d) Find the sample mean of the number of yearly airline accidents.
(e) Find the sample median of the number of yearly airline
accidents.
(f) Find the sample mode of the number of yearly airline
accidents.
(g) Find the sample standard deviation of the number of yearly
airline accidents.
U.S. Airline Safety, Scheduled Commercial Carriers, 1980–1995

Figure 1: Source: National Transportation Safety Board.


Solution:
a.) Number of yearly airline accidents in a frequency table
The sample can be summarised in the following frequency table.
allom <- read.csv("Question6.csv")
allomhd <- allom[,c("YEAR", "Fatal_Accidents")]
accidents = allomhd$Fatal_Accidents
freqTable = transform(table(accidents))
freqTable
## accidents Freq
## 1 0 1
## 2 1 2
## 3 2 2
## 4 3 1
## 5 4 8
## 6 6 1
## 7 11 1
Here the left column is the number of accidents in a year, and the
right column is the number of years, out of the 16 years from 1980
to 1995, in which exactly that many accidents occurred.
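The same table can be reproduced without the CSV file. The sketch
below is a minimal example that assumes only the counts printed in
the table above; the names `values`, `freqs` and `accidents_rebuilt`
are illustrative, and the year-by-year order of the observations is
not recovered.

```r
# Distinct yearly accident counts and their frequencies,
# copied from the frequency table above
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

# Expand the table back into one observation per year
# (the order of the years is arbitrary here)
accidents_rebuilt <- rep(values, times = freqs)

# Tabulating the rebuilt sample gives back the same frequency table
rebuilt_table <- as.data.frame(table(accidents = accidents_rebuilt))
print(rebuilt_table)

# The frequencies should add up to the 16 years covered by the data
n_years <- sum(rebuilt_table$Freq)
print(n_years)
```

Checking that the frequencies sum to 16 is a quick way to confirm
that no year was dropped when the table was built.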
b.) Frequency polygon graph of the number of yearly airline
accidents
Using the data above, a frequency polygon can be plotted in R.
The polygon is obtained by plotting the values (classes) on the
X-axis and their frequencies on the Y-axis.
library("tidyverse")

## -- Attaching packages ----------------------------------


## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
qplot(x=allomhd$Fatal_Accidents,geom="freqpoly",
xlab = "Fatal Accidents",
ylab = "Frequency",
main = "frequency polygon graph of the number
of yearly airline accidents.",
col = I("darkblue"))

## `stat_bin()` using `bins = 30`. Pick better value with `


Figure 2: frequency polygon graph of the number of yearly airline
accidents.
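Because the accident counts are small integers, the polygon can also
be drawn with one point per integer value, so that counts that never
occur (5 and 7 to 10) appear with frequency zero. The following is a
minimal base R sketch, assuming the sample is rebuilt from the
frequency table in part (a); the names `accidents`, `classes` and
`counts` are illustrative.

```r
# Sample rebuilt from the frequency table in part (a)
accidents <- rep(c(0, 1, 2, 3, 4, 6, 11), times = c(1, 2, 2, 1, 8, 1, 1))

# One class per integer from 0 up to the largest observed count
classes <- 0:max(accidents)

# Frequency of every class, including classes that never occur
counts <- as.vector(table(factor(accidents, levels = classes)))

# Frequency polygon: classes on the X-axis, frequencies on the Y-axis
plot(classes, counts,
     type = "o",
     xlab = "Fatal Accidents",
     ylab = "Frequency",
     main = "Frequency polygon, one class per integer value",
     col = "darkblue")
```

This avoids the default of 30 bins reported in the message above,
at the cost of fixing the class width to 1.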
c.) cumulative relative frequency plot of the number of
yearly airline accidents
freqTable <- transform(table(allomhd$Fatal_Accidents))
transform(freqTable,relative =prop.table(Freq),
cumFreq=cumsum(prop.table(Freq)))
## Var1 Freq relative cumFreq
## 1 0 1 0.0625 0.0625
## 2 1 2 0.1250 0.1875
## 3 2 2 0.1250 0.3125
## 4 3 1 0.0625 0.3750
## 5 4 8 0.5000 0.8750
## 6 6 1 0.0625 0.9375
## 7 11 1 0.0625 1.0000
Cumulative relative frequency plot
freqTable <- transform(table(allomhd$Fatal_Accidents))
comtable <- transform(freqTable,relative=prop.table(Freq),
cumFreq = cumsum(prop.table(Freq)))
# Var1 is a factor; convert it to numbers so the x-axis is to scale
x <- as.numeric(as.character(comtable$Var1))
# type="o" draws both the points and the connecting lines
plot(x,comtable$cumFreq,
type="o",
xlab="Number of accidents in a year",
ylab = "Cumulative proportion",
main="Cumulative relative frequency plot")
Figure 3: “Cumulative relative frequency plot”
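Base R's `ecdf()` gives the same cumulative relative frequencies
directly from the raw sample. The sketch below is a cross-check, not
the method used above; it assumes the sample is rebuilt from the
frequency table, and the names `Fn`, `distinct` and `cum_rel` are
illustrative.

```r
# Sample rebuilt from the frequency table in part (a)
accidents <- rep(c(0, 1, 2, 3, 4, 6, 11), times = c(1, 2, 2, 1, 8, 1, 1))

# Empirical cumulative distribution function of the sample
Fn <- ecdf(accidents)

# knots() returns the distinct observed values in increasing order
distinct <- knots(Fn)

# Evaluating Fn at each distinct value gives the proportion of
# years with at most that many accidents
cum_rel <- Fn(distinct)
print(data.frame(accidents = distinct, cumFreq = cum_rel))
```

For example, `Fn(4)` is the proportion of years with at most 4
accidents, which matches the cumFreq entry for the value 4.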
Sample Mean
Suppose the data set is presented in a frequency table listing the
$k$ distinct values $v_1, \ldots, v_k$ with corresponding frequencies
$f_1, \ldots, f_k$. Since such a data set consists of
$n = \sum_{i=1}^{k} f_i$ observations, with the value $v_i$ appearing
$f_i$ times for each $i = 1, \ldots, k$, it follows that the sample
mean of these $n$ data values is:

$$\bar{x} = \sum_{i=1}^{k} v_i f_i / n$$

By writing the preceding as

$$\bar{x} = \frac{f_1}{n} v_1 + \frac{f_2}{n} v_2 + \cdots + \frac{f_k}{n} v_k$$

we see that the sample mean is a weighted average of the distinct
values, where the weight given to the value $v_i$ is equal to the
proportion of the $n$ data values that are equal to $v_i$,
$i = 1, \ldots, k$.
So the sample mean is:

$$\bar{x} = (0 \cdot 1 + 1 \cdot 2 + 2 \cdot 2 + 3 \cdot 1 + 4 \cdot 8 + 6 \cdot 1 + 11 \cdot 1)/16 = 58/16 = 3.625$$
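The weighted-average form of the mean can be checked numerically. A
minimal sketch, assuming the values and frequencies are typed in from
the frequency table in part (a); the names `v`, `f` and the three
result variables are illustrative.

```r
# Distinct values and their frequencies from the frequency table
v <- c(0, 1, 2, 3, 4, 6, 11)
f <- c(1, 2, 2, 1, 8, 1, 1)
n <- sum(f)

# Direct use of the formula: sum of value times frequency, over n
mean_from_table <- sum(v * f) / n

# Same quantity as a weighted average with weights f
mean_weighted <- weighted.mean(v, w = f)

# Same quantity from the expanded sample of n observations
mean_raw <- mean(rep(v, times = f))

print(c(mean_from_table, mean_weighted, mean_raw))
```

All three computations agree, which illustrates that the frequency
table carries all the information needed for the mean.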

Sample Median
To find the median, order the values of a data set of size $n$ from
smallest to largest. If $n$ is odd, the sample median is the value in
position $(n+1)/2$; if $n$ is even, it is the average of the values
in positions $n/2$ and $n/2 + 1$.
Here $n = 16$ is even, so the sample median is the average of the
values in positions 8 and 9. From the cumulative frequencies, the
values 0 to 3 fill positions 1 to 6 and the value 4 fills positions
7 to 14, so both of these values are 4.
Therefore the median of the data is 4.
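The position rule can be applied to the frequency table directly,
without expanding the sample. The sketch below is one possible way
to do this, assuming the values and frequencies from part (a); the
helper `value_at` is hypothetical and not part of base R.

```r
# Distinct values (sorted) and their frequencies
v <- c(0, 1, 2, 3, 4, 6, 11)
f <- c(1, 2, 2, 1, 8, 1, 1)
n <- sum(f)

# Cumulative counts: the last sorted position occupied by each value
cum_count <- cumsum(f)

# Value found at sorted position p: the first value whose
# cumulative count reaches p
value_at <- function(p) v[which(cum_count >= p)[1]]

# Odd n: middle position; even n: average of the two middle positions
if (n %% 2 == 1) {
  median_from_table <- value_at((n + 1) / 2)
} else {
  median_from_table <- (value_at(n / 2) + value_at(n / 2 + 1)) / 2
}
print(median_from_table)
```

The result can be compared with `median(rep(v, times = f))`, which
computes the median from the expanded sample.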
Sample Mode
The sample mode is defined to be the value that occurs with the
greatest frequency. If no single value occurs most frequently, then
all the values that occur at the highest frequency are called modal
values.
Here the value 4 has the highest frequency, 8. Hence the sample mode
is 4.

Sample Standard Deviation


The quantity $s$, defined by

$$s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$$

is called the sample standard deviation.

Therefore the sample standard deviation is

$$s = \sqrt{\sum_{i=1}^{16} (x_i - 3.625)^2 / 15} = \sqrt{6.25} = 2.5$$
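The intermediate steps of this calculation can be reproduced from
the frequency table, weighting each squared deviation by its
frequency. A minimal sketch, assuming the values and frequencies
from part (a); all variable names are illustrative.

```r
# Distinct values and their frequencies from the frequency table
v <- c(0, 1, 2, 3, 4, 6, 11)
f <- c(1, 2, 2, 1, 8, 1, 1)
n <- sum(f)

# Sample mean computed from the table
xbar <- sum(v * f) / n

# Sum of squared deviations; each distinct value counts f times
sum_sq_dev <- sum(f * (v - xbar)^2)

# Sample variance divides by n - 1, not n
var_from_table <- sum_sq_dev / (n - 1)

# Sample standard deviation is the square root of the variance
sd_from_table <- sqrt(var_from_table)

print(c(sum_sq = sum_sq_dev, var = var_from_table, sd = sd_from_table))
```

Here the sum of squared deviations is 93.75, so the variance is
93.75/15 = 6.25 and the standard deviation is 2.5.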
R code to calculate mean, median and SD.
allom <- read.csv("Question6.csv")
allomhd <- allom[,c("YEAR", "Fatal_Accidents")]
freqTable = transform(table(allomhd$Fatal_Accidents))
s <- as.numeric(as.character(freqTable$Var1))
f <- as.numeric(as.character(freqTable$Freq))
d2 <- rep(s, f)
multi.fun <- function(x) {
c(mean = mean(x),
median = median(x),
var = var(x),
sd = sd(x))
}
multi.fun(d2)
## mean median var sd
## 3.625 4.000 6.250 2.500
R code to calculate Mode.
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Create the vector with numbers.


v <- allom$Fatal_Accidents

# Calculate the mode using the user function.


result <- getmode(v)
print(result)
## [1] 4
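Note that `getmode` returns only the first value that attains the
highest frequency. When several values tie, all of them are modal
values. The sketch below shows one way to return every modal value;
`get_modes` is a hypothetical helper, and the sample is assumed to be
rebuilt from the frequency table in part (a).

```r
# Return every value that occurs with the highest frequency
get_modes <- function(v) {
  counts <- table(v)
  is_max <- as.vector(counts) == max(counts)
  as.numeric(names(counts)[is_max])
}

# Sample rebuilt from the frequency table in part (a)
accidents <- rep(c(0, 1, 2, 3, 4, 6, 11), times = c(1, 2, 2, 1, 8, 1, 1))

# Single mode for the accident data
modes_accidents <- get_modes(accidents)
print(modes_accidents)

# A small made-up sample with a tie, to show two modal values
modes_tied <- get_modes(c(1, 1, 2, 2, 3))
print(modes_tied)
```

For the accident data both functions agree, since 4 is the only
value with frequency 8.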
Conclusion

The frequency table and the frequency polygon show that the most
common number of yearly accidents is 4, since the peak of the
frequency polygon lies at 4 accidents. The mean (3.625) is close to
the median (4), which suggests that the data contain few outliers.
The mode is also 4, confirming that the value 4 has the maximum
frequency.
