C3-Assignment: Prakhar Agrawal

C3-Assignment
PRAKHAR AGRAWAL
IDS2021907
Question
6. The following table gives the number of commercial airline
accidents and fatalities in the United States in the years from
1980 to 1995.
(a) Represent the number of yearly airline accidents in a frequency
table.
(b) Give a frequency polygon graph of the number of yearly airline
accidents.
(c) Give a cumulative relative frequency plot of the number of
yearly airline accidents.
(d) Find the sample mean of the number of yearly airline accidents.
(e) Find the sample median of the number of yearly airline
accidents.
(f) Find the sample mode of the number of yearly airline accidents.
(g) Find the sample standard deviation of the number of yearly
airline accidents.
U.S. Airline Safety, Scheduled Commercial Carriers, 1980–1995
Figure 1: Source: National Transportation Safety Board.

Solution:
a.) Number of yearly airline accidents in a frequency table
Our samples can be displayed with the following table.
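# Load the data and keep only the year and fatal-accident columns.
allom <- read.csv("Question6.csv")
allomhd <- allom[, c("YEAR", "Fatal_Accidents")]
accidents <- allomhd$Fatal_Accidents
# Tabulate how often each yearly accident count occurs.
freqTable <- transform(table(accidents))
freqTable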
## accidents Freq
## 1 0 1
## 2 1 2
## 3 2 2
## 4 3 1
## 5 4 8
## 6 6 1
## 7 11 1
Here the left column is the number of accidents in a year, and the
right column is the number of years, out of the 16 observed, in
which exactly that many accidents occurred.
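As a cross-check that does not need Question6.csv, the following minimal sketch rebuilds the 16 observations from the frequency table above and tabulates them again. The names values, freqs and obs are illustrative and not part of the original solution, and the order of the observations by year is not recovered, only the counts.

```r
# Distinct yearly accident counts and their frequencies, copied from the table above.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

# Expand into one observation per year (the year order is not recovered).
obs <- rep(values, times = freqs)

# Re-tabulate; the result should reproduce the frequency table.
tab <- table(obs)
print(tab)

# Number of years covered by the table.
print(length(obs))
```

If the table has been copied correctly, length(obs) equals 16, the number of years from 1980 to 1995.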
b.) Plot a frequency polygon graph using R
Using the frequency information above, it is straightforward to
plot a frequency polygon in R.
The following frequency polygon is obtained by plotting the accident
counts on the X-axis and their frequencies on the Y-axis.
library("tidyverse")
## -- Attaching packages ----------------------------------

## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
qplot(x=allomhd$Fatal_Accidents,geom="freqpoly",
xlab = "Fatal Accidents",
ylab = "Frequency",
main = "frequency polygon graph of the number
of yearly airline accidents.",
col = I("darkblue"))
## `stat_bin()` using `bins = 30`. Pick better value with `

Figure 2: frequency polygon graph of the number of yearly airline
accidents.
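Since the data take only a few distinct integer values, an alternative is to draw the polygon directly from the tabulated frequencies, with one point per possible count. The sketch below uses base R rather than qplot, and the names allCounts and allFreqs are illustrative assumptions, not part of the original solution.

```r
# Distinct yearly accident counts and their frequencies, from the frequency table.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

# Give a frequency of zero to counts that never occurred (5, 7, 8, 9, 10),
# so the polygon drops to the axis between observed values.
allCounts <- 0:11
allFreqs <- freqs[match(allCounts, values)]
allFreqs[is.na(allFreqs)] <- 0

# Join the (count, frequency) points with straight lines.
plot(allCounts, allFreqs,
     type = "o",
     xlab = "Fatal accidents in a year",
     ylab = "Frequency",
     main = "Frequency polygon of yearly airline accidents")
```

Plotted this way, the peak of the polygon sits at 4 accidents with a frequency of 8.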
c.) cumulative relative frequency plot of the number of
yearly airline accidents
freqTable <- transform(table(allomhd$Fatal_Accidents))
transform(freqTable,relative =prop.table(Freq),
cumFreq=cumsum(prop.table(Freq)))
## Var1 Freq relative cumFreq
## 1 0 1 0.0625 0.0625
## 2 1 2 0.1250 0.1875
## 3 2 2 0.1250 0.3125
## 4 3 1 0.0625 0.3750
## 5 4 8 0.5000 0.8750
## 6 6 1 0.0625 0.9375
## 7 11 1 0.0625 1.0000
Cumulative relative frequency plot
freqTable <- transform(table(allomhd$Fatal_Accidents))
comtable <- transform(freqTable, relative = prop.table(Freq),
cumFreq = cumsum(prop.table(Freq)))
# Var1 is a factor; convert it to numbers so the x-axis uses the true scale.
accidentCount <- as.numeric(as.character(comtable$Var1))
plot(accidentCount, comtable$cumFreq,
type = "o",
xlab = "Number of fatal accidents in a year",
ylab = "Cumulative proportion",
main = "Cumulative relative frequency plot")
Figure 3: “Cumulative relative frequency plot”
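The cumulative relative frequencies can also be checked against R's built-in empirical distribution function. This is a minimal sketch that rebuilds the observations from the frequency table instead of reading Question6.csv; the names values, freqs, obs and Fn are illustrative.

```r
# Distinct yearly accident counts and their frequencies, from the frequency table.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)
obs <- rep(values, times = freqs)

# Cumulative relative frequency computed by hand.
relFreq <- freqs / sum(freqs)
cumFreq <- cumsum(relFreq)

# ecdf() returns a function giving the proportion of observations <= x.
Fn <- ecdf(obs)
print(data.frame(values, relFreq, cumFreq, ecdf = Fn(values)))
```

The cumFreq and ecdf columns should agree row by row, for example 0.875 at the value 4.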
Sample Mean
Suppose a data set is presented in a frequency table listing the $k$
distinct values $v_1, \ldots, v_k$ with corresponding frequencies
$f_1, \ldots, f_k$. Since such a data set consists of
$n = \sum_{i=1}^{k} f_i$ observations, with the value $v_i$ appearing
$f_i$ times for each $i = 1, \ldots, k$, it follows that the sample
mean of these $n$ data values is:

$$\bar{x} = \sum_{i=1}^{k} v_i f_i / n$$

By writing the preceding as

$$\bar{x} = \frac{f_1}{n} v_1 + \frac{f_2}{n} v_2 + \cdots + \frac{f_k}{n} v_k$$

we see that the sample mean is a weighted average of the distinct
values, where the weight given to the value $v_i$ is equal to the
proportion of the $n$ data values that are equal to $v_i$, $i = 1, \ldots, k$.
So, the sample mean is:

$$\bar{x} = (0 \cdot 1 + 1 \cdot 2 + 2 \cdot 2 + 3 \cdot 1 + 4 \cdot 8 + 6 \cdot 1 + 11 \cdot 1)/16 = 58/16 = 3.625$$
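The weighted-average form of the sample mean can be sketched in R directly from the frequency table. The names values, freqs and xbar are illustrative; weighted.mean() is base R's built-in equivalent.

```r
# Distinct yearly accident counts and their frequencies, from the frequency table.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

# Sample mean as sum(v_i * f_i) / n, with n = sum(f_i).
n <- sum(freqs)
xbar <- sum(values * freqs) / n
print(xbar)  # 3.625

# The same result using the frequencies as weights.
print(weighted.mean(values, freqs))
```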
Sample Median
To find the median, order the values of a data set of size $n$ from
smallest to largest. If $n$ is odd, the sample median is the value in
position $(n+1)/2$; if $n$ is even, it is the average of the values in
positions $n/2$ and $n/2 + 1$.
Here $n = 16$ is even, so the sample median is the average of the
values in positions 8 and 9. From the cumulative frequencies, the
values 0 through 3 occupy positions 1 to 6 and the value 4 occupies
positions 7 to 14, so the 8th and 9th values are both 4.
Therefore the median of the data is 4.
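The position-based rule for the median can be sketched in R using cumulative counts from the frequency table. The helper valueAt and the other names are hypothetical, introduced only for this illustration.

```r
# Distinct yearly accident counts and their frequencies, from the frequency table.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

n <- sum(freqs)
cumCount <- cumsum(freqs)

# Value found at a given 1-based position in the sorted data:
# the first distinct value whose cumulative count reaches that position.
valueAt <- function(pos) values[which(cumCount >= pos)[1]]

# Average the two middle values when n is even, else take the middle one.
if (n %% 2 == 0) {
  med <- (valueAt(n / 2) + valueAt(n / 2 + 1)) / 2
} else {
  med <- valueAt((n + 1) / 2)
}
print(med)  # 4
```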
Sample Mode
The sample mode is defined to be the value that occurs with the
greatest frequency. If no single value occurs most frequently, then
all the values that occur at the highest frequency are called modal
values.
Here the value 4 has the highest frequency, namely 8. Hence the
sample mode is 4.
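Since the definition allows several modal values, a small sketch that returns every value tied for the highest frequency may be useful. It works from the frequency table, and the names maxFreq and modalValues are illustrative.

```r
# Distinct yearly accident counts and their frequencies, from the frequency table.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

# Highest frequency, and every value that attains it.
maxFreq <- max(freqs)
modalValues <- values[freqs == maxFreq]

print(maxFreq)      # 8
print(modalValues)  # 4
```

For this data set there is a single modal value, so modalValues has length one.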
Sample Standard Deviation

The quantity $s$, defined by

$$s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$$

is called the sample standard deviation.

Therefore the sample standard deviation is

$$s = \sqrt{\sum_{i=1}^{16} (x_i - 3.625)^2 / 15} = \sqrt{93.75 / 15} = \sqrt{6.25} = 2.5$$
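The intermediate steps of this calculation can be sketched in R from the frequency table, weighting each squared deviation by its frequency. The names ss, s2 and s are illustrative.

```r
# Distinct yearly accident counts and their frequencies, from the frequency table.
values <- c(0, 1, 2, 3, 4, 6, 11)
freqs  <- c(1, 2, 2, 1, 8, 1, 1)

n <- sum(freqs)
xbar <- sum(values * freqs) / n

# Sum of squared deviations, counting each distinct value f_i times.
ss <- sum(freqs * (values - xbar)^2)
print(ss)  # 93.75

# Sample variance divides by n - 1; the standard deviation is its square root.
s2 <- ss / (n - 1)
s <- sqrt(s2)
print(c(var = s2, sd = s))
```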
R code to calculate mean, median and SD.
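allom <- read.csv("Question6.csv")
allomhd <- allom[,c("YEAR", "Fatal_Accidents")]
freqTable <- transform(table(allomhd$Fatal_Accidents))
# Distinct accident counts (stored as a factor) and their frequencies.
s <- as.numeric(as.character(freqTable$Var1))
f <- as.numeric(as.character(freqTable$Freq))
# Expand the frequency table back into the 16 raw observations.
d2 <- rep(s, f)
# Return the mean, median, variance and standard deviation of x.
multi.fun <- function(x) {
c(mean = mean(x),
median = median(x),
var = var(x),
sd = sd(x))
}
multi.fun(d2)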
## mean median var sd
## 3.625 4.000 6.250 2.500
R code to calculate Mode.
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Create the vector with numbers.

v <- allom$Fatal_Accidents
# Calculate the mode using the user function.

result <- getmode(v)
print(result)
## [1] 4
Conclusion
The frequency table and frequency polygon show that the most common
number of yearly accidents is 4, since the peak of the frequency
polygon lies at the value 4. The sample mean (3.625) is close to the
sample median (4), which suggests that the data contain few
outliers. The sample mode is also 4, confirming that 4 is the value
with the highest frequency.
