
Lab Assessment 2

19BCE2698
SHAIL PATEL
STATISTICS FOR ENGINEERS

Creating the Student Dataset

# Each subject gets 60 marks: five groups of 12 students, each group
# drawn from its own score range.
df1 <- data.frame(
  StudentName = c(paste("Student", 1:60)),
  MathMarks = c(
    sample(80:100, 12, replace = TRUE),
    sample(70:90, 12, replace = TRUE),
    sample(60:75, 12, replace = TRUE),
    sample(50:80, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
  ),
  PhyMarks = c(
    sample(80:100, 12, replace = TRUE),
    sample(70:90, 12, replace = TRUE),
    sample(60:75, 12, replace = TRUE),
    sample(50:80, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
  ),
  ChemMarks = c(
    sample(80:100, 12, replace = TRUE),
    sample(70:90, 12, replace = TRUE),
    sample(60:75, 12, replace = TRUE),
    sample(50:80, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
  ),
  BioMarks = c(
    sample(80:100, 12, replace = TRUE),
    sample(70:90, 12, replace = TRUE),
    sample(60:75, 12, replace = TRUE),
    sample(50:80, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
  ),
  EnglishMarks = c(
    sample(80:100, 12, replace = TRUE),
    sample(70:90, 12, replace = TRUE),
    sample(60:75, 12, replace = TRUE),
    sample(50:80, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
  )
)
# View() (capital V, from utils) opens the data frame in the viewer
# without needing any extra package
View(df1)
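Because sample() draws at random, each run of the code above produces a different data set. A minimal sketch of how such draws could be made repeatable, assuming a fixed seed is acceptable for the lab; the seed value 42 and the variable names are illustrative and not part of the original code:

```r
# Hypothetical seed value; any fixed integer would do.
set.seed(42)
first_draw <- sample(80:100, 12, replace = TRUE)

# Resetting the same seed restarts the random number stream.
set.seed(42)
second_draw <- sample(80:100, 12, replace = TRUE)

# Same seed and same call, so the two draws match element by element.
identical(first_draw, second_draw)
```

Calling set.seed() once before building the data frame would be enough to make the whole data set reproducible.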

Exporting to a csv file

write.table(
df1,
file = "Student_dataset2.csv",
sep = ",",
append = TRUE,
row.names = FALSE
)

## Warning in write.table(df1, file = "Student_dataset2.csv", sep = ",", append =
## TRUE, : appending column names to file
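The warning appears because append = TRUE is combined with a header row: if Student_dataset2.csv already exists from an earlier run, the new rows and a second header line are written below the old contents. One possible alternative, sketched under the assumption that overwriting the file is acceptable, is write.csv(); the data frame demo and the temporary file below are hypothetical stand-ins for the lab data:

```r
# Small stand-in data frame (hypothetical values, not the lab data).
demo <- data.frame(
  StudentName = c("Student 1", "Student 2"),
  MathMarks = c(87, 93)
)

# tempfile() keeps the sketch from touching the working directory.
out_file <- tempfile(fileext = ".csv")

# write.csv() starts a fresh file, so the header is written exactly once.
write.csv(demo, file = out_file, row.names = FALSE)

# Reading the file back recovers the same rows and columns.
round_trip <- read.csv(out_file)
dim(round_trip)
```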

After uploading the data to a hosting service such as GitHub, we read it back
from there

df3<-read.csv("https://raw.githubusercontent.com/rohilsaraf97/datasets/main/Student_dataset2.csv",sep=",
head(df3)

##   StudentName MathMarks PhyMarks ChemMarks BioMarks EnglishMarks
## 1   Student 1        87       84        93       80           84
## 2   Student 2        93       80        82      100           94
## 3   Student 3        87       96        94       97           85
## 4   Student 4        91       87        89      100           96
## 5   Student 5        89       85        83       86           81
## 6   Student 6        96       82        86       89           90

Descriptive Statistics

Measure of central tendency

Mean:

colMeans(df3[2:6])

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     74.70000     72.63333     71.20000     73.00000     75.28333

Rounding off to the nearest integer

round(colMeans(df3[2:6]))

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           75           73           71           73           75

The output above gives the mean marks for each subject.

Median:

apply(df3[2:6], 2, median)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           73           73           71           74           78

Rounding off to the nearest integer

round(apply(df3[2:6], 2, median))

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           73           73           71           74           78

The output above gives the median marks for each subject.

Mode

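getmode <- function(v) {
  # distinct values, in order of first appearance
  uniqv <- unique(v)
  # count how often each distinct value occurs and return the most frequent
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

apply(df3[2:6], 2, getmode)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           73           84           71           86           83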

The output above gives the modal marks for each subject.
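When two or more marks share the highest frequency, getmode() reports only one of them, because which.max() returns the position of the first maximum. A small sketch of this behaviour, using a hypothetical vector tied and an illustrative table()-based way to list every mode:

```r
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical marks in which 80 and 70 both occur twice.
tied <- c(80, 70, 70, 80, 90)

# Only the tied value that appears first in the vector is reported.
first_mode <- getmode(tied)
first_mode   # 80

# table() counts every value, so all values reaching the top count can be kept.
counts <- table(tied)
all_modes <- as.numeric(names(counts)[counts == max(counts)])
all_modes    # 70 80
```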

Measures of Dispersion

The following measures of variability can be computed in R to compare the spread of data sets:

• Variance
• Standard Deviation
• Range
• Mean Deviation
• Interquartile Range

Variance

Variance measures how far the values lie from a central point, usually the mean.
Mathematically, it is the average of the squared differences from the mean. Note that R's var() computes
the sample variance, which divides by n − 1 rather than n.

apply(df3[2:6], 2, var)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     156.7220     215.5243     165.5525     184.6441     173.7658

The output above gives the variance of the marks about the mean for each subject.
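The denominator used matters when comparing var() with the textbook "average of squared differences". A short worked sketch on a hypothetical vector x (not the lab data) that computes both versions by hand:

```r
# Hypothetical values with mean 5 and squared deviations summing to 32.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared differences from the mean.
ss <- sum((x - mean(x))^2)

# Population form: divide by n.
population_var <- ss / n        # 4

# Sample form: divide by n - 1; this is the quantity var() returns.
sample_var <- ss / (n - 1)

all.equal(var(x), sample_var)   # TRUE
```

For the 60 students in this lab the two versions differ by a factor of 59/60, so the gap is small but not zero.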

Standard Deviation

Standard deviation measures the spread of the data values about the mean. Mathematically,
it is the square root of the variance.

apply(df3[2:6], 2, sd)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     12.51887     14.68075     12.86672     13.58838     13.18203
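The square-root relationship can be checked against the figures printed in this report. The sketch below copies the reported variances and standard deviations into two vectors (the names reported_var and reported_sd are ours) and compares them; a small tolerance is assumed because the printed variances are rounded to four decimals:

```r
# Variances and standard deviations as printed in the outputs above.
reported_var <- c(MathMarks = 156.7220, PhyMarks = 215.5243,
                  ChemMarks = 165.5525, BioMarks = 184.6441,
                  EnglishMarks = 173.7658)
reported_sd <- c(MathMarks = 12.51887, PhyMarks = 14.68075,
                 ChemMarks = 12.86672, BioMarks = 13.58838,
                 EnglishMarks = 13.18203)

# Largest gap between sqrt(variance) and the reported standard deviation.
max_gap <- max(abs(sqrt(reported_var) - reported_sd))

# The gap stays within rounding error of the printed values.
max_gap < 1e-4
```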

Range

Range is the difference between the maximum and minimum values of a data set. In R, max() and
min() are used to compute it, because the range() function returns the minimum and maximum values
themselves rather than their difference.

getrange <- function(v) {
  max(v) - min(v)
}

apply(df3[2:6], 2, getrange)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           50           58           56           53           55
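To make the difference between range() and the range as a single number concrete, here is a brief sketch on a hypothetical vector v; diff(range(v)) is shown as one equivalent way of getting the same result as max(v) - min(v):

```r
# Hypothetical marks for illustration.
v <- c(72, 40, 90, 55)

# range() returns a vector of length two: the minimum and the maximum.
r <- range(v)
r                 # 40 90

# diff() of that pair gives the spread as one number.
spread <- diff(r)
spread            # 50

# Same value as the max() - min() form used in getrange().
spread == max(v) - min(v)   # TRUE
```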

Mean Deviation

Mean deviation is the arithmetic mean of the absolute differences between each value and a central
value. The central value can be the mean, median, or mode.

getMeanAD <- function(x) {
  md <- sum(abs(x - mean(x))) / length(x)
  md
}

apply(df3[2:6], 2, getMeanAD)

About Mean

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     10.28333     11.74556     10.44667     11.63333     11.02167

getMedianAD <- function(x) {
  md <- sum(abs(x - median(x))) / length(x)
  md
}

apply(df3[2:6], 2, getMedianAD)

About Median

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     10.10000     11.73333     10.43333     11.56667     10.85000

getModeAD <- function(x) {
  md <- sum(abs(x - getmode(x))) / length(x)
  md
}
apply(df3[2:6], 2, getModeAD)

About Mode

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     10.10000     14.33333     10.43333     15.23333     11.65000
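The three functions above differ only in the central value they subtract. As an illustrative alternative (the name getAD and its center argument are our own, not part of the lab code), the central value can be passed in as a function:

```r
# Mean absolute deviation about a central value chosen by the caller.
# center is any function that maps a numeric vector to a single number.
getAD <- function(x, center = mean) {
  sum(abs(x - center(x))) / length(x)
}

# Hypothetical marks: mean is 76, median is 70.
marks <- c(60, 70, 70, 80, 100)

# About the mean: (16 + 6 + 6 + 4 + 24) / 5
ad_mean <- getAD(marks)            # 11.2

# About the median: (10 + 0 + 0 + 10 + 30) / 5
ad_median <- getAD(marks, median)  # 10
```

With the getmode() function defined earlier, getAD(marks, getmode) would give the deviation about the mode in the same way.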

Inter Quartile Range

The interquartile range is based on splitting a data set into parts called quartiles. There are 3 quartile values
(Q1, Q2, Q3) that divide the whole data set into 4 equal parts. Q2 is the median of the whole data
set.
The interquartile range is defined as
IQR = Q3 − Q1

apply(df3[2:6], 2, IQR)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##        20.50        19.25        19.00        22.25        19.75
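The formula IQR = Q3 − Q1 can be reproduced step by step with quantile(). The sketch below uses a hypothetical vector v and relies on the fact that quantile() and IQR() both default to the same interpolation rule (type = 7):

```r
# Hypothetical marks for illustration.
v <- c(50, 60, 70, 80, 90)

# First and third quartiles under the default type = 7 rule.
q <- quantile(v, probs = c(0.25, 0.75))
q1 <- unname(q[1])   # 60
q3 <- unname(q[2])   # 80

# Interquartile range computed from the definition.
iqr_manual <- q3 - q1
iqr_manual           # 20

# Same value as the built-in function.
iqr_manual == IQR(v) # TRUE
```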

Beta coefficient of kurtosis

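getBeta2 <- function(v) {
  # fourth central moment (denominator n)
  m4 <- sum((v - mean(v))^4) / length(v)
  # var() is the sample variance (denominator n - 1)
  m2 <- var(v)
  beta2 <- m4 / (m2^2)
  beta2
}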

apply(df3[2:6],2, getBeta2)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     2.109147     2.470402     2.452819     1.999763     2.268649

Fisher’s Gamma coefficient of kurtosis

getGama2 <- function(v) {
  gama2 <- getBeta2(v) - 3
  gama2
}

apply(df3[2:6],2, getGama2)

##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##   -0.8908528   -0.5295980   -0.5471811   -1.0002375   -0.7313506
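getBeta2() divides a fourth moment computed with denominator n by the square of var(), which uses n − 1. A purely moment-based version, sketched below with helper names of our own (central_moment, getBeta2Moment), uses n for both moments; its result differs from the lab's by a factor of (n / (n − 1))^2, so the values printed above would shift slightly under this definition:

```r
# k-th central moment with denominator n (hypothetical helper).
central_moment <- function(v, k) {
  sum((v - mean(v))^k) / length(v)
}

# Beta coefficient of kurtosis: m4 / m2^2, both moments with denominator n.
getBeta2Moment <- function(v) {
  central_moment(v, 4) / central_moment(v, 2)^2
}

# Hypothetical values: m2 = 2 and m4 = 6.8.
v <- c(1, 2, 3, 4, 5)

beta2 <- getBeta2Moment(v)    # 6.8 / 4 = 1.7
gamma2 <- beta2 - 3           # -1.3

# The lab's mixed-denominator form, for comparison.
beta2_mixed <- central_moment(v, 4) / var(v)^2   # 6.8 / 6.25 = 1.088
```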

Graphical Representation

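library(ggplot2)  # provides ggplot(), used for the chart below

# One row per subject with its mean mark
df4 <- data.frame(
  Subjects = colnames(df3[2:6]),
  meanMarks = apply(df3[2:6], 2, mean)
)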

ggplot(df4, aes(x=Subjects, y=meanMarks, group = 1))+geom_point()+geom_line(color="red")+ggtitle("Line C

[Figure: "Line Chart Representation for mean marks". x-axis: Subjects (BioMarks, ChemMarks, EnglishMarks, MathMarks, PhyMarks); y-axis: Mean marks for each subject, with ticks from 71 to 75.]

Link for data set Student Dataset
