Lab Assessment 2: Creating The Student Dataset

Lab Assessment 2
19BCE2698
SHAIL PATEL
STATISTICS FOR ENGINEERS
Creating the Student Dataset
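df1 <- data.frame(
  StudentName = paste("Student", 1:60),
  MathMarks = c(
    sample(80:100, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
    # remaining sample() groups omitted here; each subject needs 60 marks
  ),
  PhyMarks = c(
    # sample() groups omitted here
  ),
  ChemMarks = c(
    # sample() groups omitted here
  ),
  BioMarks = c(
    # sample() groups omitted here
  ),
  EnglishMarks = c(
    # sample() groups omitted here
  )
)
# View() (capital V) opens the data viewer; view() is not a base R function
View(df1)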
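The listing above leaves out most of the sample() calls, so as written it does not give 60 marks per subject. The following is a minimal sketch of a complete version, assuming five groups of 12 students per subject. The helper `make_marks`, the name `demo_df`, the seed, and every range after the first two are illustrative assumptions, not the values used for the original dataset.

```r
set.seed(42)  # assumed seed, only so that the sketch is reproducible

# Hypothetical helper: 60 marks built from five groups of 12 students.
# Only the first two ranges appear in the listing; the rest are assumptions.
make_marks <- function() {
  c(
    sample(80:100, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE),
    sample(50:100, 12, replace = TRUE),
    sample(60:95, 12, replace = TRUE),
    sample(45:85, 12, replace = TRUE)
  )
}

demo_df <- data.frame(
  StudentName = paste("Student", 1:60),
  MathMarks = make_marks(),
  PhyMarks = make_marks(),
  ChemMarks = make_marks(),
  BioMarks = make_marks(),
  EnglishMarks = make_marks()
)

str(demo_df)  # 60 observations of 6 variables
```

Every column must have the same length as StudentName, otherwise data.frame() stops with an error about differing numbers of rows.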
Exporting to a csv file
write.table(
df1,
file = "Student_dataset2.csv",
sep = ",",
append = TRUE,
row.names = FALSE
)
## Warning in write.table(df1, file = "Student_dataset2.csv", sep = ",", append =
## TRUE, : appending column names to file
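The warning appears because append = TRUE adds the header row and the data to whatever is already in the file, so running the chunk twice leaves a second header in the middle of the CSV. A minimal sketch of an export that overwrites the file instead is shown below. The small `demo_df` data frame and the temporary file path are stand-ins assumed for illustration.

```r
# Stand-in data frame (hypothetical values) so that the sketch runs on its own
demo_df <- data.frame(
  StudentName = paste("Student", 1:3),
  MathMarks = c(87, 93, 91)
)

# Assumed output location; the report writes to "Student_dataset2.csv"
out_file <- file.path(tempdir(), "Student_dataset_demo.csv")

# write.csv() always overwrites the file, so no header is appended
write.csv(demo_df, file = out_file, row.names = FALSE)

# Reading the file back confirms a single header row and three data rows
round_trip <- read.csv(out_file)
print(round_trip)
```

Calling write.table() with append = FALSE and sep = "," would have the same effect.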
After uploading the CSV file to a hosting service such as GitHub, we read the
data back from there
df3 <- read.csv("https://raw.githubusercontent.com/rohilsaraf97/datasets/main/Student_dataset2.csv", sep = ",")
head(df3)
##   StudentName MathMarks PhyMarks ChemMarks BioMarks EnglishMarks
## 1   Student 1        87       84        93       80           84
## 2   Student 2        93       80        82      100           94
## 3   Student 3        87       96        94       97           85
## 4   Student 4        91       87        89      100           96
## 5   Student 5        89       85        83       86           81
## 6   Student 6        96       82        86       89           90
Descriptive Statistics
Measure of central tendency
Mean:
colMeans(df3[2:6])
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     74.70000     72.63333     71.20000     73.00000     75.28333
Rounding off to the nearest integer
round(colMeans(df3[2:6]))
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           75           73           71           73           75
The output above gives the mean marks for each subject.
Median:
apply(df3[2:6], 2, median)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           73           73           71           74           78
Rounding off to the nearest integer
round(apply(df3[2:6], 2, median))
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           73           73           71           74           78
The output above gives the median marks for each subject.
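Mode:
getmode <- function(v) {
  # distinct values, in order of first appearance
  uniqv <- unique(v)
  # count each distinct value and return the first one with the highest count
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
apply(df3[2:6], 2, getmode)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           73           84           71           86           83
The output above gives the mode of the marks for each subject.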
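Because which.max() returns only the first maximum, getmode() reports a single value even when several marks share the highest frequency. The sketch below shows one way to list every tied mode. The name `get_all_modes` and the sample marks are hypothetical and are not part of the original analysis.

```r
# Same counting approach as getmode(), but keeps every value that
# reaches the highest count instead of only the first one
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))
  sort(uniqv[counts == max(counts)])
}

marks <- c(73, 84, 73, 84, 90)  # hypothetical marks with a two-way tie
print(get_all_modes(marks))     # both 73 and 84 occur twice
```

For data with a single mode, this returns the same value as getmode().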
Measures of Dispersion
R offers several measures of variability for comparing data sets:
• Variance
• Standard Deviation
• Range
• Mean Deviation
• Interquartile Range
Variance
Variance measures how far the values lie from a reference point, usually the mean.
Mathematically, it is the average of the squared differences from the mean. Note that R's var() computes the
sample variance, which divides the sum of squared differences by n - 1 rather than n.
apply(df3[2:6], 2, var)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     156.7220     215.5243     165.5525     184.6441     173.7658
The output above gives the variance of the marks about the mean for each subject.
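The difference between the two divisors can be checked directly. The sketch below uses the six MathMarks values printed by head(df3) earlier; the variable names are chosen for illustration.

```r
marks <- c(87, 93, 87, 91, 89, 96)  # MathMarks of the first six students

n <- length(marks)
sample_var <- var(marks)                  # divides by n - 1
pop_var <- mean((marks - mean(marks))^2)  # divides by n

print(c(sample = sample_var, population = pop_var))

# The two differ only by the factor (n - 1) / n
print(all.equal(pop_var, sample_var * (n - 1) / n))
```

With 60 students the factor is 59/60, so the two versions differ by less than 2 percent.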
Standard Deviation
Standard deviation measures the spread of the data values about the mean. Mathematically, it is the
square root of the variance.
apply(df3[2:6], 2, sd)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     12.51887     14.68075     12.86672     13.58838     13.18203
Range
Range is the difference between the maximum and minimum values of a data set. In R, max() and
min() are used to compute it, because the range() function returns the minimum and maximum values
themselves rather than their difference.
getrange <- function(v) {
  max(v) - min(v)
}
apply(df3[2:6], 2, getrange)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##           50           58           56           53           55
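As a short illustration of the distinction, the sketch below applies range() and diff() to the six MathMarks values printed by head(df3) earlier. Using diff(range(v)) is an alternative to getrange(), not the approach taken in this report.

```r
marks <- c(87, 93, 87, 91, 89, 96)  # MathMarks of the first six students

print(range(marks))        # minimum and maximum: 87 96
print(diff(range(marks)))  # their difference: 9

# Same result as the max() - min() form used in getrange()
print(max(marks) - min(marks))
```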
Mean Deviation
Mean deviation is the arithmetic mean of the absolute differences between each value and a central
value. The central value can be the mean, median, or mode.
About Mean
getMeanAD <- function(x) {
  md <- sum(abs(x - mean(x))) / length(x)
  md
}
apply(df3[2:6], 2, getMeanAD)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     10.28333     11.74556     10.44667     11.63333     11.02167
About Median
getMedianAD <- function(x) {
  md <- sum(abs(x - median(x))) / length(x)
  md
}
apply(df3[2:6], 2, getMedianAD)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     10.10000     11.73333     10.43333     11.56667     10.85000
About Mode
getModeAD <- function(x) {
  md <- sum(abs(x - getmode(x))) / length(x)
  md
}
apply(df3[2:6], 2, getModeAD)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     10.10000     14.33333     10.43333     15.23333     11.65000
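The three functions above differ only in the central value they use, so they can be folded into one function that takes the centre as an argument. This is a sketch of that idea; the name `get_abs_dev` and its `center` parameter are hypothetical, and the data are the six MathMarks values printed by head(df3) earlier.

```r
# Mean absolute deviation about an arbitrary centre.
# `center` is any function that maps a numeric vector to a single value.
get_abs_dev <- function(x, center = mean) {
  mean(abs(x - center(x)))
}

marks <- c(87, 93, 87, 91, 89, 96)  # MathMarks of the first six students

print(get_abs_dev(marks))          # about the mean
print(get_abs_dev(marks, median))  # about the median
```

Passing getmode as the centre reproduces getModeAD() in the same way.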
Interquartile Range
The interquartile range is based on splitting a data set into parts called quartiles. Three quartile values
(Q1, Q2, Q3) divide the ordered data set into 4 equal parts, and Q2 is the median of the whole data
set.
Mathematical equation for representing the Inter Quartile Range is,
IQR = Q3 − Q1
apply(df3[2:6], 2, IQR)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##        20.50        19.25        19.00        22.25        19.75
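The formula IQR = Q3 − Q1 can be checked against IQR() by computing the two quartiles with quantile(). The sketch below does this for the six MathMarks values printed by head(df3) earlier; the variable names are chosen for illustration.

```r
marks <- c(87, 93, 87, 91, 89, 96)  # MathMarks of the first six students

# First and third quartiles, using quantile()'s default method
q <- quantile(marks, probs = c(0.25, 0.75))
print(q)

# Q3 - Q1, with the quantile names dropped
manual_iqr <- unname(q[2] - q[1])
print(manual_iqr)

# IQR() uses the same default quantile method, so the two agree
print(all.equal(manual_iqr, IQR(marks)))
```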
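Beta coefficient of kurtosis
getBeta2 <- function(v) {
  # fourth central moment (divides by n)
  m4 <- sum((v - mean(v))^4) / length(v)
  # var() divides by n - 1, so m2 is the sample variance
  m2 <- var(v)
  beta2 <- m4 / (m2^2)
  beta2
}
apply(df3[2:6], 2, getBeta2)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##     2.109147     2.470402     2.452819     1.999763     2.268649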
Fisher’s Gamma coefficient of kurtosis
getGama2 <- function(v) {
  gama2 <- getBeta2(v) - 3
  gama2
}
apply(df3[2:6], 2, getGama2)
##    MathMarks     PhyMarks    ChemMarks     BioMarks EnglishMarks
##   -0.8908528   -0.5295980   -0.5471811   -1.0002375   -0.7313506
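getBeta2() divides the fourth moment by n but takes the second moment from var(), which divides by n - 1. The sketch below is an alternative that uses the divisor n for both moments, as in the usual moment definition of beta2 = m4 / m2^2. The name `get_moment_kurtosis` is hypothetical, and this is not the function that produced the figures above.

```r
# Moment-based kurtosis: both central moments use the divisor n
get_moment_kurtosis <- function(v) {
  d <- v - mean(v)
  m2 <- mean(d^2)  # second central moment
  m4 <- mean(d^4)  # fourth central moment
  m4 / m2^2
}

v <- c(1, 2, 3, 4, 5)  # small illustrative vector
n <- length(v)

beta2 <- get_moment_kurtosis(v)
gamma2 <- beta2 - 3  # excess kurtosis, as in getGama2()
print(c(beta2 = beta2, gamma2 = gamma2))

# Value the var()-based form in getBeta2() would give for the same data
beta2_var_based <- mean((v - mean(v))^4) / var(v)^2
print(all.equal(beta2_var_based, beta2 * ((n - 1) / n)^2))
```

The two forms differ by the factor ((n - 1) / n)^2, which is about 0.967 for 60 students, so the moment-based values would be slightly larger than those reported above.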
Graphical Representation
library(ggplot2)  # provides ggplot(), used below
df4 <- data.frame(
  Subjects = colnames(df3[2:6]),
  meanMarks = apply(df3[2:6], 2, mean)
)
ggplot(df4, aes(x=Subjects, y=meanMarks, group = 1))+geom_point()+geom_line(color="red")+ggtitle("Line C
[Figure: Line Chart Representation for mean marks. x-axis: Subjects (BioMarks, ChemMarks, EnglishMarks, MathMarks, PhyMarks); y-axis: Mean marks for each subject, ranging from about 71 to 75.]
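The ggplot() call above is cut off partway through its title. The following is a minimal sketch of a complete call that would produce a chart like the one described, assuming the ggplot2 package is installed. The title and y-axis label are taken from the figure; setting the label with ylab() and the name `mean_df` are assumptions, and the mean values are copied from the colMeans() output earlier.

```r
library(ggplot2)

# Mean marks per subject, copied from the colMeans() output
mean_df <- data.frame(
  Subjects = c("MathMarks", "PhyMarks", "ChemMarks", "BioMarks", "EnglishMarks"),
  meanMarks = c(74.70000, 72.63333, 71.20000, 73.00000, 75.28333)
)

# group = 1 joins all points with one line even though Subjects is categorical
p <- ggplot(mean_df, aes(x = Subjects, y = meanMarks, group = 1)) +
  geom_point() +
  geom_line(color = "red") +
  ggtitle("Line Chart Representation for mean marks") +
  ylab("Mean marks for each subject")

print(p)
```

ggplot2 orders a character x-axis alphabetically, which is why the figure lists BioMarks first rather than MathMarks.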
Link for data set Student Dataset
