STAT 2100 Exercise 9 Object and Dataframe Codes

Lhuriely Bautista
BSAC 2-1
STAT 2100
Exercise 9
OBJECT AND DATAFRAME CODES
# Student ID numbers 1 to 30 (this object is later overwritten by the data frame named student)
student <- 1:30
sex <- c("female", "female", "female", "male", "male", "female", "female", "male", "male",
"female", "male", "male", "female", "male", "female", "female", "male", "female", "male",
"female", "male", "female", "male", "female", "male", "male", "male", "female", "male", "female")
parental.level.education <- c("bachelor's degree", "some college", "master's degree",
"associate's degree", "some college", "associate's degree", "some college", "some college",
"high school", "high school", "associate's degree", "associate's degree", "high school",
"some college", "master's degree", "some high school", "high school", "some high school",
"master's degree", "associate's degree", "high school", "some college", "some college",
"some high school", "bachelor's degree", "master's degree", "some college",
"bachelor's degree", "high school", "master's degree")
lunch <- c("standard", "standard", "standard","free/reduced", "standard", "standard", "standard",
"free/reduced", "free/reduced", "free/reduced", "standard", "standard", "standard","standard",
"standard", "standard", "standard", "free/reduced", "free/reduced", "free/reduced", "standard",
"free/reduced", "standard", "standard", "free/reduced", "free/reduced", "standard", "standard",
"standard", "standard")
test.preparation.course <- c("none", "completed", "none", "none", "none", "none", "completed",
"none", "completed", "none", "none", "none", "none", "completed", "none", "none", "none",
"none", "completed", "none", "none", "completed", "none", "none", "completed", "none", "none",
"none", "none", "none")
math.score <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54, 66, 65,
44, 69, 74, 73, 69, 67, 70, 62)
reading.score <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58, 69,
75, 54, 73, 71, 74, 54, 69, 70, 70)
writing.score <- c(74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61, 63,
70, 53, 73, 80, 72, 55, 75, 65, 75)
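# Combine the vectors into one data frame (one row per student)
student <- data.frame(sex, parental.level.education, lunch, test.preparation.course, math.score,
reading.score, writing.score)
# Frequency tables used by the charts below. These lines overwrite the original vectors with
# their tables; the raw values remain available in the data frame (student$...).
sex.table <- table(student$sex)
parental.level.education <- table(student$parental.level.education)
lunch <- table(student$lunch)
test.preparation.course <- table(student$test.preparation.course)
math.score <- table(student$math.score)
reading.score <- table(student$reading.score)
writing.score <- table(student$writing.score)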
1. Create pie chart for sex
The graph above shows the sex distribution of the respondents. There are equal numbers of male and female respondents: 15 males (50%) and 15 females (50%), which together account for all 30 respondents.
Code:
colors.pie <- c("#FB6F92", "#FFE5EC")
pie(sex.table, col=colors.pie, main="Sex")
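As a hedged sketch (base R only; the names `sex.pct` and the "category (percent%)" label format are illustrative, not part of the exercise), the 50% shares quoted above can be computed with `prop.table()` and written onto the slices themselves:

```r
# Sex of the 30 respondents, copied from the exercise data
sex <- c("female", "female", "female", "male", "male", "female", "female", "male", "male",
         "female", "male", "male", "female", "male", "female", "female", "male", "female", "male",
         "female", "male", "female", "male", "female", "male", "male", "male", "female", "male", "female")

sex.table <- table(sex)                            # counts per category, in alphabetical order
sex.pct <- round(100 * prop.table(sex.table), 2)   # counts converted to percentages

# Label each slice with its category name and percentage
pie(sex.table,
    labels = paste0(names(sex.table), " (", sex.pct, "%)"),
    col = c("#FB6F92", "#FFE5EC"),
    main = "Sex")
```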
2. Create Bar Graph for parental level of education.
The bar graph above depicts the distribution of the respondents' parental level of education. The most common response was "some college," with 8 respondents, while the least common were "bachelor's degree" and "some high school," with 3 respondents each. The remaining categories were "associate's degree" and "master's degree," with 5 respondents each, and "high school," with 6 respondents.
Code:
colors.bar <- c("#DB6551", "#E6896B", "#E5A186", "#F3C3B0", "#F3CEB0", "#F9DCC5")
barplot(parental.level.education, col=colors.bar, main="Parental Level of Education")
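The counts quoted above can be checked by sorting the frequency table from most to least common. A minimal base-R sketch; `edu.counts` and the rotated-label layout (`las = 2` with a widened bottom margin) are illustrative choices, assumed here so that the long category names are less likely to be hidden on the axis:

```r
# Parental level of education for the 30 respondents, copied from the exercise data
parental.level.education <- c("bachelor's degree", "some college", "master's degree",
  "associate's degree", "some college", "associate's degree", "some college", "some college",
  "high school", "high school", "associate's degree", "associate's degree", "high school",
  "some college", "master's degree", "some high school", "high school", "some high school",
  "master's degree", "associate's degree", "high school", "some college", "some college",
  "some high school", "bachelor's degree", "master's degree", "some college",
  "bachelor's degree", "high school", "master's degree")

# Frequency table sorted so the most common category comes first
edu.counts <- sort(table(parental.level.education), decreasing = TRUE)
print(edu.counts)

# Widen the bottom margin and draw the category names perpendicular to the axis
op <- par(mar = c(9, 4, 4, 2))
barplot(edu.counts, las = 2, col = "#E5A186", main = "Parental Level of Education")
par(op)  # restore the previous margins
```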
3. Create Bar Chart for Lunch.
The bar graph above shows the lunch distribution of the respondents. Most of the respondents (20 of 30) have a standard lunch, while the remaining 10 respondents have a free or reduced lunch.
Code:
colors.bar <- c("#FB6F92", "#FFE5EC")
barplot(lunch, col=colors.bar, main="Lunch")
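A possible variation, sketched under the assumption that annotated bars are wanted (`lunch.pct` and `bp` are illustrative names): `barplot()` returns the bar midpoints, which `text()` can use to print each count and its percentage just above its bar.

```r
# Lunch type for the 30 respondents, copied from the exercise data
lunch <- c("standard", "standard", "standard", "free/reduced", "standard", "standard", "standard",
  "free/reduced", "free/reduced", "free/reduced", "standard", "standard", "standard", "standard",
  "standard", "standard", "standard", "free/reduced", "free/reduced", "free/reduced", "standard",
  "free/reduced", "standard", "standard", "free/reduced", "free/reduced", "standard", "standard",
  "standard", "standard")

lunch.table <- table(lunch)
lunch.pct <- round(100 * prop.table(lunch.table), 2)

# Leave some headroom above the tallest bar for the labels
bp <- barplot(lunch.table, col = c("#FB6F92", "#FFE5EC"), main = "Lunch",
              ylim = c(0, max(lunch.table) * 1.2))

# bp holds the bar midpoints; pos = 3 places each label above its bar
text(bp, as.vector(lunch.table),
     labels = paste0(lunch.table, " (", lunch.pct, "%)"), pos = 3)
```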
4. Create pie chart for test preparation course
The graph above shows whether the respondents completed a test preparation course. A majority of the respondents (23 of 30, or 76.67%) did not take a test preparation course, while 7 respondents (23.33%) completed one.
Code:
colors.pie <- c("#FB6F92", "#FFE5EC")
pie(test.preparation.course, col=colors.pie,
labels=paste(round(test.preparation.course/sum(test.preparation.course)*100, 2), "%"),
main="Test Preparation Course")
legend("bottom", legend=c("Completed", "None"), fill=c("#FB6F92", "#FFE5EC"))
5. Create stem-and-leaf plot for math, reading and writing scores.
Math Score
The stem-and-leaf plot above illustrates the math scores of the respondents. The lowest score is 18 and the highest is 90, so the range of the math scores is 72 (90 - 18). The median is 66.5.
Code:
stem(student$math.score)
Reading Score
The stem-and-leaf plot above illustrates the reading scores of the respondents. The lowest score is 32 and the highest is 95, so the range of the reading scores is 63 (95 - 32). The median is 70.
Code:
stem(student$reading.score)
Writing Score
The stem-and-leaf plot above illustrates the writing scores of the respondents. The lowest score is 28 and the highest is 93, so the range of the writing scores is 65 (93 - 28). The median is 70.
Code:
stem(student$writing.score)
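The minimum, maximum, range and median quoted for the three stem-and-leaf plots can also be computed directly instead of read off the plots. A minimal base-R sketch; the `scores` data frame and the `stem.summary` matrix are illustrative names, and the score vectors are copied from the exercise data:

```r
# Score vectors copied from the exercise data
math.score <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54,
                66, 65, 44, 69, 74, 73, 69, 67, 70, 62)
reading.score <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58,
                   69, 75, 54, 73, 71, 74, 54, 69, 70, 70)
writing.score <- c(74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61,
                   63, 70, 53, 73, 80, 72, 55, 75, 65, 75)
scores <- data.frame(math.score, reading.score, writing.score)

# One column per subject; rows are the lowest score, highest score, range and median
stem.summary <- sapply(scores, function(x) c(min = min(x), max = max(x),
                                             range = diff(range(x)), median = median(x)))
print(stem.summary)

# A larger scale gives a longer stem-and-leaf plot with more stem rows
stem(scores$math.score, scale = 2)
```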
6. Create Boxplot for math, reading and writing scores.
The boxplot above illustrates the distribution of the respondents' math scores. The first quartile is 51, which indicates that about 25% of the math scores are lower than 51. On the other hand, the third quartile is 71.75, which means that about 75% of the scores are lower than 71.75. The median (66.5) lies much closer to the third quartile than to the first, so the distribution is negatively skewed.
Code:
boxplot(student$math.score, horizontal=T)
boxplot(student$math.score, horizontal=T, col="#FFE5EC", main="Math Score Box Plot")
The boxplot above illustrates the distribution of the respondents' reading scores. The first quartile is 54.75, which indicates that about 25% of the reading scores are lower than 54.75. On the other hand, the third quartile is 75, which means that about 75% of the scores are at or below 75. The median (70) lies closer to the third quartile than to the first, so the distribution is negatively skewed.
Code:
boxplot(student$reading.score, horizontal=T)
boxplot(student$reading.score, horizontal=T, col="#FFE5EC", main="Reading Score Box Plot")
The boxplot above illustrates the distribution of the respondents' writing scores. The first quartile is 53.5, which indicates that about 25% of the writing scores are lower than 53.5. On the other hand, the third quartile is 75, which means that about 75% of the scores are at or below 75. The median (70) lies closer to the third quartile than to the first, so the distribution is negatively skewed.
Code:
boxplot(student$writing.score, horizontal=T)
boxplot(student$writing.score, horizontal=T, col="#FFE5EC", main="Writing Score Box Plot")
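One detail worth knowing when reading these boxplots: the quartiles quoted above (for example 51 and 71.75 for math) are what `summary()` and `quantile()` return with R's default interpolation (type 7), whereas `boxplot()` draws its box at Tukey's hinges, which `boxplot.stats()` obtains from `fivenum()`. The two can differ slightly. The sketch below assumes base R and uses the illustrative names `quartiles` and `hinges`:

```r
# Score vectors copied from the exercise data
math.score <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54,
                66, 65, 44, 69, 74, 73, 69, 67, 70, 62)
reading.score <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58,
                   69, 75, 54, 73, 71, 74, 54, 69, 70, 70)
writing.score <- c(74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61,
                   63, 70, 53, 73, 80, 72, 55, 75, 65, 75)
scores <- data.frame(math.score, reading.score, writing.score)

# First and third quartiles as reported by quantile() (default type 7)
quartiles <- sapply(scores, quantile, probs = c(0.25, 0.75))

# Lower and upper hinges that boxplot() actually draws (2nd and 4th of the five stats)
hinges <- sapply(scores, function(x) boxplot.stats(x)$stats[c(2, 4)])

print(quartiles)
print(hinges)
```

For these data the math hinges are 50 and 72, so the drawn box is slightly wider than the reported quartiles of 51 and 71.75.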
7. Create Histogram for math, reading and writing scores.
The histogram above shows the distribution of the respondents' math scores. The mean is 62.67, and the histogram is negatively skewed: most scores are concentrated in the upper range, with a longer tail of low scores on the left.
Code:
hist(student$math.score, main= "Math Score Histogram")
hist(student$math.score, freq=F, col="#DB6551", main="Math Score Histogram")
curve(dnorm(x, mean(student$math.score), sd(student$math.score)), add=T)
The histogram above shows the distribution of the respondents' reading scores. The mean is 67.47, and the histogram is negatively skewed: most scores are concentrated in the upper range, with a longer tail of low scores on the left.
Code:
hist(student$reading.score, main= "Reading Score Histogram")
hist(student$reading.score, freq=F, col="#DB6551", main="Reading Score Histogram")
curve(dnorm(x, mean(student$reading.score), sd(student$reading.score)), add=T)
The histogram above shows the distribution of the respondents' writing scores. The mean is 65.87, and the histogram is negatively skewed: most scores are concentrated in the upper range, with a longer tail of low scores on the left.
Code:
hist(student$writing.score, main="Writing Score Histogram")
hist(student$writing.score, freq=F, col="#DB6551", main="Writing Score Histogram")
curve(dnorm(x, mean(student$writing.score), sd(student$writing.score)), add=T)
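To support the skewness statements with a number, one option (a swapped-in measure, not one used in the exercise) is Pearson's second skewness coefficient, 3 * (mean - median) / sd. A negative value means the mean is pulled below the median by a longer left tail. A minimal base-R sketch with illustrative names:

```r
# Score vectors copied from the exercise data
math.score <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54,
                66, 65, 44, 69, 74, 73, 69, 67, 70, 62)
reading.score <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58,
                   69, 75, 54, 73, 71, 74, 54, 69, 70, 70)
writing.score <- c(74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61,
                   63, 70, 53, 73, 80, 72, 55, 75, 65, 75)
scores <- data.frame(math.score, reading.score, writing.score)

score.mean <- sapply(scores, mean)
score.median <- sapply(scores, median)
score.sd <- sapply(scores, sd)

# Pearson's second skewness coefficient; negative values indicate left (negative) skew
pearson.skew <- 3 * (score.mean - score.median) / score.sd

round(rbind(mean = score.mean, median = score.median, skew = pearson.skew), 2)
```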
8. Create normal probability plot for math, reading and writing scores.
The normal probability plot of math scores above shows the points lying close to the reference line, which suggests that the math scores are approximately normally distributed.
Code:
qqnorm(student$math.score, col= "#DB6551", main= "Math Score Q-Q Plot")
qqline(student$math.score)
The normal probability plot of reading scores above shows the points lying close to the reference line, which suggests that the reading scores are approximately normally distributed.
Code:
qqnorm(student$reading.score, col= "#DB6551", main= "Reading Score Q-Q Plot")
qqline(student$reading.score)
The normal probability plot of writing scores above shows the points lying close to the reference line, which suggests that the writing scores are approximately normally distributed.
Code:
qqnorm(student$writing.score, col= "#DB6551", main= "Writing Score Q-Q Plot")
qqline(student$writing.score)
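The Q-Q plots are a visual check. As a complementary sketch (the Shapiro-Wilk test is an addition here, not part of the exercise), base R's `shapiro.test()` gives a formal test of normality for each score. A p-value above 0.05 means the test finds no significant evidence against normality at the 5% level, which is weaker than showing that the scores are normal. `sw.p` is an illustrative name:

```r
# Score vectors copied from the exercise data
math.score <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54,
                66, 65, 44, 69, 74, 73, 69, 67, 70, 62)
reading.score <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58,
                   69, 75, 54, 73, 71, 74, 54, 69, 70, 70)
writing.score <- c(74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61,
                   63, 70, 53, 73, 80, 72, 55, 75, 65, 75)
scores <- data.frame(math.score, reading.score, writing.score)

# Shapiro-Wilk p-value for each score column
sw.p <- sapply(scores, function(x) shapiro.test(x)$p.value)
round(sw.p, 3)
```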
9. Create scatter plot for (a) math and reading scores, (b) math and writing scores and
(c) reading and writing scores.
The scatter plot above shows a strong positive correlation between math scores and reading scores. The points follow an upward pattern: as the math score increases, the reading score also tends to increase.
Code:
plot(math.score~reading.score, student, col="#DB6551",
main="Scatter Plot of Math and Reading Scores")
abline(coef(lm(math.score~reading.score,student)))
The scatter plot above shows a very strong positive correlation between math scores and writing scores. The points follow an upward pattern: as the math score increases, the writing score also tends to increase.
Code:
plot(math.score~writing.score, student, col="#DB6551",
main="Scatter Plot of Math and Writing Scores")
abline(coef(lm(math.score~writing.score,student)))
The scatter plot above shows a very strong positive correlation between reading scores and writing scores. The points follow an upward pattern: as the reading score increases, the writing score also tends to increase.
Code:
plot(reading.score~writing.score, student, col="#DB6551",
main="Scatter Plot of Reading and Writing Scores")
abline(coef(lm(reading.score~writing.score,student)))
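The strength of each relationship can be quantified with Pearson's correlation coefficient instead of judged by eye. A minimal sketch using base R's `cor()` and `cor.test()`; `scores` and `r` are illustrative names:

```r
# Score vectors copied from the exercise data
math.score <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54,
                66, 65, 44, 69, 74, 73, 69, 67, 70, 62)
reading.score <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58,
                   69, 75, 54, 73, 71, 74, 54, 69, 70, 70)
writing.score <- c(74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61,
                   63, 70, 53, 73, 80, 72, 55, 75, 65, 75)
scores <- data.frame(math.score, reading.score, writing.score)

# Pearson correlation matrix for every pair of score columns
r <- cor(scores)
round(r, 2)

# Test whether the math-reading correlation differs from zero
cor.test(scores$math.score, scores$reading.score)
```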