
Mel Kian Caesar I. Dupol

BSAC 2-1

STATS 2100 QUIZ

Data and Data Frame Codes


ID_no. <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20)
Smoking.Status <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2)
Exercise <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0,
0)
Weight <- c(120, 106, 127, 132, 109, 143, 123, 145, 118, 143, 196,
187, 199, 156, 142, 162, 160, 215, 170, 234)
Serum_Cholesterol <- c(193, 168, 179, 215, 175, 188, 220, 210, 176,
206, 199, 220, 253, 214, 184, 218, 200, 215, 242, 238)
Systolic_Pressure <- c(126, 120, 128, 129, 119, 136, 131, 163, 132,
138, 148, 115, 149, 142, 156, 135, 156, 153, 122, 142)

ID_no. <- data.frame(Smoking.Status, Exercise, Weight,
Serum_Cholesterol, Systolic_Pressure)

Smoking.Status.table <- table(ID_no.$Smoking.Status)
Exercise.table <- table(ID_no.$Exercise)
Weight.table <- table(ID_no.$Weight)
Serum_Cholesterol.table <- table(ID_no.$Serum_Cholesterol)
Systolic_Pressure.table <- table(ID_no.$Systolic_Pressure)
1. Create a qualitative frequency distribution table (include
relative frequency in the FDT) and appropriate graphical
presentation for variable smoking status. Interpret.

table(ID_no.$Smoking.Status)
table(ID_no.$Smoking.Status)/length(ID_no.$Smoking.Status)
Interpretation:
The table shows that 7 of the 20 respondents (35%) do not smoke, 8 of
20 (40%) smoke less than one pack per day, and 5 of 20 (25%) smoke one
or more packs per day. The largest group is therefore the respondents
who smoke less than one pack per day.
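The counts and relative frequencies above can also be collected into a single table. The sketch below shows one possible way to do this in base R. The category labels follow the wording of the interpretation, and the names `smoking`, `counts` and `fdt` are hypothetical names introduced here only for illustration.

```r
# Smoking status codes as given in the data section
Smoking.Status <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
                    2, 2, 2)

# Attach descriptive labels to the numeric codes
# (labels taken from the interpretation; assumed to match the quiz coding)
smoking <- factor(Smoking.Status, levels = c(0, 1, 2),
                  labels = c("does not smoke",
                             "less than one pack per day",
                             "one or more packs per day"))

counts <- table(smoking)

# One table holding both the frequency and the relative frequency
fdt <- data.frame(Smoking.Status = names(counts),
                  Frequency = as.vector(counts),
                  Relative.Frequency = as.vector(prop.table(counts)))
print(fdt)
```

Here `prop.table()` divides each count by the total, which is the same calculation as dividing `table()` by `length()`.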

colors.pie <- c("yellow", "blue", "green")
pie(Smoking.Status.table, col=colors.pie, main = "Smoking Status")
legend("topleft", legend=c("does not smoke",
"smoke less than one pack per day",
"smoke one or more than one pack per day"), fill=colors.pie)

2. Create a qualitative frequency distribution table (include
relative frequency in the FDT) and appropriate graphical
presentation for variable exercise. Interpret.

table(ID_no.$Exercise)
table(ID_no.$Exercise)/length(ID_no.$Exercise)
Interpretation:
The table shows the distribution of respondents in terms of
exercise: 8 of the 20 respondents (40%) do not exercise, while 12 of
20 (60%) say that they do. Most of the respondents therefore
exercise.
colors.pie <- c("blue", "skyblue")
pie(Exercise.table, col=colors.pie, main = "Exercise")
legend("topleft", legend=c("no", "yes"), fill = c("blue", "skyblue"))
3. Compute descriptive statistics such as minimum, maximum, mean and
standard deviation for weight, serum cholesterol and systolic
pressure. Interpret.
a.

min(ID_no.$Weight)
max(ID_no.$Weight)
mean(ID_no.$Weight)
sd(ID_no.$Weight)
Interpretation:
From the given data, the lowest weight recorded among the
respondents is 106 lbs and the highest is 234 lbs. The average weight
is 154.35 lbs and the standard deviation is 36.1128 lbs.
b.

min(ID_no.$Serum_Cholesterol)
max(ID_no.$Serum_Cholesterol)
mean(ID_no.$Serum_Cholesterol)
sd(ID_no.$Serum_Cholesterol)
Interpretation:
From the given data, the lowest serum cholesterol is 168 mg% and
the highest is 253 mg%. The average serum cholesterol is 205.65 mg%
and the standard deviation is 23.39652 mg%.
c.

min(ID_no.$Systolic_Pressure)
max(ID_no.$Systolic_Pressure)
mean(ID_no.$Systolic_Pressure)
sd(ID_no.$Systolic_Pressure)
Interpretation:
From the given data, the lowest systolic pressure recorded is 115
and the highest is 163. The average systolic pressure is 137 and the
standard deviation is 13.81075 units.
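The four statistics in parts a to c can also be computed for all three variables in one step. The following is a minimal sketch using `sapply()`. The names `measurements` and `descriptives` are hypothetical and are not part of the original answer.

```r
# Quantitative variables as given in the data section
Weight <- c(120, 106, 127, 132, 109, 143, 123, 145, 118, 143, 196,
            187, 199, 156, 142, 162, 160, 215, 170, 234)
Serum_Cholesterol <- c(193, 168, 179, 215, 175, 188, 220, 210, 176,
                       206, 199, 220, 253, 214, 184, 218, 200, 215, 242, 238)
Systolic_Pressure <- c(126, 120, 128, 129, 119, 136, 131, 163, 132,
                       138, 148, 115, 149, 142, 156, 135, 156, 153, 122, 142)

measurements <- data.frame(Weight, Serum_Cholesterol, Systolic_Pressure)

# Apply the same four statistics to every column;
# the result is a matrix with one row per statistic and one column per variable
descriptives <- sapply(measurements, function(values) {
  c(min = min(values), max = max(values),
    mean = mean(values), sd = sd(values))
})
print(round(descriptives, 2))
```

Note that `sd()` returns the sample standard deviation (divisor n - 1), which is the quantity reported in the interpretations.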
4. Create histogram, normal probability plot and box plot for the
variable weight. Interpret.

hist(ID_no.$Weight, main = "Weight Histogram")
hist(ID_no.$Weight, freq = F, col="Skyblue", main = "Weight Histogram")
curve(dnorm(x,mean(ID_no.$Weight), sd(ID_no.$Weight)), add=T)
Interpretation:
The histogram shows the distribution of the respondents' weights.
It is roughly unimodal and skewed to the right, meaning that it is
positively skewed, with the mean at 154.35.
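The visual impression of right skew can be cross-checked numerically. The sketch below computes a simple moment-based skewness coefficient by hand. This check is an addition for illustration and was not part of the original answer; the names `centered` and `skewness` are hypothetical.

```r
# Weight values as given in the data section
Weight <- c(120, 106, 127, 132, 109, 143, 123, 145, 118, 143, 196,
            187, 199, 156, 142, 162, 160, 215, 170, 234)

# Moment-based skewness: third central moment divided by
# the second central moment raised to the power 1.5
centered <- Weight - mean(Weight)
skewness <- mean(centered^3) / mean(centered^2)^1.5
print(skewness)

# A mean above the median is another rough sign of right skew
print(c(mean = mean(Weight), median = median(Weight)))
```

A positive coefficient, together with a mean above the median, is consistent with the positive skew described in the interpretation.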
qqnorm(ID_no.$Weight, main = "Weight")
qqline(ID_no.$Weight)
Interpretation:
The normal probability plot above shows the points departing from
the reference line in a clear pattern, which indicates that the data
are skewed rather than normally distributed.
boxplot(ID_no.$Weight, horizontal = T, col = "Skyblue", main =
"Weight")
Interpretation:
The box plot shows that the 1st quartile is at 126, which means
that 25% of the weights are lower than 126. The median is 144 and the
3rd quartile is at 174.25. The minimum is 106 and the maximum is 234.
The box plot is somewhat skewed to the right, which implies that the
distribution is positively skewed.
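The quartiles quoted in the box plot interpretation appear to come from `summary()` or `quantile()`, whereas `boxplot()` draws the box from the hinges returned by `fivenum()`. The sketch below computes both so the two sets of values can be compared. The names `quartiles` and `hinges` are hypothetical.

```r
# Weight values as given in the data section
Weight <- c(120, 106, 127, 132, 109, 143, 123, 145, 118, 143, 196,
            187, 199, 156, 142, 162, 160, 215, 170, 234)

# Quartiles using quantile()'s default method (type 7),
# the same method summary() uses
quartiles <- quantile(Weight, probs = c(0.25, 0.5, 0.75))

# Tukey's five-number summary: minimum, lower hinge, median,
# upper hinge, maximum; boxplot() uses these hinges for the box edges
hinges <- fivenum(Weight)

print(quartiles)
print(hinges)
```

For these data the hinges are 125 and 178.5, so the edges of the drawn box differ slightly from the quartiles 126 and 174.25. The median, minimum and maximum are the same under both methods.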

5. Create histogram, normal probability plot and boxplot for the
variable serum cholesterol. Interpret.

hist(ID_no.$Serum_Cholesterol, main = "Serum Cholesterol Histogram")
hist(ID_no.$Serum_Cholesterol, freq = F, col="Skyblue",
main = "Serum Cholesterol Histogram")
curve(dnorm(x,mean(ID_no.$Serum_Cholesterol),
sd(ID_no.$Serum_Cholesterol)), add=T)
Interpretation:
The histogram is unimodal and looks approximately symmetric, with
slight skewness to the right. This means that it is somewhat
positively skewed, with the mean at 205.65.
qqnorm(ID_no.$Serum_Cholesterol, main = "Serum Cholesterol")
qqline(ID_no.$Serum_Cholesterol)
Interpretation:
The normal probability plot shows the points departing from the
reference line in a clear pattern, which implies that the data are
skewed rather than normally distributed.
boxplot(ID_no.$Serum_Cholesterol, horizontal = T, col = "Skyblue",
main = "Serum Cholesterol")
Interpretation:
The box plot shows that the 1st quartile is at 187, which implies
that 25% of the serum cholesterol values are lower than 187. The
median is 208 and the 3rd quartile is at 218.5. The minimum is 168 and
the maximum is 253. The box plot is somewhat skewed to the right, which
means that the distribution is somewhat positively skewed.

6. Create histogram, normal probability plot and boxplot for the
variable systolic pressure. Interpret.

hist(ID_no.$Systolic_Pressure, main = "Systolic Pressure Histogram")
hist(ID_no.$Systolic_Pressure, freq = F, col="Skyblue", main =
"Systolic Pressure Histogram")
curve(dnorm(x,mean(ID_no.$Systolic_Pressure),
sd(ID_no.$Systolic_Pressure)), add=T)
Interpretation:
The histogram is unimodal and looks approximately normally
distributed, with the mean at 137.
qqnorm(ID_no.$Systolic_Pressure, main = "Systolic Pressure")
qqline(ID_no.$Systolic_Pressure)
Interpretation:
The normal probability plot shows the points departing from the
reference line in a clear pattern, which indicates that the data are
skewed rather than normally distributed.
boxplot(ID_no.$Systolic_Pressure, horizontal = T, col = "Skyblue",
main = "Systolic Pressure")
Interpretation:
The box plot above shows that the 1st quartile is at 127.5, which
means that 25% of the systolic pressure values are lower than 127.5.
The median is 135.5 and the 3rd quartile is at 148.25. The minimum is
115 and the maximum is 163. The box plot is somewhat skewed to the
right, which means that the distribution is somewhat positively skewed.
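Items 4 to 6 repeat the same three plots for different variables. As a sketch only, the repeated calls could be wrapped in a small helper function. The function name `plot_distribution` and its arguments `values` and `label` are hypothetical and were not used in the original answer.

```r
# Hypothetical helper: draws the histogram with a normal curve,
# the normal probability plot, and the box plot for one variable
plot_distribution <- function(values, label) {
  # Density-scaled histogram so the normal curve shares the same y-axis
  hist(values, freq = FALSE, col = "skyblue",
       main = paste(label, "Histogram"), xlab = label)
  # Normal density using the sample mean and standard deviation
  curve(dnorm(x, mean(values), sd(values)), add = TRUE)

  # Normal probability plot with its reference line
  qqnorm(values, main = label)
  qqline(values)

  # Horizontal box plot
  boxplot(values, horizontal = TRUE, col = "skyblue", main = label)
  invisible(NULL)
}

# Example call with the systolic pressure values from the data section
Systolic_Pressure <- c(126, 120, 128, 129, 119, 136, 131, 163, 132,
                       138, 148, 115, 149, 142, 156, 135, 156, 153, 122, 142)
plot_distribution(Systolic_Pressure, "Systolic Pressure")
```

The argument is named `values` rather than `x` because `curve()` evaluates its expression with `x` as the plotting variable.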
