You are on page 1of 3

DataMiningSpring2017HW1

1. From your text book page 80

Solutions
a) What is the mean of the data? What is the median?
 The mean = 30 The median = 25
b) What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).
 This data set has two values that occur with the same highest frequency and is, therefore,
bimodal.
 The modes of the data are 25 and 35.
c) What is the midrange of the data?
 The midrange (average of the largest and smallest values in the data set) of the data is:
(70+13)=2 = 41.5
d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
 Q1: 20. Q3: 35
e) Give the five-number summary of the data.
 13, 20, 25, 35, 70.
f) Show a boxplot of the data.

g) How quantile quantile plot different from a quantile plot


 The quantile-quantile (q-q) plot is a graphical technique for determining if two
data sets come from populations with a common distribution.
 A quantile plot is a plot which plots the quantiles of the density function in
question against the explanatory density.
 the Q-Q plots the data distribution against a known distribution. The other Q
plots the data against the independent variable

Using R
x = c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45,46,
52, 70)
summary(x)
boxplot(x)
plot(quantile(x))
qqnorm(x)
2. Let a football coach tested the age and body fat data for 18 randomly selected player with the
following result
age= (23,23,27,27,39,41,47,49,50,52,54,54,56,57,58,58,60,61)
bodyfat= (9.5,26.5,7.8,17.8, 31.4,25.9,27.4,27.2,31.2,34.6,42.5,28.8,33.4,30.2,34.1,32.9,41.2,35.7)

2.1. Calculate the mean, median and standard deviation of age and body fat.
mean median standard deviation
age 46.44 51 13.21
body fat 28.78 30.7 9.25

2.2. Draw the box-plots for age and body fat.

2.3. Draw a scatter plot and a q-q plot based on these two variables.

Using R
#HW02_Q2
age= c(23,23,27,27,39,41,47,49,50,52,54,54,56,57,58,58,60,61)
bodyfat= c(9.5,26.5,7.8,17.8, 31.4,25.9,
27.4,27.2,31.2,34.6,42.5,28.8,33.4,30.2,34.1,32.9,41.2,35.7)
#part(a)
mean(age)
median(age)
sd(age)
mean(bodyfat)
median(bodyfat)
sd(bodyfat)

#part(b)
boxplot(age,main ="age")
boxplot(bodyfat, main="body fat")

#part(c)
plot(age,bodyfat, main= "scatter")
qqnorm(age, main= "age")
qqline(age, col = 2)

qqnorm(bodyfat, main="body fat")


qqline(bodyfat, col = 2)

qqplot(age,bodyfat, main="Q-Q plot")

3. Given two samples denoted by the rows (22, 1, 42, 10) and (20, 0, 36, 8)
3.1. Compute the Euclidean distance between the two samples.
6.7082
3.2. Compute the Manhattan distance between the two samples.
11
3.3. Compute the Minkowski distance between the two samples, using h = 3.
6.1534

Using R
r1 = c(22, 1, 42, 10)
r2 = c(20, 0, 36, 8)
dist(rbind(r1, r2), method = "euclidean")
dist(rbind(r1, r2), method = "manhattan")
dist(rbind(r1, r2), method = "minkowski", p = 3)

You might also like