
Chapter 2

Tutorial
2nd & 3rd LAB
Boxplot Example 1
Draw the boxplot for the following set:
1, 1, 3, 5, 6, 6, 7, 8, 12, 13, 15
Min=
Q1=
Median=
Q3=
Max=
IQR=
1.5×IQR=
Q1-1.5×IQR=
Q3+1.5×IQR=
Boxplot Example 1
Draw the boxplot for the following set:
1, 1, 3, 5, 6, 6, 7, 8, 12, 13, 15
Min=1
Q1=3
Median=6
Q3=12
Max=15
IQR=12-3=9
1.5×IQR=13.5
Q1-1.5×IQR=-10.5
Q3+1.5×IQR=25.5
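The values above can be checked with a short script. This is a minimal sketch, assuming the quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); that convention reproduces Q1=3 and Q3=12 here, but other quartile conventions can give slightly different numbers. The function name `five_number_summary` is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

sample = [1, 1, 3, 5, 6, 6, 7, 8, 12, 13, 15]
lo, q1, med, q3, hi = five_number_summary(sample)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(lo, q1, med, q3, hi)            # 1 3 6 12 15
print(iqr, lower_fence, upper_fence)  # 9 -10.5 25.5
```

Both fences lie outside the range of the data, so this sample has no outliers and the whiskers run to Min and Max.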
Boxplot Example 1
[Figure: boxplot of Sample 1, marking Min=1, Q1=3, Median=6, Q3=12, Max=15]
Boxplot Example 2
Draw the boxplot for the following set:
2, 3, 3, 6, 7, 7, 7, 9, 13, 15, 30
Min=
Q1=
Median=
Q3=
Max=
IQR=
1.5×IQR=
Q1-1.5×IQR=
Q3+1.5×IQR=
Boxplot Example 2
Draw the boxplot for the following set:
2, 3, 3, 6, 7, 7, 7, 9, 13, 15, 30
Min=2
Q1=3
Median=7
Q3=13
Max=30
IQR=13-3=10
1.5×IQR=15
Q1-1.5×IQR=-12
Q3+1.5×IQR=28
Boxplot Example 2
[Figure: boxplot of Sample 1, marking Min=2, Q1=3, Median=7, Q3=13, Max=30, with the fences Q1-1.5×IQR=-12 and Q3+1.5×IQR=28]
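Boxplot Example 2
[Figure: boxplot of Sample 1, marking Min=2, Q1=3, Median=7, Q3=13, Max=30, with the fences Q1-1.5×IQR=-12 and Q3+1.5×IQR=28]
Terminate whiskers at the most extreme observation within 1.5×IQR of the quartiles.
Here 30 lies above the upper fence of 28, so the upper whisker stops at 15 and 30 is drawn as an outlier.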
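The whisker rule can be sketched in a few lines. The snippet below takes the quartiles from the worked example as given (Q1=3, Q3=13) and only illustrates where the whiskers end and which points fall outside the fences; the variable names are illustrative.

```python
sample = [2, 3, 3, 6, 7, 7, 7, 9, 13, 15, 30]
q1, q3 = 3, 13                    # quartiles from the worked example
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr      # -12.0
upper_fence = q3 + 1.5 * iqr      # 28.0

# Whiskers end at the most extreme observations still inside the fences.
inside = [x for x in sample if lower_fence <= x <= upper_fence]
lower_whisker = min(inside)
upper_whisker = max(inside)

# Anything beyond a fence is plotted individually as an outlier.
outliers = [x for x in sample if x < lower_fence or x > upper_fence]

print(lower_whisker, upper_whisker, outliers)  # 2 15 [30]
```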
Q2
2) Suppose that the data for analysis includes the attribute grade.
The grade values for the data tuples are:

4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20
(a) What is the mean of the data? What is the median?
• Using Equation (2.3), the mean = 245/18 ≈ 13.61
• The median = (13+14)/2 = 13.5
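As a quick check of part (a), the mean and median can be computed with Python's standard `statistics` module. This is only a verification sketch; the variable name `grades` is an assumption.

```python
from statistics import mean, median

grades = [4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]

print(sum(grades), len(grades))   # 245 18
print(round(mean(grades), 2))     # 13.61
print(median(grades))             # 13.5 (average of the 9th and 10th values)
```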
(b) What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).
• The mode (value occurring with the greatest frequency) of the data is 13. Only one value attains the highest frequency, so the data is unimodal.
(c) What is the midrange of the data?
• The midrange (average of the largest and smallest values in the data set) of the data is:
• (20 + 4) / 2 = 12
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
• The first quartile (corresponding to the 25th percentile) of the data is 12, the median of the lower nine values. The third quartile (corresponding to the 75th percentile) of the data is 17, the median of the upper nine values.
(e) Give the five-number summary of the data.
• The five-number summary of a distribution consists of the minimum value, first quartile, median value, third quartile, and maximum value. It provides a good summary of the shape of the distribution and for this data is: 4, 12, 13.5, 17, 20.
(f) Show a boxplot of the data.
[Figure: boxplot of the grade data (Sample 1)]
Q3
3) Suppose that the data for analysis includes the attribute age.
The age values for the data tuples are:

13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
• Using Equation (2.1), the (arithmetic) mean of the data is 809/27 ≈ 29.96, or roughly 30. The median (the middle value of the ordered set, since the number of values is odd) is 25.
(b) What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).
• This data set has two values that occur with the same highest frequency and is, therefore, bimodal.
• The modes (values occurring with the greatest frequency) of the data are 25 and 35, each occurring four times.
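The modality claim can be checked by counting occurrences. The sketch below uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

modes = multimode(ages)              # all values with the top frequency
counts = Counter(ages)

print(modes)                         # [25, 35]
print([counts[m] for m in modes])    # [4, 4]
print(len(modes))                    # 2 -> bimodal
```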
(c) What is the midrange of the data?
• The midrange (average of the largest and smallest values in the data set) of the data is: (70 + 13)/2 = 41.5
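The midrange is simply the average of the two extremes, which a one-line check makes explicit (the variable names are illustrative):

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Midrange: average of the largest and smallest values.
midrange = (max(ages) + min(ages)) / 2
print(midrange)  # 41.5
```

Because it depends only on the two extreme values, the midrange here sits well above both the mean (about 30) and the median (25), pulled up by the single value 70.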
(d) Can you find (roughly) the first quartile (Q1) and the third
quartile (Q3) of the data?
• The first quartile (corresponding to the 25th percentile) of the
data is: 20. The third quartile (corresponding to the 75th
percentile) of the data is: 35.
(e) Give the five-number summary of the data.
• The five-number summary of a distribution consists of the minimum value, first quartile, median value, third quartile, and maximum value. It provides a good summary of the shape of the distribution and for this data is: 13, 20, 25, 35, 70.
(f) Show a boxplot of the data.
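No figure survives for this part, so the following sketch only computes the numbers a boxplot of the age data would be drawn from. It assumes the median-of-halves quartile rule (which matches Q1=20 and Q3=35 given in part (d)) and the 1.5×IQR whisker rule from the boxplot examples; the variable names are illustrative.

```python
from statistics import median

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

data = sorted(ages)
half = len(data) // 2
q1 = median(data[:half])      # median of the lower 13 values
q3 = median(data[-half:])     # median of the upper 13 values
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if not lower_fence <= x <= upper_fence]

print(q1, median(data), q3)              # 20 25 35
print(lower_fence, upper_fence)          # -2.5 57.5
print(inside[0], inside[-1], outliers)   # 13 52 [70]
```

Under these assumptions the box spans 20 to 35 with the median line at 25, the whiskers run from 13 to 52, and 70 is plotted as an outlier.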
Q4
4) Suppose a manager tested the age and body fat data for 18 randomly selected adults, with the following result (the questions below refer to the body fat column as score):

age  body fat
23 9.5
23 26.5
27 7.8
27 17.8
39 31.4
41 25.9
47 27.4
49 27.2
50 31.2
52 34.6
54 42.5
54 28.8
56 33.4
57 30.2
58 34.1
58 32.9
60 41.2
61 35.7
(a) Calculate the mean, median and standard deviation of age and score.
• For the variable age, the mean is 46.44, the median is 51, and the (population) standard deviation is 12.85.
• For the variable score, the mean is 28.78, the median is 30.7, and the (population) standard deviation is 8.99.
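The figures in part (a) can be reproduced with the standard library. Note that the quoted standard deviations match the population formula (dividing by N, `pstdev`); the sample formula (dividing by N-1, `stdev`) gives slightly larger values. The helper name `summarize` is an assumption made for this sketch.

```python
from statistics import mean, median, pstdev, stdev

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
score = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6,
         42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

def summarize(values):
    """Mean, median, population SD (divide by N), sample SD (divide by N-1)."""
    return tuple(round(f(values), 2) for f in (mean, median, pstdev, stdev))

age_summary = summarize(age)
score_summary = summarize(score)
print(age_summary)    # (46.44, 51.0, 12.85, 13.22)
print(score_summary)  # (28.78, 30.7, 8.99, 9.25)
```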
(b) Draw the box-plots for age and score.
Sorted score values: 7.8, 9.5, 17.8, 25.9, 26.5, 27.2, 27.4, 28.8, 30.2, 31.2, 31.4, 32.9, 33.4, 34.1, 34.6, 35.7, 41.2, 42.5
Sorted age values: 23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61
Q4
(c) Draw a scatter plot and a q-q plot based on these two
variables.
Scatter plot: each point is one (age, score) pair from the table of 18 adults above.
q-q plot: each variable is sorted independently and the values are paired by rank.

age  score
23 7.8
23 9.5
27 17.8
27 25.9
39 26.5
41 27.2
47 27.4
49 28.8
50 30.2
52 31.2
54 31.4
54 32.9
56 33.4
57 34.1
58 34.6
58 35.7
60 41.2
61 42.5
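The pairs listed for the q-q plot differ from the scatter-plot pairs because each column is sorted on its own before pairing. A minimal sketch of that step, assuming both variables have the same number of observations so that the i-th smallest age is matched with the i-th smallest score (no interpolation needed):

```python
# Original (age, score) observations, in the order given in the table.
age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
score = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6,
         42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

scatter_pairs = list(zip(age, score))             # keeps each adult's own pair
qq_pairs = list(zip(sorted(age), sorted(score)))  # pairs values by rank

print(scatter_pairs[1])          # (23, 26.5)
print(qq_pairs[1])               # (23, 9.5)
print(qq_pairs[0], qq_pairs[-1]) # (23, 7.8) (61, 42.5)
```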
Q5
• Given two objects represented by the tuples (22, 1, 42, 10)
and (20, 0, 36, 8):
• (a) Compute the Euclidean distance between the two objects.
• (b) Compute the Manhattan distance between the two
objects.
• (c) Compute the Minkowski distance between the two
objects, using h = 3.
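• To compute the distance between two objects $i = (x_{i1}, \dots, x_{ip})$ and $j = (x_{j1}, \dots, x_{jp})$ described by numeric attributes:
• Euclidean distance

$$d(i,j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

• The Manhattan (or city block) distance

$$d(i,j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$$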
(a) Compute the Euclidean distance between the two objects.

$$d = \sqrt{(22-20)^2 + (1-0)^2 + (42-36)^2 + (10-8)^2} = \sqrt{45} = 6.7082$$
(b) Compute the Manhattan distance between the two objects.

$$d = |22-20| + |1-0| + |42-36| + |10-8| = 2 + 1 + 6 + 2 = 11$$
(c) Compute the Minkowski distance between the two objects, using h = 3.

$$d = \sqrt[3]{|22-20|^3 + |1-0|^3 + |42-36|^3 + |10-8|^3} = \sqrt[3]{233} = 6.1534$$
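All three answers are instances of the Minkowski distance, with h = 1 giving the Manhattan distance and h = 2 the Euclidean distance. A small sketch that checks the three results; the function name `minkowski` is illustrative and not taken from any particular library.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length numeric tuples."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)   # h = 1
euclidean = minkowski(x, y, 2)   # h = 2
order3 = minkowski(x, y, 3)      # h = 3

print(round(manhattan, 4))  # 11.0
print(round(euclidean, 4))  # 6.7082
print(round(order3, 4))     # 6.1534
```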
