Sanizah 4/5/2013
CHAPTER 4
NUMERICAL DESCRIPTIVE
MEASURES
Prepared by Sanizah Ahmad
sanizah@tmsk.uitm.edu.my
Learning outcomes
- Understand mean, median and mode as measures of central tendency
- Calculate mean, median and mode for ungrouped and grouped data
- Explain and use the properties of mean, median and mode
- Draw box-and-whisker plots
- Calculate the first quartile and third quartile
Numerical Descriptive Measures (Summary Statistics)
- Measures of Central Tendency (Average): arithmetic mean, median, mode
- Measures of Position (Skewness): quartiles, box-and-whisker plots
- Measures of Dispersion (Spread): range, variance, standard deviation
TYPES OF DATA
- Ungrouped data: measures of central tendency, measures of position, measures of dispersion
- Grouped data: measures of central tendency, measures of position, measures of dispersion
QMT412 Business Statistics Pn. Sanizah 4/5/2013
CHAPTER 4 PART I
SUMMARY MEASURES
(FOR UNGROUPED DATA)
1. MEASURES OF CENTRAL TENDENCY
Central tendency is a single value situated at the centre of a data set.
- It is a summary value for the data.
- It is also called the average.
Types of average:
- Mean
- Median
- Mode
MEAN (for ungrouped data)
The mean is the sum of the values divided by the total number of values.
Population mean,
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
where N is the population size.
Sample mean,
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
where n is the sample size.
- Useful in comparing two or more populations.
- Used for interval and ratio level data.
Example 1
The following data give the hours spent studying
per week by nine selected students. Find the mean.
12 16 7 22 11 18 7 25 16
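A minimal Python sketch of the sample mean for Example 1 may help check a hand calculation. The function name `sample_mean` is an illustrative choice, not something from the slides.

```python
# Hours spent studying per week by the nine students in Example 1.
hours = [12, 16, 7, 22, 11, 18, 7, 25, 16]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


mean_hours = sample_mean(hours)
print(sum(hours), len(hours))   # 134 9
print(round(mean_hours, 2))     # 14.89
```

The sum of the nine values is 134, so the mean is 134 / 9, about 14.89 hours per week.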
Pros and cons of using mean
Pros
- Summarizes data in a way that is easy to understand
- Uses all the data
- Used in many statistical applications
Cons
- Affected by extreme values or outliers
Illustration: Distribution of agents by number of new customers recruited last month

Agent    No. recruited (A)    No. recruited (B)
1        18                   18
2        19                   19
3        20                   20
4        20                   20
5        20                   20
6        20                   20
7        20                   20
8        22                   22
9        24                   24
10       28                   120
Mean     21.1                 30.3

In column B:
- No one in the department is close to 30.3.
- The mean is NOT a reasonable measure of central tendency.
- The mean is affected by extreme values or outliers.
MEDIAN (for ungrouped data)
The median is the value that lies at the centre of the ordered data.
Can be computed for ratio, interval and ordinal level data.
Steps in computing the median of raw data:
Step 1: Arrange the data in increasing order.
Step 2: Select the middle point. If the middle point falls halfway between two values, find the median by adding the two values and dividing by two.
Median = value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set.
Is the median affected by outliers?
For the agent data above (Mean = 21.1 with agent 10 at 28, Mean = 30.3 with agent 10 at 120), the median is 20 in both cases.
- The median is NOT affected by outliers.
- When outliers are present, the median is the best measure of location / central tendency.
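The effect of the extreme value on the mean, and its lack of effect on the median, can be checked with a short Python sketch. It uses the standard-library `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Number of new customers recruited by the ten agents,
# first with agent 10 at 28, then with agent 10 at 120.
without_outlier = [18, 19, 20, 20, 20, 20, 20, 22, 24, 28]
with_outlier = [18, 19, 20, 20, 20, 20, 20, 22, 24, 120]

mean_a, median_a = mean(without_outlier), median(without_outlier)
mean_b, median_b = mean(with_outlier), median(with_outlier)

print(mean_a, median_a)  # mean 21.1, median 20
print(mean_b, median_b)  # mean 30.3, median 20
```

Changing a single value from 28 to 120 moves the mean from 21.1 to 30.3, while the median stays at 20.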
Example 3 (odd number of observations)
Find the median for the ages of seven preschool children.
The ages are
1 3 4 2 3 5 1
Solution: Arrange the data in ascending order
1 1 2 3 3 4 5
Location of median = $\frac{1}{2}(7+1)$th value = 4th value
The 4th value is 3, so the median age is 3 years old.
Example 4 (even number of observations)
The number of cloudy days for the top 6 cloudiest cities is shown. Find the median.
209 223 211 227 213 240
Solution: Arrange the data in ascending order
209 211 213 223 227 240
Location of median = $\frac{1}{2}(6+1)$th value = 3.5th value
The median lies halfway between the 3rd and 4th values:
Median = $\frac{213 + 223}{2} = 218$
Example 5
The following data give the speeds (in miles per
hour) of 12 cars traveling on a highway. Find the
median.
67 71 57 54 69 74 77 62 61
59 58 63
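The two-step procedure for the median (sort, then take the value at position (n + 1)/2) can be sketched in Python as follows. The function name `median_by_position` is an illustrative choice; the sketch is applied to the data of Examples 3, 4 and 5.

```python
def median_by_position(values):
    """Median of raw data using the (n + 1) / 2 location rule."""
    ordered = sorted(values)              # Step 1: arrange in increasing order
    n = len(ordered)
    position = (n + 1) / 2                # 1-based location of the median
    lower = ordered[int(position) - 1]    # value at the whole-number position
    if position.is_integer():             # odd n: a single middle value
        return lower
    upper = ordered[int(position)]        # even n: average the two middle values
    return (lower + upper) / 2


ages = [1, 3, 4, 2, 3, 5, 1]                                   # Example 3
cloudy_days = [209, 223, 211, 227, 213, 240]                   # Example 4
speeds = [67, 71, 57, 54, 69, 74, 77, 62, 61, 59, 58, 63]      # Example 5

print(median_by_position(ages))         # 3
print(median_by_position(cloudy_days))  # 218.0
print(median_by_position(speeds))       # 62.5
```

For Example 5, n = 12, so the median is at the 6.5th position, halfway between 62 and 63 in the sorted speeds.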
Mode (for ungrouped data)
The MODE is the most frequent score in the distribution (the value that occurs with the highest frequency in a data set).
- A distribution where a single score is most frequent has one mode and is called unimodal.
- When there are ties for the most frequent score, the distribution is bimodal if two scores tie, or multimodal if more than two scores tie.
- If no data value occurs more frequently than the others, we say that the data set has no mode.
MODE
- Qualitative/categorical data (ungrouped): the mode for the bar chart is basketball.
- Quantitative data (grouped): the mode is the class interval 21-25.
Disadvantage: A set of data may have one, two or many modes, or no mode at all.
Example 6
The following data represent the duration (in days) of U.S. Space Shuttle voyages for the years 1992-1994. Find the mode.
8 9 9 14 8 8 10 7 6
9 7 8 10 14 11 8 14 11
Solution: The mode for the data set is 8.
Example 7
Suppose the following data give the amount of
snowfall (in inches) for 10 days during January to
March of 2004 for the state of Utah. Find the mode.
(a) 8 11 3 9 10 2 4 6 4 8
(b) 5 6 4 7 10 3 8 12 9 2
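Finding the mode amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` and is applied to Examples 6 and 7. The function name `modes` is illustrative, and the sketch assumes that a data set in which every value occurs exactly once is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumption: if every value occurs only once, there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


shuttle_days = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
snowfall_a = [8, 11, 3, 9, 10, 2, 4, 6, 4, 8]
snowfall_b = [5, 6, 4, 7, 10, 3, 8, 12, 9, 2]

print(modes(shuttle_days))  # [8]
print(modes(snowfall_a))    # [4, 8]
print(modes(snowfall_b))    # []
```

Data set (a) is bimodal (4 and 8 each occur twice), while data set (b) has no mode because no value repeats.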
2. MEASURES OF POSITION (for ungrouped data)
Techniques that divide a set of data into equal groups.
- To determine the measures of position, the data must be sorted from lowest to highest.
- QUARTILES
- BOX AND WHISKER PLOT
Box-and-Whisker Plot (for ungrouped data)
Purpose: to find out what information can be discovered about the data, such as the center and spread.
The box-and-whisker plot is built from the FIVE-number summary:
1. The median (2nd quartile)
2. The 1st quartile
3. The 3rd quartile
4. The maximum value in the data set
5. The minimum value in the data set
QUARTILE (for ungrouped data)
Quartiles are the values that divide a set of data (the distribution) into four equally sized groups (quarters), separated by $Q_1$, $Q_2$ and $Q_3$.
- Quartile 1 ($Q_1$) is located at the 25% position.
- Quartile 2 ($Q_2$) is the same as the median and has 50% of the values less than and 50% of the values greater than it.
- Quartile 3 ($Q_3$) is located at the 75% position.
Position of $Q_1$ and $Q_3$:
Position of $Q_1 = \frac{n+1}{4}$ and position of $Q_3 = \frac{3(n+1)}{4}$
Finding Data Values Corresponding to $Q_1$, $Q_2$ and $Q_3$
Step 1: Arrange the data in order from lowest to highest.
Step 2: Find the median of the data values. That is the value for $Q_2$.
Step 3: Find the median of the data values that fall below $Q_2$. This is the value for $Q_1$.
Step 4: Find the median of the data values that fall above $Q_2$. This is the value for $Q_3$.
Example 8
(Do not use method in textbook pg. 73)
Find $Q_1$, $Q_2$ and $Q_3$ for the data set
15 13 6 5 12 50 22 18
Solution: Arrange data in ascending order
5 6 12 13 15 18 22 50
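The four steps for finding the quartiles (sort, take the median, then take the medians of the lower and upper halves) can be sketched in Python for the Example 8 data. The function name `quartiles` is illustrative. As an assumption, when n is odd the median itself is left out of both halves, which matches the quartiles quoted in Example 10.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    ordered = sorted(values)                 # Step 1: sort lowest to highest
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                   # values below Q2
    # Assumption: for odd n the middle value is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)  # 9.0 14.0 20.0
```

Here the lower half is 5 6 12 13 (median 9) and the upper half is 15 18 22 50 (median 20), with Q2 = (13 + 15)/2 = 14.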
Example 9
Refer to Example 7 on pg. 73 of the textbook.
INTERQUARTILE RANGE (IQR)
The interquartile range (IQR) is defined as the difference between $Q_3$ and $Q_1$, and is the range of the middle 50% of the data.
$IQR = Q_3 - Q_1$
The interquartile range is used to identify outliers, and it is also used as a measure of variability.
If data values are smaller than $Q_1 - 1.5(IQR)$ or larger than $Q_3 + 1.5(IQR)$, then those data values are known as outliers.
Example 10
Use the following set of data to create the 5 number
summary. Construct a box and whisker plot for the
data.
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91,
93, 100
Solution:
Median = 68
Lower quartile = 52, Upper quartile = 87
Min = 18, Max = 100 (IQR = 35)
Take out your graph paper so we can practice graphing
the data.
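Before drawing the plot, the five-number summary and the 1.5 x IQR outlier limits for Example 10 can be checked with a short Python sketch. It assumes the median-of-halves method for the quartiles, with the median excluded from both halves when n is odd. All variable names are illustrative.

```python
from statistics import median

data = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

ordered = sorted(data)
n = len(ordered)
half = n // 2
lower_half = ordered[:half]
# Assumption: for odd n the median is excluded from both halves.
upper_half = ordered[half + 1:] if n % 2 else ordered[half:]

q1, q2, q3 = median(lower_half), median(ordered), median(upper_half)
iqr = q3 - q1

# Values beyond these limits would be flagged as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < lower_limit or x > upper_limit]

five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])
print(five_number_summary)       # (18, 52, 68, 87, 100)
print(iqr)                       # 35
print(lower_limit, upper_limit)  # -0.5 139.5
print(outliers)                  # []
```

No value falls outside the limits, so the whiskers run all the way to the minimum (18) and the maximum (100).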
Graphing The Data
Notice, the Box includes the lower quartile, median,
and upper quartile.
The Whiskers extend from the Box to the max and
min.
Exercises
Refer to textbook pg. 74, Quick Check 4A, Q4, 5 and 6.
SKEWNESS: a symmetric histogram and frequency curve (no skewness).
Symmetrical (normal) distribution, bell-shaped curve:
$\bar{x} = \tilde{x} = \hat{x}$ (mean = median = mode)
SKEWNESS: histogram and frequency curve skewed to the right/positively skewed.
$\hat{x} < \tilde{x} < \bar{x}$ (mode < median < mean)
SKEWNESS: A histogram and frequency curve skewed to the left/negatively skewed.
$\bar{x} < \tilde{x} < \hat{x}$ (mean < median < mode)
3. MEASURES OF DISPERSION (for ungrouped data)
Measures of dispersion help to understand the spread or variability of a set of data:
- RANGE
- VARIANCE
- STANDARD DEVIATION
RANGE
Range = highest value - lowest value
Example 12: The following data give the hours spent studying per week by a sample of nine students. Find the range.
12 16 7 22 11 18 7 25 16
Solution:
Highest value = 25
Lowest value = 7
Range = 25 - 7 = 18
Variance and Standard deviation
Useful to measure the variability (how the values fluctuate around the mean).
Formula for variance and standard deviation of a population:
Variance, $\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$
Standard deviation, $\sigma = \sqrt{\sigma^2}$
Formula for variance and standard deviation of a sample:
Variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$
Standard deviation, $s = \sqrt{s^2}$
Example 11
Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.
11.2 11.9 12.0 12.8 13.4 14.3
Solution: Data 11.2 11.9 12.0 12.8 13.4 14.3
$\sum x = 11.2 + 11.9 + \dots + 14.3 = 75.6$
$\sum x^2 = 11.2^2 + 11.9^2 + \dots + 14.3^2 = 958.94$
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{958.94 - \frac{(75.6)^2}{6}}{5} = 1.276$
$s = \sqrt{s^2} = \sqrt{1.276} = 1.13$
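The same calculation can be reproduced with a small Python sketch of the computational formula for the sample variance. The function name `sample_variance` is illustrative, and floating-point results are rounded for display.

```python
from math import sqrt

# European auto sales (millions of dollars) for the sample of 6 years.
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]


def sample_variance(values):
    """Sample variance: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


s2 = sample_variance(sales)
s = sqrt(s2)
print(round(s2, 3))  # 1.276
print(round(s, 2))   # 1.13
```

Dividing by n instead of n - 1 in the last step would give the population variance from the previous formula.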
Example 12
The following data give the number of selected patients admitted to a hospital on seven days during the month of January 2003.
19 14 9 25 21 13 16
Calculate the values of the variance and standard deviation.
Coefficient of Variation
- Useful measure when comparing distributions with different means and variances.
- Indicates the ratio of the standard deviation to the arithmetic mean, expressed as a percent.
$CV = \frac{\text{sample standard deviation}}{\text{sample mean}} \times 100 = \frac{s}{\bar{x}} \times 100\%$
Example 13
Typist Ani can type 40 words per minute with a standard deviation of 5, while typist Jura can type 160 words per minute with a standard deviation of 10. Which typist is more consistent in her work?
Refer to textbook pg. 113, Ex. 14.
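A short Python sketch of the coefficient of variation for the two typists is given below. The function name is illustrative, and the sketch takes the typist with the smaller CV to be the more consistent one.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation divided by mean, times 100."""
    return std_dev / mean * 100


cv_ani = coefficient_of_variation(5, 40)     # 12.5 (%)
cv_jura = coefficient_of_variation(10, 160)  # 6.25 (%)

# A smaller CV means less variation relative to the mean.
more_consistent = "Ani" if cv_ani < cv_jura else "Jura"
print(cv_ani, cv_jura, more_consistent)  # 12.5 6.25 Jura
```

Although Jura has the larger standard deviation, her CV (6.25%) is smaller than Ani's (12.5%), so relative to her typing speed her work varies less.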
Measure of Skewness
The measure of skewness is based on the difference between the mean and the mode (or the median) of the distribution.
Pearson Coefficient of Skewness:
$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$ or $\text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
i) If skewness = 0 ⇒ distribution is symmetrical.
ii) If skewness = +ve ⇒ skewed to the right.
iii) If skewness = -ve ⇒ skewed to the left.
Example 16
Refer to textbook pg. 120, Ex. 16.
The mean, mode and standard deviation of a set of data are 4, 5 and 0.5, respectively. Find the Pearson coefficient of skewness and describe the shape of the distribution.
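The mode-based Pearson coefficient for Example 16 can be sketched in Python as follows. The function names are illustrative, and the interpretation follows the sign rules listed for the Pearson coefficient of skewness.

```python
def pearson_skewness(mean, mode, std_dev):
    """Pearson coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev


def describe(skewness):
    """Interpret the sign of the coefficient."""
    if skewness > 0:
        return "skewed to the right"
    if skewness < 0:
        return "skewed to the left"
    return "symmetrical"


skew = pearson_skewness(mean=4, mode=5, std_dev=0.5)
print(skew, describe(skew))  # -2.0 skewed to the left
```

The mean (4) is below the mode (5), so the coefficient is negative and the distribution is negatively skewed.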
Example 17
Do Quick Check 5C, Q7, pg. 117.