
Chapter 3

Numerical Descriptive
Measures
Learning Outcome
 Explore, describe and compare data sets
using the basic tools after collecting the
data.

 Use statistical software to do simple
exploratory data analysis.
Contents
3.1 Measures of central tendency for ungrouped data.

3.2 Measures of central tendency for grouped data.

3.3 Measures of dispersion.


3.3.1 Measures of dispersion for ungrouped data.
3.3.2 Measures of dispersion for grouped data.

3.4 Measures of position.

3.5 Skewness.
3.1 Measures of central tendency for
ungrouped data
 Represent a data set by some numerical
measures (typical values).
 A single value that summarizes a set of data.
 It locates the centre of the values.
 Give the centre of a histogram or a frequency
distribution curve.
Measures of central tendency (cont'd)

 There are three measurements of central
tendency:
 Median

 Mode

 Mean
Median
 Median is the value of the middle term in a
data set that has been ranked in increasing
or decreasing order.
 Median is the value of the [(n+1)/2]th term in
a ranked data set;
n = total number of elements in the set.

Note:
1. If n is odd, then median is the value of the middle term in the ranked data.
2. If n is even, then median is the average value of the two middle terms.
Median (cont'd)
 Example 3.1
i) Find the median of set A = { 10, 5, 19, 8, 3 }
and set B = { 2, 7, 3, 6, 4, 5 }.
 Solution:
Median (cont'd)
ii) Find the median of set C = {3, 5, 8, 10, 90}
Compare the median of set C with the
median of set A, and state your conclusion.
 Solution:
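The answers to Example 3.1 can be checked with a short Python sketch. The helper name `median` and the variable names are illustrative choices, not part of the slides; the function simply applies the odd/even rule from the note above.

```python
def median(values):
    """Median of ungrouped data using the ((n + 1) / 2)-th term rule."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd n: the single middle term
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: average of the two middle terms

set_a = [10, 5, 19, 8, 3]
set_b = [2, 7, 3, 6, 4, 5]
set_c = [3, 5, 8, 10, 90]

print(median(set_a), median(set_b), median(set_c))
```

Sets A and C give the same median even though C contains the extreme value 90, which suggests that the median is not influenced by extreme values.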
Median (cont'd)
 For ungrouped data in the form of frequency
distribution of single-valued classes
 Median can be found either from ungrouped
frequency distribution or from the cumulative
frequency distribution.
Median (cont'd)
 Example 3.2
Find the median of the following frequency
distribution.

No. of children   0   1   2   3   4   5
Frequency         3   5  12   9   4   2
Median (cont'd)
 Solution:
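No. of children        < 0   < 1   < 2   < 3   < 4   < 5   < 6
Cumulative frequency     0     3     8    20    29    33    35

The position of the median = (n+1)/2 = 36/2 = 18.
The 18th term has the value 2, since 8 values are less than 2 and 20 values are less than 3.
So, the median is 2.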
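The cumulative-frequency lookup in Example 3.2 can be sketched in Python as follows. The function names are hypothetical, and the values are assumed to be listed in increasing order.

```python
from itertools import accumulate

def value_at(position, values, cumulative):
    """Return the value that occupies a 1-based position in the ranked data."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position exceeds the number of observations")

def median_from_frequencies(values, freqs):
    """Median of a single-valued frequency distribution (values sorted ascending)."""
    cumulative = list(accumulate(freqs))
    n = cumulative[-1]
    position = (n + 1) / 2
    low = value_at(int(position), values, cumulative)         # term at or just below the position
    high = value_at(int(position + 0.5), values, cumulative)  # next term when the position ends in .5
    return (low + high) / 2

children = [0, 1, 2, 3, 4, 5]
frequency = [3, 5, 12, 9, 4, 2]
print(median_from_frequencies(children, frequency))
```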
Mode
 Mode is the value that occurs with the highest
frequency in a data set.
 Example 3.3
Find the mode of each of the following data
sets.
i) 74, 9, 5, 8, 3, 8, 8
ii) 2, 2, 6, 6, 8, 8, 9, 9
iii) 2, 6, 6, 6, 3, 8, 8, 8, 3
iv) B, C, D, A, A, C, C, C, B, A
Mode (cont’d)
 The mode is not influenced by extreme values.
 A data set may have no mode, one mode (unimodal),
two modes (bimodal) or more than two modes (multimodal).
 Can be used for both quantitative and
qualitative data.
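A minimal Python sketch for checking Example 3.3 is given below. It assumes the convention that a data set has no mode when every value occurs equally often; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(data):
    """Return all values with the highest frequency ([] when no mode exists)."""
    counts = Counter(data)
    top = max(counts.values())
    # Assumed convention: if all distinct values are equally frequent, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([74, 9, 5, 8, 3, 8, 8]))         # unimodal
print(modes([2, 2, 6, 6, 8, 8, 9, 9]))       # no mode
print(modes([2, 6, 6, 6, 3, 8, 8, 8, 3]))    # bimodal
print(modes(list("BCDAACCCBA")))             # qualitative data
```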
Mode (cont'd)
 Example 3.4
Find the mode of the following frequency
distribution.
Mean
 The mean for population data x1, x2, ..., xN is
denoted by $\mu$ and is defined as
$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

 The mean for sample data x1, x2, ..., xn is
denoted by $\bar{X}$ and is defined as
$$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Mean (cont'd)
 Example 3.5
Find the arithmetic mean for the data set
{ 158, 189, 265, 127, 191 }
 Solution:
Mean (cont'd)
 Find the arithmetic mean for the data set
{ 158, 189, 265, 127, 1091 }. Compare your
answer with the answer of example 3.5.
 Solution:
Mean (cont'd)
Note:
 The mean does not necessarily equal one of the values in the
original data.
 It is influenced by extreme values.
 It is not suitable for a data set that contains
extreme values.
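The two data sets from the mean examples above can be compared with a short Python sketch (the variable names are illustrative):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

original = [158, 189, 265, 127, 191]        # data set of Example 3.5
with_extreme = [158, 189, 265, 127, 1091]   # same set with 191 replaced by 1091

print(mean(original))
print(mean(with_extreme))
```

Replacing a single value by an extreme one roughly doubles the mean here, and neither mean is one of the original data values.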
Mean (cont'd)
 For ungrouped data in the form of frequency
distribution of single-valued classes.

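$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i}$$
where k is the number of distinct values and $n = \sum f_i$ is the total number of observations.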
Mean (cont'd)
 Example 3.6
Find the mean of the following frequency
distribution.

xi 2 5 6 8

fi 1 3 4 2
Mean (cont'd)
 Solution:
xi 2 5 6 8

fi 1 3 4 2

fi xi
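As a rough check of Example 3.6, the frequency-weighted mean can be computed in Python. The function name `frequency_mean` is an illustrative choice.

```python
def frequency_mean(values, freqs):
    """Mean of a single-valued frequency distribution: sum(f * x) / sum(f)."""
    weighted_total = sum(f * x for x, f in zip(values, freqs))
    return weighted_total / sum(freqs)

x = [2, 5, 6, 8]
f = [1, 3, 4, 2]
print(sum(fi * xi for xi, fi in zip(x, f)), sum(f))   # sum of f*x and sum of f
print(frequency_mean(x, f))
```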
Mean (cont'd)
Effect of outliers or extreme values

 Outliers are the values that are very small or
very large relative to the majority of the
values in a data set.
Example 3.7 (Effect of an outlier on the mean)
Given the values
5610 3243 609 1187 32268
Find the mean and median.
Solution:
Mean = Median =

If we exclude the extreme value 32268


Solution:
Mean = Median=
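The blanks in Example 3.7 can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean, median

values = [5610, 3243, 609, 1187, 32268]
trimmed = [v for v in values if v != 32268]   # drop the extreme value

print("with outlier:   ", mean(values), median(values))
print("without outlier:", mean(trimmed), median(trimmed))
```

The mean falls sharply once 32268 is excluded, while the median changes far less.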
3.2 Measures of Central Tendency for
grouped data
 Mode
 Modal class
 The class which has the largest standard
frequency
 An estimate of the mode can be obtained from the
modal class.
Example 3.8
 Estimate the mode of the following frequency
distribution.

Weight (kg) 60 – 62 63 – 65 66 – 68 69 – 71 72 – 74

Frequency 3 4 5 6 2
Solution: Method 1
Estimate the mode graphically (histogram).
[Figure: histogram of frequency against weight, with bars centred at 61, 64, 67, 70 and 73]
 Modal class = 69 – 71
 Estimated mode = 69
Mode
 For grouped data,
$$\text{Mode} = L_m + \frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\,C = L_m + \frac{f_m - f_b}{2f_m - (f_b + f_a)}\,C$$

$L_m$ = lower class boundary of the modal class
$f_m$ = frequency of the modal class
$f_b$ = frequency of the next lower class below the modal class
$f_a$ = frequency of the next higher class above the modal class
$C$ = the size of the modal class
Mode
Or
$$\hat{x} = L_m + \left[\frac{f_m - f_b}{2f_m - (f_b + f_a)}\right] C = L_m + \left[\frac{\Delta_B}{\Delta_A + \Delta_B}\right] C$$

$L_m$ = lower boundary of the modal class
$\Delta_B$ = difference between the frequencies of the modal class and the class before
$\Delta_A$ = difference between the frequencies of the modal class and the class after
$C$ = the size of the modal class
Mode (cont'd)
 Find the mode of the following data.

Weights (kg)   Frequency
60 – 62        3
63 – 65        4
66 – 68        5
69 – 71        6
72 – 74        2

 Solution:
$L = 68.5,\ \Delta_B = 6 - 5 = 1,\ \Delta_A = 6 - 2 = 4,\ C = 3$
$$\hat{x} = 68.5 + \left(\frac{1}{1 + 4}\right) 3 = 69.1$$
Median (cont'd)
 To identify the median class for grouped
distributions, take the middle term, $T_x = (N + 1)/2$.
 For grouped data,
$$\tilde{x} = L + \left(\frac{\frac{1}{2}N - F_B}{f_m}\right) C$$

$L$ = lower boundary of the median class
$F_B$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
$C$ = median class size
$N$ = the total number of data values
Median (cont'd)
 Find the median of the following data.

Weights (kg)   Frequency   Cumulative frequency
60 – 62        3           3
63 – 65        4           7
66 – 68        5           12
69 – 71        6           18
72 – 74        2           20

 Solution:
$$T_x = \frac{20 + 1}{2} = 10.5$$
so the median class is 66 – 68.
$L = 65.5,\ F_B = 7,\ f_m = 5,\ C = 3$
$$\tilde{x} = 65.5 + \left(\frac{\frac{1}{2}(20) - 7}{5}\right) 3 = 67.3$$
Using cumulative frequency curve
[Figure: cumulative frequency curve; horizontal axis marked 59.5, 62.5, 65.5, 68, 68.5, 71.5, 74.5; vertical axis from 0 to 25]
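As a numerical alternative to reading the curve, the grouped-median formula can be sketched in Python. The function name is hypothetical; the median class is located with the (N + 1)/2 term used in the worked solution, and the interpolation uses N/2 as in the formula.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, freqs, class_size):
    """Median of grouped data by interpolation inside the median class."""
    n = sum(freqs)
    cumulative = [0] + list(accumulate(freqs))
    for i, f in enumerate(freqs):
        if cumulative[i + 1] >= (n + 1) / 2:       # first class containing the middle term
            before = cumulative[i]                 # cumulative frequency before the median class
            return lower_boundaries[i] + (n / 2 - before) / f * class_size
    raise ValueError("empty frequency distribution")

boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]
frequency = [3, 4, 5, 6, 2]
print(round(grouped_median(boundaries, frequency, 3), 1))
```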
Mean (cont'd)
 For grouped data in the form of frequency
distribution
 Suppose data are grouped into k class
intervals, and
$f_i$ = the frequency of class i, $N = \sum f_i$ = population size
$m_i$ = the midpoint of class i, $n = \sum f_i$ = sample size
Mean for population data:
$$\mu = \frac{\sum f_i m_i}{N}$$
Mean for sample data:
$$\bar{X} = \frac{\sum f_i m_i}{n}$$
Mean (cont'd)
 Example 3.9
 Find the mean of the following frequency
distribution.

UCCM2233 scores       10-12   13-15   16-18   19-21
Number of students      4      12      20      14
Mean (cont'd)
 Solution:

$$\sum f_i m_i = 832, \quad n = 50$$
$$\bar{X} = \frac{\sum f_i m_i}{n} = \frac{832}{50} = 16.64$$
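The totals in Example 3.9 can be reproduced with a few lines of Python. This is a sketch with illustrative variable names; the midpoint of each class is taken as the average of its two class limits.

```python
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]   # class limits for the scores
freqs = [4, 12, 20, 14]                              # number of students per class

midpoints = [(lo + hi) / 2 for lo, hi in classes]    # 11, 14, 17, 20
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
n = sum(freqs)

print(sum_fm, n, sum_fm / n)
```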
Using Excel to find Mean
Relationships among the mean,
median and mode
 Symmetric histogram and frequency curve with
one peak.
 The values of the mean, median and mode are identical
and lie at the centre of the distribution.
Relationships among the mean,
median and mode (cont'd)
 Histogram and frequency curve skewed to the
right or positively skewed.
 The value of the mean is the largest, mode is the smallest
and median lies between these two.
Relationships among the mean,
median and mode (cont'd)
 Histogram and frequency curve skewed to
the left or negatively skewed.
 The value of the mean is the smallest, mode is
the largest and median lies between these two.
Relationships among the mean,
median and mode (cont'd)

When to use mean, median and mode?

The following table summarizes the appropriate
methods of determining the middle or typical
value of a data set based on the measurement
scale of the data.
How do we interpret these numbers?

 Imagine the sample data set represents the
number of days it takes a business to
process an order.

 According to the salesman, the average time is
6 days (the mean), most orders are processed in
5 days (the mode), and 50% of orders take more
than 5.5 days (the median) while 50% take less
than 5.5 days.
Measurement Scale        Best Measure of the "Middle"
Nominal (Categorical)    Mode
Ordinal                  Median
Interval                 Symmetrical data: Mean; Skewed data: Median
Ratio                    Symmetrical data: Mean; Skewed data: Median
3.3 Measures of dispersion
 Measures of dispersion for ungrouped
data
 Measures of central tendency alone are not
enough to reveal the whole picture of the
distribution of a data set.
 A measure of central tendency does not
describe how the data are spread out.
Measures of dispersion
 Example
Data set Data Mean Median Mode
A 1, 3, 6, 10, 10, 21, 26 11 10 10
B 7, 8, 10, 10, 10, 15, 17 11 10 10

The mean, median and mode are the same
for data sets A and B, but the distributions of
the data are different.
Measures of dispersion
 Three measurements of dispersion:
 Range
 Interquartile Range
 Standard deviation or Variance
Measures of dispersion for ungrouped data
 Range
 The range for a data set {x1, x2,..., xn} is defined to be
the difference between the largest value and
smallest value.

Range = Largest value - Smallest value


 Example
 Find the range for data set A and data set B above.
 RA=26-1=25, RB=17-7=10
Measures of dispersion for ungrouped data
 Variance
 The variance is the average of the squared
deviation of the data from the mean.
 Consider a population of N measurements
$x_1, x_2, \dots, x_N$
 Population mean:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
 Population variance:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
Measures of dispersion for ungrouped data

 Consider a sample of n measurements
$x_1, x_2, \dots, x_n$
 Sample mean:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
 Sample variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$
Measures of dispersion for ungrouped data
 Standard Deviation
 The standard deviation is the positive square root
of the variance
 Sample standard deviation: $s = \sqrt{s^2}$
 Population standard deviation: $\sigma = \sqrt{\sigma^2}$

 Note:
 A small standard deviation means that the data are
distributed closely to their mean.
 A large standard deviation means that the data are widely
scattered about their mean.
 It is influenced by extreme values
Measures of dispersion for ungrouped data

 Example 3.11:
 The data show the daily salaries of all 6 employees
of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for
these data.
Measures of dispersion for ungrouped data

 Solution:
 Mean, $\mu$ =

$x_i$     $x_i - \mu$     $(x_i - \mu)^2$     $x_i^2$
29.50
16.50
35.40
21.30
49.70
24.60
Measures of dispersion for ungrouped data
 Method 1
 Population Variance =

 Method 2
 Population Variance =

 Hence, population standard deviation,


Using Excel
Measures of dispersion for ungrouped data
 Example 3.12:
A sample consists of 5 data values: 72, 49, 79, 55 and
57. Calculate the variance and standard deviation.
 Solution:

$n = 5,\quad \sum x_i = 312,\quad \sum x_i^2 = 20100$

 Sample variance =

 Sample standard deviation, s =
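The blanks in Example 3.12 can be checked with the shortcut form of the sample variance formula. The Python sketch below uses illustrative variable names.

```python
import math

sample = [72, 49, 79, 55, 57]
n = len(sample)
sum_x = sum(sample)                    # 312
sum_x2 = sum(x * x for x in sample)    # 20100

# Shortcut formula: s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)
sample_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_sd, 3))
```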


Using Excel
Mean Absolute Deviation (MAD)
 MAD measures the mean of the absolute
values of the deviations from the mean.
$$\text{MAD} = \frac{\sum |x - \bar{X}|}{n}$$
Example 3.13
 Calculate the mean absolute deviation of the data
[29.50, 16.50, 35.40, 21.30, 49.70, 24.60] in Ex
3.11.
 Solution
$$\sum |x - \mu| = 0.00 + 13.00 + 5.90 + 8.20 + 20.20 + 4.90 = 52.2$$
$$\text{MAD} = \frac{1}{6}(52.2) = 8.7$$
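The same MAD calculation can be sketched in Python; the function name is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    centre = sum(values) / len(values)
    return sum(abs(x - centre) for x in values) / len(values)

salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
print(round(mean_absolute_deviation(salaries), 2))
```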
Measures of dispersion for ungrouped data

 Solution (Example 3.11):
 Mean, $\mu = \frac{\sum x_i}{N} = \frac{177}{6} = 29.5$

$x_i$      $x_i - \mu$    $(x_i - \mu)^2$    $x_i^2$
29.50       0.00            0.00          870.25
16.50     -13.00          169.00          272.25
35.40       5.90           34.81         1253.16
21.30      -8.20           67.24          453.69
49.70      20.20          408.04         2470.09
24.60      -4.90           24.01          605.16
Total       0.00          703.10         5924.60
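Using the column totals above, both forms of the population variance formula can be checked in Python. This is a sketch with illustrative variable names; it treats the six salaries as the whole population, as the example states.

```python
import math

salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
N = len(salaries)
mu = sum(salaries) / N

# Method 1: average squared deviation from the mean
var_definition = sum((x - mu) ** 2 for x in salaries) / N
# Method 2: mean of the squares minus the square of the mean
var_shortcut = sum(x * x for x in salaries) / N - mu ** 2

sigma = math.sqrt(var_definition)
print(round(var_definition, 4), round(var_shortcut, 4), round(sigma, 3))
```

Both methods should agree up to floating-point rounding, at roughly 117.18, giving a standard deviation of roughly 10.83.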
Measures of dispersion for grouped data
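 Variance (for data grouped into k classes with midpoints $m_i$ and frequencies $f_i$)
 Population variance:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (m_i - \mu)^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2$$
 Sample variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right]$$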
Measures of dispersion for grouped data
 Example:
Find the variance of the following frequency
distribution if it represents
a) a population b) a sample

Height (m) 20 – 22 23 – 25 26 – 28 29 – 31 32 – 34
Frequency 3 6 12 9 2
Measures of dispersion for grouped data
 Solution:

Height (m)   Midpoint, m   Frequency, f   fm   fm^2


20 – 22 3
23 – 25 6
26 – 28 12
29 – 31 9
32 – 34 2
Total:
Measures of dispersion for grouped data
 Solution (cont'd):
$$\sigma^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2 = \frac{23805}{32} - \left(\frac{867}{32}\right)^2 = 9.83496$$
$$s^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{1}{n}\left(\sum f_i m_i\right)^2\right] = \frac{1}{31}\left[23805 - \frac{1}{32}(867)^2\right] = 10.15222$$
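The grouped-variance totals and both answers can be reproduced with a short Python sketch. The variable names are illustrative; the midpoints are the averages of the class limits of the height table.

```python
midpoints = [21, 24, 27, 30, 33]   # midpoints of 20-22, 23-25, 26-28, 29-31, 32-34
freqs = [3, 6, 12, 9, 2]

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))

population_variance = sum_fm2 / n - (sum_fm / n) ** 2
sample_variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

print(n, sum_fm, sum_fm2)
print(round(population_variance, 5), round(sample_variance, 5))
```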
Use of Standard Deviation
 Find the proportion or percentage of the total
observations that fall within a given interval
about the mean by using the following
theorem.
 Chebyshev’s Theorem
For any number k greater than 1, at least
$(1 - 1/k^2) \times 100\%$ of the data values lie within k
standard deviations of the mean.
Use of Standard Deviation
 Example 3.15:
 For a statistics class, the mean for the midterm
scores is 75 and the standard deviation is 8.
 Find the percentage of students who scored
between 59 and 91.
 Solution:
$$\bar{x} = 75,\quad s = 8,\quad k_L = \frac{59 - 75}{8} = -2,\quad k_H = \frac{91 - 75}{8} = 2$$
$$k = 2,\ \text{where } 1 - \frac{1}{2^2} = 0.75$$
According to Chebyshev's Theorem, at least 75% of
the students scored between 59 and 91.
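Chebyshev's bound is easy to evaluate for any k. The following Python sketch uses a hypothetical helper name and reproduces the 75% figure for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

mean_score, sd = 75, 8
k = (91 - mean_score) / sd            # 59 and 91 are both 2 standard deviations away
print(k, chebyshev_lower_bound(k))    # proportion, multiply by 100 for a percentage
```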
Use of Standard Deviation
 Empirical Rule
 For a bell-shaped distribution, approximately
 68% of the observations lie within one standard
deviation of the mean.
 95% of the observations lie within two standard
deviations of the mean.
 99.7% of the observations lie within three standard
deviations of the mean.
Use of Standard Deviation
 Example 3.16:
 Recalculate the percentage in the previous example if
the midterm scores follow a bell-shaped
distribution.
 Solution:
 With k=2, by using empirical rule, approximately 95
% of the students scored between 59 & 91.
Coefficient of variation (V )
 Coefficient of variation (V ) expresses the
standard deviation as a percentage of what is
being measured, at least on the average.

$$V = \frac{s}{\bar{x}} \times 100\% \quad\text{or}\quad V = \frac{\sigma}{\mu} \times 100\%$$
 If V is smaller for one set of data than another,
there is relatively less variability in the first set of
data, and we say that it is “more consistent”.
Coefficient of variation (V )
 Example 3.17:
 During the past few months, one runner
averaged 12 miles per week with a standard
deviation of 2 miles, while another runner
averaged 25 miles per week with a standard
deviation of 3 miles.
 Which of the two runners is relatively more
consistent in his weekly running habits?
Coefficient of variation (V )
 Solution:

 Since $V_B$ is smaller than $V_A$, the second runner is
relatively more consistent in his weekly running
habits.
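The two coefficients of variation behind this conclusion can be computed with a small Python sketch. The function name is illustrative; runner A is the first runner and runner B the second.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

v_a = coefficient_of_variation(2, 12)   # first runner: 12 miles/week, sd 2
v_b = coefficient_of_variation(3, 25)   # second runner: 25 miles/week, sd 3
print(round(v_a, 2), round(v_b, 2))
```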
3.4 Measures of position
 Measures of position determine the position of a single value in
relation to other values in a sample or a
population data set.
 Quartiles
 Interquartile range
 Percentiles
Quartiles
 Quartiles are 3 summary measures that divide
a ranked data set into 4 equal parts.
 second quartile (Q2) is the median of a data set.
 first quartile (Q1) is the value of the middle term
among the observations that are less than the
median.
 third quartile (Q3) is the value of the middle term
among the observations that are greater than the
median.
Quartiles
To Find The Quartiles of Ungrouped
Data

[Figure: ranked data with Q2 at the median and Q1, Q3 at the middle terms (m) of the lower and upper halves]
Sound difficult?

3.75>3.5=4
2.25<2.5=2
To Find The Quartiles of Ungrouped
Data (Cont’d)

[Figure: ranked data with Q2 at the median and Q1, Q3 at the middle terms (m) of the lower and upper halves]
Sound difficult?

Round up ALL to 0.5


Interquartile Range (IQR)
 Interquartile range is a measurement of
dispersion,
$$\text{IQR} = Q_3 - Q_1$$

 The semi-interquartile range = the quartile
deviation = $(Q_3 - Q_1)/2$
Percentiles
 The (approximate) value of the kth percentile,
denoted by $P_k$, is the value of the
$\left(\frac{kn}{100}\right)$th term in a ranked data set,
 where k denotes the number of the percentile
and n represents the sample size.
 Note: round kn/100 to the nearest integer or .5
value, for example:
 2.2 → 2.0 ; 2.3 → 2.5;
2.7 → 2.5 ; 2.8 → 3.0.
Example 3.18:
The following are the scores of 12 students in a
mathematics class.
75 80 68 53 99 58 76 73
85 88 91 79
a)Find the values of the three quartiles. Where does
the score of 88 lie in relation to these quartiles?
b)Find the interquartile range.
c) Find the quartile deviation.
d)Find the value of the 62nd percentile.
Example 3.18: (cont'd)
 Solution:
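One way to check the answers is the Python sketch below. It makes two assumptions that the slides do not spell out: the quartiles are taken as the medians of the lower and upper halves of the ranked data (following the definitions of Q1 and Q3 given earlier), and a percentile position ending in .5 is read as the average of the two neighbouring terms. The function name `percentile` is illustrative.

```python
import math
from statistics import median

scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]
ranked = sorted(scores)
n = len(ranked)

q2 = median(ranked)
q1 = median(ranked[: n // 2])          # observations below the median
q3 = median(ranked[(n + 1) // 2 :])    # observations above the median
iqr = q3 - q1
quartile_deviation = iqr / 2

def percentile(ranked_data, k):
    """k-th percentile with the position kn/100 rounded to the nearest 0.5."""
    position = math.floor(2 * k * len(ranked_data) / 100 + 0.5) / 2
    if position < 1:
        raise ValueError("position falls before the first term")
    lower = ranked_data[math.floor(position) - 1]
    upper = ranked_data[math.ceil(position) - 1]
    return (lower + upper) / 2          # assumed reading of a .5 position

p62 = percentile(ranked, 62)
print(q1, q2, q3, iqr, quartile_deviation, p62)
```

Under these assumptions the score of 88 lies above Q3, that is, in the top quarter of the class.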
Using Excel to find Quartiles
Using Excel to find Percentile
To Find The Quartiles of Grouped
Data
Example 3.19
 Compute the Q1 and Q3 for the following
distribution.

Weight (kg) 60 – 62 63 – 65 66 – 68 69 – 71 72 – 74

Frequency 3 4 5 6 2
Cumulative frequency 3 7 12 18 20
Example 3.19 (Cont’d)
$$\frac{n}{4} = \frac{20}{4} = 5, \qquad Q_1 = 62.5 + \frac{3}{4}\left(\frac{20}{4} - 3\right) = 64$$
$$\frac{3n}{4} = \frac{3(20)}{4} = 15, \qquad Q_3 = 68.5 + \frac{3}{6}(15 - 12) = 70$$
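The interpolation used for Q1 and Q3 can be written as one Python function. This is a sketch: the function name and the `fraction` parameter (0.25 for Q1, 0.75 for Q3) are illustrative, and the quartile class is taken as the first class whose cumulative frequency reaches the target position.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, freqs, class_size, fraction):
    """Interpolate a quantile of grouped data inside the class that contains it."""
    n = sum(freqs)
    target = fraction * n                           # n/4 for Q1, 3n/4 for Q3
    cumulative = [0] + list(accumulate(freqs))
    for i, f in enumerate(freqs):
        if cumulative[i + 1] >= target:
            before = cumulative[i]                  # cumulative frequency before the class
            return lower_boundaries[i] + (target - before) / f * class_size
    raise ValueError("fraction must be between 0 and 1")

boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]
frequency = [3, 4, 5, 6, 2]
print(grouped_quantile(boundaries, frequency, 3, 0.25),
      grouped_quantile(boundaries, frequency, 3, 0.75))
```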
3.5 Skewness
 There are three shapes commonly observed:
symmetric, positively skewed and negatively
skewed.
 In a symmetric set of observations, the mean and
median are equal and the data values are evenly
spread around these values.
 A set of values is skewed to the right or positively
skewed if there is a single peak and the values
extend much further to the right of the peak than to
the left of the peak. The mean is larger than the
median.
 In a negatively skewed distribution, there is a single
peak but the observations extend further to the left.
The mean is smaller than the median.
Skewness
 To calculate the skewness of a distribution or a set
of data, we can use Pearson’s coefficient of
skewness
$$sk = \frac{3(\text{Mean} - \text{Median})}{s}$$
 Note:
 The value ranges from -3 to 3.
 If sk < 0, then the distribution is negatively skewed.
 If sk > 0, then the distribution is positively skewed.
 A value of 0 occurs when the distribution is
symmetrical and there is no skewness present.
 The more negative sk is, the more negatively
skewed the distribution.
 The more positive sk is, the more positively
skewed the distribution.
Skewness
Example 3.21
 The following are the earnings per share for a sample of
15 software companies for the year 2000. The
earnings per share are arranged from smallest to
largest.
 0.09 0.13 0.41 0.51 1.12 1.20 1.49 3.18 3.50 6.36
7.83 8.92 10.13 12.99 16.40
 Compute the mean, median and standard deviation.
Find the coefficient of skewness using Pearson’s
estimate. What is your conclusion regarding the
shape of the distribution?
Example 3.21 (Solution)
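The quantities asked for in Example 3.21 can be computed with Python's `statistics` module. This is a sketch with illustrative variable names; `stdev` is the sample standard deviation, which matches the statement that the data are a sample.

```python
from statistics import mean, median, stdev

earnings = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
            3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]

m = mean(earnings)
med = median(earnings)
s = stdev(earnings)            # sample standard deviation (divides by n - 1)
sk = 3 * (m - med) / s         # Pearson's coefficient of skewness

print(round(m, 2), med, round(s, 2), round(sk, 2))
```

A coefficient of about 1 is positive, which points to a distribution that is skewed to the right.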
Kurtosis
 Measures the relative concentration of values in the
center of the distribution, as compared with the tails.

 Measure of the degree to which a distribution
is “peaked” or flat in comparison to a
normal distribution.
Box-and-whisker plot
 a plot that shows the center, the spread, and
the skewness of a data set.
 Also helps to detect outliers.
Box-and-whisker plot
Procedures:
Step 1: Rank the data in increasing order and calculate the
first, second and third quartiles.
Step 2: Find IQR
Lower Inner Fence = Q1-1.5(IQR)
Upper Inner Fence =Q3+1.5(IQR)
Step 3: Determine the smallest and the largest values in the
given data set within the two inner fences.
Step 4: Draw a box with its left side at the position of Q1 and
the right side at the position of Q3 . Inside the box,
draw a vertical line at the position of Q2.
Box-and-whisker plot
 Step 5: Draw two lines called whiskers joining the box to
the points of the smallest and the largest values
within the two inner fences found in step 2. A
value that falls outside the two inner fences is
shown by marking an asterisk and is called an
outlier.
Example 3.22
 Construct a box-and-whisker plot for the
following data:
 23 17 32 60 22 52 29 38
42 92 27 46
 Solution:
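Steps 1 to 3 of the procedure can be sketched in Python before drawing the plot. The sketch assumes the quartiles are the medians of the lower and upper halves of the ranked data, as in the quartile definitions given earlier; the variable names are illustrative.

```python
from statistics import median

data = [23, 17, 32, 60, 22, 52, 29, 38, 42, 92, 27, 46]
ranked = sorted(data)
n = len(ranked)

# Step 1: quartiles
q1 = median(ranked[: n // 2])
q2 = median(ranked)
q3 = median(ranked[(n + 1) // 2 :])

# Step 2: interquartile range and inner fences
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 3: whisker ends are the extreme values inside the fences
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = inside[0], inside[-1]

print(q1, q2, q3, iqr, lower_fence, upper_fence)
print(whisker_low, whisker_high, outliers)
```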
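Example 3.22 (cont'd)
[Figure: box-and-whisker plot of the data on a horizontal axis marked 20, 40, 60, 80]

The distribution is skewed to the right, and 92 is an outlier.

Box plot
[Figure: four example box plots on axes from 2 to 10, titled Symmetry, Right Skewed, Left Skewed, and Left Skewed and an outlier]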
Review
3.1 Measures of central tendency for ungrouped data.

3.2 Measures of central tendency for grouped data.

3.3 Measures of dispersion.


3.3.1 Measures of dispersion for ungrouped data.
3.3.2 Measures of dispersion for grouped data.

3.4 Measures of position.

3.5 Skewness.
The End
Chapter 3
