Descriptive
Statistics
Chapter 3 Part 2
Range, Variance, Standard
Deviation
Mean, median and mode do not reveal the
whole picture of the distribution of a data set.
Two data sets with the same mean may have
completely different spreads.
Mean, median and mode are usually not by themselves
sufficient measures to reveal the shape of the
distribution of a data set.
Team I
72 73 76 78
Team II
67 72 76 84
Three measures of dispersion:
Range
Variance
Standard deviation
Different formulas apply to ungrouped
and grouped data
Range
Range = Max – Min
Where
Max= highest value
Min=lowest value
Example 1
What is the range for each team?
Team I: Range = 78 – 72 = 6
Team II: Range = 84 – 67 = 17
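As a quick check of Example 1, the following minimal Python sketch computes the mean and the range of both teams. The function name `data_range` is illustrative and not part of the original slides.

```python
from statistics import mean

def data_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)

team_1 = [72, 73, 76, 78]
team_2 = [67, 72, 76, 84]

# Both teams share the same mean but have very different ranges.
for name, values in (("Team I", team_1), ("Team II", team_2)):
    print(name, "mean =", mean(values), "range =", data_range(values))
```

The two teams have the same mean, yet Team II is spread over a much wider interval, which is exactly what the mean alone cannot show.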
Standard Deviation & Variance
The standard deviation tells how closely the values of a data
set are clustered around the mean.
A lower value of the standard deviation indicates that the data set
values are spread over a relatively smaller range around the
mean.
A larger value of the standard deviation indicates that the data set values are
spread over a relatively larger range around the mean (far from the mean).
The standard deviation is obtained by taking the positive square root of the variance.
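Variance (ungrouped data)
Population: $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Sample: $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
where N is the population size and n is the sample size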
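Standard Deviation (ungrouped data)
Population: $\sigma = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
Sample: $s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$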
Example 2
The table below shows the number of films
produced by each production company in 2006.
Find the variance and standard deviation.
Production company       Total production (units)
20th Century Fox         62
Warner Bros              93
Walt Disney Pictures     126
Paramount Pictures       75
Touchstone Pictures      34
Example 2 (Cont’)
Solution
Let x be the total number of films produced by each
production company.
x        x²
62       3,844
93       8,649
126      15,876
75       5,625
34       1,156
∑x = 390     ∑x² = 35,150
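$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{35{,}150 - \frac{(390)^2}{5}}{5 - 1} = \dfrac{35{,}150 - 30{,}420}{4} = 1182.50$
$s = \sqrt{1182.50} = 34.3875$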
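The arithmetic of Example 2 can be reproduced with a short Python sketch. It assumes the five companies are treated as a sample (divisor n − 1), as the solution does; the function name `sample_variance_shortcut` is illustrative.

```python
import math
import statistics

films = [62, 93, 126, 75, 34]

def sample_variance_shortcut(xs):
    """Sample variance via the sums of x and x squared (divisor n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

s2 = sample_variance_shortcut(films)
s = math.sqrt(s2)
print("variance =", s2, "standard deviation =", round(s, 4))

# The standard library computes the same sample variance from deviations.
print("statistics.variance =", statistics.variance(films))
```

Both routes give the same variance; the sum-based form is just more convenient for hand calculation.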
The values of the variance and the standard
deviation are never negative.
Usually the values of the variance and standard
deviation are positive.
However, if there is no variation in the data set,
the variance and standard deviation are
both equal to 0.
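Variance (grouped data)
Population: $\sigma^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}$
Sample: $s^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1}$
where m is the midpoint and f is the frequency of a class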
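Standard deviation (grouped data)
Population: $\sigma = \sqrt{\dfrac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}}$
Sample: $s = \sqrt{\dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1}}$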
Example 3
Find the variance and standard deviation for the
following data.
Number of orders     f
10 – 12              4
13 – 15              12
16 – 18              20
19 – 21              14
Total                ∑f = 50
Example 3 (Cont’)
Solution
Number of orders     f      m      fm      m²      fm²
10 – 12              4      11     44      121     484
13 – 15              12     14     168     196     2,352
16 – 18              20     17     340     289     5,780
19 – 21              14     20     280     400     5,600
Total                50            832             14,216
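$s^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1} = \dfrac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$
$s = \sqrt{7.5820} \approx 2.75$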
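The grouped-data calculation of Example 3 can be sketched in Python as follows. The class midpoint is taken as the average of the class limits, and the data are treated as a sample (divisor n − 1), matching the result 7.5820; the function name and the data layout are illustrative.

```python
import math

# Each entry: ((lower class limit, upper class limit), frequency)
order_classes = [((10, 12), 4), ((13, 15), 12), ((16, 18), 20), ((19, 21), 14)]

def grouped_sample_variance(classes):
    """Sample variance for grouped data using class midpoints."""
    n = sum(f for _, f in classes)
    sum_fm = 0.0
    sum_fm2 = 0.0
    for (lower, upper), f in classes:
        m = (lower + upper) / 2      # class midpoint
        sum_fm += f * m
        sum_fm2 += f * m * m
    return (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

s2 = grouped_sample_variance(order_classes)
s = math.sqrt(s2)
print("variance =", round(s2, 4), "standard deviation =", round(s, 2))
```

Because every observation in a class is represented by its midpoint, the result is an approximation to the variance of the raw data.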
Use of standard deviation
Chebyshev’s Theorem
According to Chebyshev’s Theorem, for any
number k greater than 1, at least $1 - \frac{1}{k^2}$ of
the data values lie within k standard deviations of the mean.
For example, when k = 2:
$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 0.75$ or 75%
Therefore, according to Chebyshev’s theorem, at
least 75% of the values of a data set lie within two
standard deviations of the mean.
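To see how the Chebyshev bound grows with k, here is a minimal Python sketch. The function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%} of the values")
```

The bound holds for any distribution, which is why it is weaker than the percentages given for bell-shaped data.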
Example 4
The average systolic blood pressure for 4000 women
who were screened for high blood pressure was
found to be 187 with a standard deviation of 22.
Using Chebyshev’s theorem, find the minimum
percentage of women in this group who have a systolic
blood pressure between 143 and 231.
Example 4 (Cont’)
Solution
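Given the population mean, μ = 187, and the
population standard deviation, σ = 22.
Each endpoint of the interval is 44 away from the mean:
187 – 143 = 44
231 – 187 = 44
The value of k is obtained by dividing this distance
by the standard deviation:
k = 44/22 = 2
1 – 1/k² = 1 – 1/4 = 0.75
Therefore, at least 75% of the women have a systolic
blood pressure between 143 and 231.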
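The steps of Example 4 can be written as a small Python sketch that derives k from the interval and then applies Chebyshev's bound. The function name `chebyshev_percentage` and its symmetry check are illustrative assumptions, not part of the original slides.

```python
def chebyshev_percentage(mean, sd, low, high):
    """Return (k, minimum percentage) for an interval symmetric about the mean."""
    distance_low = mean - low
    distance_high = high - mean
    if distance_low != distance_high:
        raise ValueError("interval must be symmetric about the mean")
    k = distance_low / sd
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return k, (1 - 1 / k ** 2) * 100

k, pct = chebyshev_percentage(mean=187, sd=22, low=143, high=231)
print(f"k = {k}, at least {pct}% of the women")
```

With 4000 women screened, a 75% lower bound corresponds to at least 3000 women in that range.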
Empirical Rule
The empirical rule applies only to a specific type of
distribution called a bell-shaped distribution.
For a bell-shaped distribution, approximately:
68% of the observations lie within one standard deviation of the
mean.
95% of the observations lie within two standard deviations of the
mean.
99.7% of the observations lie within three standard deviations of
the mean.
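The three percentages can be checked numerically. The sketch below assumes that "bell-shaped" means an exactly normal distribution, which is an idealisation, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The rounded values 68%, 95% and 99.7% in the rule come from these proportions.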
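The figure below illustrates the empirical rule.
The empirical rule applies to population
data as well as to sample data.
[Figure: bell-shaped curve centred at the mean μ, with 68%, 95% and 99.7% of the observations lying within one, two and three standard deviations of the mean]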
Example 5
The age distribution of a sample of 5000
persons is bell-shaped with a mean of 40
years and a standard deviation of 12 years.
Determine the approximate percentage of
people who are 16 to 64 years old.
Example 5 (Cont’)
Solution
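Given x̄ = 40 and s = 12
Each of the points 16 and 64 is 24 away from the mean:
40 – 16 = 24
64 – 40 = 24
24/12 = 2, so both points lie two standard deviations from the mean.
By the empirical rule, approximately 95% of the people are 16 to 64 years old.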
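Example 5 can be traced in a few lines of Python. This is a minimal sketch that only handles intervals lying exactly one, two or three standard deviations either side of the mean; the lookup table and function name are illustrative.

```python
# Approximate percentages from the empirical rule, keyed by number of standard deviations.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_percentage(mean, sd, low, high):
    """Approximate percentage of a bell-shaped distribution between low and high."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if k_low != k_high or k_low not in EMPIRICAL_RULE:
        raise ValueError("interval must be 1, 2 or 3 standard deviations either side of the mean")
    return EMPIRICAL_RULE[int(k_low)]

pct = empirical_percentage(mean=40, sd=12, low=16, high=64)
print(f"approximately {pct}% of people are 16 to 64 years old")
```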
Example 6 (Cont’)
Find the values of the three quartiles.
Where does the revenue of RM 103.5 million
fall in relation to these quartiles?
Find the interquartile range.
Measures of Position
Solution
Arrange the data in increasing order:
74.1  76.4  79.4  79.9  80.2  82.1  86.8  89.3  98.0  103.5  109.7  121.2
Q1 is at position (n+1)/4 = (12+1)/4 = 3.25
Q2 is at position 2(n+1)/4 = 2(12+1)/4 = 6.5
Q3 is at position 3(n+1)/4 = 3(12+1)/4 = 9.75
Q1 = (79.4 + 79.9)/2 = 79.65
Q2 = (82.1 + 86.8)/2 = 84.45 (also the median)
Q3 = (98.0 + 103.5)/2 = 100.75
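The quartile method used in this solution can be sketched in Python. The sketch follows the convention of the worked example: the position is k(n+1)/4, and when it is fractional the two neighbouring ordered values are averaged (other textbooks and libraries interpolate differently, so their results may not match). The function name `quartile` is illustrative.

```python
import math

def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) using the position k(n + 1)/4.

    A fractional position is resolved by averaging the two neighbouring
    ordered values, as in the worked example.
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4            # 1-based position
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower < 1 or upper > n:
        raise ValueError("data set too small for this method")
    # Convert 1-based positions to 0-based list indices.
    return (sorted_values[lower - 1] + sorted_values[upper - 1]) / 2

revenues = sorted([74.1, 76.4, 79.4, 79.9, 80.2, 82.1,
                   86.8, 89.3, 98.0, 103.5, 109.7, 121.2])

q1, q2, q3 = (quartile(revenues, k) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```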
By looking at the position of RM 103.5 million in the ordered data,
it lies above Q3 = 100.75, that is, in the top 25% of the revenues.
IQR = Q3 – Q1
= 100.75 – 79.65
= RM 21.10 million
Lower and Upper Limits
Lower Limit = Q1 – (1.5 x IQR)
Upper Limit = Q3 + (1.5 x IQR)
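Potential outliers: observations that lie below the lower limit or above the upper limit
Adjacent values: the most extreme observations that still lie within the lower and upper limits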
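As an illustration, the limits can be applied to the revenue data of Example 6, using the quartiles found there (Q1 = 79.65, Q3 = 100.75). This is a minimal Python sketch and the function name `outlier_limits` is illustrative.

```python
def outlier_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

revenues = [74.1, 76.4, 79.4, 79.9, 80.2, 82.1,
            86.8, 89.3, 98.0, 103.5, 109.7, 121.2]

lower_limit, upper_limit = outlier_limits(q1=79.65, q3=100.75)
potential_outliers = [x for x in revenues if x < lower_limit or x > upper_limit]

print(round(lower_limit, 2), round(upper_limit, 2), potential_outliers)
```

All twelve revenues fall inside the limits, so this data set has no potential outliers.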
Boxplots
Steps:
1) Determine the quartiles
2) Determine the potential outliers and
the adjacent values
3) Draw a horizontal axis on which the numbers
obtained in 1 & 2 can be located. Mark the
quartiles and the adjacent values with vertical
lines
4) Connect the quartiles to make a box and
connect the box to the adjacent values with
lines
5) Plot each potential outlier with an asterisk
Example: TV-viewing times in hours
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 38 38 41 43 66
Q1 = 23, Q2 = 30.5, Q3 = 36.5
[Figure: boxplot of TV-viewing times (hrs) on a horizontal axis from 0 to 70, marking Q1, Q2, Q3 and the adjacent values, with the potential outlier plotted as an asterisk]
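The numbers needed to draw this boxplot can be worked out with a short Python sketch. It takes the quartiles stated in the example as given and applies the 1.5 x IQR limits; the function name `boxplot_summary` and the dictionary keys are illustrative.

```python
def boxplot_summary(values, q1, q3):
    """Limits, potential outliers and adjacent values for a boxplot."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    return {
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "potential_outliers": outliers,
        # Adjacent values: most extreme observations still within the limits.
        "adjacent_values": (min(inside), max(inside)),
    }

tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]

summary = boxplot_summary(tv_hours, q1=23, q3=36.5)
print(summary)
```

Here the whiskers end at the adjacent values 5 and 43, and the observation 66 is the one plotted as an asterisk.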