Numerical Descriptive Statistics
Chapter 3 Part 2

Range, Variance, Standard Deviation
• The mean, median and mode do not reveal the whole picture of the distribution of a data set.
• Two data sets with the same mean may have completely different spreads.
• The mean, median and mode are usually not, by themselves, sufficient measures to reveal the shape of the distribution of a data set.
Team I:   72  73  76  78
Team II:  67  72  76  84

• Both teams have the same mean (74.75), but their values are spread differently.
• Three measures of dispersion:
  • Range
  • Variance
  • Standard deviation
• Different formulas apply to ungrouped and grouped data.
• Range
  • Range = Max – Min
  • where Max = highest value and Min = lowest value
• Example 1
  • What is the range for each team?
  • Team I: 78 – 72 = 6
  • Team II: 84 – 67 = 17
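The calculation in Example 1 can be checked with a minimal Python sketch. The helper names `value_range` and `mean` are illustrative choices, not part of the slides; the mean is computed only to confirm that the two teams share it while their ranges differ.

```python
# Values for the two teams, copied from the slide above.
team_1 = [72, 73, 76, 78]
team_2 = [67, 72, 76, 84]


def value_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


def mean(values):
    """Arithmetic mean, shown only to confirm both teams share it."""
    return sum(values) / len(values)


for name, data in [("Team I", team_1), ("Team II", team_2)]:
    print(f"{name}: mean = {mean(data)}, range = {value_range(data)}")
```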
• Standard Deviation & Variance
  • The standard deviation tells how closely the values of a data set are clustered around the mean.
  • A lower value of the standard deviation indicates that the data values are spread over a relatively smaller range around the mean.
  • A larger value of the standard deviation indicates that the data values are spread over a relatively larger range around the mean (far from the mean).
  • The standard deviation is obtained by taking the positive square root of the variance.
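• Variance (ungrouped data)
  • Population: $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
  • Sample: $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
where N is the population size and n is the sample size.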
• Standard Deviation (ungrouped data)
  • Population: $\sigma = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
  • Sample: $s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$
• Example 2
  • The table below shows the number of films produced by five production companies in the year 2006. Find the variance and standard deviation.

Production Company        Total production (units)
20th Century Fox          62
Warner Bros               93
Walt Disney Pictures      126
Paramount Pictures        75
Touchstone Pictures       34
• Example 2 (Cont'd)
  • Solution
  • Let x be the total number of films produced by each production company.

x            x²
62           3,844
93           8,649
126          15,876
75           5,625
34           1,156
∑x = 390     ∑x² = 35,150
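• Example 2 (Cont'd)
  • Solution
  • $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{35{,}150 - \frac{(390)^2}{5}}{5 - 1} = \dfrac{35{,}150 - 30{,}420}{4}$
  • $s^2 = 1182.50$
  • $s = \sqrt{1182.50} = 34.3875$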
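As a sketch of the arithmetic in Example 2, the following Python snippet computes the sample variance from ∑x and ∑x². The function name `sample_variance_shortcut` is a hypothetical helper introduced here; the standard-library `statistics.variance` (which also computes the sample variance) is used only as a cross-check.

```python
import math
import statistics

# Films produced by each company in Example 2.
films = [62, 93, 126, 75, 34]


def sample_variance_shortcut(xs):
    """Sample variance via (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


s2 = sample_variance_shortcut(films)
s = math.sqrt(s2)

# Cross-check against the standard library's sample variance.
assert math.isclose(s2, statistics.variance(films))

print(f"s^2 = {s2:.2f}, s = {s:.4f}")
```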
• The values of the variance and the standard deviation are never negative.
• Usually, the values of the variance and the standard deviation are positive.
• However, if there is no variation in the data set, the variance and the standard deviation are both equal to 0.
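• Variance (grouped data)
  • Population: $\sigma^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}$
  • Sample: $s^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1}$
where m is the midpoint of a class, f is the frequency of that class, and N (or n) is the total frequency ∑f.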
• Standard deviation (grouped data)
  • Population: $\sigma = \sqrt{\dfrac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}}$
  • Sample: $s = \sqrt{\dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1}}$
• Example 3
  • Find the variance and standard deviation for the following data.

Number of Orders    f
10 – 12             4
13 – 15             12
16 – 18             20
19 – 21             14
Total               ∑f = 50
• Example 3 (Cont'd)
  • Solution

Number of Orders    f     m     fm     m²     fm²
10 – 12             4     11    44     121    484
13 – 15             12    14    168    196    2,352
16 – 18             20    17    340    289    5,780
19 – 21             14    20    280    400    5,600
Total               50          832           14,216
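• Example 3 (Cont'd)
  • Solution
  • $s^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1} = \dfrac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = \dfrac{371.52}{49} = 7.5820$
  • $s = \sqrt{7.5820} \approx 2.75$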
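The grouped-data calculation in Example 3 can be sketched in Python as follows. The function name `grouped_sample_variance` and the tuple layout of `classes` are assumptions made for illustration; each class midpoint m is taken as the average of the class limits, matching the m column of the solution table.

```python
import math

# (lower class limit, upper class limit, frequency) from Example 3.
classes = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]


def grouped_sample_variance(classes):
    """Sample variance for grouped data using class midpoints."""
    n = sum(f for _, _, f in classes)
    sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (sum_fm2 - sum_fm ** 2 / n) / (n - 1)


s2 = grouped_sample_variance(classes)
s = math.sqrt(s2)
print(f"s^2 = {s2:.4f}, s = {s:.2f}")
```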
• Use of the standard deviation
  • Chebyshev's Theorem
    • According to Chebyshev's theorem, for any number k greater than 1, at least $1 - \frac{1}{k^2}$ of the data values lie within k standard deviations of the mean.
    • For example, if k = 2, then $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = 0.75$, or 75%.
    • Therefore, according to Chebyshev's theorem, at least 75% of the values of a data set lie within two standard deviations of the mean.
• Example 4
  • The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187, with a standard deviation of 22. Using Chebyshev's theorem, find at least what percentage of women in this group have a systolic blood pressure between 143 and 231.
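• Example 4 (Cont'd)
  • Solution
  • Given the population mean μ = 187 and the population standard deviation σ = 22.
  • The value of k is obtained by dividing the distance between the mean and each endpoint by the standard deviation:
    • 187 – 143 = 44 and 231 – 187 = 44
    • k = 44/22 = 2
  • $1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 0.75$, so at least 75% of the women in this group have a systolic blood pressure between 143 and 231.
[Figure: number line marking 143, μ = 187 and 231, with a distance of 44 on each side of the mean]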
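A minimal Python sketch of the Example 4 calculation follows. Both function names are hypothetical helpers, and `k_for_interval` assumes the interval is symmetric about the mean, as it is in this example.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def k_for_interval(mean, sd, low, high):
    """Number of standard deviations from the mean to each endpoint.

    Assumes the interval is symmetric about the mean, as in Example 4.
    """
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if not math.isclose(k_low, k_high):
        raise ValueError("interval is not symmetric about the mean")
    return k_low


k = k_for_interval(mean=187, sd=22, low=143, high=231)
bound = chebyshev_lower_bound(k)
print(f"k = {k}, at least {bound:.0%} of the values")
```

Applied to the 4000 women screened, a bound of 0.75 corresponds to at least 3000 women.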
Empirical Rule
• The empirical rule applies only to a specific type of distribution called a bell-shaped distribution.
• For a bell-shaped distribution, approximately:
  • 68% of the observations lie within one standard deviation of the mean.
  • 95% of the observations lie within two standard deviations of the mean.
  • 99.7% of the observations lie within three standard deviations of the mean.
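• The figure below illustrates the empirical rule. The empirical rule applies to population data as well as to sample data.
[Figure: bell-shaped curve centered at μ, with 68%, 95% and 99.7% of the observations lying within one, two and three standard deviations of the mean]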
• Example 5
  • The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
• Example 5 (Cont'd)
  • Solution
  • Given: mean = 40 years, standard deviation = 12 years.
  • Each of the points 16 and 64 is 24 away from the mean:
    • 40 – 16 = 24, and 24/12 = 2 (two standard deviations below the mean)
    • 64 – 40 = 24, and 24/12 = 2 (two standard deviations above the mean)
  • By the empirical rule, approximately 95% of the people in the sample are 16 to 64 years old.
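The reasoning in Example 5 can be sketched in Python. The lookup table `EMPIRICAL_RULE` and the function `empirical_percentage` are illustrative names; the sketch handles only intervals that extend exactly 1, 2 or 3 standard deviations on both sides of the mean, which is all the empirical rule as stated here covers.

```python
import math

# Approximate percentages from the empirical rule, keyed by the number
# of standard deviations on each side of the mean.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentage(mean, sd, low, high):
    """Approximate % of a bell-shaped distribution between low and high."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    k = round(k_low)
    if not (math.isclose(k_low, k_high) and math.isclose(k_low, k)
            and k in EMPIRICAL_RULE):
        raise ValueError("interval must be mean +/- 1, 2 or 3 standard deviations")
    return EMPIRICAL_RULE[k]


pct = empirical_percentage(mean=40, sd=12, low=16, high=64)
print(f"approximately {pct}% of the sample")
```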


Measures of Position

• A measure of position determines the position of a single value in relation to other values in a sample or a population data set.
• Quartiles, percentiles and percentile rank are examples of measures of position.
• Quartiles are three summary measures that divide a ranked data set into four equal parts.
  • The 1st quartile is denoted Q1.
  • The 2nd quartile, Q2, is the median of the data set.
  • The 3rd quartile is denoted Q3.
[Figure: ranked data set divided by Q1, Q2 and Q3 into four portions]
• Each of these portions contains 25% of the observations of a data set arranged in increasing order.
• Position of Q1 = $\frac{n + 1}{4}$
• Position of Q2 = $\frac{2(n + 1)}{4}$
• Position of Q3 = $\frac{3(n + 1)}{4}$
• Interquartile range (IQR)
  • The difference between the third and the first quartiles of a data set.
  • IQR = Q3 – Q1
• Example 6
  • The table below lists the total revenue of the top 12 tourism companies in Malaysia.

Tourism Company         Total Revenue (RM million)
Indah Pura              98.0
Setia Wangsa            74.1
Berjaya Group           121.2
Kamunting Berhad        103.5
D & D Berhad            79.4
Langkawi Isle           89.3
Cameron de Tour         79.9
Vacation La Tour        80.2
Balade Group            76.4
Wilda Entertainment     109.7
Damar Jaya              86.8
Asian Makmur            82.1
• Example 6 (Cont'd)
  • Find the values of the three quartiles.
  • Where does the revenue of RM 103.5 million fall in relation to these quartiles?
  • Find the interquartile range.
• Example 6 (Cont'd)
  • Solution
  • Arrange the data in increasing order:

74.1  76.4  79.4  79.9  80.2  82.1  86.8  89.3  98.0  103.5  109.7  121.2
• Example 6 (Cont'd)
  • Solution
  • Q1 is at position $\frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25$
  • Q2 is at position $\frac{2(n + 1)}{4} = \frac{2(12 + 1)}{4} = 6.5$
  • Q3 is at position $\frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75$
• Example 6 (Cont'd)
  • Solution
  • Each position falls between two ranked values, so each quartile is taken as the average of those two values:
    • Q1 = $\frac{79.4 + 79.9}{2}$ = 79.65
    • Q2 = $\frac{82.1 + 86.8}{2}$ = 84.45 (also the median)
    • Q3 = $\frac{98.0 + 103.5}{2}$ = 100.75
  • RM 103.5 million is greater than Q3, so this value lies in the top 25% of the revenues.
• Example 6 (Cont'd)
  • Solution
  • IQR = Q3 – Q1 = 100.75 – 79.65 = RM 21.10 million
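The quartile steps of Example 6 can be sketched in Python. The function name `quartile` is a hypothetical helper. It follows the convention used in this example: the position is i(n + 1)/4, and when that position is fractional the quartile is the average of the two neighbouring ranked values. Many software packages interpolate differently, so their results may not match.

```python
import math

# Total revenue (RM million) for the 12 companies in Example 6.
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]


def quartile(sorted_values, i):
    """i-th quartile (i = 1, 2, 3) using position i(n + 1)/4.

    A fractional position is resolved by averaging the two neighbouring
    ranked values; a whole-number position returns that value itself.
    """
    n = len(sorted_values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    position = i * (n + 1) / 4           # 1-based position in the ranking
    lower = math.floor(position)
    upper = math.ceil(position)
    return (sorted_values[lower - 1] + sorted_values[upper - 1]) / 2


ranked = sorted(revenues)
q1, q2, q3 = (quartile(ranked, i) for i in (1, 2, 3))
iqr = q3 - q1
print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```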
Lower and Upper Limits
• Lower Limit = Q1 – (1.5 × IQR)
• Upper Limit = Q3 + (1.5 × IQR)
• Observations that lie below the lower limit or above the upper limit are potential outliers.
[Figure: number line marking the lower limit and the upper limit, with potential outliers lying outside them]
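As a sketch, the limits can be applied to the Example 6 revenues in Python. The function names are illustrative, and the quartile values Q1 = 79.65 and Q3 = 100.75 are taken from that example rather than recomputed.

```python
import math


def outlier_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(values, q1, q3):
    """Observations below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Revenues (RM million) and quartiles from Example 6.
revenues = [74.1, 76.4, 79.4, 79.9, 80.2, 82.1,
            86.8, 89.3, 98.0, 103.5, 109.7, 121.2]
lower, upper = outlier_limits(q1=79.65, q3=100.75)
flagged = potential_outliers(revenues, q1=79.65, q3=100.75)
print(f"limits: {lower:.2f} to {upper:.2f}, potential outliers: {flagged}")
```

With these limits (about 48.0 and 132.4), none of the twelve revenues is flagged as a potential outlier.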
Boxplots
Steps:
1) Determine the quartiles.
2) Determine the potential outliers and the adjacent values (the most extreme observations that still lie within the lower and upper limits).
3) Draw a horizontal axis on which the numbers obtained in steps 1 and 2 can be located. Mark the quartiles and the adjacent values with vertical lines.
4) Connect the quartiles to make a box, and connect the box to the adjacent values with lines.
5) Plot each potential outlier with an asterisk.
• Example: TV-viewing times in hours
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 38 38 41 43 66
Q1 = 23, Q2 = 30.5, Q3 = 36.5
[Figure: boxplot of TV-viewing times (hrs) on an axis from 0 to 70, marking Q1, Q2, Q3 and the adjacent values, with one potential outlier plotted as an asterisk]
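The numbers needed to draw this boxplot can be worked out with a short Python sketch. The quartiles are taken as given in the example, and the variable names are illustrative. Adjacent values are computed as the smallest and largest observations that still lie within the lower and upper limits.

```python
# TV-viewing times (hours) from the example.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]
q1, q3 = 23, 36.5  # quartiles as given in the example

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Observations inside the limits; the extremes of these are the adjacent values.
inside = [t for t in tv_hours if lower_limit <= t <= upper_limit]
outliers = [t for t in tv_hours if t < lower_limit or t > upper_limit]
adjacent_low, adjacent_high = min(inside), max(inside)

print(f"limits: {lower_limit} to {upper_limit}")
print(f"adjacent values: {adjacent_low} and {adjacent_high}")
print(f"potential outliers: {outliers}")
```

Here the only observation outside the limits is 66, which is the single asterisk in the boxplot; the whiskers end at the adjacent values 5 and 43.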
