STATS Chapter 3 Part 2

Numerical Descriptive Statistics
Chapter 3 Part 2: Range, Variance, Standard Deviation
 Mean, median and mode do not reveal the whole picture of the distribution of a data set.
 Two data sets with the same mean may have completely different spreads.
 The mean, median and mode are usually not sufficient by themselves to reveal the shape of the distribution of a data set.
Deviation
Team I
72 73 76 78
Team II
67 72 76 84
 Three measures of dispersion:
 Range
 Variance
 Standard deviation
 Different formulas apply to ungrouped and grouped data.
 Range
 Range = Max – Min
 where
 Max = highest value
 Min = lowest value
 Example 1
 What is the range for each team?
 Team I = 78 – 72 = 6
 Team II = 84 – 67 = 17
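The range calculation in Example 1 can be checked with a minimal Python sketch. The variable and function names are illustrative only. It also shows that the two teams share the same mean (74.75) while their ranges differ, which is the point made at the start of the chapter.

```python
import statistics

# Scores for the two teams from the slides.
team_1 = [72, 73, 76, 78]
team_2 = [67, 72, 76, 84]


def data_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


mean_1 = statistics.mean(team_1)
mean_2 = statistics.mean(team_2)
range_1 = data_range(team_1)
range_2 = data_range(team_2)

print("Team I : mean =", mean_1, "range =", range_1)
print("Team II: mean =", mean_2, "range =", range_2)
```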
 Standard Deviation & Variance
 The standard deviation tells how closely the values of a data set are clustered around the mean.
 A lower value of the standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.
 A larger value of the standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).
 The standard deviation is obtained by taking the positive square root of the variance.
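 Variance (ungrouped data)
 Population: $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
 Sample: $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
where N is the population size and n is the sample size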
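 Standard Deviation (ungrouped data)
 Population: $\sigma = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
 Sample: $s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$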
 Example 2
 The table below shows the number of films produced by each production company in 2006. Find the variance and standard deviation.
Production Company      Total production (units)
20th Century Fox        62
Warner Bros             93
Walt Disney Pictures    126
Paramount Pictures      75
Touchstone Pictures     34
 Example 2 (Cont'd)
 Solution
 Let x be the number of films produced by each production company.
x       x²
62      3844
93      8649
126     15,876
75      5625
34      1156
∑x = 390    ∑x² = 35,150
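 Solution (treating the five companies as a sample, n = 5)
 $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{35{,}150 - \frac{(390)^2}{5}}{5 - 1} = \dfrac{35{,}150 - 30{,}420}{4}$
 s² = 1182.50
 s = √1182.50 = 34.3875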
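As a check on Example 2, the sketch below applies the sample formula built from ∑x and ∑x² and compares the result with Python's `statistics.variance`, which also uses the n − 1 divisor. The variable names are illustrative. Treating the five companies as a sample follows the slide's use of s rather than σ.

```python
import math
import statistics

# Films produced by each company in Example 2.
films = [62, 93, 126, 75, 34]

n = len(films)
sum_x = sum(films)                   # sum of x
sum_x2 = sum(x * x for x in films)   # sum of x squared

# Sample variance from the two sums, then its positive square root.
sample_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library (sample variance, divisor n - 1).
library_variance = statistics.variance(films)

print(sum_x, sum_x2)
print(sample_variance, round(sample_sd, 4))
```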
 The values of the variance and the standard deviation are never negative.
 Usually, the values of the variance and the standard deviation are positive.
 However, if there is no variation in the data set, the variance and the standard deviation are both equal to 0.
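 Variance (grouped data)
 Population: $\sigma^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}$
 Sample: $s^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1}$
where m is the midpoint of a class and f is its frequency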
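 Standard deviation (grouped data)
 Population: $\sigma = \sqrt{\dfrac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}}$
 Sample: $s = \sqrt{\dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1}}$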
 Example 3
 Find the variance and standard deviation for the following data.
Number of Orders    f
10 – 12             4
13 – 15             12
16 – 18             20
19 – 21             14
Total               ∑f = 50
 Solution
Number of Orders    f      m      fm      m²      fm²
10 – 12             4      11     44      121     484
13 – 15             12     14     168     196     2352
16 – 18             20     17     340     289     5780
19 – 21             14     20     280     400     5600
Total               50            832             14,216
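 Solution (sample formulas, n = ∑f = 50)
 $s^2 = \dfrac{\sum f m^2 - \frac{(\sum f m)^2}{n}}{n - 1} = \dfrac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = \dfrac{371.52}{49}$ = 7.5820
 s = √7.5820 ≈ 2.75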
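The grouped-data calculation in Example 3 can be reproduced with a short Python sketch. The tuple layout (lower limit, upper limit, frequency) and the variable names are assumptions made for illustration. The sample formula is used because it reproduces the slide's result of 7.5820.

```python
import math

# (lower class limit, upper class limit, frequency f) from Example 3.
classes = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

n = sum(f for _, _, f in classes)

# m is the class midpoint; build the sums of f*m and f*m^2.
sum_fm = sum(f * (low + high) / 2 for low, high, f in classes)
sum_fm2 = sum(f * ((low + high) / 2) ** 2 for low, high, f in classes)

# Sample variance and standard deviation for grouped data.
grouped_variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
grouped_sd = math.sqrt(grouped_variance)

print(n, sum_fm, sum_fm2)
print(round(grouped_variance, 4), round(grouped_sd, 2))
```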
 Utilization of standard deviation
 Chebyshev’s Theorem
 According to Chebyshev’s theorem, for any number k greater than 1, at least $1 - \frac{1}{k^2}$ of the data values lie within k standard deviations of the mean.
 Thus, for example, if k = 2, then $1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4}$ = 0.75 or 75%.
 Therefore, according to Chebyshev’s theorem, at least 75% of the values of a data set lie within two standard deviations of the mean.
 Example 4
 The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187 with a standard deviation of 22. Using Chebyshev’s theorem, find at least what percentage of women in this group have a systolic blood pressure between 143 and 231.
 Solution
 Given the population mean, μ = 187, and the population standard deviation, σ = 22.
 The value of k is obtained by dividing the distance between the mean and each point by the standard deviation.
 187 – 143 = 44 and 231 – 187 = 44, so both points are 44 away from μ = 187.
 k = 44/22 = 2
 With k = 2, 1 – 1/k² = 0.75, so at least 75% of the women have a systolic blood pressure between 143 and 231.
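The steps of Example 4 can be sketched in Python as follows. The function name `chebyshev_lower_bound` is hypothetical and chosen only for this illustration. It returns the minimum share 1 − 1/k² guaranteed by Chebyshev's theorem.

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


# Values from Example 4.
mean, sd = 187, 22
low, high = 143, 231

# Both end points must be the same number of standard deviations from the mean.
k = (mean - low) / sd
assert (high - mean) / sd == k

bound = chebyshev_lower_bound(k)
print("k =", k, "-> at least", bound * 100, "% of the values")
```

Because the theorem makes no assumption about the shape of the distribution, the 75% is only a guaranteed minimum, not an estimate of the actual share.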
Empirical Rule
 The empirical rule applies only to a specific type of distribution called a bell-shaped distribution.
 For a bell-shaped distribution, approximately:
 68% of the observations lie within one standard deviation of the mean.
 95% of the observations lie within two standard deviations of the mean.
 99.7% of the observations lie within three standard deviations of the mean.
 The empirical rule applies to population data as well as to sample data.
[Figure: bell-shaped curve centred at μ, illustrating the empirical rule with the 68%, 95% and 99.7% regions.]
 Example 5
 The age distribution of a sample of 5000
persons is bell-shaped with a mean of 40
years and a standard deviation of 12 years.
Determine the approximate percentage of
people who are 16 to 64 years old.
 Solution
 Given: mean = 40 years, standard deviation = 12 years
 Each of the points 16 and 64 is 24 away from the mean:
 40 – 16 = 24, so k = 24/12 = 2 (within 2 standard deviations)
 64 – 40 = 24, so k = 24/12 = 2 (within 2 standard deviations)
 By the empirical rule, approximately 95% of the people in the sample are 16 to 64 years old.
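The 68%, 95% and 99.7% figures can be compared with an exact normal curve. The sketch below assumes that "bell-shaped" means a normal distribution, which is an assumption added here and not stated in the slides. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Share of a standard normal distribution within k standard deviations.
standard_normal = NormalDist()
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, "standard deviation(s):", round(shares[k] * 100, 2), "%")

# Example 5: ages with mean 40 and standard deviation 12, assumed normal.
ages = NormalDist(mu=40, sigma=12)
share_16_to_64 = ages.cdf(64) - ages.cdf(16)
print("Share aged 16 to 64:", round(share_16_to_64 * 100, 2), "%")
```

The exact normal shares are close to, but not exactly equal to, the rounded percentages quoted by the empirical rule.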

Measures of Position
 A measure of position determines the position of a single value in relation to other values in a sample or a population data set.
 Quartiles, percentiles and percentile rank are examples of measures of position.
 Quartiles are three summary measures that divide a ranked data set into four equal parts.
 The 1st quartile is denoted Q1.
 The 2nd quartile, Q2, is the median of the data set.
 The 3rd quartile is denoted Q3.
 Each of the four portions contains 25% of the observations of a data set arranged in increasing order.
 Position of Q1 = (n + 1)/4
 Position of Q2 = 2(n + 1)/4
 Position of Q3 = 3(n + 1)/4
where n is the number of observations
 Interquartile range (IQR)
 The difference between the third and the first quartiles of a data set
 IQR = Q3 – Q1
 Example 6
 The table below lists the total revenue for the 12 top tourism companies in Malaysia.
Tourism Company         Total Revenue (RM million)
Indah Pura              98.0
Setia Wangsa            74.1
Berjaya Group           121.2
Kamunting Berhad        103.5
D & D Berhad            79.4
Langkawi Isle           89.3
Cameron de Tour         79.9
Vacation La Tour        80.2
Balade Group            76.4
Wilda Entertainment     109.7
Damar Jaya              86.8
Asian Makmur            82.1
 Find the values of the three quartiles.
 Where does the revenue of RM 103.5 million fall in relation to these quartiles?
 Find the interquartile range.
 Solution
 Arrange the data in increasing order:
74.1, 76.4, 79.4, 79.9, 80.2, 82.1, 86.8, 89.3, 98.0, 103.5, 109.7, 121.2
 Q1 is at position (n + 1)/4 = (12 + 1)/4 = 3.25
 Q2 is at position 2(n + 1)/4 = 2(12 + 1)/4 = 6.5
 Q3 is at position 3(n + 1)/4 = 3(12 + 1)/4 = 9.75
 Each position falls between two ranked values, so each quartile is taken as the average of those two values:
 Q1 = (79.4 + 79.9)/2 = 79.65
 Q2 = (82.1 + 86.8)/2 = 84.45 (also the median)
 Q3 = (98.0 + 103.5)/2 = 100.75
 RM 103.5 million is greater than Q3 = 100.75, so this value lies in the top 25% of the revenues.
 Solution
 IQR = Q3 – Q1 = 100.75 – 79.65 = RM 21.10 million
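The quartile procedure used in Example 6 can be sketched in Python. The helper `quartile` is hypothetical and follows the slides' convention: find the position i(n + 1)/4 and, when it falls between two ranked values, take the average of those two values. Library routines such as `statistics.quantiles` interpolate differently and may give slightly different results.

```python
import math

# Total revenue (RM million) of the 12 companies in Example 6.
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using the position rule i(n + 1)/4."""
    ranked = sorted(values)
    position = i * (len(ranked) + 1) / 4   # 1-based position in the ranked data
    lower = math.floor(position)
    if position == lower:
        return ranked[lower - 1]
    # Position falls between two values: average them, as in the slides.
    return (ranked[lower - 1] + ranked[lower]) / 2


q1, q2, q3 = (quartile(revenues, i) for i in (1, 2, 3))
iqr = q3 - q1

print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))
print("103.5 above Q3:", 103.5 > q3)
```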
Lower and Upper Limits
 Lower Limit = Q1 – (1.5 × IQR)
 Upper Limit = Q3 + (1.5 × IQR)
 Observations that lie below the lower limit or above the upper limit are potential outliers.
Boxplots
Steps:
1) Determine the quartiles
2) Determine the potential outliers and the adjacent values
3) Draw a horizontal axis on which the numbers obtained in steps 1 and 2 can be located. Mark the quartiles and the adjacent values with vertical lines
4) Connect the quartiles to make a box and connect the box to the adjacent values with lines
5) Plot each potential outlier with an asterisk
 Example: TV-viewing times in hours
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 38 38 41 43 66
Q1 = 23, Q2 = 30.5, Q3 = 36.5
[Figure: boxplot of TV-viewing times (hrs) on an axis from 0 to 70, showing the adjacent values, Q1, Q2, Q3, and a potential outlier marked with an asterisk.]
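The values needed for this boxplot can be worked out with a short Python sketch. It takes the quartiles as given in the example and applies the lower and upper limit formulas from the previous section. The slides do not define adjacent values, so the sketch assumes the common definition: the smallest and largest observations that still lie within the limits.

```python
# TV-viewing times (hours) from the boxplot example.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]

# Quartiles as stated in the example.
q1, q3 = 23, 36.5

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Observations outside the limits are potential outliers.
potential_outliers = [x for x in tv_hours if x < lower_limit or x > upper_limit]

# Assumed definition: most extreme observations still inside the limits.
inside = [x for x in tv_hours if lower_limit <= x <= upper_limit]
adjacent_values = (min(inside), max(inside))

print("Limits:", lower_limit, upper_limit)
print("Potential outliers:", potential_outliers)
print("Adjacent values:", adjacent_values)
```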
