Chap02
Chapter 2
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1
Chapter Goals
After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a
set of data
Find the range, variance, standard deviation, and
coefficient of variation and know what these values mean
Apply the empirical rule to describe the variation of
population values around the mean
Explain the weighted mean and when to use it
Explain how a least squares regression line estimates a
linear relationship between two variables
Chapter Topics
Measures of central tendency, variation, and
shape
Mean, median, mode, geometric mean
Quartiles
Range, interquartile range, variance and standard
deviation, coefficient of variation
Symmetric and skewed distributions
Population summary measures
Mean, variance, and standard deviation
The empirical rule and Bienaymé-Chebyshev rule
Describing Data Numerically
Central tendency: mode
Variation: variance, standard deviation, coefficient of variation
2.1
Measures of Central Tendency
Overview
Central Tendency
Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Arithmetic Mean
The arithmetic mean (mean) is the most
common measure of central tendency
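For a population of N values:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
where the $x_i$ are the population values and N is the population size
For a sample of size n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where the $x_i$ are the observed values and n is the sample size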
Data 1, 2, 3, 4, 5: mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Data 1, 2, 3, 4, 10: mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
[Figure: dot plots of both data sets on a 0 to 10 axis]
Median
In an ordered list, the median is the “middle”
number (50% above, 50% below)
[Figure: two dot plots on a 0 to 10 axis; median = 3 in both]
Finding the Median
Median position = $\frac{n + 1}{2}$ position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of
the two middle numbers
Note that $\frac{n + 1}{2}$ is not the value of the median, only the
position of the median in the ranked data
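The position rule can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_position` and both data sets are made up for the example, and the input is assumed to be non-empty.

```python
def median_by_position(values):
    """Median of a non-empty list using the (n + 1) / 2 position rule."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2               # 1-based position; ends in .5 when n is even
    lower = ranked[int(position) - 1]    # value at the whole part of the position
    if position.is_integer():
        return lower                     # odd n: the middle number itself
    upper = ranked[int(position)]        # even n: the next ranked value
    return (lower + upper) / 2           # average of the two middle numbers


odd_median = median_by_position([5, 1, 3, 2, 4])       # n = 5, position 3
even_median = median_by_position([1, 2, 3, 4, 5, 6])   # n = 6, position 3.5
```

Note that the position (3 or 3.5) is only used to locate the median; the returned value comes from the ranked data.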
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two dot plots; the first (0 to 14 axis) has mode = 9, the second (0 to 6 axis) has no mode]
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Review Example:
Summary Statistics
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Sum: $3,000,000
Mean: $3,000,000 / 5 = $600,000
Median: middle value of ranked data = $300,000
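As a quick check, the same summary statistics can be reproduced with Python's standard `statistics` module. The variable names are chosen for this sketch; the mode is computed as well, since $100,000 is the only price that occurs twice.

```python
import statistics

# The five house prices from the review example, in dollars
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)       # sum / count
median_price = statistics.median(house_prices)   # middle value of the ranked data
mode_price = statistics.mode(house_prices)       # most frequently observed value
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred when outliers are present.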
Which measure of location
is the “best”?
Mean is generally used, unless extreme
values (outliers) exist . . .
Then median is often used, since the median
is not sensitive to extreme values.
Example: Median home prices may be reported for
a region – less sensitive to outliers
Shape of a Distribution
Geometric Mean
Geometric mean
Used to measure the rate of change of a variable
over time
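$\bar{x}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 \times x_2 \times \cdots \times x_n)^{1/n}$
$\bar{r}_g = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} - 1$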
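A small sketch of both formulas follows. It assumes, for illustration only, that each value is a growth factor (1 plus the period's rate of change), and the two-period data are hypothetical: a quantity that doubles and then halves.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors: +100% in period 1, -50% in period 2
growth_factors = [2.0, 0.5]

x_g = geometric_mean(growth_factors)   # average growth factor per period
r_g = x_g - 1                          # average rate of change per period
arithmetic_rate = sum(f - 1 for f in growth_factors) / len(growth_factors)
```

Here the geometric mean rate is 0, matching the fact that the quantity ends where it started, while the arithmetic average of the two rates is a misleading 25%.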
Same center,
different variation
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
$\text{Range} = x_{\text{largest}} - x_{\text{smallest}}$
Example:
[Figure: dot plot on a 0 to 14 axis, with observations from 1 to 14]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis with differently distributed data]
Range = 12 - 7 = 5 in both cases
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
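The outlier effect is easy to reproduce. The sketch below rebuilds the two data sets listed above (eleven 1s, eight 2s, four 3s, one 4, and a final value of 5 or 120); the helper name `data_range` is an assumption of this example.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]

range_without = data_range(without_outlier)
range_with = data_range(with_outlier)
```

Changing a single observation out of 25 moves the range from 4 to 119, even though the other 24 values are identical.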
Interquartile Range
Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70
Each of the four intervals between these values contains 25% of the observations
Interquartile range = Q3 - Q1 = 57 - 30 = 27
Percentiles
[Figure: ranked data divided by Q1, Q2, and Q3 into four segments, each holding 25% of the observations]
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Example (n = 9):
Q1 is in the 0.25(9 + 1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
Q1 = 12.5
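The position rule can be sketched as below. Two assumptions are made for illustration: the nine data values are hypothetical (chosen so that the 2nd and 3rd ranked values are 12 and 13, which gives the Q1 of 12.5 quoted above), and fractional positions are handled by linear interpolation, which equals "halfway between" when the position ends in .5. Other rounding conventions exist.

```python
def quartile_by_position(values, p):
    """Value at the p * (n + 1) position of the ranked data (1-based)."""
    ranked = sorted(values)
    n = len(ranked)
    position = p * (n + 1)
    whole = int(position)            # whole part of the position
    fraction = position - whole      # fractional part, e.g. 0.5
    if whole < 1:                    # position before the first value
        return ranked[0]
    if whole >= n:                   # position at or beyond the last value
        return ranked[-1]
    lower = ranked[whole - 1]
    upper = ranked[whole]
    return lower + fraction * (upper - lower)


# Hypothetical ranked sample with n = 9
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile_by_position(sample, 0.25)   # position 2.5
q2 = quartile_by_position(sample, 0.50)   # position 5, the median
q3 = quartile_by_position(sample, 0.75)   # position 7.5
iqr = q3 - q1
```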
Population Variance
Average of squared deviations of values from
the mean
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Where μ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample Variance
Average (approximately) of squared deviations
of values from the mean
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Where $\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x
Population Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Calculation Example:
Sample Standard Deviation
Sample data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, mean = $\bar{x}$ = 16
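The calculation for this sample can be followed step by step in a short script. It applies the sample formula with the n - 1 divisor to the eight values above; the variable names are chosen for this sketch.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n                                    # 128 / 8
squared_deviations = [(x - mean) ** 2 for x in data]    # (x_i - mean)^2 for each value
sample_variance = sum(squared_deviations) / (n - 1)     # divide by n - 1, not n
sample_sd = math.sqrt(sample_variance)
```

The squared deviations sum to 130, so the sample variance is 130 / 7 (about 18.57) and the sample standard deviation is about 4.31.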
Comparing Standard Deviations
Data A: mean = 15.5, s = 3.338
Data B: mean = 15.5, s = 0.926
Data C: mean = 15.5, s = 4.570
[Figure: dot plots of the three data sets on an 11 to 21 axis]
Advantages of Variance and
Standard Deviation
Coefficient of Variation
$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Comparing Coefficient
of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less
variable relative to its price
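The stock comparison reduces to one division per stock. A minimal sketch, with a helper name chosen for this example:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean = $100
```

Because the dollar units cancel, the two percentages (10% and 5%) can be compared directly even though the average prices differ.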
Using Microsoft Excel
Enter input range details
Click OK
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Chebychev’s Theorem
$100\left[1 - \frac{1}{k^2}\right]\%$
Regardless of how the data are distributed, at
least $(1 - 1/k^2)$ of the values will fall within k
standard deviations of the mean (for k > 1)
Examples:
k = 1.5: at least $(1 - 1/1.5^2)$ = 55.6% of the values fall within μ ± 1.5σ
k = 2: at least $(1 - 1/2^2)$ = 75% of the values fall within μ ± 2σ
k = 3: at least $(1 - 1/3^2)$ = 88.9% of the values fall within μ ± 3σ
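The bound is a one-line formula, so the examples are easy to verify. The function name below is an assumption of this sketch, and results are rounded to one decimal place for display.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


bounds = {k: round(chebyshev_lower_bound(k), 1) for k in (1.5, 2, 3)}
```

For k = 3 the exact bound is 8/9, or about 88.9%. These are guaranteed minimums for any distribution; real data sets usually have more of their values inside each interval.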
The Empirical Rule
If the data distribution is bell-shaped, then
the interval:
μ ± 1σ contains about 68% of the values in
the population or the sample
[Figure: bell curve with the central 68% between μ - 1σ and μ + 1σ]
μ ± 2σ contains about 95% of the values in
the population or the sample
μ ± 3σ contains almost all (about 99.7%) of
the values in the population or the sample
[Figure: bell curves with the central 95% within μ ± 2σ and the central 99.7% within μ ± 3σ]
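The three percentages can be recovered numerically if "bell-shaped" is taken to mean exactly normal, which is an assumption of this sketch rather than something the rule requires. For a normal distribution, the share within k standard deviations of the mean is erf(k / √2).

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


shares = {k: round(100 * normal_share_within(k), 1) for k in (1, 2, 3)}
```

The results (68.3%, 95.4%, 99.7%) are well above the Chebyshev minimums for k = 2 and k = 3, because the empirical rule uses the extra information that the data are bell-shaped.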
2.3
Weighted Mean
$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{n} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{n}$
Where $w_i$ is the weight of the ith observation
and $n = \sum w_i$
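A short sketch of the weighted mean follows. The data are hypothetical (three course grades weighted by credit hours), and the divisor is written explicitly as the sum of the weights, which is what n stands for in the formula.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: grade points weighted by credit hours
grades = [4.0, 3.0, 2.0]
credits = [3, 4, 1]

gpa = weighted_mean(grades, credits)            # (12 + 12 + 2) / 8
unweighted = sum(grades) / len(grades)          # simple mean, for comparison
```

The weighted result (3.25) differs from the simple mean (3.0) because the low grade carries only one credit hour.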
Approximations for Grouped Data
Suppose data are grouped into K classes, with
frequencies f1, f2, . . . fK, and the midpoints of the
classes are m1, m2, . . ., mK
$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$, where $n = \sum_{i=1}^{K} f_i$
Approximations for Grouped Data
Suppose data are grouped into K classes, with
frequencies f1, f2, . . . fK, and the midpoints of the
classes are m1, m2, . . ., mK
$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$
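Both grouped-data approximations can be sketched together. The three classes below (midpoints and frequencies) are hypothetical values chosen to keep the arithmetic simple.

```python
# Hypothetical grouped data: K = 3 classes
midpoints = [5, 15, 25]      # m_i, the midpoint of each class
frequencies = [2, 5, 3]      # f_i, the number of observations in each class

n = sum(frequencies)         # total number of observations

# Approximate mean: each midpoint stands in for every value in its class
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Approximate sample variance: frequency-weighted squared deviations over n - 1
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)
```

With these numbers the mean is 160 / 10 = 16 and the variance is 490 / 9, about 54.4. Both are approximations, since the individual values inside each class are unknown.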
2.4
The Sample Covariance
The covariance measures the strength of the linear relationship
between two variables
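The population covariance:
$\mathrm{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
The sample covariance:
$\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$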
Only concerned with the strength of the relationship
No causal effect is implied
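The two covariance formulas differ only in their divisor, which the sketch below makes explicit. The paired data and the function name are hypothetical, chosen so the sums are easy to check by hand.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data; divides by n - 1 (sample) or n (population)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1 if sample else n)


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_xy = covariance(xs, ys, sample=True)        # 6 / 4
sigma_xy = covariance(xs, ys, sample=False)   # 6 / 5
```

The positive sign says that x and y tend to move together. The magnitude depends on the units of x and y, so it is hard to interpret on its own.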
Interpreting Covariance
Coefficient of Correlation
Measures the relative strength of the linear relationship
between two variables
Population correlation coefficient:
$\rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$
Sample correlation coefficient:
$r = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$
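The sample formula can be sketched directly from its definition: the sample covariance divided by the product of the two sample standard deviations. The paired data and function name are hypothetical.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = sample_correlation(xs, ys)                                 # about 0.77
r_exact_line = sample_correlation(xs, [2 * x + 1 for x in xs]) # points on a line
```

Dividing by the standard deviations removes the units, which is why r always lies between -1 and 1, and equals 1 (up to rounding) when the points fall exactly on an upward-sloping line.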
Features of
Correlation Coefficient, r
Unit free
Ranges between –1 and 1
The closer to –1, the stronger the negative linear
relationship
The closer to 1, the stronger the positive linear
relationship
The closer to 0, the weaker the linear
relationship
Scatter Plots of Data with Various
Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, and r = 0 in the top row and r = +1, r = +.3, and r = 0 in the bottom row]
Using Excel to Find
the Correlation Coefficient
Select Data / Data Analysis
Interpreting the Result
r = .733
There is a relatively strong positive linear relationship between
test score #1 and test score #2
[Figure: "Scatter Plot of Test Scores", with Test #2 Score on the vertical axis; both axes run from 70 to 100]
Chapter Summary
Described measures of central tendency
Mean, median, mode
Illustrated the shape of the distribution
Symmetric, skewed
Described measures of variation
Range, interquartile range, variance and standard deviation,
coefficient of variation
Discussed measures of grouped data
Calculated measures of relationships between
variables
covariance and correlation coefficient