Research Methods
William G. Zikmund
Chapter 23
Bivariate Analysis: Measures of Association
Measures of Association
• A general term that refers to a number of
bivariate statistical techniques used to
measure the strength of a relationship
between two variables.
Relationships Among Variables
• Correlation analysis
• Bivariate regression analysis
Type of Measurement    Measure of Association
Nominal                Chi-square
                       Phi coefficient
                       Contingency coefficient
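For nominal data, all three statistics can be computed directly from a cross-tabulation. A minimal sketch in Python; the 2x2 table of observed frequencies below is invented for illustration:

```python
import math

# Hypothetical 2x2 cross-tabulation (observed frequencies)
observed = [[20, 30],
            [40, 10]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Chi-square: sum of (O - E)^2 / E over all cells
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (o - expected) ** 2 / expected

# Phi coefficient (for 2x2 tables) and contingency coefficient
phi = math.sqrt(chi_square / n)
contingency = math.sqrt(chi_square / (chi_square + n))

print(round(chi_square, 3), round(phi, 3), round(contingency, 3))
```

Phi and the contingency coefficient both rescale chi-square into a 0-to-1 measure of association strength, so they can be compared across tables of different sizes.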
Correlation Coefficient
• A statistical measure of the covariation or
association between two variables.
• Are dollar sales associated with advertising
dollar expenditures?
• The correlation coefficient for two variables, X and Y, is rxy.
Correlation Coefficient
• r ranges from +1 to -1
• r = +1 indicates a perfect positive linear relationship
• r = -1 indicates a perfect negative linear relationship
• r = 0 indicates no correlation
Simple Correlation Coefficient

rxy = ryx = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / √[Σ(Xᵢ - X̄)² Σ(Yᵢ - Ȳ)²]
Simple Correlation Coefficient

rxy = ryx = σxy / √(σx² σy²)
Simple Correlation Coefficient: Alternative Method

σx² = variance of X
σy² = variance of Y
σxy = covariance of X and Y
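Both forms of the formula give the same value of r. A minimal Python sketch; the paired X and Y values are invented for illustration:

```python
import math

# Hypothetical paired observations
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviation-score form: cross-products over the root of the sums of squares
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sxx = sum((x - mean_x) ** 2 for x in X)
syy = sum((y - mean_y) ** 2 for y in Y)
r_dev = sxy / math.sqrt(sxx * syy)

# Alternative form: covariance over the root of the product of variances
cov_xy = sxy / n
var_x, var_y = sxx / n, syy / n
r_cov = cov_xy / math.sqrt(var_x * var_y)

print(round(r_dev, 4), round(r_cov, 4))
```

The two computations differ only in that the second divides numerator and denominator by n, which cancels, so r_dev and r_cov agree.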
Correlation Patterns
[Scatter diagram: no correlation]

Correlation Patterns
[Scatter diagram: perfect negative correlation, r = -1.0]

Correlation Patterns
[Scatter diagram]
Calculation of r

r = 6.3389 / √[(17.837)(5.589)]
  = 6.3389 / √99.712
  = .635

(Pg 629)
Coefficient of Determination

r² = Explained variance / Total variance
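The arithmetic of the worked example can be checked directly, and squaring r gives the coefficient of determination:

```python
# Numbers from the worked example (r calculation, p. 629)
r = 6.3389 / (17.837 * 5.589) ** 0.5
r_squared = r ** 2   # share of the total variance that is explained

print(round(r, 3), round(r_squared, 3))
```

So a correlation of about .635 means roughly 40% of the variance in one variable is explained by the other.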
Correlation Does Not Mean Causation
• A high correlation does not establish a causal link
• Rooster's crow and the rising of the sun
  – The rooster does not cause the sun to rise.
• Teachers' salaries and the consumption of liquor
  – Covary because they are both influenced by a third variable
Correlation Matrix
[Table of correlation coefficients for each pair of variables]

Regression
• Dictionary definition: going or moving backward
[Scatter diagram]
Regression Line and Slope

Ŷ = â + β̂X

[Scatter diagram with the fitted regression line Ŷ]
Least-Squares Regression Line
[Scatter diagram with the fitted line; the actual Y values for Dealer 7 and Dealer 3 lie off the line]
Scatter Diagram of Explained and Unexplained Variation
[Diagram: for each point, the total deviation from Ȳ is the deviation explained by the regression plus the deviation not explained]
The Least-Squares Method
• Uses the criterion of attempting to make the
least amount of total error in prediction of Y
from X. More technically, the procedure
used in the least-squares method generates a
straight line that minimizes the sum of
squared deviations of the actual values from
this predicted regression line.
The Least-Squares Method
• A relatively simple mathematical technique
that ensures that the straight line will most
closely represent the relationship between X
and Y.
Regression: Least-Squares Method

Σᵢ₌₁ⁿ eᵢ² is minimum

eᵢ = Yᵢ - Ŷᵢ (the "residual")
Yᵢ = actual value of the dependent variable
Ŷᵢ = estimated value of the dependent variable ("Y hat")
n = number of observations
i = number of the observation
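The criterion can be illustrated by comparing the residual sum of squares of the fitted line with that of any other candidate line: the least-squares line always has the smaller Σeᵢ². A minimal sketch with invented data:

```python
# Hypothetical data
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Least-squares slope and intercept (deviation-score formulas)
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
       sum((x - mean_x) ** 2 for x in X)
alpha = mean_y - beta * mean_x

def sse(a, b):
    """Sum of squared residuals e_i = Y_i - (a + b*X_i) for a candidate line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# The fitted line beats an arbitrary alternative line (here a = 1, b = 1)
print(sse(alpha, beta) < sse(1.0, 1.0))
```

Any other intercept and slope you substitute into sse will give a residual sum of squares at least as large as the least-squares line's.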
The Logic behind the Least-
Squares Technique
• No straight line can completely represent
every dot in the scatter diagram
• There will be a discrepancy between most
of the actual scores (each dot) and the
predicted score
• Uses the criterion of attempting to make the
least amount of total error in prediction of Y
from X
Bivariate Regression

â = Ȳ - β̂X̄
Bivariate Regression

β̂ = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²)

β̂ = estimated slope of the line (the "regression coefficient")
â = estimated intercept of the Y axis
Y = dependent variable
Ȳ = mean of the dependent variable
X = independent variable
X̄ = mean of the independent variable
n = number of observations
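The raw-score formulas can be applied directly to summary totals. A sketch in Python using the totals that appear in the textbook example (n = 15, ΣXY = 193,345, ΣX² = 245,759, ΣXΣY = 2,806,875, (ΣX)² = 3,515,625, X̄ = 125, Ȳ = 99.8):

```python
# Summary quantities from the textbook example
n = 15
sum_xy = 193_345          # sum of X*Y
sum_x2 = 245_759          # sum of X squared
sum_x_sum_y = 2_806_875   # (sum of X) * (sum of Y)
sum_x_sq = 3_515_625      # (sum of X) squared
mean_x, mean_y = 125, 99.8

# Slope: (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
beta = (n * sum_xy - sum_x_sum_y) / (n * sum_x2 - sum_x_sq)

# Intercept: a = Ybar - b*Xbar
alpha = mean_y - beta * mean_x

print(round(beta, 5), round(alpha, 1))
```

These reproduce the slide's values of β̂ ≈ .54638 and â ≈ 31.5.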
β̂ = [15(193,345) - 2,806,875] / [15(245,759) - 3,515,625]
  = (2,900,175 - 2,806,875) / (3,686,385 - 3,515,625)
  = 93,300 / 170,760
  = .54638

â = 99.8 - .54638(125)
  = 99.8 - 68.3
  = 31.5

Ŷ = 31.5 + .546X

For example, when X = 89:
Ŷ = 31.5 + .546(89)
  = 31.5 + 48.6
  = 80.1
Dealer 7 (actual Y value = 129):
Ŷ₇ = 31.5 + .546(165) = 121.6

Dealer 3 (actual Y value = 80):
Ŷ₃ = 31.5 + .546(95) = 83.4

Dealer 9 (actual Y value = 97):
Ŷ₉ = 31.5 + .546(119) = 96.5
e₉ = Y₉ - Ŷ₉ = 97 - 96.5 = .5
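The fitted equation Ŷ = 31.5 + .546X generates each dealer's prediction; a quick check in Python:

```python
def y_hat(x):
    """Predicted Y from the fitted regression line of the example."""
    return 31.5 + 0.546 * x

# Dealer 7 (X = 165), Dealer 3 (X = 95), Dealer 9 (X = 119)
pred7 = y_hat(165)   # about 121.6
pred3 = y_hat(95)    # about 83.4
pred9 = y_hat(119)   # about 96.5

residual9 = 97 - pred9   # e_9 = Y_9 - Yhat_9, about .5
print(round(pred7, 1), round(pred3, 1), round(pred9, 1), round(residual9, 1))
```

Note how the residual differs by dealer: Dealer 7's actual Y (129) sits above its prediction, Dealer 3's (80) below, which is exactly the scatter the least-squares line minimizes in aggregate.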
F-Test (Regression)

(Yᵢ - Ȳ) = (Ŷᵢ - Ȳ) + (Yᵢ - Ŷᵢ)

Total deviation = deviation explained by the regression + deviation unexplained by the regression (residual error)

Ȳ = mean of the total group
Ŷᵢ = value predicted with the regression equation
Yᵢ = actual value
Σ(Yᵢ - Ȳ)² = Σ(Ŷᵢ - Ȳ)² + Σ(Yᵢ - Ŷᵢ)²

Total variation = explained variation + unexplained variation (residual)
Sum of Squares

r² = SSr / SSt = 1 - SSe / SSt
Source of Variation
• Explained by Regression
• Degrees of Freedom
  – k - 1, where k = number of estimated constants (variables)
• Sum of Squares
  – SSr
• Mean Square
  – SSr / (k - 1)
Source of Variation
• Unexplained by Regression
• Degrees of Freedom
  – n - k, where n = number of observations
• Sum of Squares
  – SSe
• Mean Square
  – SSe / (n - k)
r² in the Example

r² = 3,398.49 / 3,882.4 = .875
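These ratios can be checked in a few lines. SSr = 3,398.49 and SSt = 3,882.4 come from the example; SSe and the F statistic below are derived from them (assuming n = 15 observations and k = 2 estimated constants, as in the example):

```python
# Sums of squares from the worked example
ss_total = 3_882.4
ss_regression = 3_398.49
ss_error = ss_total - ss_regression   # unexplained (residual) variation

# Coefficient of determination: explained over total
r_squared = ss_regression / ss_total

# F statistic: mean square regression over mean square error
n, k = 15, 2
f_statistic = (ss_regression / (k - 1)) / (ss_error / (n - k))

print(round(r_squared, 3), round(f_statistic, 1))
```

A large F (explained mean square far exceeding residual mean square) is what lets us reject the hypothesis of no linear relationship.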
Multiple Regression
• Extension of Bivariate Regression
• Multidimensional when three or more
variables are involved
• Simultaneously investigates the effect of
two or more variables on a single dependent
variable
• Discussed in Chapter 24
Correlation: Player Salary and Ticket Price
Correlation coefficient, r = .75
[Line chart, 1995-2001: year-to-year change in ticket price and change in player salary]