Business Statistics Session 17: Simple Correlation and Regression
Session 17
Simple correlation and regression
Learning Objectives
Understand the concepts of correlation and regression
Compute the correlation coefficient
Test the significance of the correlation coefficient
Compute the equation of a simple regression line from
a sample of data, and interpret the slope and intercept
of the equation.
Understand the usefulness of residual analysis in
testing the assumptions underlying regression
analysis and in examining the fit of the regression line
to the data.
Compute a standard error of the estimate and interpret
its meaning.
Regression and Correlation
Regression analysis is the process of
constructing a mathematical model or
function that can be used to predict or
determine one variable by another
variable.
Correlation is a measure of the degree of
relatedness of two variables.
Pearson Product-Moment
Correlation Coefficient
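$$
r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
  = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}
$$
$$
-1 \le r \le 1
$$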
Degrees of Correlation
Correlation is a measure of the degree of
relatedness of variables
Coefficient of Correlation (r) - applicable
only if both variables being analyzed have
at least an interval level of data
Degrees of Correlation
The term (r) is a measure of the linear
correlation of two variables
The value of r ranges from -1 through 0 to +1
The closer r is to +1 or -1, the stronger the linear
relationship between the two variables; a value near 0
indicates little or no linear relationship
See the formula for the Pearson product-moment
correlation coefficient
Correlation Coefficient
Covariance
$$
\operatorname{COV}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
$$
[Figure: scatter plots for r < 0, r > 0, and r = 0]
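The link between covariance and r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the three toy data sets are hypothetical, chosen so that r comes out positive, negative, and zero. It relies on the identity r = COV(X,Y) / (s_X s_Y), where each standard deviation is the square root of a variable's covariance with itself.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation_from_covariance(x, y):
    """r = COV(X, Y) / (s_X * s_Y), with s^2 = COV of a variable with itself."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical toy data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises with x
y_down = [10, 8, 6, 4, 2]   # falls as x rises
y_flat = [2, 1, 3, 1, 2]    # no linear trend

r_up = correlation_from_covariance(x, y_up)
r_down = correlation_from_covariance(x, y_down)
r_flat = correlation_from_covariance(x, y_flat)
print(r_up, r_down, r_flat)
```

The sign of the covariance sets the sign of r; dividing by the two standard deviations only rescales it into the range -1 to +1.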
Computation of r for the Economics Example
Day  Interest (X)  Futures Index (Y)  X²  Y²  XY
1 7.43 221 55.205 48,841 1,642.03
2 7.48 222 55.950 49,284 1,660.56
3 8.00 226 64.000 51,076 1,808.00
4 7.75 225 60.063 50,625 1,743.75
5 7.60 224 57.760 50,176 1,702.40
6 7.63 223 58.217 49,729 1,701.49
7 7.68 223 58.982 49,729 1,712.64
8 7.67 226 58.829 51,076 1,733.42
9 7.59 226 57.608 51,076 1,715.34
10 8.07 235 65.125 55,225 1,896.45
11 8.03 233 64.481 54,289 1,870.99
12 8.00 241 64.000 58,081 1,928.00
Summations 92.93 2,725 720.220 619,207 21,115.07
Computation of r
Economics Example
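$$
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}
  = \frac{21{,}115.07 - \dfrac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \dfrac{(92.93)^2}{12}\right]\left[619{,}207 - \dfrac{(2725)^2}{12}\right]}}
  = .815
$$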
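The same computation can be checked in Python. The sketch below applies the sum-based formula for r to the 12 days of interest and futures-index data from the economics table; the function and variable names are illustrative only.

```python
from math import sqrt

# Interest rate (X) and futures index (Y) for the 12 days in the economics table.
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

def pearson_r(x, y):
    """Pearson r from the computational (sum-based) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return ss_xy / sqrt(ss_x * ss_y)

r = pearson_r(interest, futures)
print(round(r, 3))  # 0.815
```

Working from the raw rows rather than the rounded column totals avoids carrying rounding error into the small quantity SS_X.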
Testing the Significance of the
Correlation Coefficient
Null hypothesis: H0: ρ = 0
Alternative hypothesis: Ha: ρ ≠ 0
Test statistic:
$$
t = r \sqrt{\frac{n - 2}{1 - r^2}}, \qquad df = n - 2
$$
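As a rough illustration, the standard t statistic for testing ρ = 0, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, can be evaluated for the economics example (r = .815, n = 12). The function name is hypothetical, and the slides do not state a significance level, so the α = .05 comparison in the note after the code is an assumption.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0; compare against t with n - 2 df."""
    return r * sqrt((n - 2) / (1 - r ** 2))

r, n = 0.815, 12            # values from the economics example
t = correlation_t_statistic(r, n)
df = n - 2
print(round(t, 2), df)      # 4.45 10
```

Assuming a two-tailed test at α = .05, the critical value for 10 degrees of freedom is 2.228, so a t of about 4.45 would lead to rejecting H0.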
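Equation of the simple regression line:
$$
\hat{Y} = b_0 + b_1 X
$$
where: b_0 = the sample intercept
b_1 = the sample slope
$$
b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
    = \frac{\sum XY - n \bar{X} \bar{Y}}{\sum X^2 - n \bar{X}^2}
    = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
$$
$$
b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}
$$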
Solving for b1 and b0 of the Regression
Line: Airline Cost Example
X = number of passengers, Y = cost ($1,000); column totals for n = 12 observations:
$$
\sum X = 930, \quad \sum Y = 56.69, \quad \sum X^2 = 73{,}764, \quad \sum XY = 4{,}462.22
$$
Solving for b1 and b0 of the Regression
Line: Airline Cost Example
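$$
SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745
$$
$$
SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689
$$
$$
b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407
$$
$$
b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57
$$
$$
\hat{Y} = 1.57 + .0407X
$$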
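The slope and intercept can be reproduced from the four column totals alone. The sketch below is one way to do it in Python; the function name and argument names are hypothetical, and only the totals for the airline cost example are used.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares intercept and slope from column totals."""
    ss_xy = sum_xy - sum_x * sum_y / n   # SS_XY
    ss_xx = sum_x2 - sum_x ** 2 / n      # SS_XX
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Column totals from the airline cost example (n = 12).
b0, b1 = fit_line_from_sums(12, 930, 56.69, 73764, 4462.22)
print(round(b0, 2), round(b1, 4))  # 1.57 0.0407
```

Interpreting the result: each additional passenger is associated with an increase of about .0407 thousand dollars (roughly $41) in cost, and 1.57 is the value of the fitted line at X = 0.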
Residual Analysis: Airline Cost Example
Columns: Number of Passengers (X), Cost in $1,000 (Y), Predicted Value (Ŷ), Residual (Y − Ŷ)
The residuals sum to zero apart from rounding error: Σ(Y − Ŷ) is about .001 in magnitude.
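Because the individual airline rows are not reproduced here, the following sketch illustrates residual analysis on the economics table instead (interest rate as X, futures index as Y). That substitution, and all function names, are choices made for this illustration. It fits the least-squares line, lists each predicted value and residual, and confirms that the residuals sum to zero up to floating-point error.

```python
# Economics example data (interest rate X, futures index Y), used here only
# because the airline rows are not available in these notes.
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def residuals(x, y, b0, b1):
    """Observed minus predicted value for each point."""
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

b0, b1 = fit_line(interest, futures)
errors = residuals(interest, futures, b0, b1)

# Columns: day, X, Y, predicted Y, residual.
for day, (a, b, e) in enumerate(zip(interest, futures, errors), start=1):
    print(f"{day:>3} {a:>6.2f} {b:>5d} {b - e:>8.2f} {e:>8.2f}")
print("sum of residuals:", round(sum(errors), 6))
```

Plotting these residuals against X is the usual next step: a patternless band around zero supports the linearity and constant-variance assumptions, while curvature or a funnel shape suggests a violation.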
Standard Error of the Estimate
Residuals represent errors of estimation for
individual points.
A more useful measurement of error is the
standard error of the estimate
The standard error of the estimate, denoted
$s_e$, is a standard deviation of the error of the
regression model
Standard Error of the Estimate
Sum of Squares Error
$$
SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY
$$
Standard Error of the Estimate
$$
s_e = \sqrt{\frac{SSE}{n - 2}}
$$
Standard Error of the Estimate for
the Airline Cost Example
$$
SSE = \sum (Y - \hat{Y})^2 = 0.31434
$$
$$
s_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773
$$
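A short Python sketch of the same arithmetic follows. The function names are hypothetical; the SSE value and n = 12 are the ones reported for the airline cost example.

```python
from math import sqrt

def sum_of_squares_error(y, y_hat):
    """SSE: sum of squared differences between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def standard_error_of_estimate(sse, n):
    """s_e = sqrt(SSE / (n - 2)) for simple regression (two estimated coefficients)."""
    return sqrt(sse / (n - 2))

# SSE and n as reported for the airline cost example.
se = standard_error_of_estimate(0.31434, 12)
print(round(se, 4))  # 0.1773
```

Since Y is measured in thousands of dollars, a standard error of 0.1773 corresponds to a typical prediction error of roughly $177.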
Thank you