
Business statistics

Session 17
Simple correlation and regression
Learning Objectives
• Understand the concept of correlation and regression.
• Compute the correlation coefficient.
• Test the significance of the correlation coefficient.
• Compute the equation of a simple regression line from a sample of data, and interpret the slope and intercept of the equation.
• Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
• Compute the standard error of the estimate and interpret its meaning.
Regression and Correlation
• Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable.
• Correlation is a measure of the degree of relatedness of two variables.
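Pearson Product-Moment Correlation Coefficient

\[
r = \frac{SSXY}{\sqrt{(SSX)(SSY)}}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
  = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}},
\qquad -1 \le r \le 1
\]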
Degrees of Correlation
• Correlation is a measure of the degree of relatedness of variables.
• Coefficient of correlation (r): applicable only if both variables being analyzed have at least an interval level of data.
Degrees of Correlation
• The term r is a measure of the linear correlation of two variables.
• Its value ranges from -1 through 0 to +1.
• The closer r is to +1 or -1, the stronger the linear relationship between the dependent and the independent variables; values near 0 indicate little or no linear relationship.
• r is computed with the Pearson product-moment correlation coefficient formula.
Correlation Coefficient
Covariance:

\[
\operatorname{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
\]

Pearson product-moment correlation coefficient:

\[
r_{xy} = \frac{\operatorname{Cov}(X, Y)}{S_x S_y}
\]
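As a minimal sketch of the covariance route to r, the following Python computes the sample covariance and divides it by the product of the two sample standard deviations. The five data points are illustrative values chosen for this sketch, not data from the slides, and the variable names are assumptions.

```python
import statistics

# Illustrative values only (not taken from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sample covariance: sum of cross-deviations divided by n - 1.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations (statistics.stdev also divides by n - 1).
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

# Pearson r as covariance scaled by both standard deviations.
r = cov_xy / (s_x * s_y)
print(cov_xy, round(r, 4))
```

Because the covariance and both standard deviations use the same n - 1 divisor, the divisors cancel and the result matches the sum-of-squares form of r.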
Correlation (contd.)
• Population correlation (ρ): used if the data include an entire population.
• Sample correlation (r): used if the measure is based on a sample.
Three Degrees of Correlation

[Figure: three panels labeled r < 0, r > 0, and r = 0]
Computation of r for the Economics Example

            Interest   Futures Index
Day            X             Y             X²          Y²           XY
  1           7.43          221          55.205      48,841      1,642.03
  2           7.48          222          55.950      49,284      1,660.56
  3           8.00          226          64.000      51,076      1,808.00
  4           7.75          225          60.063      50,625      1,743.75
  5           7.60          224          57.760      50,176      1,702.40
  6           7.63          223          58.217      49,729      1,701.49
  7           7.68          223          58.982      49,729      1,712.64
  8           7.67          226          58.829      51,076      1,733.42
  9           7.59          226          57.608      51,076      1,715.34
 10           8.07          235          65.125      55,225      1,896.45
 11           8.03          233          64.481      54,289      1,870.99
 12           8.00          241          64.000      58,081      1,928.00
Summations   92.93        2,725         720.220     619,207     21,115.07
Computation of r: Economics Example

\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}
  = \frac{21{,}115.07 - \dfrac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \dfrac{(92.93)^2}{12}\right]\left[619{,}207 - \dfrac{(2725)^2}{12}\right]}}
  = .815
\]
Testing the Significance of the Correlation Coefficient
• Null hypothesis: H0: ρ = 0
• Alternative hypothesis: Ha: ρ ≠ 0
• Test statistic:

\[
t = r\sqrt{\frac{n - 2}{1 - r^2}}, \qquad df = n - 2
\]

Example: n = 6 and r = .70

\[
t = .70\sqrt{\frac{6 - 2}{1 - (.70)^2}} = 1.96
\]

At α = .05 (two-tailed) with n - 2 = 4 degrees of freedom, the critical value of t is 2.78.
Since 1.96 < 2.78, we fail to reject the null hypothesis.
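The test above can be sketched in Python as follows. The function name `correlation_t_statistic` is an assumption for this illustration, and the critical value is typed in from a t table (two-tailed, α = .05, 4 degrees of freedom) rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.70, 6
t = correlation_t_statistic(r, n)

# Assumption: critical value looked up in a t table (alpha = .05, two-tailed, df = 4).
critical_t = 2.776

reject_null = abs(t) > critical_t
print(round(t, 2), reject_null)
```

With only six observations, even r = .70 does not exceed the critical value, which is why the null hypothesis is not rejected here.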
Simple Regression Analysis
• Bivariate (two-variable) linear regression: the most elementary regression model
• Dependent variable: the variable to be predicted, usually called Y
• Independent variable: the predictor or explanatory variable, usually called X
• Nonlinear relationships and regression models with more than one independent variable can be explored by using multiple regression models.
Regression Models
• Deterministic regression model: Y = β0 + β1X
• Probabilistic regression model: Y = β0 + β1X + ε
• β0 and β1 are population parameters.
• β0 and β1 are estimated by the sample statistics b0 and b1.
Equation of the Simple Regression Line

\[
\hat{Y} = b_0 + b_1 X
\]

where:
b0 = the sample intercept
b1 = the sample slope
Ŷ = the predicted value of Y
Least Squares Analysis
• Least squares analysis is a process whereby a regression model is developed by producing the minimum sum of the squared error values.
• The vertical distance from each point to the line is the error of the prediction.
• The least squares regression line is the regression line that results in the smallest sum of errors squared.
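Least Squares Analysis

\[
b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
    = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}
    = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
\]

\[
b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\frac{\sum X}{n}
\]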
Solving for b1 and b0 of the Regression Line: Airline Cost Example

Number of        Cost
Passengers     ($1,000)
    X              Y            X²          XY
   61            4.28         3,721       261.08
   63            4.08         3,969       257.04
   67            4.42         4,489       296.14
   69            4.17         4,761       287.73
   70            4.48         4,900       313.60
   74            4.30         5,476       318.20
   76            4.82         5,776       366.32
   81            4.70         6,561       380.70
   86            5.11         7,396       439.46
   91            5.13         8,281       466.83
   95            5.64         9,025       535.80
   97            5.56         9,409       539.32

ΣX = 930    ΣY = 56.69    ΣX² = 73,764    ΣXY = 4,462.22
Solving for b1 and b0 of the Regression Line: Airline Cost Example

\[
SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745
\]

\[
SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689
\]

\[
b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407
\]

\[
b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57
\]

\[
\hat{Y} = 1.57 + .0407X
\]
Residual Analysis: Airline Cost Example

Number of        Cost        Predicted
Passengers     ($1,000)        Value       Residual
    X              Y             Ŷ           Y − Ŷ
   61            4.28          4.053          .227
   63            4.08          4.134         -.054
   67            4.42          4.297          .123
   69            4.17          4.378         -.208
   70            4.48          4.419          .061
   74            4.30          4.582         -.282
   76            4.82          4.663          .157
   81            4.70          4.867         -.167
   86            5.11          5.070          .040
   91            5.13          5.274         -.144
   95            5.64          5.436          .204
   97            5.56          5.518          .042

Σ(Y − Ŷ) = -.001 (zero apart from rounding)
Standard Error of the Estimate
• Residuals represent errors of estimation for individual points.
• A more useful measurement of error is the standard error of the estimate.
• The standard error of the estimate, denoted se, is a standard deviation of the error of the regression model.
Standard Error of the Estimate

Sum of squares of error:

\[
SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY
\]

Standard error of the estimate:

\[
s_e = \sqrt{\frac{SSE}{n - 2}}
\]
Standard Error of the Estimate for the Airline Cost Example

\[
SSE = \sum (Y - \hat{Y})^2 = 0.31434
\]

\[
s_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773
\]
Thank you
