
Correlation and Regression Analysis
Recall: Covariance

cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
Interpreting Covariance

• cov(X, Y) > 0 → X and Y are positively correlated

• cov(X, Y) < 0 → X and Y are inversely correlated

• cov(X, Y) = 0 → X and Y are uncorrelated (note: zero covariance rules out only a linear relationship; it does not by itself imply independence)
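The formula above can be sketched directly in Python; the data here are invented for illustration:

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Invented data: y rises in lockstep with x, so the covariance is positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(sample_covariance(x, y))  # → 5.0
```

Reversing y flips the sign of every cross-deviation, giving a negative covariance of the same magnitude.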


Correlation coefficient

■ Pearson's Correlation Coefficient is standardized covariance (unitless):

r = \frac{\mathrm{covariance}(x, y)}{\sqrt{\mathrm{var}\,x}\,\sqrt{\mathrm{var}\,y}}
Correlation

• Measures the relative strength of the linear relationship between two variables

• Unitless

• Ranges between -1 and +1

• The closer to -1, the stronger the negative linear relationship

• The closer to +1, the stronger the positive linear relationship

• The closer to 0, the weaker the linear relationship
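A minimal sketch of the standardized-covariance formula in Python (invented data; perfectly linear data drives r to the ±1 extremes):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # exact positive line → ≈ 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))   # exact negative line → ≈ -1.0
```

Because r standardizes by the two standard deviations, rescaling either variable (e.g. changing units) leaves r unchanged.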


Scatter Plots of Data with Various Correlation Coefficients

[Scatter plots illustrating r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]

*Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall
Linear Correlation

[Scatter plots contrasting linear relationships with curvilinear relationships]
Linear Correlation

[Scatter plots contrasting strong relationships with weak relationships]
Linear Correlation

[Scatter plots showing no relationship]
Calculating by hand...

r = \frac{\mathrm{covariance}(x, y)}{\sqrt{\mathrm{var}\,x}\,\sqrt{\mathrm{var}\,y}} = \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}\,\sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}}
Simpler calculation formula...

The 1/(n − 1) factors in the numerator and denominator cancel, so r can be computed from the numerators of the covariance and the variances alone:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_x SS_y}}

where SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) is the numerator of the covariance, and SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 and SS_y = \sum_{i=1}^{n} (y_i - \bar{y})^2 are the numerators of the variances.
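The shortcut form is easy to verify in code: only the three sums of squares are needed, and the result matches the full covariance/variance formula because the 1/(n − 1) factors cancel. A small sketch with invented data:

```python
import math

def r_shortcut(x, y):
    """r = SS_xy / sqrt(SS_x * SS_y): only the numerators are needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_x * ss_y)

# Invented data: strongly but not perfectly related,
# so r is positive and below 1.
x = [2, 4, 5, 7, 9]
y = [3, 2, 6, 5, 9]
print(r_shortcut(x, y))
```

Dropping the n − 1 divisors saves arithmetic when calculating by hand, which is the point of the shortcut form.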
Distribution of the correlation coefficient:

SE(r) = \sqrt{\frac{1 - r^2}{n - 2}}

The test statistic t = r / SE(r) follows a t-distribution with n − 2 degrees of freedom (since you have to estimate the standard error).

• Note: as with a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself → substitute in the estimated r.
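A sketch of the corresponding significance test: plug the estimated r into SE(r) and form t = r / SE(r), to be compared against a t critical value with n − 2 degrees of freedom (the r and n here are invented):

```python
import math

def t_stat_for_r(r, n):
    """t = r / SE(r), where SE(r) = sqrt((1 - r**2) / (n - 2))."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se

# Invented example: an estimated r of 0.5 from n = 20 pairs.
t = t_stat_for_r(0.5, 20)
print(round(t, 3))  # compare against a t critical value with 18 df
```

If |t| exceeds the critical value from a t-table at the chosen significance level, the correlation is judged significantly different from zero.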
Regression Analysis

• A set of statistical methods used to estimate the relationships between a dependent variable and one or more independent variables.
• It can be used to assess the strength of the relationship between variables and for modeling the future relationship between them.
Variations of Regression Analysis:

• Simple Linear Regression
• Multiple Linear Regression
• Nonlinear Regression
Simple Linear Regression
Simple linear regression is a model that assesses the relationship
between a dependent variable and an independent variable. The
simple linear model is expressed using the following equation:
Y = a + bX + e

Where:
• Y - Dependent variable
• X - Independent (explanatory) variable
• a - Intercept
• b - Slope
• e - Residual (error)
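The intercept a and slope b are typically estimated by ordinary least squares, which reuses the sums of squares from the correlation section: b = SS_xy / SS_x and a = ȳ − b·x̄. A minimal sketch with invented, noise-free data:

```python
def fit_simple_ols(x, y):
    """Least-squares estimates: b = SS_xy / SS_x, a = mean(y) - b * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = ss_xy / ss_x
    a = my - b * mx
    return a, b

# Invented data lying exactly on Y = 1 + 2X, so the fit recovers a = 1, b = 2.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
print(fit_simple_ols(x, y))  # → (1.0, 2.0)
```

With real (noisy) data the fitted line will not pass through every point; the residuals e are exactly what the assumptions below constrain.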
Fundamental Assumptions of Simple Linear Regression Analysis
• The dependent and independent variables show a linear relationship (the model is linear in the slope and the intercept).
• The independent variable is not random.
• The expected value of the residual (error) is zero.
• The variance of the residual (error) is constant across all observations (homoscedasticity).
• The residual (error) values are not correlated across observations.
• The residual (error) values follow the normal distribution.
Multiple Linear Regression
Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + e

Where:
• Y - Dependent variable
• X1, X2, X3 - Independent (explanatory) variables
• a - Intercept
• b, c, d - Slopes
• e - Residual (error)
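One standard way to estimate the coefficients is to solve the normal equations XᵀXβ = Xᵀy. A pure-Python sketch with invented, noise-free data (in practice a library such as numpy or statsmodels would do this):

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_multiple_ols(X_rows, y):
    """Estimate [a, b, c, ...] from predictor rows via X'X beta = X'y."""
    X = [[1.0] + list(row) for row in X_rows]   # prepend intercept column
    m, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(m)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(m)) for j in range(p)]
    return solve_linear(XtX, Xty)

# Invented data generated exactly as Y = 1 + 2*X1 + 3*X2 (no noise),
# so the fit recovers the coefficients [1, 2, 3].
rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print([round(v, 6) for v in fit_multiple_ols(rows, y)])  # → [1.0, 2.0, 3.0]
```

If two predictor columns were highly correlated, XᵀX would be nearly singular and the solve unstable, which is the computational face of the non-collinearity assumption below.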
Fundamental Assumptions of Multiple Linear Regression Analysis
• The dependent and independent variables show a linear relationship (the model is linear in the slopes and the intercept).
• The independent variables are not random.
• The expected value of the residual (error) is zero.
• The variance of the residual (error) is constant across all observations (homoscedasticity).
• The residual (error) values are not correlated across observations.
• The residual (error) values follow the normal distribution.
• Non-collinearity: independent variables should show minimal correlation with each other. If the independent variables are highly correlated with one another, it will be difficult to assess the true relationships between the dependent and independent variables.
Nonlinear Regression

• Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.
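Some nonlinear models can be linearized and then fitted with the simple linear machinery. A sketch for an exponential trend Y = a·e^(bX), fitted by regressing ln Y on X (the data are invented and noise-free, so the fit recovers the generating constants exactly):

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on (x, ln y)."""
    n = len(x)
    ly = [math.log(v) for v in y]           # linearize: ln y = ln a + b*x
    mx, mly = sum(x) / n, sum(ly) / n
    b = sum((xi - mx) * (li - mly) for xi, li in zip(x, ly)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = math.exp(mly - b * mx)              # back-transform the intercept
    return a, b

# Invented noiseless data from y = 2 * exp(0.5 * x).
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]
a, b = fit_exponential(x, y)
print(round(a, 4), round(b, 4))  # → 2.0 0.5
```

Note the caveat with this trick: least squares on ln y minimizes errors on the log scale, which is not the same as minimizing them on the original scale; truly nonlinear fitting uses iterative methods instead.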
Examples of Applications of Regression Analysis in Finance

• Beta and CAPM

[Worked example omitted: a table of weekly 2018 dates with stock and index prices, and the resulting beta (regression) chart]
Examples of Applications of Regression Analysis in Finance

• Forecasting Revenues and Expenses

Month   | Radio ads | Revenue
------- | --------- | ----------
Jan     | 21        | (illegible)
Feb     | 180       | $22,756.0
Mar     | 50        | $13,456.0
Apr     | 195       | $21,100.0
May     | 96        | $15,000.0
Jun     | 44        | $12,500.0
Jul     | 171       | $20,700.0
Aug     | 135       | $19,722.0
Sep     | 120       | $16,115.0
Oct     | 75        | $13,100.0
Nov     | 106       | $15,670.0
Dec     | 198       | $25,300.0
Totals  | 1,391     | $203,767.0
Average | 116       | $16,980.6

[Scatter chart omitted: "Relationship between ads and revenue", revenue vs. number of radio ads]