Master in Finance
Class #5
Jorge Caiado, PhD, CEMAPRE/ISEG, University of Lisbon
Email: jcaiado@iseg.ulisboa.pt
Web: http://jcaiado100.wixsite.com/jorgecaiado
Luís Silveira Santos, PhD, CEMAPRE/ISEG, University of Lisbon
Email: lsantos@iseg.ulisboa.pt
III. Regression Analysis
Correlation Analysis
Correlation coefficient. It measures the strength of the (linear) association between two variables:
$$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$
where $s_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$ is the sample covariance of X and Y, and $s_X$ and $s_Y$ are the sample standard deviations of X and Y, respectively.
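A minimal sketch of the correlation formula in plain Python, assuming the 1/n moments defined above (the 1/n factors cancel in the ratio, so 1/(n−1) versions give the same r). The function name and the data are hypothetical, for illustration only.

```python
import math


def sample_corr(x, y):
    """Sample correlation r_XY = s_XY / (s_X * s_Y), using 1/n moments."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance s_XY and sample standard deviations s_X, s_Y
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    return s_xy / (s_x * s_y)


# Hypothetical data: y moves exactly linearly with x, so r should be 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sample_corr(x, y))  # approximately 1.0 (floating-point rounding)
```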
Illustrative example: Return co-movements and volatility co-movements across international stock markets
How to test the significance of the correlation coefficient?
To test the null hypothesis (H0) that the population correlation, $\rho_{XY}$, is zero ($\rho_{XY} = 0$) against the alternative hypothesis that the population correlation is different from zero ($\rho_{XY} \neq 0$), we use the following test statistic:
$$t = \frac{r_{XY}\sqrt{n-2}}{\sqrt{1 - r_{XY}^2}} \sim t(n-2)$$
Example: The sample correlations among GDP, money supply (M2) and interest rate
(TB1YR) in the US during 1997Q1-2001Q1 are shown in the following table. Test the
null hypothesis that each of these correlations, individually, is equal to 0 at the 0.05
level of significance.
M2 GDP TB1YR
M2 1.000000 0.993685 0.068864
GDP 0.993685 1.000000 0.167160
TB1YR 0.068864 0.167160 1.000000
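A sketch of the test applied to the three correlations in the table. It assumes n = 17, our count of the quarters from 1997Q1 to 2001Q1 (so df = 15), and takes the two-tailed 5% critical value of t(15), 2.131, from standard t tables; the helper name `corr_t_stat` is hypothetical.

```python
import math


def corr_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), distributed t(n - 2) under H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


# Sample correlations taken from the table above.
correlations = {
    ("M2", "GDP"): 0.993685,
    ("M2", "TB1YR"): 0.068864,
    ("GDP", "TB1YR"): 0.167160,
}
n = 17          # assumption: quarterly observations 1997Q1-2001Q1
t_crit = 2.131  # two-tailed 5% critical value of t(15), from t tables

results = {}
for (a, b), r in correlations.items():
    t_obs = corr_t_stat(r, n)
    results[(a, b)] = abs(t_obs) > t_crit  # True means reject H0: rho = 0
    decision = "reject H0" if results[(a, b)] else "do not reject H0"
    print(f"{a}-{b}: t = {t_obs:.3f} -> {decision}")
```

Under this assumption only the M2-GDP correlation is significantly different from zero at the 5% level.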
Simple Linear Regression
The problem of estimation
Simple linear regression or two-variable linear regression is a linear regression in which
the dependent variable is related to a single independent or explanatory variable. The
population regression specification of a simple regression model is as follows:
$$Y_i = b_0 + b_1 X_i + u_i, \quad i = 1, \ldots, N \qquad (1)$$
where $Y_i$ is the dependent variable, $X_i$ is the independent variable, $b_0$ and $b_1$ are the regression coefficients (intercept and slope coefficients, respectively) and $u_i$ is the stochastic error term. In this model, $b_0 + b_1 X_i$ is the deterministic component and $u_i$ is the stochastic component.
Since the population model (1) cannot be observed directly, we estimate it from the sample regression function:
$$Y_i = \hat{b}_0 + \hat{b}_1 X_i + \hat{u}_i \qquad (2)$$
or
$$Y_i = \hat{Y}_i + \hat{u}_i \qquad (3)$$
where $\hat{u}_i = Y_i - \hat{Y}_i$ are the residuals and $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$.
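The method of ordinary least squares (OLS) chooses the values of $\hat{b}_0$ and $\hat{b}_1$ that minimize the sum of squared residuals, $\sum \hat{u}_i^2$. In other words, it provides unique estimates of $b_0$ and $b_1$ that minimize the sum of the squared vertical distances between the observations and the regression line.
Setting the partial derivatives of $\sum \hat{u}_i^2 = \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2$ with respect to $\hat{b}_0$ and $\hat{b}_1$ equal to zero yields the following normal equations:
$$\sum Y_i = n\hat{b}_0 + \hat{b}_1 \sum X_i$$
and
$$\sum Y_i X_i = \hat{b}_0 \sum X_i + \hat{b}_1 \sum X_i^2$$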
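The two normal equations form a 2×2 linear system in $\hat{b}_0$ and $\hat{b}_1$. As a minimal sketch, the system can be solved directly by Cramer's rule; the function name and the toy data (which lie exactly on Y = 1 + 2X) are hypothetical.

```python
def solve_normal_equations(x, y):
    """Solve the OLS normal equations for (b0_hat, b1_hat) by Cramer's rule.

    sum(Y)   = n*b0 + b1*sum(X)
    sum(X*Y) = b0*sum(X) + b1*sum(X**2)
    """
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # zero only if all X values are identical
    if det == 0:
        raise ValueError("X has no variation; the slope is not identified")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2X.
print(solve_normal_equations([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```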
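Solving the normal equations, we obtain
$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$
and
$$\hat{b}_1 = \frac{\frac{1}{n}\sum Y_i X_i - \frac{1}{n}\sum X_i \cdot \frac{1}{n}\sum Y_i}{\frac{1}{n}\sum X_i^2 - \left(\frac{1}{n}\sum X_i\right)^2} = \frac{s_{XY}}{s_X^2} = r_{XY} \times \frac{s_Y}{s_X}$$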
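A sketch of the closed-form estimators on hypothetical data, with two sanity checks that follow from the algebra above: the identity $\hat{b}_1 = r_{XY} \times s_Y / s_X$, and the fact that the residuals sum to zero (the first normal equation). The function name `ols_fit` is our own.

```python
import math


def ols_fit(x, y):
    """Closed-form OLS estimates (b0_hat, b1_hat) with b1_hat = s_XY / s_X^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / n
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.1, 4.9, 9.2, 10.8, 17.1]
b0, b1 = ols_fit(x, y)

# Cross-check the identity b1_hat = r_XY * s_Y / s_X.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
r_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n * s_x * s_y)

# With an intercept, the OLS residuals sum to zero (first normal equation).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"b0_hat = {b0:.4f}, b1_hat = {b1:.4f}, r*sY/sX = {r_xy * s_y / s_x:.4f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```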
Example: Fitted regression line of GDP on M2
Exercise (Source: DeFusco et al., 2015)
Assumptions of the Simple Linear Regression Model (SLRM):
A1: The regression model is linear in the parameters $b_0$ and $b_1$, as follows: $Y_i = b_0 + b_1 X_i + u_i$
A2: The independent variable $X$ is non-stochastic.
A3: The conditional mean of the error term $u$ is zero: $E(u_i \mid X_i) = 0$, for each $i$.
A4: The conditional variance of the error term $u$ is the same for all observations: $Var(u_i \mid X_i) = \sigma^2$.
A5: The error term $u$ is uncorrelated across observations: $Cov(u_i, u_j \mid X_i, X_j) = 0$, $i \neq j$.
A6: The error term $u$ is normally distributed.
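The standard errors of the OLS estimators are
$$se(\hat{b}_0) = \hat{\sigma}\sqrt{\frac{\sum X_i^2}{n\sum (X_i - \bar{X})^2}}, \qquad se(\hat{b}_1) = \frac{\hat{\sigma}}{\sqrt{\sum (X_i - \bar{X})^2}}$$
where $\hat{\sigma} = SEE = \sqrt{\frac{\sum \hat{u}_i^2}{n-2}}$ is the OLS estimator of $\sigma$ and SEE stands for standard error of estimate.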
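A minimal sketch of SEE and the two standard errors, assuming a small hypothetical data set chosen so the results can be checked by hand ($\hat{b}_1 = 1.9$, $\hat{b}_0 = 0$, RSS = 0.70, so SEE = $\sqrt{0.35}$). The function name is hypothetical.

```python
import math


def ols_with_standard_errors(x, y):
    """Return b0_hat, b1_hat, SEE, se(b0_hat), se(b1_hat) for a simple regression."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations (n - 2 degrees of freedom)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of (X_i - X_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(rss / (n - 2))  # sigma_hat = SEE, n - 2 degrees of freedom
    se_b0 = see * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    se_b1 = see / math.sqrt(sxx)
    return b0, b1, see, se_b0, se_b1


# Hypothetical data: the fit gives b1_hat = 1.9, b0_hat = 0 and RSS = 0.70.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
b0, b1, see, se_b0, se_b1 = ols_with_standard_errors(x, y)
print(f"SEE = {see:.4f}, se(b0) = {se_b0:.4f}, se(b1) = {se_b1:.4f}")
```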
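The coefficient of determination is
$$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{ESS}{TSS}$$
or, equivalently,
$$r^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{RSS}{TSS}$$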
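A sketch confirming that the two expressions for $r^2$ agree for an OLS fit with an intercept, using the same hypothetical data as before (TSS = 18.75, RSS = 0.70). The function name is our own.

```python
def r_squared_both_ways(x, y):
    """Return (ESS/TSS, 1 - RSS/TSS) for an OLS fit with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained sum of squares
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares
    return ess / tss, 1.0 - rss / tss


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
r2_ess, r2_rss = r_squared_both_ways(x, y)
print(r2_ess, r2_rss)  # both close to 1 - 0.70 / 18.75
```

The equality relies on TSS = ESS + RSS, which holds for OLS with an intercept; without an intercept the two expressions can differ.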
Exercise (Source: DeFusco et al., 2015)
Confidence Intervals and Hypothesis Testing
If the sampling or probability distributions of the OLS estimators are known, one can
perform a hypothesis test using the confidence interval approach.
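Assuming the error term $u$ is normally distributed, the OLS estimators $\hat{b}_0$ and $\hat{b}_1$ are themselves normally distributed as follows:
$$\hat{b}_0 \sim N(b_0, \sigma_{\hat{b}_0}^2) \quad \text{and} \quad \hat{b}_1 \sim N(b_1, \sigma_{\hat{b}_1}^2)$$
where
$$\sigma_{\hat{b}_0}^2 = \sigma^2 \frac{\sum X_i^2}{n\sum (X_i - \bar{X})^2}, \qquad \sigma_{\hat{b}_1}^2 = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$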
Since $\sigma$ is rarely known, we may replace $\sigma$ by its estimate, $\hat{\sigma}$, and use Student's t distribution as follows:
$$t_0 = \frac{\hat{b}_0 - b_0}{se(\hat{b}_0)} \sim t(n-2) \quad \text{and} \quad t_1 = \frac{\hat{b}_1 - b_1}{se(\hat{b}_1)} \sim t(n-2)$$
To illustrate the confidence interval approach, consider a simple regression of
personal consumption expenditure (Y) on gross domestic product (X), both in billions
of dollars, in the United States, 1982-1996 (N=15):
$$\hat{Y}_i = -184.078 + 0.7064 X_i$$
$\hat{\sigma} = 411.4913$; $se(\hat{b}_0) = 46.2619$; $se(\hat{b}_1) = 0.007827$
Solution (2.16 is the 5% two-tailed critical value of t(13)):
$$CI_{0.95}(b_0) = -184.078 \pm 2.16 \times 46.2619$$
$$-284.004 < b_0 < -84.152$$
Now suppose that we postulate $H_0: b_1 = 0.5$ versus $H_1: b_1 \neq 0.5$. Is the observed $\hat{b}_1$ compatible with $H_0$?
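A sketch of the confidence interval approach using only the figures reported in the example; `conf_interval` is a hypothetical helper. It reproduces the interval for $b_0$ and builds the analogous interval for $b_1$ to check whether 0.5 lies inside it.

```python
def conf_interval(estimate, se, t_crit):
    """Two-sided interval: estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


t_crit = 2.16  # t_{0.025}(13), the 5% two-tailed critical value used in the example

ci_b0 = conf_interval(-184.078, 46.2619, t_crit)
ci_b1 = conf_interval(0.7064, 0.007827, t_crit)
print(f"95% CI for b0: ({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
print(f"95% CI for b1: ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")

# H0: b1 = 0.5 is rejected at the 5% level if 0.5 lies outside the 95% CI.
b1_null = 0.5
b1_compatible = ci_b1[0] <= b1_null <= ci_b1[1]
print("0.5 inside CI for b1:", b1_compatible)
```

The interval for $b_1$ is roughly (0.689, 0.723), so 0.5 falls well outside it and $H_0: b_1 = 0.5$ is rejected.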
An alternative but complementary approach to the confidence interval method is the test-of-significance (hypothesis testing) approach.
Suppose that $b_k^*$ is the conjectured value of $b_k$ under the null hypothesis ($H_0: b_k = b_k^*$). Then the test statistic can be defined as:
$$t_0 = \frac{\hat{b}_0 - b_0^*}{se(\hat{b}_0)} \sim t(n-2) \quad \text{or} \quad t_1 = \frac{\hat{b}_1 - b_1^*}{se(\hat{b}_1)} \sim t(n-2)$$
Here, again, the critical value depends on the significance level, α, and on the type of test:
• For $H_0: b_k = b_k^*$ vs. $H_1: b_k \neq b_k^*$ (two-tailed test), there are two rejection points (one negative, one positive), $\pm t_{\alpha/2}$.
• For $H_0: b_k = b_k^*$ or $H_0: b_k \leq b_k^*$ vs. $H_1: b_k > b_k^*$ (right-tailed test), there is one rejection point (positive only), $t_\alpha$.
• For $H_0: b_k = b_k^*$ or $H_0: b_k \geq b_k^*$ vs. $H_1: b_k < b_k^*$ (left-tailed test), there is one rejection point (negative only), $-t_\alpha$.
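The three rejection rules above can be sketched as a single decision function. This is a hypothetical helper, not part of any library: the caller supplies the positive tabulated critical value ($t_{\alpha/2}$ for a two-tailed test, $t_\alpha$ for a one-tailed test). The values 2.16 and 1.771 are the 5% two-tailed and one-tailed critical values of t(13); the observed statistic is made up.

```python
def reject_h0(t_obs, t_crit, tail):
    """Apply the rejection rule for a t test.

    tail="two":   pass t_crit = t_{alpha/2}; reject if |t_obs| > t_crit
    tail="right": pass t_crit = t_alpha;     reject if t_obs > t_crit
    tail="left":  pass t_crit = t_alpha;     reject if t_obs < -t_crit
    """
    if t_crit <= 0:
        raise ValueError("t_crit should be the positive tabulated critical value")
    if tail == "two":
        return abs(t_obs) > t_crit
    if tail == "right":
        return t_obs > t_crit
    if tail == "left":
        return t_obs < -t_crit
    raise ValueError("tail must be 'two', 'right' or 'left'")


# Hypothetical observed statistic t = -2.5 with 13 degrees of freedom.
print(reject_h0(-2.5, 2.16, "two"))     # True: beyond -t_{alpha/2}
print(reject_h0(-2.5, 1.771, "right"))  # False: wrong tail
print(reject_h0(-2.5, 1.771, "left"))   # True: beyond -t_alpha
```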
In the consumption-GDP example, we know that $\hat{b}_0 = -184.078$, $se(\hat{b}_0) = 46.2619$, $\hat{b}_1 = 0.7064$ and $se(\hat{b}_1) = 0.007827$. How do we test $H_0: b_1 = 0.5$ vs. $H_1: b_1 \neq 0.5$ at the 5% level? And $H_0: b_0 = -200$ vs. $H_1: b_0 \neq -200$ at the same level?
Solution:
$n - 2 = 15 - 2 = 13$ df, so the test statistics follow $t(13)$ and the critical value is $t_{0.05/2}(13) = 2.16$.
The observed values of the test statistics are:
$$t_{0,obs} = \frac{-184.078 - (-200)}{46.2619} = 0.344 \quad \text{and} \quad t_{1,obs} = \frac{0.7064 - 0.5}{0.007827} = 26.370$$
Since $|t_{1,obs}| = 26.370 > 2.16$, we reject $H_0: b_1 = 0.5$; since $|t_{0,obs}| = 0.344 < 2.16$, we do not reject $H_0: b_0 = -200$.
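A short sketch reproducing the two observed t statistics from the reported estimates and standard errors, and applying the two-tailed rule with the critical value 2.16; `t_stat` is a hypothetical helper name.

```python
def t_stat(estimate, hypothesized, se):
    """Observed t statistic (b_hat - b*) / se(b_hat)."""
    return (estimate - hypothesized) / se


t_crit = 2.16  # t_{0.025}(13)

t0_obs = t_stat(-184.078, -200.0, 46.2619)  # H0: b0 = -200
t1_obs = t_stat(0.7064, 0.5, 0.007827)      # H0: b1 = 0.5
print(f"t0 = {t0_obs:.3f}, reject H0: {abs(t0_obs) > t_crit}")
print(f"t1 = {t1_obs:.3f}, reject H0: {abs(t1_obs) > t_crit}")
```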
Analysis of Variance (ANOVA) in a Simple Linear Regression
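Recall that in the previous section we decomposed the total sum of squares (TSS) into two components, the explained sum of squares (ESS) and the residual sum of squares (RSS): TSS = ESS + RSS.
To test $H_0: b_1 = 0$, the ANOVA approach uses the statistic
$$F = \frac{ESS/1}{RSS/(n-2)} \sim F(1, n-2)$$
(note that $F(1, m) = t_m^2$)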
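A sketch of the ANOVA decomposition on the same hypothetical data used earlier, checking numerically that TSS = ESS + RSS and that the F statistic equals the square of the t statistic for $H_0: b_1 = 0$. The function name is our own.

```python
import math


def anova_simple_regression(x, y):
    """Return TSS, ESS, RSS, the ANOVA F statistic and the t statistic for H0: b1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    f_stat = (ess / 1) / (rss / (n - 2))  # F(1, n - 2) under H0: b1 = 0
    se_b1 = math.sqrt(rss / (n - 2)) / math.sqrt(sxx)
    t1 = b1 / se_b1                       # t(n - 2) under H0: b1 = 0
    return tss, ess, rss, f_stat, t1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
tss, ess, rss, f_stat, t1 = anova_simple_regression(x, y)
print(f"TSS = {tss:.2f} = ESS + RSS = {ess:.2f} + {rss:.2f}")
print(f"F = {f_stat:.3f}, t^2 = {t1 ** 2:.3f}")
```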
To illustrate, consider the following regression of food expenditure (F) on total expenditure (T) in India, for a sample of 55 rural households (Source: Mukherjee et al., 1998):
$$\hat{F}_i = 94.2087 + 0.4368 T_i$$
$se(\hat{b}_0) = 50.8563$, $se(\hat{b}_1) = 0.0783$
$t_{0,obs} = 1.8524$, $t_{1,obs} = 5.5770$
$f_{obs} = 31.1034$ (p-value = 0.000)
$r^2 = 0.3698$
Test the null hypothesis that there is no relationship between food expenditure and total expenditure ($H_0: b_1 = 0$).
Solution:
$$f_{obs} = t_{1,obs}^2 = 5.5770^2 = 31.103$$
At the 5% level, the critical value for the $F(1, 55-2)$ distribution is $f_\alpha = 4.03$ (obtained from statistical tables, based on an $F(1, 50)$ distribution). Since $f_{obs} > f_\alpha$, $H_0$ is rejected.
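A sketch of the same F test using only the reported figures. Besides $F = t^2$, it uses the identity $F = \frac{r^2/1}{(1 - r^2)/(n-2)}$, obtained by dividing ESS and RSS by TSS, as an independent cross-check; small differences from 31.1034 reflect rounding in the reported values.

```python
n = 55
t1_obs = 5.5770  # reported t statistic for H0: b1 = 0
r2 = 0.3698      # reported coefficient of determination
f_crit = 4.03    # 5% critical value used in the slides (tables, F(1, 50))

f_from_t = t1_obs ** 2                       # F(1, n - 2) = t(n - 2)^2
f_from_r2 = (r2 / 1) / ((1 - r2) / (n - 2))  # ESS/1 over RSS/(n - 2), both divided by TSS
print(f"F from t: {f_from_t:.3f}, F from r^2: {f_from_r2:.3f}")
print("Reject H0: b1 = 0 at 5%:", f_from_t > f_crit)
```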
Exercise (Source: DeFusco et al., 2015)