
ECONOMETRICS CRT M2

Regression Model Evaluation.


In the following table, the results of 6 models attempt to explain a dependent variable of interest, y. You may
assume that there is sufficient theoretical reason to consider any or all of the explanatory
variables x1, x2, x3 and x4 in a model for y, but it is unknown whether all of them are necessary to effectively
model the data generating process of y.

Provide a thorough, rigorous analysis of which of the models is preferred. Your analysis should include features of each
coefficient, each model, and each of the diagnostic statistics. Do NOT analyse them one-by-one, but by theme as identified in
Module 2 of Econometrics. For the preferred model, give an analysis of the likely correlation among the explanatory variables.

ANSWERS

1. A Linear Regression Model is used to predict the value of a dependent variable Y, using a set
of independent variables and an intercept/constant.
When the model is run, it returns an estimated coefficient for each input variable, the significance of
each coefficient, and summary statistics for the model as a whole, such as R-squared, adjusted
R-squared and the F-statistic.
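2. For each coefficient, we test the following hypotheses and reject the null when the p-value is
small enough. Null Hypothesis: the coefficient of the input variable is zero.
Alternative Hypothesis: the coefficient is not zero. The F-statistic tests the joint null
hypothesis that the coefficients of all the input variables are zero.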
3. Explanation of some of the KPIs from the Regression output:
a. Coefficient of variables: A positive coefficient means that the independent variable x and
the dependent variable Y move in the same direction, i.e. an increase in x is associated with
an increase in Y, holding the other variables constant. A negative coefficient means they
move in opposite directions.
b. P-value of variables: The p-value measures the statistical significance of the independent
variable. It is the probability of observing an estimate at least as extreme as the one
obtained if the true coefficient were zero. A smaller p-value signifies stronger evidence
against the null hypothesis.
c. R-squared: It represents the proportion of the variance for a dependent variable that is explained
by the set of independent variables used in the model.
4. Model selection:
Step 1: Reject any model in which a variable has a p-value > 0.05. For such a variable the null
hypothesis cannot be rejected at the 5% level, so the variable does not contribute significantly
to the model. Model 3, Model 4 and Model 6 are rejected because x4 has a p-value > 0.05 in all
three models.

Step 2: Check how much of the variance in the dependent variable is explained by
the independent variables by looking at R-squared.

R-squared of Model 5 (0.4638) > Model 2 (0.4148) > Model 1 (0.3775)

On this result, Model 5 appears to be the best. However, it is too early to judge at this point,
because R-squared never decreases when an independent variable is added, so a higher value can
simply reflect a larger number of variables.

Model 5 has 3 variables, more than Model 1 and Model 2. Thus, its higher R-squared may be due
to the additional variable.

Between Model 1 and Model 2, however, we can conclude that Model 2 is better, as both models
have the same number of variables and Model 2 has the higher R-squared.

Step 3: Compute adjusted R-squared to select the best model


Adjusted R-squared adds a penalty to R-squared for each additional independent variable.
Thus, adjusted R-squared can be used to identify the best model from a set of models
with different numbers of parameters.

Adjusted R-squared = 1 – (1 – R^2) * (n – 1) / (n – p – 1)

where,
R^2 = R-squared value of the model
n = number of observations in the model
p = number of independent variables used.

The number of observations is not given in the question, so we recover it from the
F-statistic:

F-statistic = [R^2 / (1 – R^2)] * [(n – p – 1) / p]

In Model 5, R^2 = 0.4638, p = 3 and F-statistic = 56.51. Substituting into the formula above:

56.51 = [0.4638 / (1 – 0.4638)] * [(n – 3 – 1) / 3]

n – 4 = 56.51 * 3 * (1 – 0.4638) / 0.4638 ≈ 196

Thus, n = 200
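The rearrangement above can be checked numerically. The following is a minimal sketch in base R that uses only the Model 5 figures quoted in this answer; the variable names are illustrative.

```r
# Reported values for Model 5 (taken from the answer text)
r2     <- 0.4638   # R-squared
p      <- 3        # number of independent variables
f_stat <- 56.51    # F-statistic

# Rearranging F = [R^2 / (1 - R^2)] * [(n - p - 1) / p] for n
n <- f_stat * p * (1 - r2) / r2 + p + 1

# The inputs are rounded, so n is rounded to the nearest integer
n_rounded <- round(n)
print(n_rounded)
```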
Using this value of n in the adjusted R-squared formula for Model 2 and Model 5 (assuming all
models are estimated on the same sample), we get:
Model 2:
Adjusted R-squared = 1 – (1 – R^2) * (n – 1) / (n – p – 1)

= 1 – (1 – 0.4148) * (200 – 1) / (200 – 2 – 1)

= 0.4089

Model 5:
Adjusted R-squared = 1 – (1 – R^2) * (n – 1) / (n – p – 1)

= 1 – (1 – 0.4638) * (200 – 1) / (200 – 3 – 1)

= 0.4556

Model 5 clearly has the higher adjusted R-squared. Hence, it is the preferred model
among those in the question.
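The adjusted R-squared comparison can be reproduced with a small helper. This is a sketch that assumes n = 200 for every model and, following the statement that Model 1 and Model 2 have the same number of variables, p = 2 for Model 1; the function name adj_r2 is illustrative.

```r
# Adjusted R-squared from R-squared, sample size n and number of regressors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 200  # assumed to be the same for all models

adj_m1 <- adj_r2(0.3775, n, 2)  # Model 1 (p = 2 is an assumption)
adj_m2 <- adj_r2(0.4148, n, 2)  # Model 2
adj_m5 <- adj_r2(0.4638, n, 3)  # Model 5

print(round(c(Model1 = adj_m1, Model2 = adj_m2, Model5 = adj_m5), 4))
```

The penalty grows with p, so Model 5 only keeps its lead if the extra variable adds enough explanatory power to outweigh it.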

For the preferred model, give an analysis of the likely correlation among the explanatory variables.

We have the following model:

Y = 0.07374 – 0.0813 X1 + 0.33752 X2 + 0.23387 X3

Y increases with X2 and X3 and decreases with X1, holding the other variables constant.

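VIF_j = 1/(1 - R_j^2), where R_j^2 is the R-squared from regressing explanatory variable x_j on the
other explanatory variables.

The auxiliary R_j^2 values are not reported in the question, so the overall R-squared of Model 5 is
used here as a rough stand-in:

VIF = 1/(1 - 0.4638) ≈ 1.865

This is well below the common rule-of-thumb thresholds of 5 or 10, and all three coefficients in
Model 5 are individually significant, which is what we would expect if their standard errors were
not inflated by multicollinearity. The correlation among the explanatory variables therefore
appears to be small, although a proper check would require the auxiliary regressions.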
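A variance inflation factor for an explanatory variable is normally obtained from an auxiliary regression of that variable on the other explanatory variables. The underlying data are not available here, so the sketch below uses simulated data purely for illustration; the variables x1, x2, x3 and their relationship are assumptions, not the data from the question.

```r
set.seed(1)                    # simulated data, for illustration only
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)      # mildly correlated with x1 by construction
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other regressors
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  aux    <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

print(round(vif, 3))
print(round(cor(X), 3))        # pairwise correlations for comparison
```

With the real data, the same loop would be run on the columns holding X1, X2 and X3.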
(3) Write the Fama-French 3 factor model equation, specifying what each term means

E(R) = Rf + β1 (Rm − Rf ) + β2 (SMB) + β3 (HML) + α

E(R) = Expected rate of return

Rf = Risk-free rate

β1, β2, β3 = Factor coefficients

(Rm − Rf ) = Market risk premium

SMB = Historic excess returns of small-cap companies over large-cap companies

HML = Historic excess returns of value stocks over growth stocks

α = Intercept (alpha): the part of the return not explained by the three factors

(4) Explain in words how the model improves upon CAPM

The CAPM explains returns with a single factor, the market risk premium. The Fama-French three factor model builds
on the observation that small-cap companies tend to outperform large-cap companies and value companies tend to
outperform growth companies. It extends the CAPM with the SMB and HML factors to adjust for these
outperformance tendencies.

(5) Formulate the Fama-French regression using your stock’s returns, all the Fama-French factors, and the
benchmark returns.

MKTRF = col_number(), SMB = col_number(),
HML = col_number(), RF = col_number(),
TR = col_number(), XF = col_number()))
View(ffdata)
# Extract the factor columns and the stock's excess return by position
MKTRF <- ffdata[,2]
SMB <- ffdata[,3]
HML <- ffdata[,4]
XF <- ffdata[,7]
ffregression <- lm(XF ~ MKTRF + SMB + HML, data = ffdata)
print(ffregression)

Call:
lm(formula = XF ~ MKTRF + SMB + HML, data = ffdata)

Coefficients:
(Intercept)        MKTRF          SMB          HML
    -0.8693       1.1497       0.9102      -0.3090

SUMMARY OF THE REGRESSION RESULTS

print(summary(ffregression))

Call:
lm(formula = XF ~ MKTRF + SMB + HML, data = ffdata)

Residuals:
    Min      1Q  Median      3Q     Max
-15.148  -1.244   0.174   1.506  16.096

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.8693     0.1872  -4.644 5.55e-06 ***
MKTRF         1.1497     0.2366   4.860 2.09e-06 ***
SMB           0.9102     0.4171   2.182    0.030 *
HML          -0.3090     0.3198  -0.966    0.335

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.934 on 247 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1374, Adjusted R-squared: 0.1269
F-statistic: 13.11 on 3 and 247 DF, p-value: 5.653e-08
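To see how the columns of the coefficient table relate to each other, the t values and p-values can be recomputed from the reported estimates and standard errors. This is a sketch using only the rounded figures printed above, so the results may differ from the table in the last digit.

```r
# Estimates and standard errors copied from the coefficient table above
est <- c(Intercept = -0.8693, MKTRF = 1.1497, SMB = 0.9102, HML = -0.3090)
se  <- c(Intercept =  0.1872, MKTRF = 0.2366, SMB = 0.4171, HML =  0.3198)
df_resid <- 247                          # residual degrees of freedom

t_val <- est / se                        # t value = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df_resid)   # two-sided p-value from the t distribution

print(round(cbind(t_val, p_val), 4))
```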

(6) What is the Greek letter used in front of each factor?

Beta (β) is the Greek letter in front of each factor: β1, β2 and β3 are the factor coefficients seen in the FF
equation above. Alpha (α) is the intercept and refers to the excess return over the benchmark.

(7) Which model performed better?

CAPM


SUMMARY OF THE REGRESSION RESULTS
ffregression <- lm(XF ~ MKTRF, data = ffdata)
print(summary(ffregression))

Call:
lm(formula = XF ~ MKTRF, data = ffdata)

Residuals:
Min 1Q Median 3Q Max
-15.6947 -1.2038 0.1935 1.4596 16.1200

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8920 0.1877 -4.751 3.42e-06 ***
MKTRF 1.3153 0.2266 5.804 1.97e-08 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.953 on 249 degrees of freedom


(1 observation deleted due to missingness)

Multiple R-squared: 0.1192, Adjusted R-squared: 0.1156


F-statistic: 33.68 on 1 and 249 DF, p-value: 1.965e-08
The Fama-French 3 factor model has the following summary statistics:

Adjusted R^2 = 12.69%

F-statistic = 13.11

p-value = 5.653e-08

The CAPM has the following summary statistics:

Adjusted R^2 = 11.56%

F-statistic = 33.68

p-value = 1.965e-08

In theory, the Fama-French 3 factor model should approximate returns better than the CAPM, as it accounts for the
outperformance tendencies of value over growth and small-cap over large-cap companies.
Choosing the model based on adjusted R^2:
The adjusted R^2 of the Fama-French model (12.69%) is higher than that of the CAPM (11.56%), hence the
Fama-French 3 factor model performed better than the CAPM. The improvement is modest, however: SMB is
significant at the 5% level (p = 0.030), whereas HML is not (p = 0.335).
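Because the CAPM is nested in the Fama-French model, the adjusted R^2 comparison can be complemented with a partial F-test of whether SMB and HML add explanatory power jointly. The sketch below is not part of the original answer; it uses only the R-squared values and residual degrees of freedom printed in the two summaries above, so its result is approximate.

```r
# Values copied from the two summary() outputs above
r2_capm <- 0.1192; df_capm <- 249   # CAPM: R-squared, residual df
r2_ff   <- 0.1374; df_ff   <- 247   # Fama-French: R-squared, residual df

q <- df_capm - df_ff                # number of added factors (SMB and HML)

# Partial F-statistic for H0: the SMB and HML coefficients are both zero
f_partial <- ((r2_ff - r2_capm) / q) / ((1 - r2_ff) / df_ff)
p_partial <- pf(f_partial, q, df_ff, lower.tail = FALSE)

cat("Partial F:", round(f_partial, 3), " p-value:", round(p_partial, 4), "\n")
```

With the fitted objects available, anova() applied to the two lm fits would perform the same comparison on unrounded values.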
