www.irfanullah.co
Graphs, charts, tables, examples, and figures are copyright 2012, CFA Institute. Reproduced
and republished with permission from CFA Institute. All rights reserved.
Contents and Introduction
1. Introduction
2. Multiple Linear Regression
Multiple linear regression allows us to determine the effect of more than one
independent variable on a particular dependent variable
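As a rough illustration (the data below are simulated, not from the reading, and the coefficient values are made up), a multiple regression with two independent variables can be estimated by ordinary least squares:

```python
import numpy as np

# Simulated data: Y depends on two independent variables X1 and X2.
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; solve the least-squares problem.
X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(b)  # estimates close to the true values [1.0, 2.0, -0.5]
```

Each slope estimate measures the effect of its own variable on Y, holding the other independent variables constant.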
Example 1: Explaining the Bid-Ask Spread
Example 1: Evaluating the Regression Output

2.1 Assumptions of the Multiple Linear Regression Model
1. The relationship between the dependent variable Y and the independent variables X1, X2, …, Xk is linear
2. The independent variables (X1, X2, …, Xk) are not random, and no exact linear relation exists between two or more of the independent variables
3. The expected value of the error term, conditioned on the independent variables, is zero: E(ε | X1, X2, …, Xk) = 0
4. The variance of the error term is the same for all observations: E(εi²) = σε²
5. The error term is uncorrelated across observations: E(εi εj) = 0, j ≠ i
6. The error term is normally distributed
2.2 Predicting the Dependent Variable in a Multiple Regression Model
To predict the value of a dependent variable using a multiple linear regression model, we follow these three steps:
1. Obtain estimates of the regression coefficients b0, b1, …, bk
2. Determine the assumed values of the independent variables X1, X2, …, Xk
3. Compute the predicted value of the dependent variable: Ŷ = b̂0 + b̂1X1 + b̂2X2 + … + b̂kXk
Read Example 4
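The three steps can be sketched as follows; the coefficient estimates and assumed X values below are hypothetical, not taken from Example 4:

```python
import numpy as np

# Step 1: coefficient estimates (intercept, b1, b2) from a prior regression.
b = np.array([0.5, 1.2, -0.8])

# Step 2: assumed values of the independent variables (the leading 1 pairs
# with the intercept).
x_new = np.array([1.0, 2.0, 0.5])

# Step 3: predicted value Y_hat = b0 + b1*X1 + b2*X2.
y_hat = float(b @ x_new)
print(y_hat)  # 0.5 + 1.2*2.0 - 0.8*0.5 = 2.5
```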
2.3 Testing whether All Population Regression Coefficients Equal Zero
To test the null hypothesis that all of the slope coefficients in the multiple regression model are jointly equal to 0 (H0: b1 = b2 = … = bk = 0) against the alternative hypothesis that at least one slope coefficient is not equal to 0, we must use an F-test:
F = (RSS / k) / [SSE / (n - k - 1)]
where RSS is the regression (explained) sum of squares and SSE is the sum of squared errors.
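A sketch of the F-statistic computation on simulated data (the sample size, number of variables, and true coefficients below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
Y = 2.0 + X @ np.array([1.5, 0.0, -1.0]) + rng.normal(size=n)

# Fit the regression and split the total variation of Y.
Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
resid = Y - Xd @ b

SSE = resid @ resid                    # unexplained (sum of squared errors)
SST = ((Y - Y.mean()) ** 2).sum()      # total sum of squares
RSS = SST - SSE                        # regression (explained) sum of squares

F = (RSS / k) / (SSE / (n - k - 1))
print(F)  # a large F-stat -> reject H0: b1 = b2 = b3 = 0
```

The computed F is then compared with the critical value from the F-distribution with k and n - k - 1 degrees of freedom.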
2.4 Adjusted R2
R² = (total variation - unexplained variation) / total variation = RSS / SST

Adjusted R² = 1 - [(n - 1) / (n - k - 1)] × (1 - R²)
Interpreting R²
• Three independent variables together explain 85% of the variation in Y
• Adjusted R² is also 85%
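The adjusted R² formula can be checked numerically; the n, k, and R² values below are hypothetical (with a large n relative to k, an R² of 85% still rounds to an adjusted R² of about 85%):

```python
# Hypothetical values: n observations, k independent variables, and the
# regression's R-squared (none of these come from the reading's examples).
n, k, r2 = 100, 3, 0.85

# Adjusted R-squared penalizes additional independent variables.
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)
print(round(adj_r2, 4))  # 0.8453
```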
3. Using Dummy Variables In Regressions
Example 5: Month-of-the-Year Effects on Small-Stock Returns
F-test
F-Table: df1 = 11, df2 = 276
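A month-of-the-year dummy setup like Example 5's can be sketched as follows; the return data below are simulated (with a built-in January effect), not the small-stock returns from the example:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
months = rng.integers(1, 13, size=n)        # month (1-12) of each observation
returns = rng.normal(0.01, 0.05, size=n)    # simulated monthly returns
returns[months == 1] += 0.02                # hypothetical January effect

# With an intercept, use 11 dummies; January is the omitted base month,
# so each dummy coefficient measures that month's mean return relative
# to January.
D = np.column_stack([(months == m).astype(float) for m in range(2, 13)])
X = np.column_stack([np.ones(n), D])
b, *_ = np.linalg.lstsq(X, returns, rcond=None)

print(b[0])          # intercept = mean January return
print(b[1:].mean())  # other months' average return relative to January
```

Using n - 1 = 11 dummies (rather than 12) avoids exact collinearity with the intercept column, which would violate assumption 2.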
Summary: F-stat, R-squared and Adjusted R-squared
• F-stat
• R-squared
• Adjusted R-squared
4. Violations of Regression Assumptions
4.1 Heteroskedasticity
• Error term variance differs across observations
Unconditional heteroskedasticity: not a problem
Conditional heteroskedasticity: a problem
• Consequences of conditional heteroskedasticity
F-test for the overall significance of the regression is unreliable
Coefficient estimates are fine, but standard errors are understated
Understated standard errors inflate the t-stats, so variables can appear significant when they are not
Example 7 and Example 8
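A minimal sketch of a Breusch-Pagan-style test for conditional heteroskedasticity, on simulated data where the error variance grows with the independent variable (the data and seed are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 5, size=n)
# Conditional heteroskedasticity: error standard deviation grows with x.
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)

# Fit the regression, then regress squared residuals on the regressors;
# n * R^2 of that auxiliary regression is approximately chi-square with
# k degrees of freedom under the null of no conditional heteroskedasticity.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ b) ** 2

g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
r2 = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
bp_stat = n * r2
print(bp_stat)  # compare with the chi-square critical value, 1 df (3.84 at 5%)
```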
4.2 Serial Correlation
• Serial correlation (autocorrelation): errors are correlated across observations
Assuming no independent variable is a lagged value of the dependent variable, coefficient estimates remain consistent, but standard errors are incorrect, making t- and F-tests unreliable
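Serial correlation is commonly detected with the Durbin-Watson statistic; a sketch on simulated AR(1) errors standing in for regression residuals (the 0.8 autocorrelation is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):               # AR(1) errors: positive serial correlation
    e[t] = 0.8 * e[t - 1] + rng.normal()

# DW = sum of squared successive differences over sum of squares;
# DW is approximately 2(1 - r), where r is the first-order autocorrelation.
dw = ((e[1:] - e[:-1]) ** 2).sum() / (e ** 2).sum()
print(dw)  # well below 2, signaling positive serial correlation
```

Values near 2 suggest no serial correlation; values near 0 suggest strong positive serial correlation.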
4.3 Multicollinearity
• Multicollinearity: two or more independent variables (or combinations of
independent variables) are highly (but not perfectly) correlated with each
other
• Consequences of multicollinearity
Inflates standard errors, making the t-stats of coefficients artificially small
• Detecting multicollinearity
A matter of degree rather than absence or presence
Classic symptom: high R2 and a significant F-stat, but inflated standard errors and low t-stats for the individual coefficients
• Correction:
Omit one or more of the “X” variables
Example 9
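One common diagnostic not covered above is the variance inflation factor: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing X_j on the other independent variables. A VIF above roughly 10 is often taken as a sign of serious multicollinearity. A sketch on simulated data (x2 is constructed to be nearly collinear with x1):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x1 = rng.normal(size=n)
x2 = 0.98 * x1 + 0.02 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress X_j on the remaining columns plus a constant."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ b
    r2 = 1 - (resid ** 2).sum() / ((X[:, j] - X[:, j].mean()) ** 2).sum()
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])  # x1 and x2 large; x3 near 1
```

Dropping x1 or x2 (the "omit one or more of the X variables" correction) would bring the remaining VIFs back toward 1.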
4.4 Summarizing the Issues
Problem | Effect | Solution
Heteroskedasticity | F-test is unreliable; standard errors are understated | Use robust standard errors
Multicollinearity | Inflated SEs make the t-stats of coefficients artificially small | Omit one or more of the "X" variables
5. Model Specification and Errors in Specification
1. Principles of Model Specification
5.1 Principles of Model Specification
• The model should be grounded in cogent economic reasoning
• The functional form chosen for the variables in the regression should be appropriate given the nature of the variables
5.2 Misspecified Functional Form
Whenever we estimate a regression, we must assume that the regression has the correct functional form. This assumption can fail in several ways:
1. One or more important variables could be omitted from the regression
2. One or more of the regression variables may need to be transformed (for example, by taking the natural logarithm of the variable) before estimating the regression
Example 11. Nonlinearity and the bid-ask spread.
Example 12
3. The regression model pools data from different samples that should not be pooled (shown
graphically on the next slide)
[Graph: regression incorrectly pooling data from two different samples]
5.3 Time Series Misspecification
Regression assumption 3: the error term has an expected value of 0. When working with time-series
data, this assumption is frequently violated, causing the estimated regression coefficients to be
biased and inconsistent (this goes beyond merely having unreliable t-stats).
Three common problems that create this type of time-series misspecification are:
1. Including lagged dependent variables as independent variables in regressions with serially correlated errors
2. Including a function of a dependent variable as an independent variable, sometimes as a result of the incorrect dating of variables
3. Independent variables that are measured with error
Example 14: The Fisher Effect with Measurement Error
5.4 Other Types of Time-Series Misspecification
The most frequent source of misspecification in linear regressions that use time series from two or
more different variables is nonstationarity.
Situations where we need to use stationarity tests before we use regression statistical inference
• Relations among time series with trends (for example, the relation between consumption and GDP)
• Relations among time series that may be random walks (time series for which the best predictor of
next period’s value is this period’s value). Exchange rates are often random walks.
6. Models with Qualitative Dependent Variables
Qualitative dependent variables are dummy variables used as dependent variables instead of as
independent variables. For example, to predict whether or not a company will go bankrupt, we need
to use a qualitative dependent variable (bankrupt or not) as the dependent variable and use data on
the company’s financial performance (e.g., return on equity, debt-to-equity ratio, or debt rating) as
independent variables.
Linear regression is not appropriate in these situations. We should use probit, logit, or discriminant
analysis. Probit and logit models estimate the probability of a discrete outcome given the values of the
independent variables used to explain that outcome. The probit model, which is based on the normal
distribution, estimates the probability that Y = 1 (a condition is fulfilled) given the value of the
independent variable X. The logit model is identical, except that it is based on the logistic distribution
rather than the normal distribution. Discriminant analysis yields a linear function which can then be
used to create an overall score. Based on the score, an observation can be classified into the bankrupt
or not bankrupt category.
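A minimal logit sketch on simulated data: estimate P(Y = 1 | X) with the logistic function, fit by gradient ascent on the log-likelihood. The data, true parameters, and "financial ratio" variable below are all invented for illustration (a probit model would swap in the normal CDF):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=n)                            # hypothetical financial ratio
p_true = 1 / (1 + np.exp(-(-1.0 + 2.0 * x)))      # logistic probability
y = (rng.uniform(size=n) < p_true).astype(float)  # 1 = "bankrupt", 0 = not

# Fit by fixed-step gradient ascent on the average log-likelihood.
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(10000):
    p = 1 / (1 + np.exp(-(X @ b)))
    b += 0.5 * X.T @ (y - p) / n

print(b)  # estimates near the true parameters (-1.0, 2.0)
```

The fitted model returns a probability between 0 and 1 for each observation, which is why a plain linear regression (whose predictions are unbounded) is not appropriate here.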
Summary
• ANOVA
• Assumptions
• Dummy variables
• Heteroskedasticity
• Serial correlation
• Multicollinearity
• Model misspecifications
Conclusion
• Read summary
• Examples
• Practice problems