
Specification Issues for MLR Models

Specification
• Recall that a model is correctly SPECIFIED when:
– No important variables are omitted.
– No irrelevant variables are included.
– The correct functional form is used.
Adjusted R-Squared
• While R2 provides a useful tool for evaluating goodness of fit across models, it has a major flaw: as additional explanatory variables are added to a model, R2 will ALWAYS go up, no matter how weak the relationship with Y.
• Seeing R2 increase as the number of variables increases therefore tells us nothing about whether our model has been improved.
Adjusted R2
• The Adjusted R2 (denoted R̄2) introduces a "penalty" for each additional variable added.
• If Adjusted R2 increases, then the explanatory power of the new variable is large enough to overcome the penalty, and therefore our model has improved.
• If Adjusted R2 falls when a new variable is added, then it did not significantly add to the explanatory power of the model, and it should be removed.
Adjusted R2
• Formula:

R̄2 = 1 − [RSS/(n−k−1)] / [TSS/(n−1)]

R̄2 = 1 − (1 − R2)·[(n−1)/(n−k−1)]

R̄2 = 1 − (RSS/TSS)·[(n−1)/(n−k−1)]
Adjusted R2
• Adding a new variable will have to reduce RSS by more than it increases the penalty term [(n−1)/(n−k−1)] in order to "improve" the model.
• Note that the "severity" of the penalty term for each additional variable will be greater when n is small.
Adjusted R2 Example

n        100    100    100    100    30     30     30     30
k        2      3      4      5      2      3      4      5
R2       0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8
Rbar2    0.796  0.794  0.792  0.789  0.785  0.777  0.768  0.758
Penalty  1.02   1.03   1.04   1.05   1.07   1.12   1.16   1.21
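The R̄2 values and penalty terms in this table follow directly from the formula R̄2 = 1 − (1 − R2)·(n−1)/(n−k−1). A minimal Python sketch (the function names are my own, chosen for illustration):

```python
# Adjusted R-squared and the penalty term (n-1)/(n-k-1).
def adj_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def penalty(n, k):
    """Penalty factor applied to (1 - R^2)."""
    return (n - 1) / (n - k - 1)

# Reproduce a few cells of the table (R^2 fixed at 0.8):
print(round(adj_r2(0.8, 100, 2), 3))  # 0.796
print(round(adj_r2(0.8, 30, 5), 3))   # 0.758
print(round(penalty(30, 5), 2))       # 1.21
```

Note how the small-sample column (n = 30) loses far more Adjusted R2 per added variable than the n = 100 column, exactly as the table shows.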
Back to the issue of specification:
• Using the following tools:
– t-tests
– F-tests
– Adjusted R2
• And avoiding the following problems:
– Omitted variable bias
– Multicollinearity
– Extraneous variables

We can choose the correct set of independent variables. But what about . . . .
Alternative Functional Forms
• OLS requires that the regression equation is linear in parameters, such that B0 and/or B1 (etc.) do not enter the equation in anything but a linear fashion.
• However, the variables Y and X (etc.) do not have to enter the equation in a linear fashion. The following general functional form will be consistent with OLS:

f(Yi) = B0 + B1[g(Xi)] + ui
Level-Level Form

[Figures: examples of the level-level form]

Functional Forms
• When would it be appropriate to use non-linear functions of Y and/or X?
– Underlying theory suggests a non-linear DGP.
– Sample data suggest a non-linear DGP.
– Alternative coefficient interpretations are desired.
Typical Non-linear Transformations
• 1. Double Log Form (Log-Log): Elasticities

ln(Yi) = B0 + B1·ln(Xi) + ui

Take the total differential:

(1/Yi)·dYi = B1·(1/Xi)·dXi

Solve for B1:

B1 = (Xi/Yi)·(dYi/dXi) = (dYi/Yi)/(dXi/Xi) = %ΔYi / %ΔXi
Log-Log Form

[Figure: example of the log-log form]
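To see the elasticity interpretation numerically, here is a small Python sketch using hypothetical data generated from Y = 2·X^0.5, so the true elasticity is 0.5; a simple OLS slope on the logged data recovers it:

```python
import math

# Hypothetical data from Y = 2 * X^0.5 (true elasticity = 0.5).
X = [1, 2, 4, 8, 16, 32]
Y = [2 * x ** 0.5 for x in X]

lx = [math.log(x) for x in X]
ly = [math.log(y) for y in Y]

# OLS slope of ln(Y) on ln(X): B1 = cov(lnX, lnY) / var(lnX)
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
b1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
     sum((a - mx) ** 2 for a in lx)

print(round(b1, 3))  # 0.5: a 1% increase in X raises Y by about 0.5%
```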
• 2. Semi-log form: partial elasticities

a.) ln(Yi) = B0 + B1·Xi + ui

B1 = (dY/Y)/dX = %ΔY / ΔX

B1 is the percent change in Yi due to a one UNIT change in Xi.
Log-Level Form

[Figure: example of the log-level form]
2. Semi-log form: partial elasticities

b.) Yi = B0 + B1·ln(Xi) + ui

B1 = dY/(dX/X) = ΔY / %ΔX

B1 is the UNIT change in Yi due to a one PERCENTAGE point change in Xi.

Remember – don't attempt a log transformation when X or Y is potentially negative – growth rates, for example.
Level-Log Form

[Figure: example of the level-log form]
• 3.) Polynomial Functions: Squared terms

Yi = B0 + B1·Xi + B2·Xi^2 + ui

∂Yi/∂Xi = B1 + 2·B2·Xi

For B1 > 0 and B2 > 0, the positive impact of X on Y increases as X increases – increasing returns.

For B1 < 0 and B2 < 0, the negative impact of X on Y increases as X increases.

For B1 > 0 and B2 < 0, the positive impact of X on Y decreases as X increases – diminishing returns.
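The marginal effect ∂Y/∂X = B1 + 2·B2·X is easy to trace numerically. A sketch with hypothetical coefficients B1 = 0.9, B2 = −0.02 (the diminishing-returns case):

```python
# dY/dX = B1 + 2*B2*X for Y = B0 + B1*X + B2*X^2.
# Hypothetical coefficients: B1 > 0, B2 < 0 (diminishing returns).
B1, B2 = 0.9, -0.02

def marginal_effect(x):
    return B1 + 2 * B2 * x

print(round(marginal_effect(5), 2))   # 0.7: still strongly positive at low X
print(round(marginal_effect(30), 2))  # -0.3: the effect has turned negative
# X at which the marginal effect crosses zero: X = -B1 / (2*B2)
print(round(-B1 / (2 * B2), 1))       # 22.5
```

This is why a squared term is often paired with a check of where the turning point falls relative to the sample range of X.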
• 4.) Polynomial Functions: Square-root terms

Yi = B0 + B1·Xi + B2·Xi^0.5 + ui

∂Yi/∂Xi = B1 + 0.5·B2/Xi^0.5

The impact of X decreases as the value of X increases.

Useful for "Diminishing Returns" functions or arguments – utility functions, production functions, etc.
• 5.) Inverse transformation

Yi = B0 + B1·Xi + B2·(1/Xi) + ui

∂Yi/∂Xi = B1 − B2·(1/Xi^2)

Again, the marginal effect of X on Y changes as the value of X increases: the contribution of the 1/Xi term approaches zero, so the total effect approaches B1 as X approaches infinity.

Rarely used in practice; it makes coefficients difficult to interpret.
When to Transform Your Data? (revisited)
• 1. If you desire an elasticity interpretation of the coefficient (natural log).
• 2. If theory dictates a Diminishing Returns argument (square-root, log, or inverse).
• 3. If theory dictates an Increasing Returns argument (squared term).
• 4. If sample data indicate a non-linear relationship between Yi and Xi, regardless of theory.
How to detect non-linearities in sample data
• 1. Observe scatter plot diagrams for each independent variable against the dependent variable.
• 2. Check for skewness of variables – where skew represents a lack of symmetry of the frequency distribution about the mean.
Measuring Skew
• We can measure the degree of symmetry or skewness in a distribution by calculating the third moment of the distribution, or the coefficient of skewness:

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3

• where s^3 is the standard deviation of Yi raised to the 3rd power.
a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = 0

• If a3 = 0 (zero skewness), then the sum of cubed deviations for observations greater than the mean is just equal to the sum of cubed deviations for observations below the mean.
• The distribution will be symmetric.
• A normal distribution has a coefficient of skewness equal to zero.
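The coefficient a3 can be computed directly from its definition. A Python sketch (using the population standard deviation for s – an assumption, since software packages differ in the divisor they use):

```python
# Coefficient of skewness: a3 = (1/n) * sum((Y - Ybar)^3) / s^3
def skew(y):
    n = len(y)
    ybar = sum(y) / n
    s = (sum((v - ybar) ** 2 for v in y) / n) ** 0.5  # population std. dev.
    return sum((v - ybar) ** 3 for v in y) / n / s ** 3

symmetric = [1, 2, 3, 4, 5]          # symmetric around 3
right_skewed = [1, 1, 2, 2, 3, 12]   # one large value pulls the tail right

print(round(skew(symmetric), 3))   # 0.0
print(skew(right_skewed) > 0)      # True: positive (right) skew
```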
Zero (approx.) Skew

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = −.07

[Histogram of X1: 526 observations; mean 0.053, median 0.100, std. dev. 1.025; skewness −0.074, kurtosis 2.930; Jarque-Bera 0.589 (p = 0.745)]
a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 > 0

• a3 > 0: positive skew, or skewed to the right.
• The sum of cubed deviations for observations greater than the mean will be greater than the sum of cubed deviations for observations below the mean.
• There will be observations widely dispersed above the mean.
Positive Skew

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = 2.0

[Histogram of WAGE: 526 observations; mean 5.896, median 4.650, std. dev. 3.693; skewness 2.007, kurtosis 7.970; Jarque-Bera 894.6 (p = 0.000)]
a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 < 0

• a3 < 0: negative skew, skewed to the left.
• The sum of cubed deviations for observations greater than the mean will be less than the sum of cubed deviations for observations below the mean.
• There will be observations widely dispersed below the mean.
Negative Skew

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = −.62

STATA: hist educ
       sum educ, detail

[Histogram of EDUC: 526 observations; mean 12.563, median 12.000, std. dev. 2.769; skewness −0.620, kurtosis 4.884; Jarque-Bera 111.5 (p = 0.000)]
Problem: Relationship between skewed variables will be non-linear

[Scatter plot: WAGE vs. EDUC]

• WAGE is positively skewed – larger dispersion above the mean.
• EDUC is negatively skewed – larger dispersion below the mean.
Correcting for Skew
• In order to make the relationship linear in parameters, data must be transformed such that skew is reduced.
• A square-root or natural-log transformation will reduce positive skew.
• A square transformation will reduce negative skew.
• The log transformation "bunches together" larger values, reducing dispersion above the mean and making a positively skewed variable more symmetric.
• Likewise, smaller values become relatively more dispersed.

[Plot: LN(Y) against Y]
• A square transformation increases the dispersion of larger values and reduces the relative dispersion of smaller values.
• Thus, negative skew is reduced and the distribution becomes more symmetric.

[Plot: Square Transformation to Reduce Negative Skew – Y^2 against Y]
Transformations

Variable    Skew Coefficient
EDUC        -0.62
(EDUC)^2     0.39
WAGE         2.00
Ln(WAGE)     0.39
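A quick numerical check that logging a right-skewed variable pulls a3 toward zero. The sample below is hypothetical (wage-like, with a long right tail), not the Wage1 data:

```python
import math

# Population-standard-deviation version of the skew coefficient a3.
def skew(y):
    n = len(y)
    m = sum(y) / n
    s = (sum((v - m) ** 2 for v in y) / n) ** 0.5
    return sum((v - m) ** 3 for v in y) / n / s ** 3

# Hypothetical positively skewed sample (long right tail).
wage = [2, 3, 3, 4, 4, 5, 6, 8, 12, 25]
log_wage = [math.log(w) for w in wage]

print(round(skew(wage), 2))      # strongly positive
print(round(skew(log_wage), 2))  # much closer to zero
print(abs(skew(log_wage)) < abs(skew(wage)))  # True
```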
Transformed Scatter Plots

[Scatter plots: LWAGE vs. EDUC and LWAGE vs. EDUC_SQ]
Transformed Regressions

             (1)        (2)        (3)
Dep. Var.:   WAGE       Ln(WAGE)   Ln(WAGE)
EDUC         0.5413     0.0827     -
             (10.17)    (10.94)
EDUC^2       -          -          0.004
                                   (11.80)
R2           0.1648     0.1858     0.2099
Interpret Coefficients
• (1) A one-unit increase in EDUC increases WAGE by 0.54 units (dollars).
• (2) A one-unit increase in EDUC increases WAGE by 8.27%.
• (3) %ΔWAGE/dEDUC = (2)(.004)(EDUCi). Going from 10 to 11 years of education will cause a (2)(.004)(10) = .08 = 8% increase in WAGE.
General Interaction Terms
• A general interaction term captures the change in the marginal effect of Xi on Yi due to a one-unit change in a second independent variable Zi, where Zi is a continuous variable.
• Let the coefficient on Xi be a linear function of Zi.
Interaction Term Regression Model
• Original equation: Yi = α + β·Xi + ei
• Interaction assumption: β = β1 + β2·Zi
• Regression: Yi = α + (β1 + β2·Zi)·Xi + ei

Yi = α + β1·Xi + β2·Xi·Zi + ei

∂Yi/∂Xi = β̂1 + β̂2·Z

∂(∂Yi/∂Xi)/∂Zi = β̂2
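The model can be sketched in a short simulation: generate data from Yi = α + β1·Xi + β2·Xi·Zi with hypothetical values α = 2, β1 = 0.5, β2 = 0.1 and no noise, and confirm that OLS on the interaction design recovers them:

```python
import numpy as np

# Simulate Y = a + b1*X + b2*X*Z exactly (no noise), hypothetical values.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=50)
Z = rng.uniform(0, 5, size=50)
Y = 2 + 0.5 * X + 0.1 * X * Z

# OLS with a constant, X, and the interaction X*Z as regressors.
design = np.column_stack([np.ones_like(X), X, X * Z])
a, b1, b2 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(a, 3), round(b1, 3), round(b2, 3))  # recovers 2.0, 0.5, 0.1
# Marginal effect of X depends on Z: dY/dX = b1 + b2*Z
print(round(b1 + b2 * 2.0, 3))                  # effect of X at Z = 2
```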
MPC Example
• Is the marginal propensity to consume of households dependent on the wealth level of the household?
• Example data set with N = 30 cross-sectional observations of household consumption, income, and wealth.

MODEL: CONSi = α + β·INCi + ei
ASSUME: β = β1 + β2·WEALTHi
REGRESSION: CONSi = α + β1·INCi + β2·INCi*WEALTHi + ei
In STATA, generate the interaction term between income and wealth:

gen inc_wlth=inc*wealth

Include the interaction term in the regression:

reg cons inc inc_wlth

Number of obs = 30
F( 2, 27) = 65.91
Prob > F = 0.0000
R-squared = 0.8300
Adj R-squared = 0.8174

----------------------------------------------------------
cons         |     Coef.   Std. Err.      t     P>|t|
-------------+--------------------------------------------
inc          |  .7079231   .1754693     4.03    0.000
inc_wlth     |  1.14e-06   3.84e-07     2.96    0.006
_cons        |  17986.08   6367.999     2.82    0.009
----------------------------------------------------------
Interpretation:
• 1. The interaction coefficient β2 is positive and significant: an increase in WEALTH leads to an increase in MPC.

∂CONSi/∂INCi = β̂1 + β̂2·WEALTH = 0.71 + 0.0000011·WEALTH

∂(∂CONSi/∂INCi)/∂WEALTHi = β̂2 = 0.0000011

• Multiply β2 by 10,000 to get 0.011: an increase in WEALTH of $10,000 leads to an increase in MPC of about 0.011 (all else held constant).
If WEALTH = 0, MPC = 0.71. If WEALTH = $100,000, MPC = 0.71 + 0.0000011·100,000 = 0.82.

[Plots: CONS vs. INC, with the slope rising from .71 (WEALTH = 0) to .82 (WEALTH = 100,000); CONS vs. WEALTH holding INC fixed]
Other Examples
• Does the asset size of a bank impact its loan response to a change in interest rates? (interact assets and policy variable)
• Does the role of microfinance lending in alleviating poverty depend on education levels? (interact credit and education)
• Does the change in firm-level output due to a change in labor depend on the size of the firm (are there significant economies of scale)?
Chapter 7: Dummy Independent Variables
• In order to capture the explanatory power of factors for which "traditional" continuous variables do not exist, we must use qualitative / indicator variables.
• An Intercept Dummy captures the difference in the mean value of Yi due to a given quality.
• A Slope Dummy captures the difference in the marginal effect of X on Y due to a given quality.
Intercept Dummies
• Think of breaking up your data sample into two groups:
– A control group, which does not possess the quality of interest.
– An experimental or target group, which does possess the quality of interest.
• The dummy variable will take on a value of 1 for each observation in the "target" group, and zero otherwise.
Intercept Dummies
• Examples:
– Race: where nonwhite=1, white=0 [white is control, non-white is target].
– Gender: where male=1, female=0 [female is control, male is target].
– Union membership: where member=1, non-member=0.
– Location: where 1=South, 0=North.
– Time periods: crisis dates, 1=crisis this period, 0=no crisis.
– Product characteristics: where 1=turbo, 0=no turbo.

Note: the mean of the dummy will be the proportion of the sample in the target group.
Model:
• Yi = B0 + B1·Xi + δ0·Di + ui
• Di = 1 if observation i possesses the quality, 0 otherwise.

For all Di = 0: Yi = B0 + B1·Xi + ui
For all Di = 1: Yi = (B0 + δ0) + B1·Xi + ui

• The intercept will shift by δ0 when Di = 1.
• The slope coefficient will be identical for both groups (B1).
• The dummy coefficient is estimated by OLS just like the others.
For Di = 0: ΣYi = N·B0 + B1·ΣXi + Σui

ΣYi/N = B0 + B1·ΣXi/N + Σui/N

Ȳ(0) = B0 + B1·X̄

For Di = 1: ΣYi = N·B0 + N·δ0 + B1·ΣXi + Σui

Ȳ(1) = B0 + δ0 + B1·X̄

Therefore: Ȳ(1) − Ȳ(0) = δ0
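A simulation makes the setup concrete: with hypothetical values B0 = 1, B1 = 2, δ0 = 3 and no noise, OLS recovers the common slope and the intercept shift exactly:

```python
import numpy as np

# Intercept-dummy model Y = B0 + B1*X + d0*D; hypothetical B0=1, B1=2, d0=3.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=40)
D = (np.arange(40) % 2).astype(float)   # half control (0), half target (1)
Y = 1 + 2 * X + 3 * D                   # no noise: OLS is exact

design = np.column_stack([np.ones_like(X), X, D])
b0, b1, d0 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(b0, 3), round(b1, 3), round(d0, 3))  # 1.0 2.0 3.0
# The dummy's mean is the share of the sample in the target group:
print(D.mean())  # 0.5
```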
Case 1: δ0 > 0, B0 > 0, B1 > 0. The dummy causes the intercept to shift upwards.

[Plot: two parallel lines in (Xi, Yi) space – Y = (B0 + δ0) + B1·X above Y = B0 + B1·X, with intercepts B0 + δ0 and B0]
Case 2: δ0 < 0, B0 > 0, B1 > 0. The dummy causes the intercept to shift downwards.

[Plot: two parallel lines in (Xi, Yi) space – Y = (B0 + δ0) + B1·X below Y = B0 + B1·X, with intercepts B0 + δ0 and B0]
Wage2.dta Example
REG WAGE EDUC EXPER FEMALE

Dependent Variable: WAGE
Sample: 1 100

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -4.8590       2.203        -2.2056       0.0298
EDUC         0.8176       0.148         5.5020       0.0000
EXPER        0.1517       0.030         4.9711       0.0000
FEMALE      -2.0306       0.702        -2.8890       0.0048

R-squared 0.4013    Adjusted R-squared 0.3826
Interpretation
• The dummy coefficient of −2.03 means that, holding all other explanatory variables constant, the mean of WAGE is lower by 2.03 when Di = 1.
• i.e., a female worker with identical education and work experience will be paid, on average, $2.03 less than a male worker.

Intercept for males: B0 = −4.86.
Intercept for females: (B0 + δ0) = −4.86 + (−2.03) = −6.89.

[Plot: WAGE vs. EDUC – parallel lines for males (intercept −4.86) and females (intercept −6.89); holding EXPER constant]
• Dummy Variable Trap: use one fewer dummy variable than possible conditions or states.
– For example, gender traditionally has two possible states, so a single variable allows for comparison of one relative to the other.
– If a MALE and a FEMALE dummy are included simultaneously, then you will violate the assumption of no perfect linear relationship amongst explanatory variables.
– Other example: you want to capture geographic effects on bank performance by Federal Reserve district. There are 12 districts, so include only 11 dummies. The omitted district will be the basis for comparison.
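The trap can be verified mechanically: with an intercept, MALE + FEMALE = 1 for every observation, so the design matrix loses full column rank. A sketch with made-up data:

```python
import numpy as np

# With an intercept, MALE + FEMALE = const for every observation,
# so including both dummies creates perfect collinearity.
female = np.array([1., 0., 1., 0., 0., 1.])
male = 1.0 - female
const = np.ones_like(female)

bad = np.column_stack([const, male, female])   # intercept + BOTH dummies
good = np.column_stack([const, female])        # intercept + one dummy

print(np.linalg.matrix_rank(bad))   # 2: three columns but only rank 2
print(np.linalg.matrix_rank(good))  # 2: two columns, full rank
```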
Slope-Dummy Variables
• It is often interesting to look not only at the impact of a given quality on the mean outcome of Y, but also on the marginal effect of Xi on Yi.
• Is the slope of the regression line different for the control vs. the treatment group?
• For example, does an additional year of education have a smaller effect on WAGE for females vs. males?
Slope-Dummy Regression Model
• Assume that intercepts are the same for the control and experimental groups, but let slopes differ.
• Again, let Di = (0, 1).
• Define an interaction term as the product of two independent variables. In this case, the interaction term of interest is the product of Xi and Di, where Xi is a continuous independent variable.
• Regression: Yi = B0 + B1·Xi + γ1·(Xi)(Di) + ui

• If Di = 0, then Yi = B0 + B1·Xi

∂Yi(0)/∂Xi = B̂1

• If Di = 1, then Yi = B0 + B1·Xi + γ1·Xi

∂Yi(1)/∂Xi = B̂1 + γ̂1
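A noiseless simulation of the slope-dummy model, with hypothetical values B0 = 1, B1 = 0.8, γ1 = −0.3, shows the two groups sharing an intercept but splitting into slopes 0.8 and 0.5:

```python
import numpy as np

# Slope-dummy model Yi = B0 + B1*Xi + g1*(Xi*Di); hypothetical
# B0=1, B1=0.8, g1=-0.3 (no noise, so OLS recovers them exactly).
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=60)
D = (np.arange(60) % 2).astype(float)
Y = 1 + 0.8 * X - 0.3 * X * D

design = np.column_stack([np.ones_like(X), X, X * D])
b0, b1, g1 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(b1, 3))       # slope for the control group (D = 0)
print(round(b1 + g1, 3))  # slope for the target group (D = 1)
```

Note how the interaction column X * D is 0 whenever D = 0 and equal to X whenever D = 1, exactly as described in the next slide.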
Slope-Dummy Coefficient
• Use standard hypothesis tests for the OLS estimate of the slope-dummy coefficient.
• Only impose the interpretation if you can reject H0: γ1 = 0.
• NOTE: the interaction variable will appear in your data as 0 when Di = 0, and equal to Xi when Di = 1.
Slope-Dummy Case 1: γ1 > 0, B0 > 0, B1 > 0. Xi has a larger impact on Yi when Di = 1.

[Plot: two lines from the common intercept B0 – Yi = B0 + (B1 + γ1)·Xi for Di = 1 above Yi = B0 + B1·Xi for Di = 0]
Slope-Dummy Case 2: γ1 < 0, B0 > 0, B1 > 0. Xi has a SMALLER impact on Yi when Di = 1.

[Plot: two lines from the common intercept B0 – Yi = B0 + (B1 + γ1)·Xi for Di = 1 below Yi = B0 + B1·Xi for Di = 0]
Wage1 Slope-Dummy Example
• Suppose that we do want to examine the impact of gender on the return to education . . .
• First, we must construct a slope-dummy interaction term between the dummy FEMALE and EDUC.
• In STATA, type: gen EDUC_FEM = EDUC * FEMALE

• Then, we simply include this term in our previous regression.
• Note: you may include both an intercept and a slope dummy term simultaneously, but this can lead to MULTICOLLINEARITY issues.
• We get:

WAGE = -5.66 + 0.87*EDUC + 0.15*EXP - 0.15*EDUC_FEM
       (.01)   (0.00)      (0.00)     (0.01)
(p-values in parentheses)

γ̂1 = -0.15 < 0 . . .
• This means that, on average, one additional year of education will yield $0.15 per hour less in additional wages for a female vs. a male.

• For males (Di = 0):

∂WAGEi/∂EDUCi = B̂1 = 0.87

• For females (Di = 1):

∂WAGEi/∂EDUCi = B̂1 + γ̂1 = 0.87 − 0.15 = 0.72
WAGE = -5.66 + 0.87*EDUC + 0.15*EXP - 0.15*EDUC_FEM

[Plot: WAGE vs. EDUC – males (Di = 0) with slope .87, females (Di = 1) with slope .72, common intercept −5.66]

Note 1: holding EXPER constant.
Note 2: ignoring the effect of an intercept gender dummy.
