
Specification Issues for MLR Models

Specification
• Recall that a model is correctly SPECIFIED when:
– No important variables are omitted.
– No irrelevant variables are included.
– The correct functional form is used.
Adjusted R-Squared
• While R2 provides a useful tool for evaluating goodness of fit across models, it has a major flaw: as additional explanatory variables are added to a model, R2 will ALWAYS go up, no matter how weak the relationship with Y.
• Seeing R2 increase as the number of variables increases therefore tells us nothing about whether our model has been improved.
Adjusted R2
• The Adjusted R2 (denoted R̄2) introduces a "penalty" for each additional variable added.
• If Adjusted R2 increases, then the explanatory power of the new variable is large enough to overcome the penalty, and therefore our model has improved.
• If Adjusted R2 falls when a new variable is added, then it did not significantly add to the explanatory power of the model, and it should be removed.
Adjusted R2
• Formula:

R̄2 = 1 − [RSS/(n−k−1)] / [TSS/(n−1)]

R̄2 = 1 − (1 − R2)·[(n−1)/(n−k−1)]

R̄2 = 1 − (RSS/TSS)·[(n−1)/(n−k−1)]
Adjusted R2
• Adding a new variable will have to reduce RSS by more than it increases the penalty term [(n−1)/(n−k−1)] in order to "improve" the model.
• Note that the "severity" of the penalty term for each additional variable will be greater when n is small.
Adjusted R2 Example

n        100    100    100    100    30     30     30     30
k        2      3      4      5      2      3      4      5
R2       0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8
Rbar2    0.796  0.794  0.792  0.789  0.785  0.777  0.768  0.758
Penalty  1.02   1.03   1.04   1.05   1.07   1.12   1.16   1.21
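The R̄2 values and penalty terms in this table follow directly from the formula R̄2 = 1 − (1 − R2)·(n−1)/(n−k−1). A minimal Python sketch (the function names are my own, chosen for illustration):

```python
# Adjusted R-squared and the penalty term (n-1)/(n-k-1).
def adj_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def penalty(n, k):
    """Penalty factor applied to (1 - R^2)."""
    return (n - 1) / (n - k - 1)

# Reproduce a few cells of the table (R^2 fixed at 0.8):
print(round(adj_r2(0.8, 100, 2), 3))  # 0.796
print(round(adj_r2(0.8, 30, 5), 3))   # 0.758
print(round(penalty(30, 5), 2))       # 1.21
```

Note how the small-sample column (n = 30) loses far more Adjusted R2 per added variable than the n = 100 column, exactly as the table shows.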
Back to the issue of specification:
• Using the following tools:
– t-tests
– F-tests
– Adjusted R2
• And avoiding the following problems:
– Omitted variable bias
– Multicollinearity
– Extraneous variables

We can choose the correct set of independent variables. But what about . . . .
Alternative Functional Forms
• OLS requires that the regression equation is linear in parameters, such that B0 and/or B1 (etc.) do not enter the equation in anything but a linear fashion.
• However, the variables Y and X (etc.) do not have to enter the equation in a linear fashion. The following general functional form will be consistent with OLS:

f(Yi) = B0 + B1[g(Xi)] + ui
Level-Level Form

[Figures: examples of the level-level form]

Functional Forms
• When would it be appropriate to use non-linear functions of Y and/or X?
– Underlying theory suggests a non-linear DGP.
– Sample data suggest a non-linear DGP.
– Alternative coefficient interpretations are desired.
Typical Non-linear Transformations
• 1. Double Log Form (Log-Log): Elasticities

ln(Yi) = B0 + B1·ln(Xi) + ui

Take the total differential:

(1/Yi)·dYi = B1·(1/Xi)·dXi

Solve for B1:

B1 = (Xi/Yi)·(dYi/dXi) = (dYi/Yi)/(dXi/Xi) = %ΔYi / %ΔXi
Log-Log Form

[Figure: example of the log-log form]
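To see the elasticity interpretation numerically, here is a small Python sketch using hypothetical data generated from Y = 2·X^0.5, so the true elasticity is 0.5; a simple OLS slope on the logged data recovers it:

```python
import math

# Hypothetical data from Y = 2 * X^0.5 (true elasticity = 0.5).
X = [1, 2, 4, 8, 16, 32]
Y = [2 * x ** 0.5 for x in X]

lx = [math.log(x) for x in X]
ly = [math.log(y) for y in Y]

# OLS slope of ln(Y) on ln(X): B1 = cov(lnX, lnY) / var(lnX)
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
b1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
     sum((a - mx) ** 2 for a in lx)

print(round(b1, 3))  # 0.5: a 1% increase in X raises Y by about 0.5%
```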
• 2. Semi-log form: partial elasticities

a.) ln(Yi) = B0 + B1·Xi + ui

B1 = (dY/Y)/dX = %ΔY / ΔX

B1 is the percent change in Yi due to a one UNIT change in Xi.
Log-Level Form

[Figure: example of the log-level form]
2. Semi-log form: partial elasticities

b.) Yi = B0 + B1·ln(Xi) + ui

B1 = dY/(dX/X) = ΔY / %ΔX

B1 is the UNIT change in Yi due to a one PERCENTAGE point change in Xi.

Remember – don't attempt a log transformation when X or Y is potentially negative – growth rates, for example.
Level-Log Form

[Figure: example of the level-log form]
• 3.) Polynomial Functions: Squared terms

Yi = B0 + B1·Xi + B2·Xi^2 + ui

∂Yi/∂Xi = B1 + 2·B2·Xi

For B1 > 0 and B2 > 0, the positive impact of X on Y increases as X increases – increasing returns.

For B1 < 0 and B2 < 0, the negative impact of X on Y increases as X increases.

For B1 > 0 and B2 < 0, the positive impact of X on Y decreases as X increases – diminishing returns.
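The marginal effect ∂Y/∂X = B1 + 2·B2·X is easy to trace numerically. A sketch with hypothetical coefficients B1 = 0.9, B2 = −0.02 (the diminishing-returns case):

```python
# dY/dX = B1 + 2*B2*X for Y = B0 + B1*X + B2*X^2.
# Hypothetical coefficients: B1 > 0, B2 < 0 (diminishing returns).
B1, B2 = 0.9, -0.02

def marginal_effect(x):
    return B1 + 2 * B2 * x

print(round(marginal_effect(5), 2))   # 0.7: still strongly positive at low X
print(round(marginal_effect(30), 2))  # -0.3: the effect has turned negative
# X at which the marginal effect crosses zero: X = -B1 / (2*B2)
print(round(-B1 / (2 * B2), 1))       # 22.5
```

This is why a squared term is often paired with a check of where the turning point falls relative to the sample range of X.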
• 4.) Polynomial Functions: Square-root terms

Yi = B0 + B1·Xi + B2·Xi^0.5 + ui

∂Yi/∂Xi = B1 + 0.5·B2/Xi^0.5

The impact of X decreases as the value of X increases.

Useful for "Diminishing Returns" functions or arguments – utility functions, production functions, etc.
• 5.) Inverse transformation

Yi = B0 + B1·Xi + B2·(1/Xi) + ui

∂Yi/∂Xi = B1 − B2·(1/Xi^2)

Again, the marginal effect of X on Y changes as the value of X increases: the contribution of the 1/Xi term approaches zero, so the total effect approaches B1 as X approaches infinity.

Rarely used in practice; it makes coefficients difficult to interpret.
When to Transform Your Data? (revisited)
• 1. If you desire an elasticity interpretation of the coefficient (natural log).
• 2. If theory dictates a Diminishing Returns argument (square-root, log, or inverse).
• 3. If theory dictates an Increasing Returns argument (squared term).
• 4. If sample data indicate a non-linear relationship between Yi and Xi, regardless of theory.
How to detect non-linearities in sample data
• 1. Observe scatter plot diagrams for each independent variable against the dependent variable.
• 2. Check for skewness of variables – where skew represents a lack of symmetry of the frequency distribution about the mean.
Measuring Skew
• We can measure the degree of symmetry or skewness in a distribution by calculating the third moment of the distribution, or the coefficient of skewness:

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3

• where s^3 is the standard deviation of Yi raised to the 3rd power.
a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = 0

• If a3 = 0 (zero skewness), then the sum of cubed deviations for observations greater than the mean is just equal to the sum of cubed deviations for observations below the mean.
• The distribution will be symmetric.
• A normal distribution has a coefficient of skewness equal to zero.
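The coefficient a3 can be computed directly from its definition. A Python sketch (using the population standard deviation for s – an assumption, since software packages differ in the divisor they use):

```python
# Coefficient of skewness: a3 = (1/n) * sum((Y - Ybar)^3) / s^3
def skew(y):
    n = len(y)
    ybar = sum(y) / n
    s = (sum((v - ybar) ** 2 for v in y) / n) ** 0.5  # population std. dev.
    return sum((v - ybar) ** 3 for v in y) / n / s ** 3

symmetric = [1, 2, 3, 4, 5]          # symmetric around 3
right_skewed = [1, 1, 2, 2, 3, 12]   # one large value pulls the tail right

print(round(skew(symmetric), 3))   # 0.0
print(skew(right_skewed) > 0)      # True: positive (right) skew
```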
Zero (approx.) Skew

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = −.07

[Histogram of X1: 526 observations; mean 0.053, median 0.100, std. dev. 1.025; skewness −0.074, kurtosis 2.930; Jarque-Bera 0.589 (p = 0.745)]
a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 > 0

• a3 > 0: positive skew, or skewed to the right.
• The sum of cubed deviations for observations greater than the mean will be greater than the sum of cubed deviations for observations below the mean.
• There will be observations widely dispersed above the mean.
Positive Skew

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = 2.0

[Histogram of WAGE: 526 observations; mean 5.896, median 4.650, std. dev. 3.693; skewness 2.007, kurtosis 7.970; Jarque-Bera 894.6 (p = 0.000)]
a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 < 0

• a3 < 0: negative skew, skewed to the left.
• The sum of cubed deviations for observations greater than the mean will be less than the sum of cubed deviations for observations below the mean.
• There will be observations widely dispersed below the mean.
Negative Skew

a3 = [(1/n)·Σ(Yi − Ȳ)^3] / s^3 = −.62

STATA: hist educ
       sum educ, detail

[Histogram of EDUC: 526 observations; mean 12.563, median 12.000, std. dev. 2.769; skewness −0.620, kurtosis 4.884; Jarque-Bera 111.5 (p = 0.000)]
Problem: Relationship between skewed variables will be non-linear

[Scatter plot: WAGE vs. EDUC]

• WAGE is positively skewed – larger dispersion above the mean.
• EDUC is negatively skewed – larger dispersion below the mean.
Correcting for Skew
• In order to make the relationship linear in parameters, data must be transformed such that skew is reduced.
• A square-root or natural-log transformation will reduce positive skew.
• A square transformation will reduce negative skew.
• The log transformation "bunches together" larger values, reducing dispersion above the mean and making a positively skewed variable more symmetric.
• Likewise, smaller values become relatively more dispersed.

[Plot: LN(Y) against Y]
• A square transformation increases the dispersion of larger values and reduces the relative dispersion of smaller values.
• Thus, negative skew is reduced and the distribution becomes more symmetric.

[Plot: Square Transformation to Reduce Negative Skew – Y^2 against Y]
Transformations

Variable    Skew Coefficient
EDUC        -0.62
(EDUC)^2     0.39
WAGE         2.00
Ln(WAGE)     0.39
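A quick numerical check that logging a right-skewed variable pulls a3 toward zero. The sample below is hypothetical (wage-like, with a long right tail), not the Wage1 data:

```python
import math

# Population-standard-deviation version of the skew coefficient a3.
def skew(y):
    n = len(y)
    m = sum(y) / n
    s = (sum((v - m) ** 2 for v in y) / n) ** 0.5
    return sum((v - m) ** 3 for v in y) / n / s ** 3

# Hypothetical positively skewed sample (long right tail).
wage = [2, 3, 3, 4, 4, 5, 6, 8, 12, 25]
log_wage = [math.log(w) for w in wage]

print(round(skew(wage), 2))      # strongly positive
print(round(skew(log_wage), 2))  # much closer to zero
print(abs(skew(log_wage)) < abs(skew(wage)))  # True
```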
Transformed Scatter Plots

[Scatter plots: LWAGE vs. EDUC and LWAGE vs. EDUC_SQ]
Transformed Regressions

             (1)        (2)        (3)
Dep. Var.:   WAGE       Ln(WAGE)   Ln(WAGE)
EDUC         0.5413     0.0827     -
             (10.17)    (10.94)
EDUC^2       -          -          0.004
                                   (11.80)
R2           0.1648     0.1858     0.2099
Interpret Coefficients
• (1) A one-unit increase in EDUC increases WAGE by 0.54 units (dollars).
• (2) A one-unit increase in EDUC increases WAGE by 8.27%.
• (3) %ΔWAGE/dEDUC = (2)(.004)(EDUCi). Going from 10 to 11 years of education will cause a (2)(.004)(10) = .08 = 8% increase in WAGE.
General Interaction Terms
• A general interaction term captures the change in the marginal effect of Xi on Yi due to a one-unit change in a second independent variable Zi, where Zi is a continuous variable.
• Let the coefficient on Xi be a linear function of Zi.
Interaction Term Regression Model
• Original equation: Yi = α + β·Xi + ei
• Interaction assumption: β = β1 + β2·Zi
• Regression: Yi = α + (β1 + β2·Zi)·Xi + ei

Yi = α + β1·Xi + β2·Xi·Zi + ei

∂Yi/∂Xi = β̂1 + β̂2·Z

∂(∂Yi/∂Xi)/∂Zi = β̂2
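The model can be sketched in a short simulation: generate data from Yi = α + β1·Xi + β2·Xi·Zi with hypothetical values α = 2, β1 = 0.5, β2 = 0.1 and no noise, and confirm that OLS on the interaction design recovers them:

```python
import numpy as np

# Simulate Y = a + b1*X + b2*X*Z exactly (no noise), hypothetical values.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=50)
Z = rng.uniform(0, 5, size=50)
Y = 2 + 0.5 * X + 0.1 * X * Z

# OLS with a constant, X, and the interaction X*Z as regressors.
design = np.column_stack([np.ones_like(X), X, X * Z])
a, b1, b2 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(a, 3), round(b1, 3), round(b2, 3))  # recovers 2.0, 0.5, 0.1
# Marginal effect of X depends on Z: dY/dX = b1 + b2*Z
print(round(b1 + b2 * 2.0, 3))                  # effect of X at Z = 2
```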
MPC Example
• Is the marginal propensity to consume of households dependent on the wealth level of the household?
• Example data set with N = 30 cross-sectional observations of household consumption, income, and wealth.

MODEL: CONSi = α + β·INCi + ei
ASSUME: β = β1 + β2·WEALTHi
REGRESSION: CONSi = α + β1·INCi + β2·INCi*WEALTHi + ei
In STATA, generate the interaction term between income and wealth:

gen inc_wlth=inc*wealth

Include the interaction term in the regression:

reg cons inc inc_wlth

Number of obs = 30
F( 2, 27) = 65.91
Prob > F = 0.0000
R-squared = 0.8300
Adj R-squared = 0.8174

----------------------------------------------------------
cons         |     Coef.   Std. Err.      t     P>|t|
-------------+--------------------------------------------
inc          |  .7079231   .1754693     4.03    0.000
inc_wlth     |  1.14e-06   3.84e-07     2.96    0.006
_cons        |  17986.08   6367.999     2.82    0.009
----------------------------------------------------------
Interpretation:
• 1. The interaction coefficient β2 is positive and significant: an increase in WEALTH leads to an increase in MPC.

∂CONSi/∂INCi = β̂1 + β̂2·WEALTH = 0.71 + 0.0000011·WEALTH

∂(∂CONSi/∂INCi)/∂WEALTHi = β̂2 = 0.0000011

• Multiply β2 by 10,000 to get 0.011: an increase in WEALTH of $10,000 leads to an increase in MPC of about 0.011 (all else held constant).
If WEALTH = 0, MPC = 0.71. If WEALTH = $100,000, MPC = 0.71 + 0.0000011·100,000 = 0.82.

[Plots: CONS vs. INC, with the slope rising from .71 (WEALTH = 0) to .82 (WEALTH = 100,000); CONS vs. WEALTH holding INC fixed]
Other Examples
• Does the asset size of a bank impact its loan response to a change in interest rates? (interact assets and policy variable)
• Does the role of microfinance lending in alleviating poverty depend on education levels? (interact credit and education)
• Does the change in firm-level output due to a change in labor depend on the size of the firm (are there significant economies of scale)?
Chapter 7: Dummy Independent Variables
• In order to capture the explanatory power of factors for which "traditional" continuous variables do not exist, we must use qualitative / indicator variables.
• An Intercept Dummy captures the difference in the mean value of Yi due to a given quality.
• A Slope Dummy captures the difference in the marginal effect of X on Y due to a given quality.
Intercept Dummies
• Think of breaking up your data sample into two groups:
– A control group, which does not possess the quality of interest.
– An experimental or target group, which does possess the quality of interest.
• The dummy variable will take on a value of 1 for each observation in the "target" group, and zero otherwise.
Intercept Dummies
• Examples:
– Race: where nonwhite=1, white=0 [white is control, non-white is target].
– Gender: where male=1, female=0 [female is control, male is target].
– Union membership: where member=1, non-member=0.
– Location: where 1=South, 0=North.
– Time periods: crisis dates, 1=crisis this period, 0=no crisis.
– Product characteristics: where 1=turbo, 0=no turbo.

Note: the mean of the dummy will be the proportion of the sample in the target group.
Model:
• Yi = B0 + B1·Xi + δ0·Di + ui
• Di = 1 if observation i possesses the quality, 0 otherwise.

For all Di = 0: Yi = B0 + B1·Xi + ui
For all Di = 1: Yi = (B0 + δ0) + B1·Xi + ui

• The intercept will shift by δ0 when Di = 1.
• The slope coefficient will be identical for both groups (B1).
• The dummy coefficient is estimated by OLS just like the others.
For Di = 0: ΣYi = N·B0 + B1·ΣXi + Σui

ΣYi/N = B0 + B1·ΣXi/N + Σui/N

Ȳ(0) = B0 + B1·X̄

For Di = 1: ΣYi = N·B0 + N·δ0 + B1·ΣXi + Σui

Ȳ(1) = B0 + δ0 + B1·X̄

Therefore: Ȳ(1) − Ȳ(0) = δ0
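A simulation makes the setup concrete: with hypothetical values B0 = 1, B1 = 2, δ0 = 3 and no noise, OLS recovers the common slope and the intercept shift exactly:

```python
import numpy as np

# Intercept-dummy model Y = B0 + B1*X + d0*D; hypothetical B0=1, B1=2, d0=3.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=40)
D = (np.arange(40) % 2).astype(float)   # half control (0), half target (1)
Y = 1 + 2 * X + 3 * D                   # no noise: OLS is exact

design = np.column_stack([np.ones_like(X), X, D])
b0, b1, d0 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(b0, 3), round(b1, 3), round(d0, 3))  # 1.0 2.0 3.0
# The dummy's mean is the share of the sample in the target group:
print(D.mean())  # 0.5
```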
Case 1: δ0 > 0, B0 > 0, B1 > 0. The dummy causes the intercept to shift upwards.

[Plot: two parallel lines in (Xi, Yi) space – Y = (B0 + δ0) + B1·X above Y = B0 + B1·X, with intercepts B0 + δ0 and B0]
Case 2: δ0 < 0, B0 > 0, B1 > 0. The dummy causes the intercept to shift downwards.

[Plot: two parallel lines in (Xi, Yi) space – Y = (B0 + δ0) + B1·X below Y = B0 + B1·X, with intercepts B0 + δ0 and B0]
Wage2.dta Example
REG WAGE EDUC EXPER FEMALE

Dependent Variable: WAGE
Sample: 1 100

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -4.8590       2.203        -2.2056       0.0298
EDUC         0.8176       0.148         5.5020       0.0000
EXPER        0.1517       0.030         4.9711       0.0000
FEMALE      -2.0306       0.702        -2.8890       0.0048

R-squared 0.4013    Adjusted R-squared 0.3826
Interpretation
• The dummy coefficient of −2.03 means that, holding all other explanatory variables constant, the mean of WAGE is lower by 2.03 when Di = 1.
• i.e., a female worker with identical education and work experience will be paid, on average, $2.03 less than a male worker.

Intercept for males: B0 = −4.86.
Intercept for females: (B0 + δ0) = −4.86 + (−2.03) = −6.89.

[Plot: WAGE vs. EDUC – parallel lines for males (intercept −4.86) and females (intercept −6.89); holding EXPER constant]
• Dummy Variable Trap: use one fewer dummy variable than possible conditions or states.
– For example, gender traditionally has two possible states, so a single variable allows for comparison of one relative to the other.
– If a MALE and a FEMALE dummy are included simultaneously, then you will violate the assumption of no perfect linear relationship amongst explanatory variables.
– Other example: you want to capture geographic effects on bank performance by Federal Reserve district. There are 12 districts, so include only 11 dummies. The omitted district will be the basis for comparison.
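The trap can be verified mechanically: with an intercept, MALE + FEMALE = 1 for every observation, so the design matrix loses full column rank. A sketch with made-up data:

```python
import numpy as np

# With an intercept, MALE + FEMALE = const for every observation,
# so including both dummies creates perfect collinearity.
female = np.array([1., 0., 1., 0., 0., 1.])
male = 1.0 - female
const = np.ones_like(female)

bad = np.column_stack([const, male, female])   # intercept + BOTH dummies
good = np.column_stack([const, female])        # intercept + one dummy

print(np.linalg.matrix_rank(bad))   # 2: three columns but only rank 2
print(np.linalg.matrix_rank(good))  # 2: two columns, full rank
```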
Slope-Dummy Variables
• It is often interesting to look not only at the impact of a given quality on the mean outcome of Y, but also on the marginal effect of Xi on Yi.
• Is the slope of the regression line different for the control vs. the treatment group?
• For example, does an additional year of education have a smaller effect on WAGE for females vs. males?
Slope-Dummy Regression Model
• Assume that intercepts are the same for the control and experimental groups, but let slopes differ.
• Again, let Di = (0, 1).
• Define an interaction term as the product of two independent variables. In this case, the interaction term of interest is the product of Xi and Di, where Xi is a continuous independent variable.
• Regression: Yi = B0 + B1·Xi + γ1·(Xi)(Di) + ui

• If Di = 0, then Yi = B0 + B1·Xi

∂Yi(0)/∂Xi = B̂1

• If Di = 1, then Yi = B0 + B1·Xi + γ1·Xi

∂Yi(1)/∂Xi = B̂1 + γ̂1
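A noiseless simulation of the slope-dummy model, with hypothetical values B0 = 1, B1 = 0.8, γ1 = −0.3, shows the two groups sharing an intercept but splitting into slopes 0.8 and 0.5:

```python
import numpy as np

# Slope-dummy model Yi = B0 + B1*Xi + g1*(Xi*Di); hypothetical
# B0=1, B1=0.8, g1=-0.3 (no noise, so OLS recovers them exactly).
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=60)
D = (np.arange(60) % 2).astype(float)
Y = 1 + 0.8 * X - 0.3 * X * D

design = np.column_stack([np.ones_like(X), X, X * D])
b0, b1, g1 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(b1, 3))       # slope for the control group (D = 0)
print(round(b1 + g1, 3))  # slope for the target group (D = 1)
```

Note how the interaction column X * D is 0 whenever D = 0 and equal to X whenever D = 1, exactly as described in the next slide.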
Slope-Dummy Coefficient
• Use standard hypothesis tests for the OLS estimate of the slope-dummy coefficient.
• Only impose the interpretation if you can reject H0: γ1 = 0.
• NOTE: the interaction variable will appear in your data as 0 when Di = 0, and equal to Xi when Di = 1.
Slope-Dummy Case 1: γ1 > 0, B0 > 0, B1 > 0. Xi has a larger impact on Yi when Di = 1.

[Plot: two lines from the common intercept B0 – Yi = B0 + (B1 + γ1)·Xi for Di = 1 above Yi = B0 + B1·Xi for Di = 0]
Slope-Dummy Case 2: γ1 < 0, B0 > 0, B1 > 0. Xi has a SMALLER impact on Yi when Di = 1.

[Plot: two lines from the common intercept B0 – Yi = B0 + (B1 + γ1)·Xi for Di = 1 below Yi = B0 + B1·Xi for Di = 0]
Wage1 Slope-Dummy Example
• Suppose that we do want to examine the impact of gender on the return to education . . .
• First, we must construct a slope-dummy interaction term between the dummy FEMALE and EDUC.
• In STATA, type: gen EDUC_FEM = EDUC * FEMALE

• Then, we simply include this term in our previous regression.
• Note: you may include both an intercept and a slope dummy term simultaneously, but this can lead to MULTICOLLINEARITY issues.
• We get:

WAGE = -5.66 + 0.87*EDUC + 0.15*EXP - 0.15*EDUC_FEM
       (.01)   (0.00)      (0.00)     (0.01)
(p-values in parentheses)

γ̂1 = -0.15 < 0 . . .
• This means that, on average, one additional year of education will yield $0.15 per hour less in additional wages for a female vs. a male.

• For males (Di = 0):

∂WAGEi/∂EDUCi = B̂1 = 0.87

• For females (Di = 1):

∂WAGEi/∂EDUCi = B̂1 + γ̂1 = 0.87 − 0.15 = 0.72
WAGE = -5.66 + 0.87*EDUC + 0.15*EXP - 0.15*EDUC_FEM

[Plot: WAGE vs. EDUC – males (Di = 0) with slope .87, females (Di = 1) with slope .72, common intercept −5.66]

Note 1: holding EXPER constant.
Note 2: ignoring the effect of an intercept gender dummy.
