
Handout 6: Nonlinear regression

Yichong Zhang¹

¹ School of Economics

Singapore Management University


Nonlinear regression functions

I Up until now, we have assumed that the population regression function was linear.
I Since the slope of the population regression function was a
constant, the effect on Y of a unit change in X did not
depend on the value of X .
I But what if the effect on Y of a unit change in X does
depend on the value of one (or perhaps several) of the
regressors?
I If so, the population regression function is nonlinear.
Nonlinear regression functions (Cont’d)

I We will look at two methods for modeling nonlinearities:

I Allowing the effect on Y of a unit change in X1 to depend on the value of X1.
I Typical example: effect of labor market experience on wages.
I This method uses nonlinear functions of the X's, including polynomials and logarithms.
I Allowing the effect on Y of a unit change in X1 to depend on the value of another regressor X2 (or perhaps many).
I Typical example: effect of schooling on wages, which may
vary with gender, work experience...
I This method uses interaction terms.
I Let's start with the second approach, which uses dummy variables (also called binary variables) and interaction terms.
I Dummies and interactions are a useful way to expand the
flexibility of OLS.
Dummies and interactions

I We have already seen examples where a single regressor is a dummy variable.
I Let's review that case. Suppose we have data¹ on earnings, gender, and years of education:
I Gender (female) is a dummy variable that takes on the value 1
for females and 0 for males.
I Wage is average hourly wage in 1998 (in $).
I Yrseduc is years of education (between 6 and 20 years).
I We know that we can test the null hypothesis that average
earnings are equal for males and females by running the
regression
Wage = β0 + β1·Gender + u

¹ This data is drawn from the 1998 Current Population Survey (CPS) and includes full-time workers, age 25 to 64.
Dummies and interactions (Cont’d)

I We find that women can expect to earn about $3.45 less per hour, on average, than men.
I If we test the null hypothesis H0: β1 = 0, we see that it is rejected here at any level of significance.
I In fact, β0 is the population expected value of male earnings, β0 + β1 is the expected value of female earnings, and β1 is the difference in expectations across genders.
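I As a quick illustration, a minimal sketch in Python/statsmodels (rather than the EViews used in this course), with simulated data standing in for the CPS sample; the numbers in the data-generating process are assumptions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
female = rng.integers(0, 2, n)                     # hypothetical gender dummy
wage = 14.0 - 3.45 * female + rng.normal(0, 5, n)  # assumed data-generating process

res = sm.OLS(wage, sm.add_constant(female)).fit()
b0, b1 = res.params                                # intercept, gender coefficient

# b0 equals the male sample mean; b1 equals the female-male difference in means
print(b0, wage[female == 0].mean())
print(b1, wage[female == 1].mean() - wage[female == 0].mean())
```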
Dummies and interactions (Cont’d)

I We can also assess the impact of education on earnings (we will ignore the omitted variable bias associated with not including ability) by running the regression:

Wage = β0 + β1·Yrseduc + u

I Each additional year of schooling is expected to increase hourly wages by $1.33.
I But what if we want to analyze both effects at the same time?
Dummies and interactions (Cont’d)
I One option would be to run the regression

Wage = β0 + β1·Gender + β2·Yrseduc + u

I We now have different regression functions for males and females:

Males: Wage = β0 + β2·Yrseduc + u
Females: Wage = (β0 + β1) + β2·Yrseduc + u

The estimated lines are

Males: \widehat{Wage} = −1.24 + 1.35·Yrseduc
Females: \widehat{Wage} = −4.88 + 1.35·Yrseduc
Dummies and interactions (Cont’d)

\widehat{Wage} = β̂0 + β̂1·Gender + β̂2·Yrseduc
I What does this look like graphically?
Dummies and interactions (Cont’d)

I A richer specification would include an interaction term Gender·Yrseduc:

Wage = β0 + β1·Gender + β2·Yrseduc + β3·Gender·Yrseduc + u


Dummies and interactions (Cont’d)
What are the different cases now?

Males: Wage = β0 + β2·Yrseduc + u
Females: Wage = (β0 + β1) + (β2 + β3)·Yrseduc + u

Males: \widehat{Wage} = −1.41 + 1.37·Yrseduc
Females: \widehat{Wage} = −4.61 + 1.34·Yrseduc

What does this look like graphically?


Dummies and interactions (Cont’d)

I What if we force the intercepts to be the same?

Wage = β0 + β2·Yrseduc + β3·Gender·Yrseduc + u


Dummies and interactions (Cont’d)
I What are the different cases now?

Males: Wage = β0 + β2·Yrseduc + u
Females: Wage = β0 + (β2 + β3)·Yrseduc + u

Males: \widehat{Wage} = −2.65 + 1.46·Yrseduc
Females: \widehat{Wage} = −2.65 + 1.20·Yrseduc

I What does this look like graphically?


Dummies and interactions (Cont’d)

I Let’s look at the most flexible specification again

Males: Wage = β0 + β2·Yrseduc + u
Females: Wage = (β0 + β1) + (β2 + β3)·Yrseduc + u
Dummies and interactions (Cont’d)

I Question: how do you...

I Test for equal intercepts?
I H0: β1 = 0 vs. HA: β1 ≠ 0 ⇒ t = −7.84 (can reject the null)
I Test for equal slopes?
I H0: β3 = 0 vs. HA: β3 ≠ 0 ⇒ t = −1.03 (cannot reject the null)
I Test for both (i.e., that gender has no effect on wages)?
I H0: β1 = β3 = 0 vs. HA: β1 ≠ 0 or β3 ≠ 0 ⇒ F-test

I Can reject the null.


Dummies and interactions (Cont’d)

I Let's look at a different example.

I HIEL is a dummy variable that equals 1 if the percentage of English learners is greater than 10% and 0 otherwise.

HIEL = 0: \widehat{TS} = β̂0 + β̂1·STR = 682.2 − .97·STR
HIEL = 1: \widehat{TS} = (β̂0 + β̂2) + (β̂1 + β̂3)·STR = 688 − 2.24·STR
Dummies and interactions (Cont’d)

HIEL = 0: \widehat{TS} = 682.2 − .97·STR
HIEL = 1: \widehat{TS} = 688 − 2.24·STR
Dummies and interactions (Cont’d)

I Test for equal intercepts?

I H0: β2 = 0 vs. HA: β2 ≠ 0 ⇒ t = .29 (cannot reject the null)
I Test for equal slopes?
I H0: β3 = 0 vs. HA: β3 ≠ 0 ⇒ t = −1.32 (cannot reject the null)
I Test for both?
I H0: β2 = β3 = 0 vs. HA: β2 ≠ 0 and/or β3 ≠ 0 ⇒ F-test

I Can reject the null!


Dummies and Interactions (Cont’d)

I So what’s going on here?

I This is a great example of near-perfect multicollinearity: each coefficient is individually insignificant, yet the two are jointly significant.


Aside: Dummy variables vs. categorical variables

I What if you have more than one category? Should you make dummies for each or just use a single categorical variable?
I Let’s drop the Female/Male comparison and look at how
wages vary with education in different parts of the country.
I Suppose we break the country into 4 mutually exclusive
regions: Northeast, Midwest, West, and South.
I Now let’s construct 3 dummy variables for three regions
(Northeast, Midwest, West) and leave the South as the
omitted category.
I Why do we need to leave one out? Including all four dummies along with the intercept would create perfect multicollinearity (the dummy variable trap).
I Finally, let’s regress wage on education and these three
dummies.
Dummy variables vs. categorical variables (Cont’d)

I What happens if we replace the three dummies with a single variable (region) that is

= 0 if south = 1
= 1 if midwest = 1
= 2 if west = 1
= 3 if northeast = 1
Dummy variables vs. categorical variables (Cont’d)

wage = β0 + β1·yrseduc + β2·region + u

I We have forced the vertical differences between the lines to be the same.
I And we arbitrarily forced an order.
I This is clearly restrictive!
Generate dummies in EViews

I Load the CPS data into EViews.

I Generate the dummy "high school graduate" when yrseduc ≥ 12.
I For example: `series highgrad = yrseduc >= 12` (the logical comparison evaluates to 1 or 0).
I Generate the categorical variable "loc", with south = 0, midwest = 1, west = 2, northeast = 3:
I `series loc = 0*south + midwest + 2*west + 3*northeast`
Interactions between two continuous variables

I OLS can also easily handle interactions between two (or more)
continuous variables.
I For example, suppose you thought that the effect on expected
earnings of an increase in years of education depended on
years of experience (Exper ).
I How could you model this?

Wage = β0 + β1·Yrseduc + β2·Exper + β3·Yrseduc·Exper + u


Interactions between two continuous variables: expected change

E(Wage | Yrseduc + 1, Exper) − E(Wage | Yrseduc, Exper) = β1 + β3·Exper
E(Wage | Yrseduc, Exper + 1) − E(Wage | Yrseduc, Exper) = β2 + β3·Yrseduc

I So β3 > 0 means that an extra year of education increases expected wages more for more experienced workers, and,
I an extra year of experience increases expected wages more for more educated workers.
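I A tiny sketch of these two marginal effects (the coefficient values below are illustrative assumptions, not estimates from the CPS data):

```python
# assumed estimates: b1 (education), b2 (experience), b3 (interaction)
b1, b2, b3 = 1.10, 0.25, 0.02

for exper in (0, 10, 20):
    print(f"effect of +1 yr education at exper={exper}: {b1 + b3 * exper:.2f}")
for educ in (8, 12, 16):
    print(f"effect of +1 yr experience at educ={educ}: {b2 + b3 * educ:.2f}")
```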
Nonlinear regression functions

I We've seen now that the linearity of OLS is not as bad as it might seem: OLS is quite flexible!
I We have already seen how to incorporate nonlinearities by
using dummies and interactions.
I These methods allowed the effect on Y of a unit change in X1
to depend on the value of another regressor X2 (or perhaps
several Xk ’s).
I Now we’ll introduce methods that allow the effect on Y of a
unit change in X1 to depend on the value of X1 itself.
Nonlinear regression functions (Cont’d)

I Why might this be useful? Let's look at an example from Stock and Watson concerning test scores and a measure of income.²

\widehat{TestScr} = 625.4 + 1.88·AvgInc,   R² = .51
                    (1.87)   (.114)

² Income is average annual per capita income in the school district.


Quadratic regression
I The relationship is clearly nonlinear: a linear approximation is
inadequate.
I The linear fit overpredicts scores for low-income and high-income districts, and underpredicts how much income matters in low-income districts.
I So how can we adjust our OLS techniques to handle this?
I One approach is to approximate the curved relationship with a
quadratic function (e.g. y = a + bx + cx 2 ).
I Specifically, we can run the quadratic regression

TestScrᵢ = β0 + β1·Incomeᵢ + β2·Incomeᵢ² + uᵢ

I Note that this regression is linear in Income and Income²: after creating the variable Incomeᵢ² in EViews we can just use OLS to estimate this quadratic regression.
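I A sketch of the same steps in Python/statsmodels, on simulated district data whose assumed DGP uses coefficients close to the estimates reported on the next slide:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
income = rng.uniform(5, 55, 420)   # district average income in $1000s (assumed range)
testscr = 607.3 + 3.85 * income - 0.04 * income**2 + rng.normal(0, 9, 420)

# create Income^2 explicitly, then estimate by plain OLS
X = sm.add_constant(np.column_stack([income, income**2]))
res = sm.OLS(testscr, X).fit()
print(res.params)       # intercept, linear, and quadratic coefficients
print(res.tvalues[-1])  # t-statistic on the quadratic term
```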
Quadratic regression (Cont’d)

I Estimation by OLS yields

\widehat{TestScr} = 607.3 + 3.85·AvgInc − .04·AvgInc²
Quadratic regression (Cont’d)

I This quadratic function is steep for low values of income but flattens out when income is high.
I Since β̂2 < 0, the effect is bigger for small values of income and then becomes smaller as the quadratic term kicks in.

I Moreover, the nonlinearity here is statistically significant. (How do I know? H0: β2 = 0 is rejected for any α, since the t-statistic on β̂2 is −8.85.)
Fit a quadratic regression in scatter plot

I Load California Test Score data.


I Regress testscr on avginc and avginc².
I Open testscr and avginc as a group.
I Graph – Scatter – Regression line – Options – Polynomial 2.
Interpreting nonlinear regression functions
I However, since we clearly can't "keep Income² constant" while we change Income, the interpretation of the coefficients is more complicated now.
I The nonlinear models we will consider here have an additive error,

Y = f(X1, X2, ..., Xk) + u,

so the expected effect on Y of a change in Xi is simply

ΔY = f(X1, ..., Xi + ΔXi, ..., Xk) − f(X1, ..., Xi, ..., Xk)

I The estimator of this unknown population difference is the difference between the predicted values:

ΔŶ = f̂(X1, ..., Xi + ΔXi, ..., Xk) − f̂(X1, ..., Xi, ..., Xk)


Interpreting nonlinear regression functions (Cont’d)
I In the quadratic regression example above, the expected change in TestScr from a one-unit change in Income is

ΔTESTSCRᵢ = β0 + β1(Incomeᵢ + 1) + β2(Incomeᵢ + 1)² − (β0 + β1·Incomeᵢ + β2·Incomeᵢ²)
          = β1 + β2(2·Incomeᵢ + 1).

The estimate of the expected change is

\widehat{ΔTESTSCR}ᵢ = β̂0 + β̂1(Incomeᵢ + 1) + β̂2(Incomeᵢ + 1)² − [β̂0 + β̂1·Incomeᵢ + β̂2·Incomeᵢ²]
                    = β̂1 + β̂2(2·Incomeᵢ + 1),

which depends on the initial level of Incomeᵢ.


Interpreting nonlinear regression functions (Cont’d)

I So what if income increases by m units?

ΔTESTSCRᵢ = m·β1 + β2(2·m·Incomeᵢ + m²)

I The estimate of the expected change is

\widehat{ΔTESTSCR}ᵢ = m·β̂1 + β̂2(2·m·Incomeᵢ + m²)

Interpreting nonlinear regression functions (Cont’d)
I Since we know that for a one-unit increase in Income the expected change in TestScr is

ΔTESTSCRᵢ = β1 + β2(2·Incomeᵢ + 1)

I and the estimate is

\widehat{ΔTESTSCR}ᵢ = β̂1 + β̂2(2·Incomeᵢ + 1),

we can calculate the estimated change from a one-unit increase at any initial value of Income.
I For example, if income increases from 10 to 11, then

\widehat{ΔTESTSCR} = β̂1 + β̂2(2·10 + 1) = 3.85 − .042·(21) = 2.96,

but if income increases from 40 to 41, then

\widehat{ΔTESTSCR} = β̂1 + β̂2(2·40 + 1) = 3.85 − .042·(81) = .45.

I What if income increases from 10 to 12?

\widehat{ΔTESTSCR} = 2·β̂1 + β̂2(4·10 + 4) = 7.7 − .042·(44) = 5.85
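I The same calculations in a few lines of Python, using the estimated coefficients from above:

```python
b1, b2 = 3.85, -0.042   # estimated quadratic-regression coefficients

def change(income, m=1):
    """Estimated change in TestScr when Income rises from income to income + m."""
    return m * b1 + b2 * (2 * m * income + m**2)

print(change(10))        # 10 -> 11: about 2.96
print(change(40))        # 40 -> 41: about 0.45
print(change(10, m=2))   # 10 -> 12: about 5.85
```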
Calculating SEs (and CIs) for predicted changes

I What if we want to calculate a confidence interval for the effect of a given change in Income?
I We know the formula is (for a 95% confidence interval)

\widehat{ΔTESTSCR} ± 1.96·SE(\widehat{ΔTESTSCR})

I But how do we find SE(\widehat{ΔTESTSCR})?
Calculating SEs (and CIs) for predicted changes (Cont’d)

I In this quadratic regression example, we have:

ΔŶ = \widehat{ΔTESTSCR} = β̂1 + β̂2(2·Income + 1)

I So to compute SE(ΔŶ) we need to compute SE(β̂1 + β̂2(2·Income + 1)).
I For the case where income increases from 10 to 11, we need to compute SE(β̂1 + 21·β̂2).
I To estimate SE(β̂1 + 21·β̂2) we could transform the regression so that β1 + 21·β2 appears as the coefficient on one of the variables, but there is a simpler method.
Calculating SEs (and CIs) for predicted changes (Cont’d)
I Note that ΔŶ = β̂1 + 21·β̂2.
I To use this method, we must first compute the F-statistic for H0: β1 + 21·β2 = 0 (in general, H0: ΔY = 0).
I The SE is then given by

SE(ΔŶ) = |ΔŶ| / √F

I Why? The t-stat for this null is t = (β̂1 + 21·β̂2)/SE(β̂1 + 21·β̂2) = ΔŶ/SE(ΔŶ).
I With 1 restriction, F = t² = (ΔŶ/SE(ΔŶ))², so SE(ΔŶ) = |ΔŶ|/√F.
I In this case F = 299.94, so SE(ΔŶ) = |ΔŶ|/√F = 2.96/√299.94 = 0.17.
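I A sketch of this shortcut with the handout's numbers (in practice the F-statistic would come from your regression software's test of the restriction β1 + 21·β2 = 0):

```python
import math

dY, F = 2.96, 299.94                   # estimated change and F-stat for H0: beta1 + 21*beta2 = 0
se = abs(dY) / math.sqrt(F)            # SE(dY-hat) = |dY-hat| / sqrt(F)
print(se)                              # about 0.17
print(dY - 1.96 * se, dY + 1.96 * se)  # 95% CI, about (2.63, 3.29)
```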
Calculating SEs (and CIs) for predicted changes (Cont’d)
I Therefore, a 95% confidence interval for the change in the expected value of Y is

2.96 ± 1.96·.17 = (2.63, 3.29)

I How about when income increases from 40 to 41? Recall that in this case ΔŶ = \widehat{ΔTESTSCR} = .45.
I To compute this standard error we compute the F-statistic for H0: β1 + 81·β2 = 0.
I Now F = 9.68, so SE(ΔŶ) = .45/√9.68 = 0.14.
I Therefore, a 95% confidence interval for the change in the expected value of Y in this case is

.45 ± 1.96·.14 = (.17, .73)
Identifying and modeling nonlinearities

I There are several ways to model nonlinearities.

I We have already looked at the quadratic approximation.
I This approach can easily be extended to higher-order polynomials (X³, X⁴, etc.).
I Logarithms are another popular method. Why?
I Their coefficients are easier to interpret than the coefficients in
quadratic or polynomial regressions.
I Logarithms convert changes in variables into percentage
changes, which is nice because a lot of relationships are
naturally thought of in terms of percentages.
I Let’s look now at how logarithms can be used to model
nonlinearities.
Logarithms

I The exponential function of x is eˣ, where e is the mathematical constant³ 2.71828...
I The natural logarithm (written ln but pronounced log) is the inverse of the exponential function (i.e. the function for which x = f(eˣ)). y = ln(x) is graphed below:

³ The transcendental number e (whose symbol honors the Swiss mathematician Leonhard Euler) can be defined by the limit e ≡ lim_{n→∞} (1 + 1/n)ⁿ.
Logarithms (Cont’d)

I Some useful properties of logarithms:

ln(xy) = ln(x) + ln(y)
ln(y/x) = ln(y) − ln(x)
ln(xʸ) = y·ln(x)
ln(1) = 0

I But:

ln(x + y) ≠ ln(x) + ln(y)
ln(x − y) ≠ ln(x) − ln(y)
Logarithms (Cont’d)
I Logarithms are often useful because they (approximately) convert changes in variables into percentage changes:

ln(x + Δx) − ln(x) ≈ Δx/x   (when Δx/x is small)

I For example,

ln(1.01) − ln(1) = 0.00995 ≈ .01 = .01/1
ln(101) − ln(100) = 0.00995 ≈ .01
ln(1.05) − ln(1) = 0.04879 ≈ .05

but

ln(1.5) − ln(1) = 0.40547 ≠ 0.5

(a quick numeric check of these values follows below)
I This will turn out to be very useful in what follows.
I There are three main ways to use logs in regressions.
I We’ll discuss each one in turn and show how to interpret their
estimation results.
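I A quick numeric check of the approximation above, in plain Python:

```python
import math

for x, dx in [(1, 0.01), (100, 1), (1, 0.05), (1, 0.5)]:
    approx = dx / x                          # percentage-change approximation
    exact = math.log(x + dx) - math.log(x)   # exact log difference
    print(f"dx/x = {approx:.2f}, log difference = {exact:.5f}")
```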
Logarithms: Linear-log model
I Assume that the regression has the following shape

Y = β0 + β1·ln(X) + u

I When would we want to use this approach?

I β1 clearly does not measure the expected change in Y induced by a unit change in the level of X.
I Rather, it measures the expected change in Y induced by a unit change in the logarithm of X. But what does this mean?
Logarithms: Linear-log model (Cont’d)

I Suppose that there's a small change in X. What's the effect on the expected value of Y?

E[Y | X + ΔX] − E[Y | X]
= β0 + β1·ln(X + ΔX) − [β0 + β1·ln(X)]
= β1[ln(X + ΔX) − ln(X)]

So ΔY ≈ β1·(ΔX/X) if ΔX/X is small.⁴
I So, in the linear-log model, a small percentage change in X has a constant effect on the expected value of Y.
I Specifically, a 1% change in X (i.e. ΔX/X = .01) is associated with a change in Y of .01·β1.

⁴ Recall that with simple univariate OLS, ΔY = β1·ΔX.
I Let’s look at an example.
I Suppose we have data (for a single firm) on SALES (monthly
sales in $1000, mean = 8196) and ADVERT (advertising
expenditures in $1000, mean = 218) and we want to assess
the impact of advertising on sales.
I Let’s start with a regression without any logs (we just regress
SALES on ADVERT )

I What is the interpretation of β1?

I According to this regression, a $1000 increase in advertising is associated with an expected increase in average sales of $7,193.
Logarithms: Linear-log model (Cont’d)

I Let's look at the linear-log model now.

I Suppose we regress SALES on ln(ADVERT) (you need to create this variable in EViews).⁵
I The estimation output gives β̂1 = 2630.
I What is the interpretation of β1?

I According to this regression, a 1% increase in advertising is associated with an increase in sales of .01 × 2630 × 1000 = $26,300.

⁵ series lnadvert = log(advert)


Logarithms: Linear-log model (Cont’d)
I R² has also increased by .074.
I What about the predicted difference in sales for advertising expenditures of 201 versus 200 thousand dollars?

ΔŶ = 2630·(ln(201) − ln(200)) = 13.11, or $13,110.

I What about for advertising expenditures of 220.18 versus 218 thousand dollars (218 is the mean of ADVERT, so this is a 1% change at the mean)?

ΔŶ = 2630·(ln(220.18) − ln(218)) = 26.17, or $26,169.

I Note that these are now exact values, not approximations.


I Let’s look now at a couple of alternative nonlinear
specifications that use logs.
Logarithms: Log-linear model
I What if we apply the log to Y instead of X ?
I Now the regression has the following shape

ln(Y) = β0 + β1·X + u

I This is also called the exponential model since

Y = e^(β0 + β1·X)·v ⟹ ln(Y) = β0 + β1·X + ln(v)

I When would we want to use this approach?


Logarithms: Log-linear model (Cont’d)
Suppose that there's a change in X. What's the effect on the expected value of ln(Y)?

E[ln(Y) | X + ΔX] − E[ln(Y) | X] = β1(X + ΔX) − β1·X = β1·ΔX

So the difference in expected values is given by

ln(Y + ΔY) − ln(Y) = β1·ΔX,

but if ΔY is small, ln(Y + ΔY) − ln(Y) ≈ ΔY/Y, so

ΔY/Y ≈ β1·ΔX.

So here a unit change in X implies (approximately) a constant proportional change in Y equal to β1.
Specifically, a change in X of one unit (ΔX = 1) is associated with a 100·β1% change in Y.
Logarithms: Log-linear model (Cont’d)
I Let's look at our sales example again. Suppose we regress ln(SALES) on ADVERT.

I What is the interpretation of β1?

I According to this regression, a $1000 increase in advertising is associated with a 100 × .0008 = .08 percent increase in sales.
I Evaluated at the mean of sales, this is .0008 · 8196 · 1000 = $6,557.
I We cannot compare the R 2 here with the previous regressions
because the dependent variable is now different and the R 2
can only be used to compare regressions with the same
dependent variable.
Logarithms: Log-log model
I Now the regression has the following shape

ln(Y) = β0 + β1·ln(X) + u

I This is also called the multiplicative model since

Y = β̃0·X^β1·v ⟹ ln(Y) = ln(β̃0) + β1·ln(X) + ln(v)

I Using similar arguments as before,

ln(Y + ΔY) − ln(Y) ≈ β1[ln(X + ΔX) − ln(X)]
ΔY/Y ≈ β1·(ΔX/X)

or

β1 = (ΔY/Y)/(ΔX/X) = (100 × ΔY/Y)/(100 × ΔX/X) = (percentage change in Y)/(percentage change in X),

so β1 is the ratio of the % change in Y to the % change in X.
Logarithms: Log-log model (Cont’d)

I If the % change in X is 1% (i.e. ΔX/X = .01), then β1 is the % change in Y associated with a 1% change in X.
I This is what we call the elasticity of Y with respect to X.
I So, for the log-log model, a 1% change in X is associated with a β1% change in Y.
Logarithms: Log-log model (Cont’d)
I Let’s look at our sales example once more. Suppose we
regress ln (SALES ) on ln (ADVERT )

I What is the interpretation of β1?

I According to this regression, a 1% increase in advertising is associated with a .33% increase in sales.
I Evaluated at the mean this is .0033 · 8196 · 1000 = $27,047.
I Also, R² has increased by .123 from the previous regression (a valid comparison again, since the dependent variables are the same).
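I A side-by-side sketch of the three specifications in Python/statsmodels (simulated sales data; only the interpretations, not the numbers, are meant to mirror the handout):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"advert": rng.uniform(150, 300, 500)})
df["sales"] = np.exp(6.0 + 0.33 * np.log(df["advert"]) + rng.normal(0, 0.1, 500))

lin_log = smf.ols("sales ~ np.log(advert)", data=df).fit()
log_lin = smf.ols("np.log(sales) ~ advert", data=df).fit()
log_log = smf.ols("np.log(sales) ~ np.log(advert)", data=df).fit()

print(0.01 * lin_log.params.iloc[1])  # linear-log: change in SALES for a 1% rise in ADVERT
print(100 * log_lin.params.iloc[1])   # log-linear: % change in SALES per unit of ADVERT
print(log_log.params.iloc[1])         # log-log: elasticity (about .33 by construction)
```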
Logarithms
I We can see what’s going on here by looking at the data:
typically the most appropriate specification will be the one
which looks the most linear.
Log-transform: another example
I Regressing levels on logs: Y = α + β·log(X) + ε, dY = β·(dX/X). E.g.,

CEO salary = 4.822 + 1,812.5·log(sales) + ε,

where salary is expressed in $1000s. For a 1% increase in sales, salary increases by $18,125.
I Regressing logs on levels: log(Y) = α + β·X + ε, dY/Y = β·dX. E.g.,

log(wage) = α − β·1{being a minority} + ε.

I Regressing logs on logs: log(Y) = α + β·log(X) + ε, dY/Y = β·(dX/X). E.g.,

log(CEO salary) = 4.822 + 0.257·log(sales) + ε.

A 1% increase in sales leads to a 0.257% increase in salary.


Be cautious about the percentage change

(y′ − y)/y ≈ log(y′) − log(y), only when |(y′ − y)/y| is small.

I (y′ − y)/y = −5% ⇒ log(y′) − log(y) = −5.1%.
I (y′ − y)/y = −75% ⇒ log(y′) − log(y) = −139%. In this case, the log change is not an appropriate measure of the percentage change.
I Convert from the log change to the percentage change when |(y′ − y)/y| is large:

a := log(y′) − log(y) = log(y′/y) ⇒ (y′ − y)/y = exp(a) − 1.
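I The conversion in code (plain Python, mirroring the two bullets above):

```python
import math

for pct in (-0.05, -0.75):      # true percentage changes (y' - y) / y
    a = math.log(1 + pct)       # log(y') - log(y)
    recovered = math.exp(a) - 1 # exact percentage change recovered from a
    print(f"pct = {pct:+.0%}, log change = {a:+.3f}, exp(a) - 1 = {recovered:+.0%}")
```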
Arguments for and against logs

For:
I Gives a nice percentage/elasticity interpretation.
I Can be generated by some theoretical specifications (e.g.,
Cobb-Douglas...).
I Parsimony.
Against:
I Restrictive vs. a more flexible form given by polynomials.
I May need more flexible forms to test whether restrictions
imposed by logs are valid.
When to take log

I Take logs for variables with positive currency amounts (e.g. wage) or large integral values (e.g. population).
I Do not take logs for variables measured in years or as proportions.
I Using logs can mitigate the influence of outliers.
