You are on page 1of 8

Econ 140 (Fall 2014) - Section 10∗

GSIs: Marion Aouad, Fenella Carpena,


Alessandra Fenizia, Paolo Zacchia†

1 Non-Linear Regression Functions


We now consider the case where the population regression function is a nonlinear function of the independent
variables (i.e.E(Y |X1i , ..., Xki ) is a nonlinear function of one or more of the X’s. A function, f (X) is linear
if the slope of f (X) is the same for all values of X. However, if the slope of f (X) depends on the value of
X, then f (X) is nonlinear.

Since the models are still linear functions of the unknown population parameters (i.e. βi ) of the population
regression model, we can estimate these non-linear models using OLS and earlier methods used.

Note: In general, when deciding on a regression model, we want economic theory to guide our decision as to
what the relevant variables are for inclusion. Economic intuition should also guide our decision as to what is
the appropriate functional form for the model.

1.1 Quadratic regression model


Consider the regression model: T estscorei = β0 + β1 Incomei + ui

If we believe instead that a more accurate depiction of the relationship between T estscore and Income is
quadratic (i.e. T estscore is non-linear in Income), we would use the quadratic regression model:

T estscorei = β0 + β1 Incomei + β2 Income2i + ui

We can test the hypothesis that the model is linear vs. quadratic by testing H0 : β2 = 0. vs. H1 : β2 6= 0.

1.2 The Expected Change on Y of a Change in X in Nonlinear models


In nonlinear models, this change is more difficult to compute because it depends on the values of the Xi .

If f (X1i , X2i , ..., Xki ) is the population nonlinear regression function, the expected change in Y , 4Y, associ-
ated with the change in X1 , 4X1 , holding X2 ,..., Xk constant can be written as:

4Y = f (X1 + 4X1 , X2 , ..., Xk ) − f (X1, X2 , ..., Xk )

Because 4Y is a function of population data, which we often and likely cannot observe, we have to estimate
it using sample data:
∗ This
material is the same as what is posted under “Common Section Material” which is labelled as Section 9.
† Wethank previous GSIs for the great section material they built over time. These section notes are heavily based on their
previous work, including that of GSI: Daniel Gross. Also used is Stock & Watson - Introduction to Econometrics, 3rd Edition.

1
4Ŷ = fˆ(X1 + 4X1 , X2 , ..., Xk ) − fˆ(X1, X2 , ..., Xk )

1.3 Polynomial Terms


We can also imagine a model of Y with higher order X terms:

Yi = f (Xi ) = β0 + β1 Xi + β2 Xi2 + . . . + βm Xim + ui (1)


In general, if you include polynomial terms in the RHS, you should have a good motivation for doing so.

• Increasing/decreasing returns to scale is a common economic motivation for including a quadratic term
(X 2 ).

1.4 Interpretation with Polynomial Terms


The mathematical interpretation of coefficients on polynomial terms is fairly straightforward. Suppose our
model is the following:
Yi = β0 + β1 Xi + β2 Xi2 + ui

where the second term is quadratic in Xi . Taking the derivative of Y w.r.t. Xi , we see that the effect of a
change in Xi on Y depends on the value of Xi itself:
dYi
= β1 + 2β2 Xi
dXi

An example of this might be a regression of testscore on income and income2 . This model specifica-
tion implies that the effect of a change in income on testscore will vary by the starting income level (e.g.
income=$10,000 vs. income =$30,000).

1.5 Polynomial Terms - Testing Model Specification


Which degree polynomial should you use for polynomial model (1)? We can use sequential hypothesis testing
to see if the higher degree polynomials can be omitted from the regression (i.e. are the coefficient estimates
for these variables different from zero?) To do this, we can do the following steps:
1. Pick a maximum value of m and estimate the polynomial regreesion for that m.
2. Use the t-statistic to test the hypothesess that the coefficent on X m (i.e. βm ) is zero. If you reject this
hypothesis, then X m belongs in the regression.
3. If you do not reject βm = 0 in step 2, then elimate X m from the regression and estimate a polynomial
regression of degree m − 1. Test whether the coefficient on X m−1 is zero. If you reject the test, use the
polynomial of degree m − 1.
4. If you do not reject βm−1 = 0 in step 3, continue this procedure until the coefficient on the highest
power in your polynomial is statistically significant.

1.6 Logarithms
We can specify nonlinear models using the natural logarithm of Y and/or X.

Observe: d ln(X) = dX
X ≈ %∆X for small ∆X. Alternatively said, when ∆X is small, ln(X + ∆X) - ln(X)
is approximately equal to the percentage change in X divided by 100.

We have three cases when the logarithms might be used and the interpretation of the model/coefficients will
vary by case:

2
• Linear-Log Model: When we take the natural log of X but not of Y .
• Log-Linear Model: When we take the natural log of Y but not of X.

• Log-Log Model: When we take the natural log of both X and Y .

You can interpret the three variants of the natural log model as follows:
• Linear-Log Model: Yi = β0 + β1 ln(Xi ) + ui : A 1% increase in X ⇒ 0.01 × β-unit increase in Y .

• Log-Linear Model: ln(Yi ) = β0 + β1 Xi + ui : A 1-unit increase in X ⇒ 100 × β% increase in Y .


• Log-Log Model: ln(Yi ) = β0 + β1 ln(Xi ) + ui : A 1% increase in X ⇒A β% increase in Y .

Question: How can we compare the linear-log model and the log-log model? We cannot use the R̄2 because
the dependent variables in these two models are different. Because of this, it is best to use economic theory
to guide your decision in choosing which model is best.

1.7 Interactions Between Independent Variables:


When using binary and/or continuous variables in our regression, there are three cases to consider. See below
for the scenarios and examples:

• Interaction between two binary variables: Yi = β0 + β1 D1i + β2 D2i + β3 (D1i × D2i ) + ui

• Interaction between a continuous and a binary variable: When we use the interaction term
Xi × Di , there are three possiblities for the population regression functions:

– Scenario 1; Different intercept, same slope: Yi = β0 + β1 Xi + β2 Di + ui

– Scenario 2; Different intercept, different slope: Yi = β0 + β1 Xi + β2 Di + β3 (Xi × Di ) + ui

– Scenario 3; Same intercept, different slope: Yi = β0 + β1 Xi + β2 (Xi × Di ) + ui

• Interaction between two continuous variables: Yi = β0 + β1 X1i + β2 X2i + β3 (X1i × X2i ) + ui

– Here, including the interaction term allows the effect on Y of a change in X1 to depend on the
value of X2 , and conversely, allows the effect of a change in X2 on Y to depend on the value of
X1 .

3
2 Exercises
Question 1: Refer to the table below, from section 8.4 of the book:

a) Comparing columns (1) and (2) how does the coefficient of Student − teacher ratio , (ST R), change after
incuding ln(Average district income) ino the regression? Why do you think this change occurs?

b) Looking at column (4), can we reject the hypothesis that the effect of ST R is the same for districts with
low and high percentages of English learners (i.e for HiEL = 1 and for HiEl = 0) at the 5% significance
level?

c) Looking at column (5), what can you say about the effect of ST R on testscore? Is there a linear relationship
between ST R and testscore?

d) Looking at column (6), which variables allow us to test if the regression function relating testscore and
ST R differ between districts with low and high percentages of English learners? Based on the results, are
the regression functions different for districts with high and low percentages of English learners?

e) Looking at column (6), what is the predicted impact on testscore of a one-unit change in ST R for an
HiEL = 1 district with a baseline ST R=20? For an HiEL = 0 district with a baseline ST R=20?

Question 2:

Suppose we are testing a cholesterol-lowering drug using a clinical trial. Let Yi be a variable that measures
a patient’s cholesterol level, let Xi be the amount of the experimental drug randomly assigned to patients,
and let Di be a binary variable that takes on the value 1 for men and 0 for women.

a) Suppose the relationship between Yi and Xi is linear and negative, but we expect men to have a larger
response to the drug treatment - that is, the cholesterol levels of men fall by more than the cholesterol levels
of women for every additional amount of drug treatment they are given. Also, assume that the baseline level
of cholesterol (i.e. when Xi = 0), is the same for men as it is for women.

i. Write out the regression model that we would use to verify this and say how you would test the claim
that men are more responsive to the drug.

ii. If you write an interaction model, what do you expect the sign on the interaction term to be? The
signs on the other terms?

b) Suppose we think women have a lower baseline level of cholesterol (so when Xi = 0 (i.e. no drug), women
have a lower cholesterol level), and we think the relationship between Yi and Xi is linear and negative.
However, now we think the drug is just as effective in reducing women’s cholesterol levels as it is for reducing
men’s levels (i.e. the response to the drug is now the same for men and women).

i. Write out a model that is consistent with this statement/that would allow us to test this statement.

ii. What are your expectations for the signs on the coefficients?

4
c) Now, we think women and men have different baseline levels of cholesterol and have different response
rates to the cholesterol-reducing drug. The response between Yi and Xi is still negative and linear.

i. Write out a model that is consistent with this statement/that would allow us to test this statement.

ii. After you’ve written this model, state the hypothesis and test-statistic you would use to test if the
response rates to the cholesterol-reducing drug are (statistically) different between men and women.

5
3 Solutions
Question 1:

a) The coefficient on Student − teacher ratio (ST R) maintains its statistical significance and increases from
-1.00 to -0.73 after the inclusion of ln(Average district income). A reason for this result might be that since
school districts with higher average incomes tend to have a lower ST R, and since school districts with higher
incomes tend to have higher test scores, excluding ln(Average district income) from the regression leads to
a downward biased estimate of the coefficient on ST R in the regression in column (1).

b) No, look at the interaction term, HiEL × ST R, which interacts the binary variable, HiEL with the
continuous variable, ST R. The t-statistic to test if the coefficient on this interaction variable is different
from 0 is t = −0.58/0.50 = −1.16.

c) The estimates in the regression in column (5) suggest that ST R has a nonlinear effect on testscore. We
can reject the null hypothesis that the relationship is linear vs. the alternative that it’s cubic at the 1%
significance level. This is because the p-value associated with the F-statistic testing if the coefficients on
ST R2 and ST R3 are jointly zero, is <0.001.

d) The interaction terms between HiEl and ST R, ST R2 , and ST R3 (i.e. HiEL × ST R, HiEL × ST R2 ,
and HiEL × ST R3 ) allow us to check this assertion. The F-statistic which tests the null hypothesis that
the coefficients on these three interaction terms are jointly zero, is 2.69 with an associated p-value of 0.046.
Thus, we can reject the null hypothesis that the coefficients are jointly zero at the 5% significance level. This
suggests that the effect of ST R depends not just on the value of ST R but also on the fraction of English
learners in the district (HiEL).

e) If we write the regression in column (6) as:

testcore
\ = βˆ0 + βˆ1 ST R + βˆ2 ST R2 + βˆ3 ST R3 + βˆ4 HiEL + βˆ5 (HiEL × ST R) + βˆ6 (HiEL × ST R2 )

+β̂7 (HiEL × ST R3 ) + βˆ8 (%Eligible Subs Lunch) + βˆ9 ln(Avg dist income)

\ w.r.t ST R is: βˆ1 + 2β


The partial derivative of testscore ˆ 2 ST R + 3βˆ3 ST R2 + HiEL(βˆ5 + 2βˆ6 × ST R + 3βˆ7 ×
2
ST R )

\ is: 83.7+(2×−4.38×20)+(3×0.075×202 )+1×(−123.3+


For HiEL = 1, ST R=20, the change in testscore
2
(2 × 6.12 × 20) + (3 × −0.101 × 20 ))

\ is: 83.7 + (2 × −4.38 × 20) + (3 × 0.075 × 202 )


For HiEL = 0, ST R=20, the change in testscore

Question 2:

a) We could test this by using the model Yi = β0 + β1 Xi + β2 (Xi × Di ) + ui . Using OLS, we can get estimates
for β0 , β1 , and β2 . If we test the hypothesis , H0 : β2 = 0 vs. H1 : β2 6= 0, and we reject the null at
some pre-specified significance level, this lends support to the claim that men and women have a different
responsiveness to the drug in terms of cholesterol reduction. We are told that the drug helps to reduce

7
cholesterol levels, so we expect β1 < 0. Since, we are told in the question that men experience a larger
reduction in cholesterol in response to the drug, we expect that for every increase in Xi , 4Yi will be more
negative for men. This implies that β2 < 0 (assuming that we reject H0 ).

b) Use the specification: Yi = β0 + β1 Xi + β2 Di + ui . Taking the drug still lowers cholesterol, so we expect
β1 < 0. However, now we believe women have a lower baseline level of cholesterol. This implies that β2 > 0.

c) Use the specification: Yi = β0 + β1 Xi + β2 Di + β3 (Xi × Di ) + ui . To test whether women and men have the
same response to the drug, in terms of the drug’s ability to reduce cholesterol, we can use the t-statistic to
test H0 : β3 = 0 vs. H1 : β3 6= 0. If we reject the null hypothesis in favor of the alternative, this suggests that
men and women do respond differently to the drug in terms of the drug’s ability to reduce their cholesterol.

You might also like