Choosing A Functional Form

Specification: Choosing a Functional Form
I. Should a Constant Term be Included?
The constant term gives the regression line flexibility in its position. Suppose
the correct regression model is:
$$\ln W_i = \beta_0 + \beta_1 S_i + \varepsilon_i$$
with $\beta_0 \neq 0$, but we estimate the following model:
$$\ln W_i = \beta_1 S_i + \varepsilon_i$$
The effect of suppressing $\beta_0$ can be seen from the graph:

[Figure: lnW (vertical axis) plotted against S (horizontal axis).]
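The consequence of suppressing the constant is that the slope coefficient
estimates are biased. Also, under the 'false' model:
$$\operatorname{Var}(\hat{\beta}_1^{*}) = \frac{\sigma^2}{\sum_{i=1}^{n} S_i^2}$$
Under the true model:
$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} s_i^2}$$
Since
$$\sum_{i=1}^{n} s_i^2 = \sum_{i=1}^{n} (S_i - \bar{S})^2 = \sum_{i=1}^{n} S_i^2 - n\bar{S}^2 \le \sum_{i=1}^{n} S_i^2,$$
the variance under the 'false' model is understated and the t-ratio is inflated.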
Include the constant term if data are not in the neighborhood of the origin.
Unless you have a strong reason, do not suppress the constant term. Although the
constant term is important from the specification viewpoint, it should NOT be
relied on for purposes of interpretation and analysis.
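The effect of suppressing the constant can be checked numerically. The following is a minimal sketch, assuming hypothetical noise-free data generated from lnW = 2.0 + 0.1 S (these values and the function names are illustrative only and are not taken from the notes). It compares the slope estimated with and without a constant, and the two sums of squares that appear in the variance formulas.

```python
# Hypothetical noise-free data: lnW = 2.0 + 0.1 * S (illustration only).
S = [8.0, 10.0, 12.0, 14.0, 16.0]
lnW = [2.0 + 0.1 * s for s in S]

def slope_with_constant(x, y):
    """OLS slope when an intercept is included."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def slope_through_origin(x, y):
    """OLS slope when the constant is suppressed."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

b1_true_model = slope_with_constant(S, lnW)     # recovers 0.1
b1_false_model = slope_through_origin(S, lnW)   # biased away from 0.1

# Denominators of the two variance formulas.
S_bar = sum(S) / len(S)
sum_S2 = sum(s ** 2 for s in S)                 # raw sum of squares
sum_s2 = sum((s - S_bar) ** 2 for s in S)       # sum of squared deviations

print(round(b1_true_model, 4), round(b1_false_model, 4), sum_S2, sum_s2)
```

In this example the slope through the origin is roughly 0.258 rather than 0.1, and the raw sum of squares (760) is far larger than the sum of squared deviations (40), which is why the variance formula under the suppressed-constant model looks smaller.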
II. Functional Forms.
The Log-Log Regression Model
Consider the following 'exponential' regression model:
$$Y_i = \alpha X_i^{\beta_1} e^{\varepsilon_i}$$
which we can express as a linear (in logs) regression model by taking natural
logarithms of both sides:
$$\ln Y_i = \beta_0 + \beta_1 \ln X_i + \varepsilon_i$$
where 'ln' denotes the natural log, 'e' is the base of the natural logarithm
(i.e., e ≈ 2.71828) and
$$\beta_0 = \ln \alpha$$
The model is linear in the logarithms, even though it was originally nonlinear in
terms of both the variables and parameters. It is also referred to as a Double-Log
or Log-Log model.
If the classical assumptions are fulfilled, then we can estimate the parameters
using OLS by letting:
$$Y_i^{*} = \beta_0 + \beta_1 X_i^{*} + \varepsilon_i$$
where:
$$Y_i^{*} = \ln Y_i, \qquad X_i^{*} = \ln X_i$$
The estimates are BLUE. This is a useful specification for a regression model,
because the slope coefficient can be interpreted as an 'elasticity'. Using
calculus:
$$\frac{dY/Y}{dX/X} = \frac{dY}{dX}\cdot\frac{X}{Y} = \frac{\%\Delta Y}{\%\Delta X} = \beta_1$$
The assumption is that elasticity is constant.
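As a rough illustration of the constant-elasticity property, the sketch below generates hypothetical noise-free data from Y = αX^β1 with α = 3 and β1 = -0.5 (values assumed for illustration only), regresses lnY on lnX, and recovers β1 as the slope and ln α as the intercept. The helper `ols_simple` is an assumed name written for this example, not a library function.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical parameters and data (no error term, for clarity).
alpha, beta1 = 3.0, -0.5
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [alpha * x ** beta1 for x in X]

# Regress ln Y on ln X: the slope is the elasticity, the intercept is ln(alpha).
b0_hat, b1_hat = ols_simple([math.log(x) for x in X], [math.log(y) for y in Y])
alpha_hat = math.exp(b0_hat)

print(round(b1_hat, 6), round(alpha_hat, 6))
```

Because the data contain no noise, the fitted slope matches the assumed elasticity at every level of X, which is what "constant elasticity" means in this model.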
A numerical example: a coffee demand function.
$$\widehat{\ln Y_t} = \underset{(0.0152)}{-0.7774} \; \underset{(0.0494)}{-\,0.2530}\,\ln X_t, \qquad R^2 = 0.7448$$
where $Y_t$ = coffee consumption in cups per day and
      $X_t$ = coffee price per pound.
The price elasticity is -0.253, implying that for a 1% increase in the price of
coffee, the quantity of coffee demanded (as measured by cups consumed each
day) decreases by 0.253%.
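The elasticity reading is a local approximation. The sketch below uses the two coefficients reported above and computes the exact percentage change in predicted consumption implied by the fitted equation; the function names and the base price of 1.0 are assumptions made for illustration (the result does not depend on the base price in a log-log model).

```python
import math

# Estimated coefficients reported in the notes for the coffee demand function.
b0, b1 = -0.7774, -0.2530

def predicted_cups(price):
    """Predicted cups per day implied by the fitted log-log equation."""
    return math.exp(b0 + b1 * math.log(price))

def pct_change_in_demand(pct_price_change, base_price=1.0):
    """Exact percentage change in predicted demand for a given percentage price change."""
    new_price = base_price * (1 + pct_price_change / 100)
    return 100 * (predicted_cups(new_price) / predicted_cups(base_price) - 1)

one_pct = pct_change_in_demand(1.0)    # close to the elasticity of -0.253
ten_pct = pct_change_in_demand(10.0)   # the approximation loosens for larger changes

print(round(one_pct, 4), round(ten_pct, 4))
```

For a 1% price increase the exact change is about -0.251%, very close to the elasticity; for a 10% increase it is about -2.38% rather than -2.53%.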
Note also that the coefficients of determination of two regressions with
different dependent variables cannot be compared, because they measure the
share of variation explained in different variables (here lnY versus Y). For
example, here the R² is .7448. Suppose we estimated the regression without the
logs (i.e., we regressed cups of coffee against the per pound cost of both coffee
and tea). If the R² for this regression was .6519, we couldn't say that the
log-log regression had a 'better fit'.
The Log-Lin Regression Model
Take an example from labour economics. The theory of human capital
investment says that individuals will invest in education because it raises their
productivity, and higher productivity raises their potential wages in the labour
market.
$$W_i = Y_0 e^{\beta_1 S_i} e^{\varepsilon_i}$$
Taking the logs of both sides:
$$\ln W_i = \beta_0 + \beta_1 S_i + \varepsilon_i, \qquad \text{where } \beta_0 = \ln Y_0$$
where W is income or earnings, and S is the number of years of schooling
(education). $Y_0$ represents earnings in the absence of all education. This is
known as a Semilog regression model, because only one variable (in this case
the dependent variable) is written as a log. It is also called a Log-Lin
model (a Lin-Log model has the independent variable as the only log).
In this model, the slope coefficient measures ' ... the constant proportional
change in W for a given absolute change in X.' In this case (with S as the
regressor), the slope coefficient multiplied by 100 is the approximate
percentage change in earnings for a one-year increase in educational attainment.
Numerical example:
$$\widehat{\ln W_i} = \underset{(0.339)}{2.574} + \underset{(0.009)}{0.085}\,S_i, \qquad R^2 = 0.215$$
The estimated coefficient on schooling indicates that the 'incremental impact' of
a year of education is to raise earnings by about 8.5%.
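The 8.5% figure is the usual approximation of multiplying the coefficient by 100. As a small check, the sketch below compares it with the exact proportional change implied by the log form, 100·(e^β1 - 1), using the reported coefficient; the variable names are illustrative.

```python
import math

b1 = 0.085  # estimated schooling coefficient reported in the notes

approx_pct = 100 * b1                  # usual reading of the coefficient
exact_pct = 100 * (math.exp(b1) - 1)   # exact proportional change implied by the log form

print(round(approx_pct, 2), round(exact_pct, 2))
```

The exact figure is about 8.87%, so the approximation is close here; the gap widens as the coefficient gets larger.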
The Polynomial Form
Take another example from labour economics.
$$\text{Earnings}_i = \beta_0 + \beta_1 \text{Age}_i + \beta_2 \text{Age}_i^2 + \varepsilon_i$$
This model can produce a slope that changes as the independent variable
changes:
$$\frac{d\,\text{Earnings}_i}{d\,\text{Age}_i} = \beta_1 + 2\beta_2 \text{Age}_i$$
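To see how the slope of the polynomial form varies with age, here is a minimal sketch with hypothetical coefficients (β1 = 2.0, β2 = -0.02, chosen only to give a concave age-earnings profile; they are not estimates from the notes). It evaluates the derivative at several ages and finds the age at which the slope is zero.

```python
# Hypothetical coefficients, chosen only to illustrate a concave age-earnings profile.
beta1, beta2 = 2.0, -0.02

def earnings_slope(age):
    """dEarnings/dAge = beta1 + 2 * beta2 * Age."""
    return beta1 + 2 * beta2 * age

def turning_point():
    """Age at which the slope is zero: -beta1 / (2 * beta2)."""
    return -beta1 / (2 * beta2)

slopes = {age: earnings_slope(age) for age in (20, 40, 60)}
peak_age = turning_point()

print(slopes, peak_age)
```

With these assumed values the slope is positive at ages 20 and 40, negative at 60, and changes sign at age 50.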
The Inverse Form
Take an example from macroeconomics.
$$W_t = \beta_0 + \beta_1 \frac{1}{U_t} + \varepsilon_t$$
This model can also produce a slope that changes as the independent variable
changes:
$$\frac{dW_t}{dU_t} = -\frac{\beta_1}{U_t^2}$$
So the slope changes as U changes. As $U_t$ gets larger and larger, $W_t$ gets
closer and closer to the constant $\beta_0$.
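The behaviour of the inverse form can be traced with a few evaluations. The sketch below assumes hypothetical parameter values (β0 = 2.0, β1 = 8.0, for illustration only) and ignores the error term, so it shows only the systematic part of the model and its derivative.

```python
# Hypothetical parameter values for W = beta0 + beta1 / U (illustration only).
beta0, beta1 = 2.0, 8.0

def expected_w(u):
    """Systematic part of the inverse model."""
    return beta0 + beta1 / u

def slope(u):
    """dW/dU = -beta1 / U**2."""
    return -beta1 / u ** 2

# The slope flattens and W approaches beta0 as U grows.
for u in (1.0, 2.0, 4.0, 100.0):
    print(u, expected_w(u), slope(u))
```

With a positive β1 the slope is negative everywhere but shrinks toward zero, so the curve levels off at β0.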
III. Problems with Adopting Wrong Functional Forms.
Suppose we estimate:
$$\ln W_i = \beta_0 + \beta_1 S_i + \varepsilon_i$$
but the 'true' model is:
$$\ln W_i = \beta_0 + \beta_1 S_i + \beta_2 S_i^2 + \varepsilon_i^{*}$$
We want an estimate of the 'rate of return' to education. However, we assume in
our estimated regression that it is constant for each year of education. The truth
may be that it decreases with the level of education (i.e., $\beta_1 > 0$ and $\beta_2 < 0$).
The rate of return is just the partial derivative of the regression function:
$$\frac{\partial \ln W_i}{\partial S_i} = \beta_1 + 2\beta_2 S_i$$
Thus, we'd get a biased estimate of the overall rate of return to education if we
ignored the fact that it's a linear function of the level of education. The sample
regression function (SRF) is a biased estimate of the population regression
function (PRF), because the wrong functional form was adopted from the outset.
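A small numerical sketch can make the misspecification concrete. It assumes a hypothetical noise-free 'true' model, lnW = 1.0 + 0.2 S - 0.005 S², over S = 0, ..., 20 (all values are illustrative), fits the linear model to it, and compares the single estimated return with the true marginal return at different schooling levels. The helper `ols_simple` is written for this example.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical 'true' model without noise: lnW = 1.0 + 0.2*S - 0.005*S**2.
beta0, beta1, beta2 = 1.0, 0.2, -0.005
S = [float(s) for s in range(21)]            # 0 to 20 years of schooling
lnW = [beta0 + beta1 * s + beta2 * s ** 2 for s in S]

# Slope of the misspecified linear fit: one 'rate of return' for everyone.
_, constant_return = ols_simple(S, lnW)

# True marginal return, beta1 + 2*beta2*S, at three schooling levels.
true_return = {s: beta1 + 2 * beta2 * s for s in (0, 10, 20)}

print(round(constant_return, 4), true_return)
```

Under these assumptions the linear fit reports a return of about 0.1 at every level, while the true return falls from 0.2 at S = 0 to zero at S = 20.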
IV. Dummy Independent Variables

[Figure: lnW (vertical axis) plotted against S (horizontal axis).]
Dummy variables are 'discrete' and 'qualitative' (e.g., male or female, in the
labour force or not, working under a collective or individual employment
contract, renting or owning your home). Units of measurement are
‘meaningless’. Normally 1 is assigned to the presence of some characteristic or
attribute; 0 for the absence of that characteristic or attribute.
EXAMPLE: A regression model of labour market discrimination by gender.
$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \varepsilon_i$$
where $Y_i$ = annual earnings
      $S_i$ = years of education
      $G_i$ = 1 if the ith person is male
            = 0 if the ith person is female.
There are no special estimation issues as long as the regression meets all the
classical assumptions. Only the nature of the independent variables has changed.
The expected salary of a female is:
$$E(Y_i \mid S_i, G_i = 0) = \beta_0 + \beta_1 S_i$$
The expected salary of a male is:
$$E(Y_i \mid S_i, G_i = 1) = \beta_0 + \beta_1 S_i + \beta_2 = (\beta_0 + \beta_2) + \beta_1 S_i$$
since $E(\varepsilon_i \mid S_i, G_i) = 0$. Testing for discrimination (i.e., $H_0: \beta_2 = 0$) is a test for a
difference in the intercept terms.
Watch for the Dummy Variable Trap: Suppose we estimate the following:
$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 M_i + \varepsilon_i$$
where $F_i$ = 1 if the ith person is female
            = 0 if the ith person is male
      $M_i$ = 1 if the ith person is male
            = 0 if the ith person is female
This is known as the 'Dummy Variable Trap'. We're including redundant
information in the regression. Suppose the sample looks like this:
Constant Fi Mi
1 1 0
1 0 1
1 1 0
1 0 1
1 1 0
1 1 0
1 0 1
The problem is that the two dummies sum to the constant (i.e.,
$F_i + M_i = 1$ for every observation). This is perfect multicollinearity, which
violates Assumption (6). We'll see in Ch8 that the estimated coefficients and
their standard errors can't be computed.
The solution is simple: drop a dummy variable or the constant term.
Rule of Thumb: If you have 'm' categories, then use 'm-1' dummies.
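The trap can be verified directly on the seven-observation sample shown above. The sketch below (function names are illustrative, and the schooling column is left out because the sample table lists only the constant and the two dummies) forms X'X and shows that its determinant is zero, so it cannot be inverted; dropping one dummy restores a nonzero determinant.

```python
# Sample from the notes: columns are Constant, F, M.
rows = [
    (1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 0, 1),
    (1, 1, 0), (1, 1, 0), (1, 0, 1),
]

def cross_product(matrix):
    """Return X'X for a data matrix given as a list of rows."""
    k = len(matrix[0])
    return [[sum(r[i] * r[j] for r in matrix) for j in range(k)] for i in range(k)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# F + M equals the constant in every row: an exact linear dependence.
always_sum_to_constant = all(f + m == c for c, f, m in rows)

xtx_full = cross_product(rows)
det_full = det3(xtx_full)          # zero, so X'X has no inverse

# Dropping M leaves a 2x2 X'X with a nonzero determinant.
xtx_drop = cross_product([(c, f) for c, f, _ in rows])
det_drop = xtx_drop[0][0] * xtx_drop[1][1] - xtx_drop[0][1] * xtx_drop[1][0]

print(always_sum_to_constant, det_full, det_drop)
```

Since the OLS formula requires the inverse of X'X, a zero determinant is exactly why the coefficients and standard errors cannot be computed with both dummies and the constant included.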
Slope dummy variables: We could allow the returns to education to differ by
gender by adding an 'interacted' variable:
$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \beta_3 G_i S_i + \varepsilon_i$$
This is a more 'flexible' specification.
The expected salary of a female is:
$$E(Y_i \mid S_i, G_i = 0) = \beta_0 + \beta_1 S_i$$
The expected salary of a male is:
$$E(Y_i \mid S_i, G_i = 1) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) S_i$$
We now have both a 'composite' intercept term and slope coefficient for males.
If $\beta_2 > 0$, then the male regression line has a higher intercept; if
$\beta_3 > 0$, it also has a steeper slope.
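The composite intercept and slope can be read off by evaluating the conditional expectation for each group. The sketch below uses hypothetical coefficient values (β0 = 10, β1 = 2, β2 = 3, β3 = 0.5, chosen only for illustration) and an assumed function name.

```python
# Hypothetical coefficients (beta0, beta1, beta2, beta3), for illustration only.
beta0, beta1, beta2, beta3 = 10.0, 2.0, 3.0, 0.5

def expected_earnings(s, g):
    """E(Y | S = s, G = g) for the model with an intercept dummy and a slope dummy."""
    return beta0 + beta1 * s + beta2 * g + beta3 * g * s

# Intercept: expected earnings at S = 0. Slope: change for one more year of schooling.
female_intercept = expected_earnings(0, 0)
female_slope = expected_earnings(1, 0) - expected_earnings(0, 0)
male_intercept = expected_earnings(0, 1)
male_slope = expected_earnings(1, 1) - expected_earnings(0, 1)

print(female_intercept, female_slope, male_intercept, male_slope)
```

The male line has intercept β0 + β2 and slope β1 + β3, so testing β3 = 0 asks whether the return to schooling is the same for both groups.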

V. How to Detect the Problem of Adopting a Wrong Functional Form?
Plot the residuals and look for a 'distinct pattern'. If there is a systematic pattern
between $e_i$ and $X_i$, a different functional form is called for. If there is a systematic
pattern between $e_i$ and a dummy variable, a dummy variable is needed.
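A residual check of this kind can be sketched without any plotting. The example below assumes hypothetical noise-free data from a quadratic relationship, lnW = 1.0 + 0.2 S - 0.005 S² (values are illustrative), fits a straight line, and prints the sign of each residual in order of S. The helper `ols_simple` is written for this example.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data generated from a quadratic relationship, no noise.
S = [float(s) for s in range(21)]
lnW = [1.0 + 0.2 * s - 0.005 * s ** 2 for s in S]

# Fit the (misspecified) linear model and compute residuals e_i.
b0_hat, b1_hat = ols_simple(S, lnW)
residuals = [y - (b0_hat + b1_hat * s) for s, y in zip(S, lnW)]

# Sign of each residual, ordered by S: a run pattern signals a wrong functional form.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

The residuals are negative at low S, positive in the middle, and negative again at high S. With a correct functional form the signs would show no such run pattern.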
VI. Questions for Discussion: Q7.9
VII. Computing Exercise: Q7.16 (Johnson, Ch 7)
