Lecture 8 - Limited Dependent Var PDF

APPLIED ECONOMETRICS
Qualitative Response Regression Model
NOVEMBER 1, 2019
Khoirunurrofik
Topics
• Week 13 (Dec 4, 13.30): Time Series: Introduction. Stochastic process, unit root stochastic process; test of stationarity, the unit root test; cointegration test, error correction model (ECM).
• Week 8 (Nov 1, 09.00): Qualitative Response Regression Model. Introduction: qualitative response / qualitative dependent variable; logit estimation, probit estimation; interpretation: binomial dependent variable.
• Week 9 (Nov 7, 13.30): Linear Simultaneous Equation Model. Econometric model, economic model; structural/behavioral model; identification: order, identification: rank; case study.
• Week 10 (Nov 12, 13.30): Linear Simultaneous Equation Estimation and Interpretation. Indirect least squares, instrumental variables; 2SLS, 3SLS, interpretation and simulation; case study.
• Week 11 (Nov 13, 13.30): Pooling Time Series and Cross Sectional Data Model. Econometric model, generating data; individual effect: fixed effect and random effect; coefficient specification: common and specific; case study.
• Week 12 (Dec 2, 8.00): Pooling Time Series and Cross Sectional Data: Model Selection. Fixed and random effect selection: Hausman test; case study.
• Week 14 (Dec 9, 8.00): Presentation.
Grading Policy
• Quiz = 10
• Paper + Presentation = 40
Agenda
• Linear Probability Model
• Probit Model
• Logit Model
Introduction
• In particular, researchers analyzing consumer choice
often must cope with dummy dependent variables
(also called qualitative dependent variables).
• For example, how do high school students decide
whether to go to college? What distinguishes Pepsi
drinkers from Coke drinkers? How can we convince
people to use public transportation instead of
driving?
• For an econometric study of these topics, or of any
topic that involves a discrete choice of some sort, the
dependent variable is typically a dummy variable.
Models with Binary Dependent Variables
• Many of the choices that individuals and firms make
are ‘‘either–or’’ in nature.
• A high school graduate decides either to attend college or not.
• A worker decides either to drive to work or to get there using a
different means of transportation.
• A household decides either to purchase a house or to rent.
• A firm decides either to advertise its product in a local newspaper
or it decides not to.
• As economists we are interested in explaining why

particular choices are made, and what factors enter
into the decision process.
• We also want to know how much each factor affects
the outcome.
• Such questions lead us to the problem of constructing a
statistical model of binary, either–or, choices.
• Such choices can be represented by a binary (indicator)
variable that takes the value 1 if one outcome is chosen
and the value 0 otherwise.
• The binary variable describing a choice is the
dependent variable rather than an independent
variable.
• This fact affects our choice of a statistical model.
Examples:
▪ An economic model explaining why some individuals take a second, or
third, job and engage in “moonlighting.”
▪ An economic model of why the federal government awards development
grants to some large cities and not others.
▪ An economic model explaining why someone is in the labour force or
not
▪ An economic model explaining why some loan applications are accepted
and others not at a large metropolitan bank.
▪ An economic model explaining why some individuals vote “yes” for
increased spending in a school board election and others vote “no.”
▪ An economic model explaining why some female college students
decide to study engineering and others do not.
Examples:
• A consumer decides whether to buy a car or not
• A commuter chooses a particular mode of transportation from several available ones
• A worker decides whether to take a job offer or not
• A worker decides whether to be a member of union or not
• A student decides whether to go to college or not
• A firm decides whether to install a new factory or not
• A driver decides which route to take
• A firm decides to enter a market
Illustration : Binary choice models using an important
problem from transportation economics.
• How can we explain an individual’s choice between driving (private
transportation) and taking the bus (public transportation) when
commuting to work, assuming, for simplicity, that these are the only two
alternatives?
• Represent an individual's choice by the indicator variable

$$y = \begin{cases} 1 & \text{individual drives to work} \\ 0 & \text{individual takes bus to work} \end{cases}$$

• This works as long as the two options are mutually exclusive and exhaust the possibilities.
• If the probability that an individual drives to work is $p$, then $P[y = 1] = p$. It follows that the probability that a person uses public transportation is $P[y = 0] = 1 - p$.

$$f(y) = p^{y}(1 - p)^{1 - y}, \quad y = 0, 1; \qquad E(y) = p; \qquad \mathrm{var}(y) = p(1 - p)$$

• What factors might affect the probability that an individual chooses one
transportation mode over the other?
• One factor will certainly be how long it takes to get to work one way or
the other. Define the explanatory variable
x = (commuting time by bus − commuting time by car).
• There are other factors that affect the decision, but let us focus on this
single explanatory variable.
• A priori we expect that as x increases, and commuting time by bus
increases relative to commuting time by car, an individual would be more
inclined to drive.
• That is, we expect a positive relationship between x and p, the probability
that an individual will drive to work.
Linear Probability Model
Break the dependent variable into fixed and random parts:

$$y = E(y) + e = p + e$$

We are assuming that the probability of driving is related to the difference in driving times, $x$, in the transportation example:

$$E(y) = p = \beta_1 + \beta_2 x$$

$$y = E(y) + e = \beta_1 + \beta_2 x + e$$
• One problem with the linear probability model is that the error
term is heteroskedastic; the variance of the error term e varies
from one observation to another
y value | e value | Probability
1 | $1 - (\beta_1 + \beta_2 x)$ | $p = \beta_1 + \beta_2 x$
0 | $-(\beta_1 + \beta_2 x)$ | $1 - p = 1 - (\beta_1 + \beta_2 x)$

$$\mathrm{var}(e) = (\beta_1 + \beta_2 x)(1 - \beta_1 - \beta_2 x)$$
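• Using generalized least squares, the estimated variance is:

$$\hat{\sigma}_i^2 = \widehat{\mathrm{var}}(e_i) = (b_1 + b_2 x_i)(1 - b_1 - b_2 x_i)$$

• Transformed variables and model:

$$y_i^* = y_i / \hat{\sigma}_i, \qquad x_i^* = x_i / \hat{\sigma}_i$$

$$y_i^* = \beta_1 \hat{\sigma}_i^{-1} + \beta_2 x_i^* + e_i^*$$

So the problem of heteroskedasticity is not insurmountable.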
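The weighting step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the lecture's own code: the function names, the first-stage estimates `b1` and `b2`, and the data are all hypothetical. Observations whose fitted probability falls outside (0, 1) have no usable variance estimate; skipping them, as done here, is one simple workaround assumed for the sketch.

```python
import math

def lpm_sigma(b1, b2, x):
    """Estimated error std. dev. sqrt(p_hat * (1 - p_hat)), or None if p_hat is outside (0, 1)."""
    p_hat = b1 + b2 * x
    if not 0.0 < p_hat < 1.0:
        return None  # the variance estimate would be zero or negative
    return math.sqrt(p_hat * (1.0 - p_hat))

def fgls_transform(b1, b2, data):
    """Return rows (y*, 1/sigma, x*) for observations with a usable variance estimate."""
    rows = []
    for y, x in data:
        sigma = lpm_sigma(b1, b2, x)
        if sigma is None:
            continue  # assumption: drop observations with p_hat outside (0, 1)
        rows.append((y / sigma, 1.0 / sigma, x / sigma))
    return rows

# Hypothetical first-stage OLS estimates and (y, x) pairs, for illustration only.
b1, b2 = 0.5, 0.07
data = [(1, 2.0), (0, -3.0), (1, 10.0)]
transformed = fgls_transform(b1, b2, data)
print(transformed)
```

Regressing the first column on the other two (without an additional constant) would then give the weighted estimates.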
$$\hat{p} = b_1 + b_2 x, \qquad \frac{dp}{dx} = \beta_2$$
Problems:
• It implies marginal effects of changes in continuous explanatory variables
are constant, which cannot be the case for a probability model.
• This feature also can result in predicted probabilities outside the [0, 1]
interval.
• The linear probability model error term is heteroskedastic, so that a
better estimator is generalized least squares.
• $R^2$ is usually very low and is a questionable guide to goodness of fit.
Problems of the LPM
1. $R^2$ is not an accurate measure of overall fit. For models with a dummy dependent variable, $R^2$ tells us very little about how well the model explains the choices of the decision makers.
2. $\hat{D}_i$ is not bounded by 0 and 1. Since $D_i$ is a dummy variable, we would expect $\hat{D}_i$ to be limited to the range from 0 to 1, but the fitted values of a linear model can fall outside it.
3. The error term is neither homoskedastic nor normally distributed, mainly because $D_i$ takes on only two values (0 and 1).
Example
• We are interested in examining the labour force
participation decision of adult females.
• The question is: why do some women enter the
labour force while others do not?
• Labour economics suggests that the decision to go
out to work or not is a function of the
unemployment rate, average wage rate, level of
education, family income, age and so on.
• However, for simplicity, we assume that the decision
to go out to work or not is affected by only one
explanatory variable (X2i) – the level of family
income.
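Example: LPM
• The linear probability model is $D_i = \beta_1 + \beta_2 X_{2i} + u_i$, where $D_i = 1$ if the $i$th individual is working and 0 otherwise.
• The expected value of $D_i$ is equal to the probability that the $i$th individual is working: $E(D_i) = P_i$.
• Non-normality and heteroskedasticity of the disturbances: since the variance of the disturbances depends on $P_i$, which differs for every individual according to their level of family income, the disturbance is heteroskedastic.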
Data Working
• use "working.dta"
• regress dummy fam_inc
• predict dumhat
• graph twoway (scatter dumhat dummy fam_inc)
Exercise : Use data transport

$$\hat{p} = b_1 + b_2 x$$

auto | Coef.
dtime | 0.0703099
_cons | 0.4847951

dtime | $\hat{p}$
5 | 0.8363446
10 | 1.1878941
As x increases, the probability of driving continues to increase at a constant rate.
However, since 0 ≤ p ≤ 1, a constant rate of increase is impossible. To overcome this problem
we consider the nonlinear probit model.
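The out-of-range prediction can be checked with a few lines of Python. This sketch only reuses the two coefficients reported in the output above; the function and variable names are illustrative.

```python
def lpm_predict(b1, b2, x):
    """Fitted value of the linear probability model: p_hat = b1 + b2 * x."""
    return b1 + b2 * x

# _cons and dtime coefficients from the regression output above.
B1, B2 = 0.4847951, 0.0703099

# dtime is measured in 10-minute units.
predictions = {d: lpm_predict(B1, B2, d) for d in (0, 5, 10)}

# Fitted "probabilities" that leave the [0, 1] interval.
out_of_range = [d for d, p in predictions.items() if p < 0.0 or p > 1.0]
print(predictions, out_of_range)
```

Each extra unit of dtime adds the same 0.0703 to the fitted value, so a large enough differential always pushes the prediction past 1.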
Pros and Cons for the LPM
Probit Model
• To keep the choice probability p within the interval [0, 1], a
nonlinear S-shaped relationship between x and p can be
used.
• As x increases, the probability curve rises rapidly at first, and
then begins to increase at a decreasing rate.
• The slope of this curve gives the change in probability given a
unit change in x.
• The slope is not constant as in the linear probability model.
• One argument for preferring the probit model to the logit model is that
many economic variables are assumed to follow the normal distribution,
so it is natural to examine them through the cumulative normal
distribution.
Probit Model :
Standard normal cumulative distribution function (1).
Standard normal probability density function (2).
Probit Model
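• The probit function is related to the standard normal probability distribution.
• If $Z$ is a standard normal random variable, then its probability density function is

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-0.5 z^2}$$

• The probit function is the corresponding cumulative distribution function:

$$\Phi(z) = P[Z \le z] = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-0.5 u^2}\, du$$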
Probit Model
• The probit statistical model expresses the probability $p$ that $y$ takes the value 1 to be

$$p = P[Z \le \beta_1 + \beta_2 x] = \Phi(\beta_1 + \beta_2 x)$$

• The probit model is said to be nonlinear because $p$ is a nonlinear function of $\beta_1$ and $\beta_2$.
• If β1 and β2 were known, we could use p to find the
probability that an individual will drive to work.
However, since these parameters are not known, we
will estimate them.
Probit Model : Interpretation
• Marginal effect of a one-unit change in $x$ on the probability that $y = 1$:

$$\frac{dp}{dx} = \frac{d\Phi(t)}{dt}\,\frac{dt}{dx} = \phi(\beta_1 + \beta_2 x)\,\beta_2, \qquad t = \beta_1 + \beta_2 x$$

• Since $\phi(\beta_1 + \beta_2 x)$ is a probability density function its value is always positive. Consequently the sign of $dp/dx$ is determined by the sign of $\beta_2$. In the transportation problem we expect $\beta_2$ to be positive so that $dp/dx > 0$; as $x$ increases we expect $p$ to increase.
• The standard normal density reaches its maximum at zero, so the marginal effect is largest when $\beta_1 + \beta_2 x = 0$. In this case $p = \Phi(0) = 0.5$ and an individual is equally likely to choose car or bus transportation.
• On the other hand, if $\beta_1 + \beta_2 x$ is large, say near 3, then the probability that the individual chooses to drive is very large and close to 1. In this case a change in $x$ will have relatively little effect since $\phi(\beta_1 + \beta_2 x)$ will be nearly 0.
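To make the two cases above concrete, the following Python sketch evaluates the probit marginal effect at an index of 0 and at an index of 3. The slope `beta2 = 1.0` is a hypothetical value chosen only so that the marginal effect equals the density itself; the function names are illustrative.

```python
import math

def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_marginal_effect(index, beta2):
    """dp/dx evaluated at a given value of the index beta1 + beta2 * x."""
    return norm_pdf(index) * beta2

beta2 = 1.0  # hypothetical slope, for illustration only

me_at_0 = probit_marginal_effect(0.0, beta2)  # index 0: p = 0.5, effect at its largest
me_at_3 = probit_marginal_effect(3.0, beta2)  # index 3: p close to 1, effect close to 0
print(norm_cdf(0.0), me_at_0)
print(norm_cdf(3.0), me_at_3)
```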
Marginal Effects
Criteria Evaluation
• We can use either the Lagrange Multiplier (LM) test,
the Likelihood Ratio (LR) test, or the Wald test to
test multiple exclusion restrictions.
• The percentage of correct predictions for each
outcome.
• Pseudo $R^2$ is used as a measure of goodness of fit for
binary response models; it depends on the values of
the likelihood function.
Exercise : Transportation Problem
• Ben-Akiva and Lerman have sample data on automobile and
public transportation travel times and the alternative chosen
for N = 21 individuals.
• The complete set of data is in the file transport.dat. In the
data file, AUTO is an indicator variable taking the value one
if automobile transportation is chosen and zero otherwise.
• The data set also includes the variables AUTOTIME and
BUSTIME, which are commuting times, in minutes.
• The explanatory variable we consider is
DTIME = (BUSTIME − AUTOTIME) ÷ 10, which is the commuting time differential in
10-minute increments.
Use Working Data
Example : TRANSPORT
• The values in parentheses below the parameter estimates are estimated

standard errors that are valid in large samples.
• These standard errors can be used to carry out hypothesis tests and
construct interval estimates in the usual way, with the qualification that
they are valid in large samples.
• The negative sign of β1 implies that when commuting times via bus and
auto are equal so DTIME = 0, individuals have a bias against driving to
work, relative to public transportation, though the estimated coefficient
is not statistically significant.
• The positive sign of β2 indicates that an increase in public transportation
travel time, relative to auto travel time, increases the probability that an
individual will choose to drive to work, and this coefficient is statistically
significant.
Probit : Marginal Effect
• Estimate the marginal effect of increasing public
transportation time, given that travel via public
transportation currently takes 20 minutes longer
than auto travel (DTIME = 2).
• For the probit probability model, an incremental (10-minute)

increase in the travel time via public transportation increases
the probability of travel via auto by approximately 0.1037,
given that taking the bus already requires 20 minutes more
travel time than driving
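The calculation behind this figure can be sketched in Python. The estimated coefficients are not printed in the text above, so the values below are an assumption: −0.0644 for the intercept and 0.3000 for DTIME, the estimates commonly reported for this dataset, which reproduce the quoted marginal effect to four decimals. The function names are illustrative.

```python
import math

def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_marginal_effect(b1, b2, x):
    """dp/dx = phi(b1 + b2 * x) * b2 for the probit model."""
    return norm_pdf(b1 + b2 * x) * b2

# ASSUMED probit estimates (not shown in the text above).
B1_HAT, B2_HAT = -0.0644, 0.3000

# DTIME is in 10-minute units, so a 20-minute differential is DTIME = 2.
me_at_2 = probit_marginal_effect(B1_HAT, B2_HAT, 2.0)
print(round(me_at_2, 4))
```

Because DTIME is in 10-minute units, the result is the approximate change in the probability of driving per additional 10 minutes of bus time.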
Probit : Prediction
• The estimated parameters of the probit model can also be
used to ‘‘predict’’ the behavior of an individual who must
choose between auto and public transportation to travel to
work.
• Suppose an individual faces a situation in which it takes 30
minutes longer to take public transportation than to drive to
work (DTIME = 3).
• Since the estimated probability that the individual will

choose to drive to work is 0.7983, which is greater than 0.5,
we ‘‘predict’’ that when public transportation takes 30
minutes longer than driving to work, the individual will
choose to drive
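A short Python sketch of this prediction rule follows. As an assumption, it uses probit estimates of −0.0644 (intercept) and 0.3000 (DTIME), which are not printed in the text above but reproduce the quoted probability; the 0.5 cut-off is the one used in the text, and the function names are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_predict(b1, b2, x, threshold=0.5):
    """Return (estimated probability that y = 1, predicted choice as 0 or 1)."""
    p = norm_cdf(b1 + b2 * x)
    return p, int(p > threshold)

# ASSUMED probit estimates (not shown in the text above).
B1_HAT, B2_HAT = -0.0644, 0.3000

# A 30-minute differential corresponds to DTIME = 3.
p_drive, choice = probit_predict(B1_HAT, B2_HAT, 3.0)
print(round(p_drive, 4), choice)
```

Here `choice == 1` stands for driving; with the bus 30 minutes faster instead (DTIME = −3) the same rule predicts the bus.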
Marginal Effect at Mean
The average travel time differential is DTIME = −0.1224 (about −1.2 minutes), and
for this value the marginal effect of a 10-minute increase in the travel time
differential is 0.1191.
When the mean difference in travel time is near zero, the effect of a
change in travel time difference is greater.
Average Marginal Effect
Rather than evaluate the marginal effect at a specific value, or the mean
value, the average marginal effect (AME) is often considered
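The difference between the marginal effect at the mean and the average marginal effect can be sketched as follows. Everything here is illustrative: the coefficients are assumed values (−0.0644 and 0.3000, not printed in the text above) and the five DTIME values are made up, not the lecture's data.

```python
import math

def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_effect_at_mean(b1, b2, xs):
    """Evaluate phi(b1 + b2 * x) * b2 once, at the sample mean of x."""
    x_bar = sum(xs) / len(xs)
    return norm_pdf(b1 + b2 * x_bar) * b2

def average_marginal_effect(b1, b2, xs):
    """Evaluate phi(b1 + b2 * x_i) * b2 for every observation, then average."""
    effects = [norm_pdf(b1 + b2 * x) * b2 for x in xs]
    return sum(effects) / len(effects)

B1_HAT, B2_HAT = -0.0644, 0.3000      # ASSUMED probit estimates
dtime = [-5.0, -1.0, 0.0, 2.0, 6.0]   # HYPOTHETICAL sample, in 10-minute units

mem = marginal_effect_at_mean(B1_HAT, B2_HAT, dtime)
ame = average_marginal_effect(B1_HAT, B2_HAT, dtime)
print(round(mem, 4), round(ame, 4))
```

In this made-up sample the AME is smaller than the effect at the mean, because observations far from zero contribute small individual effects.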
Logit Model
• Probit model estimation is numerically complicated
because it is based on the normal distribution.
• A frequently used alternative to the probit model for
binary choice situations is the logit model.
• These models differ only in the particular S-shaped
curve used to constrain probabilities to the [0, 1]
interval.
• If $L$ is a logistic random variable, then its probability
density function is

$$\lambda(l) = \frac{e^{-l}}{(1 + e^{-l})^2}, \qquad -\infty < l < \infty$$
Differences between logit and probit
probabilities
Logit Model
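• The cumulative distribution function for a logistic random variable is

$$\Lambda(l) = P[L \le l] = \frac{1}{1 + e^{-l}}$$

• In the logit model, the probability $p$ that the observed value $y$ takes the value 1 is

$$p = P[L \le \beta_1 + \beta_2 x] = \Lambda(\beta_1 + \beta_2 x)$$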
Logit Model
• The probability that $y = 1$ can be written as

$$p = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x)}} = \frac{\exp(\beta_1 + \beta_2 x)}{1 + \exp(\beta_1 + \beta_2 x)}$$

• The probability that $y = 0$ can be written as

$$1 - p = \frac{1}{1 + \exp(\beta_1 + \beta_2 x)}$$
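The two ways of writing the logit probability, and the expression for the complementary probability, can be checked numerically. The sketch below uses hypothetical coefficient values; the function names are illustrative.

```python
import math

def logit_p(b1, b2, x):
    """P(y = 1) written as 1 / (1 + exp(-(b1 + b2 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * x)))

def logit_p_alt(b1, b2, x):
    """The same probability written as exp(index) / (1 + exp(index))."""
    e = math.exp(b1 + b2 * x)
    return e / (1.0 + e)

def logit_q(b1, b2, x):
    """P(y = 0) = 1 / (1 + exp(b1 + b2 * x))."""
    return 1.0 / (1.0 + math.exp(b1 + b2 * x))

b1, b2 = -0.2, 0.5  # hypothetical coefficients, for illustration only

# Largest discrepancy between the two forms, and from p + q = 1, on a small grid.
max_form_gap = max(abs(logit_p(b1, b2, x) - logit_p_alt(b1, b2, x)) for x in range(-5, 6))
max_sum_gap = max(abs(logit_p(b1, b2, x) + logit_q(b1, b2, x) - 1.0) for x in range(-5, 6))
print(max_form_gap, max_sum_gap)
```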
General Approach
• In the linear probability model, we saw that the dependent
variable Di on the left-hand side, which reflects the
probability Pi, can take any real value and is not limited to
being in the correct range of probabilities – the (0,1) range.
• A simple way to resolve this problem involves the following
two steps. First, transform the dependent variable, $D_i$, as
follows, introducing the concept of odds:

$$\text{odds}_i = \frac{P_i}{1 - P_i}$$

Here, $\text{odds}_i$ is defined as the ratio of the
probability of success to its complement (the
probability of failure).
Using the labour force participation example, if
the probability for an individual to join the
labour force is 0.75, then the odds are
0.75/0.25 = 3/1, that is, three to one
that the individual is working.
General Approach
• The second step involves taking the natural logarithm of the
odds, calculating the logit, $L_i$, as:

$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right)$$
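The two steps (odds, then log of the odds) and their inverse can be sketched in Python, using the 0.75 probability from the labour force example. The function names are illustrative.

```python
import math

def odds(p):
    """Ratio of the probability of success to the probability of failure."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the odds; maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

def inverse_logit(l):
    """Map a logit value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-l))

p_work = 0.75
print(odds(p_work), logit(p_work), inverse_logit(logit(p_work)))
```

A probability of 0.5 gives odds of one and a logit of zero; probabilities above 0.5 give positive logits and those below give negative ones.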
General Approach
• Therefore, we see that the logit model maps probabilities from the range (0,1) to the entire real line.
• We can see that $\hat{D}_i$ asymptotically approaches 1 and 0 in the two extreme cases.
• The S-shaped curve is known as a sigmoid curve, and functions of this type are called sigmoid functions.
• Estimation of the logit model is done by using the maximum-likelihood method.
• This method is an iterative estimation technique that is particularly useful for equations that are non-linear in the coefficients.
Interpretation of the estimates in logit models
• After estimating a logit model, the regular hypothesis testing analysis can
be undertaken using the z-statistics obtained.
• However, the interpretation of the coefficients is totally different from
that of regular OLS.
• Given this, the coefficient β2 obtained from a logit model estimation
shows the change in Li = ln(Pi/(1 − Pi)) for a unit change in X, which has
no particular meaning.
Interpretation of the estimates in logit models :
Calculate the change in average $\hat{D}_i$:
• To do this, first insert the mean values of all the explanatory variables
into the estimated logit equation and calculate the average $\hat{D}_i$.
• Then recalculate, but now increasing the value of the explanatory variable
under examination by one unit to obtain the new average $\hat{D}_i$.
• The difference between the two values of $\hat{D}_i$ obtained shows the impact of a
one-unit increase in that explanatory variable on the probability that $D_i = 1$
(keeping all other explanatory variables constant).
• This approach should be used cautiously when one or more of the
explanatory variables is also a dummy variable (for example how can
anyone define the average of gender?). When dummies of this kind (for
example gender) exist in the equation, the methodology used is to
calculate first the impact for an ‘average male’ and then the impact for an
‘average female’ (by setting the dummy explanatory variable first equal to
one and then equal to zero) and comparing the two results.
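The procedure above, including the separate treatment of a dummy regressor, can be sketched in Python. All numbers are hypothetical: the coefficient values, the variable names `income` and `female`, and the mean income of 4.0 are made up for illustration and are not estimates from the lecture's data.

```python
import math

def logit_prob(coefs, values):
    """Predicted probability from a logit equation at the given regressor values."""
    index = coefs["const"] + sum(coefs[name] * v for name, v in values.items())
    return 1.0 / (1.0 + math.exp(-index))

def unit_increase_effect(coefs, values, name):
    """Change in predicted probability when one regressor rises by one unit."""
    bumped = dict(values)
    bumped[name] += 1.0
    return logit_prob(coefs, bumped) - logit_prob(coefs, values)

# HYPOTHETICAL logit estimates and sample mean.
coefs = {"const": -2.0, "income": 0.5, "female": 0.8}
mean_income = 4.0

# The dummy is set to 0 and then to 1, rather than to its sample mean.
effect_male = unit_increase_effect(coefs, {"income": mean_income, "female": 0.0}, "income")
effect_female = unit_increase_effect(coefs, {"income": mean_income, "female": 1.0}, "income")
print(round(effect_male, 4), round(effect_female, 4))
```

The two effects differ because the same one-unit change in income is evaluated at different points on the S-shaped curve.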
Interpretation of the estimates in logit models : Take the
partial derivative
Interpretation of the estimates in logit models :
Multiply the obtained βj coefficients by 0.25
• A simpler, but less accurate, method is to multiply
the coefficients obtained from the logit model by
0.25 and use this for the interpretation of the
marginal effect. This comes from substituting the value
$\hat{D}_i = 0.5$ into the partial derivative
$\hat{\beta}_j \hat{D}_i (1 - \hat{D}_i)$, which gives $0.25\,\hat{\beta}_j$.
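The 0.25 rule can be checked numerically. The Python sketch below compares the analytic logit marginal effect, beta times p(1 − p), with a finite-difference slope of the logistic curve, and shows how far the rule of thumb is off when the probability is not near 0.5. The coefficient values are hypothetical and the function names are illustrative.

```python
import math

def logistic(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(beta, p):
    """Analytic marginal effect of x on P(y = 1) in a logit model: beta * p * (1 - p)."""
    return beta * p * (1.0 - p)

def numeric_slope(beta0, beta, x, h=1e-6):
    """Central finite-difference approximation to dP/dx."""
    return (logistic(beta0 + beta * (x + h)) - logistic(beta0 + beta * (x - h))) / (2.0 * h)

beta0, beta = 0.0, 0.8  # hypothetical coefficients, for illustration only

p_mid = logistic(beta0 + beta * 0.0)            # index 0, so p = 0.5
analytic = logit_marginal_effect(beta, p_mid)   # beta * 0.5 * 0.5
numeric = numeric_slope(beta0, beta, 0.0)
rule_of_thumb = 0.25 * beta

p_far = logistic(beta0 + beta * 3.0)            # probability well above 0.5
analytic_far = logit_marginal_effect(beta, p_far)
print(analytic, numeric, rule_of_thumb, analytic_far)
```

Since p(1 − p) never exceeds 0.25, the rule of thumb gives an upper bound on the size of the marginal effect.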
Goodness of Fit
Goodness of Fit : McFadden’s pseudo-R2
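McFadden's pseudo-$R^2$ compares the log-likelihood of the fitted model with that of a model containing only a constant: one minus the ratio of the two. A minimal Python sketch follows; the outcomes and fitted probabilities are made up for illustration, and the function names are hypothetical.

```python
import math

def log_likelihood(ys, ps):
    """Binary-choice log-likelihood: sum of ln(p) where y = 1 and ln(1 - p) where y = 0."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y, p in zip(ys, ps))

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(constant-only model)."""
    return 1.0 - ll_model / ll_null

# HYPOTHETICAL outcomes and fitted probabilities from some binary-choice model.
ys = [1, 0, 1, 1, 0]
ps = [0.8, 0.3, 0.7, 0.9, 0.2]

# The constant-only model predicts the sample share of ones for everyone.
share = sum(ys) / len(ys)
ll_model = log_likelihood(ys, ps)
ll_null = log_likelihood(ys, [share] * len(ys))
r2 = mcfadden_r2(ll_model, ll_null)
print(round(ll_model, 4), round(ll_null, 4), round(r2, 4))
```

The measure is zero when the regressors add nothing to the constant-only fit and approaches one as the fitted probabilities approach the observed outcomes.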
Use Working Data
The logistic function is indeed fitting the data in an appropriate way through its
sigmoid form.
An Empirical Example From Marketing
• Consider a linear probability model for the choice between Coke
and Pepsi. Here we compare the linear probability model with the probit
and logit models for this binary choice.
• The dependent variable COKE takes the value one if Coke is chosen and
zero if Pepsi is chosen. Its expected value is E(COKE) = $p_{COKE}$ = probability
that Coke is chosen.
• We use the relative price of Coke to Pepsi (PRATIO) as an explanatory
variable, as well as DISP_COKE and DISP_PEPSI, which are indicator
variables taking the value one if the respective store display is present
and zero if it is not.
• We expect that the presence of a Coke display will increase the
probability of a Coke purchase, and the presence of a Pepsi display will
decrease the probability of a Coke purchase.
The Probit and Logit Model
Data : Coke
Linear Probability Model : Estimation
Linear Probability Model : Prediction
Probit Model : Estimation
Probit Model : Classification
We find that of the 510 consumers who chose COKE, 247 were correctly predicted.
Probit Model : Average Marginal Effect
Probit Model : Prediction
Logit Model : Estimation
Logit Model : Classification
Of the 630 who chose PEPSI, 507 were correctly predicted.
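The counts quoted on the classification slides translate into shares of correct predictions as sketched below. The per-outcome shares use only the counts given in the text. The overall share is an assumption: it is meaningful only if both counts come from the same classification table, whereas the text reports them under the probit and logit slides respectively.

```python
def share_correct(correct, total):
    """Fraction of observations in a group whose outcome was predicted correctly."""
    return correct / total

# Counts quoted in the text.
coke_correct, coke_total = 247, 510
pepsi_correct, pepsi_total = 507, 630

coke_rate = share_correct(coke_correct, coke_total)
pepsi_rate = share_correct(pepsi_correct, pepsi_total)

# ASSUMPTION: both counts belong to one classification table.
overall_rate = share_correct(coke_correct + pepsi_correct, coke_total + pepsi_total)
print(round(coke_rate, 4), round(pepsi_rate, 4), round(overall_rate, 4))
```

The shares show that Pepsi choices are predicted correctly far more often than Coke choices.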
Logit Model : Average Marginal Effect
Logit Model : Prediction
Table Comparison
• esttab lpm probit logit, se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) ///
  scalars(ll_0 ll chi2) gaps mtitles("LPM" "probit" "logit") ///
  title("Coke-Pepsi Choice Models")
Result Analysis (1)
• Suppose that PRATIO = 1.1, indicating that the price
of Coke is 10% higher than the price of Pepsi, and no
store displays are present.
• Using the linear probability model, the predicted
probability of Coke choice is 0.4493 with standard
error 0.0202.
• Using probit the predicted probability is 0.4394 with
standard error 0.0218, and for logit the predicted
probability is 0.4323 with standard error 0.0224.
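One way to compare the three predictions is to put an approximate 95% interval around each, using the reported standard errors. The Python sketch below does this; the 1.96 critical value assumes the large-sample normal approximation, and the function name is illustrative.

```python
def interval(estimate, se, z=1.96):
    """Approximate large-sample interval: estimate +/- z * standard error."""
    return (estimate - z * se, estimate + z * se)

# Predicted probabilities and standard errors quoted in the text.
predictions = {
    "LPM": (0.4493, 0.0202),
    "probit": (0.4394, 0.0218),
    "logit": (0.4323, 0.0224),
}

intervals = {name: interval(p, se) for name, (p, se) in predictions.items()}
for name, (lo, hi) in intervals.items():
    print(name, round(lo, 4), round(hi, 4))
```

The three intervals overlap substantially, so the models give similar predictions for this scenario.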
Result Analysis (2)
• In the linear probability model the marginal effect of
PRATIO is −0.4009. This does not depend on the
values of the variables.
• For the probit model the average marginal effect
(AME) of PRATIO is −0.4097 with standard error
0.0616.
• For the logit model the average marginal effect
(AME) of PRATIO is −0.4333 with standard error
0.0639.
Result Analysis (3)
• The average marginal effects from the probit and
logit models are not too different from that implied
by the linear probability model.
• If we examine specific scenarios, then differences
appear. For example, suppose PRATIO = 1.1,
indicating that the price of Coke is 10% higher than
the price of Pepsi, and no store displays are present.
• The marginal effect of PRATIO from the probit
model is −0.4519, with standard error 0.0703.
• For logit the marginal effect of PRATIO is estimated
to be −0.4898 with standard error 0.0753.
