Applied Econometrics
Winter Term 2020/21
Prof. Dr. Simone Maxand
Humboldt University Berlin
5.1 Introduction 2 | 144
Contents I
5.1 Introduction
5.2 Binary response models
5.2.1 Introduction & model formulation
5.2.2 Probit and logit models
5.2.3 Maximum likelihood estimation
5.2.4 Model diagnostics
5.3 Limited dependent variables
5.3.1 Introduction
5.3.2 Truncation and censoring
5.3.3 Truncated regression model
5.3.4 Censored regression model
5.A Literature
5.1 Introduction
What is Microeconometrics?
I Analysis of individual data, i.e. data concerning the behaviour and
attitudes of persons, households or firms.
. Econometric methods to study microeconomic phenomena.
. The underlying model is typically a microeconomic model
where individual decisions and behavior are a function of
exogenous parameters.
I Typical questions: How do individual characteristics affect
. decision to work (or to buy a new product)?
. choice of travel mode (train, bus, car, bike)?
. household purchases of durable goods?
. number of hours worked?
. number of children?
. duration of unemployment?
Applied Econometrics – Chapter 5
[Figures: (1) outcome on a 0 to 1 scale against X: INC; (2) Y: RENT; (3) Y: wage of wives against X: log(family income)]
Truncated data
I Data for (xi , yi ) are not available if yi is above or below a certain
threshold.
I That is, some observations have been systematically excluded from
the sample.
I Example: Sample of (data on) households with income below
$100,000
. The sample necessarily excludes all households with income
above that level. ⇒ No random sample of all households.
I Using truncated sample for investigating relationship between y and
x is potentially misleading (when using a linear model).
I Solution: Truncated regression model
Example 1: Data
I US General Social Survey (GSS):
. Annual or biannual cross-sectional survey (started in 1972)
. Information on no. of children ever born by a woman, etc.
I Number of children is a count variable!
I Alternatively, we could investigate the proportion of childless women
⇒ binary variable!
I Here: Every 4th year 1974 - 2002
I Restriction to women beyond child-bearing age (40 years) to avoid
interfering effect of age:
. “Younger women tend to have fewer children than older women”.
. Otherwise: Consider no. of children for younger women as
censored.
Example 1: Descriptive statistics
I Pool observations over years ⇒ 5150 women (age ≥ 40)
No. of children ever born     Absolute      Relative
to women (age ≥ 40)           frequency     frequency (%)
0                                  744          14.45
1                                  706          13.71
2                                 1368          26.56
3                                 1002          19.46
4                                  593          11.51
5                                  309           6.00
6                                  190           3.69
7                                   89           1.73
8 or more                          149           2.89
Table 1: Fertility distribution
5.2 Binary response models
Bernoulli variable
I Two possible outcomes of y are usually coded by
1 (yes/“success”) and 0 (no/“failure”), i.e.:
$y = 1$ if “event occurred”, otherwise $y = 0$.
I No loss in generality if interest is only in the probability of success.
⇒ With $p = P(y = 1) = 1 - P(y = 0)$:
$$y \sim \mathrm{Bernoulli}(p) = \mathrm{Bin}(1, p)$$
I “Probability of success”
$$p_i := P(y_i = 1 \mid x_i) = E(y_i \mid x_i) = x_i'\beta$$
$$\Rightarrow y_i \sim \mathrm{Bernoulli}(p_i) = \mathrm{Bin}(1, p_i)$$
$$\Rightarrow V(y_i \mid x_i) = p_i(1 - p_i) = x_i'\beta\,(1 - x_i'\beta)$$
I OLSE with robust standard errors (or MLE for $p_i = x_i'\beta$) may serve
as a useful exploratory tool (often: reasonable direct estimation of
average marginal effects and hints at statistically relevant variables).
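The exploratory use of OLS with robust standard errors can be sketched as follows. This is a minimal illustration, assuming one regressor plus an intercept, simulated data, and the HC0 (White) sandwich form as the robust variance; the function name `lpm_ols` and the simulated coefficients 0.2 and 0.5 are hypothetical and not taken from the lecture.

```python
import random

def lpm_ols(x, y):
    """OLS of y on (1, x) with HC0 robust standard errors (illustrative helper)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # (X'X)^{-1} for the design matrix X = [1, x]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # "Meat" of the sandwich: sum_i e_i^2 x_i x_i'
    m00 = m01 = m11 = 0.0
    for xi, yi in zip(x, y):
        e2 = (yi - b0 - b1 * xi) ** 2
        m00 += e2
        m01 += e2 * xi
        m11 += e2 * xi * xi
    meat = [[m00, m01], [m01, m11]]

    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    v = mul(mul(inv, meat), inv)  # (X'X)^{-1} * meat * (X'X)^{-1}
    return (b0, b1), (v[0][0] ** 0.5, v[1][1] ** 0.5)

# Simulated data with P(y = 1 | x) = 0.2 + 0.5 x (assumed values)
random.seed(0)
x = [random.uniform(0, 1) for _ in range(2000)]
y = [1 if random.random() < 0.2 + 0.5 * xi else 0 for xi in x]
coef, se = lpm_ols(x, y)
print("coefficients:", coef, "robust s.e.:", se)
```

Robust standard errors matter here because the conditional variance $p_i(1 - p_i)$ depends on $x_i$, so the errors are heteroskedastic by construction.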
Identification considerations
I Identifiability of β requires: G(z) is strictly increasing, rk(X ) = K .
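I Moreover, mean and variance must be fixed: Let $G$, $\tilde G$ be cdf’s with
associated densities $g$, $\tilde g$, and suppose that
$$Z \sim G, \quad U := (Z + \mu)/\sigma \sim \tilde G \quad (\text{for } \sigma > 0).$$
$$\Rightarrow \tilde G(u) := P(U < u) = G(\sigma u - \mu), \quad \tilde g(u) = \sigma g(\sigma u - \mu)$$
$$\Rightarrow P(y_i = 1 \mid x_i) = G(x_i'\beta) = \tilde G\!\left(\frac{x_i'\beta + \mu}{\sigma}\right) = \tilde G\!\left(\frac{\beta_1 + \mu}{\sigma} + \sum_{j=2}^{K} \frac{\beta_j}{\sigma}\, x_{ij}\right)$$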
I Motivation of threshold 0:
. $\varepsilon_i$, $x_i$ independent with $\varepsilon_i \sim G$ (otherwise, $\varepsilon_i \mid x_i \sim G$) ⇒
$$P(y_i = 1 \mid x_i) = P(y_i^* > 0 \mid x_i) = 1 - G(-x_i'\beta) = G(x_i'\beta) \quad [\text{if } g(z) = g(-z)]$$
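I Again, identification of the single-index model requires a restriction on
$V(\varepsilon_i)$, because $\beta$ is identifiable only up to scaling.
. We observe only whether
$$y_i^* > 0 \Leftrightarrow x_i'\beta + \varepsilon_i > 0 \Leftrightarrow x_i'(\sigma\beta) + (\sigma\varepsilon_i) > 0 \quad (\forall\, \sigma > 0).$$
⇒ Uniqueness is achievable by a restriction on the error variance, e.g.
$$V(\varepsilon_i) = \begin{cases} 1, & \text{in the probit model,} \\ \pi^2/3, & \text{in the logit model.} \end{cases}$$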
Choice of threshold
or $y_i = 0 \Leftrightarrow x_i'\beta + \varepsilon_i \le 0$
$y_i = 0$ if $y_i^* < 0$
$y_i = 1$ if $y_i^* \ge 0$
[Figures: Y against X; second panel with legend entries PROB and Y]
$$\lambda(z; \mu, \kappa) = \frac{d}{dz}\Lambda(z; \mu, \kappa) = \frac{e^{-(z-\mu)/\kappa}}{\kappa\left[1 + e^{-(z-\mu)/\kappa}\right]^2} \quad (\text{density function})$$
Model comparison
I Moments
cdf    Expectation    Variance    Skewness    Kurtosis
Φ      0              1           0           3
Λ      0              π²/3        0           3 + 6/5
[Figures: cdf G(x) and density g(x) of the standard normal distribution and the standardized logistic distribution, for x from −4 to 4]
Comparing parameters
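I For comparing the parameter estimates in both models, a scaling
different from the factor $\pi/\sqrt{3} \approx 1.8$ is recommended.
I The parameters should be scaled such that the maximal effects
(obtained at $x'\beta = 0$) are comparable:
$$\frac{\max_z \phi(z)}{\max_z \lambda(z)} = \frac{\phi(0)}{\lambda(0)} = \frac{\frac{1}{\sqrt{2\pi}}\, e^{0}}{\frac{e^{0}}{(e^{0}+1)^2}} = \frac{4}{\sqrt{2\pi}} \approx 1.6 =: \rho$$
$$\tilde\lambda(z) = \rho \cdot \lambda(\rho z) \quad \text{and} \quad \tilde\lambda(0) = \phi(0)$$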
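The scaling factor can be checked numerically. The sketch below only evaluates the two densities at zero; the helper names `logistic_pdf` and `scaled_logistic_pdf` are illustrative and not part of the lecture.

```python
import math
from statistics import NormalDist

def logistic_pdf(z):
    """Density of the standard logistic distribution (mu = 0, kappa = 1)."""
    return math.exp(-z) / (1.0 + math.exp(-z)) ** 2

phi0 = NormalDist().pdf(0.0)        # phi(0) = 1 / sqrt(2*pi)
rho = phi0 / logistic_pdf(0.0)      # lambda(0) = 1/4, so rho = 4 / sqrt(2*pi)

def scaled_logistic_pdf(z):
    """Rescaled logistic density rho * lambda(rho * z)."""
    return rho * logistic_pdf(rho * z)

print(round(rho, 4))                        # about 1.6
print(scaled_logistic_pdf(0.0), phi0)       # the two maxima coincide
```

Under this scaling one would expect logit coefficients to be roughly 1.6 times the corresponding probit coefficients, as a rule of thumb rather than an exact identity.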
I Again, one can compute the average effect, or the effect for the
average characteristics.
ML estimator (MLE)
I Log-likelihood function:
$$\ell(\beta) = \ell(\beta; y) = \ln[L(\beta)] = \sum_{i=1}^{N} \{y_i \ln(p_i) + (1 - y_i)\ln(1 - p_i)\}$$
$$= \sum_{i=1}^{N} \{y_i \ln[G(x_i'\beta)] + (1 - y_i)\ln[1 - G(x_i'\beta)]\}.$$
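The log-likelihood translates directly into code. The following is a minimal sketch, assuming regressors are stored as lists of rows and the link function G is passed in as a Python callable; the function names and the tiny data set are hypothetical.

```python
import math
from statistics import NormalDist

def logit_cdf(z):
    """Logistic cdf Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal cdf Phi(z)."""
    return NormalDist().cdf(z)

def loglik(beta, X, y, G):
    """Sum of y_i ln G(x_i'b) + (1 - y_i) ln[1 - G(x_i'b)].

    Note: fails with a math domain error if a fitted probability is exactly 0 or 1.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        p = G(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Tiny illustrative data set: intercept plus one regressor
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
print(loglik([0.0, 0.0], X, y, logit_cdf))    # equals 4 * ln(0.5)
print(loglik([-1.5, 1.0], X, y, logit_cdf))   # larger, i.e. a better fit
```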
Numerical procedures
I There is no explicit solution to the likelihood equations.
I However, for both the logit and the probit model the
log-likelihood function $\ell(\beta)$ is globally concave, so that the
MLE is unique whenever it exists.
Fisher information
I The Fisher information is the negative expected Hessian matrix
(here: conditional expectation given X if X is random), i.e.:
$$I(\beta) = -E_\beta[H(\beta)] = -E_\beta\!\left[\frac{\partial s(\beta)}{\partial \beta'}\right] = \sum_{i=1}^{N} \frac{g(x_i'\beta)^2}{p_i(1 - p_i)}\, x_i x_i'.$$
Statistical inference
I For large N, the following approximate distribution can be used:
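$$\hat\beta \approx N_K(\beta, \hat V) \quad (\hat V \text{ suitable estimator of } I(\beta)^{-1}).$$
(a) $\hat V_1 = I(\hat\beta)^{-1}$
(b) $\hat V_2 = \left[\sum_{i=1}^{N} \left.\frac{\partial \ell_i}{\partial\beta}\frac{\partial \ell_i}{\partial\beta'}\right|_{\beta=\hat\beta}\right]^{-1}$
(c) $\hat V_3 = \left[-\left.\frac{\partial^2 \ell}{\partial\beta\,\partial\beta'}\right|_{\beta=\hat\beta}\right]^{-1}$
$$\text{(a)} \Rightarrow \hat V_1 = \left[\sum_{i=1}^{N} \frac{g(x_i'\hat\beta)^2}{\hat p_i(1 - \hat p_i)}\, x_i x_i'\right]^{-1}$$
$$\text{(b)} \Rightarrow \hat V_2 = \left[\sum_{i=1}^{N} \frac{(y_i - \hat p_i)^2}{\hat p_i^2(1 - \hat p_i)^2}\, g(x_i'\hat\beta)^2\, x_i x_i'\right]^{-1}$$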
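The matrices inside the brackets of (a) and (b) can be accumulated observation by observation. The sketch below returns the two un-inverted matrices and leaves the inversion to a linear algebra routine; the function name, the data and the evaluation point are hypothetical, and in practice the parameter vector would be the MLE rather than an arbitrary value.

```python
import math

def info_matrices(beta, X, y, G, g):
    """Return (expected information, outer product of scores) at beta.

    G is the cdf and g its density; both matrices are K x K nested lists.
    """
    K = len(beta)
    expected = [[0.0] * K for _ in range(K)]
    opg = [[0.0] * K for _ in range(K)]
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        p, d = G(z), g(z)
        w_a = d * d / (p * (1.0 - p))                           # weight in (a)
        w_b = (yi - p) ** 2 * d * d / (p * p * (1.0 - p) ** 2)  # weight in (b)
        for r in range(K):
            for c in range(K):
                expected[r][c] += w_a * xi[r] * xi[c]
                opg[r][c] += w_b * xi[r] * xi[c]
    return expected, opg

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_pdf(z):
    p = logit_cdf(z)
    return p * (1.0 - p)

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 1, 1]
expected, opg = info_matrices([0.0, 0.0], X, y, logit_cdf, logit_pdf)
print(expected)   # for the logit this is sum_i p_i (1 - p_i) x_i x_i'
print(opg)
```

For the logit link $g = p(1 - p)$, so the weight in (a) reduces to $p_i(1 - p_i)$ and the expected information coincides with the negative Hessian used in (c).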
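$$\Rightarrow \frac{\partial^2 \ell(\beta)}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{N} p_i(1 - p_i)\, x_i x_i' \quad \text{n.d.}$$
I Assume that $X'X = \sum_{i=1}^{N} x_i x_i'$ is regular (p.d.) and $p_i \in (0, 1)$,
implying $\lambda_i = p_i(1 - p_i) > 0$ ($\forall i$).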
I Note that the Hessian matrix does not depend on y .
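Because the Hessian above has the logit form (weights $p_i(1 - p_i)$) and is negative definite, Newton-Raphson is a natural numerical procedure. The following is a minimal sketch for an intercept and a single regressor, assuming the score $\sum_i (y_i - p_i) x_i$ of the logit model; the function name, the fixed iteration count and the data are hypothetical choices.

```python
import math

def logit_newton(x, y, iters=25):
    """Newton-Raphson for a logit model with intercept b0 and slope b1."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                  # score, intercept component
            g1 += (yi - p) * xi           # score, slope component
            h00 += w                      # entries of sum_i w_i x_i x_i' = -Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (-Hessian)^{-1} * score (2 x 2 inverse by hand)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Illustrative data without perfect separation
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = logit_newton(x, y)
print(b0, b1)
```

A production routine would add a convergence criterion and step-size control; the fixed number of iterations here is only for brevity.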
Probit model
I But the Hessian matrix is here again negative definite, so that there
are generally no problems with the numerical determination of the
MLE.
Perfect prediction
I An MLE does not always exist.
For example, if rank(X ) < K , then the parameter β is not
identifiable (as in the linear case).
Assuming rank(X ) = K (achievable e.g. by a re-parametrization of
the model) avoids that problem.
I However, in a nonlinear binary response model one may be
confronted with the so-called problem of perfect prediction.
. It is typically a problem of the sample at hand and not of
identification.
. It would possibly disappear if more data (or another sample)
were available.
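Perfect prediction can be illustrated with a small artificial sample in which the sign of x separates the outcomes completely. This sketch assumes a logit model without intercept; the data and the function name are hypothetical. The log-likelihood keeps increasing towards 0 as the slope grows, so no finite maximizer exists.

```python
import math

def loglik_slope(c, x, y):
    """Logit log-likelihood for P(y = 1 | x) = Lambda(c * x), no intercept."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = c * xi
        # ln Lambda(z) = -ln(1 + e^{-z});  ln[1 - Lambda(z)] = -ln(1 + e^{z})
        total += -math.log1p(math.exp(-z)) if yi == 1 else -math.log1p(math.exp(z))
    return total

# Perfectly separated sample: y = 1 exactly when x > 0
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]
values = [loglik_slope(c, x, y) for c in (1.0, 5.0, 10.0, 20.0)]
print(values)   # increasing in c and approaching 0 from below
```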
------------------------------------------------------------------------------
   childless |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .0027483   .0025405     1.08   0.279     -.002231    .0077275
        educ |   .0314184   .0071296     4.41   0.000     .0174446    .0453923
       white |   .0625978   .0626362     1.00   0.318    -.0601669    .1853624
        sibs |  -.0117455   .0071229    -1.65   0.099    -.0257061    .0022152
       _cons |     -1.503   .1157881   -12.98   0.000    -1.729941    -1.27606
------------------------------------------------------------------------------
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |   5150   -2126.891   -2107.112      5     4224.223    4256.957
-----------------------------------------------------------------------------
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |   5150   -2126.891   -2105.961      5     4221.922    4254.656
-----------------------------------------------------------------------------
⇒ Likelihood function:
$$\prod_{j=1}^{M} \binom{n_j}{\tilde y_j}\, p_j^{\tilde y_j} (1 - p_j)^{n_j - \tilde y_j}$$
where $\hat p_j = G(x_j'\hat\beta)$.
I Under $H_0$:
$$LR \xrightarrow[N \to \infty]{d} \chi^2_{K-1}.$$
McFadden’s pseudo R 2
$$R_F^2 = 1 - \frac{\ln(L_U)}{\ln(L_R)}$$
I $R_F^2 = 1 \Leftrightarrow L_U = 1$ ($\Leftrightarrow \ln(L_U) = 0$)
$\Leftrightarrow \hat p_i = y_i$ ($\forall i$) (practically, not achievable for finite $\hat\beta$)
(i.e. model provides perfect prediction).
I But values between 0 and 1 have no natural interpretation!
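Both diagnostics can be computed from two log-likelihood values. The sketch below uses ll(null) and ll(model) from the first `estat ic` output and assumes that the restricted model is the constant-only model, so that the LR test has K − 1 = 4 degrees of freedom. The helper `chi2_sf_even` is an illustrative closed form that is valid only for even degrees of freedom.

```python
import math

ll_null = -2126.891    # ln(L_R), constant-only model (assumed restricted model)
ll_model = -2107.112   # ln(L_U), unrestricted model
k = 5                  # number of parameters including the constant

lr = 2.0 * (ll_model - ll_null)   # likelihood ratio statistic
df = k - 1

def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    m = df // 2
    return math.exp(-x / 2.0) * sum((x / 2.0) ** j / math.factorial(j)
                                    for j in range(m))

p_value = chi2_sf_even(lr, df)
r2_mcfadden = 1.0 - ll_model / ll_null

print(round(lr, 3), p_value, round(r2_mcfadden, 4))
```

With these inputs the slopes are jointly significant even though the pseudo $R^2$ is below 0.01, which fits the remark that intermediate values have no natural interpretation.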
Model selection
I Comparison of model candidates $m \in M$ e.g. by
$$AIC(m) = -\frac{2}{N}\hat\ell_m + \frac{2|m|}{N},$$ where
. $\hat\ell_m$ is the maximized log-likelihood for model $m$
. $|m|$ denotes the dimension (number of parameters) of model $m$
⇒ Minimizing $AIC(m)$ over $m \in M$ provides a trade-off between
good model fit (small bias) and low model complexity (small
estimation error/variance)
. $AIC(m)$ is an approximately unbiased estimate of (twice the)
expected Kullback-Leibler discrepancy of model $m$
. NLM: $AIC(m) = \ln(\hat\sigma_m^2) + 2|m|/N$ ($\hat\sigma_m^2$: MLE of $\sigma^2$ under $m$)
. Min.-AIC-procedure is (under assumptions) asymptotically optimal
I $BIC(m)$ uses factor $\ln(N)$ instead of 2 as penalty for $|m|$
(under assumptions: consistent model selection procedure)
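The criteria can be reproduced from the two `estat ic` outputs. The sketch assumes that those outputs report the unnormalized versions, i.e. N times the definitions above; the labels `model 1` and `model 2` are hypothetical, since the outputs do not name the fitted models.

```python
import math

def aic(ll, k):
    """Unnormalized AIC: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * ll + 2.0 * k

def bic(ll, k, n):
    """Unnormalized BIC: -2 * log-likelihood + ln(n) * number of parameters."""
    return -2.0 * ll + math.log(n) * k

n = 5150
candidates = {"model 1": (-2107.112, 5), "model 2": (-2105.961, 5)}
results = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in candidates.items()}

for name, (a, b) in results.items():
    # a / n is the per-observation AIC(m) in the normalization used above
    print(name, round(a, 3), round(b, 3), round(a / n, 5))

best = min(results, key=lambda name: results[name][0])
print("smallest AIC:", best)
```

Dividing by N does not change the ranking, so both normalizations select the same model.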
Predictive quality
2 × 2 classification table
I Results can be summarized in a 2 × 2 classification table of the
predicted responses $\hat y_i$ against the observed responses $y_i$:
                              Actual value
                              y_i = 1     y_i = 0     Total
Predicted    ŷ_i = 1          TP          FP          TP+FP
outcome      ŷ_i = 0          FN          TN          FN+TN
             Total            TP+FN       FP+TN       N
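A classification table of this kind can be computed as follows. This is a minimal sketch, assuming the prediction rule "predict 1 if the fitted probability exceeds a cutoff of 0.5"; the cutoff, the function name and the example values are illustrative assumptions.

```python
def classification_table(y, p_hat, cutoff=0.5):
    """Count TP, FP, FN, TN for predictions 1{p_hat > cutoff}."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for yi, pi in zip(y, p_hat):
        pred = 1 if pi > cutoff else 0
        if pred == 1 and yi == 1:
            counts["TP"] += 1
        elif pred == 1 and yi == 0:
            counts["FP"] += 1
        elif pred == 0 and yi == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

y = [1, 0, 1, 0, 1]
p_hat = [0.9, 0.2, 0.4, 0.7, 0.6]
table = classification_table(y, p_hat)
hit_rate = (table["TP"] + table["TN"]) / len(y)   # share of correct predictions
print(table, hit_rate)
```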
Analysis of residuals
I MLE is inconsistent if the model is not correctly specified.
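I $y_i \mid x_i \sim \mathrm{Bernoulli}(p_i)$
⇒ $E(y_i \mid x_i) = p_i$ and $\mathrm{Var}(y_i \mid x_i) = p_i(1 - p_i)$
⇒ Pearson (or “standardized”) residuals:
$$r_i = \frac{y_i - \hat p_i}{\sqrt{\hat p_i(1 - \hat p_i)}}$$
I Case of covariate patterns (as before)
. $\tilde y_j := \sum_{i:\, x_i = x_j} y_i \sim \mathrm{Bin}(n_j, p_j)$
⇒ $E(\tilde y_j \mid x_j) = n_j p_j$ and $V(\tilde y_j \mid x_j) = n_j p_j(1 - p_j)$
$$\Rightarrow r_j = \frac{\tilde y_j - n_j \hat p_j}{\sqrt{n_j \hat p_j(1 - \hat p_j)}} \quad \Rightarrow \quad \chi^2 = \sum_{j=1}^{M} r_j^2$$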
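Both residual definitions are straightforward to compute once fitted probabilities are available. In this sketch the fitted probabilities are simply given numbers and the function names are hypothetical.

```python
import math

def pearson_residuals(y, p_hat):
    """Individual residuals (y_i - p_i) / sqrt(p_i (1 - p_i))."""
    return [(yi - pi) / math.sqrt(pi * (1.0 - pi)) for yi, pi in zip(y, p_hat)]

def grouped_pearson(y_sum, n, p_hat):
    """Residuals per covariate pattern and the resulting chi-square statistic."""
    r = [(ys - nj * pj) / math.sqrt(nj * pj * (1.0 - pj))
         for ys, nj, pj in zip(y_sum, n, p_hat)]
    return r, sum(v * v for v in r)

print(pearson_residuals([1, 0], [0.5, 0.2]))
r_grouped, chi2 = grouped_pearson([3], [10], [0.5])
print(r_grouped, chi2)
```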
5.3 Limited dependent variables
Truncation
[Figure: two panels, YSTAR against X and Y against X]
Censoring
Graphical illustration of censoring effects
[Figure: two panels, YSTAR vs. X and Y vs. X]
0, otherwise.
[Figure: density f(y) of the truncated normal and of the normal distribution, with a and µ marked on the horizontal axis]
Remarks
(i) We always have ($\forall z$): $0 < \delta(z) < 1$ and $0 < \tilde\delta(z) < 1$.
(ii) $\lambda(-z) = \dfrac{\phi(-z)}{1 - \Phi(-z)} = \dfrac{\phi(z)}{\Phi(z)} = -\tilde\lambda(z)$.
(iii) Truncation reduces the variance.
(iv) Truncation from below (above) increases (reduces) the expectation.
(v) For $a = 0$ it follows:
$$E(Y \mid Y > 0) = \mu + \sigma\,\frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)},$$
where $\lambda(-z) = -\tilde\lambda(z) = \dfrac{\phi(z)}{\Phi(z)}$ is the “inverse Mills ratio”.
(vi) (a2) follows from (a1), since $-Y \sim N(-\mu, \sigma^2)$ and
$E(Y \mid Y < a) = -E(-Y \mid -Y > -a)$.
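The moments under truncation from below can be evaluated numerically. The sketch assumes $\lambda(z) = \phi(z)/(1 - \Phi(z))$ and $\delta(z) = \lambda(z)[\lambda(z) - z]$, which is how these functions are used in the truncated regression formulas of this chapter; the function names are hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()   # standard normal distribution

def mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z)); loses accuracy for very large z."""
    return N01.pdf(z) / (1.0 - N01.cdf(z))

def delta(z):
    """delta(z) = lambda(z) * (lambda(z) - z)."""
    lam = mills(z)
    return lam * (lam - z)

def truncated_moments(mu, sigma, a):
    """Mean and variance of Y | Y > a for Y ~ N(mu, sigma^2)."""
    alpha = (a - mu) / sigma
    mean = mu + sigma * mills(alpha)
    var = sigma ** 2 * (1.0 - delta(alpha))
    return mean, var

mean0, var0 = truncated_moments(0.0, 1.0, 0.0)
print(mean0, var0)   # half-normal case: sqrt(2/pi) and 1 - 2/pi
```

The half-normal case illustrates remarks (iii) and (iv): the mean rises above 0 and the variance falls below 1.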
Example
I Let Yi∗ denote the price that an individual i is willing to pay for a
good (e.g. a refrigerator).
[Figures: cdf F(y) and density f(y) of the censored variable, with the value Φ((a − µ)/σ) marked at y = a and with a and µ marked on the horizontal axis]
I Remarks:
(i) λ(z) and δ(z) are explained in Theorem 5.1.
(ii) For a = 0 it follows E(Y ) = Φ(µ/σ) · µ + σφ(µ/σ).
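Remark (ii) can be checked against a simulation. This is a sketch under the assumption that censoring at a = 0 means observing Y = max(0, Y*) with Y* normally distributed; the parameter values, the seed and the function name are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def censored_mean(mu, sigma):
    """E(Y) for Y = max(0, Y*), Y* ~ N(mu, sigma^2): Phi(mu/s) * mu + s * phi(mu/s)."""
    z = mu / sigma
    return N01.cdf(z) * mu + sigma * N01.pdf(z)

# Monte Carlo comparison for mu = 0.5, sigma = 2 (assumed values)
random.seed(1)
draws = [max(0.0, random.gauss(0.5, 2.0)) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)

print(censored_mean(0.5, 2.0), mc_mean)
```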
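$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, N$$
$$E(y_i \mid x_i; y_i > a) = x_i'\beta + \sigma\lambda\!\left(\frac{a - x_i'\beta}{\sigma}\right),$$
$$V(y_i \mid x_i; y_i > a) = \sigma^2\left[1 - \delta\!\left(\frac{a - x_i'\beta}{\sigma}\right)\right].$$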
Marginal effects
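I Under truncation:
$$\frac{\partial E(y_i \mid x_i; y_i > a)}{\partial x_{ij}} = \beta_j + \sigma\,\frac{\partial}{\partial x_{ij}}\lambda\!\left(\frac{a - x_i'\beta}{\sigma}\right)$$
$$= \beta_j + \sigma\,\delta\!\left(\frac{a - x_i'\beta}{\sigma}\right)\cdot\left(-\frac{\beta_j}{\sigma}\right)$$
$$= \beta_j\left[1 - \delta\!\left(\frac{a - x_i'\beta}{\sigma}\right)\right]$$
I There, we have used the following result for the derivative of $\lambda(z)$:
$$\lambda'(z) = \frac{\phi(z)}{1 - \Phi(z)}\cdot\left[\frac{\phi(z)}{1 - \Phi(z)} - z\right] = \delta(z).$$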
I Truncation leads to a shrinking of βj .
⇒ Correction of the truncation effect is necessary!
I For interpretation, we calculate average values of these effects (over
the individuals).
I The relative effect of the j-th and the k-th explanatory variable
remains $\beta_j/\beta_k$, since the shrinking factors for $\beta_j$ and $\beta_k$ are equal.
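The shrinking factor can be evaluated for given parameter values. The sketch below assumes hypothetical values for the coefficients, the index $x_i'\beta$ and $\sigma$; the function names are illustrative.

```python
from statistics import NormalDist

N01 = NormalDist()

def delta(z):
    """delta(z) = lambda(z) * (lambda(z) - z) with lambda(z) = phi(z) / (1 - Phi(z))."""
    lam = N01.pdf(z) / (1.0 - N01.cdf(z))
    return lam * (lam - z)

def truncated_marginal_effect(beta_j, xb, sigma, a=0.0):
    """beta_j * [1 - delta((a - x'b) / sigma)] for truncation from below at a."""
    return beta_j * (1.0 - delta((a - xb) / sigma))

# Hypothetical values: beta_j = 2, beta_k = 1, x'b = 0, sigma = 1
me_j = truncated_marginal_effect(2.0, 0.0, 1.0)
me_k = truncated_marginal_effect(1.0, 0.0, 1.0)
print(me_j, me_k, me_j / me_k)   # the ratio stays at beta_j / beta_k = 2
```

Averaging `truncated_marginal_effect` over the individuals in a sample gives the average effect mentioned above.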
Parameter estimation
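$$L(\beta, \sigma^2) = \prod_{i=1}^{N} f_a(y_i) = \prod_{i=1}^{N} \frac{f(y_i)}{1 - F(0)}$$
$$= \prod_{i=1}^{N} \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)}{1 - \Phi\!\left(\frac{-x_i'\beta}{\sigma}\right)} = \prod_{i=1}^{N} \frac{\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)}{\sigma\,\Phi\!\left(\frac{x_i'\beta}{\sigma}\right)}$$
⇒ Log-Likelihood (note: $\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$)
$$\ell(\beta, \sigma^2) = \ln L(\beta, \sigma^2) = -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - x_i'\beta)^2 - \sum_{i=1}^{N}\ln[\Phi(x_i'\beta/\sigma)]$$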
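The log-likelihood of the truncated regression model (truncation from below at 0) can be coded term by term. This is a minimal sketch with a hypothetical function name and data; a numerical optimizer would then be applied to it, which is not shown.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def truncreg_loglik(beta, sigma, X, y):
    """Log-likelihood for y_i = x_i'b + e_i observed only if y_i > 0."""
    n = len(y)
    ll = -0.5 * n * math.log(sigma ** 2) - 0.5 * n * math.log(2.0 * math.pi)
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        ll -= (yi - xb) ** 2 / (2.0 * sigma ** 2)   # Gaussian kernel
        ll -= math.log(N01.cdf(xb / sigma))         # truncation correction
    return ll

# Illustrative sample: intercept plus one regressor, all y_i > 0
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
y = [0.8, 1.9, 2.4]
print(truncreg_loglik([0.2, 0.9], 1.0, X, y))
```

Compared with the Gaussian log-likelihood of the linear model, only the last term differs; it penalizes parameter values under which truncation would be likely.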
Marginal effects
I The difference to $\beta_j$ is small (large) if $x_i'\beta/\sigma$ is large (small).
I This is not surprising, since for large $x_i'\beta/\sigma$ also $y_i^*$ will be large, so
censoring occurs only rarely.
I On the other hand, if $x_i'\beta/\sigma$ is small, we mostly get $y_i = 0$ and
therefore large probabilities $P(y_i = 0 \mid x_i)$.
Parameter estimation
I Linear OLS is based on
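$$y_i = x_i'\beta + \eta_i \quad (i = 1, \dots, N).$$
⇒ $\eta_i = \varepsilon_i + \sigma\lambda_i$ with
$$\lambda_i = \lambda\!\left(\frac{-x_i'\beta}{\sigma}\right) = \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}$$
⇒ $E(\eta_i \mid x_i; y_i > 0) \ne 0$
$$\sum_i (y_i - \Phi_i\, x_i'\beta - \sigma\phi_i)^2 \to \min_{\beta,\sigma}$$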
I ML estimation:
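$|I_1| =: N_1 = N - N_0$
$$\Rightarrow f(y_i \mid x_i) = \left[\Phi\!\left(-\frac{x_i'\beta}{\sigma}\right)\right]^{1 - d_i} \cdot \left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]^{d_i}$$
⇒ Likelihood function:
$$L(\beta, \sigma^2) = \prod_{i=1}^{N} f(y_i \mid x_i) = \prod_{i=1}^{N} \left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right]^{1 - d_i} \cdot \left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]^{d_i}$$
$$= \prod_{i \in I_0} \left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right] \cdot \prod_{i \in I_1} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)$$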
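⇒ Log-likelihood function:
$$\ell(\beta, \sigma^2) = \sum_{i \in I_0} \ln\left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right] + \sum_{i \in I_1} \left[-\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right]$$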
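⇒ Likelihood equations:
$$\frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\left[-(1 - d_i)\,\frac{\sigma\phi_i}{1 - \Phi_i} + d_i\,(y_i - x_i'\beta)\right] x_i = 0,$$
$$\frac{\partial \ell}{\partial \sigma^2} = \sum_{i=1}^{N}\left[(1 - d_i)\,\frac{\phi_i\, x_i'\beta}{2\sigma^3(1 - \Phi_i)} + \frac{d_i}{2}\left(\frac{(y_i - x_i'\beta)^2}{\sigma^4} - \frac{1}{\sigma^2}\right)\right] = 0.$$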
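The censored-regression (Tobit) log-likelihood separates the censored observations (d_i = 0, y_i = 0) from the uncensored ones (d_i = 1, y_i > 0). The sketch below assumes censoring from below at 0 and a hypothetical function name and data; maximization over the parameters is left to a numerical optimizer.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def tobit_loglik(beta, sigma, X, y):
    """Tobit log-likelihood with censoring from below at 0."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi > 0:
            # uncensored: ln[(1 / sigma) * phi((y_i - x_i'b) / sigma)]
            ll += math.log(N01.pdf((yi - xb) / sigma) / sigma)
        else:
            # censored: ln[1 - Phi(x_i'b / sigma)]
            ll += math.log(1.0 - N01.cdf(xb / sigma))
    return ll

# Illustrative sample with two censored observations
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0.0, 0.0, 1.2, 2.1]
print(tobit_loglik([0.1, 1.0], 1.0, X, y))
```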
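$$y_i = x_i'\beta + \varepsilon_i.$$
I Tobit Model ⇒
$$P(\tilde y_i = 1 \mid x_i) = P(y_i > 0 \mid x_i) = 1 - \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right) = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)$$
$$P(\tilde y_i = 0 \mid x_i) = P(y_i = 0 \mid x_i) = 1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)$$
I 1st step:
Estimate $\gamma = \beta/\sigma$ using ML in the probit model:
$$E(y_i \mid x_i, y_i > 0) = x_i'\beta + \sigma\,\underbrace{\frac{\phi(x_i'\gamma)}{\Phi(x_i'\gamma)}}_{=\lambda(-x_i'\gamma) =: \lambda_i}$$
. Replace $\lambda_i$ by $\hat\lambda_i = \lambda(-x_i'\hat\gamma)$ and regress
$y_i$ on $x_i'\beta + \sigma\hat\lambda_i$ ($y_i > 0$).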
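$H_1: \sigma_i^2 = \exp(x_i'\alpha)$
⇒ $H_0: \alpha_2 = \dots = \alpha_K = 0$
. LM test requires only calculation of MLE under $H_0$:
$$\left.\frac{\partial \ell}{\partial\theta'}\right|_{\hat\theta_H} \underbrace{\left(-\left.\frac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\right|_{\hat\theta_H}\right)^{-1}}_{= I(\hat\theta_H)^{-1}} \left.\frac{\partial \ell}{\partial\theta}\right|_{\hat\theta_H} \overset{as.}{\sim} \chi^2_{K-1} \quad (\text{under } H_0).$$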
Remarks
Empirical Example
To be added.
5.A Literature
I Amemiya, T. (1985). Advanced Econometrics. Harvard University Press.
Cambridge, Ma.
I Cameron, A. C. and Trivedi, P. K. (2005). Microeconometrics - Methods
and Applications. Cambridge University Press.
I Heij, C.; de Boer, P.; Franses, P. H.; Kloek, T. and van Dijk, H. K.
(2004). Econometric Methods with Applications in Business and
Economics. Oxford University Press.
I McCullagh, P. and Nelder, J. A. (1983). Generalized Linear Models.
Chapman and Hall, London.
I Nelson, F. D. (1977). Censored Regression Models with Unobserved,
Stochastic Censoring Threshold. Journal of Econometrics 6, 309-327.
I Nelson, F. D. (1981). A Test for Misspecification in the Censored Normal
Model. Econometrica 49, 1317-1329.