
Recent Developments in Panel Models for Count Data

Pravin K. Trivedi

Indiana University - Bloomington
Prepared for 2010 Mexican Stata Users Group meeting,
based on
A. Colin Cameron and Pravin K. Trivedi (2005),
Microeconometrics: Methods and Applications (MMA), C.U.P.
MMA, chapters 21-23
and
A. Colin Cameron and Pravin K. Trivedi (2010),
Microeconometrics Using Stata, Revised Edition (MUSR), Stata Press.
MUSR, chapters 8 and 18.

April 29, 2010


Introduction

0. Dedication


1. Introduction

Objective 1: To survey recent developments in count data panel models.
Objective 2: To evaluate the advances made against the background of the main features of count data.
Objective 3: To highlight the areas where significant gaps exist and review the most promising approaches.


Background (1)
Panel data are repeated measures on individuals ($i$) over time ($t$): data are $(y_{it}, x_{it})$ for $i = 1, ..., N$ and $t = 1, ..., T$, and $y_{it}$ are nonnegative integer-valued outcomes.
Conditional on $x_{it}$, the $y_{it}$ are likely to be serially correlated for a given $i$, partly because of state dependence and partly because of serial correlation in shocks.
Hence each additional year of data is not independent of previous years.
Cross-sectional dependence between observations is also to be expected given the emphasis on stratified clustered sampling designs.
(1) Pervasive unobserved heterogeneity, (2) a typically high proportion of zeros, and (3) inherent discreteness and heteroskedasticity generate complications that are hard to handle simultaneously.
Finally, the researcher's interest often goes beyond the conditional mean.
How well does available software (Stata) handle these issues?

Basic linear panel models

2. Basic linear panel models review


Pooled model (or population-averaged):
$$y_{it} = \alpha + x_{it}'\beta + u_{it}. \tag{1}$$

Two-way effects model allows the intercept to vary over $i$ and $t$:
$$y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + \varepsilon_{it}. \tag{2}$$

Individual-specific effects model:
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \tag{3}$$
where $\alpha_i$ may be a fixed effect or a random effect.

Mixed model or random coefficients model allows slopes to vary over $i$:
$$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}. \tag{4}$$


Fixed versus random effects model

3. Fixed effects versus random effects model

Individual-specific effects model:
$$y_{it} = x_{it}'\beta + (\alpha_i + \varepsilon_{it}).$$
Fixed effects (FE):
- $\alpha_i$ is a random variable possibly correlated with $x_{it}$
- so regressor $x_{it}$ may be endogenous (with respect to $\alpha_i$ but not $\varepsilon_{it}$), e.g. education is correlated with time-invariant ability
- pooled OLS, pooled GLS, RE are inconsistent for $\beta$
- within (FE) and first difference estimators are consistent.

Random effects (RE) or population-averaged (PA):
- $\alpha_i$ is purely random (usually iid $(0, \sigma_\alpha^2)$) and unrelated to $x_{it}$
- so regressor $x_{it}$ is exogenous
- appropriate FE and RE estimators are consistent for $\beta$

Fundamental divide: microeconometricians FE versus others RE.



Nonlinear panel models

Overview

4. Some features of nonlinear panel models


In contrast to linear models, solutions for nonlinear models tend to lack generality and are model-specific.
Standard count models include: Poisson and negative binomial.
General approaches are similar to those for the linear case:
- Pooled estimation or population-averaged
- Random effects
- Fixed effects

Complications:
- Random effects are often not tractable, so numerical integration is needed.
- Fixed effects models in short panels are generally not estimable due to the incidental parameters problem.
- Count models involve discreteness, nonlinearity and intrinsic heteroskedasticity.


Some Standard Cross-section Count Models

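For each model: density $f(y) = \Pr[Y = y]$, then mean and variance.

Poisson: $f(y) = e^{-\mu}\mu^{y}/y!$; mean $\mu(x)$; variance $\mu(x)$, where $\mu(x) = \exp(x'\beta)$.

NB1: as in NB2 below with $\alpha^{-1}$ replaced by $\alpha^{-1}\mu$; mean $\mu(x)$; variance $(1+\alpha)\mu(x)$.

NB2: $f(y) = \dfrac{\Gamma(\alpha^{-1}+y)}{\Gamma(\alpha^{-1})\,\Gamma(y+1)} \left(\dfrac{\alpha^{-1}}{\alpha^{-1}+\mu}\right)^{\alpha^{-1}} \left(\dfrac{\mu}{\mu+\alpha^{-1}}\right)^{y}$; mean $\mu(x)$; variance $(1+\alpha\mu(x))\mu(x)$.

Hurdle: $f(y) = f_1(0)$ if $y = 0$, and $f(y) = \dfrac{1-f_1(0)}{1-f_2(0)}\, f_2(y)$ if $y \ge 1$; mean $\Pr[y>0|x]\,E_{y>0}[y|y>0,x]$.

ZI: $f(y) = f_1(0) + (1-f_1(0))f_2(0)$ if $y = 0$, and $f(y) = (1-f_1(0))f_2(y)$ if $y \ge 1$; mean $(1-f_1(0))\mu(x)$; variance $(1-f_1(0))(\mu(x)+f_1(0)\mu^2(x))$.

FMM: $f(y) = \sum_{j=1}^{m} \pi_j f_j(y|\theta_j)$; mean $\sum_{i=1}^{2} \pi_i \mu_i(x)$; variance term $\sum_{i=1}^{2} \pi_i[\mu_i(x) + \mu_i^2(x)]$.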
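The zero-inflated row of the table can be checked numerically. The sketch below assumes a zero-inflated Poisson, that is, $f_2$ is Poisson with mean $\mu$ and $f_1(0)$ is a fixed probability `pi0`; the parameter values are illustrative and do not come from the slides. It sums the density directly and compares the result with the mean and variance expressions above.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability Pr[Y = y] with mean mu, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def zip_pmf(y, pi0, mu):
    """Zero-inflated Poisson: pi0 plays the role of f1(0), f2 is Poisson(mu)."""
    base = (1.0 - pi0) * poisson_pmf(y, mu)
    return pi0 + base if y == 0 else base

def moments(pmf, upper=200):
    """Mean and variance by direct summation over y = 0, ..., upper - 1."""
    mean = sum(y * pmf(y) for y in range(upper))
    second = sum(y * y * pmf(y) for y in range(upper))
    return mean, second - mean ** 2

pi0, mu = 0.3, 2.5  # illustrative values (assumption, not estimates)
mean, var = moments(lambda y: zip_pmf(y, pi0, mu))

# Compare with the closed-form expressions in the table.
print(mean, (1 - pi0) * mu)
print(var, (1 - pi0) * (mu + pi0 * mu ** 2))
```

The variance exceeds the mean whenever `pi0 > 0`, which is one way excess zeros generate overdispersion.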


A pooled or population-averaged (PA) model may be used.
- This is the same model as in the cross-section case, with adjustment for correlation over time for a given individual.

A fully parametric model may be specified, with separable heterogeneity and conditional density
$$f(y_{it}|\alpha_i, x_{it}) = f(y_{it}, \alpha_i + x_{it}'\beta, \gamma), \quad t = 1, ..., T_i, \; i = 1, ..., N, \tag{5}$$
or nonseparable heterogeneity
$$f(y_{it}|\alpha_i, x_{it}) = f(y_{it}, \alpha_i + x_{it}'\beta_i, \gamma), \quad t = 1, ..., T_i, \; i = 1, ..., N, \tag{6}$$
where $\gamma$ denotes additional model parameters such as variance parameters and $\alpha_i$ is an individual effect.

A semiparametric conditional mean (usually exponential mean) model may be specified, with additive effects
$$E[y_{it}|\alpha_i, x_{it}] = \alpha_i + g(x_{it}'\beta) \tag{7}$$
or multiplicative effects
$$E[y_{it}|\alpha_i, x_{it}] = \alpha_i \times g(x_{it}'\beta). \tag{8}$$

5. Evolution of Panel Models (1)

Focus on panel methods most commonly used by microeconometricians. The underlying asymptotic theory assumes short panels ($T$ small, $N$ large): data on many individual units and few time periods.
The key paper in the modern treatment of panel analysis for counts is
Hausman et al. (1984).
The developments since 1984 can be summarized in generational
terms as follows.


Evolution of Panel Models (2)


                 G-1 models            G-2 models                    G-3 models
Period           1974-1990             1991-2000                     Post-2000
Function         Mainly parametric     Flexible parametric           Parametric / SP
CS models        Poisson, Negbin       Hurdles, finite mixtures,     Quantile reg;
                                       ZIP                           Selection models
Panel models     Poisson, Negbin       Poisson, Negbin, EM           EM; QR
Unobs. hetero.   Multiplicative        Separable or nonsep.          Flexible; non-sep
Modeling         Mainly RE or PA       RE, PA and fixed effects      RE/PA/FE/Correlated RE; DV
Variance est.    Robust wrt            Robust wrt OD                 Robust or Cl-Rob
                 overdispersion                                      wrt OD/SC
Dynamics         Lagged xs             Exponential feedback          Linear or exponential
Endogeneity      Largely ignored       Allowed in RE models          Allowed in RE and FE
Estimators       Mainly MLE            MLE; GEE; NLIV                MLE; GEE; NLGMM; QR; QRIV


6. Remarks on the evolution of count panel models (2)

FE panel data counterparts of several popular cross-section models like hurdles, FMM, and ZIP are undeveloped.
When several complications occur simultaneously (e.g. nonseparable individual-specific effects and endogenous regressors) they are most conveniently analyzed in RE, PA or moment-based models.
Fully parametric methods for simultaneously handling endogeneity plus something else (e.g. nonseparable UH) are largely absent, and moment-based methods are a dominant alternative.
Overdispersion-robust and cluster-robust estimation of variances is now feasible and very common.


Nonlinear panel estimators

Pooled or population-averaged estimators

7. Nonlinear: Pooled or population-averaged estimators


Extend pooled OLS to the nonlinear case
- Give the usual cross-section command for conditional mean models or conditional density models, but then get cluster-robust standard errors
- Poisson example:
  poisson y x, vce(cluster id)
  or
  xtgee y x, fam(poisson) link(log) corr(ind) vce(cluster id)

Extend pooled feasible GLS to the nonlinear case
- Estimate with an assumed correlation structure over time
- Equicorrelated Poisson example:
  xtpoisson y x, pa vce(boot)
  or
  xtgee y x, fam(poisson) link(log) corr(exch) vce(cluster id)


Random effects estimators

Nonlinear random effects estimators

Assume the individual-specific effect $\alpha_i$ has specified distribution $g(\alpha_i|\eta)$.
Then the unconditional density for the $i$th observation is
$$f(y_{i1}, ..., y_{iT}|x_{i1}, ..., x_{iT}, \beta, \gamma, \eta) = \int \left[\prod_{t=1}^{T} f(y_{it}|x_{it}, \alpha_i, \beta, \gamma)\right] g(\alpha_i|\eta)\, d\alpha_i. \tag{9}$$

Analytical solution:
- For Poisson with gamma random effect
- For negative binomial with gamma effect
- Use xtpoisson, re and xtnbreg, re

No analytical solution:
- For other models.
- Instead use numerical integration (only univariate integration is required).
- Assume normally distributed random effects.
- Use re option for xtlogit, xtprobit
- Use normal option for xtpoisson and xtnbreg


Random slopes estimators

9. Finite Mixture or Latent Class model


Suppose the sample is generated from the following dgp:
$$f(y_{it}|x_{it}, \theta) = \sum_{j=1}^{C-1} \pi_j f_j(y_{it}|x_{it}, \theta_j) + \pi_C f_C(y_{it}|x_{it}, \theta_C), \tag{10}$$
where $\sum_{j=1}^{C} \pi_j = 1$, $\pi_j > 0$ $(j = 1, ..., C)$. For identifiability, use the labelling restriction $\pi_1 \ge \pi_2 \ge ... \ge \pi_C$, always satisfied by rearrangement, postestimation.
This specification accommodates discrete nonseparable heterogeneity between latent classes.
Long history in statistics; see McLachlan and Basford (1988). Earlier treatments emphasized univariate formulations; Lindsey (1995) emphasized identification and complexity. Special cases: Heckman and Singer (1984).
The probability distribution $f(y_i|\hat\theta_j; \hat C)$ that maximizes $L(\pi, \theta, C|y)$ is called the semiparametric maximum likelihood estimator.


$f(y_i|\theta_j)$ can itself be a "flexible" functional form that accommodates within-class heterogeneity.
$C$ can be chosen using the hypothesis testing approach or the model comparison approach.
Determining the number of components is a nonstandard inference problem, as testing is at the boundary of the parameter space.
- Simple approach is to use BIC or CAIC.
- Or do an appropriate bootstrap for the likelihood ratio test.

Can be implemented using Stata's fmm command such as
fmm y $xlist1, vce(robust) components(3) mixtureof(poisson)
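The model comparison approach can be sketched in a few lines. The example below uses the standard definitions BIC $= -2\ln L + k\ln N$ and CAIC $= -2\ln L + k(\ln N + 1)$; the log-likelihoods, parameter counts and sample size are hypothetical placeholders, not results from any fit in these slides.

```python
import math

def bic(loglik, k, n):
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)

def caic(loglik, k, n):
    """Consistent AIC: -2 lnL + k (ln(n) + 1)."""
    return -2.0 * loglik + k * (math.log(n) + 1.0)

# Hypothetical fits: (number of components C, log-likelihood, number of parameters).
fits = [(1, -5200.0, 7), (2, -5000.0, 15), (3, -4995.0, 23)]
n = 2000  # hypothetical sample size

for c, ll, k in fits:
    print(c, round(bic(ll, k, n), 1), round(caic(ll, k, n), 1))

# Choose the number of components with the smallest BIC.
best = min(fits, key=lambda f: bic(f[1], f[2], n))
print("components chosen by BIC:", best[0])
```

With these made-up numbers the third component raises the log-likelihood by too little to offset its extra parameters, so both criteria prefer two components.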


10. Quantile regression


The $q$th quantile regression estimator $\hat\beta_q$ minimizes over $\beta_q$
$$Q_N(\beta_q) = \sum_{i: y_{it} \ge x_{it}'\beta_q}^{N} q\,|y_{it} - x_{it}'\beta_q| + \sum_{i: y_{it} < x_{it}'\beta_q}^{N} (1-q)\,|y_{it} - x_{it}'\beta_q|, \quad 0 < q < 1.$$
- Example: median regression with $q = 0.5$.

Continuation transform: For count $y$ adapt standard methods for continuous $y$ by:
- Replace count $y$ by continuous variable $z = y + u$ where $u \sim \text{Uniform}[0, 1]$: the "jittering step".
- Then reconvert the predicted $z$-quantile to a $y$-quantile using the ceiling function.
- Machado and Santos Silva (JASA, 2005).


Adapting to the exponential mean


Conventional count models are based on the exponential conditional mean, $\exp(x'\beta)$, rather than $x'\beta$.
$Q_q(y|x)$ and $Q_q(z|x)$ denote the $q$th quantiles of the conditional distributions of $y$ and $z$, respectively. To allow for exponentiation, $Q_q(z|x)$ is specified to be
$$Q_q(z|x) = q + \exp(x'\beta_q).$$
The additional term $q$ appears because $Q_q(z|x)$ is bounded from below by $q$, due to jittering.
Log transformation is applied so that $\ln(z - q)$ is modelled, with an adjustment if $z - q < 0$.
The transformation is justified by the property that quantiles are equivariant to monotonic transformation.

Implementation
Post-estimation transformation of the $z$-quantiles back to $y$-quantiles uses the ceiling function, with
$$Q_q(y|x) = \lceil Q_q(z|x) - 1 \rceil$$
where the symbol $\lceil r \rceil$ on the right-hand side denotes the smallest integer greater than or equal to $r$.
To reduce the effect of jittering, the model is estimated multiple times using independent draws from the $U(0, 1)$ distribution, and estimated coefficients and confidence interval endpoints are averaged. Hence the estimates of the quantiles of $y$ counts are based on
$$\hat Q_q(y|x) = \lceil \hat Q_q(z|x) - 1 \rceil = \lceil q + \exp(x'\bar{\hat\beta}_q) - 1 \rceil,$$
where $\bar{\hat\beta}_q$ denotes the average over the jittered replications.
Variance estimation is usually based on a computationally intensive bootstrap.
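The jitter, average and ceiling steps can be illustrated without regressors. The sketch below is a simplification and not the qcount estimator: it assumes an intercept-only model on the linear scale (no exponential mean), so each jittered "coefficient" is just the $q$th sample quantile of $z$, found by minimizing the check-function objective over the data points. The counts, the number of replications and the seed are made up.

```python
import math
import random

def check_loss(b, z, q):
    """Quantile-regression objective for an intercept-only model with intercept b."""
    return sum(q * (zi - b) if zi >= b else (1 - q) * (b - zi) for zi in z)

def jittered_quantile(y, q, reps=50, seed=12345):
    """Average the q-th quantile of z = y + u over `reps` jittered samples,
    then map back to the count scale with ceil(Q_z - 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z = [yi + rng.random() for yi in y]  # jittering step, u ~ Uniform[0, 1)
        # For an intercept-only model a minimizer is always attained at a data point.
        total += min(z, key=lambda b: check_loss(b, z, q))
    qz = total / reps  # average over the jittered replications
    return qz, math.ceil(qz - 1)

y = [0] * 12 + [1] * 5 + [4] * 3  # made-up counts with many zeros
qz_med, y_med = jittered_quantile(y, 0.5)
qz_90, y_90 = jittered_quantile(y, 0.9)
print("median of y:", y_med, " 0.9 quantile of y:", y_90)
```

Because more than half of the made-up sample is zero, the median maps back to 0, which also illustrates why lower quantiles carry little information when zeros are frequent.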


The QCR method of Machado and Santos Silva can be implemented using the Stata add-on command qcount, due to Miranda (2006). The command syntax is:

qcount depvar [indepvars] [if] [in], quantile(number) [repetition(#)]

where quantile(number) specifies the quantile to be estimated and repetition(#) specifies the number of jittered samples to be used to calculate the parameters of the model, the default value being 1000.
Panel models can be estimated treating the data as repeated cross sections, as in the PA approach.
The main attraction is the ability to study differences in marginal effects at different quantiles.
The post-estimation command qcount_mfx computes marginal effects for the model, evaluated at the means of the regressors.


QCR Example - Winkelmann, JHE 2006


Using an unbalanced sample (1995-1999) from GSOEP, Winkelmann analyzes the differential impact of healthcare reform on the distribution of doctor visits across quantiles.


QR and panel data: pros and cons

Excess zeros can make identification of lower quantiles difficult.
Can QR accommodate fixed and random effects?
Interpretation of fixed effects in the QR context is somewhat tenuous; see Koenker (2004).
QR has been extended to accommodate censoring and endogenous regressors; see Chernozhukov et al. (2009).
QR has also been extended to handle a lagged dependent variable.


11. Nonlinear random slopes estimators


Can extend to random slopes by adding an assumption about the distribution of slopes.
- Nonlinear generalization of xtmixed
- Then a higher-dimensional numerical integral is needed.
- Use adaptive Gaussian quadrature

Stata commands are:
- xtmelogit for binary data
- xtmepoisson for counts

Stata add-on that is very rich:
- gllamm (generalized linear latent and mixed models); can be quite slow!
- Developed by Sophia Rabe-Hesketh and Anders Skrondal.


Fixed effects estimators

12. Nonlinear fixed effects estimators

In general not possible in short panels.
Incidental parameters problem:
- $N$ fixed effects $\alpha_i$ plus $K$ regressors means $(N + K)$ parameters
- But $(N + K) \to \infty$ as $N \to \infty$
- Need to eliminate $\alpha_i$ by some sort of differencing, or a concentrated likelihood argument
- possible for Poisson, negative binomial

Stata commands
- xtpoisson, fe (better to use xtpqml as it gives robust s.e.'s)
- xtnbreg, fe

Fixed effects extensions to hurdle, finite mixture, and zero-inflated models are currently not available.


Incidental parameters in Poisson regression


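Derivation of the fixed effects estimator for the Poisson panel.
Poisson MLE simultaneously estimates $\beta$ and $\alpha_1, ..., \alpha_N$. The log-likelihood is
$$\ln L(\beta, \alpha) = \ln\left[\prod_i \prod_t \left\{\exp(-\alpha_i\mu_{it})(\alpha_i\mu_{it})^{y_{it}}/y_{it}!\right\}\right]$$
$$= \sum_i \left[-\alpha_i \sum_t \mu_{it} + \ln\alpha_i \sum_t y_{it} + \sum_t y_{it}\ln\mu_{it} - \sum_t \ln y_{it}!\right] \tag{11}$$
where $\mu_{it} = \exp(x_{it}'\beta)$.
FOC with respect to $\alpha_i$ yields $\hat\alpha_i = \sum_t y_{it} / \sum_t \mu_{it}$ (a sufficient statistic for $\alpha_i$).
Substituting this yields the concentrated likelihood function. Dropping terms not involving $\beta$,
$$\ln L_{conc}(\beta) \propto \sum_i \sum_t \left[y_{it}\ln\mu_{it} - y_{it}\ln\left(\sum_s \mu_{is}\right)\right]. \tag{12}$$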
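The substitution step can be written out explicitly. The following is a sketch using only the symbols defined on this slide, with $\hat\alpha_i = \sum_t y_{it} / \sum_t \mu_{it}$ and $\mu_{it} = \exp(x_{it}'\beta)$:

```latex
\begin{align*}
\hat\alpha_i \sum_t \mu_{it} &= \sum_t y_{it}
  \quad \text{(free of } \beta\text{)}, \\
\ln\hat\alpha_i \sum_t y_{it}
  &= \sum_t y_{it}\Bigl[\ln\sum_s y_{is} - \ln\sum_s \mu_{is}\Bigr], \\
\ln L(\beta, \hat\alpha)
  &= \sum_i \Bigl[\sum_t y_{it}\ln\mu_{it}
     - \sum_t y_{it}\ln\sum_s \mu_{is}\Bigr]
     + \text{terms free of } \beta .
\end{align*}
```

Only the ratio $\mu_{it} / \sum_s \mu_{is}$ matters for $\beta$, which is why any multiplicative individual effect drops out.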


Interpretation
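Here there is no incidental parameters problem.
Consistent estimates of $\beta$ for fixed $T$ and $N \to \infty$ can be obtained by maximization of $\ln L_{conc}(\beta)$.
FOC with respect to $\beta$ yields first-order conditions
$$\sum_i \sum_t \left[y_{it}x_{it} - y_{it}\left(\sum_s \mu_{is}x_{is}\right) \Big/ \left(\sum_s \mu_{is}\right)\right] = 0,$$
that can be re-expressed as
$$\sum_{i=1}^{N} \sum_{t=1}^{T} x_{it}\left(y_{it} - \frac{\mu_{it}}{\bar\mu_i}\,\bar y_i\right) = 0, \tag{13}$$
where $\bar y_i = T^{-1}\sum_t y_{it}$ and $\bar\mu_i = T^{-1}\sum_t \mu_{it}$.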
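As an illustration of (13), the sketch below solves the first-order condition for a single regressor by bisection. It is a toy example under stated assumptions: one scalar regressor, a made-up panel of three individuals, and a search interval of $[-5, 5]$ chosen for this data. It is not how xtpoisson, fe is implemented.

```python
import math

def fe_poisson_score(beta, panels):
    """Left-hand side of (13) for a scalar regressor.
    `panels` is a list of (x_i, y_i) pairs, each a list over t."""
    total = 0.0
    for x, y in panels:
        mu = [math.exp(xt * beta) for xt in x]
        ratio = sum(y) / sum(mu)  # equals ybar_i / mubar_i
        total += sum(xt * (yt - mt * ratio) for xt, yt, mt in zip(x, y, mu))
    return total

def solve_fe_poisson(panels, lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on the score, which is decreasing in beta because the
    concentrated log-likelihood is concave."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fe_poisson_score(mid, panels) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Made-up toy panel: three individuals, three periods each.
panels = [
    ([0.0, 1.0, 2.0], [1, 2, 6]),
    ([0.5, 1.5, 1.0], [0, 3, 1]),
    ([1.0, 0.0, 2.0], [4, 2, 9]),
]
beta_hat = solve_fe_poisson(panels)
print(beta_hat, fe_poisson_score(beta_hat, panels))
```

No $\alpha_i$ appears anywhere in the computation, which is the practical content of the statement that there is no incidental parameters problem here.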


FE Poisson: pros and cons


Time-invariant regressors will also be eliminated by the transformation. Some marginal effects are not identified.
May substitute individual-specific dummy variables, though this raises some computational issues.
Poisson and linear panel models are special in that simultaneous estimation of $\beta$ and $\alpha$ provides consistent estimates of $\beta$ in short panels, so there is no incidental parameters problem.
The above assumes strict exogeneity of regressors.
We can handle endogenous regressors under a weak exogeneity assumption. A moment condition estimator can be defined using the FOC (13).
This FE approach does not extend to several empirically important models: hurdle, FMM, and ZIP.

Ad hoc methods for handling fixed effects

Are we making too much of the fixed effects and the associated incidental parameter problem?
The dummy variables solution; Allison (2009); Greene (2004).


Computation with Stata Commands

13. Stata Commands


Nonlinear panel estimators

Estimator       Count data
Pooled          poisson; nbreg
Quantile        qcount, q(%) rep(#)
FMM             fmm components(#) mixtureof(poisson)
                fmm components(#) mixtureof(nbreg)
GEE (PA)        xtgee, family(poisson)
                xtgee, family(nbinomial)
RE              xtpoisson, re
                xtnbreg, re
Random slopes   xtmepoisson
FE              xtpoisson, fe
                xtnbreg, fe

FMM is not part of official Stata but is in the public domain and can be added.

Data example

Panel counts: data example


Data from the RAND Health Insurance Experiment.
- y is number of doctor visits.

. use mus18data.dta, clear
. describe mdu lcoins ndisease female age lfam child id year

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
mdu             float  %9.0g                  number face-to-fact md visits
lcoins          float  %9.0g                  log(coinsurance+1)
ndisease        float  %9.0g                  count of chronic diseases -- ba
female          float  %9.0g                  female
age             float  %9.0g                  age that year
lfam            float  %9.0g                  log of family size
child           float  %9.0g                  child
id              float  %9.0g                  person id, leading digit is sit
year            float  %9.0g                  study year


Dependent variable mdu is very overdispersed: $\hat V[y] = 4.50^2 \simeq 7 \times \bar y$.

. summarize mdu lcoins ndisease female age lfam child id year

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mdu |     20186    2.860696    4.504765          0         77
      lcoins |     20186    2.383588    2.041713          0   4.564348
    ndisease |     20186     11.2445    6.741647          0       58.6
      female |     20186    .5169424    .4997252          0          1
         age |     20186    25.71844    16.76759          0   64.27515
-------------+--------------------------------------------------------
        lfam |     20186    1.248404    .5390681          0   2.639057
       child |     20186    .4014168    .4901972          0          1
          id |     20186    357971.2    180885.6     125024     632167
        year |     20186    2.420044    1.217237          1          5
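The overdispersion claim is simple arithmetic on the first row of the summary table. The snippet below only reuses the reported mean and standard deviation of mdu; under an equidispersed Poisson the ratio would be close to 1.

```python
mean_mdu = 2.860696  # Mean of mdu from the summarize output
sd_mdu = 4.504765    # Std. Dev. of mdu from the summarize output

variance = sd_mdu ** 2
ratio = variance / mean_mdu  # variance-to-mean ratio

print(round(variance, 2), round(ratio, 2))
```

The ratio comes out at roughly 7, matching the statement that the sample variance is about seven times the sample mean.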


Panel is unbalanced. Most are in for 3 years or 5 years.

. xtdescribe
      id:  125024, 125025, ..., 632167                       n =       5908
    year:  1, 2, ..., 5                                      T =          5
           Delta(year) = 1 unit
           Span(year)  = 5 periods
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       2       3         3         5       5       5


For mdu both within and between variation are important.

. * Panel summary of dependent variable
. xtsum mdu

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
mdu      overall |  2.860696   4.504765          0         77 |     N =   20186
         between |             3.785971          0   63.33333 |     n =    5908
         within  |             2.575881  -34.47264    40.0607 | T-bar = 3.41672

Only time-varying regressors are age, lfam and child.
And these have mainly between variation.
This will make the within or fixed effects estimator very imprecise.


Likelihood-based Panel Count Estimators

14. Panel Poisson

Consider four panel Poisson estimators
- Pooled Poisson with cluster-robust errors
- Population-averaged Poisson (GEE)
- Poisson random effects (gamma and normal)
- Poisson fixed effects

Can additionally apply most of these to negative binomial.
And can extend FE to dynamic panel Poisson where $y_{i,t-1}$ is a regressor.


MLE Estimation

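For each model: moment specification, then estimating equations.

Pooled Poisson: $E[y_{it}|x_{it}] = \exp(x_{it}'\beta)$; estimating equations $\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}(y_{it} - \mu_{it}) = 0$, where $\mu_{it} = \exp(x_{it}'\beta)$.

PA: $\rho_{ts} = \text{Cor}[(y_{it} - \exp(x_{it}'\beta))(y_{is} - \exp(x_{is}'\beta))]$.

RE Poisson: $E[y_{it}|\alpha_i, x_{it}] = \alpha_i \exp(x_{it}'\beta)$; estimating equations $\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}\left(y_{it} - \mu_{it}\dfrac{\bar y_i + \eta/T}{\bar\mu_i + \eta/T}\right) = 0$, where $\bar\mu_i = T^{-1}\sum_t \exp(x_{it}'\beta)$ and $\eta = \text{var}(\alpha_i)$.

FE Poisson: $E[y_{it}|\alpha_i, x_{it}] = \alpha_i \exp(x_{it}'\beta)$; estimating equations $\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}\left(y_{it} - \mu_{it}\dfrac{\bar y_i}{\bar\mu_i}\right) = 0$.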


Pooled Poisson

15. Panel Poisson method 1: pooled Poisson

Specify
$$y_{it}|x_{it}, \beta \sim \text{Poisson}[\exp(x_{it}'\beta)].$$
Pooled Poisson of $y_{it}$ on intercept and $x_{it}$ gives consistent $\hat\beta$.
- But get cluster-robust standard errors, clustering on the individual.
- These control for both overdispersion and correlation over $t$ for given $i$.


Pooled Poisson with cluster-robust standard errors


. * Pooled Poisson estimator with cluster-robust standard errors
. poisson mdu lcoins ndisease female age lfam child, vce(cluster id)
Iteration 0:   log pseudolikelihood = -62580.248
Iteration 1:   log pseudolikelihood = -62579.401
Iteration 2:   log pseudolikelihood = -62579.401

Poisson regression                                Number of obs   =      20186
                                                  Wald chi2(6)    =     476.93
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -62579.401                 Pseudo R2       =     0.0609

                                 (Std. Err. adjusted for 5908 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         mdu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lcoins |  -.0808023   .0080013   -10.10   0.000    -.0964846   -.0651199
    ndisease |   .0339334   .0026024    13.04   0.000     .0288328     .039034
      female |   .1717862   .0342551     5.01   0.000     .1046473    .2389251
         age |   .0040585   .0016891     2.40   0.016      .000748    .0073691
        lfam |  -.1481981   .0323434    -4.58   0.000      -.21159   -.0848062
       child |   .1030453   .0506901     2.03   0.042     .0036944    .2023961
       _cons |    .748789   .0785738     9.53   0.000     .5947872    .9027907
------------------------------------------------------------------------------

By comparison, the default (non cluster-robust) s.e.'s are 1/4 as large.
So the default (non cluster-robust) t-statistics are 4 times as large!


Population-averaged Poisson (PA or GEE)

16. Panel Poisson method 2: population-averaged


Assume that for the $i$th observation the moments are like those for GLM Poisson:
$$E[y_{it}|x_{it}] = \exp(x_{it}'\beta), \qquad V[y_{it}|x_{it}] = \phi \times \exp(x_{it}'\beta).$$
Stack the conditional means for the $i$th individual:
$$E[y_i|X_i] = m_i(\beta) = \begin{bmatrix} \exp(x_{i1}'\beta) \\ \vdots \\ \exp(x_{iT}'\beta) \end{bmatrix},$$
where $y_i = [y_{i1}, ..., y_{iT}]'$ and $X_i = [x_{i1}, ..., x_{iT}]'$.
Stack the conditional variances for the $i$th individual.
- With no correlation
$$V[y_i|X_i] = \phi H_i(\beta) = \phi \times \text{Diag}[\exp(x_{it}'\beta)].$$


Assume a pattern $R(\alpha)$ for autocorrelation over $t$ for given $i$ so
$$V[y_i|X_i] = \phi H_i(\beta)^{1/2} R(\alpha) H_i(\beta)^{1/2}.$$
This is called a working matrix.
- Example: $R(\alpha) = I$ if there is no correlation.
- Example: $R(\alpha) = R(\rho)$ has diagonal entries 1 and off-diagonal entries $\rho$ if there is equicorrelation.
- Example: $R(\alpha) = R$ where diagonal entries are 1 and off-diagonals are unrestricted ($< 1$).
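To make the working matrix concrete, the sketch below builds $\phi H^{1/2} R H^{1/2}$ for one individual under equicorrelation. The conditional means, $\rho$ and $\phi$ are illustrative numbers chosen for easy checking; the function names are hypothetical helpers, not Stata or library routines.

```python
import math

def exchangeable(T, rho):
    """Equicorrelation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if t == s else rho for s in range(T)] for t in range(T)]

def working_covariance(mu, R, phi=1.0):
    """phi * H^{1/2} R H^{1/2} with H = Diag[mu_t]."""
    T = len(mu)
    root = [math.sqrt(m) for m in mu]
    return [[phi * root[t] * R[t][s] * root[s] for s in range(T)]
            for t in range(T)]

mu = [1.0, 4.0, 9.0]  # illustrative conditional means exp(x_it' beta)
V = working_covariance(mu, exchangeable(3, 0.5), phi=2.0)
for row in V:
    print(row)
```

The diagonal reproduces the GLM variance $\phi\mu_{it}$, while each off-diagonal entry is $\phi\rho\sqrt{\mu_{it}\mu_{is}}$, so the implied correlation is $\rho$ for every pair of periods.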


GLM Estimation

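17. Stata's GEE command

The GLM estimator solves
$$\sum_{i=1}^{N} \frac{\partial m_i'(\beta)}{\partial\beta} H_i(\beta)^{-1}(y_i - m_i(\beta)) = 0.$$
The generalized estimating equations (GEE) estimator or population-averaged (PA) estimator of Liang and Zeger (1986) solves
$$\sum_{i=1}^{N} \frac{\partial m_i'(\beta)}{\partial\beta} \hat\Omega_i^{-1}(y_i - m_i(\beta)) = 0,$$
where $\hat\Omega_i$ equals the working variance matrix $\Omega_i = V[y_i|X_i]$ with $R(\alpha)$ replaced by $R(\hat\alpha)$, where $\text{plim}\,\hat\alpha = \alpha$.
The cluster-robust estimate of the variance matrix of the GEE estimator is
$$\hat V[\hat\beta_{GEE}] = \left(\hat D'\hat\Omega^{-1}\hat D\right)^{-1} \left(\sum_{g=1}^{G} D_g'\hat\Omega_g^{-1}\hat u_g \hat u_g'\hat\Omega_g^{-1} D_g\right) \left(\hat D'\hat\Omega^{-1}\hat D\right)^{-1}$$
where $\hat D_g = \partial m_g'(\beta)/\partial\beta|_{\hat\beta}$, $\hat D = [\hat D_1, ..., \hat D_G]'$, $\hat u_g = y_g - m_g(\hat\beta)$, and now $\hat\Omega_g = H_g(\hat\beta)^{1/2} R(\hat\alpha) H_g(\hat\beta)^{1/2}$.
- The asymptotic theory requires that $G \to \infty$.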


Population-averaged Poisson with unstructured correlation


GEE population-averaged model                   Number of obs      =     20186
Group and time vars:            id year         Number of groups   =      5908
Link:                               log         Obs per group: min =         1
Family:                         Poisson                        avg =       3.4
Correlation:               unstructured                        max =         5
                                                Wald chi2(6)       =    508.61
Scale parameter:                                Prob > chi2        =    0.0000

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         mdu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lcoins |  -.0804454   .0077782   -10.34   0.000    -.0956904   -.0652004
    ndisease |   .0346067   .0024238    14.28   0.000     .0298561    .0393573
      female |   .1585075   .0334407     4.74   0.000     .0929649    .2240502
         age |   .0030901   .0015356     2.01   0.044     .0000803    .0060999
        lfam |  -.1406549   .0293672    -4.79   0.000    -.1982135   -.0830962
       child |   .1013677     .04301     2.36   0.018     .0170696    .1856658
       _cons |   .7764626   .0717221    10.83   0.000     .6358897    .9170354
------------------------------------------------------------------------------

Generally the s.e.'s are within 10% of the pooled Poisson cluster-robust s.e.'s.
The default (non cluster-robust) t-statistics are 3.5 to 4 times larger, because the default standard errors do not control for overdispersion.

The correlations $\text{Cor}[y_{it}, y_{is}|x_i]$ for PA (unstructured) are not equal.
But they are not declining as fast as AR(1).

. matrix list e(R)
symmetric e(R)[5,5]
           c1         c2         c3         c4         c5
r1          1
r2  .53143297          1
r3  .40817495  .58547795          1
r4  .32357326  .35321716  .54321752          1
r5  .34152288  .29803555  .43767583  .61948751          1
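The comparison with AR(1) can be made explicit from the matrix above. As a rough check, and only as an illustration, the snippet averages the estimated correlations at each lag and compares them with $\rho^k$, where $\rho$ is set to the average lag-1 correlation (an assumption made here for the comparison, not an estimate reported in the slides).

```python
# Lower triangle of e(R) as listed above; R[k] holds row k + 1.
R = [
    [1.0],
    [0.53143297, 1.0],
    [0.40817495, 0.58547795, 1.0],
    [0.32357326, 0.35321716, 0.54321752, 1.0],
    [0.34152288, 0.29803555, 0.43767583, 0.61948751, 1.0],
]

def mean_corr_at_lag(R, lag):
    """Average of the estimated correlations between periods `lag` apart."""
    vals = [R[t][t - lag] for t in range(lag, len(R))]
    return sum(vals) / len(vals)

rho1 = mean_corr_at_lag(R, 1)
for lag in range(1, 5):
    # Columns: lag, average estimated correlation, AR(1)-implied rho1 ** lag.
    print(lag, round(mean_corr_at_lag(R, lag), 3), round(rho1 ** lag, 3))
```

At lag 4 the estimated correlation is about three times the value an AR(1) pattern would imply, consistent with a persistent individual-specific component.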


Poisson random effects

18. Panel Poisson method 3: random effects

Poisson random effects model is
$$y_{it}|x_{it}, \beta, \alpha_i \sim \text{Poiss}[\alpha_i \exp(x_{it}'\beta)] \sim \text{Poiss}[\exp(\ln\alpha_i + x_{it}'\beta)]$$
where $\alpha_i$ is unobserved but is not correlated with $x_{it}$.

RE estimator 1: Assume $\alpha_i$ is Gamma$[1, \eta]$ distributed
- closed-form solution exists (negative binomial)
- $E[y_{it}|x_{it}, \beta] = \exp(x_{it}'\beta)$

RE estimator 2: Assume $\ln\alpha_i$ is $N[0, \sigma_\varepsilon^2]$ distributed
- closed-form solution does not exist (one-dimensional integral)
- can extend to slope coefficients (higher-dimensional integral)
- $E[y_{it}|x_{it}, \beta] = \exp(x_{it}'\beta)$ aside from translation of intercept.
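The closed form under gamma heterogeneity can be verified numerically in the simplest case. The sketch below assumes a single period ($T = 1$) and a gamma effect with mean 1 and variance $\eta$; in that case integrating the Poisson density over $\alpha_i$ gives the NB2 density with dispersion parameter $\eta$. Simpson's rule, the grid and the parameter values are choices made for this illustration and are not part of xtpoisson, re.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability Pr[Y = y] with mean lam (lam = 0 handled separately)."""
    if lam == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def gamma_pdf(a, eta):
    """Density of alpha_i with mean 1 and variance eta (shape 1/eta, scale eta)."""
    if a <= 0.0:
        return 0.0
    k = 1.0 / eta
    return math.exp((k - 1) * math.log(a) - a / eta
                    - math.lgamma(k) - k * math.log(eta))

def mixed_pmf(y, mu, eta, upper=60.0, n=20000):
    """Simpson's rule for the integral of Poisson(y; a * mu) * g(a) over a."""
    h = upper / n
    total = 0.0
    for j in range(n + 1):
        a = j * h
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += w * poisson_pmf(y, a * mu) * gamma_pdf(a, eta)
    return total * h / 3.0

def nb2_pmf(y, mu, alpha):
    """NB2 density with mean mu and variance mu * (1 + alpha * mu)."""
    k = 1.0 / alpha
    return math.exp(math.lgamma(k + y) - math.lgamma(k) - math.lgamma(y + 1)
                    + k * math.log(k / (k + mu)) + y * math.log(mu / (mu + k)))

mu, eta = 2.0, 0.5  # illustrative values (assumption)
for y in range(4):
    print(y, mixed_pmf(y, mu, eta), nb2_pmf(y, mu, eta))
```

With normal heterogeneity in $\ln\alpha_i$ no such closed form is available, so the same one-dimensional integral has to be evaluated numerically for every individual.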


Poisson random effects (gamma) with panel bootstrap s.e.'s

Random-effects Poisson regression               Number of obs      =     20186
Group variable: id                              Number of groups   =      5908

Random effects u_i ~ Gamma                      Obs per group: min =         1
                                                               avg =       3.4
                                                               max =         5

                                                Wald chi2(6)       =    529.10
Log likelihood  = -43240.556                    Prob > chi2        =    0.0000

                                   (Replications based on 5908 clusters in id)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
         mdu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lcoins |  -.0878258   .0086097   -10.20   0.000    -.1047004   -.0709511
    ndisease |   .0387629   .0026904    14.41   0.000     .0334899    .0440359
      female |   .1667192   .0379216     4.40   0.000     .0923942    .2410442
         age |   .0019159   .0016242     1.18   0.238    -.0012675    .0050994
        lfam |  -.1351786   .0308529    -4.38   0.000    -.1956492   -.0747079
       child |   .1082678   .0495487     2.19   0.029     .0111541    .2053816
       _cons |   .7574177   .0754536    10.04   0.000     .6095314     .905304
-------------+----------------------------------------------------------------
    /lnalpha |   .0251256   .0270297                     -.0278516    .0781029
-------------+----------------------------------------------------------------
       alpha |   1.025444   .0277175                      .9725326    1.081234
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) =  3.9e+04 Prob>=chibar2 = 0.000

The default (non cluster-robust) t-statistics are 2.5 times larger.

Pravin K. Trivedi
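The slide shows only the output. A command along the following lines would produce this kind of table. This is a sketch: it assumes the RAND data are in memory with panel identifier `id`, and the seed value is a placeholder rather than the one used for the slide.

```stata
* Declare the panel identifier (assumed to be id, as in the output header)
xtset id

* Gamma random-effects Poisson with panel (cluster) bootstrap standard errors
* seed(12345) is a placeholder chosen for this sketch
xtpoisson mdu lcoins ndisease female age lfam child, re ///
    vce(bootstrap, reps(100) seed(12345))

* Same model with default (non cluster-robust) standard errors, for comparison
xtpoisson mdu lcoins ndisease female age lfam child, re
```

Comparing the z statistics of the two runs is one way to reproduce the ratio quoted on the slide.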

Likelihood-based Panel Count Estimators

Poisson random effects

19. Poisson fixed effects with panel bootstrap s.e.s

. xtpoisson mdu lcoins ndisease female age lfam child, fe vce(boot, reps(100) seed(10
(running xtpoisson on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

Conditional fixed-effects Poisson regression    Number of obs      =     17791
Group variable: id                              Number of groups   =      4977

                                                Obs per group: min =         2
                                                               avg =       3.6
                                                               max =         5

                                                Wald chi2(3)       =      4.64
Log likelihood  = -24173.211                    Prob > chi2        =    0.2002

                                   (Replications based on 4977 clusters in id)

             |   Observed   Bootstrap                         Normal-based
         mdu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         age |  -.0112009   .0095077    -1.18   0.239    -.0298356    .0074339
        lfam |   .0877134   .1125783     0.78   0.436     -.132936    .3083627
       child |   .1059867   .0738452     1.44   0.151    -.0387472    .2507206

The default (non cluster-robust) t-statistics are 2 times larger.



Likelihood-based Panel Count Estimators

Poisson random effects

Strength of fixed effects versus random effects
  - Allows $\alpha_i$ to be correlated with $x_{it}$.
  - So consistent estimates if regressors are correlated with the error, provided regressors are correlated only with the time-invariant component of the error.
  - An alternative to IV to get causal estimates.

Limitations:
  - Coefficients of time-invariant regressors are not identified.
  - For identified regressors standard errors can be much larger.
  - Marginal effects in a nonlinear model depend on $\alpha_i$:
    $\mathrm{ME}_j = \partial E[y_{it}] / \partial x_{it,j} = \alpha_i \exp(x_{it}'\beta)\,\beta_j$
    and $\alpha_i$ is unknown.
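Although the level of the marginal effect depends on the unknown $\alpha_i$, the proportional effect does not. A short derivation, under the same multiplicative model $E[y_{it} \mid x_{it}, \alpha_i] = \alpha_i \exp(x_{it}'\beta)$:

```latex
\begin{align*}
\frac{\partial \ln E[y_{it} \mid x_{it}, \alpha_i]}{\partial x_{it,j}}
  &= \frac{\partial}{\partial x_{it,j}}\big(\ln \alpha_i + x_{it}'\beta\big) = \beta_j, \\
\frac{\mathrm{ME}_j}{E[y_{it} \mid x_{it}, \alpha_i]}
  &= \frac{\alpha_i \exp(x_{it}'\beta)\,\beta_j}{\alpha_i \exp(x_{it}'\beta)} = \beta_j .
\end{align*}
```

So $\beta_j$ can still be read as a semi-elasticity: a one-unit change in $x_{it,j}$ changes the conditional mean by approximately $100\,\beta_j$ percent, whatever the value of $\alpha_i$.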


Likelihood-based Panel Count Estimators

Poisson random effects

Panel Poisson: estimator comparison

Compare following estimators
  - pooled Poisson with cluster-robust s.e.s
  - pooled population averaged Poisson with unstructured correlations and cluster-robust s.e.s
  - random effects Poisson with gamma random effect and cluster-robust s.e.s
  - random effects Poisson with normal random effect and default s.e.s
  - fixed effects Poisson and cluster-robust s.e.s

Find that
  - similar results for all RE models
  - note that these data are not good to illustrate FE as regressors have little within variation.
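The five estimators listed above can be run and tabulated in one pass. The following is a sketch rather than the exact code behind the slides: it assumes the time variable is called `year` (needed for the unstructured correlation option), and the bootstrap seed is a placeholder.

```stata
* Regressor list (variable names taken from the output shown earlier)
global xlist lcoins ndisease female age lfam child
xtset id year          // year is an assumed name for the time variable

* 1. Pooled Poisson with cluster-robust standard errors
poisson mdu $xlist, vce(cluster id)
estimates store POOLED

* 2. Population-averaged Poisson, unstructured correlation, robust s.e.s
xtpoisson mdu $xlist, pa corr(unstructured) vce(robust)
estimates store POPAVE

* 3. Gamma random effects, panel bootstrap s.e.s (placeholder seed)
xtpoisson mdu $xlist, re vce(bootstrap, reps(100) seed(12345))
estimates store RE_GAMMA

* 4. Normal random effects, default s.e.s
xtpoisson mdu $xlist, re normal
estimates store RE_NORMAL

* 5. Fixed effects, panel bootstrap s.e.s (placeholder seed)
xtpoisson mdu $xlist, fe vce(bootstrap, reps(100) seed(12345))
estimates store FIXED

* Side-by-side coefficients and standard errors
estimates table POOLED POPAVE RE_GAMMA RE_NORMAL FIXED, ///
    equations(1) b(%8.4f) se stats(N)
```

The `equations(1)` option lines up the main-equation coefficients across models whose equation names differ.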


Likelihood-based Panel Count Estimators

Poisson random effects

20. Comparison of different Poisson estimators with cluster-robust s.e.s

Each coefficient is followed on the next row by its standard error.

    Variable |   POOLED     POPAVE    RE_GAMMA   RE_NOR~L     FIXED
#1           |
      lcoins |  -0.0808    -0.0804    -0.0878    -0.1145
             |   0.0080     0.0078     0.0086     0.0073
    ndisease |   0.0339     0.0346     0.0388     0.0409
             |   0.0026     0.0024     0.0027     0.0023
      female |   0.1718     0.1585     0.1667     0.2084
             |   0.0343     0.0334     0.0379     0.0305
         age |   0.0041     0.0031     0.0019     0.0027    -0.0112
             |   0.0017     0.0015     0.0016     0.0012     0.0095
        lfam |  -0.1482    -0.1407    -0.1352    -0.1443     0.0877
             |   0.0323     0.0294     0.0309     0.0265     0.1126
       child |   0.1030     0.1014     0.1083     0.0737     0.1060
             |   0.0507     0.0430     0.0495     0.0345     0.0738
       _cons |   0.7488     0.7765     0.7574     0.2873
             |   0.0786     0.0717     0.0755     0.0642
lnalpha      |
       _cons |                         0.0251
             |                         0.0270
lnsig2u      |
       _cons |                                    0.0550
             |                                    0.0255
Statistics   |
           N |    20186      20186      20186      20186      17791
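The much smaller intercept under the normal random effect reflects the translation of the intercept in that model. As a rough check, assume `lnsig2u` is the log of the variance $\sigma^2$ of $\ln \alpha_i$, so that $E[\alpha_i] = \exp(\sigma^2/2)$:

```latex
\begin{align*}
\hat\sigma^2 &= \exp(0.0550) \approx 1.0565, \\
E[y_{it} \mid x_{it}] &= \exp\!\big(x_{it}'\beta + \sigma^2/2\big), \\
\text{adjusted intercept} &\approx 0.2873 + \tfrac{1}{2}(1.0565) \approx 0.8156 .
\end{align*}
```

After this adjustment the intercept is of similar magnitude to the values of 0.7488 to 0.7765 reported for the other three models that include a constant.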

Likelihood-based Panel Count Estimators

Moment based estimation of FE count panels

Predetermined means the regressor is correlated with past shocks but not with current or future shocks: $E[u_{it} x_{is}] = 0$ for $s \le t$, but $\neq 0$ for $s > t$.
Two specifications are considered:
$$y_{it} = \exp(x_{it}'\beta)\,\alpha_i\, w_{it}$$
$$y_{it} = \exp(x_{it}'\beta)\,\alpha_i + w_{it}$$
A quasi-differencing transformation is used to eliminate the fixed effect.
Then a moment condition is constructed for estimation.
Depending upon which specification is used, different moment conditions obtain.
Chamberlain and Wooldridge derive quasi-differencing transformations that are shown in the table below.

Likelihood-based Panel Count Estimators

Moment based estimation of FE count panels

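21. Exponential Mean and Multiplicative Heterogeneity

Relies on a number of ways of eliminating the fixed effects.
Error may enter additively or multiplicatively.
Estimating equations are orthogonality conditions after quasi-differencing, which eliminates the fixed effect.

Model: strict exog.; predetermined regressors
Moment spec.: $E[x_{it} u_{it+j}] = 0,\ j \ge 0$; $E[x_{it} u_{it-s}] \neq 0,\ s \ge 1$

Estimating equations, with $\lambda_{it} = \exp(x_{it}'\beta)$:
  GMM (Chamberlain):       $E\left[ y_{it} \frac{\lambda_{it-1}}{\lambda_{it}} - y_{it-1} \,\Big|\, x_i^{t-1} \right] = 0$
  GMM (Wooldridge):        $E\left[ \frac{y_{it}}{\lambda_{it}} - \frac{y_{it-1}}{\lambda_{it-1}} \,\Big|\, x_i^{t-1} \right] = 0$
  GMM/endog (Wooldridge):  $E\left[ \frac{y_{it}}{\lambda_{it}} - \frac{y_{it-1}}{\lambda_{it-1}} \,\Big|\, x_i^{t-2} \right] = 0$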
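To see why these transformations remove the fixed effect, take the multiplicative specification $y_{it} = \lambda_{it}\alpha_i w_{it}$ with $\lambda_{it} = \exp(x_{it}'\beta)$. The sketch below assumes $E[w_{it} \mid \alpha_i, x_{i1}, \ldots, x_{it}] = 1$, a condition stated here for illustration rather than taken from the slides, and writes $x_i^{t-1} = (x_{i1}, \ldots, x_{i,t-1})$:

```latex
\begin{align*}
\text{Chamberlain:}\quad
  y_{it}\frac{\lambda_{it-1}}{\lambda_{it}} - y_{it-1}
  &= \alpha_i \lambda_{it-1}\,(w_{it} - w_{it-1}), \\
\text{Wooldridge:}\quad
  \frac{y_{it}}{\lambda_{it}} - \frac{y_{it-1}}{\lambda_{it-1}}
  &= \alpha_i\,(w_{it} - w_{it-1}), \\
E\big[w_{it} - w_{it-1} \mid \alpha_i, x_i^{t-1}\big]
  &= 1 - 1 = 0 .
\end{align*}
```

Because $\lambda_{it-1}$ depends only on $x_{i,t-1}$, both quasi-differences have zero mean conditional on $x_i^{t-1}$, and $\alpha_i$ never needs to be estimated.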


Likelihood-based Panel Count Estimators

Computational Strategies for GMM

Use an interactive version of an estimation command (e.g. gmm); enter the function directly on the command line or dialog box by using a substitutable expression.

Use a function evaluator program, which gives more flexibility in defining your objective function; usually more complicated to use but may be needed for more complicated problems.

Hint: In Stata a good place to start is the nl (nonlinear least squares) command. Then go on to gmm.
Most of the examples here involve substitutable expressions.
Examples of function evaluator programs are in MUS and especially in Stata manuals.

Example: $\sum_{i=1}^{N} \sum_{t=1}^{T} x_{it} \left( y_{it} - \frac{\bar{y}_i}{\bar{\lambda}_i} \lambda_{it} \right) = 0$.
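As an illustration of the interactive form, the sketch below uses a substitutable expression for an exponential-mean moment condition. It borrows the MEPS variable names that appear in the applications later in the deck and ignores the individual effect, so it is a starting point rather than a fixed effects estimator.

```stata
* Interactive gmm with a substitutable expression:
* residual u = y - exp(x'b + b0), moment conditions E[z*u] = 0
* {xb: ...} declares a linear combination with one parameter per variable,
* {b0} is a free-standing intercept parameter
gmm (officevis - exp({xb: insprv age income totchr} + {b0})), ///
    instruments(insprv age income totchr) onestep vce(cluster dupersid)
```

With the instruments equal to the regressors plus a constant, these moment conditions coincide with the pooled Poisson first-order conditions, so the point estimates should match those from `poisson` on the same sample.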


Likelihood-based Panel Count Estimators

Applications

22. Applications using balanced panel MEPS data

For illustrating panel methods the RAND data set has limitations.

. sum officevis T0officevis educ age income totchr

    Variable |     Obs        Mean    Std. Dev.       Min        Max
   officevis |   78888    1.387372    3.328148          0         94
 T0officevis |   78888    1.488084    3.334559          0         58
        educ |   78888    12.32776    3.264869          0         17
         age |   78888    4.562129    1.742034        1.8        8.5
      income |   78888    27.60833    28.94855    -63.631    264.674
      totchr |   78888    .7881047    1.081315


Likelihood-based Panel Count Estimators

Applications

MEPS Data
Quarterly data for 2005-06

. xtdes

dupersid:  30002019, 30004010, ..., 38505016             n =       9861
timeindex:  1, 2, ..., 8                                 T =          8
           Delta(timeindex) = 1 unit
           Span(timeindex)  = 8 periods
           (dupersid*timeindex uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         8       8       8         8         8       8       8

     Freq.  Percent    Cum. |  Pattern
      9861    100.00  100.00 |  11111111
      9861    100.00         |  XXXXXXXX


Likelihood-based Panel Count Estimators

Applications

Fixed Effects GMM in Stata 11

. program gmm_poi2
      version 11
      * Moment evaluator for gmm: fills `varlist' with the FE Poisson residual
      syntax varlist if, at(name) myrhs(varlist) ///
          mylhs(varlist) myidvar(varlist)
      quietly {
          tempvar mu mubar ybar
          * Build the linear index x'b from the parameter row vector `at'
          gen double `mu' = 0 `if'
          local j = 1
          foreach var of varlist `myrhs' {
              replace `mu' = `mu' + `var'*`at'[1,`j'] `if'
              local j = `j' + 1
          }
          * Exponential mean and its within-individual average
          replace `mu' = exp(`mu')
          egen double `mubar' = mean(`mu') `if', by(`myidvar')
          * Within-individual average of the outcome
          egen double `ybar' = mean(`mylhs') `if', by(`myidvar')
          * Residual: y_it - mu_it * ybar_i / mubar_i
          replace `varlist' = `mylhs' - `mu'*`ybar'/`mubar' `if'
      }
  end


Likelihood-based Panel Count Estimators

Applications

Implementing fixed effects GMM in Stata 11

. gmm gmm_poi2, mylhs(officevis) myrhs(insprv age income totchr) ///
>     myidvar(dupersid) nequations(1) parameters(insprv age income totchr) ///
>     instruments(insprv age income totchr, noconstant) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .00140916
Iteration 1:   GMM criterion Q(b) =  1.487e-07
Iteration 2:   GMM criterion Q(b) =  1.583e-14
Iteration 3:   GMM criterion Q(b) =  1.843e-28

GMM estimation
Number of parameters =   4
Number of moments    =   4
Initial weight matrix: Unadjusted                     Number of obs  =   78888

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
     /insprv |  -.0080549   .5460749    -0.01   0.988    -1.078342    1.062232
        /age |  -.5125841    13.1682    -0.04   0.969    -26.32178    25.29662
     /income |    .001128   .0013911     0.81   0.417    -.0015984    .0038545
     /totchr |   .2211125   .3354182     0.66   0.510    -.4362951    .8785201

Instruments for equation 1: insprv age income totchr

. estimates store PFEGMM

Likelihood-based Panel Count Estimators

Applications

Standard fixed effects panel Poisson

. * Usual panel Poisson FE
. xtpoisson officevis insprv age income totchr, fe
note: 1900 groups (15200 obs) dropped because of all zero outcomes

Iteration 0:   log likelihood = -84468.435
Iteration 1:   log likelihood =  -84154.68
Iteration 2:   log likelihood = -84154.647
Iteration 3:   log likelihood = -84154.647

Conditional fixed-effects Poisson regression    Number of obs      =     63688
Group variable: dupersid                        Number of groups   =      7961

                                                Obs per group: min =         8
                                                               avg =       8.0
                                                               max =         8

                                                Wald chi2(4)       =    618.20
Log likelihood  = -84154.647                    Prob > chi2        =    0.0000

   officevis |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      insprv |  -.0080549    .027985    -0.29   0.773    -.0629046    .0467947
         age |  -.5125841   .0629145    -8.15   0.000    -.6358943   -.3892739
      income |    .001128    .000258     4.37   0.000     .0006224    .0016336
      totchr |   .2211125   .0091051    24.28   0.000     .2032669    .2389582

. estimates store PFE

No difference in point estimates because MLE and GMM solve the same estimating equations.

Likelihood-based Panel Count Estimators

Applications

Standard FE Poisson with robust SE (with xtpqml add-on)

. * Add-on xtpqml gives panel robust se's
. xtpqml officevis insprv age income totchr, fe i(dupersid)
note: 1900 groups (15200 obs) dropped because of all zero outcomes

Iteration 0:   log likelihood = -84468.435
Iteration 1:   log likelihood =  -84154.68
Iteration 2:   log likelihood = -84154.647
Iteration 3:   log likelihood = -84154.647

Conditional fixed-effects Poisson regression    Number of obs      =     63688
Group variable: dupersid                        Number of groups   =      7961

                                                Obs per group: min =         8
                                                               avg =       8.0
                                                               max =         8

                                                Wald chi2(4)       =    618.20
Log likelihood  = -84154.647                    Prob > chi2        =    0.0000

   officevis |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      insprv |  -.0080549    .027985    -0.29   0.773    -.0629046    .0467947
         age |  -.5125841   .0629145    -8.15   0.000    -.6358943   -.3892739
      income |    .001128    .000258     4.37   0.000     .0006224    .0016336
      totchr |   .2211125   .0091051    24.28   0.000     .2032669    .2389582

Calculating Robust Standard Errors...

   officevis |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
officevis    |
      insprv |  -.0080549   .0715881    -0.11   0.910    -.1483651    .1322552
         age |  -.5125841   .1804831    -2.84   0.005    -.8663245   -.1588438
      income |    .001128   .0007661     1.47   0.141    -.0003734    .0026295
      totchr |   .2211125   .0250814     8.82   0.000     .1719539    .2702712

Wald chi2(4) =     80.59                        Prob > chi2 =    0.0000

Likelihood-based Panel Count Estimators

Dynamic panel count models

23. Panel dynamic

Individual effects model allows for time series persistence via unobserved heterogeneity ($\alpha_i$)
  - e.g. High $\alpha_i$ means high doctor visits each period

Alternative time series persistence is via true state dependence ($y_{t-1}$)
  - e.g. Many doctor visits last period lead to many this period.

Linear model:
$$y_{it} = \alpha_i + \rho y_{i,t-1} + x_{it}'\beta + u_{it}.$$

Poisson model with exponential feedback: One possibility (designed to confront the zero problem) is
$$\mu_{it} = \alpha_i \lambda_{it-1} = \alpha_i \exp(\rho y^{*}_{i,t-1} + x_{it}'\beta),$$
$$y^{*}_{i,t-1} = \min(c, y_{i,t-1}).$$


Likelihood-based Panel Count Estimators

Dynamic panel count models

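Panel dynamic: GMM estimation of FE model

In the fixed effects case the Poisson FE estimator is now inconsistent.
Instead assume weak exogeneity
$$E[y_{it} \mid y_{it-1}, \ldots, y_{i1}, x_{it}, \ldots, x_{i1}] = \alpha_i \lambda_{it-1}.$$
And use an alternative quasi-difference
$$E\left[\left(y_{it} - \frac{\lambda_{it-1}}{\lambda_{it}}\, y_{it+1}\right) \Big|\; y_{it-1}, \ldots, y_{i1}, x_{it}, \ldots, x_{i1}\right] = 0.$$
So MM or GMM based on
$$E\left[ z_{it} \left( y_{it} - \frac{\lambda_{it-1}}{\lambda_{it}}\, y_{it+1} \right) \right] = 0$$
where e.g. $z_{it} = (y_{it-1}, x_{it})$ in the just-identified case.

Windmeijer (2008) has recent discussion.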
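The zero-mean property of a forward quasi-difference of this kind follows from iterated expectations. A sketch, writing $I_{it} = (y_{it-1}, \ldots, y_{i1}, x_{it}, \ldots, x_{i1})$ for the conditioning set and assuming the weak exogeneity condition holds in every period conditional on $\alpha_i$, so that $E[y_{it+1} \mid I_{i,t+1}, \alpha_i] = \alpha_i \lambda_{it}$ (the notation $I_{it}$ is introduced here for illustration):

```latex
\begin{align*}
E\!\left[\frac{\lambda_{it-1}}{\lambda_{it}}\, y_{it+1} \,\Big|\, I_{it}, \alpha_i\right]
  &= E\!\left[\frac{\lambda_{it-1}}{\lambda_{it}}\,
       E\big[y_{it+1} \mid I_{i,t+1}, \alpha_i\big] \,\Big|\, I_{it}, \alpha_i\right] \\
  &= E\!\left[\frac{\lambda_{it-1}}{\lambda_{it}}\, \alpha_i \lambda_{it} \,\Big|\, I_{it}, \alpha_i\right]
   = \alpha_i \lambda_{it-1}, \\
E\!\left[y_{it} - \frac{\lambda_{it-1}}{\lambda_{it}}\, y_{it+1} \,\Big|\, I_{it}, \alpha_i\right]
  &= \alpha_i \lambda_{it-1} - \alpha_i \lambda_{it-1} = 0 .
\end{align*}
```

The first step uses the fact that $\lambda_{it}$ is a function of variables in $I_{i,t+1}$ while $\lambda_{it-1}$ is a function of variables in $I_{it}$; since the result holds for every value of $\alpha_i$, it also holds after averaging over $\alpha_i$.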



Likelihood-based Panel Count Estimators

Dynamic panel count models

Example of dynamic moment-based JI GMM

Ignore individual specific effects

. gmm (officevis - exp({xb:L.officevis insprv educ age income totchr}+{b0})), ///
> instruments(L.officevis insprv educ age income totchr) onestep vce(cluster dupersid)

Step 1
Iteration 0:   GMM criterion Q(b) =  4.9539327
Iteration 1:   GMM criterion Q(b) =  4.7296297
Iteration 2:   GMM criterion Q(b) =  1.4832673
Iteration 3:   GMM criterion Q(b) =  .01045573
Iteration 4:   GMM criterion Q(b) =  6.508e-06
Iteration 5:   GMM criterion Q(b) =  3.032e-12
Iteration 6:   GMM criterion Q(b) =  7.264e-25

GMM estimation
Number of parameters =   7
Number of moments    =   7
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
/xb_L_offi~s |    .064072   .0041069    15.60   0.000     .0560228    .0721213
  /xb_insprv |   .2152153   .0331676     6.49   0.000     .1502079    .2802227
    /xb_educ |   .0404143   .0065808     6.14   0.000     .0275162    .0533124
     /xb_age |   .1221278   .0134542     9.08   0.000     .0957581    .1484976
  /xb_income |  -.0003585   .0004981    -0.72   0.472    -.0013347    .0006178
  /xb_totchr |   .3027348   .0141805    21.35   0.000     .2749415     .330528
         /b0 |  -1.447292   .0952543   -15.19   0.000    -1.633987   -1.260597

Likelihood-based Panel Count Estimators

Dynamic panel count models

Example of dynamic moment-based OI GMM

. gmm (officevis - exp({xb:L.officevis insprv educ age income totchr}+{b0})), ///
> instruments(L.officevis educ age income totchr female white hispanic married employed) ///
> onestep vce(cluster dupersid)

Step 1
Iteration 0:   GMM criterion Q(b) =  4.9696148
Iteration 1:   GMM criterion Q(b) =  3.7545442
Iteration 2:   GMM criterion Q(b) =  .86353039
Iteration 3:   GMM criterion Q(b) =  .25844389
Iteration 4:   GMM criterion Q(b) =  .07248002
Iteration 5:   GMM criterion Q(b) =  .07235453
Iteration 6:   GMM criterion Q(b) =  .07235443

GMM estimation
Number of parameters =   7
Number of moments    =  11
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
/xb_L_offi~s |   .0631186   .0042901    14.71   0.000     .0547101     .071527
  /xb_insprv |   .0468067   .1154105     0.41   0.685    -.1793937     .273007
    /xb_educ |   .0422612   .0074362     5.68   0.000     .0276866    .0568359
     /xb_age |   .1208516   .0136986     8.82   0.000     .0940028    .1477003
  /xb_income |   .0004412   .0007107     0.62   0.535    -.0009518    .0018341
  /xb_totchr |   .2988192   .0144326    20.70   0.000     .2705318    .3271066
         /b0 |  -1.361726   .0972536   -14.00   0.000     -1.55234   -1.171113

Instruments for equation 1: L.officevis educ age income totchr female white hispanic marrie

Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

24. Poisson Extensions

A different ML approach to dynamic specification
$$y_{i,t} \sim P(\mu_{it}), \quad i = 1, \ldots, N;\ t = 1, \ldots, T$$
$$f(y_{i,t} \mid \mu_{it}) = \frac{e^{-\mu_{it}}\, \mu_{it}^{y_{it}}}{y_{it}!}$$
$$\mu_{it} = E[y_{it} \mid y_{i,t-1}, x_{it}, \alpha_i] = g(y_{i,t-1}, x_{it}, \alpha_i)$$

Initial conditions problem in dynamic model: in a short panel, neglecting the dependence on the initial condition induces bias.
The lagged dependent variable on the right hand side is a source of bias because the lagged dependent variable and the individual-specific effect are correlated.
Wooldridge's (2005) method integrates out the individual-specific random effect after conditioning on the initial value and covariates.
A random effects model is used to accommodate the initial conditions.

Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Alternative specifications

$$E[y_{it} \mid x_{it}, y_{it-1}, \alpha_i] = h(y_{it-1}, x_{it}, \alpha_i)$$
where $\alpha_i$ is the individual-specific effect.

1st alternative: Autoregressive dependence through the exponential mean.
$$E[y_{it} \mid x_{it}, y_{it-1}, \alpha_i] = \exp(\rho y_{it-1} + x_{it}'\beta + \alpha_i)$$
If the $\alpha_i$ are uncorrelated with the regressors, and further if parametric assumptions are to be avoided, then this model can be estimated using either nonlinear least squares or pooled Poisson MLE. In either case it is desirable to use the robust variance formula.
Limitation: Potentially explosive if large values of $y_{it}$ are realized.
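As a sketch of the pooled Poisson route for this exponential feedback specification, using the MEPS variable names from the applications in this deck and assuming the individual effects are uncorrelated with the regressors:

```stata
* Declare panel and time variables so the lag operator L. is available
xtset dupersid timeindex

* Exponential feedback model by pooled Poisson (quasi-)MLE,
* with standard errors clustered on the individual
poisson officevis L.officevis insprv educ age income totchr, vce(cluster dupersid)
```

Clustering on `dupersid` supplies the robust variance formula recommended above, since the Poisson variance assumption and independence over time are unlikely to hold.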


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Initial conditions
A dynamic panel model requires additional assumptions about the relationship between the initial observations ("initial conditions") $y_0$ and the $\alpha_i$.
The effect of the initial value on future events is important in a short panel. The initial-value effect might be a part of the individual-specific effect.
Wooldridge's method requires a specification of the conditional distribution of $\alpha_i$ given $y_0$ and $z_i$, with the latter entering separably.
Under the assumption that the initial conditions are nonrandom, the standard random effects conditional maximum likelihood approach identifies the parameters of interest.
For a class of nonlinear dynamic panel models, including the Poisson model, Wooldridge (2005) analyzes this model, which conditions the joint distribution on the initial conditions.

Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Conditionally correlated RE (1)


Where parametric FE models are not feasible, the conditionally correlated random (CCR) effects model (Mundlak (1978) and Chamberlain (1984)) provides a compromise between FE and RE models.
Standard RE panel model assumes that $\alpha_i$ and $x_{it}$ are uncorrelated.
Making $\alpha_i$ a function of $x_{i1}, \ldots, x_{iT}$ allows for possible correlation:
$$\alpha_i = z_i'\lambda + \varepsilon_i$$
Mundlak's (more parsimonious) method allows the individual-specific effect to be determined by time averages of covariates, denoted $z_i$; Chamberlain's method suggests a richer model with a weighted sum of the covariates for the random effect.


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Conditionally correlated RE (2)


We can further allow for an initial condition effect by including $y_0$ thus:
$$\alpha_i = y_0'\eta + z_i'\lambda + \varepsilon_i$$
where $y_0$ is a vector of initial conditions, $z_i = \bar{x}_i$ denotes the time-average of the exogenous variables and $\varepsilon_i$ may be interpreted as unobserved heterogeneity.
The formulation essentially introduces no additional problems, though the averages change when new data are added. Estimation and inference in the pooled Poisson or NLS model can proceed as before.
The formulation can also be used when no dynamics are present in the model. In this case $\varepsilon_i$ can be integrated out using a distributional assumption about $f(\varepsilon)$.
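A minimal sketch of this correlated random effects formulation for the MEPS example follows. It assumes `T0officevis` holds the initial-period outcome and that `insprv`, `age`, `income` and `totchr` vary over time; the `mean_*` variable names are hypothetical, introduced only for this illustration.

```stata
xtset dupersid timeindex

* Mundlak terms: individual time averages of the time-varying covariates
* (the mean_* names are hypothetical, introduced for this sketch)
foreach v of varlist insprv age income totchr {
    egen double mean_`v' = mean(`v'), by(dupersid)
}

* Pooled Poisson with the initial condition and the time averages added,
* standard errors clustered on the individual
poisson officevis L.officevis T0officevis insprv educ age income totchr ///
    mean_insprv mean_age mean_income mean_totchr, vce(cluster dupersid)
```

If a covariate turns out to be constant within individuals, its time average is collinear with it and Stata will drop one of the two.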


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Dynamic GMM without initial condition

Here the individual-specific effect is captured by the initial condition

. gmm (officevis - exp({xb:L.officevis insprv educ age income totchr}+{b0})), ///
> instruments(L.officevis insprv educ age income totchr) onestep vce(cluster dupersid)

Step 1
Iteration 0:   GMM criterion Q(b) =  4.9539327
Iteration 1:   GMM criterion Q(b) =  4.7296297
Iteration 2:   GMM criterion Q(b) =  1.4832673
Iteration 3:   GMM criterion Q(b) =  .01045573
Iteration 4:   GMM criterion Q(b) =  6.508e-06
Iteration 5:   GMM criterion Q(b) =  3.032e-12
Iteration 6:   GMM criterion Q(b) =  7.264e-25

GMM estimation
Number of parameters =   7
Number of moments    =   7
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
/xb_L_offi~s |    .064072   .0041069    15.60   0.000     .0560228    .0721213
  /xb_insprv |   .2152153   .0331676     6.49   0.000     .1502079    .2802227
    /xb_educ |   .0404143   .0065808     6.14   0.000     .0275162    .0533124
     /xb_age |   .1221278   .0134542     9.08   0.000     .0957581    .1484976
  /xb_income |  -.0003585   .0004981    -0.72   0.472    -.0013347    .0006178
  /xb_totchr |   .3027348   .0141805    21.35   0.000     .2749415     .330528
         /b0 |  -1.447292   .0952543   -15.19   0.000    -1.633987   -1.260597

Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Overidentified dynamic GMM with initial condition

. gmm (officevis - exp({xb:L.officevis insprv educ age income totchr}+{b0})), ///
> instruments(L.officevis educ age income totchr female white hispanic married empl
> onestep vce(cluster dupersid)

Step 1
Iteration 0:   GMM criterion Q(b) =  4.9696148
Iteration 1:   GMM criterion Q(b) =  3.7545442
Iteration 2:   GMM criterion Q(b) =  .86353039
Iteration 3:   GMM criterion Q(b) =  .25844389
Iteration 4:   GMM criterion Q(b) =  .07248002
Iteration 5:   GMM criterion Q(b) =  .07235453
Iteration 6:   GMM criterion Q(b) =  .07235443

GMM estimation
Number of parameters =   7
Number of moments    =  11
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
/xb_L_offi~s |   .0631186   .0042901    14.71   0.000     .0547101     .071527
  /xb_insprv |   .0468067   .1154105     0.41   0.685    -.1793937     .273007
    /xb_educ |   .0422612   .0074362     5.68   0.000     .0276866    .0568359
     /xb_age |   .1208516   .0136986     8.82   0.000     .0940028    .1477003
  /xb_income |   .0004412   .0007107     0.62   0.535    -.0009518    .0018341
  /xb_totchr |   .2988192   .0144326    20.70   0.000     .2705318    .3271066


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Dynamic Just Identified GMM with Initial Conditions

. gmm (officevis - exp({xb:L.officevis T0officevis insprv educ age income totchr}+{b
> instruments(L.officevis T0officevis insprv educ age income totchr) onestep vce(cl

Final GMM criterion Q(b) =  6.30e-26

GMM estimation
Number of parameters =   8
Number of moments    =   8
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
/xb_L_offi~s |   .0495929   .0044248    11.21   0.000     .0409204    .0582654
/xb_T0offi~s |   .0311947   .0043446     7.18   0.000     .0226794    .0397099
  /xb_insprv |   .2153361   .0351702     6.12   0.000     .1464038    .2842684
    /xb_educ |   .0382539   .0056386     6.78   0.000     .0272024    .0493054
     /xb_age |   .1303702   .0095834    13.60   0.000      .111587    .1491534
  /xb_income |  -.0003019   .0004701    -0.64   0.521    -.0012232    .0006194
  /xb_totchr |   .2847798    .010334    27.56   0.000     .2645256    .3050341
         /b0 |  -1.484486   .0605323   -24.52   0.000    -1.603127   -1.365845

Instruments for equation 1: L.officevis T0officevis insprv educ age income totchr _c


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Dynamic Over Identified GMM with Initial Condition

. gmm (officevis - exp({xb:L.officevis T0officevis insprv educ age income totchr}+{b
> instruments(L.officevis T0officevis educ age income totchr female white hispanic
> onestep vce(cluster dupersid) nolog

Final GMM criterion Q(b) =  .0685762

GMM estimation
Number of parameters =   8
Number of moments    =  12
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
/xb_L_offi~s |   .0490201   .0046062    10.64   0.000      .039992    .0580481
/xb_T0offi~s |   .0305356   .0044538     6.86   0.000     .0218063    .0392648
  /xb_insprv |   .0565968   .1135886     0.50   0.618    -.1660328    .2792264
    /xb_educ |   .0402952   .0059253     6.80   0.000     .0286819    .0519085
     /xb_age |   .1299791   .0098075    13.25   0.000     .1107567    .1492014
  /xb_income |   .0004368    .000703     0.62   0.534    -.0009411    .0018148
  /xb_totchr |   .2805608   .0101571    27.62   0.000     .2606532    .3004684
         /b0 |  -1.408679   .0607941   -23.17   0.000    -1.527833   -1.289525

Instruments for equation 1: L.officevis T0officevis educ age income totchr female wh
    married employed _cons


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Alternative to EFM: LFM

An alternative to the (potentially explosive) EFM is the linear feedback model
$$E[y_{it} \mid x_{it}, y_{it-1}, \alpha_i] = \rho y_{it-1} + \exp(x_{it}'\beta + \alpha_i)$$

Limitation: Discontinuities are avoided, but the model falls outside the standard exponential class of models.
MLE not feasible, but QML/NLS/GMM is feasible.


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

25. Linear feedback model

$$\mu_{it} = \rho y_{it-1} + \exp(x_{1it}'\beta_1 + x_{2it}'\beta_2 + \theta_1 y_{i0} + z_i'\theta_2 + w_i)$$
$$\phantom{\mu_{it}} = \rho y_{it-1} + \exp(w_i)\exp(x_{1it}'\beta_1 + x_{2it}'\beta_2 + \theta_1 y_{i0} + z_i'\theta_2)$$

MLE is not feasible because the functional form no longer belongs to the exponential family.
GMM which uses differencing transformations will eliminate initial values and correlated heterogeneity.
NLS method for estimation can identify the conditional mean function under certain conditions:
$$\min_{\{\rho, \beta, \theta\}} \frac{1}{2NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (y_{it} - \mu_{it})^2$$
To allow for a RE type extension, one should use a robust estimator of the covariance matrix.

Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Example: EFM vs LFM

. * Linear Feedback Model with Initial Condition Control
. gmm (officevis - {rho}*L.officevis - exp({xb: T0officevis insprv educ age income t
> instruments(L.officevis T0officevis insprv educ age income totchr) onestep vce(cl

Final GMM criterion Q(b) =  7.35e-23

GMM estimation
Number of parameters =   8
Number of moments    =   8
Initial weight matrix: Unadjusted                     Number of obs  =   69027

                          (Std. Err. adjusted for 9861 clusters in dupersid)

             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        /rho |   .5366234   .0248079    21.63   0.000     .4880008     .585246
/xb_T0offi~s |   .0672159   .0038061    17.66   0.000     .0597561    .0746758
  /xb_insprv |   .1509578   .0408185     3.70   0.000     .0709551    .2309606
    /xb_educ |   .0375916   .0062318     6.03   0.000     .0253774    .0498058
     /xb_age |   .1234875   .0119579    10.33   0.000     .1000504    .1469245
  /xb_income |  -.0002804   .0006164    -0.45   0.649    -.0014885    .0009277
  /xb_totchr |   .3270725   .0154421    21.18   0.000     .2968066    .3573383
         /b0 |  -2.187085    .096687   -22.62   0.000    -2.376588   -1.997582

Instruments for equation 1: L.officevis T0officevis insprv educ age income totchr _c

Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

More on LFM vs. EFM

Sensitivity to omitted $y_0$ and $z$ varies between LFM and EFM.
Monte Carlo analysis suggests omission leads to biases, especially in the coefficient of the lagged variable.
EFM is preferred on predictive performance when the proportion of zeros is high.
LFM does better when the mean of $y$ is high and the proportion of zeros is small.
NLS turns out to be a robust estimator for the LFM. It should be considered as a serious alternative for count panel models under certain conditions.


Likelihood-based Panel Count Estimators

Correlated RE and Initial Conditions

Concluding Remarks

Much progress in estimating panel count models, especially in dealing with endogeneity and nonseparable heterogeneity.
Great progress in variance estimation.
RE models pose fewer problems.
For FE models moment-based/IV methods seem more tractable for handling endogeneity and dynamics. Stata's new suite of GMM commands is very helpful in this regard.
Because FE models do not currently handle important cases, and have other limitations, the CCR panel model with initial conditions is an attractive alternative, at least for balanced panels.


References


For Count Data magic, if you don't have the Thundercloud, use [image] instead!


Hausman, J.A., B.H. Hall and Z. Griliches (1984). Econometric Models for Count Data With an Application to the Patents-R&D Relationship. Econometrica, 52, 909-938.
Chamberlain, G. (1984). Panel Data. In Handbook of Econometrics, Volume II, ed. by Z. Griliches and M. Intriligator, 1247-1318. Amsterdam: North-Holland.
Wooldridge, J. (2005). Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics, 20, 39-54.


Chernozhukov, V., I. Fernandez-Val and A.E. Kowalski (2009). Censored Quantile Instrumental Variable Estimation via Control Functions. Discussion paper.
Koenker, R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91, 74-89.
Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. Econometrica, 46, 69-85.
Stata Release 11 Manuals.
Windmeijer, F.A.G. (2008). GMM for Panel Count Data Models. Ch. 18 in L. Matyas and P. Sevestre, eds., The Econometrics of Panel Data. Springer.
