Introduction
• Panel data regression models – that is, models that study the same group of
entities (individuals, firms, states, countries, and the like) over time.
• The importance of panel data:
• 1. Since panel data deals with individuals, firms, states, countries and so on
over time, there is bound to be heterogeneity in these units, which may be
often unobservable.
• The panel data estimation techniques can take such heterogeneity explicitly
into account by allowing for subject-specific variables such as individuals, firms
or states.
• 2. By combining time series of cross-sectional observations, panel data gives
“more informative data, more variability, less collinearity among variables,
more degrees of freedom and more efficiency”.
• 3. By studying the repeated cross-sections of observations, panel data are
better suited to study the dynamics of change. Spells of unemployment, job
turnover, duration of unemployment, and labor mobility are better studied with
panel data.
• 4. Panel data can better detect and measure effects that cannot be observed in
pure cross-sectional or time series data.
• Thus, the effects of minimum wage laws on employment and earnings can be
better studied if we follow successive waves of increases in state minimum
wages.
• 5. Phenomena such as economies of scale and technological change can be
better studied by panel data than by pure cross-sectional or pure time series
data.
Example
• Consider data on charitable giving by 47 individuals over the period
1979–1988.
• The variables are defined as follows:
• Charity: The sum of cash and other property contributions, excluding carry-
overs from previous years
• Income: Adjusted gross income
• Price: One minus the marginal income tax rate; marginal tax rate is defined on
income prior to contributions
• Age: A dummy variable equal to 1 if the taxpayer is over 64, and 0 otherwise
• MS: A dummy variable equal to 1 if the taxpayer is married, 0 otherwise
• DEPS: Number of dependents claimed on the tax return
• One of the goals of this study was to find out the effect, if any, of the marginal
tax rate on charitable giving.
• Before we proceed to the analysis, note that the panel data in this example are
called a balanced panel because the number of time observations (10) is the
same for each individual.
• If that were not the case, it would be an example of an unbalanced panel.
• The data here are also called a short panel. In a short panel the number of
cross-sectional or individual units N (here 47) is greater than the number of
time periods, T (here 10). In a long panel, on the other hand, T is greater than N.
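The balanced/unbalanced and short/long distinctions can be checked mechanically. The following is a minimal sketch, assuming the observations are indexed by (individual, year) pairs; the function name `describe_panel` and the generated index are hypothetical and only mirror the dimensions of the example (47 individuals, 1979–1988).

```python
from collections import defaultdict

def describe_panel(rows):
    """rows: iterable of (unit_id, period) pairs, one pair per observation."""
    periods_by_unit = defaultdict(set)
    for unit, period in rows:
        periods_by_unit[unit].add(period)
    n_units = len(periods_by_unit)
    counts = {len(periods) for periods in periods_by_unit.values()}
    max_t = max(counts)
    return {
        "N": n_units,                   # number of cross-sectional units
        "T": max_t,                     # largest number of periods observed for a unit
        "balanced": len(counts) == 1,   # every unit has the same number of periods
        "short": n_units > max_t,       # short panel: N greater than T
    }

# Hypothetical index with the same dimensions as the charity example.
rows = [(i, year) for i in range(1, 48) for year in range(1979, 1989)]
info = describe_panel(rows)
print(info)  # {'N': 47, 'T': 10, 'balanced': True, 'short': True}
```

Dropping a single (individual, year) pair from `rows` would make the same check report an unbalanced panel.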
• We want to estimate a model of charitable giving in relation to the variables listed
above. Call it the charity function. How do we proceed? We have five options:
• Option 1. Individual time series of charity functions:
• We can estimate by OLS 47 time series charity functions, one for each individual
using the data for 10 years.
• Although in principle we can estimate these functions, we will have very few
degrees of freedom to do meaningful statistical analysis.
• This is because we have to estimate six coefficients in all, five for the five
explanatory variables and one for the intercept, from only 10 observations per
individual, which leaves just 4 degrees of freedom.
• Besides, these individual charity functions neglect the information in the
other individuals’ charity contributions, which is relevant because all
individuals operate in the same regulatory environment.
• Option 2: Cross-sectional charity functions:
• We can estimate by OLS 10 cross-sectional charity functions, one for each year.
There will be 47 observations per year to estimate such functions.
• But again, we neglect the dynamic aspect of charitable giving, for the charitable
contributions made by individuals over the years will depend on factors like
income and marital status.
• Option 3. Pooled OLS charity function:
• We can pool all 470 observations (47 × 10) and estimate a “grand” charity
function, neglecting the dual nature of time series and cross-sectional data.
• Not only would we be neglecting this if we were to run a pooled model, but such
a pooling assumes that the coefficients of the charity function remain constant
across time and cross-section.
• The pooled OLS estimation is also known as the constant coefficient model, for
we are assuming that coefficients across time and cross-section remain the
same.
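To see what the constant coefficient assumption can cost, here is a minimal sketch with one regressor. The toy panel is hypothetical (it is not the charity data): each unit follows y = a_i + 2x with a different intercept a_i, and pooling ignores that.

```python
def pooled_ols(x, y):
    """Simple-regression OLS on stacked observations: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical toy panel: unit -> (x values, y values), built as y = a_i + 2x
# with a_A = 1, a_B = 5, a_C = 10.
panel = {
    "A": ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]),
    "B": ([4.0, 5.0, 6.0], [13.0, 15.0, 17.0]),
    "C": ([7.0, 8.0, 9.0], [24.0, 26.0, 28.0]),
}

# Pooling: stack every unit's observations and ignore which unit they came from.
x_all = [xi for x, _ in panel.values() for xi in x]
y_all = [yi for _, y in panel.values() for yi in y]
intercept, slope = pooled_ols(x_all, y_all)
print(round(slope, 2))  # 3.35, although every unit's own slope is 2
```

Because the units with larger intercepts also have larger x values in this toy data, the single pooled line absorbs the intercept differences into the slope.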
• Option 4. Fixed effects least-squares dummy variable (LSDV) model:
• As in Option 3, we pool all 470 observations, but allow each individual to have
his or her own intercept dummy. A variant of this is the within estimator.
• Option 5. Random effects model (REM):
• Instead of allowing each individual to have their own (fixed) intercept value as
in LSDV, we assume that the intercept values of the 47 individuals are random
drawings from a much larger population of individuals.
Pooled OLS regression of charity function
• Consider the following charity function:
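• $C_{it} = B_1 + B_2 Age_{it} + B_3 Income_{it} + B_4 Price_{it} + B_5 DEPS_{it} + B_6 MS_{it} + u_{it}$, $i = 1, \dots, 47$; $t = 1, \dots, 10$ (17.1)
• $C_{it} = B_{1i} + B_2 Age_{it} + B_3 Income_{it} + B_4 Price_{it} + B_5 DEPS_{it} + B_6 MS_{it} + u_{it}$ (17.2)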
• Notice that we have added the subscript i to the intercept to indicate that the
intercept of the 47 individuals may be different.
• The difference may be due to special features of each individual, such as education
or religion.
• Equation (17.2) is known as the fixed effects regression model (FEM). The term
“fixed effects” is due to the fact that each taxpayer’s intercept, although
different from the intercepts of the other taxpayers, does not vary over time,
that is, it is time-invariant.
• If we were to write the intercept as $B_{1it}$, the intercept of each taxpayer would
be time-variant. But note that in Eq. (17.2) we assume that the slope
coefficients are time-invariant.
• But how do we make Eq. (17.2) operational? This can be done easily by
introducing differential intercept dummies, which we first discussed in Chapter
3 on dummy variables.
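• Specifically, we modify Eq. (17.1) as follows:
• $C_{it} = \alpha_1 + \alpha_2 D_{2i} + \dots + \alpha_{47} D_{47i} + B_2 Age_{it} + B_3 Income_{it} + B_4 Price_{it} + B_5 DEPS_{it} + B_6 MS_{it} + u_{it}$
• where $D_{ji}$ = 1 for observations on individual j and 0 otherwise. Individual 1 serves as
the reference category, so only 46 dummies are needed; $\alpha_1$ is that individual's intercept and
each $\alpha_j$ is the differential intercept of individual j.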
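The within estimator mentioned under Option 4 gives the same slope as the dummy-variable regression without creating the dummies explicitly. The following is a minimal sketch for one regressor, assuming a hypothetical toy panel (not the charity data) in which each unit follows y = a_i + 2x; the function name `within_estimator` is illustrative only.

```python
def within_estimator(panel):
    """panel: dict unit -> (x list, y list). Returns (slope, {unit: intercept})."""
    sxx = 0.0
    sxy = 0.0
    means = {}
    for unit, (x, y) in panel.items():
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        means[unit] = (mean_x, mean_y)
        # Deviations from each unit's own mean remove the unit-specific intercept.
        sxx += sum((xi - mean_x) ** 2 for xi in x)
        sxy += sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # Each unit's fixed intercept is recovered from its means.
    intercepts = {u: my - slope * mx for u, (mx, my) in means.items()}
    return slope, intercepts

# Hypothetical toy panel built as y = a_i + 2x with a_A = 1, a_B = 5, a_C = 10.
panel = {
    "A": ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]),
    "B": ([4.0, 5.0, 6.0], [13.0, 15.0, 17.0]),
    "C": ([7.0, 8.0, 9.0], [24.0, 26.0, 28.0]),
}
slope, intercepts = within_estimator(panel)
print(slope, intercepts)
```

With this noise-free data the common slope and the three individual intercepts are recovered exactly, which is what separate intercept dummies would deliver as well.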
• In the random effects model it is assumed that $B_{1i}$ is a random variable with a mean value
of $B_1$ (no i subscript here) and the intercept of any cross-section unit is expressed as:
• $B_{1i} = B_1 + \varepsilon_i$ (17.7)
• In terms of our illustrative example, this means that the 47 individuals included in our
sample are a drawing from a much larger universe of such individuals and that they have a
common mean value for the intercept (= $B_1$). Differences in the individual
values of the intercept for each individual donor to charity are reflected in the
error term $\varepsilon_i$. Therefore, we can write the charity function (17.1) as:
• $C_{it} = B_1 + B_2 Age_{it} + B_3 Income_{it} + B_4 Price_{it} + B_5 DEPS_{it} + B_6 MS_{it} + w_{it}$ (17.8)
• where
• $w_{it} = \varepsilon_i + u_{it}$ (17.9)
• The composite error term $w_{it}$ has two components: $\varepsilon_i$, which is the cross-
section or individual-specific error component, and $u_{it}$, which is the combined
time series and cross-section error component.
• Now you can see why the REM model is also called an error components model
(ECM): the composite error term consists of two (or more) error components.
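• The usual assumptions of ECM are that
• $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$; $u_{it} \sim N(0, \sigma_u^2)$; $E(\varepsilon_i u_{it}) = 0$; $E(\varepsilon_i \varepsilon_j) = 0$ $(i \neq j)$; $E(u_{it} u_{is}) = E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0$ $(i \neq j;\ t \neq s)$ (17.10)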
• That is, individual error components are not correlated with each other and are
not autocorrelated across both cross-section and time series units.
• It is also critical to note that $w_{it}$ is assumed not to be correlated with any of the
explanatory variables included in the model.
• Since $\varepsilon_i$ is a part of $w_{it}$, it is possible that the latter is in fact correlated
with one or more regressors.
• If that turns out to be the case, the REM will result in inconsistent estimation of
the regression coefficients.
• The Hausman test will show in a given application if $w_{it}$ is correlated with the
regressors – that is, whether REM is the appropriate model.
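As a rough sketch of how such a test can be computed, the block below assumes the standard Hausman statistic for a single slope coefficient: the squared difference between the fixed effects and random effects estimates divided by the difference of their variances, compared with a chi-square distribution with 1 degree of freedom. The function name and all numbers are hypothetical; they are not results from the charity example.

```python
import math

def hausman_single(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 df)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Under the null the REM estimator is the more efficient one, so this
        # difference should be positive; otherwise the statistic is undefined.
        raise ValueError("var_fe must exceed var_re")
    h = (b_fe - b_re) ** 2 / diff_var
    # Upper-tail probability of a chi-square(1) variable: P(Z^2 > h).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# Hypothetical estimates, for illustration only.
h, p_value = hausman_single(b_fe=1.0, var_fe=0.04, b_re=0.6, var_re=0.03)
print(h, p_value)
```

A large statistic (small p-value) suggests that the individual-specific error component is correlated with the regressors, in which case the fixed effects estimates would be preferred to REM.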
• As a result of the assumptions in Eq. (17.10), it follows that
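• $E(w_{it}) = 0$ (17.11)
• $\mathrm{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2$ (17.12)
• Now if $\sigma_\varepsilon^2 = 0$, that is, if the individual-specific component has no variance, there is
no difference between Eq. (17.1) and Eq. (17.8), in which case we
can simply pool all the observations and run the pooled regression, as in Table 17.2.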
• This is so because in this situation there are either no subject-specific effects or they
have all been accounted for by the explanatory variables.
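• Although Eq. (17.12) shows that the composite error term is homoscedastic, it
can be shown that $w_{it}$ and $w_{is}$ ($t \neq s$) are correlated – that is, the error terms
of a given cross-sectional unit at two different times are correlated.
• $\rho = \mathrm{corr}(w_{it}, w_{is}) = \dfrac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \sigma_u^2}$, $t \neq s$ (17.13)
• Here $\sigma_\varepsilon^2$ and $\sigma_u^2$ are the variances of the two error components.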
• Two points about this correlation should be noted. First, for any cross-sectional
unit, $\rho$ remains the same no matter how far apart the two time periods are; and
secondly, $\rho$ remains the same for all cross-sectional units.
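Both properties can be checked numerically. The following is a minimal simulation sketch, assuming normally distributed components with hypothetical variances (1 for the individual-specific component and 3 for the remaining component, so the implied correlation is 0.25); the function names are illustrative only.

```python
import random

def intraclass_correlation(var_eps, var_u):
    """Correlation of the composite error at two different times, as in Eq. (17.13)."""
    return var_eps / (var_eps + var_u)

def simulate_composite_errors(n_units, n_periods, var_eps, var_u, seed=0):
    """Draw w_it = eps_i + u_it with independent normal components."""
    rng = random.Random(seed)
    sd_eps, sd_u = var_eps ** 0.5, var_u ** 0.5
    rows = []
    for _ in range(n_units):
        eps_i = rng.gauss(0.0, sd_eps)  # drawn once per unit, shared by all periods
        rows.append([eps_i + rng.gauss(0.0, sd_u) for _ in range(n_periods)])
    return rows

def sample_corr(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rho = intraclass_correlation(1.0, 3.0)  # 0.25 with these hypothetical variances
w = simulate_composite_errors(5000, 10, var_eps=1.0, var_u=3.0)
first = [row[0] for row in w]
lag1 = sample_corr(first, [row[1] for row in w])  # periods 1 apart
lag9 = sample_corr(first, [row[9] for row in w])  # periods 9 apart
print(rho, lag1, lag9)
```

Up to sampling noise, the two sample correlations should both sit near the theoretical value regardless of how far apart the periods are.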
• If we do not take this correlation ($\rho$) into account, the OLS estimators of the random
effects model are inefficient.
• Therefore we will have to use the method of generalized least squares (GLS) to
obtain efficient estimates.
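One common way to carry out this GLS step is quasi-demeaning: subtract a fraction theta of each unit's mean from its observations and run OLS on the transformed data. The sketch below assumes a balanced panel, one regressor, and variance components that are treated as known (in practice they would have to be estimated); the function names and the toy data are hypothetical.

```python
import math

def re_theta(var_eps, var_u, n_periods):
    """Quasi-demeaning weight; 0 reproduces pooled OLS, values near 1 approach the within estimator."""
    return 1.0 - math.sqrt(var_u / (var_u + n_periods * var_eps))

def ols(x, y):
    """Simple-regression OLS: returns (constant, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def random_effects_gls(panel, var_eps, var_u):
    """Balanced panel, one regressor, var_u > 0. Returns (intercept, slope)."""
    n_periods = len(next(iter(panel.values()))[0])
    theta = re_theta(var_eps, var_u, n_periods)
    x_star, y_star = [], []
    for x, y in panel.values():
        mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
        x_star += [xi - theta * mean_x for xi in x]
        y_star += [yi - theta * mean_y for yi in y]
    const, slope = ols(x_star, y_star)
    # The transformed intercept column equals (1 - theta), so rescale the constant.
    return const / (1.0 - theta), slope

# Hypothetical toy panel built as y = a_i + 2x with different a_i per unit.
panel = {
    "A": ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]),
    "B": ([4.0, 5.0, 6.0], [13.0, 15.0, 17.0]),
    "C": ([7.0, 8.0, 9.0], [24.0, 26.0, 28.0]),
}
_, slope_pooled_limit = random_effects_gls(panel, var_eps=0.0, var_u=1.0)
_, slope_within_limit = random_effects_gls(panel, var_eps=1e6, var_u=1.0)
print(slope_pooled_limit, slope_within_limit)
```

When the individual-specific variance is zero, theta is 0 and the result is the pooled OLS slope; as that variance grows relative to the other component, theta approaches 1 and the slope approaches the within (fixed effects) slope.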
• In contrast to the fixed effects model (dummy variable, within or first-difference
version), in REM we can include time-invariant variables, such as gender,
geographic location or religion. They do not get washed out as in the FEM
model.
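Why time-invariant variables drop out of the FEM can be seen from the within transformation itself. The following is a small sketch with hypothetical values: a variable that never changes for a unit equals its own unit mean, so subtracting that mean leaves only zeros and nothing from which to estimate a coefficient.

```python
def demean_within(values_by_unit):
    """Subtract each unit's own mean from its observations."""
    demeaned = {}
    for unit, values in values_by_unit.items():
        unit_mean = sum(values) / len(values)
        demeaned[unit] = [v - unit_mean for v in values]
    return demeaned

# Hypothetical data for two units over three periods.
gender = {"A": [1, 1, 1], "B": [0, 0, 0]}        # time-invariant dummy
income = {"A": [10, 12, 14], "B": [20, 19, 24]}  # varies over time

print(demean_within(gender))  # {'A': [0.0, 0.0, 0.0], 'B': [0.0, 0.0, 0.0]}
print(demean_within(income))  # {'A': [-2.0, 0.0, 2.0], 'B': [-1.0, -2.0, 3.0]}
```

The random effects transformation subtracts only a fraction of the unit mean, so a time-invariant variable keeps some variation and its coefficient can still be estimated.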
• As in the FEM, the estimated coefficients have the expected signs, although
DEPS and MS are individually statistically insignificant. From the effects
specification box, we see that