
Panel data regression models

Introduction
• Panel data regression models – that is, models that study the same group of
entities (individuals, firms, states, countries, and the like) over time.
• The importance of panel data:
• 1. Since panel data deal with individuals, firms, states, countries, and so on
over time, there is bound to be heterogeneity in these units, which may often
be unobservable.
• Panel data estimation techniques can take such heterogeneity explicitly into
account by allowing for subject-specific effects for each individual, firm, or
state.
• 2. By combining time series of cross-sectional observations, panel data gives
“more informative data, more variability, less collinearity among variables,
more degrees of freedom and more efficiency”.
• 3. By studying the repeated cross-sections of observations, panel data are
better suited to study the dynamics of change. Spells of unemployment, job
turnover, duration of unemployment, and labor mobility are better studied with
panel data.
• 4. Panel data can better detect and measure effects that cannot be observed in
pure cross-sectional or time series data.
• Thus, the effects of minimum wage laws on employment and earnings can be
better studied if we follow successive waves of increases in state minimum
wages.
• 5. Phenomena such as economies of scale and technological change can be
better studied by panel data than by pure cross-sectional or pure time series
data.
Example
• Consider a dataset on charitable giving by 47 individuals over the period
1979–1988.
• The variables are defined as follows:
• Charity: The sum of cash and other property contributions, excluding carry-
overs from previous years
• Income: Adjusted gross income
• Price: One minus the marginal income tax rate; marginal tax rate is defined on
income prior to contributions
• Age: A dummy variable equal to 1 if the taxpayer is over 64, and 0 otherwise
• MS: A dummy variable equal to 1 if the taxpayer is married, 0 otherwise
• DEPS: Number of dependents claimed on the tax return
• One of the goals of this study was to find out the effect, if any, of the marginal
tax rate on charitable giving.
• Before we proceed to the analysis, two features of the data are worth noting.
• The panel data in this example are called a balanced panel because the number of
time observations (10) is the same for each individual.
• If that were not the case, it would be an example of an unbalanced panel.
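The balanced/short-panel distinction above can be checked mechanically. Below is a minimal sketch with synthetic identifiers (placeholders, not the actual Table 17.1 data):

```python
import numpy as np

# A minimal sketch of the balanced-panel check described above. The ids and
# years below are synthetic placeholders, not the actual Table 17.1 data.
ids = np.repeat(np.arange(1, 48), 10)          # 47 individuals, 10 obs each
years = np.tile(np.arange(1979, 1989), 47)     # 1979-1988 for every individual

_, counts = np.unique(ids, return_counts=True)
balanced = counts.min() == counts.max()        # every unit has T observations
N, T = len(counts), counts[0]
print(balanced, N, T, N > T)                   # True 47 10 True -> short panel
```

If any individual had fewer than 10 observations, `counts.min()` would differ from `counts.max()` and the panel would be unbalanced.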
• The data here are also called a short panel. In a short panel the number of
cross-sectional or individual units N (here 47) is greater than the number of
time periods, T (here 10). In a long panel, on the other hand, T is greater than N.
• We want to estimate a model of charitable giving in relation to the variables listed
above. Call it the charity function. How do we proceed? We have five options:
• Option 1. Individual time series of charity functions:
• We can estimate by OLS 47 time series charity functions, one for each individual
using the data for 10 years.
• Although in principle we can estimate these functions, we will have very few
degrees of freedom for meaningful statistical analysis.
• This is because each function uses only 10 observations to estimate six
coefficients in all, five for the five explanatory variables and one for the
intercept, leaving just four degrees of freedom.
• Besides, these individual charity functions neglect the information contained in
the other individuals’ charity contributions, even though all individuals operate
in the same regulatory environment.
• Option 2: Cross-sectional charity functions:
• We can estimate by OLS 10 cross-sectional charity functions, one for each year.
There will be 47 observations per year to estimate such functions.
• But again, we neglect the dynamic aspect of charitable giving, for the charitable
contributions individuals make over the years depend on time-varying factors
such as income and marital status.
• Option 3. Pooled OLS charity function:

• We can pool all 470 observations (47 × 10) and estimate a “grand” charity
function, neglecting the dual nature of time series and cross-sectional data.
• Not only would we be neglecting this if we were to run a pooled model, but such
a pooling assumes that the coefficients of the charity function remain constant
across time and cross-section.
• The pooled OLS estimation is also known as the constant coefficient model, for
we are assuming that coefficients across time and cross-section remain the
same.
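The pooled (constant coefficient) estimation of Option 3 can be sketched in a few lines. The data below are simulated stand-ins with the same shape as the charity panel; the variable ranges and true coefficients are assumptions, not values from the text:

```python
import numpy as np

# A sketch of the pooled ("grand") OLS regression of Eq. (17.1). The data are
# simulated stand-ins for the charity panel; variable ranges are assumptions.
rng = np.random.default_rng(0)
N, T = 47, 10
n = N * T                                   # 470 pooled observations

X = np.column_stack([
    np.ones(n),                             # intercept
    rng.integers(0, 2, n),                  # Age dummy (over 64)
    rng.normal(50, 10, n),                  # Income
    rng.uniform(0.5, 1.0, n),               # Price = 1 - marginal tax rate
    rng.integers(0, 4, n),                  # DEPS
    rng.integers(0, 2, n),                  # MS dummy
])
beta_true = np.array([1.0, 0.5, 0.1, -2.0, 0.3, -0.2])
y = X @ beta_true + rng.normal(0, 1.0, n)

# One regression over all 470 observations, ignoring the panel structure:
# every individual and every year is assumed to share the same coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat.round(2))
```

The single coefficient vector is exactly what "constant coefficient model" means: one intercept and one set of slopes for all 470 observations.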
• Option 4. Fixed effects least-squares dummy variable (LSDV) model:

• As in Option 3, we pool all 470 observations, but allow each individual to have
his or her individual intercept dummy. A variant of this is the within estimator

• Option 5. The random effects model:

• Instead of allowing each individual to have their own (fixed) intercept value as
in LSDV, we assume that the intercept values of the 47 individuals are random
drawings from a much larger population of individuals.
Pooled OLS regression of charity function
• Consider the following charity function:

• Cit = B1 + B2 Ageit + B3 Incomeit + B4 Priceit + B5 DEPSit + B6 MSit + uit -------(17.1)

• where C is charitable contribution. Notice that we have put two subscripts on
the variables: i, representing the cross-section unit, and t, the time period.
• It is assumed that the regressors are nonstochastic, or if stochastic, are
uncorrelated with the error term. It is also assumed that the error term
satisfies the usual classical assumptions.
Continue…
• A priori, we would expect age, income, price, and marital status to have a
positive impact on charitable giving and the number of dependents to have a
negative impact.
• The reason the price variable, as defined, is included in the model is that it
represents the opportunity cost of giving charitable contributions – the higher
the marginal tax, the lower the opportunity cost.
• Assuming that pooling of the data is valid (a big assumption), the results show
that Age, Income, and Price have significant positive impact on charitable
donation, and MS has negative but statistically insignificant effect on charitable
contributions.
Continue…
• Surprisingly, DEPS has a positive and significant impact on charitable giving. The
low Durbin–Watson in the present instance is probably more an indication of
specification error than spatial or serial correlation.
• The possibility that the model is misspecified stems from the fact that by lumping
together different individuals at different times, we camouflage the heterogeneity
(individuality or uniqueness) that may exist among the 47 individuals.
• Perhaps the uniqueness of each individual is subsumed in the composite error
term, uit. As a result, it is quite possible that the error term is correlated with some
of the regressors included in the model.
• If that is indeed the case, the estimated coefficients in Table 17.2 may be biased as
well as inconsistent.
The fixed effects least squares dummy
variable (LSDV) model
• One way we can take into account the heterogeneity that may exist among 47
individuals is to allow each individual to have his or her own intercept, as in the
following equation:

• Cit = B1i + B2 Ageit + B3 Incomeit + B4 Priceit + B5 DEPSit + B6 MSit + uit ----- (17.2)

• Notice that we have added the subscript i to the intercept to indicate that the
intercepts of the 47 individuals may be different.
• The differences may be due to special features of each individual, such as
education or religion.
• Equation (17.2) is known as the fixed effects regression model (FEM). The term
“fixed effects” is due to the fact that each taxpayer’s intercept, although
different from the intercepts of the other taxpayers, does not vary over time,
that is, it is time-invariant.
• If we were to write the intercept as B1it , the intercept of each taxpayer would
be time-variant. But note that in Eq. (17.2) we assume that the slope
coefficients are time-invariant.
• But how do we make Eq. (17.2) operational? This can be done easily by
introducing differential intercept dummies, which we first discussed in Chapter
3 on dummy variables.
• Specifically, we modify Eq. (17.1) as follows:

• Cit = B1 + A2 D2i + A3 D3i + … + A47 D47i + B2 Ageit + B3 Incomeit + B4 Priceit + B5 DEPSit + B6 MSit + uit ----- (17.3)

• where D2i = 1 for individual 2, 0 otherwise; D3i = 1 for individual 3, 0 otherwise;
and so on, and the Aj are the differential intercept coefficients.
• It is important to note that we have used only 46 dummies to represent 47
individuals to avoid the dummy variable trap (perfect collinearity).
• In this case the 46 dummies will represent the differential intercept dummy
coefficients – that is, they will show by how much the intercept coefficient of
the individual that is assigned a dummy variable will differ from the benchmark
category.
• We are treating the first individual as the benchmark or reference category,
although any individual can be chosen for that purpose.
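The benchmark treatment described above can be illustrated with simulated data: with one common intercept plus N−1 differential dummies, each dummy coefficient recovers that individual's deviation from the benchmark intercept. Noise is omitted so the recovery is exact; the intercepts and slope are arbitrary:

```python
import numpy as np

# A sketch of Eq. (17.3): a common intercept plus N-1 differential intercept
# dummies, with individual 1 as the benchmark. Noise is omitted so the dummy
# coefficients recover alpha_i - alpha_1 exactly. Data are simulated.
rng = np.random.default_rng(2)
N, T = 4, 10
alpha = np.array([1.0, 2.5, 0.0, -1.0])        # true individual intercepts
x = rng.normal(0, 1, (N, T))
y = alpha[:, None] + 2.0 * x                   # slope 2.0, no error term

D_full = np.kron(np.eye(N), np.ones((T, 1)))   # one dummy per individual
Z = np.column_stack([np.ones(N * T), D_full[:, 1:], x.ravel()])  # drop D1
b = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]

print(b.round(4))
# b[0]   -> 1.0 (benchmark intercept, alpha_1)
# b[1:4] -> [1.5, -1.0, -2.0] (differential intercepts, alpha_i - alpha_1)
# b[4]   -> 2.0 (common slope)
```

Dropping the first dummy is what avoids the dummy variable trap: keeping all N dummies alongside the common intercept would make the columns perfectly collinear.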
• The regression output does not report the values of the individual differential
intercept coefficients, although they are taken into account in estimating the
model. However, the differential intercept coefficients can easily be obtained.
• Secondly, if you compare the OLS pooled regression results with the FEM results,
you will see substantial differences between the two, not only in the values of the
coefficients, but also in their signs.
• These results, therefore, cast doubt on the pooled OLS estimates. If you examine
the individual differential intercept dummies, you will find that several of them
are statistically highly significant (see Exercise 17.1), suggesting that the pooled
estimates hide the heterogeneity among the 47 charitable donors.
• We can provide a test to find out if the fixed effects model is better than the OLS
pooled model given in Table 17.2. Since the pooled model neglects the
heterogeneity effects that are explicitly taken into account in the fixed effects
model, the pooled model is a restricted version of the fixed effects model.
• F is highly significant, confirming that the fixed effects model is superior to the
pooled regression model.
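The restricted F-test described above compares the restricted (pooled) and unrestricted (LSDV) sums of squared residuals. A simulated illustration (the data are synthetic, not those of Table 17.2, and the individual effects are made deliberately strong):

```python
import numpy as np

# A sketch of the restricted F-test: pooled OLS is the restricted model, the
# fixed effects LSDV model is the unrestricted one. Data are simulated with
# strong individual effects, so the F statistic should be large.
rng = np.random.default_rng(3)
N, T = 6, 10
alpha = rng.normal(0, 3, N)                    # strong individual effects
x = rng.normal(0, 1, (N, T))
y = (alpha[:, None] + 1.0 * x + rng.normal(0, 0.5, (N, T))).ravel()

def ssr(Z, y):
    """Sum of squared residuals from an OLS fit of y on Z."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    e = y - Z @ b
    return e @ e

Xp = np.column_stack([np.ones(N * T), x.ravel()])     # restricted (pooled)
D = np.kron(np.eye(N), np.ones((T, 1)))
Xf = np.column_stack([D, x.ravel()])                  # unrestricted (LSDV)

m = N - 1                        # restrictions: the N-1 dummy coefficients
k = N + 1                        # parameters in the unrestricted model
F = ((ssr(Xp, y) - ssr(Xf, y)) / m) / (ssr(Xf, y) / (N * T - k))
print(F > 2.4)   # True here: F far exceeds the 5% critical value of F(5, 53)
```

A large F says the pooled restriction (all individual intercepts equal) fits significantly worse than the fixed effects model, mirroring the conclusion in the text.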
• Before proceeding further, some features of the fixed effects model are worth
noting. First, the model (17.3) is known as a one-way fixed effects model, for we
have allowed the intercepts to differ among cross-sections (the 47 individuals),
but not over time.
• We can introduce nine time dummies to represent 10 years (again to avoid the
dummy variable trap) along with the 46 cross-section dummies.
• In that case the model that emerges is called a two-way fixed effects model.
• Of course, if we add these time dummies, in all we have to estimate 46 cross-
section dummies, nine time dummies, the common intercept and five slope
coefficients of the five regressors: in all, a total of 61 coefficients.
• Although we have 470 observations, we will lose 61 degrees of freedom.
• We have assumed that the slope coefficients of the charity function remain the
same. But it is quite possible that these slope coefficients may be different for
all 47 individuals.
• To allow for this possibility, we can introduce differential slope coefficients by
interacting the five explanatory variables with the 46 individual dummies,
which will consume another 230 degrees of freedom.
• Nothing prevents us from interacting the 10 time dummies with the five
explanatory variables, which will consume another 50 degrees of freedom.
Ultimately, we will be left with very few degrees of freedom to do meaningful
statistical analysis.
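The degrees-of-freedom arithmetic above can be tallied directly; the counts (N = 47, T = 10, five regressors) come from the text:

```python
# A tally of the degrees-of-freedom arithmetic described above; the counts
# (N = 47, T = 10, five regressors) come from the text.
N, T, k_slopes = 47, 10, 5
obs = N * T                                          # 470 observations

params_two_way = 1 + (N - 1) + (T - 1) + k_slopes    # 1 + 46 + 9 + 5 = 61
df_two_way = obs - params_two_way                    # 470 - 61 = 409

# Differential slopes (5 x 46) plus time-dummy interactions (10 x 5)
extra = k_slopes * (N - 1) + T * k_slopes            # 230 + 50 = 280
df_remaining = df_two_way - extra                    # degrees of freedom left
print(params_two_way, df_two_way, df_remaining)      # 61 409 129
```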
• Limitations of the fixed effects LSDV model
• 1. Every additional dummy variable will cost an additional degree of freedom.
Therefore, if the sample is not very large, introducing too many dummies will
leave few observations for meaningful statistical analysis.
• 2. Too many additive and multiplicative dummies may lead to the possibility of
multicollinearity, which makes precise estimation of one or more parameters
difficult.
• 3. To obtain estimates with desirable statistical properties, we need to pay
careful attention to the error term uit. The statistical results are based on the
assumption that the error term follows the classical assumptions.
• Since the index i refers to cross-sectional observation and t to time series
observations, the classical assumption regarding uit may have to be modified.
There are several possibilities:
• (a) We can assume that the error variance is the same for all cross-sectional
units or we can assume that the error variance is heteroscedastic.
• (b) For each subject, we can assume that there is no autocorrelation over time,
or we can assume autocorrelation of the AR(1) type.
• (c) At any given time, we can allow the error term of individual #1 to be
uncorrelated with the error term of, say, individual #2, or we can assume that
there is such correlation.
The random effects model (REM) or error
components model (ECM)
• In the fixed effects model it is assumed that the individual specific coefficient B1i is fixed
for each subject, that is, it is time-invariant.

• In the random effects model it is assumed that B1i is a random variable with a mean value
of B1 (no i subscript here), so that the intercept of any cross-section unit is expressed as:

• B1i = B1 + εi ------- (17.7)

• where εi is a random error term with mean 0 and variance σε².

• In terms of our illustrative example, this means that the 47 individuals included in our
sample are a drawing from a much larger universe of such individuals and that they have a
common mean value for the intercept (= B1). Differences in the individual
intercept values of each donor are reflected in the error term εi. Therefore, we
can write the charity function (17.1) as:

• Cit = B1 + B2 Ageit + B3 Incomeit + B4 Priceit + B5 DEPSit + B6 MSit + wit ----(17.8)
• where
• wit = εi + uit ----------------- (17.9)
• The composite error term wit has two components: εi, which is the cross-
section or individual-specific error component, and uit, which is the combined
time series and cross-section error component.
• Now you can see why the REM model is also called an error components model
(ECM): the composite error term consists of two (or more) error components.
• The usual assumptions of the ECM are that

• εi ~ N(0, σε²), uit ~ N(0, σu²), E(εi uit) = 0, E(εi εj) = 0 (i ≠ j),
E(uit uis) = E(uit ujt) = E(uit ujs) = 0 (i ≠ j; t ≠ s) ------ (17.10)

• That is, the individual error components are not correlated with each other and are
not autocorrelated across either cross-section or time series units.
• It is also critical to note that wit is not correlated with any of the explanatory
variables included in the model.
• Since εi is a part of wit, it is possible that the latter is correlated with one or more
regressors.
• If that turns out to be the case, the REM will result in inconsistent estimation of
the regression coefficients
• The Hausman test will show in a given application if wit is correlated with the
regressors – that is, whether REM is the appropriate model.
• As a result of the assumptions in Eq. (17.10), it follows that

• E(wit) = 0 ------(17.11)
• var(wit) = σε² + σu² ------(17.12)

• Now if σε² = 0, there is no difference between Eq. (17.1) and Eq. (17.8), in which case we
can simply pool all the observations and run the pooled regression, as in Table 17.2.
• This is so because in this situation there are either no subject-specific effects or they
have all been accounted for by the explanatory variables.
• Although Eq. (17.12) shows that the composite error term is homoscedastic, it
can be shown that wit and wis (t ≠ s) are correlated – that is, the error terms
of a given cross-sectional unit at two different times are correlated.

• ρ = cor(wit, wis) = σε² / (σε² + σu²), t ≠ s -------- (17.13)

• Two points about this correlation should be noted. First, for any cross-sectional
unit, ρ remains the same no matter how far apart the two time periods are; and
secondly, ρ remains the same for all cross-sectional units.
• If we do not take this correlation into account, the OLS estimators of the random
effects model are inefficient.
• Therefore we will have to use the method of generalized least squares (GLS) to
obtain efficient estimates
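The GLS procedure can be sketched via the standard quasi-demeaning transformation. For clarity the sketch below uses the true variance components; in practice they must be estimated (for example, from within and between residuals). All numbers are simulated:

```python
import numpy as np

# A sketch of random effects GLS via the quasi-demeaning transformation
# theta = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T*sigma_eps^2)). For clarity the
# TRUE variance components are used; in practice they must be estimated.
rng = np.random.default_rng(4)
N, T = 200, 10
s_eps, s_u = 1.0, 0.5                     # sd of eps_i and of u_it
eps = rng.normal(0, s_eps, N)             # random individual effects
x = rng.normal(0, 1, (N, T))
y = 1.0 + 2.0 * x + eps[:, None] + rng.normal(0, s_u, (N, T))

rho = s_eps**2 / (s_eps**2 + s_u**2)      # Eq. (17.13): here 0.8

theta = 1 - np.sqrt(s_u**2 / (s_u**2 + T * s_eps**2))
ys = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
xs = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
const = (1 - theta) * np.ones(N * T)      # the intercept is transformed too

b_gls = np.linalg.lstsq(np.column_stack([const, xs]), ys, rcond=None)[0]
print(round(rho, 2), b_gls.round(2))      # intercept near 1.0, slope near 2.0
```

With theta = 0 this collapses to pooled OLS, and with theta = 1 it collapses to the fixed effects within estimator, which is why GLS can be seen as a weighted compromise between the two.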
• In contrast to the fixed effects model (dummy variable, within or first-difference
version), in REM we can include time-invariant variables, such as gender,
geographic location or religion. They do not get washed out as in the FEM
model.
• As in the FEM, the estimated coefficients have the expected signs, although
DEPS and MS are individually statistically insignificant. From the effects
specification box, we see the estimated values of σε and σu.
• Substituting these into Eq. (17.13), we obtain ρ = σε² / (σε² + σu²), which gives the
extent of correlation between the errors of a cross-sectional unit at two different
time periods, and this correlation stays the same across all cross-sectional units.
• This rho value differs slightly from the one shown in Table 17.6 due to rounding
error.
Fixed effects model vs. random effects model
• You will see substantial differences between the fixed effects and random effects
models. So which model is better in the present example: fixed effects or random
effects?
• The null hypothesis underlying the Hausman test is that FEM and REM do not
differ substantially.
• Hausman’s test statistic has an asymptotic (i.e. large-sample) χ² distribution with df
equal to the number of regressors in the model.
• As usual, if the computed chi-square value exceeds the critical chi-square value
for given df and the level of significance, we conclude that REM is not appropriate
because the random error terms are probably correlated with one or more
regressors. In this case, FEM is preferred to REM.
• For our example, the results of the Hausman test are given in Table 17.7. The Hausman
test strongly rejects the REM, for the p value of the estimated chi-square statistics is
very low.
• The last part of this table compares the fixed effects and random effects coefficient of
each variable.
• As the last probability column of the table shows, the differences in the Age and DEPS
coefficients are statistically highly significant.
• Basically, the Hausman test examines (bFEM − bREM)′[var(bFEM) − var(bREM)]⁻¹(bFEM − bREM)
– that is, the squared difference between the regression coefficients estimated
from REM and FEM, weighted by the inverse of the difference in their variances.
• Since the REM model does not seem appropriate in the present example, we can revert
to the FEM model.
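A scalar (one-regressor) version of the Hausman statistic can be sketched as follows. The data are simulated so that the individual effect is correlated with the regressor, so the null should be rejected; for brevity the true variance components are used (an assumption, since in practice they are estimated):

```python
import numpy as np

# A sketch of a scalar Hausman statistic,
#   H = (b_FE - b_RE)^2 / (var(b_FE) - var(b_RE)),
# asymptotically chi-square with 1 df under the null. The individual effect
# below is built to be correlated with x, so the null should be rejected.
# For brevity the true variance components are used (an assumption).
rng = np.random.default_rng(5)
N, T = 300, 10
c = rng.normal(0, 1, N)
x = c[:, None] + rng.normal(0, 1, (N, T))
a = 1.5 * c                                    # effect correlated with x
y = 2.0 * x + a[:, None] + rng.normal(0, 1, (N, T))

# Fixed effects (within) slope and its variance (sigma_u^2 = 1 taken as known)
xt = (x - x.mean(axis=1, keepdims=True)).ravel()
yt = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = (xt @ yt) / (xt @ xt)
v_fe = 1.0 / (xt @ xt)

# Random effects slope via quasi-demeaning (theta from the true components)
theta = 1 - np.sqrt(1.0 / (1.0 + T * 1.5**2))
xs = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
ys = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
xs_c = xs - xs.mean()                          # partials out the intercept
b_re = (xs_c @ ys) / (xs_c @ xs_c)
v_re = 1.0 / (xs_c @ xs_c)

H = (b_fe - b_re) ** 2 / (v_fe - v_re)
print(H > 6.63)   # True: exceeds the 1% chi-square(1) critical value 6.63
```

Note that var(b_FE) > var(b_RE) always holds here, because the RE transformation retains some between-individual variation that the within estimator discards; that is what makes the denominator of H positive.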
Some guidelines about REM and FEM
• Here are some general guidelines about which of the two models may be suitable
in practical applications:
• 1. If T (the number of time observations) is large and N (the number of cross
section units) is small, there is likely to be little difference in the values of the
parameters estimated by FEM and REM.
• The choice then depends on computational convenience, which may favor FEM.
• 2. In a short panel (N large and T small), the estimates obtained from the two
models can differ substantially.
• Remember that in REM, B1i = B1 + εi, where εi is the cross-sectional random
component, whereas in FEM B1i is treated as fixed.
• In the latter case, statistical inference is conditional on the observed cross-
sectional units in the sample.
• This is valid if we strongly believe that the cross-sectional units in the sample
are not random drawings from a larger population.
• In that case, FEM is appropriate. If that is not the case, then REM is appropriate
because in that case statistical inference is unconditional.
• 3. If N is large and T is small, and if the assumptions underlying REM hold, REM
estimators are more efficient than FEM.
• 4. Unlike FEM, REM can estimate coefficients of time-invariant variables, such
as gender and ethnicity.
• The FEM does control for such time-invariant variables, but it cannot estimate
them directly, as is clear from the LSDV estimator model.
• On the other hand, FEM controls for all time-invariant variables, whereas REM
can estimate only those time-invariant variables that are explicitly introduced in
the model.
