
Panel Data Model

Andualem G. (Ph.D)

andulaser@gmail.com

What is panel data?

A data set containing observations on multiple phenomena observed at a single point in time is called cross-sectional data.

A data set containing observations on a single phenomenon observed over multiple time periods is called time-series data.

Observations on multiple phenomena over multiple time periods are panel data.

Cross-sectional and time-series data are one-dimensional; panel data are two-dimensional.

Panel data can be used to answer both longitudinal and cross-sectional questions!

Example of panel data

Organization             Time   Profit    Location   Tax
MUGER CEMENT             2013   500,000   1          2%
MUGER CEMENT             2014   600,000   1          3%
MUGER CEMENT             2015   700,000   1          4%
MESSEBO CEMENT FACTORY   2013   400,000   2          2%
MESSEBO CEMENT FACTORY   2014   300,000   2          3%
MESSEBO CEMENT FACTORY   2015   200,000   2          4%
DERBA MIDROC CEMENT      2013   100,000   3          2%
DERBA MIDROC CEMENT      2014   300,000   3          3%
DERBA MIDROC CEMENT      2015   500,000   3          4%

X_it: varies across both entities and time (e.g. Profit)
X_i:  varies only across entities (e.g. Location)
X_t:  varies only over time (e.g. Tax)
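The two-dimensional structure can be sketched in code. A minimal illustration in "long" format, one row per (entity, year) pair, with field names mirroring the table (the numbers are the table's own):

```python
# Long-format panel: one record per (entity, year) observation.
panel = [
    {"org": "MUGER",   "year": 2013, "profit": 500_000, "location": 1, "tax": 0.02},
    {"org": "MUGER",   "year": 2014, "profit": 600_000, "location": 1, "tax": 0.03},
    {"org": "MUGER",   "year": 2015, "profit": 700_000, "location": 1, "tax": 0.04},
    {"org": "MESSEBO", "year": 2013, "profit": 400_000, "location": 2, "tax": 0.02},
    {"org": "MESSEBO", "year": 2014, "profit": 300_000, "location": 2, "tax": 0.03},
    {"org": "MESSEBO", "year": 2015, "profit": 200_000, "location": 2, "tax": 0.04},
    {"org": "DERBA",   "year": 2013, "profit": 100_000, "location": 3, "tax": 0.02},
    {"org": "DERBA",   "year": 2014, "profit": 300_000, "location": 3, "tax": 0.03},
    {"org": "DERBA",   "year": 2015, "profit": 500_000, "location": 3, "tax": 0.04},
]

entities = {row["org"] for row in panel}        # the i (cross-sectional) dimension
periods = {row["year"] for row in panel}        # the t (time-series) dimension
print(len(entities), len(periods), len(panel))  # n = 3, T = 3, nT = 9 observations
```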

Long versus Short Panel Data

A short panel has many entities (large n) but few time periods (small T), while a long panel has many time periods (large T) but few entities (Cameron and Trivedi, 2009: 230).

Accordingly, a short panel data set is wide in its cross-sectional dimension and short in its time-series dimension, whereas a long panel is narrow in width. Both too small an n (low power, risking Type II errors) and too large an n (trivially small effects appearing significant, risking spurious Type I findings) can cause problems, so researchers should be very careful when examining either short or long panels.

Balanced versus Unbalanced Panel

Data

In a balanced panel, all entities have measurements in all time periods: in a contingency table (cross-table) of the cross-sectional and time-series variables, each cell has exactly one observation, so the total number of observations is nT.

When entities have different numbers of observations, the panel is unbalanced: some cells in the contingency table have zero frequency, and the total number of observations is less than nT. Unbalanced panel data entail some computation and estimation issues, although most software packages are able to handle both balanced and unbalanced data.
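In practice, balance can be checked by verifying that every (entity, period) cell of the contingency table is filled exactly once; a small sketch with an illustrative helper (names are my own, not from the text):

```python
from collections import Counter

def is_balanced(rows):
    """rows: iterable of (entity, period) pairs; True if every entity
    appears in every period exactly once, i.e. the panel has n*T rows."""
    counts = Counter(rows)
    if any(c != 1 for c in counts.values()):   # a duplicated (i, t) cell
        return False
    entities = {i for i, _ in counts}
    periods = {t for _, t in counts}
    # balanced <=> every cell of the n x T contingency table is filled
    return len(counts) == len(entities) * len(periods)

balanced = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
unbalanced = balanced[:-1]                     # B is missing period 2
print(is_balanced(balanced), is_balanced(unbalanced))
```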

Fixed versus Rotating Panel Data

If the same individuals (or entities) are observed for each period, the panel data set is called a fixed panel (Greene 2008: 184). If a set of individuals changes from one period to the next, the data set is a rotating panel.

Recap: Example of panel data

Organization             Time   Profit    Location   Tax
MUGER CEMENT             2013   500,000   1          2%
MUGER CEMENT             2014   600,000   1          3%
MUGER CEMENT             2015   700,000   1          4%
MESSEBO CEMENT FACTORY   2013   400,000   2          2%
MESSEBO CEMENT FACTORY   2014   300,000   2          3%
MESSEBO CEMENT FACTORY   2015   200,000   2          4%
DERBA MIDROC CEMENT      2013   100,000   3          2%
DERBA MIDROC CEMENT      2014   300,000   3          3%
DERBA MIDROC CEMENT      2015   500,000   3          4%

X_it: varies across both entities and time (e.g. Profit)
X_i:  varies only across entities (e.g. Location)
X_t:  varies only over time (e.g. Tax)

With panel data we can control for factors that:

Vary across entities (states) but do not vary over time

Could cause omitted variable bias if they are omitted

Are unobserved or unmeasured and therefore cannot be included in the regression using multiple regression

Here’s the key idea:

If an omitted variable does not change over time, then any changes in Y over time cannot be caused by the omitted variable.

Panel Data (cont.)

Potential unobserved heterogeneity is a form of omitted variables bias.

“Unobserved heterogeneity” refers to omitted variables that are fixed for an individual (at least over a long period of time).

A person’s upbringing, family characteristics, innate ability, and demographics (except age) do not change.

Panel Data (cont.)

With cross-sectional data, there is no particular reason to differentiate between omitted variables that are fixed over time and omitted variables that are changing.

However, when an omitted variable is fixed over time, panel data offers another tool for eliminating the bias.

Panel Data (cont.)

Panel Data is data in which we observe repeated cross-sections of the same individuals.

Examples:

Annual unemployment rates of each country over several years

Quarterly sales of individual stores over several quarters

Wages for the same worker, working at several different jobs

Panel Data (cont.)

By far the leading type of panel data is repeated cross-sections over time.

The key feature of panel data is that we observe the same individual in more than one condition.

Omitted variables that are fixed will take on the same values each time we observe the same individual.

Panel Data (cont.)

Some of the most valuable data sets in economics and finance are panel data sets.

Longitudinal surveys return year after year to the same individuals, tracking them over time.

Panel Data (cont.)

For example, the National Longitudinal Survey of Youth (NLSY) tracks labor market outcomes for thousands of individuals, beginning in their teenage years.

The Panel Survey of Income Dynamics provides high-quality income data.

Panel surveys of stock prices provide high-quality asset-return data.

Example:

Cross-Industry Wage Disparities

A great puzzle in labor economics is the presence of cross-industry wage disparities.

Workers of seemingly equivalent ability, in seemingly equivalent occupations, receive different wages in different industries.

e.g. A person with an MSc in accounting working at ECA is paid 30,000 birr, while another person with an MSc in accounting working at Addis Ababa University is paid 5,000 birr.

Do high-wage industries actually pay higher wages, or do they attract workers of unobservably higher quality?

Gibbons and Katz findings

Gibbons and Katz (Review of Economic Studies 1992) exploited panel data to explore these differentials. They observed workers in 1984 and 1986.

They focused on workers who lost their 1984 jobs because of plant closings (on the grounds that plant closings are unlikely to be correlated with an individual worker’s abilities), and looked only at workers who were re-employed by 1986.

Gibbons and Katz findings

Gibbons and Katz estimated wages as

$\ln w_{it} = \beta_0 + \beta_1 X_{1it} + \cdots + \beta_k X_{kit} + \gamma_1 D_{1it} + \cdots + \gamma_m D_{mit} + \varepsilon_{it}$

where the $X_{kit}$ are demographic variables and $D_{1it}, \dots, D_{mit}$ are a set of dummy variables for being employed in each of the $m$ industries.

Gibbons and Katz findings

Estimating with simple OLS, Gibbons and Katz obtain estimates of the $\gamma$'s that are very similar to other estimates of cross-industry wage differentials.

Gibbons and Katz findings

Gibbons and Katz speculated that any unmeasured ability is fixed over time and equally rewarded in all industries.

$\ln w_{it} = \beta_0 + \beta_1 X_{1it} + \cdots + \beta_k X_{kit} + \gamma_1 D_{1it} + \cdots + \gamma_m D_{mit} + v_i + \varepsilon_{it}$

Differencing the 1986 and 1984 observations eliminates the $v_i$.
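The differencing idea can be checked with a toy numeric example (all numbers hypothetical, with a single regressor and no noise for clarity):

```python
# Wages generated with an individual fixed component v_i. Differencing the
# two periods removes v_i exactly, so the difference depends only on the
# change in x, whatever the worker's (unobserved) v_i is.
beta = 0.5                            # assumed true coefficient
workers = {"w1": 2.0, "w2": -1.0}     # v_i: fixed, unobserved ability terms

def wage(v_i, x):
    """ln w_it = beta * x_it + v_i (noise omitted for clarity)."""
    return beta * x + v_i

for v_i in workers.values():
    x84, x86 = 3.0, 5.0               # regressor in 1984 and 1986
    diff = wage(v_i, x86) - wage(v_i, x84)
    print(diff)                       # beta * (x86 - x84) = 1.0 for every worker
```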


Gibbons and Katz findings

The estimated industry coefficients from the differenced equation are about 80% of the estimated industry coefficients from the levels equation.

Unobserved worker ability appears to explain relatively little of the cross-industry wage differentials.

A Panel Data DGP

$Y_{it} = \beta_{0i} + \beta_1 X_{1it} + \cdots + \beta_k X_{kit} + \varepsilon_{it}, \qquad i = 1, \dots, n; \; t = 1, \dots, T$

Panel Data DGP’s (cont.)

Notice that when we have panel data, we index observations with both i and t.

Pay close attention to the subscripts on variables.

Some variables vary only across time or only across individuals.

Recap: Example of panel data

Organization             Time   Profit    Location   Tax
MUGER CEMENT             2013   500,000   1          2%
MUGER CEMENT             2014   600,000   1          3%
MUGER CEMENT             2015   700,000   1          4%
MESSEBO CEMENT FACTORY   2013   400,000   2          2%
MESSEBO CEMENT FACTORY   2014   300,000   2          3%
MESSEBO CEMENT FACTORY   2015   200,000   2          4%
DERBA MIDROC CEMENT      2013   100,000   3          2%
DERBA MIDROC CEMENT      2014   300,000   3          3%
DERBA MIDROC CEMENT      2015   500,000   3          4%

X_it: varies across both entities and time (e.g. Profit)
X_i:  varies only across entities (e.g. Location)
X_t:  varies only over time (e.g. Tax)


Panel Data DGP’s (cont.)

In this DGP, the $\beta_{0i}$ are fixed across samples. The unmeasured heterogeneity is the same in every sample. This DGP is called the “Distinct Intercepts” DGP. It is suitable for panels of states or countries, where the same individuals would be selected in each sample.

Panel Data DGP’s (cont.)

With longitudinal data on individual workers or consumers, we draw a different set of individuals from the population each time we collect a sample.

Each individual has his/her own set of fixed omitted variables.

We cannot fix each individual intercept.

Another Panel Data DGP

$Y_{it} = \beta_0 + \beta_1 X_{1it} + \cdots + \beta_k X_{kit} + \varepsilon_{it}, \qquad \varepsilon_{it} = v_i + \eta_{it}$

Panel Data DGP’s

In this DGP, we return to a model with a single intercept for all data points, $\beta_0$.

However, we break the error term into two components:

$\varepsilon_{it} = v_i + \eta_{it}$

When we draw an individual i, we draw one $v_i$ that is fixed for that individual in all time periods. $v_i$ includes all fixed omitted variables.

Panel Data DGP’s

In the Distinct Intercepts DGP, the unobserved heterogeneity is absorbed into the individual-specific intercept $\beta_{0i}$.

In the second DGP, the unobserved heterogeneity is absorbed into the individual-specific fixed component of the error term, $v_i$.

This DGP is an “Error Components Model.”

Panel Data DGP’s

The Error Components DGP comes in two flavors, depending on $E(X_{jit} v_i)$.

If $E(X_{jit} v_i) = 0$, then the unobserved heterogeneity is uncorrelated with the explanators.

OLS is unbiased and consistent.

Panel Data DGP’s

If $E(X_{jit} v_i) \neq 0$, then the unobserved heterogeneity IS correlated with the explanators.

OLS is BIASED and INCONSISTENT.

Panel Data DGP’s

Panel data is most useful in the second Error Components case.

When $E(X_{jit} v_i) \neq 0$, OLS is inconsistent.

Using panel data, we can create a consistent estimator: Fixed Effects.

Recall

The OLS estimators of beta and alpha are BLUE (Best Linear Unbiased Estimators).

Best: the variance of the OLS estimator is minimal, smaller than the variance of any other linear unbiased estimator.

Linear: if the relationship is not linear, OLS is not applicable.

Unbiased: the expected values of the estimated beta and alpha equal the true values describing the relationship between X and Y.

Recall: How to choose between estimators (bias vs. consistency)

Efficiency is a relative measure between two estimators: an estimator is efficient when it has a smaller variance than another estimator.

Consistency: how far is the estimator likely to be from the true parameter as we let the sample size increase indefinitely? As N approaches infinity, the estimated beta equals the true beta.

Unlike unbiasedness, consistency requires that the distribution of the estimator collapses around the true value as N approaches infinity. Thus unbiased estimators are not necessarily consistent, but unbiased estimators whose variance shrinks to zero as the sample size grows are consistent.

Fixed Effects

The Fixed Effects Estimator

Used with EITHER the distinct intercepts DGP OR the error components DGP with $E(X_{jit} v_i) \neq 0$.

Basic idea: estimate a separate intercept for each individual.

Fixed Effects

The simplest way to estimate separate intercepts for each individual is to use dummy variables.

This method is called the least squares dummy variable (LSDV) estimator.

Fixed Effects

We have already seen that we can use dummy variables to estimate separate intercepts for different groups.

With panel data, we have multiple observations for each individual. We can group these observations.

Fixed Effects

Least Squares Dummy Variable Estimator:

  • 1. Create a set of n dummy variables, $D_j$, such that $D_j = 1$ if i = j.
  • 2. Regress $Y_{it}$ against all the dummies, the $X_t$, and the $X_{it}$ variables (you must omit $X_i$ variables and the constant).
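Step 1 can be sketched as follows (an illustrative helper of my own, not from the slides): each observation's entity label is mapped to a row of n dummy regressors.

```python
def dummy_row(entity, entities):
    """D_j = 1 if this observation belongs to entity j, else 0."""
    return [1 if entity == j else 0 for j in entities]

entities = ["MUGER", "MESSEBO", "DERBA"]   # n = 3 firms from the example
print(dummy_row("MESSEBO", entities))      # [0, 1, 0]
```

Stacking one such row per observation, next to the X variables, gives the LSDV design matrix.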

Fixed Effects

The LSDV estimator is conceptually quite simple.

In practice, the tricky parts are:

  • 1. Creating the dummy variables

  • 2. Entering the regression into the computer

  • 3. Reporting results

Fixed Effects

Suppose we have a longitudinal dataset with 300 workers over 10 years, so n = 300.

We must create 300 dummy variables and then specify a regression with 300+ explanators. How do we do this in our software package?

Fixed Effects

Our regression output includes 300 intercepts.

Usually, we are not interested in the intercepts themselves.

We include the dummy variables to control for heterogeneity.

Fixed Effects

In reporting your regression output, it is preferable to note that you have included “individual fixed effects,” and then omit the dummy variable coefficients from your table of results.

Fixed Effects

At some point, n becomes too large for the computer to handle easily.

Modern computers can implement LSDV for ever larger data sets, but eventually LSDV becomes computationally intractable.

Fixed Effects

A computationally convenient alternative is called the Fixed Effects Estimator.

Technically, only this strategy is “Fixed Effects”; using dummy variables is LSDV.

In practice, econometricians tend to refer to either method as Fixed Effects.

Fixed Effects

The initial insight for the Fixed Effects estimator: if we DIFFERENCE observations for the same individual, the v i cancels out.

Fixed Effects

$Y_{it} = \beta_0 + \beta_1 X_{it} + v_i + \eta_{it}$

$Y_{it'} = \beta_0 + \beta_1 X_{it'} + v_i + \eta_{it'}$

$Y_{it} - Y_{it'} = \beta_1 (X_{it} - X_{it'}) + (\eta_{it} - \eta_{it'})$

Fixed Effects

When we difference, the heterogeneity term $v_i$ drops out. (In the distinct intercepts model, the $\beta_{0i}$ would drop out.) By assumption, the $\eta_{it}$ are uncorrelated with the $X_{it}$, so OLS on the differenced data is a consistent estimator of $\beta_1$.

Fixed Effects

If T = 2, then we have only 2 observations for each individual (as in the Gibbons and Katz example). Differencing the 2 observations is efficient.

If T > 2, then differencing any 2 observations ignores valuable information in the other observations for each individual.

Fixed Effects

We can use all the observations for each individual if we subtract the individual-specific mean from each observation.

Fixed Effects

$\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + v_i + \bar{\eta}_i$

$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (\eta_{it} - \bar{\eta}_i)$

Again the heterogeneity term $v_i$ drops out.

Fixed Effects

The Fixed Effects and LSDV estimators provide exactly identical estimates.

The degrees-of-freedom term in the Fixed Effects e.s.e.’s must be adjusted to account for the extra n degrees of freedom that have been used; the computer can make this adjustment.

Fixed Effects

Demeaning each observation by the individual-specific mean eliminates the need to create n dummy variables.

FE is computationally much simpler.
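The within transformation can be illustrated numerically. Below, made-up data in which the entity intercepts are correlated with x, so pooled OLS is biased while OLS on demeaned data recovers the assumed true coefficient (beta = 2; noise omitted so the recovery is exact):

```python
# y_it = a_i + 2 * x_it, with a_i in {0, 10} and entity B having larger x,
# so the intercepts a_i are correlated with x.
data = {  # entity -> list of (x_it, y_it)
    "A": [(1, 0 + 2 * 1), (2, 0 + 2 * 2), (3, 0 + 2 * 3)],
    "B": [(6, 10 + 2 * 6), (7, 10 + 2 * 7), (8, 10 + 2 * 8)],
}

def slope(pairs):
    """Simple OLS slope: cov(x, y) / var(x)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = sum((x - mx) ** 2 for x, _ in pairs)
    return num / den

pooled = slope([p for pairs in data.values() for p in pairs])  # biased upward

demeaned = []
for pairs in data.values():          # subtract each entity's own means
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    demeaned += [(x - mx, y - my) for x, y in pairs]
within = slope(demeaned)             # exactly 2.0: the true coefficient

print(round(pooled, 3), within)
```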

Fixed Effects

Fixed Effects (however estimated) discards all variation between individuals.

Fixed Effects uses only variation over time within an individual.

FE is sometimes called the “within” estimator.

Fixed Effects

Fixed Effects discards a great deal of variation in the explanators (all variation between individuals).

Fixed Effects uses n degrees of freedom.

Fixed Effects is not efficient if $E(X_{it} v_i) = 0$. Could we use OLS?


Checking Understanding (cont.)

Because X is uncorrelated with both $v$ and $\eta$, OLS is consistent in the uncorrelated version of the error components DGP.

$Var(\varepsilon_{it}) = Var(v_i + \eta_{it}) = \sigma_v^2 + \sigma_\eta^2$

The error terms are homoskedastic.

Checking Understanding (cont.)

However, the covariance between disturbances for a given individual is

$Cov(\varepsilon_{it}, \varepsilon_{it'}) = E[(v_i + \eta_{it})(v_i + \eta_{it'})] = E(v_i^2) = \sigma_v^2 \neq 0 \quad (t \neq t')$
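These two moments can be checked by simulation. The sigmas below are assumed illustrative values (not from the text): with $\varepsilon_{it} = v_i + \eta_{it}$, the sample variance should approach $\sigma_v^2 + \sigma_\eta^2$ and the within-individual covariance should approach $\sigma_v^2$.

```python
import random

random.seed(0)
s_v, s_e, N = 1.0, 1.0, 100_000
v = [random.gauss(0, s_v) for _ in range(N)]      # one v_i per individual
eps1 = [vi + random.gauss(0, s_e) for vi in v]    # eps_i1 = v_i + eta_i1
eps2 = [vi + random.gauss(0, s_e) for vi in v]    # eps_i2 = v_i + eta_i2

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

var_eps = cov(eps1, eps1)      # should be close to s_v^2 + s_e^2 = 2
cov_same_i = cov(eps1, eps2)   # should be close to s_v^2 = 1
```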

Checking Understanding (cont.)

In the presence of serial correlation, OLS is inefficient.

Random Effects

When unobserved heterogeneity is uncorrelated with explanators, panel data techniques are not needed to produce a consistent estimator.

However, we do need to correct for serial correlation between observations of the same individual.

Random Effects

When $E(X_{it} v_i) \neq 0$, panel data provides a valuable tool for eliminating omitted variables bias. We use Fixed Effects to gain the benefits of panel data.

When $E(X_{it} v_i) = 0$, panel data does not offer special benefits. We use Random Effects to overcome the serial correlation of panel data.

Random Effects

The key idea of random effects: estimate $\sigma_v^2$ and $\sigma_\eta^2$, and use these estimates to construct efficient weights for the panel data observations.


Random Effects

Once we have estimates of $\sigma_v^2$ and $\sigma_\eta^2$, we can re-weight the observations optimally.

These calculations are complicated, but most computer packages can implement them.
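The slide's own weighting formula was not recoverable, but the standard textbook form of the RE (GLS) re-weighting is "quasi-demeaning" with weight $\theta = 1 - \sqrt{\sigma_\eta^2 / (\sigma_\eta^2 + T\sigma_v^2)}$; a sketch under that assumption:

```python
import math

def theta(s_v2, s_e2, T):
    """RE quasi-demeaning weight: 0 -> pooled OLS, near 1 -> fixed effects."""
    return 1 - math.sqrt(s_e2 / (s_e2 + T * s_v2))

def quasi_demean(z_it, z_bar_i, th):
    """Applied to y and to every regressor before running OLS."""
    return z_it - th * z_bar_i

th = theta(s_v2=1.0, s_e2=1.0, T=3)   # illustrative variance estimates
print(th)                              # 1 - sqrt(1/4) = 0.5
```

When $\sigma_v^2 = 0$ the weight is 0 and RE collapses to pooled OLS; as $T\sigma_v^2$ grows the weight approaches 1 and RE approaches FE.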

Example: Production Functions

We have data on 625 French firms observed for 8 years.

We wish to estimate a Cobb–Douglas production function:

$Q_i = \beta_0 L_i^{\beta_1} K_i^{\beta_2} \varepsilon_i$

Taking logs:

$\ln(Q_i) = \ln(\beta_0) + \beta_1 \ln(L_i) + \beta_2 \ln(K_i) + \ln(\varepsilon_i)$

We estimate using random effects.

[Table: Random Effects Estimation of a Cobb–Douglas Production Function for a Sample of Manufacturing Firms]

Example: Production Functions

The estimated coefficients of 0.30 for capital and 0.69 for labor are similar to estimates using US data.

We also get similar results using fixed effects estimation.

[Table: Fixed Effects Estimation of a Cobb–Douglas Production Function for a Sample of Manufacturing Firms]

Example: Production Functions

We arrive at similar estimates using either random effects or fixed effects.

Because only fixed effects controls for unobserved heterogeneity that is correlated with the explanators, the similarity between the two estimates suggests that unobserved heterogeneity is not creating a large bias in this sample.

Example: Production Functions

The fixed effects estimator discards all variation between firms, and must use 624 more degrees of freedom than random effects.

Moving from RE to FE increases the e.s.e. on capital from 0.0116 to 0.0145

The e.s.e. on labor moves from 0.0118 to 0.0132

Example: Production Functions

The RE estimator provides more precise estimates

We would prefer to use RE instead of FE.

However, RE might be inconsistent if $E(X_{it} v_i) \neq 0$.

We need a test to help determine whether it is safe to use RE.

The Hausman Test

Hausman’s specification test for error components DGPs provides guidance on whether $E(X_{it} v_i) = 0$.

The key idea: if $E(X_{it} v_i) \neq 0$, then the inconsistent RE estimator and the consistent FE estimator converge to different estimates.
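For a single coefficient, Hausman's statistic reduces to a simple ratio; a sketch with illustrative numbers (these are not the French-firm estimates):

```python
# H = (b_FE - b_RE)^2 / (se_FE^2 - se_RE^2), compared with chi-squared(1).
def hausman_1d(b_fe, se_fe, b_re, se_re):
    # Under the null, Var(b_FE - b_RE) = Var(b_FE) - Var(b_RE),
    # because RE is efficient when E(X_it v_i) = 0.
    return (b_fe - b_re) ** 2 / (se_fe ** 2 - se_re ** 2)

H = hausman_1d(b_fe=0.30, se_fe=0.02, b_re=0.33, se_re=0.01)
print(round(H, 1))  # 3.0, below the 5% chi-squared(1) critical value of 3.84
```

A large H says the two estimators disagree by more than sampling error allows under the null, which is evidence against using RE.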

The Hausman Test

If $E(X_{it} v_i) = 0$, then the unobserved heterogeneity is uncorrelated with X and does not create a bias.

RE and FE are both consistent.

For two consistent estimators to provide significantly different estimates would be surprising.

The Hausman Test

We know the FE estimator is consistent even when $E(X_{it} v_i) \neq 0$.

The problem with FE is its inefficiency: FE is not as precise as RE.

Although FE is imprecise, it may provide a good enough estimate to detect a large bias in RE.

The Hausman Test

If FE is very imprecise, then the Hausman test has very weak power and cannot rule out even large biases.

If FE is very precise, then the Hausman test has very good power, but we gain little benefit from switching to the more efficient RE.

The Hausman Test

If FE is somewhat precise, then the Hausman test can warn us away from using RE in the presence of a large bias, while there is still room for substantial efficiency gains in switching to RE.

With the French manufacturing firms, FE is precise enough to reject the null even though the two estimates are fairly close.

[Table: Hausman Specification Test of French Firms’ Error Components’ Correlation with Explanators]

The Hausman Test

Warning: fixed effects exacerbates measurement error bias.

There is likely to be less variation in X within the experience of a single individual than across several individuals.

Small measurement errors can become large relative to the within-variation in X.

The Hausman Test

The Hausman Test warns us that RE and FE provide significantly different estimates.

This difference could arise because of omitted variables bias in RE, caused by $E(X_{it} v_i) \neq 0$.

This difference could ALSO arise because of measurement error biases in FE.

Review

Potential unobserved heterogeneity is a form of omitted variables bias.

“Unobserved heterogeneity” refers to omitted variables that are fixed for an individual (at least over a long period of time).

A person’s upbringing, family characteristics, innate ability, and demographics (except age) do not change.

Review

Panel Data is data in which we observe repeated cross-sections of the same individuals.

By far the leading type of panel data is repeated cross-sections over time.

Review

The key feature of panel data is that we observe the same individual in more than one condition.

Omitted variables that are fixed will take on the same values each time we observe the same individual.

Review

We learned 3 different DGP’s for panel data.

In the distinct intercepts DGP, across samples we would observe the same individuals with the same unobserved heterogeneity.

Each i has its own intercept, $\beta_{0i}$, that is fixed across samples.

A Panel Data DGP

$Y_{it} = \beta_{0i} + \beta_1 X_{1it} + \cdots + \beta_k X_{kit} + \varepsilon_{it}$

Review

Error components DGPs are suitable when we would draw different individuals across samples.

When each i is drawn, its unobserved heterogeneity is captured in a $v_i$ term.

We learned two error components DGPs, depending on whether the $v_i$ is correlated with the $X_{kit}$'s.

Another Panel Data DGP

$Y_{it} = \beta_0 + \beta_1 X_{1it} + \cdots + \beta_k X_{kit} + \varepsilon_{it}, \qquad \varepsilon_{it} = v_i + \eta_{it}$

Review

If $E(X_{it} v_i) \neq 0$, OLS would be inconsistent.

By estimating a separate intercept for each individual, we can control for the $v_i$.

We learned two equivalent strategies: LSDV and FE.

Review

The simplest way to estimate separate intercepts for each individual is to use dummy variables.

This method is called the least squares dummy variable estimator.

Review

Least Squares Dummy Variable Estimator:

  • 1. Create a set of n dummy variables, D j , such that D j = 1 if i = j

  • 2. Regress Y it against all the dummies, X t , and X it variables (you must omit X i variables and the constant)


Review

Demeaning each observation by the individual-specific mean eliminates the need to create n dummy variables.

FE is computationally much simpler.

Review

Fixed Effects (however estimated) discards all variation between individuals.

Fixed Effects uses only variation over time within an individual.

FE is sometimes called the “within” estimator.

Review

Because X is uncorrelated with either v or , OLS is consistent in the uncorrelated version of the error components DGP.

Var(it ) Var(v i it ) 2

v

2

OLS disturbances are homoskedatic.

Review

However, the covariance between disturbances for a given individual is

$Cov(\varepsilon_{it}, \varepsilon_{it'}) = E[(v_i + \eta_{it})(v_i + \eta_{it'})] = E(v_i^2) = \sigma_v^2$

so the within-individual correlation is

$Corr(\varepsilon_{it}, \varepsilon_{it'}) = \dfrac{Cov(\varepsilon_{it}, \varepsilon_{it'})}{Var(\varepsilon_{it})} = \dfrac{\sigma_v^2}{\sigma_v^2 + \sigma_\eta^2}$

Review

When unobserved heterogeneity is uncorrelated with explanators, panel data techniques are not needed to produce a consistent estimator.

However, we do need to correct for serial correlation between observations of the same individual.

Review

When $E(X_{it} v_i) \neq 0$, panel data provides a valuable tool for eliminating omitted variables bias. We use Fixed Effects to gain the benefits of panel data.

Review

When $E(X_{it} v_i) = 0$, panel data is less convenient than an equal-sized cross-sectional data set. We use Random Effects to overcome the serial correlation of panel data.

Review

The key idea of random effects: estimate $\sigma_v^2$ and $\sigma_\eta^2$, and use these estimates to construct efficient weights for the panel data observations.

Random vs. Fixed Effects

Random Effects
  • Small number of parameters
  • Efficient estimation
  • If you believe that differences across entities have some influence on your dependent variable, use the random effects model.
  • It also accommodates time-invariant variables.

$Profit_{it} = \beta_1 + \beta_2 Sales_{it} + \beta_3 advcost_{it} + v_i + \varepsilon_{it}$

where $Profit_{it}$ is the profit of firm i in time t and the random effect $v_i$ is assumed uncorrelated with the regressors.

Fixed Effects
  • Robust: generally consistent
  • Large number of parameters
  • More reasonable assumption about the correlation between the entity’s error term and the predictor variables
  • Precludes time-invariant regressors
  • Use fixed effects whenever you are interested in analyzing the impact of variables that vary over time: FE removes the effect of time-invariant characteristics and is designed to study the causes of changes within an entity.

$Profit_{it} = \beta_1 + \beta_2 Sales_{it} + \beta_3 advcost_{it} + v_i + \varepsilon_{it}$

where $v_i$ is now treated as a fixed, entity-specific intercept.


Using panel data in Stata

Data on n cases over T time periods gives a total of n × T observations.

One record per observation, i.e. long format.

Stata tools for analyzing panel data begin with the prefix xt.

First you need to tell Stata that you have panel data, using xtset.

Commands

xtset company time
xtdes
xtsum

Interpretation for xtsum:
  • For a time-invariant variable (e.g. ed), the within variation is zero.
  • For an individual-invariant variable (e.g. t), the between variation is zero.
  • For lwage, the within variation is smaller than the between variation.

xtdata
xtline revenue
xtline revenue, overlay

Commands: Tabulation

* Panel tabulation for a variable (south is a dummy):
xttab south

Interpretation for xttab:
  • Overall percent: on average, 29.03% of observations were in the south.
  • Between percent: 20.59% of individuals were ever in the south.
  • Within percent: 94.9% of those ever in the south were always in the south.

Commands: Transition and correlation

* Transition probabilities for a variable:
xttrans south, freq

Interpretation: of the 28.99% of the sample ever in the south, 99.23% remained in the south the next period.

correlate lwage L.lwage L2.lwage L3.lwage L4.lwage L5.lwage L6.lwage

Regression

  • 1. Pooled OLS:
reg revenue numberofbranch advcost
  • 2. Pooled OLS with cluster-robust standard errors:
regress lwage exp exp2 wks ed, noheader vce(cluster id)
  • 3. Pooled OLS with company dummies:
xi: regress revenue numberofbranch advcost i.company
  • 4. Pooled FGLS estimator (xtgee) with AR(2) error and cluster-robust s.e.'s:
xtgee lwage exp exp2 wks ed, corr(ar 2) vce(robust)
  • 5. Between GLS estimator (xtreg, be):
xtreg lwage exp exp2 wks ed, be
  • 6. First-difference estimator with cluster-robust s.e.'s:
regress D.(lwage $xlist), vce(cluster id)

Random and fixed effect

  • 7. Fixed effects:
xtreg revenue numberofbranch, fe
* Coefficients on the regressors indicate how much Y changes when X increases by one unit (within an entity).
  • 8. Random effects (with cluster-robust s.e.'s):
xtreg lwage exp exp2 wks ed, re vce(robust) theta
xtreg revenue numberofbranch, re
* Coefficients represent the average effect of X on Y when X changes across time and between entities by one unit.

Hausman test

Hausman test (if the p-value is < 0.05, i.e. significant, use fixed effects). To run it:

xtreg revenue numberofbranch, fe
estimates store fixed
xtreg revenue numberofbranch, re
estimates store random
hausman fixed random

Testing for random effects: Breusch-Pagan Lagrange multiplier (LM)

The LM test helps you decide between a random effects regression and a simple OLS regression.

xtreg revenue numberofbranch, re
xttest0

If the p-value is higher than 0.05, we fail to reject the null and conclude that random effects is not appropriate; that is, there is no evidence of significant differences across entities, so you can run a simple OLS regression.


cross-sectional dependence

Cross-sectional dependence is a problem in macro panels with long time series (over 20–30 years). It is not much of a problem in micro panels (few years and a large number of cases). The null hypothesis in the Breusch–Pagan LM test of independence is that residuals across entities are not correlated.

xtreg revenue numberofbranch, fe
xttest2

If the p-value is higher than 0.05, there is no cross-sectional dependence.

Alternatively we can use the Pesaran CD (cross-sectional dependence) test, which tests whether the residuals are correlated across entities:

xtreg revenue numberofbranch, fe
xtcsd, pesaran abs

If the p-value is higher than 0.05, there is no cross-sectional dependence.

Heteroskedasticity and serial correlation tests

1. A test for heteroskedasticity: the null is homoskedasticity (constant variance). This is a user-written program; to install it, type:

ssc install xttest3
xttest3

2. Serial correlation is a problem of long (20–30 year) time series. A Lagrange multiplier test for serial correlation is available using the command xtserial, a user-written program; to install it, type:

ssc install xtserial
xtserial revenue numberofbranch