aqm_2008_lecture_random

© All Rights Reserved


Jos Elkink

April, 2008

Outline

Introduction

Motivation

Fixed effects

Random effects

Random coefficients

Further information

Four topics

March 27: Missing data
April 3: Fixed & random effects
April 10: Time-series models
April 17: Causation and inference


Motivations

Clustered sampling

Sampling strategies

Probability sampling:

Simple random sampling: a purely random selection from the sampling frame, without replacement. Each subject in the population has exactly the same probability of being selected into the sample.

Systematic random sampling

Stratified sampling

Cluster sampling
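As a rough illustration of the first three strategies, here is a plain-Python sketch (the lecture's own code is R and Stata). The sampling frame, the two strata and the function names are hypothetical; only the standard `random` module is used.

```python
import random


def simple_random_sample(frame, k, rng):
    # Draw k units without replacement; every unit has inclusion probability k / len(frame).
    return rng.sample(frame, k)


def systematic_sample(frame, k, rng):
    # Pick a random start within the first interval, then take every step-th unit.
    step = len(frame) // k
    start = rng.randrange(step)
    return frame[start::step][:k]


def stratified_sample(strata, k_per_stratum, rng):
    # Draw a separate simple random sample inside each stratum.
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, k_per_stratum))
    return sample


rng = random.Random(2008)
frame = list(range(100))                           # hypothetical frame of 100 subject IDs
strata = {"low": frame[:50], "high": frame[50:]}   # hypothetical strata

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_sample(strata, 5, rng))
```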

Cluster sampling

To reduce costs, clusters are (randomly) sampled first, before units at the lower level are sampled within them.

E.g. selecting schools before selecting students, so that fewer schools need to be visited.

Individual observations from a clustered sample are not independent.
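A minimal sketch of two-stage cluster sampling, assuming a hypothetical frame of 20 schools with 30 students each: schools are drawn first, then students within each selected school. Students drawn from the same school share that school's context, which is why the resulting observations are not independent.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: randomly select clusters. Stage 2: randomly select units within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for cluster in chosen:
        units = clusters[cluster]
        for unit in rng.sample(units, min(n_per_cluster, len(units))):
            sample.append((cluster, unit))
    return sample


# Hypothetical frame: 20 schools with 30 students each.
schools = {f"school{s}": [f"student{s}_{i}" for i in range(30)] for s in range(20)}
sample = two_stage_cluster_sample(schools, n_clusters=4, n_per_cluster=10,
                                  rng=random.Random(42))
visited = {school for school, _ in sample}
print(len(sample), "students from", len(visited), "schools")
```

Only 4 of the 20 schools have to be visited, which is the cost saving the slide refers to.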

Motivations

Clustered sampling

Inherent structure

Examples

Macro level     Micro level
schools         teachers
classes         pupils
firms           employees
countries       political parties
doctors         patients
subjects        measurements
interviewers    respondents
judges          suspects


Motivations

Clustered sampling

Inherent structure

Panel data

Time-Series Cross-Section

Multilevel characteristics

Observations are not completely independent.

Variance can be divided into between-group and within-group variances.

Variables can be measured at either the micro- or macro-level, or both.
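In a sample, the split of variance into between-group and within-group parts corresponds to the identity total SS = between SS + within SS (sums of squares). A small plain-Python sketch with made-up groups checks it numerically:

```python
def variance_decomposition(groups):
    """Split the total sum of squares into between-group and within-group parts."""
    values = [y for grp in groups for y in grp]
    grand_mean = sum(values) / len(values)
    total = sum((y - grand_mean) ** 2 for y in values)
    between = 0.0
    within = 0.0
    for grp in groups:
        m = sum(grp) / len(grp)
        between += len(grp) * (m - grand_mean) ** 2    # group mean vs grand mean
        within += sum((y - m) ** 2 for y in grp)      # observations vs own group mean
    return total, between, within


groups = [[2.0, 3.0, 4.0], [5.0, 6.0], [8.0, 9.0, 10.0, 11.0]]  # hypothetical data
total, between, within = variance_decomposition(groups)
print(total, between + within)
```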

Example

[Figure: Example data plotted against x, in panels: overall mean; group means; between variation (slope = 0.683); within variation (y.dev, the deviations from the group means)]

Fixed effects

Pooled model

When we analyse all micro-level data while ignoring the multilevel structure, we call this a pooled model.

If some of our variables are measured at the macro-level, we are artificially increasing the number of observations for those variables. Thus we will be overconfident in our results.

E.g. characteristics of judges in explaining the severity of court rulings.
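To see why pooling inflates confidence for a macro-level variable, take the extreme, hypothetical case in which each of 8 judges' values is simply copied to 25 rulings. The plain-Python sketch below uses the ordinary OLS formulas: the slope does not change, but the standard error shrinks by a factor of $\sqrt{(J-2)/(nJ-2)}$ (here about 0.17), although no new information about judges has been added. All numbers are made up.

```python
import math


def ols_slope_se(x, y):
    """Slope of a simple OLS regression and its conventional standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, math.sqrt(ssr / (n - 2) / sxx)


# Hypothetical macro-level data: one characteristic and one outcome per judge (J = 8).
judge_x = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 6.0]
judge_y = [2.1, 2.9, 3.2, 4.1, 4.4, 5.3, 5.2, 6.4]

# Pooled "micro" data: the same judge-level values copied to 25 rulings per judge.
n_per_judge = 25
pooled_x = [v for v in judge_x for _ in range(n_per_judge)]
pooled_y = [v for v in judge_y for _ in range(n_per_judge)]

b_macro, se_macro = ols_slope_se(judge_x, judge_y)
b_pooled, se_pooled = ols_slope_se(pooled_x, pooled_y)
print(b_macro, se_macro)
print(b_pooled, se_pooled)  # same slope, much smaller standard error
```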

[Figure: Pooled model, slope = 0.124]

With a fixed effects model we explain the within-group variation, removing the between-group variation by either:

adding dummy variables for each group, or

subtracting the group means from all variables.

The two are equivalent.
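A plain-Python check of this equivalence on a small hypothetical data set: one function demeans within groups, the other runs OLS on x plus one dummy per group, solving the normal equations with a small Gaussian elimination routine written for this sketch. Both return the same slope.

```python
def group_mean(values, groups, k):
    """Mean of `values` over the observations belonging to group k."""
    members = [v for v, h in zip(values, groups) if h == k]
    return sum(members) / len(members)


def within_slope(x, y, g):
    """FE slope after subtracting group means from x and y (within transformation)."""
    mx = {k: group_mean(x, g, k) for k in set(g)}
    my = {k: group_mean(y, g, k) for k in set(g)}
    xd = [a - mx[h] for a, h in zip(x, g)]
    yd = [b - my[h] for b, h in zip(y, g)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def dummy_slope(x, y, g):
    """FE slope from OLS of y on x plus one dummy per group (no overall intercept)."""
    levels = sorted(set(g))
    X = [[a] + [1.0 if h == k else 0.0 for k in levels] for a, h in zip(x, g)]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)[0]  # the first coefficient belongs to x


x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.5, 2.1, 2.8, 4.0, 4.3, 5.1, 7.2, 7.4, 8.3]
g = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(within_slope(x, y, g), dummy_slope(x, y, g))
```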

In essence, we thus have different intercepts for each group:

$y_i = \beta_0 + X_i\beta + \alpha_{j[i]} + \epsilon_i$,

whereby $i$ denotes the individual unit, $j$ the group, and $j[i]$ the group of $i$.

If the fixed effects model is the true model and the group effects $\alpha_j$ are correlated with $X$, pooled estimates are biased and inconsistent.

[Figure: Pooled model (slope = 0.124) compared with the within-group fit (slope = 0.210); the last panel plots y.dev against x.dev, the deviations from the group means]

Another way of dealing with clustered data is looking at the between model:

$\bar y_j = \beta_0 + \bar X_j\beta + \epsilon_j$

Typical mistake: drawing conclusions about individuals from aggregate data, the ecological fallacy.
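A constructed, hypothetical example of how the between model can mislead about individuals: within each group y decreases with x, yet the regression on the group means gives a positive slope.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Hypothetical data: within every group y falls as x rises,
# but groups with a higher mean x also have a higher mean y.
data = {
    "g1": ([1.0, 2.0, 3.0], [3.0, 2.5, 2.0]),
    "g2": ([4.0, 5.0, 6.0], [6.0, 5.5, 5.0]),
    "g3": ([7.0, 8.0, 9.0], [9.0, 8.5, 8.0]),
}

# Between model: regress the group means of y on the group means of x.
mean_x = [sum(xs) / len(xs) for xs, _ in data.values()]
mean_y = [sum(ys) / len(ys) for _, ys in data.values()]
between_slope = ols_slope(mean_x, mean_y)

# Within-group slopes: one regression inside each group.
within_slopes = [ols_slope(xs, ys) for xs, ys in data.values()]
print(between_slope, within_slopes)
```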

[Figure: Between model on the group means, slope = 0.683]

Fixed effects in R

lm(grade ~ aptitude + age + factor(school))

lm(grade ~ aptitude + age + factor(school) - 1)

The second version drops the overall intercept, so that a separate intercept is reported for every school.

Fixed effects in Stata

xtreg grade aptitude age, i(school) fe

Or, manually:

xi: reg grade aptitude age i.school

[Figure: School example, grade plotted against aptitude]

School example

Pooled model:

             Estimate Std. Error t value Pr(>|t|)
(Intercept)     1.955      1.886    1.04    0.304
aptitude        0.797      0.159    5.02  5.4e-06 ***
age             0.287      0.151    1.90    0.062 .

Residual standard error: 1.2 on 57 degrees of freedom
Multiple R-Squared: 0.361, Adjusted R-squared: 0.339
F-statistic: 16.1 on 2 and 57 DF, p-value: 2.86e-06

School example

Fixed effects model (one dummy per school, no overall intercept):

                Estimate Std. Error t value Pr(>|t|)
aptitude          0.9227     0.0723   12.76  < 2e-16 **
age               0.2013     0.0675    2.98   0.0042 **
factor(school)1   2.6388     0.8565    3.08   0.0032 **
factor(school)2   4.5112     0.8550    5.28  2.3e-06 **
factor(school)3   1.9550     0.8370    2.34   0.0232 *

Multiple R-Squared: 0.992, Adjusted R-squared: 0.991
F-statistic: 1.3e+03 on 5 and 55 DF, p-value: <2e-16

Group-level variables

Note that fixed effects models cannot deal with group-level variables: such a variable is a linear combination of the group dummies, so the result would be perfect multicollinearity.

High multicollinearity also arises from variables with low variance within groups, e.g. political institutions.

Group-level variables

Solution:

1. $y_i = X_i\beta + \alpha_{j[i]} + \epsilon_i$
2. $\alpha_j = Z_j\gamma + \eta_j$
3. $y_i = X_i\beta + Z_{j[i]}\gamma + \eta_{j[i]} + \epsilon_i$

The last step is necessary to get the correct standard errors.

Group-level variables

However, a random effects (or random intercept) model is more appropriate.
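A plain-Python sketch of steps 1 and 2 of the solution for group-level variables, on hypothetical, noise-free data generated as $y = 0.5x + 1 + 2z_j$. Step 1 estimates the within slope and the group intercepts $\hat\alpha_j = \bar y_j - \hat\beta \bar x_j$; step 2 regresses those intercepts on the group-level variable $Z$. It only recovers point estimates: standard errors from step 2 would ignore that the $\hat\alpha_j$ are themselves estimates, which is why the combined model in step 3 is needed.

```python
def fe_intercepts(x, y, g):
    """Step 1: within (FE) slope and group intercepts alpha_j = ybar_j - beta * xbar_j."""
    def gmean(values, k):
        members = [v for v, h in zip(values, g) if h == k]
        return sum(members) / len(members)

    levels = sorted(set(g))
    mx = {k: gmean(x, k) for k in levels}
    my = {k: gmean(y, k) for k in levels}
    sxy = sum((a - mx[h]) * (b - my[h]) for a, b, h in zip(x, y, g))
    sxx = sum((a - mx[h]) ** 2 for a, h in zip(x, g))
    beta = sxy / sxx
    return beta, {k: my[k] - beta * mx[k] for k in levels}


def simple_ols(z, a):
    """Step 2: OLS of the estimated intercepts on the group-level variable Z."""
    n = len(z)
    mz, ma = sum(z) / n, sum(a) / n
    gamma = (sum((p - mz) * (q - ma) for p, q in zip(z, a))
             / sum((p - mz) ** 2 for p in z))
    return ma - gamma * mz, gamma


# Hypothetical noise-free data: y = 0.5 * x + (1 + 2 * z_j).
z = {"a": 0.0, "b": 1.0, "c": 3.0}
g = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
y = [0.5 * xi + 1.0 + 2.0 * z[gi] for xi, gi in zip(x, g)]

beta, alphas = fe_intercepts(x, y, g)
levels = sorted(alphas)
intercept, gamma = simple_ols([z[k] for k in levels], [alphas[k] for k in levels])
print(beta, intercept, gamma)
```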


Random effects

For the random effects model we still have:

$y_i = \beta_0 + X_i\beta + \alpha_{j[i]} + \epsilon_i$.

However, this time we assume $\alpha_j \sim N(0, \sigma_\alpha^2)$.

By assuming that $\alpha_j$ comes from a normal distribution, we have fewer parameters to estimate (only one $\sigma_\alpha^2$ instead of $J$ separate $\alpha_j$).

Variance components

In the population, the variance of the dependent variable can be split into within-group and between-group variance:

$\sigma_Y^2 = \sigma^2_{\text{between}} + \sigma^2_{\text{within}}$

Intraclass correlation

Aside: the proportion of the variance that is accounted for by the group level is the intraclass correlation:

$\rho_{\text{intra}} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}$
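A one-line computation of the intraclass correlation in plain Python. As a check, it is applied to the variance estimates reported for the school example later in this lecture (Stata's ML standard deviations sigma_u = 1.073727 and sigma_e = .5320764, and lmer's variances 1.737 and 0.293). Since those models include aptitude and age, this is the share of the residual variance that lies at the school level.

```python
def intraclass_correlation(var_between, var_within):
    """Proportion of the variance that lies at the group level."""
    return var_between / (var_between + var_within)


# Stata ML output for the school example: standard deviations sigma_u and sigma_e.
rho_stata = intraclass_correlation(1.073727 ** 2, 0.5320764 ** 2)
# lmer output for the school example: variances of the school intercepts and residuals.
rho_lmer = intraclass_correlation(1.737, 0.293)
print(round(rho_stata, 4), round(rho_lmer, 4))
```

The first value reproduces the `rho` of .8028508 that Stata reports.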

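Variance estimators

$\hat\sigma^2_{\text{within}} = s^2_{\text{within}}$

$\hat\sigma^2_{\text{between}} = s^2_{\text{between}} - \frac{s^2_{\text{within}}}{\tilde n}$

where

$\tilde n = \bar n - \frac{s^2_{n_j}}{N \bar n}$,

with $\bar n$ the mean group size, $s^2_{n_j}$ the variance of the group sizes, and $N$ the number of groups.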
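The slide does not define $s^2_{\text{within}}$ and $s^2_{\text{between}}$. The plain-Python sketch below assumes the usual ANOVA definitions from Snijders & Bosker (1999): $s^2_{\text{within}}$ is the pooled within-group variance with $M - N$ degrees of freedom ($M$ the total number of observations), and $s^2_{\text{between}} = \sum_j n_j(\bar y_j - \bar y)^2 / (\tilde n (N-1))$. The between estimate can come out negative in small samples; it is then usually set to zero.

```python
def anova_variance_components(groups):
    """ANOVA estimators of the within- and between-group variance (unequal group sizes allowed)."""
    N = len(groups)                        # number of groups
    sizes = [len(grp) for grp in groups]   # n_j
    M = sum(sizes)                         # total number of observations
    means = [sum(grp) / len(grp) for grp in groups]
    grand_mean = sum(sum(grp) for grp in groups) / M

    n_bar = M / N
    s2_sizes = sum((n - n_bar) ** 2 for n in sizes) / (N - 1)   # variance of the group sizes
    n_tilde = n_bar - s2_sizes / (N * n_bar)

    s2_within = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp) / (M - N)
    s2_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (n_tilde * (N - 1))

    var_within = s2_within
    var_between = s2_between - s2_within / n_tilde   # may be negative in small samples
    return var_within, var_between, n_tilde


# Hypothetical scores for three groups of unequal size.
scores = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [2.0, 3.0]]
print(anova_variance_components(scores))
```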

Fixed vs random

When to use random effects?

A group effect is random if we can think of the observed levels of the grouping variable (e.g. the schools in our data) as a sample from a larger population.

When making out-of-sample inferences.

When there are group-level variables.

When the sizes of groups are small.

Fixed vs random

When to use random effects?

Alternatively, one can primarily look at $n_j$ (group sizes) and $N$ (number of groups):

N small: fixed effects
N not small, $n_j$ small: random effects
$n_j$ larger: not as important

But this is only a preliminary quick judgment!

Fixed vs random

When to use random effects?

Gelman & Hill (2007): Our advice (...) is to

always use multilevel modeling (random

effects).

Fixed vs random

When to use random effects?

Johnston & DiNardo (1997): choose random effects when you can assume that $X_i$ and $\alpha_{j[i]}$ are uncorrelated; fixed effects otherwise.

Random effects in R

library(arm)

lmer(grade ~ aptitude + age + (1|school))

Random effects in Stata

xtreg grade aptitude age, i(school) re mle

School example

Note that we are talking about only 3 schools; this is too few groups to seriously consider a random effects model!

School example

Linear mixed-effects model fit by REML
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 1.737    1.318
 Residual             0.293    0.542
number of obs: 60, groups: school, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)   3.0259     1.1360    2.66
aptitude      0.9216     0.0723   12.75
age           0.2020     0.0675    2.99

School example

Random-effects GLS regression                   Number of obs      =        60
Group variable (i): school                      Number of groups   =         3

R-sq:  within  = 0.7578                         Obs per group: min =        20
       between = 0.0072                                        avg =      20.0
       overall = 0.3610                                        max =        20

                                                Wald chi2(2)       =     32.21
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    aptitude |   .7970039   .1587327     5.02   0.000     .4858935    1.108114
         age |   .2871654   .1508365     1.90   0.057    -.0084687    .5827995
       _cons |   1.955282    1.88636     1.04   0.300    -1.741915    5.652479
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |   .5416609
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

[Figure: School example, grade plotted against aptitude (two panels)]

School example

Random-effects ML regression                    Number of obs      =        60
Group variable (i): school                      Number of groups   =         3

                                                Obs per group: min =        20
                                                               avg =      20.0
                                                               max =        20

                                                LR chi2(2)         =     83.55
Log likelihood  = -53.896431                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    aptitude |   .9210743   .0710091    12.97   0.000      .781899     1.06025
         age |   .2022943   .0662681     3.05   0.002     .0724112    .3321775
       _cons |   3.021863   1.034865     2.92   0.003      .993565    5.050161
-------------+----------------------------------------------------------------
    /sigma_u |   1.073727   .4438302                      .4775795    2.414027
    /sigma_e |   .5320764   .0498341                      .4428439     .639289
         rho |   .8028508   .1342531                      .4616776    .9640621
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 83.34 Prob>=chibar2 = 0.000

R-squared

In linear regression, a popular statistic is $R^2$, the squared multiple correlation coefficient; in other words, the proportion of the variance in the dependent variable that is explained by the model.

So what about $R^2$ for random effects models?

R-squared

Remember, the variance in a multilevel model has different components:

$\sigma_Y^2 = \sigma^2_{\text{between}} + \sigma^2_{\text{within}}$

R-squared: individual level

Estimate two models, one with and one without explanatory variables (A and B, respectively). Then,

$R^2_{\text{within}} = 1 - \frac{\sigma^2_{\epsilon,A} + \sigma^2_{\alpha,A}}{\sigma^2_{\epsilon,B} + \sigma^2_{\alpha,B}}$

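R-squared: group level

Estimate two models, one with and one without explanatory variables (A and B, respectively). Then,

$R^2_{\text{between}} = 1 - \frac{\sigma^2_{\epsilon,A}/n + \sigma^2_{\alpha,A}}{\sigma^2_{\epsilon,B}/n + \sigma^2_{\alpha,B}}$,

where $n$ is a representative group size.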
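A direct plain-Python transcription of the two multilevel $R^2$ measures, $R^2_{\text{within}} = 1 - (\sigma^2_{\epsilon,A} + \sigma^2_{\alpha,A})/(\sigma^2_{\epsilon,B} + \sigma^2_{\alpha,B})$ and its group-level analogue in which $\sigma^2_\epsilon$ is divided by a representative group size $n$. The variance estimates are hypothetical and the argument names are this sketch's own.

```python
def multilevel_r2(var_eps_a, var_alpha_a, var_eps_b, var_alpha_b, n):
    """Level-1 (within) and level-2 (between) R^2; A = model with predictors, B = empty model."""
    r2_within = 1 - (var_eps_a + var_alpha_a) / (var_eps_b + var_alpha_b)
    r2_between = 1 - (var_eps_a / n + var_alpha_a) / (var_eps_b / n + var_alpha_b)
    return r2_within, r2_between


# Hypothetical variance estimates; n is a representative group size.
r2_w, r2_b = multilevel_r2(var_eps_a=0.5, var_alpha_a=0.2,
                           var_eps_b=1.0, var_alpha_b=0.5, n=10)
print(round(r2_w, 3), round(r2_b, 3))
```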


Group-level predictions

With a fixed effects model, we have the coefficients on the group dummies, which we can interpret as group-level predictions.

In a random effects model, we do not have these predictions directly, as we only estimated $\sigma_\alpha^2$ and $\beta_0$.

The predicted group levels can be estimated using:

$\hat\beta_{0,j} = \lambda_j \bar y_j + (1 - \lambda_j)\hat\beta_0, \qquad \lambda_j = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\epsilon^2 / n_j}$,

whereby $\bar y_j$ is the mean on $y$ of group $j$.

In R, you get the estimated fixed effects with:

fixef(model.random)

and the predicted random effects with:

ranef(model.random)
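The formula for the predicted group levels can be sketched in plain Python, assuming a random intercept model without explanatory variables and known variances (all values below are hypothetical). It shows the partial pooling at work: a group mean based on few observations is pulled strongly towards $\hat\beta_0$, while one based on many observations hardly moves.

```python
def shrinkage_weight(var_alpha, var_eps, n_j):
    """lambda_j = sigma_alpha^2 / (sigma_alpha^2 + sigma_eps^2 / n_j)."""
    return var_alpha / (var_alpha + var_eps / n_j)


def predicted_group_level(ybar_j, n_j, beta0, var_alpha, var_eps):
    """Precision-weighted compromise between the group mean and the overall intercept."""
    lam = shrinkage_weight(var_alpha, var_eps, n_j)
    return lam * ybar_j + (1 - lam) * beta0


# Hypothetical values: overall intercept 5, two groups whose raw mean is 8.
for n_j in (2, 50):
    print(n_j, predicted_group_level(ybar_j=8.0, n_j=n_j, beta0=5.0, var_alpha=1.0, var_eps=4.0))
```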


Random coefficients

In the random effects model, we assume that group intercepts vary according to a normal distribution.

But what about the coefficients? I.e. what about group slopes that vary following a normal distribution?

Random coefficients

$y_i = \beta_0 + X_i\beta + X_i\delta_{j[i]} + \alpha_{j[i]} + \epsilon_i$

$\alpha_j \sim N(0, \sigma_\alpha^2)$

$\delta_j \sim N(0, \sigma_\delta^2)$

Note that a model with random coefficients but a constant intercept across groups rarely makes sense, especially because of the often arbitrary location of $x = 0$.
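A plain-Python sketch that simulates data from the random coefficients model $y_i = \beta_0 + X_i\beta + X_i\delta_{j[i]} + \alpha_{j[i]} + \epsilon_i$, with hypothetical parameter values and $x$ drawn uniformly between 0 and 10, and then fits a separate OLS slope in each group. Because each group has its own $\delta_j$, the per-group slopes scatter around $\beta$; this is only a descriptive check, not the REML estimator that `lmer` uses.

```python
import random


def simulate_random_coefficients(n_groups, n_per_group, beta0, beta,
                                 sd_alpha, sd_delta, sd_eps, seed=0):
    """Draw (group, x, y) rows from y = beta0 + x*beta + x*delta_j + alpha_j + eps."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        alpha_j = rng.gauss(0.0, sd_alpha)  # group-specific intercept deviation
        delta_j = rng.gauss(0.0, sd_delta)  # group-specific slope deviation
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 10.0)
            eps = rng.gauss(0.0, sd_eps)
            rows.append((j, x, beta0 + x * beta + x * delta_j + alpha_j + eps))
    return rows


def group_slopes(rows):
    """Separate OLS slope of y on x within each group (a descriptive check only)."""
    slopes = {}
    for j in sorted({row[0] for row in rows}):
        xs = [x for grp, x, _ in rows if grp == j]
        ys = [y for grp, _, y in rows if grp == j]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slopes[j] = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                     / sum((a - mx) ** 2 for a in xs))
    return slopes


# Hypothetical parameters: 10 groups of 20 observations.
rows = simulate_random_coefficients(10, 20, beta0=1.0, beta=0.4,
                                    sd_alpha=1.9, sd_delta=0.6, sd_eps=0.3, seed=1)
print(group_slopes(rows))
```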

Random effects in R

library(arm)

lmer(grade ~ aptitude + age + (aptitude|school))

School example

Linear mixed-effects model fit by REML
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 1.74e+00 1.32e+00
          aptitude    1.47e-10 1.21e-05 0.000
 Residual             2.93e-01 5.42e-01
number of obs: 60, groups: school, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)   3.0259     1.1359    2.66
aptitude      0.9216     0.0723   12.75
age           0.2020     0.0675    2.99

[Figure: School example, grade plotted against aptitude]

[Figure: Example data plotted against x: pooled model (slope = 0.195); two further fits with slopes 0.236 and 0.235; random coefficients model (mean slope = 0.363)]

Random coefficients model

Linear mixed-effects model fit by REML
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 g        (Intercept) 3.730    1.931
          x           0.397    0.630    0.168
 Residual             0.094    0.307
number of obs: 200, groups: g, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)    1.113      0.611    1.82
x              0.363      0.199    1.82

Further information


Time-dependence within groups (next

week)

Predictors on the random coefficients

Bayesian estimation

More complex models dealing with panel

data structures

Extensions towards limited dependent

variables


Further information

A clear, relatively introductory textbook on

multilevel modeling is Snijders & Bosker

(1999), Multilevel analysis. An introduction

to basic and advanced multilevel modeling.

An excellent, modern book on multilevel

modeling, using primarily R and Bugs, is

Gelman & Hill (2007), Data analysis using

regression and multilevel/hierarchical models.

Further information

Their websites are also interesting:

Snijders: http://stat.gamma.rug.nl/snijders/

Gelman: http://www.stat.columbia.edu/~gelman/


Further information

When using Stata, the

Longitudinal/panel-data reference manual is

of very high quality. The relevant chapters

for this lecture are in fact freely available as

sample chapters (xtreg and xtmixed) at

http://www.stata.com/bookstore/xt.html.

For the use of R, Google and Gelman & Hill

(2007) are more helpful resources.

Further information

Two standard textbooks on panel data are Baltagi (2005), Econometric analysis of panel data (primarily for small N, large T) and Hsiao (2003), Analysis of panel data (primarily for large N, small T). Both are very technical in nature. Perhaps an easier introduction is Wooldridge (2002), Econometric analysis of cross-section and panel data.
