Module 5 (Stata Practical): Introduction to Multilevel Modelling

George Leckie [1]
Centre for Multilevel Modelling, 2010

[1] This Stata practical is adapted from the corresponding MLwiN practical: Steele, F. (2008) Module 5: Introduction to Multilevel Modelling. LEMMA VLE, Centre for Multilevel Modelling. Accessed at http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13.

Pre-requisites

• Modules 1-4

Contents

Introduction to the Scottish Youth Cohort Trends Dataset
P5.1 Comparing Groups using Multilevel Modelling
    P5.1.1 A multilevel model of attainment with school effects
    P5.1.2 Examining school effects (residuals)
P5.2 Adding Student-level Explanatory Variables: Random Intercept Models
P5.3 Allowing for Different Slopes across Schools: Random Slope Models
    P5.3.1 Testing for random slopes
    P5.3.2 Interpretation of random cohort effects across schools
    P5.3.3 Examining intercept and slope residuals for schools
    P5.3.4 Between-school variance as a function of cohort
    P5.3.5 Adding a random coefficient for gender (dichotomous x)
    P5.3.6 Adding a random coefficient for social class (categorical x)
P5.4 Adding Level 2 Explanatory Variables
    P5.4.1 Contextual effects
    P5.4.2 Cross-level interactions
P5.5 Complex Level 1 Variation
    P5.5.1 Within-school variance as a function of cohort (continuous x)
    P5.5.2 Within-school variance as a function of gender (dichotomous x)
    P5.5.3 Within-school variance as a function of cohort and gender
P5.6 References

Introduction

Some of the sections within this module have online quizzes for you to test your understanding. To find the quizzes:

EXAMPLE
From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.1 Comparing Groups Using Multilevel Modelling" to open Lesson 5.1
• Click Q1 to open the first question

Introduction to the Scottish Youth Cohort Trends Dataset

You will be analysing data from the Scottish School Leavers Survey (SSLS), a nationally representative survey of young people. We use data from seven cohorts of young people collected in the first sweep of the study, carried out at the end of the final year of compulsory schooling (aged 16-17) when most sample members had taken Standard grades.[2]

[2] We are grateful to Linda Croxford (Centre for Educational Sociology, University of Edinburgh) for providing us with these data. The dataset was constructed as part of an ESRC-funded project on Education and Youth Transitions in England, Wales and Scotland 1984-2002. Further analyses of the data can be found in Croxford, L. and Raffe, D. (2006) “Education Markets and Social Class Inequality: A Comparison of Trends in England, Scotland and Wales”. In R. Teese (Ed.) Inequality Revisited. Berlin: Springer.

In the practical for Module 3 on multiple regression, we considered the predictors of attainment in Standard grades (subject-based examinations, typically taken in up to eight subjects). In this practical, we extend the (previously single-level) multiple regression analysis to allow for dependency of exam scores within schools and to examine the extent of between-school variation in attainment. We also consider the effects on attainment of several school-level predictors.

The dependent variable is a total attainment score. Each subject is graded on a scale from 1 (highest) to 7 (lowest) and, after recoding so that a high numeric value denotes a high grade, the total is taken across subjects. The analysis dataset contains the student-level variables considered in Module 3 together with a school identifier and three school-level variables:

Variable name   Description and codes
caseid          Anonymised student identifier
schoolid        Anonymised school identifier
score           Point score calculated from awards in Standard grades taken at age 16.
                Scores range from 0 to 75, with a higher score indicating a higher
                attainment
cohort90        The sample includes the following cohorts: 1984, 1986, 1988, 1990,
                1996 and 1998. The cohort90 variable is calculated by subtracting
                1990 from each value. Thus values range from -6 (corresponding to
                1984) to 8 (1998), with 1990 coded as zero
female          Sex of student (1 = female, 0 = male)
sclass          Social class, defined as the higher class of mother or father
                (1 = managerial and professional, 2 = intermediate, 3 = working,
                4 = unclassified)
schtype         School type, distinguishing independent schools from state-funded
                schools (1 = independent, 0 = state-funded)
schurban        Urban-rural classification of school (1 = urban, 0 = town or rural)
schdenom        School denomination (1 = Roman Catholic, 0 = non-denominational)

There are 33,988 students in 508 schools.

P5.1 Comparing Groups using Multilevel Modelling

Load “5.1.dta” into memory and open the do-file for this lesson:

From within the LEMMA Learning Environment
• Go to Module 5: Introduction to Multilevel Modelling, and scroll down to Stata Datasets and Do-files
• Click “5.1.dta” to open the dataset

and use the describe command to produce a summary of the dataset:

. describe

Contains data from 5.1.dta
  obs:        33,988
 vars:             9                          3 Sep 2009 09:31
 size:       713,748 (99.9% of memory free)
--------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------
caseid          float  %9.0g                  Case ID
schoolid        int    %9.0g                  School ID
score           byte   %9.0g                  Score
cohort90        byte   %9.0g                  Cohort
female          byte   %9.0g                  Female
sclass          byte   %9.0g                  Social class
schtype         byte   %9.0g                  School type
schurban        byte   %9.0g                  School urban-rural classification
schdenom        byte   %9.0g                  School denomination
--------------------------------------------------------------------------------
Sorted by:
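As a quick check on the figures quoted above (33,988 students in 508 schools), the following sketch counts the records and the distinct schools once 5.1.dta is in memory. The variable name schooltag is a hypothetical helper introduced here for illustration only; it is dropped afterwards so that it does not interfere with variables created later in the practical.

```stata
* Number of student records in memory (the text reports 33,988)
count

* Flag exactly one record per school, then count the flagged records
* (the text reports 508 schools); schooltag is an illustrative name
egen byte schooltag = tag(schoolid)
count if schooltag == 1

* Remove the helper variable again
drop schooltag
```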
P5.1.1 A multilevel model of attainment with school effects

We will start with the simplest multilevel model which allows for school effects on attainment, but without explanatory variables. This ‘null’ model may be written

$$\text{score}_{ij} = \beta_0 + u_{0j} + e_{ij}$$

where $\text{score}_{ij}$ is the attainment of student $i$ in school $j$, $\beta_0$ is the overall mean across schools, $u_{0j}$ is the effect of school $j$ on attainment, and $e_{ij}$ is a student-level residual. The school effects $u_{0j}$, which we will also refer to as school (or level 2) residuals, are assumed to follow a normal distribution with mean zero and variance $\sigma^2_{u0}$.

Stata’s main command for fitting multilevel models for continuous response variables is the xtmixed command.[3] To fit the above model using the xtmixed command, we type: xtmixed score || schoolid:, mle variance nostderr.

[3] Note, two-level random intercept models can equally be fitted with the xtreg command (with the mle option); see help xtreg. We do not discuss the xtreg command as it cannot be used to fit more complicated multilevel models while xtmixed can. However, we do note that xtreg (with the mle option) fits models considerably faster than xtmixed and is therefore recommended for fitting two-level random intercept models. See Rabe-Hesketh and Skrondal (2008) for examples of two-level random intercept models fitted with both commands.

The response variable (score) follows the command which is then followed by the list of fixed part explanatory variables (excluding the constant as this is included by default[4]). The above model contains only an intercept and so no fixed part explanatory variables are specified. The level 2 random part of the model is specified after two vertical bars ||. The level 2 identifier (schoolid) is specified first followed by a colon and then the list of random part explanatory variables (again excluding the constant as this is included by default). The mle option is used to request maximum likelihood estimation (as opposed to the default of restricted maximum likelihood estimation). The variance option reports the variances of the random intercept and any random coefficients included in the model (as opposed to the default of standard deviations). The nostderr option is specified to avoid calculating standard errors for the random part parameters. This speeds up the time it takes to fit each xtmixed model and we can still use likelihood ratio tests to compare nested models with different random part specifications.

[4] Note, the noconstant option can be used to omit the constant; see help xtmixed.

Issuing the xtmixed command gives the following output:

. xtmixed score || schoolid:, mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -143269.53
Iteration 1:   log likelihood = -143269.53

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(0)       =         .
Log likelihood = -143269.53                     Prob > chi2        =         .

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    30.6006   .3694317    82.83   0.000     29.87652    31.32467
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Identity           |
                  var(_cons) |   61.02457          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   258.3572          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  3749.78 Prob >= chibar2 = 0.0000

Before interpreting the model, we will discuss the estimation procedure that xtmixed uses.[5] The default estimation option is to fit the model using the EM (expectation maximisation) algorithm until convergence (or 20 iterations have been reached). At that point, maximization switches to a gradient-based method, unless the emonly option is specified, in which case maximization stops.[6] In the analysis which follows we will mainly use this default estimation option.

[5] For further details see help xtmixed.
[6] By default, the gradient-based method is Newton–Raphson iterations, but other methods are available by specifying the appropriate maximize options; see help xtmixed.

While the default estimation options are normally the preferred approach, complicated models can be very slow to iterate. The advantage of specifying emonly is that EM iterations are typically much faster than those for gradient-based methods. However, the disadvantage is that it can take a large number of EM iterations to converge (if at all).
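For a model that is slow to fit, the EM-only route described above might be tried along the following lines. This is a sketch rather than a recommendation: the value 200 passed to emiterate() is an arbitrary illustrative choice (the default limit is 20 EM iterations, as noted above), and convergence is not guaranteed.

```stata
* Null model fitted by EM iterations only, with no switch to
* gradient-based optimisation.
* emiterate(200) raises the cap on EM iterations from the default of 20;
* 200 is an arbitrary value chosen for illustration.
xtmixed score || schoolid:, mle variance emonly emiterate(200)
```

The nostderr option is left out here because an EM-only fit does not provide standard errors for the random part parameters in any case.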
The overall mean attainment (across schools) is estimated as 30.60. The mean for school $j$ is estimated as $30.60 + \hat{u}_{0j}$, where $\hat{u}_{0j}$ is the school residual which we will estimate in a moment. A school with $\hat{u}_{0j} > 0$ has a mean that is higher than average, while $\hat{u}_{0j} < 0$ for a below-average school. (We will obtain confidence intervals for residuals to determine whether differences from the overall mean can be considered ‘real’ or due to chance.)

Before we continue, we store the results using the estimates store command:

. estimates store nullmodel

We can then explore other model specifications with the option of restoring these estimates later (by using the estimates restore command) without having to refit this model. This will be particularly helpful when we fit more complex models that are slower to converge. We can even store each model we fit under a different name so that we can restore any previously fitted model at a later point.

Partitioning variance

The between-school (level 2) variance var(_cons) in attainment is estimated as $\hat{\sigma}^2_{u0} = 61.02$, and the within-school between-student (level 1) variance var(Residual) is estimated as $\hat{\sigma}^2_{e} = 258.36$. Thus the total variance is 61.02 + 258.36 = 319.38.

The variance partition coefficient (VPC) is 61.02/319.38 = 0.19, which indicates that 19% of the variance in attainment can be attributed to differences between schools. Note, however, that we have not accounted for intake ability (measured by exams taken on entry to secondary school) so the school effects are not value-added. Previous studies have found that between-school variance in progress, i.e. after accounting for intake attainment, is close to 10%.

Testing for school effects

To test the significance of school effects, we can carry out a likelihood ratio test comparing the null multilevel model with a null single-level model. To fit the null single-level model, we need to remove the random school effect:

$$\text{score}_{ij} = \beta_0 + e_{ij}$$

. xtmixed score, mle variance nostderr

Mixed-effects ML regression                     Number of obs      =     33988
                                                Wald chi2(0)       =         .
Log likelihood = -145144.42                     Prob > chi2        =         .

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   31.09462   .0939156   331.09   0.000     30.91055    31.27869
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
               var(Residual) |   299.7787          .             .           .
------------------------------------------------------------------------------

The likelihood ratio test statistic is calculated as two times the difference in the log likelihood values for the two models:

$LR = 2\,(-143269.53 - (-145144.42)) = 3750$ on 1 d.f. (because there is only one parameter difference between the models, $\sigma^2_{u0}$).

Bearing in mind that the 5% point of a chi-squared distribution on 1 d.f. is 3.84, there is overwhelming evidence of school effects on attainment. We will therefore revert to the multilevel model with school effects.[7]

[7] Note that this test statistic has a non-standard sampling distribution as the null hypothesis of a zero variance is on the boundary of the parameter space; we do not envisage a negative variance. In this case the correct p-value is half the one obtained from the tables of chi-squared distribution with 1 degree of freedom. In the output of the xtmixed command, Stata automatically reports the correct p-value for this test. See help j_xtmixedlr for further details.

Note, the xtmixed command automatically compares the specified model with the equivalent single-level model. The likelihood ratio test statistic for this comparison can be seen in the last line of the xtmixed output of the first model we fitted: chibar2(01) = 3749.78. Note that there is not a corresponding likelihood ratio test statistic for the second model we fitted as this model is a single-level model.
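The hand calculations in this section can be reproduced with Stata's display command. The sketch below simply re-enters the estimates and log likelihoods printed in the two outputs above; nothing is read from stored results, so it runs regardless of which model is currently active.

```stata
* Variance partition coefficient: between-school variance / total variance
display 61.02457 / (61.02457 + 258.3572)

* Likelihood ratio statistic: 2 x (logL multilevel - logL single-level)
display 2 * (-143269.53 - (-145144.42))

* 5% critical value of a chi-squared distribution on 1 d.f.
display invchi2tail(1, 0.05)

* p-value for the LR statistic, halved because the null value of the
* variance lies on the boundary of the parameter space
display 0.5 * chi2tail(1, 3749.78)
```

Typing the numbers in by hand keeps the example self-contained; in a do-file one might instead save the log likelihood of each fit (for instance from e(ll)) immediately after estimation.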
Now restore the estimates of the earlier model using the estimates restore command. We can now continue our analysis of that model:

. estimates restore nullmodel
(results nullmodel are active now)

P5.1.2 Examining school effects (residuals)

To estimate the school-level residuals $\hat{u}_{0j}$ and their associated standard errors, we use the predict command first with the reffects option and second with the reses option (the reses option is available as of Stata 11):[8]

[8] The estimated residuals $\hat{u}_{0j}$ are called shrunken residuals or sometimes empirical Bayes estimates or posterior estimates.

. predict u0, reffects

. predict u0se, reses

The school-level residuals and their standard errors have been calculated and stored for every record in the dataset. However, summary statistics and graphs for school-level variables must be based on a dataset with one record per school. We therefore create a dummy variable pickone to pick one observation per school (see P3.1.2 in Module 3 where we explain this approach in detail):

. egen pickone = tag(schoolid)

Next we sort the school effects in ascending order based on the values of u0:

. sort u0

Then we rank the school effects. To do this, we use the generate command with the sum() function to create a new variable u0rank equal to the running (i.e. cumulative) sum of pickone. Thus the nth observation on u0rank contains the sum of the first n observations on pickone.

. generate u0rank = sum(pickone)

To see the school residual, standard error and ranking for a particular school, we can list the data. Here we do this for the first 10 schools in the data by making use of the <= (less than or equal) operator.[9]

[9] If the schools in the dataset were not numbered consecutively then the above command would only list those schools where schoolid took a value of 10 or less. It is therefore often useful to recode identifier variables such as schoolid so that they do take consecutive values. The relevant command is egen newschoolid = group(schoolid). The group function creates a new variable, which we call here newschoolid, taking on values 1, 2, ... for the groups formed by schoolid. The order of the groups is that of the sort order of schoolid.

. sort schoolid

. list schoolid u0 u0se u0rank if pickone==1 & schoolid<=10

      +------------------------------------------+
      | schoolid          u0       u0se   u0rank |
      |------------------------------------------|
  41. |        1   -11.84128   2.389899       37 |
  58. |        2    3.206334   1.302732      337 |
 201. |        3    3.396004   1.497341      344 |
 309. |        4   -7.415012    2.07105       73 |
 413. |        5    3.426228   1.630054      345 |
      |------------------------------------------|
 544. |        6    12.43373   1.403097      487 |
 660. |        7   -1.651931   1.459818      199 |
 727. |        8    20.97878   2.021325      508 |
 753. |        9   -8.694923   6.437819       59 |
 772. |       10    1.737383   1.904442      291 |
      +------------------------------------------+

From these values we can see, for example, that school 1 had an estimated residual of -11.84 which was ranked 37, i.e. 37 places from the bottom. For this school, we estimate a mean score of 30.60 - 11.84 = 18.76. In contrast, the mean for school 8 (ranked 508, the highest) is estimated as 30.60 + 20.98 = 51.58.

Finally, we use the serrbar command to produce a ‘caterpillar plot’ to show the school effects in rank order together with 95% confidence intervals. The order of the three variables that follow the command is important. The first variable must contain the point estimates, the second the associated standard errors and the third the rank of the point estimates. We use the scale(1.96) option to obtain 95% confidence limits and the yline(0) option to plot a horizontal line at zero which represents the average school in the data:

. serrbar u0 u0se u0rank if pickone==1, scale(1.96) yline(0)
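The interval limits that serrbar draws with scale(1.96) can also be computed explicitly, which makes it easy to count the schools whose interval excludes zero. This is a minimal sketch, assuming u0, u0se and pickone exist as created above; the names u0lo and u0hi are introduced here for illustration only.

```stata
* Lower and upper 95% limits for each school residual
generate u0lo = u0 - 1.96*u0se
generate u0hi = u0 + 1.96*u0se

* Schools whose interval lies entirely above zero (above-average schools);
* !missing() guards against missing limits, which Stata treats as very large
count if pickone==1 & u0lo > 0 & !missing(u0lo)

* Schools whose interval lies entirely below zero (below-average schools)
count if pickone==1 & u0hi < 0
```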
Notice that the confidence intervals around the residual estimates vary greatly in their width; smaller schools will have wider intervals than larger schools.

Note that because we have not accounted for intake ability, we cannot interpret these residuals as “school effects” in the value-added sense in which the term is used in school effectiveness research. Unfortunately, no measure of prior attainment is available from the Scottish School Leavers Survey. Nevertheless, exam performance at age 16 is an important educational outcome because it is a strong predictor of post-16 educational attainment and entry to university depends on attainment rather than progress. In these exercises, we will study trends in mean attainment and variation in attainment between individuals and between schools.

Don’t forget to take the online quiz!

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.1 Comparing Groups Using Multilevel Modelling" to open Lesson 5.1
• Click Q1 to open the first question

P5.2 Adding Student-level Explanatory Variables: Random Intercept Models

Load “5.2.dta” into memory and open the do-file for this lesson:

From within the LEMMA Learning Environment
• Go to Module 5: Introduction to Multilevel Modelling, and scroll down to Stata Datasets and Do-files
• Click “5.2.dta” to open the dataset

We begin by allowing for a linear cohort effect:

$$\text{score}_{ij} = \beta_0 + \beta_1 \text{cohort90}_{ij} + u_{0j} + e_{ij}$$

. xtmixed score cohort90 || schoolid:, mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -140456.79
Iteration 1:   log likelihood = -140456.79

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(1)       =   6120.93
Log likelihood = -140456.79                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.214954   .0155293    78.24   0.000     1.184518    1.245391
       _cons |   30.55915   .3225441    94.74   0.000     29.92698    31.19133
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Identity           |
                  var(_cons) |   45.98856          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   219.2879          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  3158.04 Prob >= chibar2 = 0.0000
The equation of the average fitted regression line (across schools) is

$$\widehat{\text{score}}_{ij} = 30.559 + 1.215\,\text{cohort90}_{ij}$$

The fitted line for a given school will differ from this average line in its intercept, by an amount $\hat{u}_{0j}$ for school $j$. However, the slope of the school lines is assumed to be fixed at 1.215, i.e. the effect of cohort is assumed the same for all schools. A plot of the predicted school lines will show a set of parallel lines. To produce this plot, we first need to compute $\widehat{\text{score}}$ for each student, based on their cohort and school. We do this using the predict command with the fitted option to create a new variable (predscore) which is equal to the average fitted regression line plus the relevant school’s intercept:

. predict predscore, fitted

Next we create a variable to pick out the minimum amount of data required to plot the predicted school lines (see P3.1.2 in Module 3 where we explain this approach in detail):

. egen pickone = tag(schoolid cohort90)

We will use the twoway command with the connected plottype instead of the line plottype in order to display markers (i.e. the data points) in addition to the school lines. First, however, we must arrange observations in ascending order based on the values of schoolid, and within each value of schoolid arrange observations in ascending order based on the values of cohort90. This is required as we use the connect(ascending) option to connect points as long as cohort90 is increasing. Whenever cohort90 jumps to a lower value, the two scatter points are not connected. Sorting the data in this way ensures that only scatter points for the same school are connected.

. sort schoolid cohort90

. twoway connected predscore cohort90 if pickone==1, connect(ascending)

Careful examination of the top left hand corner of the graph shows that these commands have not been totally successful. A few scatter points belonging to different schools are connected vertically. This has occurred as a small number of schools are observed for only one cohort and so cohort90 does not jump to a lower value as we move from one of these schools to the next in the dataset.

To circumvent this problem, we will reproduce the graph for the subset of schools which are observed for two or more cohorts. To do this, we first generate a new variable multiplecohorts and initially set its values equal to those of the pickone variable.

. generate multiplecohorts = pickone

We will then replace multiplecohorts with the value 0 for those schools observed for only one cohort. This can be achieved by sorting the observations within each school by cohort90 and then looking to see whether the value of the last observation of cohort90 in each school is the same as the first observation. If it is, the school is observed for only one cohort and we set multiplecohorts equal to 0. The relevant command is:

. bysort schoolid (cohort90): replace multiplecohorts = 0 ///
>     if cohort90[_N]==cohort90[1]
(32 real changes made)

where we have used the /// line join indicator to inform Stata that the two lines of code form one command. If we did not do this, Stata would incorrectly interpret the second line as a new command.

In this command, we have used the bysort prefix which repeats the command after the colon for each group of observations for which the values of the variables
listed between the bysort prefix and the colon are the same. The use of parentheses in this variable list verifies that the data are sorted first by schoolid and then by cohort90 and then repeats the command after the colon once for each value of schoolid only. Had we omitted the parentheses, Stata would have instead repeated the command after the colon once for each observed combination of schoolid and cohort90.

The command after the colon replaces the dummy variable multiplecohorts with the value 0 when the if logical expression is true, but otherwise leaves the dummy variable unchanged. Within the expression, we use explicit subscripting [_N] and [1] to refer to the last and first values of cohort90 within each school.

The single line of output after the command states that 32 changes have been made to multiplecohorts. This informs us that 32 schools in the data are observed for only one cohort.

Now we can simply repeat the previous twoway command but this time we condition upon multiplecohorts taking the value 1.

. twoway connected predscore cohort90 if multiplecohorts==1, connect(ascending)

The graph is now plotted correctly.

Returning to the results and comparing with the results for the null model of P5.1, we can see that the addition of cohort has reduced the amount of variance at both the school and the student level. The between-school variance has reduced from 61.02 to 45.99, and the within-school variance has reduced from 258.36 to 219.29. The decrease in the within-school variance is expected because cohort is a student-level variable. The large reduction in the between-school variance suggests that the distribution of students by cohort differs from school to school (see C5.2.3). In Module 3 (C3.1.1) we found that, pooling across all schools, the proportions in each cohort were:

Table 5.1. Proportion of students in each cohort

Year         1984   1986   1988   1990   1996   1998
% students   19.1   18.6   15.4   12.9   12.5   21.6

One source of the variation in these proportions across schools can be seen from the plot of the predicted lines above. If you look at the top line (corresponding to the school with the highest intercept), you can see that there are only three predicted values, for cohort90 = -4, -2 and 0 (1986, 1988 and 1990). This is because, in this school, no data were collected for 1984, 1996 and 1998. Clearly, for this school, the proportions for the missing years will be zero. Similarly, for the school with the second lowest intercept, there are no data points for the last two cohorts (cohort90 = 6 and 8).

After accounting for cohort effects, the proportion of unexplained variance that is due to differences between schools decreases slightly to 45.99/(45.99 + 219.29) = 17%.

Don’t forget to take the online quiz!

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.2 Multilevel Regression with a Level 1 Explanatory Variable: Random Intercept Models" to open Lesson 5.2
• Click Q1 to open the first question
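The comparison between the null model of P5.1 and the random intercept model of P5.2 can be checked with a few display commands. The figures are re-entered by hand from the two sets of xtmixed output; the proportional reductions in each variance component are an additional illustration that is not reported in the text.

```stata
* VPC for the null model and for the model including cohort90
display 61.02457 / (61.02457 + 258.3572)
display 45.98856 / (45.98856 + 219.2879)

* Proportional reduction in the between-school variance after adding cohort90
display (61.02457 - 45.98856) / 61.02457

* Proportional reduction in the within-school variance after adding cohort90
display (258.3572 - 219.2879) / 258.3572
```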
P5.3 Allowing for Different Slopes across Schools: Random Slope Models

In the previous exercise, we allowed for school effects on the mean attainment by allowing the intercept of the regression of attainment on cohort to vary randomly across schools. We assumed, however, that cohort changes in attainment are the same for all schools, i.e. the slope of the regression line was assumed fixed across schools. We will now extend the random intercept model fitted at the end of P5.2 to allow both the intercept and the slope to vary randomly across schools.

Load “5.3.dta” into memory and open the do-file for this lesson:

From within the LEMMA Learning Environment
• Go to Module 5: Introduction to Multilevel Modelling, and scroll down to Stata Datasets and Do-files
• Click “5.3.dta” to open the dataset

Fit the model:

$$\text{score}_{ij} = \beta_0 + \beta_1 \text{cohort90}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij}$$

Note that a new term $u_{1j}$ has been added to the model, so that the coefficient of cohort90 has become $\beta_{1j} = \beta_1 + u_{1j}$, and so the school-level (level 2) variance has been replaced by a matrix with two new parameters, $\sigma^2_{u1}$ and $\sigma_{u01}$.

Fit the model:

. xtmixed score cohort90 ///
>     || schoolid: cohort90, covariance(unstructured) ///
>     mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -140343.09
Iteration 1:   log likelihood = -140343.09

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(1)       =   2376.07
Log likelihood = -140343.09                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.233902   .0253135    48.74   0.000     1.184289    1.283516
       _cons |   30.60963    .313448    97.65   0.000     29.99529    31.22398
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1605836          .             .           .
                  var(_cons) |   42.85854          .             .           .
          cov(cohort90,_cons) |  -1.024181          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   215.7394          .             .           .
------------------------------------------------------------------------------
 u0 j  0 σ2  LR test vs. linear regression: chi2(3) = 3385.44 Prob > chi2 = 0.0000
~ MVN ( 0, Ω u ) , 0 =   , Ωu =  u 0
  0 σ σ 2 
 u1j     u 01 u1  Note: LR test is conservative and provided only for reference.

In the output, the estimate of the intercept variance σ u20 is given to the right of
Note that the slope residual, and associated variance and covariance, have a
subscript of ‘1’ because cohort90 is the 1st explanatory variable in the model (not var(_cons) while the estimate of the slope variance σ u21 is given to the right of
including the constant). var(cohort90). Note, however, that the model output reports the random
slope variance before the random intercept variance. This is because although
Stata includes a constant term by default, it includes it as the last variable in the
list of explanatory variables rather than the first. We can see that this is the case
in both the fixed and random parts of the model. The estimate of the covariance
σ u 01 is given to the right of cov(cohort90,_cons) and is reported after the
variance parameters. Note, we had to specify the covariance(unstructured)
option to allow the random intercepts and slopes to covary (as opposed to the
default that they are independent).
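To see what the covariance(unstructured) option contributes, one could refit the same model under the default independent structure and compare the two log likelihoods. The following is only a sketch: it assumes 5.3.dta is in memory, and the scalar names ll_unstr and ll_indep are made up for this illustration.

```stata
* Sketch: compare unstructured and independent random-effects covariance
* (assumes 5.3.dta is loaded; scalar names are illustrative only)
quietly xtmixed score cohort90 ///
    || schoolid: cohort90, covariance(unstructured) ///
    mle variance nostderr
scalar ll_unstr = e(ll)

quietly xtmixed score cohort90 ///
    || schoolid: cohort90, covariance(independent) ///
    mle variance nostderr
scalar ll_indep = e(ll)

* LR statistic for the one extra parameter (the covariance), on 1 d.f.
display "LR = " 2*(ll_unstr - ll_indep)
display "p  = " chi2tail(1, 2*(ll_unstr - ll_indep))
```

The two fits differ only in whether the intercept-slope covariance is estimated or fixed at zero, so the comparison is on 1 d.f.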


The last line of output states that the likelihood ratio test, which compares the
current model to a single-level model, is conservative and provided only for
reference.¹⁰ This means that the reported p-value is not the correct p-value;
rather it is an upper bound for the correct p-value. The test is described as
conservative as when the correct p-value is 0.05 (i.e. the multilevel model is just
preferred to the single-level model), the reported p-value will be slightly higher
leading us to incorrectly favour the simpler model. However, as long as the
reported p-value is less than 0.05 then the same will be the case for the correct
p-value and so it is safe to infer that the multilevel model is preferred to the
single-level model.

P5.3.1 Testing for random slopes

We can use a likelihood ratio test to test whether the cohort effect varies across
schools. The null hypothesis for this test is that the two additional parameters
$\sigma_{u01}$ and $\sigma_{u1}^2$ are simultaneously equal to zero. The log-likelihood value for the
random intercept model was found to be -140457 (P5.2), so the likelihood ratio
test statistic is

LR = 2 (-140343 - -140457) = 228 on 2 d.f.

So there is very strong evidence that the cohort effect differs across schools.

P5.3.2 Interpretation of random cohort effects across schools

The cohort effect for school j is estimated as $1.234 + \hat{u}_{1j}$, and the between-school
variance in these slopes is estimated as 0.161. For the ‘average’ school we predict
an increase of 1.234 points in the attainment score for each successive cohort. A
95% coverage interval for the school slopes is estimated as $1.234 \pm 1.96\sqrt{0.161}$ =
0.448 to 2.020. Thus, assuming a normal distribution, we would expect the middle
95% of schools to have a slope between 0.448 and 2.020.

The intercept variance of 42.859 is interpreted as the between-school variance
when cohort90 = 0, i.e. for the 1990 cohort.

P5.3.3 Examining intercept and slope residuals for schools

The negative covariance estimate of -1.024 means that schools with a high
intercept (above-average attainment in 1990) tend to have a flatter-than-average
slope. Similarly, schools with a low intercept (below-average attainment in 1990)
tend to have seen a more marked increase in attainment with cohort (above-average
slope).

The intercept-slope correlation is estimated as:

$$\hat{\rho}_{u01} = \frac{\hat{\sigma}_{u01}}{\sqrt{\hat{\sigma}_{u0}^2\,\hat{\sigma}_{u1}^2}} = \frac{-1.024}{\sqrt{42.859 \times 0.161}} = -0.390$$

We can obtain the estimated correlation directly by using the estat recov
command with the corr option to display the school level random effects
correlation matrix:¹¹

. estat recov, corr

Random-effects correlation matrix for level schoolid

             |  cohort90     _cons
-------------+----------------------
    cohort90 |         1
       _cons | -.3903978         1

To estimate the school intercepts and slopes we use the predict command with
the reffects option. We specify two new variables, the first for the random
slopes, the second for the random intercepts. This ordering reflects the order in
which the random effects were specified in the xtmixed command:

. predict u1 u0, reffects
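The hand calculations in P5.3.1 to P5.3.3 can be reproduced with Stata's display command. This is a small sketch in which the numbers are typed in from the output rather than read from stored results. The chi2tail() call gives a p-value for the likelihood ratio statistic, which the text does not report.

```stata
* LR test for random slopes (P5.3.1): statistic and p-value on 2 d.f.
display 2*(-140343 - (-140457))
display chi2tail(2, 2*(-140343 - (-140457)))

* 95% coverage interval for the school slopes (P5.3.2)
display 1.234 - 1.96*sqrt(0.161)
display 1.234 + 1.96*sqrt(0.161)

* Intercept-slope correlation (P5.3.3), using the unrounded estimates
display -1.024181/sqrt(42.85854*.1605836)
```

The first display should return the 228 quoted in P5.3.1, and the last should agree with the estat recov, corr output to the digits shown there.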

10 See help j_xtmixedlr for further details.
11 Note omitting the corr option would result in the school-level random effects covariance matrix
being displayed.


To obtain a plot of the school slopes versus the school intercepts, $\hat{u}_{1j}$ vs. $\hat{u}_{0j}$:

. egen pickone = tag(schoolid)

. scatter u1 u0 if pickone==1, yline(0) xline(0) ///
>     ytitle("Slope of cohort90 (u1j)") xtitle("Intercept (u0j)")

where we have used the ytitle() and xtitle() options to add axes titles to
the graph. The use of double quotes is not necessary, but we find that it makes
the syntax easier to read.

From this plot, it is possible to identify, for example, those schools which had a
lower-than-average attainment in 1990 but a better-than-average year-on-year
improvement. Schools in the top-left quadrant are such schools while schools in
the bottom-left quadrant also had a below-average mean attainment in 1990, but
the below-average slopes for these schools mean that they continued at this low
level.

The equation for the fitted regression line for school j is

$$\widehat{\text{score}}_{ij} = (30.610 + \hat{u}_{0j}) + (1.234 + \hat{u}_{1j})\,\text{cohort90}_{ij}$$

where the values of $\hat{u}_{0j}$ and $\hat{u}_{1j}$ are shown in the pairwise residual plot shown
above.

To produce a plot of the predicted school lines, we first need to compute $\widehat{\text{score}}$
for each student, based on their cohort and school:

. predict predscore, fitted

As in P5.2, we plot the fitted regression lines for the subset of schools for which
we have multiple cohorts of data:

. egen multiplecohorts = tag(schoolid cohort90)

. bysort schoolid (cohort90): replace multiplecohorts = 0 ///
>     if cohort90[_N]==cohort90[1]
(32 real changes made)

. twoway connected predscore cohort90 if multiplecohorts==1, connect(ascending)

P5.3.4 Between-school variance as a function of cohort

The random slope model we have fitted implies that the between-school variance
in attainment is a function of cohort; that is, the amount of between-school
variance differs across cohorts.

In C5.3.5 (Equation 5.9), we saw that for a model with a random slope for an
explanatory variable $x_{ij}$, the level 2 variance is:

$$\text{var}(u_{0j} + u_{1j}x_{ij}) = \text{var}(u_{0j}) + 2x_{ij}\,\text{cov}(u_{0j}, u_{1j}) + x_{ij}^2\,\text{var}(u_{1j}) = \sigma_{u0}^2 + 2\sigma_{u01}x_{ij} + \sigma_{u1}^2 x_{ij}^2$$
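As an illustration, the variance function can be evaluated at selected cohorts by plugging in the estimates reported earlier for the random slope model (42.859, -1.024 and 0.161). This is a sketch using hand-typed, rounded values; the scalar names s2u0, su01 and s2u1 are made up for the example, and the three cohorts (1984, 1990 and 1996) are chosen to match Table 5.2.

```stata
* Estimates copied (rounded) from the random slope model output
scalar s2u0 = 42.859     // var(_cons)
scalar su01 = -1.024     // cov(cohort90,_cons)
scalar s2u1 = 0.161      // var(cohort90)

* Between-school variance at cohort90 = -6, 0 and 6.
* The parentheses around `x' matter: without them -6^2 is read as -(6^2).
foreach x of numlist -6 0 6 {
    display "cohort90 = " %2.0f (`x') "   variance = " ///
        %7.3f (s2u0 + 2*su01*(`x') + s2u1*(`x')^2)
}
```

Typing the estimates as scalars once keeps the formula on a single line and makes it easy to change the list of cohort values.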


Substituting cohort90 for $x$, and the estimates for $\sigma_{u0}^2$, $\sigma_{u01}$ and $\sigma_{u1}^2$, we obtain:

Between-school variance = 42.859 - 2.048 cohort90 + 0.161 cohort90²

Applying this equation to selected cohorts we obtain the following estimates of
level 2 variance.

Table 5.2. Estimates of the between-school variance

cohort90   Year   Between-school variance
   -6      1984   42.859 - (2.048 × -6) + [0.161 × (-6)²] = 60.943
    0      1990   42.859
    6      1996   42.859 - (2.048 × 6) + (0.161 × 6²) = 36.367

We would therefore conclude that the mean attainment increased with cohort,
and the variation in mean attainment among schools has decreased.

We can produce a plot of the between-school variance with the twoway command
and the function plottype. The equation for the between-school variance is
typed as part of the command where we can think of x as corresponding to the
cohort90 variable. However, the command does not make use of any data, it
simply plots the line associated with the typed equation. The range(-6 8)
option specifies that the function should only be plotted for when x ranges
between -6 and 8. This restricts the plot of the between school variance to the
cohorts in the data (1984 to 1998).

. twoway function 42.859 + -2.048*x + 0.161*x^2, range(-6 8)

P5.3.5 Adding a random coefficient for gender (dichotomous x)

In Module 3, we found that the mean attainment was higher for girls than for boys.
We will now consider whether this gender difference is the same across schools by
introducing a random coefficient for gender.¹²

We will start by adding a fixed effect for gender. This will be our comparison
model for testing for a random coefficient.

$$\text{score}_{ij} = \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij}$$

. xtmixed score cohort90 female ///
>     || schoolid: cohort90, covariance(unstructured) ///
>     mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -140272.07
Iteration 1:   log likelihood = -140272.07

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(2)       =   2517.61
Log likelihood = -140272.07                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.227326   .0253264    48.46   0.000     1.177687    1.276965
      female |   1.944526   .1629805    11.93   0.000      1.62509    2.263962
       _cons |   29.58487   .3240554    91.30   0.000     28.94974    30.22001
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1612602          .             .           .
                  var(_cons) |   42.57498          .             .           .
         cov(cohort90,_cons) |  -1.030571          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   214.8374          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  3403.21   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

12 As noted in Module 3, we use the more general term ‘coefficient’ rather than ‘slope’ for
categorical explanatory variables. The term ‘slope’ is reserved for straight line relationships
between y and a continuous x.


To add a random coefficient for gender:

$$\text{score}_{ij} = \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + u_{2j}\,\text{female}_{ij} + e_{ij}$$

. xtmixed score cohort90 female ///
>     || schoolid: cohort90 female, covariance(unstructured) ///
>     mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood =   -140272
Iteration 1:   log likelihood = -140269.46
Iteration 2:   log likelihood = -140269.45
Iteration 3:   log likelihood = -140269.45

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(2)       =   2524.31
Log likelihood = -140269.45                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |    1.22777   .0253452    48.44   0.000     1.178094    1.277446
      female |    1.93145   .1738955    11.11   0.000     1.590621    2.272278
       _cons |   29.58908    .317659    93.15   0.000     28.96648    30.21169
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1617015          .             .           .
                 var(female) |    1.37019          .             .           .
                  var(_cons) |   40.55858          .             .           .
        cov(cohort90,female) |  -.0530665          .             .           .
         cov(cohort90,_cons) |  -1.008308          .             .           .
           cov(female,_cons) |   1.535505          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   214.5159          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(6) =  3408.45   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The effect of gender in school j is estimated as $1.931 + \hat{u}_{2j}$. Allowing for a
random effect of gender at the school level has led to the addition of three new
random parameters to the model ($\sigma_{u02}$, $\sigma_{u12}$, $\sigma_{u2}^2$). The estimate of the random
coefficient variance for female $\sigma_{u2}^2$ is reported to the right of var(female). The
covariance between the intercept and female $\sigma_{u02}$ is given to the right of
cov(female,_cons) while the covariance between cohort90 and female $\sigma_{u12}$ is
given to the right of cov(cohort90,female). A likelihood ratio test, comparing
this model with the previous model with a fixed gender effect, is testing the null
hypothesis that all three of these parameters are equal to zero.

The likelihood ratio test statistic is

LR = 2 (-140269.45 - -140272.07) = 5.24 on 3 d.f.

This is not significant at the 5% level (the 5% point of a chi-squared distribution on
3 d.f. is 7.81), so we cannot reject the null hypothesis and we conclude that the
gender effect is the same for each school. We therefore revert to a model with a
fixed coefficient for female.

P5.3.6 Adding a random coefficient for social class (categorical x)

In Module 3, we found strong social class effects on attainment. We will now
explore whether these class effects can be assumed the same across schools.

Before adding social class to the model, we create three dummy variables for
when sclass is 1, 2 and 4 respectively (taking social class 3 as the reference
category).

. generate sclass1 = sclass==1

. generate sclass2 = sclass==2

. generate sclass4 = sclass==4
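One point worth checking when dummy variables are built with generate sclass1 = sclass==1 and similar commands: in Stata, an expression such as sclass==1 evaluates to 0, not missing, when sclass is missing. Whether sclass has any missing values in this dataset is not stated here, so the sketch below is only a precaution. The variable name sclass1_chk is hypothetical and is used purely for the comparison.

```stata
* Tabulate sclass, showing any missing values explicitly
tabulate sclass, missing

* Hypothetical variant: leave the dummy missing where sclass is missing
generate sclass1_chk = sclass==1 if !missing(sclass)

* Compare with the sclass1 dummy; any disagreement flags missing sclass
tabulate sclass1 sclass1_chk, missing
```

If the two versions agree on every row, the simpler generate commands are safe for this dataset.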


We will start by fitting a fixed effect for social class:

$$\text{score}_{ij} = \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + e_{ij}$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
>     || schoolid: cohort90, covariance(unstructured) ///
>     mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138346.13
Iteration 1:   log likelihood = -138346.13

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(5)       =   6918.15
Log likelihood = -138346.13                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.182831   .0243149    48.65   0.000     1.135175    1.230488
      female |   1.961342   .1542812    12.71   0.000     1.658956    2.263727
     sclass1 |   11.08567   .2063932    53.71   0.000     10.68115     11.4902
     sclass2 |   5.875198   .2040505    28.79   0.000     5.475266    6.275129
     sclass4 |  -3.737739   .2845318   -13.14   0.000    -4.295412   -3.180067
       _cons |   24.60987   .2796221    88.01   0.000     24.06182    25.15792
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |    .150845          .             .           .
                  var(_cons) |   22.51349          .             .           .
         cov(cohort90,_cons) |  -.5841601          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   192.9457          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1797.04   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

Next we add random coefficients for the social class dummy variables:

$$\text{score}_{ij} = \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} + u_{0j} + u_{1j}\,\text{cohort90}_{ij} + u_{3j}\,\text{sclass1}_{ij} + u_{4j}\,\text{sclass2}_{ij} + u_{5j}\,\text{sclass4}_{ij} + e_{ij}$$

where:

$$\begin{pmatrix} u_{0j} \\ u_{1j} \\ u_{3j} \\ u_{4j} \\ u_{5j} \end{pmatrix} \sim N(\mathbf{0}, \Omega_u), \quad \Omega_u = \begin{pmatrix} \sigma_{u0}^2 & & & & \\ \sigma_{u01} & \sigma_{u1}^2 & & & \\ \sigma_{u03} & \sigma_{u13} & \sigma_{u3}^2 & & \\ \sigma_{u04} & \sigma_{u14} & \sigma_{u34} & \sigma_{u4}^2 & \\ \sigma_{u05} & \sigma_{u15} & \sigma_{u35} & \sigma_{u45} & \sigma_{u5}^2 \end{pmatrix}, \quad e_{ij} \sim N(0, \sigma_e^2)$$

Fit the model:

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
>     || schoolid: cohort90 sclass1 sclass2 sclass4, ///
>     covariance(unstructured) ///
>     mle variance nostderr emonly

Performing EM optimization:

Iteration 0:   log likelihood = -138913.58
Iteration 1:   log likelihood =  -138603.9
Iteration 2:   log likelihood = -138482.58
Iteration 3:   log likelihood = -138424.11
Iteration 4:   log likelihood = -138391.59
Iteration 5:   log likelihood =  -138371.6
Iteration 6:   log likelihood =  -138358.4
Iteration 7:   log likelihood = -138349.17
Iteration 8:   log likelihood = -138342.45
Iteration 9:   log likelihood = -138337.37
Iteration 10:  log likelihood = -138333.42
Iteration 11:  log likelihood = -138330.28
Iteration 12:  log likelihood = -138327.74
Iteration 13:  log likelihood = -138325.64
Iteration 14:  log likelihood = -138323.88
Iteration 15:  log likelihood = -138322.39
Iteration 16:  log likelihood = -138321.11
Iteration 17:  log likelihood = -138320.01
Iteration 18:  log likelihood = -138319.05
Iteration 19:  log likelihood =  -138318.2
Iteration 20:  log likelihood = -138317.45

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(5)       =   5267.22
Log likelihood = -138317.45                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]


-------------+----------------------------------------------------------------
    cohort90 |   1.183949   .0246006    48.13   0.000     1.135732    1.232165
      female |    1.96809   .1540539    12.78   0.000      1.66615    2.270031
     sclass1 |    11.2049   .2591391    43.24   0.000     10.69699     11.7128
     sclass2 |    6.12456   .2414345    25.37   0.000     5.651357    6.597763
     sclass4 |    -3.1019   .3462576    -8.96   0.000    -3.780552   -2.423247
       _cons |    24.3627   .2372518   102.69   0.000      23.8977     24.8277
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1581957          .             .           .
                var(sclass1) |   9.988941          .             .           .
                var(sclass2) |   6.943944          .             .           .
                var(sclass4) |   14.72518          .             .           .
                  var(_cons) |   12.22236          .             .           .
       cov(cohort90,sclass1) |  -.1699083          .             .           .
       cov(cohort90,sclass2) |  -.1886457          .             .           .
       cov(cohort90,sclass4) |  -.4713176          .             .           .
         cov(cohort90,_cons) |  -.5084394          .             .           .
        cov(sclass1,sclass2) |   5.726013          .             .           .
        cov(sclass1,sclass4) |   5.535831          .             .           .
          cov(sclass1,_cons) |    3.49037          .             .           .
        cov(sclass2,sclass4) |   5.171927          .             .           .
          cov(sclass2,_cons) |   3.230988          .             .           .
          cov(sclass4,_cons) |   6.062856          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   190.7541          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:      chi2(15) =  1854.39   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

Note: EM algorithm failed to converge

Before interpreting the output, it is important to notice that we have specified the
emonly option. We have done this as the model takes a very long time to
converge. Specifying the emonly option leads the model to stop prematurely,
after just 20 EM iterations (note that a maximum of 20 iterations is the default
setting). Stopping the model prematurely has some advantages: we can check the
output to see that we have specified the model correctly and we can get a rough
idea as to whether the additional random effects might be important. However, it
is important to realise that this model has not converged (the final line of output
confirms this) and so the estimates should never be used as final estimates.

Table 5.3 presents estimated model parameters from a version of the model where
we did not specify the emonly option (i.e. where a gradient-based method was
used and the model was allowed to iterate until convergence).¹³ There are very
considerable differences between the two sets of parameter estimates. This
highlights the importance of checking that convergence has been reached. In the
analysis below we will therefore interpret the estimates from the converged
model.
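For reference, the converged fit described in the text corresponds to the same command with the emonly option removed, so that gradient-based optimization follows the EM iterations. The lines below are a sketch of that refit, not a transcript of a run: they assume the data and the social class dummies are in memory, and the model may take a long time to converge.

```stata
* Sketch: refit without emonly so that the model iterates to convergence
* (nostderr is kept, as in the text; expect a long run time)
xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
    || schoolid: cohort90 sclass1 sclass2 sclass4, ///
    covariance(unstructured) ///
    mle variance nostderr

* Log likelihood of the fitted model, for comparison with the value
* reported alongside the converged estimates
display e(ll)
```

Comparing the log likelihood from this fit with the value after 20 EM iterations (-138317.45) gives a quick sense of how far the EM-only estimates were from the maximum.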

13
Note that we have reordered both the fixed and random part parameter estimates to agree with
the way we have written down the model as opposed to the order in which the parameters appear
in the model output above. Note also that no standard errors are reported for the random part
parameters. This is because we have continued to specify the nostderr option as omitting this
option prevents the model from converging.


Table 5.3. Estimates from the converged model

Parameter                                        Estimate    Standard error
$\beta_0$ (_cons)                                  24.401        0.231
$\beta_1$ (cohort90)                                1.185        0.024
$\beta_2$ (female)                                  1.966        0.154
$\beta_3$ (sclass1)                                11.182        0.244
$\beta_4$ (sclass2)                                 6.072        0.220
$\beta_5$ (sclass4)                                -3.190        0.314
$\sigma_{u0}^2$ var(_cons)                         11.267        -
$\sigma_{u01}$ cov(cohort90,_cons)                 -0.554        -
$\sigma_{u1}^2$ var(cohort90)                       0.156        -
$\sigma_{u03}$ cov(sclass1,_cons)                   4.813        -
$\sigma_{u13}$ cov(cohort90,sclass1)               -0.111        -
$\sigma_{u3}^2$ var(sclass1)                        7.136        -
$\sigma_{u04}$ cov(sclass2,_cons)                   5.059        -
$\sigma_{u14}$ cov(cohort90,sclass2)               -0.118        -
$\sigma_{u34}$ cov(sclass1,sclass2)                 4.175        -
$\sigma_{u4}^2$ var(sclass2)                        3.321        -
$\sigma_{u05}$ cov(sclass4,_cons)                   8.077        -
$\sigma_{u15}$ cov(cohort90,sclass4)               -0.442        -
$\sigma_{u35}$ cov(sclass1,sclass4)                 3.141        -
$\sigma_{u45}$ cov(sclass2,sclass4)                 2.957        -
$\sigma_{u5}^2$ var(sclass4)                        7.182        -
$\sigma_e^2$                                      191.768        -
Log likelihood                                 -138306.57

The new model contains a large number of additional random parameters. There
are 12 more parameters in this model than in the fixed social class effects model.
The likelihood ratio test statistic for a comparison of these models is:

LR = 2 (-138306 - -138346) = 80 on 12 d.f.

The 5% point of a chi-squared distribution on 12 d.f. is 21.03, so we conclude that
there is evidence that the effect of social class on attainment differs across
schools.

The coefficients of sclass1, sclass2 and sclass4 have a fixed component,
representing contrasts with the reference category 3 (working class) on average,
and a school-specific component. For example, after accounting for cohort and
gender effects, children with a parent in a professional or managerial occupation
(sclass = 1) attending school j are expected to have an attainment score that is
$11.2 + \hat{u}_{3j}$ points higher than working class children in the same school.

Due to the large number of parameters in the random part of this model, the
simplest way to interpret the random coefficient for class is to compute the
between-school variance

$$\text{var}(u_{0j} + u_{1j}\,\text{cohort90}_{ij} + u_{3j}\,\text{sclass1}_{ij} + u_{4j}\,\text{sclass2}_{ij} + u_{5j}\,\text{sclass4}_{ij})$$

We will do this for each social class category, holding constant the value of
cohort90 (the other variable with a random coefficient). For convenience, we will
fix cohort90 at zero, so the between-school variances will refer to 1990. This
simplifies the expression for the between-school variance to:

$$\text{var}(u_{0j} + u_{3j}\,\text{sclass1}_{ij} + u_{4j}\,\text{sclass2}_{ij} + u_{5j}\,\text{sclass4}_{ij})$$

We can use the display command as a substitute for calculating by hand. For
example, the variance for category 1 of sclass (sclass1 = 1, sclass2 = 0 and
sclass4 = 0) is:

$$\text{var}(u_{0j} + u_{3j}) = \sigma_{u0}^2 + 2\sigma_{u03} + \sigma_{u3}^2$$

. display 11.267 + 2*4.813 + 7.136
28.029

Similarly, the variance for category 2 of sclass (sclass1 = 0, sclass2 = 1 and sclass4
= 0) is:

$$\text{var}(u_{0j} + u_{4j}) = \sigma_{u0}^2 + 2\sigma_{u04} + \sigma_{u4}^2$$

. display 11.267 + 2*5.059 + 3.321
24.706


The variance for category 3 of sclass (sclass1 = 0, sclass2 = 0 and sclass4 = 0) is
simply:

$$\text{var}(u_{0j}) = \sigma_{u0}^2$$

. display 11.267
11.267

While the variance for category 4 of sclass (sclass1 = 0, sclass2 = 0 and sclass4 =
1) is:

$$\text{var}(u_{0j} + u_{5j}) = \sigma_{u0}^2 + 2\sigma_{u05} + \sigma_{u5}^2$$

. display 11.267 + 2*8.077 + 7.182
34.603

The between-school variance is similar for the first two categories
(professional/managerial and intermediate), highest for the unclassified group
(category 4) and lowest for working class children (category 3). This implies that
the school attended matters most for the unclassified group (in terms of their age
16 attainment), and least for working class children. For example, the difference
between unclassified and working class in school j is estimated as $-3.190 + \hat{u}_{5j}$.
The estimated variance of $u_{5j}$ is 7.182, so a 95% coverage interval for the
unclassified-working class difference is $-3.190 \pm 1.96\sqrt{7.182}$ = -8.443 to 2.063.
Suppose we rank schools according to their unclassified-working class difference,
such that schools with the largest difference (in favour of working class children)
are ranked lowest. In the bottom 2.5% of schools, unclassified children are
expected to have a mean score that is more than 8.443 points lower than working
class children. In the top 2.5% of schools, however, the difference is estimated to
be more than 2.063, in favour of unclassified children.

Once again, it is important to note that these school differences should not be
interpreted as school effects in the usual sense because we have not accounted for
prior attainment. Of most interest is the extent of between-school variance in the
progress made by children from different social backgrounds.

Don’t forget to take the online quiz!

From within the LEMMA learning environment

• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.3 Allowing for Different Slopes Across Groups: Random Slope Models"
  to open Lesson 5.3
• Click Q1 to open the first question

P5.4 Adding Level 2 Explanatory Variables

In the last two exercises, you have seen how to add level 1 explanatory variables
to the model and interpret the results of random intercept and random slope
models. A key motivation for using multilevel modelling, however, is to assess the
effects of level 2 explanatory variables on level 1 outcomes and the extent to
which they can explain the level 2 variance. In education, for example, we may
be interested in the contextual effect of prior attainment on students’ later
academic performance. A student’s progress may be affected by the performance
of others in their peer group, and this effect may differ according to the student’s
own prior attainment (a cross-level interaction).

Our example dataset contains three school-level variables that are potential
predictors of a student’s attainment at age 16: schtype (independent vs. state
schools), schurban (urban vs. rural location of school), and schdenom (Roman
Catholic vs. non-denominational school). In this exercise, we will add these
variables to our model and consider whether the effect on attainment of one of
them depends on a selected student-level variable.

As in any analysis, we should look at the distribution of our variables before
including them in a model.

Load “5.4.dta” into memory and open the do-file for this lesson:

From within the LEMMA Learning Environment

• Go to Module 5: Introduction to Multilevel Modelling, and scroll down to
  Stata Datasets and Do-files
• Click “5.4.dta” to open the dataset


Each school-level variable is binary, so we will simply look at the proportion in
each category. This can be done by using the tab1 command to produce one-way
tables of frequencies. We restrict the scope of the command to one record per
school.

. tab1 schtype schurban schdenom if pickone==1

-> tabulation of schtype if pickone==1

School type |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        456       89.76       89.76
          1 |         52       10.24      100.00
------------+-----------------------------------
      Total |        508      100.00

-> tabulation of schurban if pickone==1

     School |
urban-rural |
classificat |
        ion |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        163       32.09       32.09
          1 |        345       67.91      100.00
------------+-----------------------------------
      Total |        508      100.00

-> tabulation of schdenom if pickone==1

     School |
denominatio |
          n |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        425       83.66       83.66
          1 |         83       16.34      100.00
------------+-----------------------------------
      Total |        508      100.00

You should obtain the following proportion of schools in category 1 of each
variable: schtype (10% independent), schurban (68% urban), and schdenom (16%
Catholic).

We will add these variables, one at a time, to a simplified version of the model
fitted at the end of P5.3. Although we found evidence that the effect of social
class on attainment differs across schools, we will work with a simpler model by
removing the random coefficients on the class dummy variables.

$$
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} \\
& + u_{0j} + u_{1j} \text{cohort90}_{ij} + e_{ij}
\end{aligned}
$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138346.13
Iteration 1:   log likelihood = -138346.13

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(5)       =   6918.15
Log likelihood = -138346.13                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.182831   .0243149    48.65   0.000     1.135175    1.230488
      female |   1.961342   .1542812    12.71   0.000     1.658956    2.263727
     sclass1 |   11.08567   .2063932    53.71   0.000     10.68115     11.4902
     sclass2 |   5.875198   .2040505    28.79   0.000     5.475266    6.275129
     sclass4 |  -3.737739   .2845318   -13.14   0.000    -4.295412   -3.180067
       _cons |   24.60987   .2796221    88.01   0.000     24.06182    25.15792
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |    .150845          .             .           .
                  var(_cons) |   22.51349          .             .           .
         cov(cohort90,_cons) |  -.5841601          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   192.9457          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1797.04   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
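
Because cohort90 has a random coefficient, the between-school variance implied by this model is a quadratic function of cohort90, and var(_cons) is its value at cohort90 = 0 (the 1990 cohort). As a rough sketch only, the variance function can be evaluated from the estimates printed above. The cohort90 values -6, 0 and 8 are illustrative assumptions and are not taken from this section.

```stata
* Between-school variance implied by the random part of the model above:
*   var(u0j + u1j*x) = var(_cons) + 2*cov(cohort90,_cons)*x + var(cohort90)*x^2
* The three estimates are copied by hand from the xtmixed output.
* The cohort90 values -6, 0 and 8 are illustrative assumptions.
foreach x of numlist -6 0 8 {
    * Parentheses around `x' are needed: ^ binds more tightly than negation
    display "cohort90 = `x':  between-school variance = " ///
        %6.3f (22.51349 + 2*(-.5841601)*(`x') + .150845*(`x')^2)
}
```

With these inputs the implied variance is about 35.0 at cohort90 = -6, 22.5 at cohort90 = 0 and 22.8 at cohort90 = 8, which is why the intercept variance is described as the between-school variance for the 1990 cohort.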


P5.4.1 Contextual effects

We will begin by adding school type (independent vs. state) to the model.

$$
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} \\
& + \beta_6 \text{schtype}_{j} \\
& + u_{0j} + u_{1j} \text{cohort90}_{ij} + e_{ij}
\end{aligned}
$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 schtype ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138333.44
Iteration 1:   log likelihood = -138333.44

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(6)       =   6997.32
Log likelihood = -138333.44                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.184027   .0242101    48.91   0.000     1.136576    1.231478
      female |   1.963768    .154259    12.73   0.000     1.661426     2.26611
     sclass1 |   11.03064   .2069528    53.30   0.000     10.62502    11.43626
     sclass2 |   5.856441   .2041423    28.69   0.000      5.45633    6.256553
     sclass4 |  -3.750241   .2845443   -13.18   0.000    -4.307938   -3.192544
     schtype |   4.247496   .8168863     5.20   0.000     2.646428    5.848564
       _cons |   24.27927   .2787147    87.11   0.000       23.733    24.82554
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1481224          .             .           .
                  var(_cons) |   20.56986          .             .           .
         cov(cohort90,_cons) |  -.4585435          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   192.9941          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1681.11   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

A child in an independent school would be expected to have a score that is 4.25
points higher than a child in a state school (from the same cohort, and of the same
sex and social background). We can see that this effect is strongly statistically
significant because the estimated coefficient is more than 5 times its standard
error. There has also been a slight reduction in the school-level variance. After
accounting for school type, the between-school variance for the 1990 cohort (the
intercept variance) reduces from 22.5 to 20.6. However, there remains a large
amount of unexplained between-school variance.

We will now add in the urban-rural indicator of school location:

$$
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} \\
& + \beta_6 \text{schtype}_{j} + \beta_7 \text{schurban}_{j} \\
& + u_{0j} + u_{1j} \text{cohort90}_{ij} + e_{ij}
\end{aligned}
$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138328.96
Iteration 1:   log likelihood = -138328.95

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(7)       =   7018.43
Log likelihood = -138328.95                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.181905   .0242256    48.79   0.000     1.134424    1.229386
      female |   1.966677   .1542528    12.75   0.000     1.664347    2.269006
     sclass1 |   11.03297   .2069363    53.32   0.000     10.62738    11.43856
     sclass2 |    5.84713   .2041836    28.64   0.000     5.446938    6.247323
     sclass4 |  -3.739987   .2845715   -13.14   0.000    -4.297737   -3.182237
     schtype |   4.391871   .8092549     5.43   0.000      2.80576    5.977981
    schurban |  -1.437171   .4763462    -3.02   0.003    -2.370793     -.50355
       _cons |   25.25994   .4272198    59.13   0.000      24.4226    26.09727
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |    .148265          .             .           .
                  var(_cons) |   19.95181          .             .           .
         cov(cohort90,_cons) |  -.4522767          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   193.0111          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1650.58   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
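
The ratio and the variance changes for the models fitted so far can be checked by hand from the printed output. The following is only a convenience sketch using display, with every number copied from the tables above; it does not refit anything.

```stata
* Ratio of the schtype coefficient to its standard error (school type model)
display %5.2f (4.247496/.8168863)

* Proportional reduction in the intercept variance after adding schtype
display %5.3f ((22.51349 - 20.56986)/22.51349)

* Further proportional reduction after also adding schurban
display %5.3f ((20.56986 - 19.95181)/20.56986)
```

The first line reproduces the z statistic of 5.20 shown for schtype. The other two show that the intercept variance falls by roughly 9% when school type is added and by roughly a further 3% when the urban-rural indicator is added.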


On average, a student in an urban school has a score that is 1.44 points lower than
a student attending a school in a town or rural area. This difference is adjusted
for the effects of school type, and student cohort, gender and social class. The
between-school variance in 1990 has decreased further but by a very small amount
(from 20.6 to 20.0).

Finally, we will test for differences in attainment by school denomination.

$$
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} \\
& + \beta_6 \text{schtype}_{j} + \beta_7 \text{schurban}_{j} + \beta_8 \text{schdenom}_{j} \\
& + u_{0j} + u_{1j} \text{cohort90}_{ij} + e_{ij}
\end{aligned}
$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban schdenom ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138328.92
Iteration 1:   log likelihood = -138328.92

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(8)       =   7019.84
Log likelihood = -138328.92                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.182018   .0242188    48.81   0.000      1.13455    1.229486
      female |   1.966649   .1542535    12.75   0.000     1.664318     2.26898
     sclass1 |    11.0335   .2069607    53.31   0.000     10.62786    11.43913
     sclass2 |   5.847271   .2041904    28.64   0.000     5.447065    6.247477
     sclass4 |  -3.740548    .284578   -13.14   0.000    -4.298311   -3.182786
     schtype |   4.398108   .8108271     5.42   0.000     2.808916      5.9873
    schurban |  -1.462191   .4849347    -3.02   0.003    -2.412646    -.511737
    schdenom |   .1698495   .6015188     0.28   0.778    -1.009106    1.348805
       _cons |    25.2484   .4293585    58.80   0.000     24.40687    26.08993
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1481931          .             .           .
                  var(_cons) |   19.96586          .             .           .
         cov(cohort90,_cons) |  -.4598982          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   193.0126          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1647.69   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The ratio of the estimated coefficient of schdenom to its standard error is less
than 0.3, so there is little evidence of a difference between Catholic and
non-denominational schools. We will therefore remove this variable from our model
(see footnote 14).

Footnote 14: It is possible that a variable with a non-significant main effect could
be involved in a significant interaction effect. To illustrate how this might arise,
suppose we have a binary student-level variable z (coded 0 and 1). Suppose also that
attending a Catholic school is associated with higher attainment among students with
z = 0, but lower attainment among students with z = 1. This would be an example of an
interaction between school denomination and z (actually a cross-level interaction
because the two variables are defined at different levels). If the categories of z
are of a similar size, ignoring the interaction with z and allowing only for an
overall main effect of school denomination is likely to lead to an apparently
non-significant effect. We will not pursue this possibility here.

P5.4.2 Cross-level interactions

Our analysis thus far has revealed that student attainment at age 16 is significantly
related to the year in which the exams were taken (cohort), and to student gender
and parental social class. At the school level, there are differences in student
attainment between independent and state schools, and between urban and rural
schools. However, we have considered only main effects of these variables. In
practice, the relationship between y and an explanatory variable $x_1$ may depend
on the value of another variable $x_2$, i.e. there may be an interaction effect
between $x_1$ and $x_2$.

In a multilevel model, $x_1$ and $x_2$ may be defined at the same or different
levels. If they are at different levels, the interaction is referred to as a
cross-level interaction.

To illustrate cross-level interactions and their interpretation, we will test for an
interaction between cohort (level 1) and school type (level 2). We will also
explore whether a cohort-school type interaction can explain between-school
differences in attainment trends (i.e. whether such an interaction reduces some of
the variance of the random part of the slope for cohort). First we generate this
new interaction variable:

. generate cohort90Xschtype = cohort90*schtype
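
A quick sanity check on the new interaction variable can be sketched as follows, under the assumption that schtype is coded 0/1 as in the tabulation in P5.4: the product should be zero in state schools and equal to cohort90 in independent schools.

```stata
* Each assert stops with an error if the condition fails for any observation
assert cohort90Xschtype == 0        if schtype == 0
assert cohort90Xschtype == cohort90 if schtype == 1
```

If both commands return silently, the variable has been constructed as intended.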


We then add this interaction to the model:

$$
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} \\
& + \beta_6 \text{schtype}_{j} + \beta_7 \text{schurban}_{j} + \beta_8 \text{cohort90Xschtype}_{ij} \\
& + u_{0j} + u_{1j} \text{cohort90}_{ij} + e_{ij}
\end{aligned}
$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban cohort90Xschtype ///
> || schoolid: cohort90, covariance(unstructured) ///
> mle variance nostderr

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138312.52
Iteration 1:   log likelihood = -138312.52

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(8)       =   7154.79
Log likelihood = -138312.52                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |    1.21353   .0244236    49.69   0.000      1.16566    1.261399
      female |   1.970255   .1541846    12.78   0.000     1.668059    2.272451
     sclass1 |   11.01941   .2068684    53.27   0.000     10.61395    11.42486
     sclass2 |   5.830755   .2041076    28.57   0.000     5.430712    6.230798
     sclass4 |  -3.742837   .2844436   -13.16   0.000    -4.300336   -3.185338
     schtype |   5.290782   .8307023     6.37   0.000     3.662635    6.918929
    schurban |  -1.403769   .4828696    -2.91   0.004    -2.350176   -.4573615
cohort90Xs~e |   -.599423   .1037954    -5.78   0.000    -.8028582   -.3959878
       _cons |   25.18708   .4320316    58.30   0.000     24.34031    26.03384
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1380192          .             .           .
                  var(_cons) |   20.41395          .             .           .
         cov(cohort90,_cons) |  -.3906912          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   192.8513          .             .           .
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1651.36   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The estimated coefficient of the interaction variable cohort90Xschtype is almost
6 times its standard error, so this is strong evidence that the effect of cohort
differs for independent and state schools. (Equivalently, we can say that the
difference between independent and state schools differs across cohorts.)
However, the addition of this interaction effect does little to explain
between-school differences in attainment trends: the school-level variance in the
cohort90 coefficient has reduced only slightly from 0.148 to 0.138.

To see the nature of the interaction effect, consider the fixed part of the model
that contains cohort90 and schtype:

$$1.214\,\text{cohort90} + 5.291\,\text{schtype} - 0.599\,\text{cohort90Xschtype}$$

For schtype = 0 (state schools), this equation reduces to:

$$1.214\,\text{cohort90}$$

So in the average state school (i.e. with $u_{1j} = 0$; see footnote 15), we would
expect a year-on-year increase in attainment of 1.214 points.

Footnote 15: The effect of cohort varies randomly across schools, so we fix the
school cohort residual at its mean of zero to examine the cohort-school type
interaction effect.

For schtype = 1 (independent schools), this equation reduces to:

$$
\begin{aligned}
1.214\,\text{cohort90} + 5.291 - 0.599\,\text{cohort90} &= (1.214 - 0.599)\,\text{cohort90} + 5.291 \\
&= 0.615\,\text{cohort90} + 5.291
\end{aligned}
$$

So in the average independent school, we would expect a year-on-year increase in
attainment of 0.615 points.

The coefficient of schtype (estimated as 5.291) is the expected difference in
attainment between independent and state schools in 1990 (i.e. when
cohort90 = 0).

Our overall conclusion is that the mean attainment is higher in independent
schools than in state schools, but independent schools experienced a smaller
increase in attainment with cohort. As in our earlier analyses, it would be
interesting to investigate whether the trends in progress are different for
independent and state schools.

Don’t forget to take the online quiz!

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.4 Adding Level 2 Explanatory Variables"
to open Lesson 5.4
• Click Q1(i) to open the first question
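
The cohort slope for the average independent school in P5.4.2 combines two coefficients, so its standard error is not shown directly in the output. One way to obtain it, not used in the original practical, is the lincom postestimation command, run immediately after the interaction model has been fitted. The cohort value of 8 in the second command is an illustrative assumption.

```stata
* Cohort slope in the average independent school: beta1 + beta8
lincom cohort90 + cohort90Xschtype

* Independent vs. state difference at cohort90 = 8 (illustrative value only)
lincom schtype + 8*cohort90Xschtype
```

The first point estimate should be about 0.614, since it uses the unrounded coefficients (1.21353 - 0.599423); the 0.615 quoted in P5.4.2 comes from subtracting the rounded values.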


P5.5 Complex Level 1 Variation

In a random slope (coefficient) model, the level 2 variance is a function of the
explanatory variable(s) with a random coefficient. For example, in P5.3, we
allowed the effects of cohort and social class to vary randomly across schools,
which implies that the between-school variance depends on cohort and class. Up
to this point, however, we have assumed that the level 1 (within-school) variance
is constant. In this exercise, we will allow the within-school variance to depend on
explanatory variables in a complex level 1 variance model.

Load “5.5.dta” into memory and open the do-file for this lesson:

From within the LEMMA Learning Environment
• Go to Module 5: Introduction to Multilevel Modelling, and scroll down to
Stata Datasets and Do-files
• Click “5.5.dta” to open the dataset

P5.5.1 Within-school variance as a function of cohort (continuous x)

Unfortunately, it is not possible to use the xtmixed command to fit models where
the level 1 variance is a function of a continuous variable. We recommend that
the interested reader considers the gllamm command which can fit such models
(Rabe-Hesketh and Skrondal, 2008). What is possible is to fit models where the
level 1 variance is a function of a categorical variable. An example is given in
P5.5.2.

P5.5.2 Within-school variance as a function of gender (dichotomous x)

We will extend the model in P5.4.2 to assess whether boys and girls differ in terms
of the variability in their scores. We have already found that the mean score is
higher among girls than boys from fitting a model with a dummy for gender in the
fixed part of the model. We will now include gender in the random level 1 part of
the model.

In C5.5.2 we considered two alternative ways of specifying a model where the
level 1 variance depends on a dichotomous variable. The preferred approach is to
specify separate level 1 residuals for boys and girls, and then to estimate a
separate variance for each. We do this by including in the random level 1 part of
the model a dummy for boys and a separate dummy for girls. We do not need to
create these variables as the xtmixed command will do this automatically for us
when we specify a model with complex level 1 variation.

To allow boys and girls to have separate variances, we specify the
residuals(independent, by(female)) option (the residuals() option is
available as of Stata 11).

$$
\begin{aligned}
\text{score}_{ij} = {} & \beta_0 + \beta_1 \text{cohort90}_{ij} + \beta_2 \text{female}_{ij} + \beta_3 \text{sclass1}_{ij} + \beta_4 \text{sclass2}_{ij} + \beta_5 \text{sclass4}_{ij} \\
& + \beta_6 \text{schtype}_{j} + \beta_7 \text{schurban}_{j} + \beta_8 \text{cohort90Xschtype}_{ij} \\
& + u_{0j} + u_{1j} \text{cohort90}_{ij} + e_{ij}
\end{aligned}
$$

where:

$$
\operatorname{var}(e_{ij}) =
\begin{cases}
\sigma^2_{e0} & \text{for males} \\
\sigma^2_{e1} & \text{for females}
\end{cases}
$$

. xtmixed score cohort90 female sclass1 sclass2 sclass4 ///
> schtype schurban cohort90Xschtype ///
> || schoolid: cohort90, covariance(unstructured) ///
> residuals(independent, by(female)) ///
> mle variance

Obtaining starting values by EM:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -138312.52
Iteration 1:   log likelihood = -138303.77
Iteration 2:   log likelihood = -138303.77

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =     33988
Group variable: schoolid                        Number of groups   =       508

                                                Obs per group: min =         1
                                                               avg =      66.9
                                                               max =       190

                                                Wald chi2(8)       =   7189.37
Log likelihood = -138303.77                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cohort90 |   1.215688   .0243563    49.91   0.000      1.16795    1.263425
      female |   1.969503   .1544564    12.75   0.000     1.666774    2.272232
     sclass1 |   11.01903   .2067112    53.31   0.000     10.61388    11.42418
     sclass2 |   5.835307    .203975    28.61   0.000     5.435523     6.23509
     sclass4 |  -3.765776   .2843587   -13.24   0.000    -4.323109   -3.208443
     schtype |   5.311512   .8300741     6.40   0.000     3.684597    6.938428
    schurban |  -1.413508   .4821454    -2.93   0.003    -2.358496   -.4685208
cohort90Xs~e |  -.5929665   .1036443    -5.72   0.000    -.7961056   -.3898274
       _cons |   25.19513   .4319988    58.32   0.000     24.34843    26.04184
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
               var(cohort90) |   .1369753   .0181445      .1056544    .1775813
                  var(_cons) |   20.38501   1.786094      17.16842    24.20425
         cov(cohort90,_cons) |  -.4000899   .1334368     -.6616212   -.1385586
-----------------------------+------------------------------------------------
Residual: Independent,       |
    by female                |
                   0: var(e) |   199.5662   2.273028      195.1605    204.0714
                   1: var(e) |   186.8744   2.009239      182.9775    190.8542
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(4) =  1668.86   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The likelihood ratio test statistic comparing this model with the constant level 1
variance model is:

$$\text{LR} = 2\,[-138304 - (-138313)] = 18 \text{ on 1 d.f.}$$

So there is strong evidence (critical value at 5% is 3.84) that the amount of
within-school variance differs for boys and girls. The estimated within-school
variance is 186.874 for girls and 199.566 for boys. So girls have a higher mean
attainment than boys and there is less variation in their scores.

P5.5.3 Within-school variance as a function of cohort and gender

Unfortunately, it is not possible to use the xtmixed command to fit models where
the level 1 variance is a function of a continuous variable. We recommend that
the interested reader considers the gllamm command which can fit such models
(Rabe-Hesketh and Skrondal, 2008).

Don’t forget to take the online quiz!

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "5.5 Complex Level 1 Variance"
to open Lesson 5.5
• Click Q1 to open the first question

Don’t forget to take the quiz that tests you on the whole of Module 5!

From within the LEMMA learning environment
• Go down to the section for Module 5: Introduction to Multilevel Modelling
• Click "Module 5 Understanding Quiz" to open the Quiz

P5.6 References

Rabe-Hesketh, S. and Skrondal, A. (2008) Multilevel and longitudinal modeling
using Stata (Second Edition). College Station, TX: Stata Press.
