
Bayesian factor and structural equation models in spatial applications
Specification, identification and model assessment, with case study illustrations
Peter Congdon, Queen Mary University of London
Dept of Geography & Centre for Statistics

Outline
Background: Bayesian approaches to LV models, advantages & disadvantages
Computational options including WINBUGS
Wider application contexts of Bayesian LV & SEM models
Spatial priors; common spatial factors

Outline (continued)
Different sorts of spatial factor model (depending on form of manifest variables) and possible identification issues
Assessing models, model fit & model choice. Possible variable/model choice approaches
Case studies

Case Studies
Social capital & mental health: multilevel model using Health Survey for England (HSE)
Multilevel model for joint prevalence of obesity & diabetes: BRFSS respondents nested within US counties & states (CDC Behavioral Risk Factor Surveillance System)
Suicide & self-harm: ecological study for small areas (wards) in Eastern England

Background
SEM and factor models originated in (& are still most widely used in) psychological, educational & behavioural applications.
Recent Bayesian applications to psychological & educational testing data include SEM (e.g. Lee & Song, 2003), LCA, item analysis, and factor analysis per se (e.g. Aitkin & Aitkin, 2005; Press & Shigemasu, 1998).
Also some work on automated Bayesian model choice in the normal linear factor model

Advantages of Bayesian Approach
Straightforward to depart from standard assumptions often built into classical estimation methods (e.g. factor scores multivariate normal & independent over subjects)
Advantage in generalizations such as nonlinear factor effects, multiplicative factor schemes

Advantages of Bayesian Approach (continued)
Random effect models (of which factor/SEM models are a subclass) can be fitted without relying on numerical methods to integrate out random effects
Potential for Bayesian model choice procedures (e.g. stochastic search variable selection) in factor/SEM models

Disadvantages of Bayesian Approach
Identification issues (re naming of factors): can have label switching for latent constructs during MCMC updating if there aren't constraints to ensure consistent labelling.
Slow convergence of model parameters or global model fit measures (e.g. DIC and effective parameter estimate) in large latent variable applications (e.g. 1000 or 10000 subjects)

Disadvantages of Bayesian Approach
Formal Bayes model assessment (marginal likelihoods/Bayes factors) difficult for large realistic applications
Sensitivity to priors on hyperparameters (e.g. priors for factor covariance matrix)
Bayesian approach may need sensible priors when applied to factor models (diffuseness not necessarily suitable)

Bayesian Computing
Many Bayesian applications to SEM and factor analysis facilitated by WINBUGS package.
See Congdon (Applied Bayesian Modelling, 2003); Lee (Structural Equation Modeling: a Bayesian Approach, 2007)
Alternative is R, but more programming involved
BayesX can't model common factors

WINBUGS
Despite acronym, WINBUGS employs Metropolis-Hastings updating where necessary as well as Gibbs sampling
Program code is essentially a description of the priors & likelihood, but can also monitor model-related quantities of interest

Computing Illustration: a Normal SEM
Wheaton Study: 3 latent variables, each measured by two indicators.
Alienation67 measured by anomia67 (1967 anomia scale) and powles67 (1967 powerlessness scale).
Alienation71 is measured in same way, but using 1971 scales.
Third latent variable, SES (socio-economic status), measured by years of schooling and Duncan's Socioeconomic Index, both in 1967.

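Structural model relates alienation in 1971 (F2) to alienation in 1967 (F1) and SES (G)
$F_{2i} = \beta F_{1i} + \gamma_2 G_i + u_{2i}$
$F_{1i} = \gamma_1 G_i + u_{1i}$
Measurement model for alienation
$y_{ji} = \alpha_j + \lambda_j F_{1i}$, j=1,2
$y_{ji} = \alpha_j + \lambda_j F_{2i}$, j=3,4
Measurement model for SES
$x_{ji} = \delta_j + \kappa_j G_i$, j=1,2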

WINBUGS for Wheaton study


model {
  # structural model
  for (i in 1:n) {
    F2[i] ~ dnorm(mu.F2[i], 1)
    mu.F2[i] <- beta*F1[i] + gam[2]*G[i]
    F1[i] ~ dnorm(mu.F1[i], 1)
    mu.F1[i] <- gam[1]*G[i]
  }
  # priors (dnorm is parameterized by the inverse variance)
  for (j in 1:2) { gam[j] ~ dnorm(0, 0.001) }
  beta ~ dnorm(0, 0.001)

  # measurement equations for alienation
  for (i in 1:n) {
    for (j in 1:4) { y[i,j] ~ dnorm(mu[i,j], tau[j]) }
    mu[i,1] <- alph[1] + lam[1]*F1[i]
    mu[i,2] <- alph[2] + lam[2]*F1[i]
    mu[i,3] <- alph[3] + lam[3]*F2[i]
    mu[i,4] <- alph[4] + lam[4]*F2[i]
  }
  # priors
  for (j in 1:4) {
    alph[j] ~ dnorm(0, 0.001)
    # gamma prior on precisions
    tau[j] ~ dgamma(1, 0.001)
    # alternative prior starts with s.d. of residuals
    # sd.y[j] ~ dunif(0,100); tau[j] <- 1/(sd.y[j]*sd.y[j])
    # identifiability constraint on loadings to ensure positive alienation measure
    lam[j] ~ dnorm(1, 1) I(0,)
  }

  # measurement of SES (G[i])
  for (i in 1:n) {
    G[i] ~ dnorm(0, 1)
    for (j in 1:2) { x[i,j] ~ dnorm(mu.x[i,j], tau.x[j]) }
    mu.x[i,1] <- del[1] + kappa[1]*G[i]
    mu.x[i,2] <- del[2] + kappa[2]*G[i]
  }
  for (j in 1:2) {
    del[j] ~ dnorm(0, 0.001)
    # gamma prior on precisions
    tau.x[j] ~ dgamma(1, 0.001)
    # identifying constraint ensures +ve SES scale
    kappa[j] ~ dnorm(1, 1) I(0,)
  }
}

Monitoring model related quantities
Suppose one were interested in posterior probs that F2i > F1i (alienation increasing for ith subject)
Add code
for (i in 1:n) {delF[i] <- step(F2[i]-F1[i])}
Then posterior means of delF provide required probabilities
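
The same calculation can be mimicked outside WINBUGS once sampled factor scores are saved. The sketch below is illustrative only: the function name and the four draws are hypothetical, and it assumes the draws for one subject are held as plain Python lists. It averages the indicator that step() would return (1 when its argument is >= 0) over the MCMC draws.

```python
def posterior_prob_increase(f1_draws, f2_draws):
    """Posterior probability that F2 >= F1 for one subject.

    Mirrors delF <- step(F2 - F1): step(e) equals 1 when e >= 0, else 0,
    and the posterior mean of that indicator is the required probability.
    """
    indicators = [1 if f2 - f1 >= 0 else 0 for f1, f2 in zip(f1_draws, f2_draws)]
    return sum(indicators) / len(indicators)

# Hypothetical MCMC draws of the two alienation scores for a single subject
f1 = [0.2, -0.1, 0.5, 0.3]
f2 = [0.4, -0.3, 0.9, 0.1]
prob = posterior_prob_increase(f1, f2)
print(prob)
```

In practice the draws would come from the monitored F1 and F2 nodes, one pair of lists per subject.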

Widening Applications of Latent Variable Methods
In particular: application contexts of Bayes SEM/factor models now include ecological (area level) studies of health variations.
Usually no longer valid to assume units (i.e. areas) are independent.
Instead spatial correlation in latent variable(s) (common spatial factors) over the areas should be considered

Multi-Level Latent Variable Models
Latent variable methods also more widely applied in multilevel health studies
Such models consider joint impact of individual level and area level risk factors on health status
With several outcomes (data both multivariate & multilevel) can model area effects using common factor(s)

SOME SPATIAL PRIORS: THE BASIS FOR COMMON SPATIAL FACTORS

Priors incorporating spatial structure: basis for common spatial factors
May be specified over continuous space (geostatistical models often used for kriging)
OR for discrete sets of areas with irregular boundaries (lattices or polygons)
Major classes: Simultaneous Autoregressive (SAR) or Conditional Autoregressive (CAR) priors

Spatial Priors
My focus: CAR priors for lattices (e.g. administrative areas)
These are priors for structured effects (where labels of area units are important) as opposed to unstructured effects (exchangeable, i.e. unaffected by relabelling the areas)
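
A rough sketch of what "structured" means in practice, assuming the intrinsic CAR form with 0/1 adjacency weights (an assumption: the slides do not fix a particular CAR variant). Conditional on the other areas, each effect is centred on the average of its neighbours, with variance shrinking as the number of neighbours grows. The adjacency map, the effect values and the function name are hypothetical.

```python
def icar_conditional(i, s, neighbours, sigma2=1.0):
    """Conditional mean and variance of s[i] given all other effects,
    under an intrinsic CAR prior with binary adjacency weights."""
    nbrs = neighbours[i]
    mean = sum(s[k] for k in nbrs) / len(nbrs)  # average of neighbouring effects
    var = sigma2 / len(nbrs)                    # more neighbours, tighter prior
    return mean, var

# Hypothetical chain of four areas: 0 - 1 - 2 - 3
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
s = [0.5, 0.1, -0.2, -0.4]
mean1, var1 = icar_conditional(1, s, neighbours)
print(mean1, var1)
```

Relabelling the areas without relabelling the adjacency map changes these conditionals, which is why area labels matter for structured effects but not for exchangeable ones.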

Substantive Basis
Generally taken to represent unmeasured area level risk factors for health that vary relatively smoothly over space (regardless of arbitrary administrative boundaries that may define units of analysis)
Substantive grounding: increased recognition of genuine spatial effects on health (contextual effects)

DIFFERENT TYPES OF
COMMON SPATIAL FACTOR

(A) Manifest health variables
Manifest variables are health outcomes $y_{ij}$ (areas i, variable j)
Common residual factor $s_i$ expresses spatial clustering recurring over several outcomes j
Interpretable as index of common health risks over outcomes
Example: Wang & Wall 2003

(B) Census Indicator Confirmatory Model
Common spatial socioeconomic factor or factors (deprivation, rurality, etc) based on relevant indicators $Z_{ik}$ (k=1,..,K) such as unemployment, low income etc.
Often census indicators form bulk of manifest variables
Example: Hogan & Tchernis JASA 2004

(C) Two Classes of Manifest Variable
Common factor(s) used to explain variations in observed Y variables (health outcomes).
But factors mainly measured by socioeconomic indicators Z (e.g. census data)
Example: my Eastern region suicide study
Partly confirmatory, partly exploratory

MANIFEST VARIABLES:
AREA HEALTH VARIABLES

(A) Shared Spatial Residual Effects
Unobserved area effects common to several health outcomes modelled by shared spatial effect
Typical scenario: area counts $y_{ij}$ for areas i and outcomes j. Poisson or binomial likelihood

Types of Event
May be deaths, hospitalizations, incidence counts for different cancer types, prevalence counts, etc
Expected events (offset) $E_{ij}$ based on standard age rates applied to area populations: $y_{ij} \sim \text{Poisson}(E_{ij}\mu_{ij})$
Can also have populations at risk: $y_{ij} \sim \text{Poisson}(N_i\mu_{ij})$ or $y_{ij} \sim \text{Bin}(N_i,\pi_{ij})$
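
To make the offset concrete, a minimal sketch of expected events for one area and one outcome: standard age-specific rates are multiplied by the area's population in each age band and summed. The three age bands, the rates, the populations and the observed count are all hypothetical.

```python
def expected_events(standard_rates, area_pop_by_age):
    """Expected count: sum over age bands of (standard rate x area population)."""
    return sum(rate * pop for rate, pop in zip(standard_rates, area_pop_by_age))

# Hypothetical standard rates per person and area populations, by age band
standard_rates = [0.001, 0.004, 0.02]
area_pop = [2000, 1500, 500]

E = expected_events(standard_rates, area_pop)   # 2 + 6 + 10
observed = 27
crude_relative_risk = observed / E              # observed / expected ratio
print(E, crude_relative_risk)
```

The ratio of observed to expected events is the crude estimate that the Poisson model with offset then smooths.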

Multivariate Spatial Effects
One option for such data: no reduction
Multivariate residual effects
$\log(\mu_{ij}) = \alpha_j + s_{ij}$
(or $\log(\mu_{ij}) = \alpha_j + \beta_j x_i + s_{ij}$)
For $s_{ij}$ could use multivariate version of conditional autoregressive prior

Multivariate Spatial Effects
Multivariate normal CAR prior is example of Markov Random Field (Rue & Held, 2005).
Easily applied in WINBUGS using mv.car prior.
May fit well but proliferation of parameters (more parameters than data points)

Alternative: common spatial factor
$\log(\mu_{ij}) = \alpha_j + \lambda_j s_i$
Parsimonious and provides interpretable summary measure of health risk
$s_i$ is univariate CAR (or some other prior with spatial dependence)
Correlation between outcomes within areas modelled via loadings $\lambda_j$.

Identification: Location & Scale
Need $\sum_i s_i = 0$ for location identification. Centre effects at each MCMC iteration.
Scale identifiability:
EITHER set var(s)=1 and all $\lambda_j$ are free loadings (fixed scale),
OR leave var(s) unknown and constrain a loading, e.g. $\lambda_1 = 1.0$ (anchoring constraint)
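
A small numeric sketch of why a scale constraint is needed: only the products of loading and factor score enter the model, so the scale of the scores can be moved into the loadings without changing anything. The values and function names are hypothetical, and the rescaling shown is a post-hoc illustration rather than the way WINBUGS imposes the constraint.

```python
import math

def centre(s):
    """Sum-to-zero constraint: subtract the mean of the area effects."""
    m = sum(s) / len(s)
    return [v - m for v in s]

def to_unit_variance(s, lam):
    """Fixed-scale option: divide centred scores by their s.d. and
    multiply every loading by the same s.d., leaving lam[j]*s[i] unchanged."""
    sd = math.sqrt(sum(v * v for v in s) / len(s))  # s assumed already centred
    return [v / sd for v in s], [l * sd for l in lam]

s = centre([1.0, 3.0, 5.0, 7.0])   # hypothetical factor scores for four areas
lam = [1.0, 0.5]                   # hypothetical loadings for two outcomes
s_unit, lam_unit = to_unit_variance(s, lam)
print(s_unit, lam_unit)
```

Both parameterizations give identical linear predictors, so without one of the two constraints the sampler cannot separate var(s) from the loadings.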

Labelling Problems in Repeated Sampling
Even in simple model, labelling may be an issue.
Consider fixed variance identification option, var(s)=1, loadings all unknown.
Suppose diffuse priors are taken on loadings in
$\log(\mu_{ij}) = \alpha_j + \lambda_j s_i$
without directional constraint.

Labelling Problems (continued)
Then can have:
a) $\lambda_j$ all positive combined with $s_i$ acting as positive measure of health risk (higher $s_i$ in areas with higher cancer rates)
OR
b) $\lambda_j$ all negative combined with $s_i$ acting as negative measure of health risk ($s_i$ higher in areas with lower cancer rates)
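
The two labellings are indistinguishable to the likelihood, which a few lines of arithmetic confirm: flipping the sign of every loading and every factor score leaves each log relative risk unchanged. The intercepts, loadings and scores below are hypothetical.

```python
def log_risk(alpha, lam, s):
    """Log relative risk for every area i and outcome j: alpha[j] + lam[j]*s[i]."""
    return [[a + l * si for a, l in zip(alpha, lam)] for si in s]

alpha = [0.1, -0.2]      # hypothetical outcome intercepts
lam = [0.8, 1.2]         # labelling (a): all loadings positive
s = [-1.0, 0.0, 1.5]     # higher score means higher risk

original = log_risk(alpha, lam, s)
# labelling (b): all loadings negative, scores reversed
flipped = log_risk(alpha, [-l for l in lam], [-v for v in s])
print(original == flipped)
```

With diffuse, unconstrained priors a sampler may move between the two modes, which is the label switching described above.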

Identifying constraints for consistent labelling
For unambiguous labelling advisable to constrain one or more $\lambda_j$ to be positive (e.g. truncated normal or gamma prior)
Note that anchoring constraint with var(s) unknown, and preset loading (e.g. $\lambda_1 = 1.0$), may be intrinsically better identified: it steers remaining unknown coefficients to consistent labelling

Loadings and Labellings
May not be sufficient just to rely on constraining one loading (e.g. assume +ve) to ensure consistent labelling
Sometimes said that constraining direction on one loading ensures consistent identification
What if indicator chosen for constrained loading (e.g. $\lambda_j > 0$) is poor measure for construct?

Loadings and Labellings
If twenty indicators are measuring a construct, the 19 unconstrained loadings may fit a different label (e.g. deprivation) to that implied by the remaining constrained loading (e.g. affluence)
Personal View: Much depends on suitable selection of manifest indicators and which (and how many, maybe >1) are chosen to have constrained loadings

WINBUGS Code for manifest variable scenario A

Extensions of Spatial Common Factors
Product schemes. Consider health outcomes arranged by area i and age x. Populations at risk $N_{ix}$
$y_{ix} \sim \text{Poisson}(N_{ix}\mu_{ix})$
$\log(\mu_{ix}) = \alpha_x + \lambda_x s_i$
$\lambda_x$ show which age groups are most sensitive to spatial variations in risk represented by $s_i$
Variation on Lee-Carter (JASA 1992) mortality forecasting model

Random Effect Loadings
$\lambda_x$ potentially random, rather than fixed effects.
Identified using sum to 1 or averaging to 1 constraint, e.g. $\lambda_x$ multinomial, or $\lambda_x \sim \text{Gamma}(h,h)$

Nonlinear effects of common factor
One possibility: just take powers of $s_i$, e.g.
$\log(\mu_{ij}) = \alpha_j + \lambda_j s_i + \gamma_j s_i^2$
Or: spline for nonlinear effects in common factor score $s_i$.
e.g. under fixed variance var(s)=1 option, locate knots $\kappa_k$ at selected quantiles on cumulative standard normal.

Linear Spline
Then linear spline
$\log(\mu_{ij}) = \alpha_j + \lambda_j s_i + \sum_k b_{jk}(s_i - \kappa_k)_+$
$b_{jk}$ might be random effects, but raises identification issues?
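
A minimal sketch of the spline ingredients, assuming var(s)=1 so that standard normal quantiles are a sensible place for knots. The choice of three knots at the quartiles and the function names are hypothetical; the truncated term is zero below a knot and linear above it.

```python
from statistics import NormalDist

def knots_at_quantiles(probs):
    """Knot locations at the given quantiles of the standard normal."""
    return [NormalDist().inv_cdf(p) for p in probs]

def truncated_linear_basis(s_i, knots):
    """Truncated linear terms (s_i - knot)_+ for one factor score."""
    return [max(s_i - k, 0.0) for k in knots]

knots = knots_at_quantiles([0.25, 0.5, 0.75])   # roughly -0.674, 0, 0.674
basis_high = truncated_linear_basis(1.0, knots)
basis_low = truncated_linear_basis(-1.0, knots)
print(knots, basis_high, basis_low)
```

Each basis column would then be multiplied by its own coefficient and added to the linear term in the factor score.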

INDICATOR BASED
SPATIAL CONSTRUCTS

(B) Indicator Based Spatial Constructs
Many studies use latent constructs to analyze population health variations.
Such constructs (e.g. deprivation) not directly observed
Instead derived from a collection of relevant indicator variables that are observed, using multivariate techniques or other composite variable methods
Many health outcomes show deprivation gradient

Latent Constructs in Population Health
Example: Townsend deprivation score based on summing standardized census area values for 4 input variables (sum of z scores):
% unemployed, % with no car, % households overcrowded, % households not owner occupiers
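
As a simplified sketch of a sum-of-z-scores composite of this kind (any transformation of the inputs before standardizing is omitted here, and the three areas and their percentages are hypothetical):

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one indicator across areas."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def composite_score(indicators):
    """Per-area sum of z-scores over all indicators.

    indicators: dict mapping indicator name -> list of area values.
    """
    standardized = [z_scores(v) for v in indicators.values()]
    return [sum(area_values) for area_values in zip(*standardized)]

# Hypothetical percentages for three areas
indicators = {
    "unemployed": [4.0, 8.0, 12.0],
    "no_car": [10.0, 20.0, 30.0],
    "overcrowded": [2.0, 4.0, 6.0],
    "not_owner_occupier": [20.0, 30.0, 40.0],
}
scores = composite_score(indicators)
print(scores)
```

Unlike a factor model, this composite weights every indicator equally and attaches no uncertainty to the resulting score.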

Other area constructs
Other examples of latent constructs relevant to area health variations: rurality/urbanicity, social fragmentation
Social fragmentation scores used to analyze variations in area suicide rates and psychiatric hospitalization rates

Confirmatory Indicator Based Model
Confirmatory model: indicators k=1,..,K are established proxies for latent construct
e.g. area unemployment rates, welfare recipients, social housing rates as indicators of area deprivation
Census rates $r_{ik} = z_{ik}/D_{ik}$ where $z_{ik}$ are counts (e.g. unemployed), $D_{ik}$ are relevant denominators (e.g. economically active populations).

One option for confirmatory model
Use Gaussian approximation to binomial (Hogan & Tchernis JASA 2004) with variance stabilizing transformation:
$R_{ik} = \sqrt{r_{ik}}$, $\text{var}(R_{ik}) = \sigma^2_k/D_{ik}$.
Normal measurement equations
$R_{ik} \sim N(\kappa_k + \lambda_k F_i, \sigma^2_k/D_{ik})$
where $F_i$ scores follow spatial CAR prior

Or use relevant Exponential Family links in deriving common spatial factor
$P(z_{ik}|\theta_{ik}) = \exp([z_{ik}\theta_{ik} - b(\theta_{ik})]/\phi + c(z_{ik},\phi))$
e.g. $z_{ik}$ binomial with populations $N_i$, $z_{ik} \sim \text{Bin}(N_i,\pi_{ik})$
Logit link, plus overdispersion effects $w_{ik}$
$\text{logit}(\pi_{ik}) = \kappa_k + \lambda_k F_i + w_{ik}$
$w_{ik}$: normal and uncorrelated over indicators k.

For other indicators transform to normality
For intrinsic proportions (e.g. proportion of area that is green space as indicator of rurality) take logit transform to approximate normality
For population density take log transform, etc

TWO CLASSES OF MANIFEST VARIABLE

(C) Spatial Factors in Model with 2 classes of manifest variable
Health Outcomes $Y_{ij}$ (j=1,..,J); e.g. mortality or incidence counts
Social Indicators $Z_{ik}$ (k=1,..,K); e.g. census rates of unemployment
Typical Scenario: multiple common spatial factors ($F_{1i},..,F_{Qi}$) primarily measured by Z variables (indicators established as relevant).

2 class model
But factors F also act to potentially explain area variations in health outcomes Y.
Z to F links confirmatory, Y to F links exploratory

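Example
Four Poisson health outcomes Y1-Y4. Eight indicators: Z1-Z4 measure F1; Z5-Z8 measure F2; both F1 and F2 may explain Y
$Y_{ij} \sim \text{Po}(E_{ij}\mu_{ij})$
$\log(\mu_{ij}) = \alpha_j + \beta_{j1}F_{1i} + \beta_{j2}F_{2i}$
$Z_{ik} \sim \text{EF}(\nu_{ik})$
$g(\nu_{i1}) = \kappa_1 + \lambda_1 F_{1i} + w_{i1}$

$g(\nu_{i5}) = \kappa_5 + \lambda_5 F_{2i} + w_{i5}$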

MODEL CHOICE

Formal Choice or Not
Formal Bayes model criteria (e.g. marginal likelihood/Bayes factor) difficult to derive; also change with priors
Popular alternative (AIC analogue): Deviance Information Criterion (DIC).
Average deviance Dev.bar + effective parameter count $d_e$
DIC = Dev.bar + $d_e$
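
A minimal sketch of the calculation for a Poisson outcome, taking the effective parameter count as the average deviance minus the deviance at the posterior mean (the usual definition). Two assumptions are made for illustration: the plug-in is evaluated at the posterior mean of the Poisson means rather than of the underlying parameters, and the observed counts and three MCMC draws are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """Deviance taken as -2 x Poisson log-likelihood."""
    loglik = sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))
    return -2.0 * loglik

def dic_from_draws(y, mu_draws):
    """Return (Dev.bar, d_e, DIC) from MCMC draws of the Poisson means."""
    deviances = [poisson_deviance(y, mu) for mu in mu_draws]
    dev_bar = sum(deviances) / len(deviances)
    mu_bar = [sum(col) / len(col) for col in zip(*mu_draws)]  # posterior mean of each mu
    d_e = dev_bar - poisson_deviance(y, mu_bar)
    return dev_bar, d_e, dev_bar + d_e

y = [3, 7, 2]                                   # hypothetical observed counts
mu_draws = [[2.5, 6.0, 2.2], [3.5, 7.5, 1.8], [3.0, 6.5, 2.0]]
dev_bar, d_e, dic = dic_from_draws(y, mu_draws)
print(dev_bar, d_e, dic)
```

Because both terms are Monte Carlo averages over factor-score draws, they inherit any slow mixing in those scores.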

Model Fit in Realistic Applications
Multilevel applications to health survey data may involve thousands of subjects (e.g. HSE study).
Ecological applications may involve hundreds of small areas (Eastern region suicide study)

Model Fit in Realistic Applications
Convergence of DIC and $d_e$ typically slow in models with many random effects (such as factor scores)
Slow convergence also applies to other measures of fit, e.g. Monte Carlo estimates of conditional predictive ordinates
Model selection alternatives
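
For reference, the usual Monte Carlo estimate of a conditional predictive ordinate is the harmonic mean, over MCMC draws, of the likelihood of each observation. The sketch below assumes those likelihood values have already been computed and stored; the two observations and two draws are hypothetical.

```python
def cpo_estimates(lik_draws):
    """Harmonic-mean CPO estimate for each observation.

    lik_draws[t][i] is the likelihood of observation i at MCMC draw t.
    """
    n_draws = len(lik_draws)
    return [n_draws / sum(1.0 / lik for lik in column) for column in zip(*lik_draws)]

# Hypothetical likelihood values: two draws, two observations
lik_draws = [[0.2, 0.5], [0.4, 0.5]]
cpo = cpo_estimates(lik_draws)
print(cpo)
```

Harmonic means are sensitive to occasional very small likelihood values, which is one reason these estimates stabilize slowly.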

Model Choice using Variable Selection
Model selection potentially for both loadings and factor variance/covariance structure.
Don't necessarily apply selection for all elements in any particular application (e.g. depending whether exploratory or confirmatory)
Apply to selected aspects of spatial SEM models, e.g. loadings only or correlations between factors only

Selection in 2 manifest variable SEM
Spatial factor models with 2 types of manifest variable (health outcomes $Y_j$ + socioeconomic indices $Z_k$)
Apply selection to loadings $\beta_{jq}$ linking $Y_j$ to $F_q$ (exploratory part of model)
But don't apply selection to Z on F loadings (confirmatory sub-model based on extensive prior knowledge)

Mixture Priors for Selecting Loadings
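
As a hedged illustration of the idea, one common form of mixture prior (an assumption here, not necessarily the form used in this talk) pairs each loading with a binary retention indicator: the loading is drawn from a relatively informative prior when the indicator is 1 and is set to zero when it is 0. Given MCMC output, the posterior retention probability is the mean of the indicator, and the model-averaged loading is the mean of indicator times loading. The function name and the four draws are hypothetical.

```python
def retention_summary(indicator_draws, loading_draws):
    """Posterior retention probability and model-averaged loading.

    indicator_draws[t] is 1 if the loading is retained at draw t, else 0;
    loading_draws[t] is the sampled loading value at draw t.
    """
    n = len(indicator_draws)
    prob_retain = sum(indicator_draws) / n
    averaged = sum(j * b for j, b in zip(indicator_draws, loading_draws)) / n
    return prob_retain, averaged

# Hypothetical draws for one Y-on-F loading
indicator_draws = [1, 0, 1, 1]
loading_draws = [0.4, 0.9, 0.6, 0.5]
prob_retain, averaged_loading = retention_summary(indicator_draws, loading_draws)
print(prob_retain, averaged_loading)
```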

Random Effects Selection
Selection procedures for random effects and/or their variance/covariance structure
e.g. Cai and Dunson (2008), Tüchler & Frühwirth-Schnatter (2008)
These extend to factor and SEM models as factors are shared random effects

RE Selection: Multivariate Spatial Prior
Q>1 for shared common spatial factors
Within area covariance matrix in MCAR prior denoted $\Sigma_F$

Cholesky Decomposition of Covariance Matrix $\Sigma_F$

Selection on variances and/or covariances
Suppose investigator sure about number of factors (confirmatory model based on substantial evidence)
BUT not sure whether correlations between factors are needed
Selection can be applied to relevant parameters in decomposition of $\Sigma_F$
Mixture prior selection on $\gamma_{qr}$ parameters to decide whether correlations needed
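
To see why selection on off-diagonal decomposition parameters amounts to selection on correlations, the sketch below assumes a modified Cholesky form, covariance = L G G' L with L diagonal and G unit lower triangular. This is one parameterization used in the random effects selection literature; the exact form used in the talk is not shown, so treat it as an assumption. The function name and numeric values are hypothetical.

```python
def covariance_from_decomposition(scales, offdiag):
    """Build the covariance matrix L G G' L.

    scales: diagonal of L (one positive scale per factor).
    offdiag: dict {(r, c): value} for r > c, the free elements of G;
             missing entries are treated as zero (i.e. not retained).
    """
    q = len(scales)
    G = [[0.0] * q for _ in range(q)]
    for r in range(q):
        G[r][r] = 1.0
        for c in range(r):
            G[r][c] = offdiag.get((r, c), 0.0)
    A = [[scales[r] * G[r][c] for c in range(q)] for r in range(q)]  # A = L G
    return [[sum(A[r][k] * A[c][k] for k in range(q)) for c in range(q)] for r in range(q)]

sigma_corr = covariance_from_decomposition([1.0, 2.0], {(1, 0): 0.5})
sigma_indep = covariance_from_decomposition([1.0, 2.0], {})
print(sigma_corr, sigma_indep)
```

Setting an off-diagonal element to zero removes the corresponding covariance while the result stays a valid covariance matrix, which is what makes the decomposition convenient for mixture prior selection.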

CASE STUDIES

Social capital and mental health, multilevel model using Health Survey for England (HSE)
Multilevel model, joint prevalence of obesity & diabetes, BRFSS subjects nested within US counties & states
Suicide & self-harm, ecological (area) study for wards in Eastern England

Case Study 1, Mental Health & Social Capital, Health Survey for England
Y is observed mental health status (binary). Y=1 if subject's GHQ12 score is 4 or more, Y=0 otherwise.
Pr(Y=1) related to known socioeconomic risk factors X at individual subject level
Pr(Y=1) also related to known indicators of geographic context, G (e.g. micro-area deprivation quintile, region of residence, urban-rural residence). Micro-areas (32K in England) called Super Output Areas

Latent Risks
Finally Pr(Y=1) also related to latent subject level risks, $\{F_{1i},F_{2i},...,F_{Qi}\}$
Examples: social capital, perceived stress.
Structural model: $Y \sim f(Y|X,G,F,\beta)$

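Health Outcome Sub-Model
Regression involves 9065 adult subjects.
$Y_i \sim \text{Bin}(1,\pi_i)$. Use log-link (relative risk interpretation).
Q=1 for single latent risk factor (social capital)
$\log(\pi_i) = X_i\beta + G_i\gamma + \theta F_i$
$= \alpha + \beta_{1,gend[i]} + \beta_{2,age[i]} + \beta_{3,eth[i]} + \beta_{4,oph[i]} + \beta_{5,own[i]} + \beta_{6,noqual[i]} + \gamma_{1,reg[i]} + \gamma_{2,dep[i]} + \gamma_{3,urb[i]} + \theta F_i$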

Multiple Indicators for Social Capital
Social capital measured by a battery of K survey 'items' (e.g. questions about neighbourhood perceptions, organisational memberships etc), $\{Z_1,...,Z_K\}$
$Z \sim g(Z|F,\lambda)$
e.g. with binary questions, link probability of positive response $\pi_k = \Pr(Z_k=1)$ to latent construct via
$\text{logit}(\pi_k) = \kappa_k + \lambda_k F$

Indicators of Social Capital
Social Support Score (Z1)
5 binary items (Z2-Z6) relate to neighbourhood perceptions (e.g. can people be trusted?; do people try to be helpful?; this area is a place I enjoy living in; etc)
Final item (Z7) relates to membership of organisations or groups.

Multiple Causes of Social Capital
Social capital varies by demographic groups and geographic context (urban status, region, small area deprivation category, etc).
So have multiple causes of F as well as multiple indicators
$F \sim h(F|X^*,G^*,\delta)$
$X^*$ and $G^*$ are individual and contextual variables relevant to causing social capital variations

Multiple Cause Sub-Model
$F_i \sim N(\eta_i, 1)$
$\eta_i = \delta_{1,gend[i]} + \delta_{2,eth[i]} + \delta_{3,noqual[i]} + \delta_{4,urb[i]} + \delta_{5,reg[i]} + \delta_{6,dep[i]}$.
$\delta$: fixed effects parameters with reference category (zero coeff) for identification
Only small number of regions in HSE
If had finer spatial detail could take area effects spatially random (but weak identification?)

Effect of F on Y
Social capital has significant effect in reducing the chances of psychiatric caseness.
The effect of social capital apparent in relative risk 0.35 of psychiatric morbidity for high capital individuals (with score F=+1) as compared to low capital individuals (with F=-1).
Obtained as exp(-0.525)/exp(0.525), where -0.525 is the coefficient for the social capital effect.
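
The arithmetic behind the quoted relative risk can be checked directly. Under a log link, the ratio of risks at F=+1 and F=-1 depends only on the coefficient, since all other terms cancel. The function name and default arguments are hypothetical.

```python
import math

def relative_risk(coef, f_high=1.0, f_low=-1.0):
    """Relative risk under a log link for two values of the latent score."""
    return math.exp(coef * f_high) / math.exp(coef * f_low)

coef = -0.525                 # social capital coefficient quoted above
rr = relative_risk(coef)      # equals exp(-1.05)
print(round(rr, 2))
```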

Geographic Context: Micro-area Deprivation Gradient from Multiple Cause Model

Case Study 2: Diabetes & Obesity in US
Data from 2007 Behavioral Risk Factor Surveillance System (BRFSS)
Multinomial outcome (J=6 categories) defined by diabetic status and weight category (obese, overweight, normal).

Multinomial Categories
Reference category is subjects with neither condition. All other categories are ill relative to reference category

Multilevel multicategory regression
Regression includes:
o subject level risk factors (age, ethnicity, gender, education),
o known geographic effects (e.g. county poverty),
o county and state random effects to model unknown geographic influences (e.g. unknown environmental exposures).

Regression & Likelihood

Model Form
Model includes known subject risk factors and contextual variables (e.g. county poverty)
Unknown contextual risks: assume county and state latent effects, shared over categories j=1,..,J-1.
Illustrates nested latent spatial effects

County & State Effects
Take county effects $v_c$ (c=1,..,3142) to be spatially correlated CAR
But $u_s$ (state effects, s=1,..,51) taken to be unstructured.
Avoids confounding of two spatially structured effects

Regression Terms for j=1,..J-1

Case Study 3, Suicide & Self Harm: Eastern Region Wards in England
Two classes of manifest variables
Y1-Y4: suicide totals in small areas
Z1-Z14: Fourteen small area social indicators
Q=3 latent constructs (F1 fragmentation, F2 deprivation, F3 urbanicity). Converse of F3 is rurality. Common spatial factors.

Local Authority Map: Eastern England

Geographic Framework
N=1118 small areas (called wards, subdivisions of local authorities).
Small area focus beneficial: people with similar socio-demographic characteristics tend to cluster in relatively small areas, so greater homogeneity in risk factors related to social status
On other hand, health events may be rare

Confirmatory Sub-Model
Confirmatory Z-on-F model
Each indicator $Z_k$ loads only on one construct $F_q$.
Most indicators binomial. A few taken as normal after transformation. Mostly 2001 Census, a few non-census (service access score, proportion greenspace)

Exponential Family Model for modelling Z-on-F effects
For indicator $k \in \{1,..,14\}$, $G_k \in \{1,2,3\}$ denotes which construct it loads on.
Regression with link g allows for overdispersion via unique w effects
$g(\nu_{ik}) = \kappa_k + \lambda_{k,G_k}F_{G_k,i} + w_{ik}$

Expected Direction of
Confirmatory Model Loadings

Health Outcome Sub-Model (Y-on-F effects)
Model for Y-on-F effects
$Y_{ij} \sim \text{Po}(E_{ij}\mu_{ij})$, j=1,..,4
$\log(\mu_{ij}) = \alpha_j + \beta_{j1}F_{1i} + \beta_{j2}F_{2i} + \beta_{j3}F_{3i} + u_{ij}$
Coefficient selection on $\beta_{jq}$ using relatively informative priors under retain option when $J_{jq}=1$. Using diffuse priors means null model tends to be selected

Redundant Coefficients
Some coefficients (e.g. urbanicity on male and female suicide, deprivation on female suicide) not retained under model selection
Four coefficients in the Y-on-F model were set to zero in at least some MCMC iterations, averaging over $2^4$ Y-on-F models

Future Directions in Spatial Factor Modelling
Extend model selection to interactions between factors, nonlinear effects etc
In England, model area socioeconomic structure (and maybe some health outcomes) at neighbourhood level (32000 Super Output Areas with mean population 1500).
In US, similar scope for modelling SES structure in relation to health events for Zip Code Tabulation Areas or ZCTAs (around 31K across US, on average about 10K population)

More generally
Bayesian software options for latent variable and SEM applications more widely available
Potentialities of WINBUGS in this context not always appreciated
Scope for dedicated Bayesian factor analysis package
