Introduction to WinBUGS
Olivier Gimenez
University of St Andrews, Scotland
A brief history
1989: project began with a Unix version called BUGS
1998: first Windows version, WinBUGS was born
Initially developed by the MRC Biostatistics Unit in
Cambridge and now joint work with Imperial College
School of Medicine at St Mary's, London.
Windows Bayesian inference Using Gibbs Sampling
Software for the Bayesian analysis of complex statistical
models using Markov chain Monte Carlo (MCMC) methods
Who?
Nicky Best
Imperial College Faculty of
Medicine, London (UK)
Andrew Thomas
University of Helsinki,
Helsinki (Finland)
Wally Gilks
MRC Biostatistics Unit
Institute of Public Health
Cambridge (UK)
David Spiegelhalter
MRC Biostatistics Unit
Institute of Public Health
Cambridge (UK)
Freely downloadable from: http://www.mrcbsu.cam.ac.uk/bugs/winbugs/contents.shtml
Key principle
You specify the prior and build up the likelihood
WinBUGS computes the posterior by running a Gibbs sampling algorithm, based on:
π(θ | D) ∝ L(D | θ) π(θ)
WinBUGS computes some convergence diagnostics that you have to check
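The proportionality above can be made concrete with a small numerical sketch (not WinBUGS code): evaluate posterior ∝ likelihood × prior on a grid, for a normal mean with known unit variance. The data values and the N(0, 10²) prior are toy choices for illustration only.

```python
import numpy as np

# pi(theta | D) proportional to L(D | theta) * pi(theta), evaluated on a grid.
# Model: D_i ~ N(theta, 1) with a N(0, 10^2) prior on theta (toy numbers).
data = np.array([1.2, 0.8, 1.5, 1.1])
theta = np.linspace(-5, 5, 2001)                    # grid over the parameter
dtheta = theta[1] - theta[0]
log_prior = -0.5 * (theta / 10.0) ** 2              # N(0, 100), up to a constant
log_lik = np.array([-0.5 * np.sum((data - t) ** 2) for t in theta])
log_post = log_prior + log_lik                      # unnormalised log posterior
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta                         # normalise to a density
post_mean = np.sum(theta * post) * dtheta           # posterior mean on the grid
```

Because this prior is conjugate, the grid answer can be checked against the closed-form posterior mean; WinBUGS approximates the same posterior by Gibbs sampling instead of a grid.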
A biological example throughout
White stork (Ciconia ciconia) in Alsace 1948-1970
Demographic components (fecundity, breeding success, survival, etc.)
Climate (temperature, rainfall, etc.)
WinBUGS & Linear Regression
Y = number of chicks per pair; T = Temp. May (C); R = Rainf. May (mm)

Y     T     R
2.55  15.1  67
1.85  13.3  52
2.05  15.3  88
2.88  13.3  61
3.13  14.6  32
2.21  15.6  36
2.43  13.1  72
2.69  13.1  43
2.55  15.0  92
2.84  11.7  32
2.47  15.3  86
2.69  14.4  28
2.52  14.4  57
2.31  12.7  55
2.07  11.7  66
2.35  11.9  26
2.98  15.9  28
1.98  13.4  96
2.53  14.0  48
2.21  13.9  90
2.62  12.9  86
1.78  15.1  78
2.30  13.0  87
WinBUGS & Linear Regression
1. Do temperature and rainfall affect the number of chicks?
2. Regression model:
Yi = α + βr Ri + βt Ti + εi, i = 1,...,23
εi i.i.d. ~ N(0, σ²)
or equivalently:
Yi ~ N(μi, σ²), i = 1,...,23
μi = α + βr Ri + βt Ti
3. Estimation of parameters: α, βr, βt, σ
4. Frequentist inference uses t-tests
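For comparison with the next slide, the frequentist fit is just ordinary least squares on the design matrix [1, T, R]. A minimal sketch with simulated stand-in data (the real 23 stork records are in the table above; the coefficients used to simulate are toy values):

```python
import numpy as np

# Simulated stand-ins for the 23 (chicks, temperature, rainfall) records.
rng = np.random.default_rng(0)
n = 23
T = rng.uniform(11, 16, n)       # May temperature (C)
R = rng.uniform(25, 100, n)      # May rainfall (mm)
Y = 2.45 + 0.03 * T - 0.007 * R + rng.normal(0, 0.2, n)

# OLS: minimise ||X beta - Y||^2 with design matrix [1, T, R]
X = np.column_stack([np.ones(n), T, R])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta_hat     # orthogonal to the columns of X at the optimum
```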
Linear Regression using Frequentist approach
Y = number of chicks per pair: 2.55, 1.85, 2.05, 2.88, 3.13, 2.21, ..., 2.30
T = Temp. May (C): 15.1, 13.3, 15.3, 13.3, 14.6, 15.6, ..., 13.0
R = Rainf. May (mm): 67, 52, 88, 61, 32, 36, ..., 87

Fitted model: Y = 2.451 + 0.031 T - 0.007 R

              Estimate   Std. Error  t value  Pr(>|t|)
temperature   0.031069   0.054690     0.568   0.57629
rainfall     -0.007316   0.002897    -2.525   0.02011 *
Influence of Rainfall only
Running WinBUGS
What do you need?
1  a model giving the
likelihood and the priors
2  some data of course
3  initial values to start
the MCMC algorithm
Running WinBUGS
The model
use the WinBUGS command 'model'
don't forget to embrace the model with {...}
Specify the priors: we use noninformative or vague or flat priors here
Define the likelihood: Yi ~ N(α + βr Ri + βt Ti, σ²)
Monitor any other parameter you'd like to, e.g. σ² = 1/τ
Note: σ² = 1/τ (WinBUGS parameterises the normal with the precision τ)
Running WinBUGS
Data and initial values
Use 'list' structures (R/Splus syntax)...
...and 'vector' structures (R/Splus syntax)
Running WinBUGS
Overall
1  a model giving the
likelihood and the priors
2  data
3  initial values
Running WinBUGS
At last!!
1 check model
2 load data
3 compile model
4 load initial values
5 generate burn-in values
6 parameters to be monitored
7 perform the sampling to generate posteriors
8 check convergence and display results
Running WinBUGS
1. Check model
Running WinBUGS
1. Check model: highlight 'model'
Running WinBUGS
1. Check model: open the Model Specification Tool
Running WinBUGS
1. Check model: now click 'check model'
Running WinBUGS
1. Check model: watch out for the confirmation at the foot of the screen
Running WinBUGS
2. Load data: now highlight the 'list' in the data window
Running WinBUGS
2. Load data: then click 'load data'
Running WinBUGS
2. Load data: watch out for the confirmation at the foot of the screen
Running WinBUGS
3. Compile model: next, click 'compile'
Running WinBUGS
3. Compile model: watch out for the confirmation at the foot of the screen
Running WinBUGS
4. Load initial values: highlight the 'list' in the data window
Running WinBUGS
4. Load initial values: click 'load inits'
Running WinBUGS
4. Load initial values: watch out for the confirmation at the foot of the screen
Running WinBUGS
5. Generate burn-in values: open the Model Update Tool
Running WinBUGS
5. Generate burn-in values: give the number of burn-in iterations (1000)
Running WinBUGS
5. Generate burn-in values: click 'update' to do the sampling
Running WinBUGS
6. Monitor parameters: open the Inference Samples Tool
Running WinBUGS
6. Monitor parameters: enter 'intercept' in the node box and click 'set'
Running WinBUGS
6. Monitor parameters: enter 'slope.temperature' in the node box and click 'set'
Running WinBUGS
6. Monitor parameters: enter 'slope.rainfall' in the node box and click 'set'
Running WinBUGS
7. Generate posterior values: enter the number of samples you want to take (10000)
Running WinBUGS
7. Generate posterior values: click 'update' to do the sampling
Running WinBUGS
8. Summarize posteriors: Enter '*' in the node box and click 'stats'
Running WinBUGS
8. Summarize posteriors: mean, median and credible intervals
Running WinBUGS
8. Summarize posteriors: 95% credible intervals tell us the same story

              Estimate   Std. Error  t value  Pr(>|t|)
temperature   0.031069   0.054690     0.568   0.57629
rainfall     -0.007316   0.002897    -2.525   0.02011 *
Running WinBUGS
8. Summarize posteriors: click 'history'
Running WinBUGS
8. Summarize posteriors: click 'auto cor'
Problem of autocorrelation
Coping with autocorrelation
use standardized covariates
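What "standardized covariates" means in practice: centre each covariate and divide by its standard deviation before it enters the linear predictor. A minimal sketch (toy values shaped like the May temperatures):

```python
import numpy as np

# Standardising a covariate (subtract the mean, divide by the standard
# deviation) decorrelates the intercept and slopes in the posterior,
# which reduces MCMC autocorrelation.
temperature = np.array([15.1, 13.3, 15.3, 13.3, 14.6, 15.6, 13.1])
temp_std = (temperature - temperature.mean()) / temperature.std(ddof=1)
```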
Rerunning WinBUGS
1,2,...7, and 8. Summarize posteriors: click 'auto cor'
[Autocorrelation plots (lag 0-40) for slope.temperature and slope.rainfall: autocorrelation OK]
Rerunning WinBUGS
1,2,...7, and 8. Summarize posteriors: click 'density'
[Posterior density plots for slope.rainfall and slope.temperature, sample: 1000]
Rerunning WinBUGS
1,2,...7, and 8. Summarize posteriors: click 'quantiles'
[Running quantile plots for slope.rainfall and slope.temperature, iterations 1041-1750]
Running WinBUGS
8. Checking for convergence using the Brooks-Gelman-Rubin criterion
A way to identify non-convergence is to simulate multiple sequences from overdispersed starting points
Intuitively, the behaviour of all of the chains should be basically the same.
In other words, the variance within the chains should be the same as the variance across the chains.
In WinBUGS, stipulate the number of chains after 'load data' and before 'compile' (obviously, as many sets of initial values as chains have to be loaded, or generated)
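The within- versus between-chain comparison described above can be sketched numerically. This is the classic Gelman-Rubin potential scale reduction statistic on simulated chains, a simplified stand-in for WinBUGS's BGR plot, not its exact computation:

```python
import numpy as np

# Compare between-chain and within-chain variance: the ratio tends to 1
# when all chains explore the same distribution.
def gelman_rubin(chains):
    chains = np.asarray(chains)                 # shape (m, n): m chains, n draws
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    var_hat = (n - 1) / n * W + B / n           # pooled variance estimate
    return np.sqrt(var_hat / W)                 # ~1 when chains agree

rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(2, 5000))    # two well-mixed chains
stuck = mixed + np.array([[0.0], [5.0]])        # two chains around 0 and 5
```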
Running WinBUGS
8. Checking for convergence using the BrooksGelmanRubin criterion
[BGR plots for slope.temperature (chains 1:2) and slope.rainfall (chains 1:2), iterations 1-10000]
The normalized width of the central 80% interval of the pooled
runs is green
The normalized average width of the 80% intervals within the
individual runs is blue
Rerunning WinBUGS
1,2,...7, and 8. Summarize posteriors: others...
Click 'coda' to produce lists of data suitable for
external treatment via the Coda R package
Click 'trace' to produce a dynamic history that changes in real time
Another example: logistic regression
Y = proportion of nests with success (>0 young); T = Temp. May (C); R = Rainf. May (mm)

Y          T     R
151 / 173  15.1  67
105 / 164  13.3  52
73 / 103   15.3  88
107 / 113  13.3  61
113 / 122  14.6  32
87 / 112   15.6  36
77 / 98    13.1  72
108 / 121  13.1  43
118 / 132  15.0  92
122 / 136  11.7  32
112 / 133  15.3  86
120 / 137  14.4  28
122 / 145  14.4  57
89 / 117   12.7  55
69 / 90    11.7  66
71 / 80    11.9  26
53 / 67    15.9  28
41 / 54    13.4  96
53 / 58    14.0  48
31 / 39    13.9  90
35 / 42    12.9  86
14 / 23    15.1  78
18 / 23    13.0  87
Performing a logistic regression with WinBUGS
Performing a logistic regression with WinBUGS
model:
# succ. in year i ~ Bin(pi, total # couples in year i)
where pi is the probability of success in year i
logit(pi) = α + βr Ri + βt Ti, i = 1,...,23
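The logit link above maps probabilities to the whole real line, so the linear predictor is unconstrained; its inverse recovers a probability. A minimal sketch (the coefficient values are toy numbers, not the fitted estimates):

```python
import math

# Inverse logit: maps the linear predictor alpha + beta_r*R + beta_t*T
# back to a probability in (0, 1).
def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha, beta_r, beta_t = 1.5, -0.15, 0.03        # toy coefficients
p = inv_logit(alpha + beta_r * 0.5 + beta_t * (-0.2))  # standardised R, T
```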
Performing a logistic regression with WinBUGS
noninformative priors
Performing a logistic regression with WinBUGS
data & initial values
Performing a logistic regression with WinBUGS
the results
[table of posterior summaries with lower and upper credible limits]
influence of rainfall, but not temperature (see credible intervals)
Performing a logistic regression with WinBUGS
the results
additional parameters as a by-product of the MCMC samples: just add them in the model as parameters to be monitored
- geometric mean:
geom <- pow(prod(p[]),1/N)
- odds-ratio:
odds.rainfall <- exp(slope.rainfall)
odds.temperature <- exp(slope.temperature)
Performing a logistic regression with WinBUGS
the results
additional parameters as a by-product of the MCMC samples
- geom. mean probability of success around 82% [81%; 84%]
- odds-ratio: about a 16% decrease in the odds of success for an increase of rainfall of 1 unit
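The odds-ratio illustrates the general recipe: any function of a monitored parameter can be summarised by transforming each MCMC draw. A sketch with simulated draws that mimic the slope.rainfall posterior (mean -0.153, sd 0.062); these are stand-ins, not the real WinBUGS chain:

```python
import numpy as np

# Derived quantities come "for free" from MCMC output: transform each
# posterior draw, then summarise the transformed draws.
rng = np.random.default_rng(2)
slope_rainfall = rng.normal(-0.153, 0.062, 10000)   # mimics the posterior
odds_rainfall = np.exp(slope_rainfall)              # odds-ratio per draw
lo, hi = np.quantile(odds_rainfall, [0.025, 0.975]) # 95% credible interval
```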
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
It may be awkward to read complex sets of data and initial values
It is also quite boring to specify the parameters to be monitored in each run
It might be interesting to save the output and read it into R for
further analyses
solution 1: WinBUGS can be used in batch mode using scripts
solution 2: R2WinBUGS allows WinBUGS to be run from R
We can work with the results after importing them back into R again
create graphical displays of data and posterior simulations or use
WinBUGS in Monte Carlo simulation studies
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
To call WinBUGS from R:
1. Write a WinBUGS model in an ASCII file.
2. Go into R.
3. Prepare the inputs to the 'bugs' function and run it.
4. A WinBUGS window will pop up and R will freeze up. The model will now run in WinBUGS. You will see things happening in the Log window within WinBUGS. When WinBUGS is done, its window will close and R will work again.
5. If an error message appears, re-run with 'debug = TRUE'.
Running WinBUGS from R: R2WinBUGS package
1 - Write the WinBUGS code in an ASCII file

# This covers the logistic regression model using two explanatory variables.
# The White storks in Baden-Wurttemberg (Germany) data set is provided
model
{
for( i in 1 : N)
{
# A binomial distribution as a likelihood
nbsuccess[i] ~ dbin(p[i],nbpairs[i])
# The probability of success is a function of both rainfall and temperature
logit(p[i]) <- intercept + slope.temperature * (temperature[i] - mean(temperature[]))/(sd(temperature[])) + slope.rainfall * (rainfall[i] - mean(rainfall[]))/(sd(rainfall[]))
}
# priors for regression parameters
intercept ~ dnorm(0,0.001)
slope.temperature ~ dnorm(0,0.001)
slope.rainfall ~ dnorm(0,0.001)
}
Running WinBUGS from R: R2WinBUGS package
3 - Prepare the inputs to the 'bugs' function
[Screenshot: the data in R, with columns nbsuccess, nbpairs, temperature and rainfall]
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
# Load R2WinBUGS package
library(R2WinBUGS)
# Data (R 'list' format)
N = 23
data = read.table("datalogistic.dat",header=T)
attach(data)
datax = list("N","nbsuccess","nbpairs","temperature","rainfall")
# MCMC details
nb.iterations = 10000
nb.burnin = 1000
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
# Initial values
init1 = list(intercept=1,slope.temperature=1,slope.rainfall=1)
init2 = list(intercept=0,slope.temperature=0,slope.rainfall=0)
init3 = list(intercept=-1,slope.temperature=-1,slope.rainfall=-1)
inits = list(init1,init2,init3)
nb.chains = length(inits)
# Parameters to be monitored
parameters <- c("intercept","slope.temperature","slope.rainfall")
# MCMC simulations
res.sim <- bugs(datax, inits, parameters, "logisticregressionstandardized.bug",
                n.chains = nb.chains, n.iter = nb.iterations, n.burnin = nb.burnin)
# Save results
save(res.sim, file = "logistic.Rdata")
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
# Summarize results
res.sim$summary # use print(res.sim) alternatively
                          mean          sd         50%        97.5%       Rhat
Intercept           1.55122555  0.05396574     1.55100    1.6589750  1.0020641
slope.temperature   0.03030854  0.06128879     0.03148    0.1510975  0.9997848
slope.rainfall     -0.15302837  0.06206554    -0.15295   -0.0243025  1.0014894
deviance          204.60259481  2.48898337   203.90000  211.2000000  1.0002280
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
# Numerical summaries for slope.rainfall?
quantile(slope.rainfall,c(0.025,0.975))
      2.5%      97.5%
-0.2693375 -0.0243025
# Calculate the odds-ratio
odds.rainfall < exp(slope.rainfall)
quantile(odds.rainfall,c(0.025,0.975))
2.5% 97.5%
0.7638855 0.9759904
Running WinBUGS from R: R2WinBUGS package
The logistic regression example revisited
# Graphical summaries for slope.rainfall?
plot(density(slope.rainfall),xlab="",ylab="", main="slope.rainfall a posteriori density")
Recent developments in WinBUGS
Model selection using RJMCMC
We consider data relating to a population of White storks breeding in Baden-Wurttemberg (Germany).
Interest lies in the impact of climate variation (rainfall) in their wintering area on their population dynamics (adult survival rates).
Mark-recapture data from 1956-71 are available.
The covariates relate to the amount of rainfall between June-September each year from 10 weather stations located around the Sahel region.
Interest lies in identifying the given rainfall
locations that explain the adult survival rates.
Bayesian Model Selection
Discriminating between different models
can often be of particular interest, since
they represent competing biological
hypotheses.
How do we decide which covariates to use?
often there may be a large number of possible covariates.
Example (cont)
We express the survival rate as a logistic function of the covariates:
logit φt = α + βᵀ xt + εt
where xt denotes the set of covariate values at time t and εt are random effects, εt ~ N(0, σ²).
However, which rainfalls explain the survival rates for the adults?
Alternatively, what values of β are non-zero?
Model Selection
In the classical framework, likelihood ratio tests or information criteria (e.g. AIC) are often used.
There is a similar Bayesian statistic, the DIC.
This is programmed within WinBUGS; however, its implementation is not suitable for hierarchical models (e.g. random effect models).
In addition, the DIC is known to give fallacious results in even simple problems.
Within the general Bayesian framework, there is a more natural way of dealing with the issue of model discrimination.
Bayesian Approach
We treat the model itself as an unknown parameter to be estimated.
Then, applying Bayes' Theorem, we obtain the posterior distribution over both parameter and model space:
π(θm, m | data) ∝ L(data | θm, m) p(θm) p(m)
i.e. likelihood × prior on parameters in model m × prior on model m.
Here θm denotes the parameters in model m.
Posterior Model Probabilities
The Bayesian approach then allows us to quantitatively discriminate between competing models via posterior model probabilities:
π(m | data) = ∫ π(θm, m | data) dθm ∝ p(m) ∫ L(data | θm, m) p(θm) dθm
Note that we need to specify priors on both the parameters and now also on the models themselves.
Thus we need to specify a prior probability for each model to be considered.
MCMC-based estimates
We have a posterior distribution (over parameter and model space) defined up to proportionality:
π(θm, m | data) ∝ L(data | θm, m) p(θm | m) p(m)
If we can sample from this posterior distribution
then we are able to obtain posterior estimates of
summary statistics of interest.
In particular the posterior model probabilities can
be estimated as the proportion of time that the
chain is in each model.
So, all we need to do is define how we construct
such a Markov chain!!
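Counting model visits is all the estimation takes. A minimal sketch with a toy trace of model indicators (one 0/1 flag per covariate, as in the stork example; the proportions are invented):

```python
from collections import Counter

# Posterior model probabilities from an RJMCMC run: the proportion of
# iterations the chain spends in each model.
model_trace = (["0001000000"] * 520 + ["0000000000"] * 76
               + ["0001010000"] * 404)            # toy 1000-iteration trace
counts = Counter(model_trace)
n_iter = len(model_trace)
post_probs = {model: c / n_iter for model, c in counts.items()}
```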
Reversible Jump MCMC
The reversible jump MCMC algorithm allows us to construct a Markov chain with stationary distribution equal to the posterior distribution.
It is simply an extension of the Metropolis-Hastings algorithm that allows moves between different dimensions.
This algorithm is needed because the number of parameters, θm, in model m, may differ between models.
Note that this algorithm needs only one Markov chain irrespective of the number of models!!
Markov chain
Each iteration of the Markov chain essentially involves two steps:
1. Within-model moves: updating each parameter using standard MCMC moves (Gibbs sampler, Metropolis-Hastings)
2. Between-model moves: updating the model using a reversible jump type move.
Then, standard MCMCtype processes apply,
such as using an appropriate burnin, obtaining
summary statistics of interest etc.
To illustrate the RJ algorithm, we consider a
particular example relating to variable selection
(e.g. covariate analysis).
WinBUGS
General RJ updates cannot currently be
programmed into WinBUGS.
Bespoke code needs to be written instead.
However, the recent add-on called 'jump' allows
two particular RJ updates to be performed in
WinBUGS:
Variable selection
Splines
See http://www.winbugsdevelopment.org.uk/rjmcmc.html
So, in particular, WinBUGS can be used for model
selection in the White storks example.
Example: White Storks
Recall our white storks example: there are 10 possible covariates, hence a total of 2^10 possible models (1024!).
We specify a prior probability of 0.5 that each covariate is in the model.
Then, conditional on the covariate being present, we specify β ~ N(0,10).
Finally, for the random effect error variance, 1/σ² ~ Gamma(0.001,0.001).
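Under this prior, covariate inclusions are independent coin flips, so all models are a priori equally likely. A one-line check:

```python
# 10 candidate covariates, each included independently with probability 0.5:
# 2**10 = 1024 models, each with the same prior probability 0.5**10.
n_cov = 10
n_models = 2 ** n_cov
prior_each = 0.5 ** n_cov
```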
Example: White Storks
WinBUGS demonstration.
RJMCMC is performed on the betas only, neither on σ nor on the intercept
Results
Models with largest posterior support:

model       posterior prob  cumulative prob
0001000000  0.5178058297    0.5178058297
0000000000  0.07629791141   0.5941037411
0001010000  0.0474087675    0.6415125086
0001100000  0.03814092265   0.6796534313
0001000001  0.03085379849   0.7105072297
1001000000  0.02549001607   0.7359972458
0001000100  0.02374569658   0.7597429424
0001001000  0.02336699564   0.783109938
0001000010  0.0229240303    0.8060339683
0000000100  0.02123479458   0.8272687629
0101000000  0.01809960982   0.8453683727
0011000000  0.01540739041   0.8607757631
1000000000  0.01186825798   0.8726440211
0000010000  0.01103282075   0.8836768419
Results
Additionally, the (marginal) posterior probability that each covariate influences the survival rates:

node        mean   sd    posterior marg prob
effect[1]   0.01   0.04  0.06
effect[2]   0.00   0.03  0.04
effect[3]   0.00   0.02  0.03
effect[4]   0.30   0.17  0.83
effect[5]   0.01   0.04  0.07
effect[6]   0.01   0.05  0.09
effect[7]   0.01   0.03  0.04
effect[8]   0.00   0.06  0.07
effect[9]   0.01   0.04  0.05
effect[10]  0.01   0.04  0.05
p           0.91   0.01
sdeps       0.20   0.14
Results: survival rates
[Plot: survival rate for white storks by year (1960-1970), posterior means (*) with credible interval bars; y-axis 0.0-1.0]