The American Statistician, Vol. 46, No. 3 (Aug., 1992), pp. 167-174.
Explaining the Gibbs Sampler
GEORGE CASELLA and EDWARD I. GEORGE*
*George Casella is Professor, Biometrics Unit, Cornell University, Ithaca, NY 14853. The research of this author was supported by National Science Foundation Grant DMS 89-0039. Edward I. George is Professor, Department of MSIS, The University of Texas at Austin, Austin, TX 78712. The authors thank the editors and referees, whose comments led to an improved version of this article.

Computer-intensive algorithms, such as the Gibbs sampler, have become increasingly popular statistical tools, both in applied and theoretical work. The properties of such algorithms, however, may sometimes not be obvious. Here we give a simple explanation of how and why the Gibbs sampler works. We analytically establish its properties in a simple case and provide insight for more complicated cases. There are also a number of examples.

KEY WORDS: Data augmentation; Markov chains; Monte Carlo methods; Resampling techniques.

1. INTRODUCTION

The continuing availability of inexpensive, high-speed computing has already reshaped many approaches to statistics. Much work has been done on algorithmic approaches (such as the EM algorithm; Dempster, Laird, and Rubin 1977) and resampling techniques (such as the bootstrap; Efron 1982). Here we focus on a different type of computer-intensive statistical method, the Gibbs sampler.

The Gibbs sampler enjoyed an initial surge of popularity starting with the paper of Geman and Geman (1984), who studied image-processing models. The roots of the method, however, can be traced back at least to Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953), with further development by Hastings (1970). More recently, Gelfand and Smith (1990) generated new interest in the Gibbs sampler by revealing its potential in a wide variety of conventional statistical problems.

The Gibbs sampler is a technique for generating random variables from a (marginal) distribution indirectly, without having to calculate the density. Although straightforward to describe, the mechanism that drives this scheme may seem mysterious. The purpose of this article is to demystify the workings of these algorithms by exploring simple cases. In such cases, it is easy to see that Gibbs sampling is based only on elementary properties of Markov chains.

Through the use of techniques like the Gibbs sampler, we are able to avoid difficult calculations, replacing them instead with a sequence of easier calculations. These methodologies have had a wide impact on practical problems, as discussed in Section 6. Although most applications of the Gibbs sampler have been in Bayesian models, it is also extremely useful in classical (likelihood) calculations [see Tanner (1991) for many examples]. Furthermore, these calculational methodologies have also had an impact on theory. By freeing statisticians from dealing with complicated calculations, the statistical aspects of a problem can become the main focus. This point is wonderfully illustrated by Smith and Gelfand (1992).

In the next section we describe and illustrate the application of the Gibbs sampler in bivariate situations. Section 3 is a detailed development of the underlying theory, given in the simple case of a 2 x 2 table with multinomial sampling. From this detailed development, the theory underlying general situations is more easily understood, and is also outlined. Section 4 elaborates the role of the Gibbs sampler in relating conditional and marginal distributions and illustrates some higher dimensional generalizations. Section 5 describes many of the implementation issues surrounding the Gibbs sampler, and Section 6 contains a discussion and describes many applications.

2. ILLUSTRATING THE GIBBS SAMPLER

Suppose we are given a joint density f(x, y_1, ..., y_p), and are interested in obtaining characteristics of the marginal density

    f(x) = \int \cdots \int f(x, y_1, \ldots, y_p) \, dy_1 \cdots dy_p,   (2.1)

such as the mean or variance. Perhaps the most natural and straightforward approach would be to calculate f(x) and use it to obtain the desired characteristic. However, there are many cases where the integrations in (2.1) are extremely difficult to perform, either analytically or numerically. In such cases the Gibbs sampler provides an alternative method for obtaining f(x).

Rather than compute or approximate f(x) directly, the Gibbs sampler allows us effectively to generate a sample X_1, ..., X_m ~ f(x) without requiring f(x). By simulating a large enough sample, the mean, variance, or any other characteristic of f(x) can be calculated to the desired degree of accuracy.

It is important to realize that, in effect, the end results of any calculations, although based on simulations, are the population quantities. For example, to calculate the mean of f(x), we could use (1/m) \sum_{i=1}^{m} X_i and the fact that

    \lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} X_i = \int x f(x) \, dx = E[X].   (2.2)
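To make (2.2) concrete, here is a minimal Python sketch; it assumes only that we can simulate from f(x). The choice of a Gamma(2, 1) density is ours and purely illustrative, picked because its mean E[X] = 2 is known exactly, so the approximation can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a sample X_1, ..., X_m ~ f(x); here f is a Gamma(2, 1)
# density, chosen only because its mean E[X] = 2 is known exactly.
m = 100_000
sample = rng.gamma(shape=2.0, scale=1.0, size=m)

# Equation (2.2): the sample average converges to E[X] as m -> infinity.
print(sample.mean())   # close to 2.0
```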
Thus, by taking m large enough, any population characteristic, even the density itself, can be obtained to any degree of accuracy.

To understand the workings of the Gibbs sampler, we first explore it in the two-variable case. Starting with a pair of random variables (X, Y), the Gibbs sampler generates a sample from f(x) by sampling instead from the conditional distributions f(x | y) and f(y | x), distributions that are often known in statistical models. This is done by generating a "Gibbs sequence" of random variables

    Y'_0, X'_0, Y'_1, X'_1, Y'_2, X'_2, \ldots, Y'_k, X'_k.   (2.3)

The initial value Y'_0 = y'_0 is specified, and the rest of (2.3) is obtained iteratively by alternately generating values from

    X'_j \sim f(x \mid Y'_j = y'_j)
    Y'_{j+1} \sim f(y \mid X'_j = x'_j).   (2.4)

We refer to this generation of (2.3) as Gibbs sampling. It turns out that under reasonably general conditions, the distribution of X'_k converges to f(x) (the true marginal of X) as k -> infinity. Thus, for k large enough, the final observation in (2.3), namely X'_k = x'_k, is effectively a sample point from f(x).

The convergence (in distribution) of the Gibbs sequence (2.3) can be exploited in a variety of ways to obtain an approximate sample from f(x). For example, Gelfand and Smith (1990) suggest generating m independent Gibbs sequences of length k, and then using the final value of X'_k from each sequence. If k is chosen large enough, this yields an approximate iid sample from f(x). Methods for choosing such k, as well as alternative approaches to extracting information from the Gibbs sequence, are discussed in Section 5. For the sake of clarity and consistency, we have used only the preceding approach in all of the illustrative examples that follow.

Example 1. For the following joint distribution of X and Y,

    f(x, y) \propto \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1}, \quad x = 0, 1, \ldots, n, \; 0 \le y \le 1,   (2.5)

the conditional distributions are

    X \mid y \sim \mathrm{Binomial}(n, y), \qquad Y \mid x \sim \mathrm{Beta}(x + \alpha, \, n - x + \beta),   (2.6)

and the marginal distribution of X can be obtained analytically from (2.5) as

    f(x) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(x+\alpha)\,\Gamma(n-x+\beta)}{\Gamma(\alpha+\beta+n)}, \quad x = 0, 1, \ldots, n,   (2.7)

the beta-binomial distribution. Here, characteristics of f(x) can be directly obtained from (2.7), either analytically or by generating a sample from the marginal and not fussing with the conditional distributions. However, this simple situation is useful for illustrative purposes. Figure 1 displays histograms of two samples x_1, ..., x_m of size m = 500 from the beta-binomial distribution of (2.7) with n = 16, alpha = 2, and beta = 4, one generated directly and one generated with the Gibbs sampler. The two histograms are very similar, giving credence to the claim that the Gibbs scheme for random variable generation is indeed generating variables from the marginal distribution.

One feature brought out by Example 1 is that the Gibbs sampler is really not needed in any bivariate situation where the joint distribution f(x, y) can be calculated, since f(x) = f(x, y)/f(y | x). However, as the next example shows, Gibbs sampling may be indispensable in situations where f(x, y), f(x), or f(y) cannot be calculated.

Example 2. Suppose X and Y have conditional distributions that are exponential distributions restricted to the interval (0, B), that is,

    f(x \mid y) \propto y e^{-yx}, \quad 0 < x < B < \infty,   (2.8a)
    f(y \mid x) \propto x e^{-xy}, \quad 0 < y < B < \infty,   (2.8b)

where B is a known positive constant. The restriction to the interval (0, B) ensures that the marginal f(x) exists. Although the form of this marginal is not easily calculable, by applying the Gibbs sampler to the conditionals in (2.8) any characteristic of f(x) can be obtained.
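Both examples are easy to run in a few lines of code. The following minimal Python sketch implements the scheme in (2.4) for Example 1, using the conditionals in (2.6) and the values n = 16, alpha = 2, beta = 4, m = 500, and k = 10 from the text; the function name, starting value, and random seed are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_final_x(n, alpha, beta, k=10):
    """Run one Gibbs sequence (2.3) of length k and return the final X'_k.

    Alternates the conditional draws in (2.4):
      X | y ~ Binomial(n, y)
      Y | x ~ Beta(x + alpha, n - x + beta)
    """
    y = rng.uniform()                              # arbitrary starting value Y'_0
    for _ in range(k):
        x = rng.binomial(n, y)                     # X'_j  ~ f(x | Y'_j = y)
        y = rng.beta(x + alpha, n - x + beta)      # Y'_{j+1} ~ f(y | X'_j = x)
    return x

# m independent sequences, keeping only the final X from each,
# following the Gelfand and Smith (1990) approach used in the article.
n, alpha, beta, m = 16, 2, 4, 500
xs = np.array([gibbs_final_x(n, alpha, beta) for _ in range(m)])

# Check against the known beta-binomial mean n*alpha/(alpha+beta) = 16/3 ~ 5.33.
print(xs.mean())
```

Example 2 runs the same loop with the truncated exponential conditionals in (2.8) in place of the binomial and beta draws.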
f(x | Y'_k = y'_k) will be a close approximation to f(x), and we can estimate f(x) with

    \hat{f}(x) = \frac{1}{m} \sum_{i=1}^{m} f(x \mid y_i),

where y_1, ..., y_m are the final Y values from the m Gibbs sequences.

[Figure 3. Comparison of two probability histograms of the beta-binomial distribution with n = 16, alpha = 2, and beta = 4; the black histogram was obtained from Gibbs sequences of length k = 10, and the white histogram represents the exact beta-binomial distribution.]
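A short sketch of this density estimate for Example 1, again assuming the conditionals in (2.6); the averaged conditional densities are Binomial(n, y_i) pmfs, evaluated here with SciPy, and the parameter values are the illustrative ones used above.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, alpha, beta, m, k = 16, 2, 4, 500, 10

# Collect the final Y value from each of m independent Gibbs sequences.
finals = []
for _ in range(m):
    y = rng.uniform()
    for _ in range(k):
        x = rng.binomial(n, y)
        y = rng.beta(x + alpha, n - x + beta)
    finals.append(y)

# Estimate the marginal f(x) by averaging the conditional densities
# f(x | y_i), i.e., Binomial(n, y_i) pmfs, over the m final y values.
xs = np.arange(n + 1)
f_hat = np.mean([binom.pmf(xs, n, y) for y in finals], axis=0)
print(f_hat)   # estimated beta-binomial probabilities; they sum to 1
```

Averaging the conditional densities in this way typically gives a smoother estimate of f(x) than a histogram of the final X values alone.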
Writing f_X(x) = \int f_{X|Y}(x \mid y) f_Y(y) \, dy, and if we similarly substitute f_Y(y) = \int f_{Y|X}(y \mid t) f_X(t) \, dt, we have

    f_X(x) = \int \left[ \int f_{X|Y}(x \mid y) f_{Y|X}(y \mid t) \, dy \right] f_X(t) \, dt
           = \int h(x, t) f_X(t) \, dt,   (4.1)

where h(x, t) = \int f_{X|Y}(x \mid y) f_{Y|X}(y \mid t) \, dy. Equation (4.1) defines a fixed point integral equation for which f_X(x) is a solution. The fact that it is a unique solution is explained by Gelfand and Smith (1990).

Equation (4.1) is the limiting form of the Gibbs iteration scheme, illustrating how sampling from conditionals produces a marginal distribution. As k -> infinity in (3.5),

    f_{X'_k \mid X'_0}(x \mid x_0) \to f_X(x)

and

    f_{X'_i \mid X'_{i-1}}(x \mid t) \to h(x, t),   (4.2)

and thus (4.1) is the limiting form of (3.5).

Although the joint distribution of X and Y determines all of the conditionals and marginals, it is not always the case that a set of proper conditional distributions will determine a proper marginal distribution (and hence a proper joint distribution). The next example shows this.

Example 2 (continued): Suppose that B = infinity in (2.8), so that X and Y have the conditional densities

    f(x \mid y) = y e^{-yx}, \quad 0 < x < \infty,
    f(y \mid x) = x e^{-xy}, \quad 0 < y < \infty.   (4.3)

Applying (4.1) to these conditionals gives h(x, t) = \int_0^\infty (y e^{-yx})(t e^{-ty}) \, dy = t/(x + t)^2, so the fixed point equation becomes

    f_X(x) = \int_0^\infty \frac{t}{(x + t)^2} f_X(t) \, dt.   (4.4)

Substituting f_X(t) = 1/t into (4.4) yields

    \frac{1}{x} = \int_0^\infty \frac{t}{(x + t)^2} \cdot \frac{1}{t} \, dt,

solving (4.4). Although this is a solution, 1/x is not a density function. When the Gibbs sampler is applied to the conditional densities in (4.3), convergence breaks down. It does not give an approximation to 1/x; in fact, we do not get a sample of random variables from a marginal distribution. A histogram of such random variables is given in Figure 4, which vaguely mimics a graph of f(x) = 1/x.

It was pointed out by Trevor Sweeting (personal communication) that Equation (4.1) can be solved using the truncated exponential densities in (2.8). Evaluating the constant in the conditional densities gives f(x | y) = y e^{-yx}/(1 - e^{-By}), 0 < x < B, with a similar expression for f(y | x). Substituting these functions into (4.1) yields the solution f(x) \propto (1 - e^{-Bx})/x. This density (properly normalized) is the dashed line in Figure 2.

The Gibbs sampler fails when B = infinity above because \int f_X(x) \, dx = \infty, and there is no convergence as described in (4.2). In a sense, we can say that a sufficient condition for the convergence in (4.2) to occur is that f_X(x) be a proper density, that is, \int f_X(x) \, dx < \infty. One way to guarantee this is to restrict the conditional densities to lie in a compact interval, as was done in (2.8). General convergence conditions needed for the Gibbs sampler (and other algorithms) are explored in detail by Schervish and Carlin (1990), and rates of convergence are also discussed by Roberts and Polson (1990).
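The breakdown is easy to observe numerically. The following sketch runs the Gibbs scheme on the truncated conditionals (2.8) with a finite B and on the improper conditionals (4.3) with B = infinity; the truncated exponential draws are made by inverting the CDF, and the values B = 5, k = 500, the starting value, and the seed are our arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def trunc_exp(rate, B):
    """Draw from an Exponential(rate) density truncated to (0, B),
    by inverting its CDF F(x) = (1 - exp(-rate*x)) / (1 - exp(-rate*B))."""
    u = rng.uniform()
    return -np.log(1.0 - u * (1.0 - np.exp(-rate * B))) / rate

def gibbs_chain(B, k=500):
    """Run one Gibbs sequence on the conditionals (2.8); B = np.inf
    uses the untruncated conditionals (4.3) instead."""
    xs = np.empty(k)
    y = 1.0                                  # arbitrary starting value
    for j in range(k):
        x = trunc_exp(y, B) if np.isfinite(B) else rng.exponential(1.0 / y)
        y = trunc_exp(x, B) if np.isfinite(B) else rng.exponential(1.0 / x)
        xs[j] = x
    return xs

# With B = 5 the marginal is proper and the draws settle into (0, 5);
# with B = infinity there is no proper marginal, and the draws wander
# to ever more extreme values with no distribution to settle into.
print(gibbs_chain(5.0)[-5:])
print(gibbs_chain(np.inf)[-5:])
```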
REFERENCES

Devroye, L. (1986), Non-Uniform Random Variate Generation, New York: Springer-Verlag.

Diebolt, J., and Robert, C. (1990), "Estimation of Finite Mixture Distributions Through Bayesian Sampling," technical report, Université Paris VI, L.S.T.A.

Eddy, W. F., and Schervish, M. J. (1990), "The Asymptotic Distribution of the Running Time of Quicksort," technical report, Carnegie Mellon University, Dept. of Statistics.

Efron, B. (1982), The Jackknife, the Bootstrap, and Other Resampling Plans, National Science Foundation-Conference Board of the Mathematical Sciences Monograph 38, Philadelphia: SIAM.

Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), "Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling," Journal of the American Statistical Association, 85, 972-985.

Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.

Gelfand, A. E., Smith, A. F. M., and Lee, T.-M. (1992), "Bayesian Analysis of Constrained Parameter and Truncated Data Problems Using Gibbs Sampling," Journal of the American Statistical Association, 87, 523-532.

Gelman, A., and Rubin, D. (1991), "An Overview and Approach to Inference From Iterative Simulation," technical report, University of California-Berkeley, Dept. of Statistics.

Geman, S., and Geman, D. (1984), "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

George, E. I., and McCulloch, R. E. (1991), "Variable Selection via Gibbs Sampling," technical report, University of Chicago, Graduate School of Business.

George, E. I., and Robert, C. P. (1991), "Calculating Bayes Estimates for Capture-Recapture Models," Technical Report 94, University of Chicago, Statistics Research Center, and Cornell University, Biometrics Unit.

Geweke, J. (in press), "Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments," in Proceedings of the Fourth Valencia International Conference on Bayesian Statistics.

Geyer, C. J. (in press), "Markov Chain Monte Carlo Maximum Likelihood," in Computer Science and Statistics: Proceedings of the 23rd Symposium on the Interface.

Geyer, C. J., and Thompson, E. A. (in press), "Constrained Maximum Likelihood and Autologistic Models With an Application to DNA Fingerprinting Data," Journal of the Royal Statistical Society, Ser. B.

Gilks, W. R., and Wild, P. (1992), "Adaptive Rejection Sampling for Gibbs Sampling," Journal of the Royal Statistical Society, Ser. C, 41, 337-348.

Hastings, W. K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika, 57, 97-109.

Hoel, P. G., Port, S. C., and Stone, C. J. (1972), Introduction to Stochastic Processes, New York: Houghton-Mifflin.

Lange, N., Carlin, B. P., and Gelfand, A. E. (1990), "Hierarchical Bayes Models for the Progression of HIV Infection Using Longitudinal CD4+ Counts," technical report, Carnegie Mellon University, Dept. of Statistics.

Liu, J., Wong, W. H., and Kong, A. (1991), "Correlation Structure and Convergence Rate of the Gibbs Sampler (I): Applications to the Comparison of Estimators and Augmentation Schemes," Technical Report 299, University of Chicago, Dept. of Statistics.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), "Equations of State Calculations by Fast Computing Machines," Journal of Chemical Physics, 21, 1087-1091.

Muller, P. (1991), "A Generic Approach to Posterior Integration and Gibbs Sampling," Technical Report 91-09, Purdue University, Dept. of Statistics.

Raftery, A. E., and Banfield, J. D. (1990), "Stopping the Gibbs Sampler, the Use of Morphology, and Other Issues in Spatial Statistics," technical report, University of Washington, Dept. of Statistics.

Ripley, B. D. (1987), Stochastic Simulation, New York: John Wiley.

Ritter, C., and Tanner, M. A. (1990), "The Griddy Gibbs Sampler," technical report, University of Rochester.

Robert, C. P. (1990), "Hidden Mixtures and Bayesian Sampling," technical report, Université Paris VI, L.S.T.A.

Roberts, G. O., and Polson, N. G. (1990), "A Note on the Geometric Convergence of the Gibbs Sampler," technical report, University of Nottingham, Dept. of Mathematics.

Schervish, M. J., and Carlin, B. P. (1990), "On the Convergence of Successive Substitution Sampling," Technical Report 492, Carnegie Mellon University, Dept. of Statistics.

Smith, A. F. M., and Gelfand, A. E. (1992), "Bayesian Statistics Without Tears: A Sampling-Resampling Perspective," The American Statistician, 46, 84-88.

Tanner, M. A. (1991), Tools for Statistical Inference, New York: Springer-Verlag.

Tanner, M. A., and Wong, W. (1987), "The Calculation of Posterior Distributions by Data Augmentation" (with discussion), Journal of the American Statistical Association, 82, 528-550.

Tierney, L. (1991), "Markov Chains for Exploring Posterior Distributions," Technical Report 560, University of Minnesota, School of Statistics.

Verdinelli, I., and Wasserman, L. (1990), "Bayesian Analysis of Outlier Problems Using the Gibbs Sampler," Technical Report 469, Carnegie Mellon University, Dept. of Statistics.

Zeger, S., and Rizaul Karim, M. (1991), "Generalized Linear Models With Random Effects: A Gibbs Sampling Approach," Journal of the American Statistical Association, 86, 79-86.