
Lecture 7: September 12, 2006

Coupling from the Past

Eric Vigoda

7.1 Coupling from the Past

7.1.1 Introduction

We saw in the last lecture how Markov chains can be useful algorithmically. If we have a probability distribution we'd like to generate random samples from, we design an ergodic Markov chain whose unique stationary distribution is the desired distribution. We then run the chain (i.e., start at an arbitrary state and evolve according to the transition matrix) until the process is at (or close to) the stationary distribution.

In the last lecture we saw that the chain eventually reaches the stationary distribution. In order for this Markov chain approach to be efficient we need to know how fast we converge to stationarity. Most of the class will focus on methods for bounding the convergence rate. However, there are many chains which we believe converge quickly, but no one has proved it. Today we'll look at a method which notices when we reach the stationary distribution. The resulting samples are guaranteed to be from the stationary distribution, and in many cases the algorithm is extremely efficient. The method is known as Coupling from the Past, and was introduced by Propp and Wilson in 1996 [1].

Let M be an ergodic Markov chain defined on a state space Ω with transition matrix P. Thus, for x, y ∈ Ω and integer t > 0, P^t(x, y) is the probability of going from x to y in t steps. Let π be the stationary distribution of M.

We will use the three-state Markov chain in Figure 7.1 as a running example. Consider the following experiment. Create three copies of the Markov chain, each with a distinct starting state A, B, and C. Define a global coupling for the three chains. More precisely, a global coupling is a joint evolution for all three chains, such that each chain viewed in isolation is a faithful copy of the chain of interest. Now, as soon as all three chains reach the same state (i.e., they've all coupled), output the resulting state. Is the output from the stationary distribution?


[Figure 7.1: A simple three state Markov chain on states A, B, C, with transition probabilities 1 and .5; diagram omitted.]

Our intuition from last class is that once we've coupled from all pairs of initial states, we've reached stationarity. Thus, we might expect that the output is from stationarity. However, this is false. In our example the output will never be state B. This is because B is never the first state where all the chains couple: since B has a unique predecessor, the chains would have coupled a step earlier, at state A. But B has some positive probability in the stationary distribution, so the output is clearly not from the stationary distribution. However, it turns out that if we run this experiment backwards, it works. It's a very clever idea. Let's formalize the notion of a global coupling before detailing the general algorithm and analyzing it.

7.1.2 Formal Description

Let f : Ω × [0, 1] → Ω be a global coupling, i.e., for all x, y ∈ Ω and r chosen uniformly at random from [0, 1],

Pr(f(x, r) = y) = P(x, y).     (7.1)

Thus, the random seed r defines the evolution from all states. In words, from X_t, the chain evolves by choosing r uniformly at random from the interval [0, 1] and then moving to state f(X_t, r).
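A global coupling of this form is easy to realize with the inverse-CDF construction: partition [0, 1] according to the row P(x, ·) and let the single seed r select a sub-interval for every state at once. The sketch below uses a hypothetical three-state transition matrix (the exact probabilities of Figure 7.1 are not assumed); the construction satisfies (7.1) for any finite chain.

```python
import random

STATES = ["A", "B", "C"]
# Hypothetical ergodic transition matrix (rows sum to 1); NOT claimed to be
# the exact chain of Figure 7.1.
P = {
    "A": {"A": 0.0, "B": 0.5, "C": 0.5},
    "B": {"A": 0.5, "B": 0.0, "C": 0.5},
    "C": {"A": 1.0, "B": 0.0, "C": 0.0},
}

def f(x, r):
    """Global coupling: for r ~ Uniform[0, 1], Pr(f(x, r) = y) = P(x, y)."""
    cum = 0.0
    for y in STATES:
        cum += P[x][y]
        if r < cum:
            return y
    return STATES[-1]  # guard against floating-point round-off

# One seed r drives the move from *every* state simultaneously.
r = random.random()
step = {x: f(x, r) for x in STATES}
```

Each state viewed alone evolves according to P, but because all states share the same r, chains started from different states can coalesce.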

Let r_t, t ∈ Z, be independent, each chosen uniformly at random from [0, 1]. Let f_t : Ω → Ω be the function f(·, r_t) (i.e., f_t(x) = f(x, r_t)). For a chain (X_t), the function f_t will define the evolution at time t, i.e., X_{t+1} = f_t(X_t). Let t_1 < t_2 be integers. Let

F_{t_1}^{t_2}(x) = (f_{t_2 - 1} ∘ f_{t_2 - 2} ∘ ⋯ ∘ f_{t_1})(x) = f_{t_2 - 1}(f_{t_2 - 2}(. . . f_{t_1}(x) . . . )).

Thus, F_{t_1}^{t_2} defines the evolution from time t_1 to time t_2. From (7.1) it follows that for every x, y ∈ Ω,

Pr(F_{t_1}^{t_2}(x) = y) = P^{t_2 - t_1}(x, y).

The earlier (wrong!) algorithm (which simulates the chains forward in time) can now be formalized as follows. We will run |Ω| chains: for each σ ∈ Ω, the chain X^σ will have starting state σ, i.e., X_0^σ = σ. We will run all these chains simultaneously, using the coupling f, and output the final state once they have coalesced. More formally, let T be the first time when |F_0^T(Ω)| = 1. (Observe that F_0^T is a constant function, i.e., |F_0^T(Ω)| = 1, if all the chains have reached the same state.) Then the output value is the unique element of the set F_0^T(Ω). We asked the following question: will the output value be from the stationary distribution?

Our Markov chain in Figure 7.1 shows that the output value does not have to be from the stationary distribution. Now we can formalize the strange question: what would happen if we ran the chain from the past? More formally, let M be the first time when |F_{-M}^0(Ω)| = 1. Output the unique element of F_{-M}^0(Ω). What is the distribution of the output?

Theorem 7.1 ([1]) F_{-M}^0(Ω) has the same distribution as π.

Proof: For any fixed t > 0, note that, for all x, y ∈ Ω,

Pr(F_0^t(x) = y) = Pr(F_{-t}^0(x) = y).

The probability space for the LHS is over the choices r_0, . . . , r_{t-1}, whereas for the RHS it's over r_{-t}, . . . , r_{-1}. Thus, regardless of which order we construct these t random seeds, the distributions of F_0^t and F_{-t}^0 are the same.

Therefore, if we run the simulation infinitely far in the past we reach stationarity. More formally, for all x, y ∈ Ω,

lim_{t→∞} Pr(F_{-t}^0(x) = y) = lim_{t→∞} Pr(F_0^t(x) = y) = π(y).

For t = t_1 + t_2, observe

F_{-t}^0 = F_{-t_1}^0 ∘ F_{-t}^{-t_1},

since evolving from time −t to 0 means first evolving from −t to −t_1, then from −t_1 to 0.

Therefore, if F_{-M}^0 is a constant function, then for all t > M and all x ∈ Ω,

F_{-t}^0(x) = (F_{-M}^0 ∘ F_{-t}^{-M})(x) = F_{-M}^0(x).

Outputting F_{-M}^0(x) is thus the same as outputting F_{-t}^0(x) for every t > M; letting t → ∞, the output has distribution π.

Note, the above proof gets at the heart of the difference between going backwards and going forwards. Recall T is the first time t such that F_0^t is a constant function. For t > T we know F_0^t is a constant function, but it is not necessarily true that F_0^t(x) = F_0^T(x). Hence, after time T all of the coupled chains have converged, but where they converge might change as the time increases. Whereas, as we saw in the last proof, for t > M we have F_{-t}^0(x) = F_{-M}^0(x); thus the backward compositions continue to converge upon the same state.
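For a chain whose state space is small enough to enumerate, the backward scheme translates directly into code. The following is a minimal sketch under the assumption of a hypothetical three-state chain with an inverse-CDF coupling: the seeds r_{-1}, r_{-2}, . . . are drawn once and reused, and the lookback window is doubled until F_{-m}^0 is constant (the doubling is the standard efficiency trick from [1]).

```python
import random

STATES = ["A", "B", "C"]
# Hypothetical ergodic transition matrix (rows sum to 1).
P = {
    "A": {"A": 0.0, "B": 0.5, "C": 0.5},
    "B": {"A": 0.5, "B": 0.0, "C": 0.5},
    "C": {"A": 1.0, "B": 0.0, "C": 0.0},
}

def f(x, r):
    # Inverse-CDF global coupling: Pr(f(x, r) = y) = P(x, y).
    cum = 0.0
    for y in STATES:
        cum += P[x][y]
        if r < cum:
            return y
    return STATES[-1]

def cftp(rng=random):
    seeds = {}  # seeds[t] = r_t for t = -1, -2, ...; drawn once, never redrawn
    m = 1
    while True:
        for t in range(-m, 0):
            if t not in seeds:
                seeds[t] = rng.random()
        # Compute F_{-m}^0 on every state: apply f_{-m}, ..., f_{-1} in order.
        current = {x: x for x in STATES}
        for t in range(-m, 0):
            current = {x: f(current[x], seeds[t]) for x in STATES}
        if len(set(current.values())) == 1:  # F_{-m}^0 is constant
            return next(iter(current.values()))
        m *= 2  # look further into the past, keeping the old seeds

sample = cftp()  # an exact draw from the stationary distribution of P
```

Reusing the stored seeds when the window grows is essential: redrawing them would bias the output, which is exactly the distinction between F_{-t}^0 and F_0^t made above.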


7.1.3 Implementation

Theorem 7.1 translates to an algorithm for perfect sampling from the stationary distribution of the Markov chain M. However, there are several issues we must deal with before we get an efficient sampling algorithm:

- Is it possible to implement the algorithm efficiently? More precisely, can we decide whether |F_{-t}^0(Ω)| = 1 in polynomial time even though the state space of the Markov chain is exponentially large?
- How do we pick the coupling f? We would like to make E[M] small. Note that for a bad choice of f it can even happen that Pr(M = ∞) > 0.

It turns out that for monotone systems we can tackle both of the above problems. We will consider the following example, which shows that in the case of the Ising model we can use the natural partial ordering of the model to decide coalescence efficiently by computing F_{-t}^0 for just two elements, the minimum and maximum configurations. Moreover, the expected running time E[M] will be O(T_mix(1/4) ln |V|). Recall, T_mix(1/4) is the time (from the worst initial state) until the chain is within variation distance at most 1/4 from the stationary distribution.

7.1.4 The Ising Model

Consider the Ising model on an undirected graph G = (V, E). Let Ω = {+1, −1}^V. For σ ∈ Ω, recall the Hamiltonian is defined as

H(σ) = Σ_{(u,v)∈E} 1(σ(u) ≠ σ(v)) = (1/2) Σ_{(u,v)∈E} (1 − σ(u)σ(v)).

At inverse temperature β > 0, the weight of a configuration σ is then

w(σ) = exp(−β H(σ)).

The Gibbs distribution is

π(σ) = w(σ) / Z_G(β),

where Z_G(β) is the partition function (or normalizing constant) defined as

Z_G(β) = Σ_{σ∈Ω} w(σ).

To sample from the Gibbs distribution we will use a simple single-site Markov chain known as the Glauber dynamics (with Metropolis filter). The transitions X_t → X_{t+1} are defined as follows. From X_t:

- Choose v uniformly at random from V, s uniformly at random from {+1, −1}, and r uniformly at random from [0, 1].
- Let X'(v) = s and X'(w) = X_t(w) for all w ≠ v.
- Set X_{t+1} = X' if r ≤ min{1, w(X')/w(X_t)} = min{1, e^{−βH(X')}/e^{−βH(X_t)}}, and X_{t+1} = X_t otherwise.
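The update rule can be sketched as follows. This is a hypothetical minimal implementation (graph stored as an adjacency list); since H only changes on the edges at v, the acceptance ratio is computed locally rather than by re-evaluating the full Hamiltonian.

```python
import math

def glauber_step(sigma, adj, beta, v, s, r):
    """One Metropolis-filtered Glauber update X_t -> X_{t+1}.

    sigma: dict vertex -> spin in {+1, -1}; adj: dict vertex -> neighbours;
    (v, s, r) are the random choices of the update rule above.
    """
    # Delta H from setting sigma(v) = s: H counts disagreeing edges, and only
    # the edges incident to v change.
    delta_h = sum((1 if s != sigma[u] else 0) - (1 if sigma[v] != sigma[u] else 0)
                  for u in adj[v])
    # Accept iff r <= min{1, w(X')/w(X_t)} = min{1, exp(-beta * delta_h)}.
    nxt = dict(sigma)
    if r <= min(1.0, math.exp(-beta * delta_h)):
        nxt[v] = s
    return nxt
```

For example, on the path 0 - 1 - 2 with all spins +1, proposing s = −1 at the middle vertex gives delta_h = 2, so the move is accepted only when r ≤ e^{−2β}.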

There is a natural partial order ⪯ on Ω where σ_1 ⪯ σ_2 iff σ_1(v) ≤ σ_2(v) for all v ∈ V. Let σ_min (all −1) and σ_max (all +1) be the minimum and maximum elements under ⪯.

Let f be the coupling in which v, s, r are the same for all chains. Then X_t ⪯ Y_t implies X_{t+1} ⪯ Y_{t+1}, i.e., the coupling preserves the ordering. To see this, note that

e^{−βH(X')}/e^{−βH(X_t)} = exp( (β/2)(s − X_t(v)) Σ_{u : {u,v}∈E} X_t(u) ).

Thus for s = −1, if chain Y_t makes the move then X_t also does, hence X_{t+1} ⪯ Y_{t+1}. Similarly for s = +1.

Since f preserves monotonicity,

|F_{t_1}^{t_2}(Ω)| = 1  ⟺  F_{t_1}^{t_2}(σ_min) = F_{t_1}^{t_2}(σ_max),

where σ_min and σ_max are the all-(−1) and all-(+1) configurations.

The coupling time of f is the smallest T such that F_0^T(σ_min) = F_0^T(σ_max). The coupling-from-the-past time of f is the smallest M such that F_{-M}^0(σ_min) = F_{-M}^0(σ_max). We have that

Pr(T > t) = Pr(F_0^t(σ_min) ≠ F_0^t(σ_max)) = Pr(F_{-t}^0(σ_min) ≠ F_{-t}^0(σ_max)) = Pr(M > t).

Hence M has the same distribution as T.
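Because the two extreme configurations sandwich every trajectory, the implementation only ever needs to simulate two chains. The following is a hypothetical sketch of monotone CFTP for the Ising Glauber dynamics above (self-contained, so it repeats the single-site update); the stored (v, s, r) triples are drawn once and reused as the lookback window doubles, exactly as in the general algorithm.

```python
import math
import random

def glauber_step(sigma, adj, beta, v, s, r):
    # Metropolis-filtered single-site update; H changes only on edges at v.
    delta_h = sum((1 if s != sigma[u] else 0) - (1 if sigma[v] != sigma[u] else 0)
                  for u in adj[v])
    nxt = dict(sigma)
    if r <= min(1.0, math.exp(-beta * delta_h)):
        nxt[v] = s
    return nxt

def monotone_cftp(adj, beta, rng=random):
    """Exact sample from the Ising Gibbs distribution via monotone CFTP."""
    verts = list(adj)
    seeds = {}  # seeds[t] = (v, s, r) for t = -1, -2, ...; never redrawn
    m = 1
    while True:
        for t in range(-m, 0):
            if t not in seeds:
                seeds[t] = (rng.choice(verts), rng.choice((1, -1)), rng.random())
        bot = {v: -1 for v in verts}  # minimum configuration (all -1)
        top = {v: +1 for v in verts}  # maximum configuration (all +1)
        for t in range(-m, 0):        # apply f_{-m}, ..., f_{-1} in order
            v, s, r = seeds[t]
            bot = glauber_step(bot, adj, beta, v, s, r)
            top = glauber_step(top, adj, beta, v, s, r)
        if bot == top:                # F_{-m}^0(bottom) = F_{-m}^0(top)
            return bot
        m *= 2

config = monotone_cftp({0: [1], 1: [0, 2], 2: [1]}, beta=0.5)
```

Since the coupling preserves ⪯, every other starting configuration is trapped between the bottom and top chains, so their coalescence certifies that F_{-m}^0 is constant on all of Ω.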

Let

d(t) = d_TV(P^t(σ_min, ·), P^t(σ_max, ·)).

Note,

d(t) ≤ 2 max_z d_TV(P^t(z, ·), π).

Thus, if we show T_mix(ε) ≤ t, then d(t) ≤ 2ε.

Lemma 7.2 Pr(M > t) = Pr(T > t) ≤ d(t)|V|.

Proof: For z ∈ Ω let h(z) be the length of the longest decreasing chain in ⪯ starting with z. If X_t > Y_t, then h(X_t) ≥ h(Y_t) + 1.


Now let X_0 = σ_min, Y_0 = σ_max. Then

Pr(T > t) = Pr(X_t ≠ Y_t) ≤ E[h(Y_t) − h(X_t)] = E[h(Y_t)] − E[h(X_t)] ≤ |V| d_TV(P^t(σ_min, ·), P^t(σ_max, ·)) = |V| d(t),

as desired. (The first inequality holds since h(Y_t) − h(X_t) ≥ 1 whenever X_t ≠ Y_t; the second since 0 ≤ h ≤ |V|.)

Lemma 7.3 E[M] = E[T] ≤ 2 T_mix(1/4) log(4|V|).

Proof: Let t = T_mix(1/4) log(4|V|). Note, T_mix(1/(4|V|)) ≤ t. Hence, d(t) ≤ 1/(2|V|). Therefore, by Lemma 7.2 we have Pr(T > t) ≤ 1/2, and hence Pr(T > kt) ≤ (1/2)^k (restarting the argument on each successive length-t window). Thus

E[T] = Σ_{s≥0} Pr(T > s) ≤ Σ_{k≥0} t Pr(T > kt) ≤ t Σ_{k≥0} (1/2)^k = 2t.

References

[1] James Gary Propp and David Bruce Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures & Algorithms 9 (1996), no. 1-2, 223–252.
