Motivation
The Metropolis algorithm has been ranked among the top 10 algorithms in science and engineering. It is used in statistics, econometrics, physics, and computing science. Example: high-dimensional problems such as computing the volume of a convex body in d dimensions.
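Normalizing factor in Bayes' theorem: $p(x \mid y) = \dfrac{p(y \mid x)\,p(x)}{\int_{\mathcal{X}} p(y \mid x')\,p(x')\,dx'}$, where the integral in the denominator is usually intractable in high dimensions.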
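The denominator is a single integral, so in principle it can be estimated by Monte Carlo: average the likelihood over draws from the prior. The sketch below assumes a hypothetical conjugate model ($x \sim \mathcal{N}(0,1)$, $y \mid x \sim \mathcal{N}(x,1)$), chosen only because its evidence is known in closed form ($y \sim \mathcal{N}(0,2)$); the function names are illustrative.

```python
import math
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (elementwise on arrays)."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * math.pi * var)

def estimate_evidence(y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of p(y) = integral of p(y|x) p(x) dx.

    Assumed model: x ~ N(0, 1) (prior), y | x ~ N(x, 1) (likelihood).
    The estimate is the average likelihood over prior draws x^(i) ~ p(x).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)             # x^(i) ~ p(x)
    return float(np.mean(normal_pdf(y, x, 1.0)))   # mean of p(y | x^(i))

y_obs = 1.0
exact = float(normal_pdf(y_obs, 0.0, 2.0))  # closed form for this model: y ~ N(0, 2)
approx = estimate_evidence(y_obs)
print(f"exact={exact:.4f}  estimate={approx:.4f}")
```

In this one-dimensional toy the estimate is accurate; when the likelihood is much more concentrated than the prior, few prior draws land where the likelihood is large and this naive estimator becomes very noisy.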
Rejection Sampling
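Sample from an easy-to-sample distribution q(x) that satisfies $p(x) \le M q(x)$ with $M < \infty$: draw $x^{(i)} \sim q$ and $u \sim \mathcal{U}(0,1)$, and accept $x^{(i)}$ if $u < p(x^{(i)}) / (M q(x^{(i)}))$; otherwise reject it.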
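A minimal rejection sampler, assuming a Beta(2, 2) target $p(x) = 6x(1-x)$ on [0, 1] and a uniform proposal, for which $M = 1.5$ is the smallest valid bound. The function names and parameters are illustrative.

```python
import numpy as np

def target_density(x):
    """Assumed target p(x) = 6 x (1 - x) on [0, 1], i.e. the Beta(2, 2) density."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, M=1.5, seed=0):
    """Draw n samples from target_density with proposal q = Uniform(0, 1).

    M = 1.5 is the maximum of p(x) / q(x), so p(x) <= M q(x) everywhere.
    Returns the accepted samples and the empirical acceptance rate.
    """
    rng = np.random.default_rng(seed)
    samples = []
    n_proposed = 0
    while len(samples) < n:
        x = rng.uniform()   # candidate x ~ q
        u = rng.uniform()   # u ~ U(0, 1)
        n_proposed += 1
        # Accept with probability p(x) / (M q(x)); here q(x) = 1 on [0, 1].
        if u < target_density(x) / M:
            samples.append(x)
    return np.array(samples), n / n_proposed

samples, rate = rejection_sample(20_000)
print(samples.mean(), samples.var(), rate)
```

Accepted draws are exact samples from p; the fraction of proposals accepted is about 1/M, which is why a loose bound M wastes proposals.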
Importance Sampling
Why MCMC?
Naive sampling wastes resources: we need to spend more time sampling the tail of the distribution that overlaps with the event E.
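To make this concrete, the sketch below estimates a rare-event probability $P(X > 4)$ for $X \sim \mathcal{N}(0,1)$. Naive Monte Carlo draws from p and almost never lands in E; importance sampling draws from an assumed proposal $q = \mathcal{N}(4, 1)$ that overlaps the tail and reweights each draw by $w = p/q$. The threshold, proposal, and function name are illustrative choices.

```python
import math
import numpy as np

def tail_probability_estimates(threshold=4.0, n=100_000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by naive MC and by importance sampling.

    The importance proposal q = N(threshold, 1) is an illustrative choice that
    centres the draws on the event E = {x > threshold}.
    """
    rng = np.random.default_rng(seed)

    # Naive Monte Carlo: almost no draws from p fall in E, so most work is wasted.
    x_p = rng.standard_normal(n)
    naive = float(np.mean(x_p > threshold))

    # Importance sampling: draws concentrate in the tail that overlaps with E.
    x_q = rng.normal(loc=threshold, scale=1.0, size=n)
    # w(x) = N(x; 0, 1) / N(x; threshold, 1) = exp(-threshold * x + threshold**2 / 2)
    weights = np.exp(-threshold * x_q + 0.5 * threshold ** 2)
    importance = float(np.mean((x_q > threshold) * weights))
    return naive, importance

exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # P(X > 4) for a standard normal
naive, importance = tail_probability_estimates()
print(f"exact={exact:.3e}  naive={naive:.3e}  importance={importance:.3e}")
```

With the same number of draws, the naive estimate rests on only a handful of hits, while the importance-sampling estimate uses every draw.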
MCMC Principles
Even with adaptation, it is often impossible to obtain proposal distributions that are both easy to sample from and good approximations of the target. MCMC instead uses a Markov chain to explore the state space $\mathcal{X}$. Transition matrices (kernels) are constructed so that the chain spends more time in the important regions.
For any starting point, the chain will converge to the invariant distribution p(x), as long as T is a stochastic transition matrix with two properties:
Irreducibility: the transition graph should be connected, so every state can be reached from every other state.
Aperiodicity: the chain should not get trapped in cycles.
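One way to design an MCMC sampler is to satisfy the detailed balance (reversibility) condition $p(x^{(i)})\,T(x^{(i-1)} \mid x^{(i)}) = p(x^{(i-1)})\,T(x^{(i)} \mid x^{(i-1)})$, which is sufficient, though not necessary, for p(x) to be the invariant distribution. In practice, however, the speed of convergence matters more.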
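The convergence claim can be checked numerically on a small chain. The 3-state transition matrix below is an illustrative assumption; it is irreducible (every state is reachable from every other) and aperiodic (state 2 has a self-loop). Repeatedly multiplying a starting distribution by T, i.e. computing $\mu T^t$, should approach the same p(x) from different starting points.

```python
import numpy as np

# Illustrative 3-state chain; row i holds the probabilities of moving from state i to each state j.
# Irreducible: 1 -> 2 -> 3 -> 1 connects all states. Aperiodic: state 2 has a self-loop.
T = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.1, 0.9],
    [0.6, 0.4, 0.0],
])

def propagate(mu0, T, n_steps=100):
    """Return mu0 T^n_steps, the distribution of the chain after n_steps transitions."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(n_steps):
        mu = mu @ T
    return mu

# Two different starting distributions.
p_from_state_1 = propagate([1.0, 0.0, 0.0], T)
p_from_uniform = propagate([1 / 3, 1 / 3, 1 / 3], T)
print(p_from_state_1, p_from_uniform)
```

Both starting points converge to approximately (0.221, 0.410, 0.369), the solution of $p = pT$.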
Mathematical Representation
Different choices of kernel give rise to different MCMC algorithms. The most celebrated is the Metropolis-Hastings algorithm.
Metropolis-Hastings Algorithm
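A minimal Metropolis-Hastings sketch with an asymmetric proposal, so that the Hastings correction $q(x \mid x^\star)/q(x^\star \mid x)$ actually matters. Assumptions: the target is an unnormalized Gamma(3, 1) density, the proposal is a multiplicative log-normal random walk $x^\star = x\,e^{\sigma \varepsilon}$ (for which the correction reduces to $x^\star / x$), and the function names are hypothetical.

```python
import numpy as np

def log_target(x):
    """Assumed unnormalized log density of Gamma(shape=3, rate=1): log(x^2 e^{-x}), x > 0."""
    return 2.0 * np.log(x) - x

def metropolis_hastings(n_iter=50_000, x0=1.0, sigma=0.5, seed=0):
    """Metropolis-Hastings with a multiplicative (log-normal) random-walk proposal.

    Proposal: x* = x * exp(sigma * eps), eps ~ N(0, 1). This q is not symmetric;
    its Hastings correction q(x | x*) / q(x* | x) equals x* / x.
    Returns the chain and the fraction of accepted proposals.
    """
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_iter)
    n_accepted = 0
    for i in range(n_iter):
        x_star = x * np.exp(sigma * rng.standard_normal())
        # log of the MH ratio: log p(x*) - log p(x) + log q(x | x*) - log q(x* | x)
        log_ratio = log_target(x_star) - log_target(x) + np.log(x_star) - np.log(x)
        # Accept with probability min(1, exp(log_ratio)); otherwise keep x.
        if np.log(rng.uniform()) < log_ratio:
            x = x_star
            n_accepted += 1
        chain[i] = x
    return chain, n_accepted / n_iter

chain, acceptance_rate = metropolis_hastings()
samples = chain[5_000:]  # discard burn-in
print(samples.mean(), acceptance_rate)
```

The first draws depend on the starting point, so they are usually discarded as burn-in; for this target the mean of the remaining samples should be close to the Gamma(3, 1) mean of 3.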
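Rejection term: the MH transition kernel is $K_{MH}(x^{(i+1)} \mid x^{(i)}) = q(x^{(i+1)} \mid x^{(i)})\,\mathcal{A}(x^{(i)}, x^{(i+1)}) + \delta_{x^{(i)}}(x^{(i+1)})\,r(x^{(i)})$, with acceptance probability $\mathcal{A}(x, x^\star) = \min\left\{1, \frac{p(x^\star)\,q(x \mid x^\star)}{p(x)\,q(x^\star \mid x)}\right\}$ and rejection term $r(x^{(i)}) = \int_{\mathcal{X}} q(x^\star \mid x^{(i)})\,\bigl(1 - \mathcal{A}(x^{(i)}, x^\star)\bigr)\,dx^\star$.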
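Detailed balance: the MH kernel $K_{MH}$ satisfies $p(x^{(i)})\,K_{MH}(x^{(i+1)} \mid x^{(i)}) = p(x^{(i+1)})\,K_{MH}(x^{(i)} \mid x^{(i+1)})$, so p(x) is an invariant distribution of the chain.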
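On a finite state space the kernel can be built explicitly, which makes the rejection term and detailed balance easy to check. In the sketch below, the target p and proposal matrix Q are arbitrary illustrative choices and `mh_kernel` is a hypothetical helper: off-diagonal entries are $q(j \mid i)\,\mathcal{A}(i, j)$, and the diagonal collects whatever probability is left over (the proposal's own mass on i plus the rejection term).

```python
import numpy as np

def mh_kernel(p, Q):
    """Build the Metropolis-Hastings transition matrix K on a finite state space.

    p: target distribution (1-D array with positive entries).
    Q: proposal matrix, Q[i, j] = q(j | i), each row summing to 1.
    K[i, j] = q(j | i) A(i, j) for j != i; K[i, i] = q(i | i) + r(i),
    where r(i) = sum over j != i of q(j | i) (1 - A(i, j)) is the rejection term.
    """
    n = len(p)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and Q[i, j] > 0:
                accept = min(1.0, (p[j] * Q[j, i]) / (p[i] * Q[i, j]))
                K[i, j] = Q[i, j] * accept
        K[i, i] = 1.0 - K[i].sum()  # leftover mass: q(i | i) plus rejections
    return K

# Illustrative target and (asymmetric) proposal.
p = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([
    [0.10, 0.40, 0.30, 0.20],
    [0.30, 0.10, 0.30, 0.30],
    [0.25, 0.25, 0.25, 0.25],
    [0.40, 0.30, 0.20, 0.10],
])
K = mh_kernel(p, Q)
flux = p[:, None] * K  # flux[i, j] = p(i) K(j | i)
print(np.allclose(flux, flux.T), np.allclose(p @ K, p))
```

Because $p_i\, q(j \mid i)\, \mathcal{A}(i,j) = \min\{p_i\, q(j \mid i),\, p_j\, q(i \mid j)\}$ is symmetric in i and j, detailed balance holds by construction, and summing it over i gives invariance, $pK = p$.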
Metropolis Algorithm
Assumes a symmetric random walk proposal.
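With a symmetric proposal, $q(x^\star \mid x^{(i)}) = q(x^{(i)} \mid x^\star)$, the Hastings correction cancels and only a ratio of target values remains; writing $p(x) = \tilde{p}(x)/Z$ shows that the normalizing constant Z cancels as well.

```latex
\mathcal{A}(x^{(i)}, x^\star)
  = \min\left\{1, \frac{p(x^\star)\, q(x^{(i)} \mid x^\star)}{p(x^{(i)})\, q(x^\star \mid x^{(i)})}\right\}
  = \min\left\{1, \frac{p(x^\star)}{p(x^{(i)})}\right\}
  = \min\left\{1, \frac{\tilde{p}(x^\star)}{\tilde{p}(x^{(i)})}\right\}
```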
The normalizing constant of the target distribution is not required, because it cancels in the acceptance ratio. Parallelization: several independent chains can be simulated in parallel. Success or failure depends on the parameters chosen for the proposal distribution, such as the random-walk step size.
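A random-walk Metropolis sketch illustrating these three points: it only evaluates an unnormalized log density, it runs several independent chains side by side (vectorized with NumPy here rather than on separate processors), and it can be rerun with different step sizes. The standard normal target, the step sizes tried, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def log_unnormalized_target(x):
    """log p~(x) for a standard normal target, with the constant 1/sqrt(2*pi) dropped."""
    return -0.5 * x ** 2

def random_walk_metropolis(step_size, n_iter=20_000, n_chains=4, seed=0):
    """Random-walk Metropolis run on several independent chains (vectorized over chains).

    Proposal x* = x + step_size * eps, eps ~ N(0, 1), is symmetric, so the
    acceptance probability is min(1, p~(x*) / p~(x)); Z is never evaluated.
    Returns the chains (n_iter x n_chains) and per-chain acceptance rates.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=3.0, size=n_chains)  # dispersed starting points
    chains = np.empty((n_iter, n_chains))
    n_accepted = np.zeros(n_chains)
    for t in range(n_iter):
        x_star = x + step_size * rng.standard_normal(n_chains)
        log_ratio = log_unnormalized_target(x_star) - log_unnormalized_target(x)
        accept = np.log(rng.uniform(size=n_chains)) < log_ratio
        x = np.where(accept, x_star, x)
        n_accepted += accept
        chains[t] = x
    return chains, n_accepted / n_iter

results = {step: random_walk_metropolis(step) for step in (0.1, 2.4, 50.0)}
for step, (chains, rates) in results.items():
    print(f"step={step:5.1f}  acceptance={rates.mean():.2f}  mean={chains[2_000:].mean():+.3f}")
```

A very small step accepts almost every move but explores slowly; a very large step is rejected almost always. Both extremes mix poorly, which is why the proposal parameters decide success or failure.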
Simulated Annealing
Global optimization: given samples $x^{(i)}$, $i = 1, \dots, N$, from p(x), the mode could be estimated by $\hat{x} = \arg\max_{x^{(i)};\, i = 1, \dots, N} p(x^{(i)})$.
Inefficient because random samples rarely come from the vicinity of the mode (blind sampling unless the distribution has large probability mass around the mode). Simulated Annealing is a variant of MCMC/Metropolis-Hastings that solves this problem.
Simulated Annealing
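A minimal simulated annealing sketch, assuming the usual construction in which iteration i runs a Metropolis step targeting $p^{1/T_i}(x)$ with a temperature $T_i$ that decreases towards 0, so the tempered target concentrates on the global mode. The bimodal target, the geometric cooling schedule, and tracking the best state visited are illustrative choices; the function names are hypothetical.

```python
import numpy as np

def log_p_tilde(x):
    """Assumed unnormalized log target: 2 N(1.5, 1) + N(-1.5, 1) up to constants.

    It has a local mode near -1.4 and its global mode near 1.5.
    """
    return np.logaddexp(np.log(2.0) - 0.5 * (x - 1.5) ** 2, -0.5 * (x + 1.5) ** 2)

def simulated_annealing(x0=-3.0, n_iter=20_000, t_start=5.0, t_end=0.005, step=0.5, seed=0):
    """Metropolis steps on p(x)^(1/T_i) with a geometrically decreasing temperature T_i.

    Returns the state with the highest p(x) visited during the run.
    """
    rng = np.random.default_rng(seed)
    temps = t_start * (t_end / t_start) ** (np.arange(n_iter) / (n_iter - 1))
    x = x0
    log_p_x = log_p_tilde(x)
    best_x, best_log_p = x, log_p_x
    for temp in temps:
        x_star = x + step * rng.standard_normal()
        log_p_star = log_p_tilde(x_star)
        # Tempered acceptance: min(1, (p(x*) / p(x))^(1 / T_i))
        if np.log(rng.uniform()) < (log_p_star - log_p_x) / temp:
            x, log_p_x = x_star, log_p_star
            if log_p_x > best_log_p:
                best_x, best_log_p = x, log_p_x
    return float(best_x)

best = simulated_annealing()
print(best)
```

Starting at x0 = -3, on the side of the lower mode, the chain can cross the shallow barrier while the temperature is still high and should settle near the global mode close to x = 1.5 as it cools.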
Other Methods
Mixture of kernels: can be very useful when the target distribution has many peaks.
Global proposals explore vast regions of the state space and lock into peaks; local proposals discover finer details by exploring the space around each peak.
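If two kernels $K_1$ and $K_2$ each leave p invariant, so does the mixture $\nu K_1 + (1 - \nu) K_2$ for $0 \le \nu \le 1$. The sketch below assumes a bimodal target with well-separated peaks at ±5, a global independence proposal $\mathcal{N}(0, 6^2)$ used with probability ν = 0.3, and a local Gaussian random walk otherwise; all names and settings are hypothetical.

```python
import numpy as np

def log_target(x):
    """Assumed unnormalized log density of an equal mixture of N(-5, 1) and N(5, 1)."""
    return np.logaddexp(-0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2)

GLOBAL_SD = 6.0  # hypothetical scale of the global independence proposal N(0, GLOBAL_SD^2)

def log_global_q(x):
    """Unnormalized log density of the global proposal; its constant cancels in the ratio."""
    return -0.5 * (x / GLOBAL_SD) ** 2

def mixture_kernel_mcmc(n_iter=50_000, nu=0.3, local_step=0.5, x0=-5.0, seed=0):
    """Each iteration applies the global MH kernel with probability nu, else the local one."""
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        if rng.uniform() < nu:
            # Global independence proposal x* ~ N(0, GLOBAL_SD^2), ignoring the current x;
            # the MH ratio includes the correction q(x) / q(x*).
            x_star = GLOBAL_SD * rng.standard_normal()
            log_ratio = (log_target(x_star) - log_target(x)) + (log_global_q(x) - log_global_q(x_star))
        else:
            # Local symmetric random walk around the current peak.
            x_star = x + local_step * rng.standard_normal()
            log_ratio = log_target(x_star) - log_target(x)
        if np.log(rng.uniform()) < log_ratio:
            x = x_star
        chain[i] = x
    return chain

chain = mixture_kernel_mcmc()
local_only = mixture_kernel_mcmc(n_iter=20_000, nu=0.0)
print(np.mean(chain > 0), np.mean(local_only > 0))
```

With the global kernel the chain visits both peaks in roughly equal proportion; with ν = 0 (local moves only) a chain started at -5 should stay trapped in the left peak.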