January 18, 2025

As we can see in a article(https://eye.kohei-kevin.com/2024/04/20/review-map-mle-bayes-estimation/) (which I posted before), MCMC algorithms are used for do bayes estimation. Generally there are 2 ways of the Bayes estimation using MCMC, Metropolis-Hastings and Gibbs Sampling. I will summarize these two methods and its difference.

The difference

  • Proposal Distribution: In Metropolis-Hastings, any proposal distribution can be used, whereas in Gibbs sampling, the proposal distribution is automatically the conditional distribution of the component being sampled.
  • Acceptance Probability: Metropolis-Hastings requires an acceptance step, but in Gibbs sampling, every proposal is accepted with probability 1 because it follows the conditional distribution directly.

Metropolis-Hastings

The Metropolis-Hastings algorithm uses a proposal distribution “q” to suggest the next state and accepts or rejects this proposal with a certain acceptance probability. This can be expressed with the following steps:

\begin{align*}
\textbf{1. Proposal Step:} & \\
\text{From the current state } x^{(t)}, & \\
\text{generate a new state } x' \text{ using the proposal distribution } q(x' \mid x^{(t)}). & \\
\textbf{2. Acceptance Step:} & \\
\text{Compute the acceptance probability } \alpha(x^{(t)}, x'): & \\
\alpha(x^{(t)}, x') = \min\left(1, \frac{p(x') \, q(x^{(t)} \mid x')}{p(x^{(t)}) \, q(x' \mid x^{(t)})}\right) & \\
\text{where } p(x) \text{ is the target distribution.} & \\
\textbf{3. Update Step:} & \\
\text{Generate a uniform random number } u \sim \text{Uniform}(0, 1), & \\
\text{and if } u \leq \alpha(x^{(t)}, x'), \text{ set } x^{(t+1)} = x'; & \\
\text{otherwise, } \text{ set } x^{(t+1)} = x^{(t)}. &
\end{align*}

Gibbs Sampling

Gibbs sampling is a method for sequentially sampling each component of a multi-dimensional random variable from its conditional distributions.

Suppose you have a multi-dimensional random variable

\mathbf{x} = (x_1, x_2, \ldots, x_n)

In Gibbs sampling, each component is updated sequentially. For the i-th component, fix all other components xi​ and sample xi​ from the conditional distribution:

x_i \sim p(x_i \mid \mathbf{x}_{-i})

This process is repeated for each component, cycling through the vector x repeatedly to generate a sequence of samples approximating the joint distribution.