October 19, 2024

EM Steps in general

Given the joint distribution p(X, Z) of the observed variables X and the latent variables Z, parameterized by θ, maximization of the log-likelihood function with respect to θ can be achieved through the following steps:

1. Initialize Parameters

Initialize the parameters θ.

2. E STEP

Compute the posterior distribution of Z, p(Z | X, θ), using the current parameter values.

3. M STEP

Evaluate the expectation of the complete-data log-likelihood ln p(X, Z | θ) with respect to the posterior distribution from the E step, and update θ by maximizing this expectation.

4. Check whether the log-likelihood has converged

If the change in the log-likelihood (or in the parameters) falls below a chosen threshold, stop the process; otherwise return to step 2. A compact end-to-end sketch of these four steps is given below.
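
The following is a minimal, self-contained sketch of the four steps for a simple 1-D, two-component Gaussian mixture (anticipating the next section). The synthetic data, the component count K, and the helper gaussian_pdf are illustrative assumptions, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (illustrative only).
X = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
K = 2

# 1. Initialize the parameters θ = (pi, mu, var).
pi = np.full(K, 1.0 / K)
mu = rng.choice(X, size=K, replace=False)
var = np.full(K, X.var())

def gaussian_pdf(x, mean, variance):
    return np.exp(-0.5 * (x - mean) ** 2 / variance) / np.sqrt(2 * np.pi * variance)

prev_ll = -np.inf
for iteration in range(200):
    # 2. E step: posterior p(z_k = 1 | x) for every point (responsibilities).
    weighted = pi * gaussian_pdf(X[:, None], mu, var)   # shape (N, K)
    gamma = weighted / weighted.sum(axis=1, keepdims=True)

    # 3. M step: re-estimate pi, mu, var from the responsibilities.
    Nk = gamma.sum(axis=0)
    pi = Nk / len(X)
    mu = (gamma * X[:, None]).sum(axis=0) / Nk
    var = (gamma * (X[:, None] - mu) ** 2).sum(axis=0) / Nk

    # 4. Convergence check on the log-likelihood of the current E step.
    ll = np.log(weighted.sum(axis=1)).sum()
    if ll - prev_ll < 1e-6:
        break
    prev_ll = ll

print("mixing coefficients:", pi, "means:", mu, "variances:", var)
```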

In the case of Gaussian mixtures

Formulation

A Gaussian mixture model is defined as follows:

p(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k)

where K is the number of components, the mixing coefficients π_k are non-negative and sum to 1, and N(x | μ_k, Σ_k) is a Gaussian density with mean μ_k and covariance Σ_k.

Latent Variables Z

The latent variable z can be represented as a one-hot encoded vector of length K, where each element z_k is a binary indicator (0 or 1). The probability that z_k = 1 (i.e., the data point comes from the k-th component) is given by the mixing coefficient π_k.
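
A small sketch of the generative role of z may help: draw a one-hot vector z from the mixing coefficients, then draw x from the component that z selects. The specific values of pi, mu, and Sigma below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                # mixing coefficients, sum to 1
mu = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
Sigma = np.stack([np.eye(2)] * 3)             # one covariance matrix per component

k = rng.choice(len(pi), p=pi)                 # pick a component with probability pi_k
z = np.eye(len(pi), dtype=int)[k]             # one-hot latent vector with z_k = 1
x = rng.multivariate_normal(mu[k], Sigma[k])  # observation from the chosen component

print("z =", z, "x =", x)
```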

The posterior distribution of z_k given x is given by:

γ(z_k) = p(z_k = 1 | x) = π_k N(x | μ_k, Σ_k) / Σ_{j=1}^{K} π_j N(x | μ_j, Σ_j)

This expression represents the posterior probability (the responsibility) that the data point belongs to the k-th component, conditioned on x and the parameters of the model.
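
A minimal sketch of this computation (the E step) is below; it evaluates the responsibility γ(z_k) for every data point and component. The function name e_step and the use of scipy.stats.multivariate_normal are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mu, Sigma):
    """Return the (N, K) matrix of responsibilities γ(z_nk)."""
    N, K = X.shape[0], len(pi)
    weighted = np.empty((N, K))
    for k in range(K):
        # Numerator of the posterior: π_k N(x | μ_k, Σ_k)
        weighted[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    # Normalize each row so the responsibilities of a point sum to 1.
    return weighted / weighted.sum(axis=1, keepdims=True)
```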

The parameters of the distribution

Since the components of this model are Gaussian, the parameters are the means μ_k, the covariance matrices Σ_k, and the mixing coefficients π_k. These parameters are updated in the M step so that they fit the data points, as in the sketch below.
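
A minimal sketch of these M-step updates, assuming responsibilities gamma from an E step like the one above (the function name m_step is an illustrative choice):

```python
import numpy as np

def m_step(X, gamma):
    """Return updated (pi, mu, Sigma) from data X of shape (N, D) and gamma of shape (N, K)."""
    N, D = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)               # effective number of points per component
    pi = Nk / N                          # new mixing coefficients
    mu = (gamma.T @ X) / Nk[:, None]     # new means, shape (K, D)
    Sigma = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]                 # data centered on the new mean of component k
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mu, Sigma
```

Alternating e_step and m_step, with a convergence check on the log-likelihood as in the loop sketched earlier, gives the full EM procedure for the mixture.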