January 18, 2025

The Bayesian Information Criterion (BIC) is a measure used for statistical model selection. It helps to determine which model is most appropriate by considering both the complexity of the model and its fit to the data.

\begin{align}
\text{BIC} = -2 \log(\hat{L}) + k \log(n)
\end{align}

where $\hat{L}$ is the maximized value of the likelihood function, n is the number of data samples, and k is the number of independent parameters in the model. The lower the BIC score, the better the model. In practice, you fit several candidate models and select the one with the lowest BIC score.
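To make the formula concrete, here is a minimal sketch that computes BIC by hand for a fitted Gaussian mixture and compares it with scikit-learn's `GaussianMixture.bic`. The synthetic data, the choice of two components, and the parameter count for a full-covariance mixture are assumptions made for illustration.

```python
import numpy as np
from sklearn import mixture

# Synthetic 2-D data from two well-separated Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=-3.0, scale=1.0, size=(150, 2)),
    rng.normal(loc=3.0, scale=1.0, size=(150, 2)),
])

gmm = mixture.GaussianMixture(
    n_components=2, covariance_type="full", random_state=0
)
gmm.fit(data)

n, d = data.shape
m = gmm.n_components

# Free parameters of a full-covariance GMM:
# means (m * d) + covariance matrices (m * d * (d + 1) / 2) + weights (m - 1).
k = m * d + m * d * (d + 1) // 2 + (m - 1)

# score() returns the average log-likelihood per sample, so multiply by n
# to get the total log-likelihood log(L-hat).
log_likelihood = gmm.score(data) * n

bic_manual = -2.0 * log_likelihood + k * np.log(n)
print(bic_manual, gmm.bic(data))
```

The two printed values should agree up to floating-point error, since the manual computation follows the same formula.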

Comparison with Variational Bayes Method

You might know the Variational Bayes method (VB), which can also fit a Gaussian mixture model (GMM) and estimates the mixture weight of each Gaussian component as part of the fit. So which one should you choose?

I think it depends on the case.

As you know, VB fits a model to the data starting from an initial distribution. However, the fit does not move far when there are only a few data points, so the result remains relatively unchanged from the initial distribution.

On the other hand, for models selected by BIC, the penalty term k log(n) depends directly on the number of data points. Therefore, if the amount of data is small compared to the number of parameters, BIC may be the more appropriate choice. Although the theoretical background of VB (Variational Bayes) is impressive and appealing, in that situation you should choose the EM algorithm with BIC.
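For reference, the variational approach is available in scikit-learn as `BayesianGaussianMixture`. The sketch below fits it to a deliberately small synthetic sample so you can inspect the estimated mixture weights yourself. The data, the upper bound of 10 components, and the prior type are assumptions chosen for illustration, not a recommendation.

```python
import numpy as np
from sklearn import mixture

# A deliberately small sample: 20 points from each of two blobs (illustrative).
rng = np.random.default_rng(0)
small_data = np.vstack([
    rng.normal(loc=-3.0, scale=1.0, size=(20, 2)),
    rng.normal(loc=3.0, scale=1.0, size=(20, 2)),
])

# n_components is only an upper bound here; the variational fit estimates
# a weight for every component.
vb = mixture.BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_distribution",
    max_iter=500,
    random_state=0,
)
vb.fit(small_data)

# Estimated mixture weights, one per component.
print(np.round(vb.weights_, 3))
```

Rerunning this with more samples per blob, or with a different `weight_concentration_prior`, is a simple way to see how strongly the result depends on the starting assumptions when data is scarce.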

Implementation of GMM with BIC using Scikit-Learn

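from sklearn import mixture
import numpy as np

def estimate_gmm_bic(data):
    """Fit GMMs with 1 to 19 components and return the one with the lowest BIC."""
    gmm_list = []
    bic_list = []

    # Candidate numbers of components; data needs at least 19 samples,
    # because a GMM cannot have more components than samples.
    components_range = range(1, 20)

    for n_components in components_range:
        # Fit with the EM algorithm (scikit-learn's default settings).
        gmm = mixture.GaussianMixture(
            n_components=n_components
        )
        gmm.fit(data)

        # Keep the BIC score and the fitted model at the same index.
        bic_list.append(gmm.bic(data))
        gmm_list.append(gmm)

    # The lowest BIC marks the preferred model.
    best_arg = np.argmin(bic_list)
    gmm_best = gmm_list[best_arg]
    return gmm_best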
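
As a usage sketch, the same selection loop can be run on synthetic data. The block below restates the idea in a compact, self-contained form so it runs on its own. The helper name `select_gmm_by_bic`, its `max_components` and `random_state` parameters, and the three-blob data set are hypothetical choices for illustration.

```python
import numpy as np
from sklearn import mixture

def select_gmm_by_bic(data, max_components=10, random_state=0):
    """Fit GMMs with 1..max_components components.

    Returns the model with the lowest BIC and the list of all BIC scores.
    """
    models = [
        mixture.GaussianMixture(
            n_components=c, random_state=random_state
        ).fit(data)
        for c in range(1, max_components + 1)
    ]
    bic_scores = [model.bic(data) for model in models]
    return models[int(np.argmin(bic_scores))], bic_scores

# Three synthetic 2-D blobs, 200 points each (illustrative only).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(-5.0, 0.0), scale=1.0, size=(200, 2)),
    rng.normal(loc=(0.0, 5.0), scale=1.0, size=(200, 2)),
    rng.normal(loc=(5.0, 0.0), scale=1.0, size=(200, 2)),
])

best, bic_scores = select_gmm_by_bic(data)
print("selected number of components:", best.n_components)
```

Returning the BIC scores alongside the model makes it easy to plot the score against the number of components and check how clear the minimum is.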