October 19, 2024

About

Gaussian process models are usually given continuous variables as inputs. But is it possible to handle categorical variables in a Gaussian process?

BoTorch

BoTorch actually provides a model, and a kernel for it, that handle categorical variables. Below I have excerpted the parts of the documentation that explain them (https://botorch.org/api/models.html).

Model: MixedSingleTaskGP

It supports mixed search spaces, which combine discrete and continuous features, as well as solely discrete spaces. It uses a kernel that combines a CategoricalKernel (based on Hamming distances) and a regular kernel into a single kernel, whose form is given under "How to compose these Kernels".

Kernel: CategoricalKernel

Computes exp(-dist(x1, x2) / lengthscale), where dist(x1, x2) is zero if x1 == x2 and one if x1 != x2. If the last dimension is not a batch dimension, then the mean is considered.

What is important here is that this kernel is NOT differentiable with respect to the inputs, because the distance is only a 0/1 indicator of whether the two values are equal.
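The kernel above can be sketched in a few lines of plain Python. This is not BoTorch's implementation, only an illustration: it assumes a single shared lengthscale, and it reads "the mean is considered" as averaging the 0/1 mismatches over the categorical features.

```python
import math

def categorical_kernel(c1, c2, lengthscale=1.0):
    """Return exp(-d / lengthscale), where d is the mean per-feature
    mismatch between c1 and c2 (0 if equal, 1 if different)."""
    if len(c1) != len(c2) or len(c1) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = [0.0 if a == b else 1.0 for a, b in zip(c1, c2)]
    dist = sum(mismatches) / len(mismatches)
    return math.exp(-dist / lengthscale)

# Two categorical features per point (hypothetical values).
k_same = categorical_kernel(["red", "small"], ["red", "small"])  # distance 0
k_half = categorical_kernel(["red", "small"], ["red", "large"])  # distance 0.5
k_diff = categorical_kernel(["red", "small"], ["blue", "large"])  # distance 1
```

The kernel value only changes in jumps when a category changes, which is why there is no gradient with respect to the inputs.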

How to compose these Kernels

K((x_1, c_1), (x_2, c_2)) = K_{\text{cont\_1}}(x_1, x_2) + K_{\text{cat\_1}}(c_1, c_2) + K_{\text{cont\_2}}(x_1, x_2) \cdot K_{\text{cat\_2}}(c_1, c_2)
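The sum-plus-product structure of this formula can be sketched in plain Python as follows. The RBF kernel is only a stand-in for the "regular" continuous kernel, the function names and lengthscale parameters are my own, and the output scales that a real model would learn are left out.

```python
import math

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Stand-in continuous kernel (assumption): squared-exponential."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def categorical_kernel(c1, c2, lengthscale=1.0):
    """exp(-d / lengthscale) with d the mean 0/1 mismatch over features."""
    dist = sum(a != b for a, b in zip(c1, c2)) / len(c1)
    return math.exp(-dist / lengthscale)

def mixed_kernel(x1, c1, x2, c2, ls_cont=(1.0, 1.0), ls_cat=(1.0, 1.0)):
    """K = K_cont_1 + K_cat_1 + K_cont_2 * K_cat_2, each with its own lengthscale."""
    sum_part = rbf_kernel(x1, x2, ls_cont[0]) + categorical_kernel(c1, c2, ls_cat[0])
    prod_part = rbf_kernel(x1, x2, ls_cont[1]) * categorical_kernel(c1, c2, ls_cat[1])
    return sum_part + prod_part

# Same continuous point, same vs. different category (hypothetical values).
k_same = mixed_kernel([0.2, 0.5], ["a"], [0.2, 0.5], ["a"])
k_diff_cat = mixed_kernel([0.2, 0.5], ["a"], [0.2, 0.5], ["b"])
```

The sum terms let each kind of feature contribute on its own, while the product term only stays large when the points are similar in both the continuous and the categorical part.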

Better than One-Hot-Encoding?

In a discussion in the BoTorch GitHub repository (https://github.com/pytorch/botorch/discussions/1450), one participant says that the Bayesian optimization library "Ax" can handle categorical variables by one-hot encoding, while the person who asked the question implies that this approach works well.