October 19, 2024


Multi-Output (MOGP) and Multi-Task (MTGP)

The BoTorch documentation mentions that multi-task and multi-output Gaussian processes are different.

An MOGP is useful for capturing interdependencies among outputs and is suited to predicting correlated physical quantities, such as weather data. An MTGP, on the other hand, aims to improve the learning of each individual task by sharing information between tasks; by modeling the correlations between tasks, it can make better predictions than learning each task in isolation.

Intrinsic Co-Regionalization Model (Multi-Task GPs)

The Intrinsic Co-Regionalization Model (ICM) is a popular formulation of the multi-task GP. This article explains some versions of the ICM that are implemented in BoTorch.

Kronecker Structure Kernel Model

The ICM kernel using the Kronecker structure is written as follows:

\begin{align}
K_{\text{multi}}\big((t, x), (t_0, x_0)\big) = K_t(t, t_0)\, K_x(x, x_0),
\qquad
K_{\text{multi}} = K_t \otimes K_x
\end{align}

Here $K_t$ is the task kernel matrix, i.e. the covariance between tasks. Taking the Kronecker product with the data kernel $K_x$ used in an ordinary Gaussian Process gives the full kernel matrix to be optimized. This formulation is elegant: it represents the correlations between tasks directly, and it adds only a few extra parameters.

This structure basically assumes isotopic datasets, i.e. every task is observed at the same input locations.
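As a small illustration of this structure (a minimal sketch with made-up toy matrices, not taken from the BoTorch docs), the full covariance over all (task, input) pairs is simply the Kronecker product of the two smaller matrices:

import torch

# Toy 2x2 task covariance K_t and 3x3 data covariance K_x (illustrative values only)
K_t = torch.tensor([[1.0, 0.6],
                    [0.6, 1.0]])
K_x = torch.tensor([[1.0, 0.5, 0.2],
                    [0.5, 1.0, 0.5],
                    [0.2, 0.5, 1.0]])

# Full multi-task covariance over all (task, input) pairs: (2*3) x (2*3)
K_multi = torch.kron(K_t, K_x)
print(K_multi.shape)  # torch.Size([6, 6])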

Because GPyTorch provides MultitaskMean and MultitaskKernel, the implementation is easy:

import torch
import gpytorch


class MultitaskGPModel_kronecker(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel_kronecker, self).__init__(train_x, train_y, likelihood)
        # A constant mean for each of the 2 tasks
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=2
        )
        # MultitaskKernel builds the task covariance K_t (rank-1 factor plus diagonal)
        # and takes its Kronecker product with the RBF data kernel K_x
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


# train_x: n x d inputs, train_y: n x 2 targets (one column per task)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=2)
model = MultitaskGPModel_kronecker(train_x, train_y, likelihood)
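
Training then follows the usual exact-GP recipe; a minimal sketch (the Adam learning rate, iteration count, and test grid below are arbitrary illustrative choices):

# Find the model hyperparameters via the exact marginal log likelihood
model.train()
likelihood.train()

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for _ in range(50):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()

# The predictive mean has one column per task
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    predictions = likelihood(model(test_x))
    mean = predictions.mean  # shape: 51 x 2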

Reference

E. Bonilla, K. Chai and C. Williams. Multi-task Gaussian Process Prediction. Advances in Neural Information Processing Systems 20, NeurIPS 2007.

K. Swersky, J. Snoek and R. Adams. Multi-Task Bayesian Optimization. Advances in Neural Information Processing Systems 26, NeurIPS 2013.

https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html

Hadamard Structure Kernel Model

Another idea is to use a Hadamard (element-wise product) structure to represent the kernel matrix. The task kernel is built from the task index attached to each observation.

\begin{align}
K_{\text{multi}}\big((i, x), (i_0, x_0)\big) = K_{II}(i, i_0)\, K_{XX}(x, x_0),
\qquad
K_{\text{multi}} = K_{II} \circ K_{XX}
\end{align}

This structure handles heterotopic data, whereas the Kronecker structure assumes isotopic data. The number of parameters is the same as in the Kronecker model, but the formulation is more flexible. On the other hand, it is less computationally efficient for isotopic data, because every observation must carry its task index, so the number of rows passed to the model is the number of inputs multiplied by the number of tasks.

Even though it uses IndexKernel (which is what MultitaskKernel uses internally), the implementation is a bit more involved than the Kronecker multitask model above:

class MultitaskGPModel_hadamard(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel_hadamard, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

        # IndexKernel learns a 2x2 task covariance matrix
        # (a rank-1 factor plus a diagonal term)
        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)

    def forward(self, x, i):
        mean_x = self.mean_module(x)

        # Get input-input covariance
        covar_x = self.covar_module(x)
        # Get task-task covariance from the task indices
        covar_i = self.task_covar_module(i)
        # Element-wise (Hadamard) product of the two gives the covariance we want
        covar = covar_x.mul(covar_i)

        return gpytorch.distributions.MultivariateNormal(mean_x, covar)


likelihood = gpytorch.likelihoods.GaussianLikelihood()

# Each observation carries its task index (0 or 1)
train_i_task1 = torch.full((train_x1.shape[0], 1), dtype=torch.long, fill_value=0)
train_i_task2 = torch.full((train_x2.shape[0], 1), dtype=torch.long, fill_value=1)

# The two tasks may have different inputs (heterotopic data)
full_train_x = torch.cat([train_x1, train_x2])
full_train_i = torch.cat([train_i_task1, train_i_task2])
full_train_y = torch.cat([train_y1, train_y2])

# Here we have two items that we're passing in as train_inputs
model = MultitaskGPModel_hadamard((full_train_x, full_train_i), full_train_y, likelihood)
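
At prediction time, the test task indices have to be passed alongside the test inputs; a minimal sketch (the test grid is an illustrative choice):

model.eval()
likelihood.eval()

test_x = torch.linspace(0, 1, 51)
# Predict the first task (index 0) at every test point; use fill_value=1 for the second task
test_i_task1 = torch.full((test_x.shape[0], 1), dtype=torch.long, fill_value=0)

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred_y1 = likelihood(model(test_x, test_i_task1))
    mean_y1 = observed_pred_y1.mean  # predictions for the first task only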

Reference

W. Maddox, M. Balandat, A. Wilson and E. Bakshy. Bayesian Optimization with High-Dimensional Outputs. Advances in Neural Information Processing Systems 34, NeurIPS 2021. https://arxiv.org/abs/2106.12997

https://docs.gpytorch.ai/en/latest/examples/03_Multitask_Exact_GPs/Hadamard_Multitask_GP_Regression.html

Fully Bayesian ICM

The BoTorch documentation says that “the kernel uses the SAAS prior to model high-dimensional parameter spaces”.
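
A minimal sketch of how this model can be used from BoTorch, assuming the SaasFullyBayesianMultiTaskGP model class and the fit_fully_bayesian_model_nuts routine (the data shapes and NUTS settings below are placeholders; check the current BoTorch API):

import torch
from botorch.models.fully_bayesian_multitask import SaasFullyBayesianMultiTaskGP
from botorch.fit import fit_fully_bayesian_model_nuts

# Toy data: the last column of train_X holds the task index (0 or 1)
train_X = torch.rand(20, 4)
train_X[:, -1] = torch.randint(0, 2, (20,)).to(train_X)
train_Y = torch.randn(20, 1)

model = SaasFullyBayesianMultiTaskGP(train_X, train_Y, task_feature=-1)

# Fit the SAAS hyperparameter posterior with NUTS (settings are placeholders)
fit_fully_bayesian_model_nuts(
    model, warmup_steps=256, num_samples=128, thinning=16, disable_progbar=True
)

After fitting, the model's posterior averages over the MCMC samples of the kernel hyperparameters, as with the other fully Bayesian SAAS models in BoTorch.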

Reference

D. Eriksson and M. Jankowiak. High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021.