October 19, 2024

About

TuRBO is an algorithm that adaptively determines a specific local region, termed the ‘trusted region,’ based on the outcomes of Bayesian optimization using Gaussian processes. This region is adjusted to focus the search on areas where improvements are most likely, using model uncertainty estimates to guide exploration and exploitation strategies. How are these uncertainty estimates derived and utilized in adjusting the trusted region?

TuRBO code in BoTorch

If you see implementation(https://botorch.org/tutorials/turbo_1) in BoTorch, you would find this code.

x_center = X[Y.argmax(), :].clone()
weights = model.covar_module.base_kernel.lengthscale.squeeze().detach()
weights = weights / weights.mean()
weights = weights / torch.prod(weights.pow(1.0 / len(weights)))
tr_lb = torch.clamp(x_center – weights * state.length / 2.0, 0.0, 1.0)
tr_ub = torch.clamp(x_center + weights * state.length / 2.0, 0.0, 1.0)

This is how they decide the search region.

It means that the code uses kernel parameter “lengthscale” for defining the region. So What is the kernel length scale here? In RBF Kernel for example,

k(x, x') = \exp\left(-\frac{1}{2} \sum_{d=1}^D \frac{(x_d - x'_d)^2}{l_d^2}\right)

l_d^2 is the length scale that represents density of point cloud. The distance between point to point gets further If this value is larger, and closer if the value is smaller. The value controls the sensitivity of algorithm. The GPs gets smoother when it is larger, and gets more sensitive when the value is smaller. This means that the algorithm is searching for regions with significant local variations.

A key feature of this algorithm is

  • it uses local Gaussian processes, which allows for avoiding the introduction of complex models
  • The lengthscale values are determined through learning
  • Training framework is totally different from standard algorithm. So, as of now, Ax does NOT support officially.
  • The method outperforms the other algorithms in many single objective problems.
  • Using it for multi-objective blackbox optimization is not common

Reference

[1] David ErikssonMichael PearceJacob R GardnerRyan TurnerMatthias Poloczek Scalable Global Optimization via Local Bayesian Optimization in https://arxiv.org/abs/1910.01739