February 22, 2025

About

Various acquisition functions for Bayesian optimization have been studied, but for multi-objective optimization I believe the best algorithm in practice is q-Noisy Expected Hypervolume Improvement (qNEHVI). Although it takes somewhat longer to compute, it has the advantages of being noise-resistant and capable of batch optimization. In my experience, virtually no algorithm consistently outperforms it on average, though of course this depends on the specific task.

In that sense, qNEHVI is worth knowing about. Although the entire algorithm is difficult to understand, it is good to know the concept. In any case, I will try to summarize part of it here so I can review it later.

Reference

Since it would take time to read and understand everything, I list references here so the contents can be gone through quickly.

I will update this article gradually by reading these references.

Paper

https://botorch.org/tutorials/multi_objective_bo

https://openreview.net/pdf?id=A7pvvrlv68

https://arxiv.org/abs/2105.08195

Appendix

https://proceedings.neurips.cc/paper/2021/file/11704817e347269b7254e744b5e22dac-Supplemental.pdf

q-Noisy Expected Hypervolume Improvement (qNEHVI)

EHVI to qNEHVI

q-Expected Hypervolume Improvement (qEHVI, not qNEHVI) is defined as follows:

\alpha_{qEHVI}(\mathbf{X}_{\text{cand}} | \mathcal{P}) \approx \hat{\alpha}_{qEHVI}(\mathbf{X}_{\text{cand}} | \mathcal{P}) = \frac{1}{N} \sum_{t=1}^{N} HVI(\tilde{f}^{(t)}(\mathbf{X}_{\text{cand}}) | \mathcal{P})\\
\tilde{f}^{(t)} \sim p(f | \mathcal{D}) \quad \text{for } t = 1, \dots, N\\
\mathbf{X}_{\text{cand}} = \{ x_i \}_{i=1}^{q}

Here, N is the number of MC samples and q is the number of candidate points.
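The MC estimator above can be sketched in plain Python for two maximization objectives. This is a toy illustration, not BoTorch's implementation: the posterior "samples" are hand-written numbers standing in for actual GP draws of f̃^(t)(X_cand), and all point values are made up.

```python
# Toy sketch of the qEHVI MC estimator for 2 maximization objectives.
# The posterior samples are hand-written lists standing in for GP draws
# f_tilde^(t)(X_cand); no model fitting happens here.

def pareto_front(points):
    """Keep only the non-dominated points (maximization)."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

def hypervolume_2d(points, ref):
    """Exact 2-D hypervolume dominated by `points` above the reference point."""
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        hv += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return hv

def qehvi_mc_estimate(posterior_samples, pareto_set, ref):
    """Average HVI over N posterior samples of the q candidates' objectives."""
    base_hv = hypervolume_2d(pareto_set, ref)
    hvis = [hypervolume_2d(pareto_set + list(sample), ref) - base_hv
            for sample in posterior_samples]
    return sum(hvis) / len(hvis)

# Current Pareto frontier P and reference point (made-up numbers).
P = [(3.0, 1.0), (1.0, 3.0)]
ref = (0.0, 0.0)
# Two posterior samples (N=2) of a single candidate (q=1): one draw improves
# the hypervolume by 1.0, the other is dominated (improvement 0).
samples = [[(2.0, 2.0)], [(0.5, 0.5)]]
print(qehvi_mc_estimate(samples, P, ref))  # 0.5
```

Note that P is fixed across samples here; as shown next, qNEHVI changes exactly this point.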

The definition of NEHVI is given as

\begin{align}
\alpha_{NEHVI}(\mathbf{x}) = \int \alpha_{EHVI}(\mathbf{x} | \mathcal{P}_n) p(f | \mathcal{D}_n) \, df
\end{align}

The paper says that "the integral in (1) is analytically intractable, but can easily be approximated using MC integration":

\begin{align}
\hat{\alpha}_{NEHVI}(\mathbf{x}) = \frac{1}{N} \sum_{t=1}^{N} HVI(\tilde{f}^{(t)}(\mathbf{x}) | \mathcal{P}_t)
\end{align}

So, the acquisition function of Parallel Noisy Expected Hypervolume Improvement (qNEHVI) is formulated as

\alpha_{qNEHVI}(\mathbf{X}_{\text{cand}}) = \int \alpha_{qEHVI}(\mathbf{X}_{\text{cand}} | \mathcal{P}_n) p(f | \mathcal{D}_n) \, df\\
\alpha_{qNEHVI}(\mathbf{X}_{\text{cand}}) \approx \hat{\alpha}_{qNEHVI}(\mathbf{X}_{\text{cand}}) = \frac{1}{N} \sum_{t=1}^{N} HVI(\tilde{f}^{(t)}(\mathbf{X}_{\text{cand}}) | \mathcal{P}_t)
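The practical difference from qEHVI is that the frontier P_t is recomputed for each posterior sample t from the sampled objective values at the already-observed (noisy) points, rather than being fixed up front. Below is a toy sketch of that estimator for two maximization objectives; all numbers are hand-written stand-ins for GP posterior draws.

```python
# Toy sketch of the qNEHVI MC estimator: the Pareto frontier P_t is
# sample-dependent, computed from posterior draws at the observed points
# rather than from the noisy observations themselves.

def pareto_front(points):
    """Keep only the non-dominated points (maximization)."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

def hypervolume_2d(points, ref):
    """Exact 2-D hypervolume dominated by `points` above the reference point."""
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        hv += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return hv

def qnehvi_mc_estimate(obs_samples, cand_samples, ref):
    """obs_samples[t]: draws f_tilde^(t) at the observed points;
    cand_samples[t]: draws f_tilde^(t) at the q candidates."""
    total = 0.0
    for obs, cand in zip(obs_samples, cand_samples):
        p_t = pareto_front(obs)  # frontier P_t under posterior sample t
        total += hypervolume_2d(p_t + list(cand), ref) - hypervolume_2d(p_t, ref)
    return total / len(obs_samples)

ref = (0.0, 0.0)
# One posterior sample (N=1): draws at three observed points (one of which
# turns out to be dominated) and at a single candidate (q=1).
obs_samples = [[(3.0, 1.0), (1.0, 3.0), (0.5, 0.5)]]
cand_samples = [[(2.0, 2.0)]]
print(qnehvi_mc_estimate(obs_samples, cand_samples, ref))  # 1.0
```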

The paper says that

Since optimizing q candidates jointly is a difficult numerical optimization problem over a q-dimensional domain, we use a sequential greedy approximation in the parallel setting and solve a sequence of q simpler optimization problems with d dimensions, which has been shown empirically to improve optimization performance.
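The sequential greedy idea can be illustrated with a toy discrete version: instead of jointly optimizing q points, pick them one at a time, each maximizing the marginal hypervolume gain given the points chosen so far. This is a simplification of the paper's setting (a noise-free, discrete candidate pool instead of continuous d-dimensional optimization of the MC acquisition value); all numbers are made up.

```python
# Toy sequential-greedy batch selection: q picks, each maximizing marginal
# hypervolume improvement over a discrete pool (a stand-in for the paper's
# continuous optimization of the qNEHVI acquisition value).

def pareto_front(points):
    """Keep only the non-dominated points (maximization)."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

def hypervolume_2d(points, ref):
    """Exact 2-D hypervolume dominated by `points` above the reference point."""
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        hv += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return hv

def greedy_batch(pool, pareto_set, ref, q):
    """Pick q points from `pool` one at a time by marginal hypervolume gain."""
    chosen = []
    for _ in range(q):
        best = max((c for c in pool if c not in chosen),
                   key=lambda c: hypervolume_2d(pareto_set + chosen + [c], ref))
        chosen.append(best)
    return chosen

P = [(3.0, 1.0), (1.0, 3.0)]
ref = (0.0, 0.0)
pool = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5)]
# (2.0, 2.0) gives the largest single-point gain, so greedy picks it first.
print(greedy_batch(pool, P, ref, q=2))
```

Each greedy step is a d-dimensional problem (here a trivial lookup), which is what replaces the joint optimization over all q candidates.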

Cached Box Decompositions

Because the computational cost of (1) is quite high, they introduce "Cached Box Decompositions".

…It is definitely complicated, so I will leave it for now.