
About
Various acquisition functions for Bayesian optimization have been studied. However, for multi-objective optimization, I believe that the best algorithm in practice is q-Noisy Expected Hypervolume Improvement (qNEHVI). While it takes somewhat longer to compute, it has the advantages of being noise-resistant and capable of batch optimization. I feel that there is virtually no algorithm that consistently outperforms it on average, although of course this depends on the specific task.
In any case, it is worth knowing about qNEHVI. Although the entire algorithm is difficult to understand, it is good to know the concept. Anyway, I will try to summarize part of it here so that I can review it in the future.
Reference
Since it would take time to read and understand everything, I put the references here so that the contents can be gone through quickly.
I will update this article gradually as I read these references.
Paper
https://botorch.org/tutorials/multi_objective_bo
https://openreview.net/pdf?id=A7pvvrlv68
https://arxiv.org/abs/2105.08195
Appendix
https://proceedings.neurips.cc/paper/2021/file/11704817e347269b7254e744b5e22dac-Supplemental.pdf
q-Noisy Expected Hypervolume Improvement (qNEHVI)
EHVI to qNEHVI
q-Expected Hypervolume Improvement (qEHVI, not qNEHVI) is defined as follows:
\alpha_{qEHVI}(\mathbf{X}_{\text{cand}} | \mathcal{P}) \approx \hat{\alpha}_{qEHVI}(\mathbf{X}_{\text{cand}} | \mathcal{P}) = \frac{1}{N} \sum_{t=1}^{N} HVI(\tilde{f}^{(t)}(\mathbf{X}_{\text{cand}}) | \mathcal{P})\\ \tilde{f}^{(t)} \sim p(f | \mathcal{D}) \quad \text{for } t = 1, \dots, N\\ \mathbf{X}_{\text{cand}} = \{ x_i \}_{i=1}^{q}
Here, N is the number of MC samples, and q is the number of candidates in the batch.
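To make this estimator concrete, here is a minimal sketch of the MC estimate written with BoTorch's box-decomposition utilities. This is my own illustration under assumed shapes (X_cand is q x d, pareto_Y is the current frontier P, ref_point is an M-dimensional tensor), not the paper's implementation; BoTorch's qExpectedHypervolumeImprovement is the real, optimized version.
```python
import torch
from botorch.utils.multi_objective.box_decompositions.dominated import (
    DominatedPartitioning,
)


def mc_qehvi(model, X_cand, pareto_Y, ref_point, num_samples=128):
    """MC estimate of qEHVI; pareto_Y (n_pareto x M) is the frontier P."""
    # Draw N joint posterior samples f^(t)(X_cand); shape: N x q x M.
    samples = model.posterior(X_cand).rsample(torch.Size([num_samples]))
    # Hypervolume of the current frontier P w.r.t. the reference point.
    hv_base = DominatedPartitioning(
        ref_point=ref_point, Y=pareto_Y
    ).compute_hypervolume()
    hvis = []
    for y_t in samples:
        # HVI(f^(t)(X_cand) | P) = HV(P ∪ f^(t)(X_cand)) - HV(P)
        y_aug = torch.cat([pareto_Y, y_t], dim=0)
        hv_new = DominatedPartitioning(
            ref_point=ref_point, Y=y_aug
        ).compute_hypervolume()
        hvis.append(hv_new - hv_base)
    return torch.stack(hvis).mean()  # average over the N MC samples
```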
The definition of NEHVI is given as
\begin{align} \alpha_{NEHVI}(\mathbf{x}) = \int \alpha_{EHVI}(\mathbf{x} | \mathcal{P}_n) p(f | \mathcal{D}_n) \, df \end{align}
The paper says that “the integral in (1) is analytically intractable, but can easily be approximated using MC integration”
\begin{align} \hat{\alpha}_{NEHVI}(\mathbf{x}) = \frac{1}{N} \sum_{t=1}^{N} HVI(\tilde{f}^{(t)}(\mathbf{x}) | \mathcal{P}_t) \end{align}
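Here, \mathcal{P}_t is the Pareto frontier over the previously observed points under the t-th posterior sample \tilde{f}^{(t)}; recomputing the frontier per sample is what makes the estimator robust to observation noise. Below is a minimal sketch of how such per-sample frontiers could be extracted with BoTorch utilities (my own illustration with assumed shapes, not the paper's code):
```python
import torch
from botorch.utils.multi_objective.pareto import is_non_dominated


def sample_pareto_frontiers(model, X_observed, num_samples=128):
    """Draw f^(t) at the observed points and extract P_t for each sample."""
    # samples: N x n x M posterior draws at the n previously observed points.
    samples = model.posterior(X_observed).rsample(torch.Size([num_samples]))
    # P_t = non-dominated subset of the t-th sample's objective values.
    return [y_t[is_non_dominated(y_t)] for y_t in samples]
```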
So, the acquisition function of Parallel Noisy Expected Hypervolume Improvement (qNEHVI) is formulated as
\alpha_{qNEHVI}(\mathbf{X}_{\text{cand}}) = \int \alpha_{qEHVI}(\mathbf{X}_{\text{cand}} | \mathcal{P}_n) p(f | \mathcal{D}_n) \, df\\ \alpha_{qNEHVI}(\mathbf{X}_{\text{cand}}) \approx \hat{\alpha}_{qNEHVI}(\mathbf{X}_{\text{cand}}) = \frac{1}{N} \sum_{t=1}^{N} HVI(\tilde{f}^{(t)}(\mathbf{X}_{\text{cand}}) | \mathcal{P}_t)
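For reference, here is a minimal end-to-end sketch of constructing qNEHVI in BoTorch, in the spirit of the linked tutorial. The toy data, GP model, and reference point are my own assumptions, and import paths and the sampler signature differ across BoTorch versions:
```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition.multi_objective import qNoisyExpectedHypervolumeImprovement
from botorch.sampling.normal import SobolQMCNormalSampler
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy data: d=2 inputs, M=2 objectives (BoTorch assumes maximization).
train_x = torch.rand(10, 2, dtype=torch.double)
train_y = torch.stack(
    [-(train_x**2).sum(-1), -((train_x - 1.0) ** 2).sum(-1)], dim=-1
)

model = SingleTaskGP(train_x, train_y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

sampler = SobolQMCNormalSampler(sample_shape=torch.Size([128]))  # N MC samples
acq = qNoisyExpectedHypervolumeImprovement(
    model=model,
    ref_point=[-2.0, -2.0],  # worst acceptable value per objective (assumed)
    X_baseline=train_x,      # P_t is derived from these points per posterior sample
    prune_baseline=True,     # drop baseline points unlikely to be Pareto-optimal
    sampler=sampler,
)
```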
The paper says that
Since optimizing q candidates jointly is a difficult numerical optimization problem over a qd-dimensional domain, we use a sequential greedy approximation in the parallel setting and solve a sequence of q simpler optimization problems with d dimensions, which has been shown empirically to improve optimization performance.
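In BoTorch, this sequential greedy approach corresponds to sequential=True in optimize_acqf. Continuing the sketch above (the bounds and optimizer settings are assumptions for the toy problem):
```python
from botorch.optim import optimize_acqf

# Optimize the q candidates one at a time (sequential greedy) rather than jointly.
bounds = torch.stack([torch.zeros(2), torch.ones(2)]).to(torch.double)  # 2 x d
candidates, acq_value = optimize_acqf(
    acq_function=acq,  # the qNEHVI instance from the sketch above
    bounds=bounds,
    q=4,               # batch size q
    num_restarts=10,
    raw_samples=256,   # samples for the initialization heuristic
    sequential=True,   # q separate d-dimensional problems, not one qd-dimensional one
)
```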
Cached Box Decompositions
Because the computational cost of (1) is quite high, they introduce “Cached Box Decompositions” (CBD). As I understand it, the idea is to cache the box decompositions of the sampled frontiers \mathcal{P}_t during sequential greedy selection, so that they do not have to be recomputed from scratch for every candidate evaluation. It is definitely complicated, so I will leave the details for now.