About
Understanding EHVI is quite complicated, so by following the BoTorch code and some papers, I will take a quick look at its usage and basic concepts.
https://arxiv.org/abs/2105.08195
Bayesian Optimization based on Expected Hypervolume Improvement (EHVI)
In the BoTorch tutorial (https://botorch.org/tutorials/multi_objective_bo), the workflow boils down to calling three functions: fit_gpytorch_mll, qNoisyExpectedHypervolumeImprovement, and optimize_acqf.
fit_gpytorch_mll optimizes the model hyperparameters by maximizing the marginal log likelihood (training), qNoisyExpectedHypervolumeImprovement defines which acquisition function to use as the metric, and optimize_acqf then gives us candidate points in the explanatory-variable space where the hypervolume improvement is expected to be large.
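For reference, the model_qnehvi and mll_qnehvi objects appearing in the snippet in the next section have to be constructed first. The following is a minimal sketch of that setup, assuming one independent SingleTaskGP per objective as the tutorial does (the helper name build_model here is mine; the tutorial wraps similar logic in an initialize_model helper):

import torch
from botorch.models import SingleTaskGP
from botorch.models.model_list_gp_regression import ModelListGP
from botorch.models.transforms.outcome import Standardize
from gpytorch.mlls import SumMarginalLogLikelihood

def build_model(train_x, train_obj):
    # fit one independent single-output GP per objective
    models = [
        SingleTaskGP(train_x, train_obj[..., i : i + 1], outcome_transform=Standardize(m=1))
        for i in range(train_obj.shape[-1])
    ]
    model = ModelListGP(*models)
    # the sum of each GP's marginal log likelihood, used as the training objective
    mll = SumMarginalLogLikelihood(model.likelihood, model)
    return mll, model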
But what is the hypervolume itself, and how is it computed?
Code in Tutorial
import torch
from botorch.acquisition.multi_objective.monte_carlo import qNoisyExpectedHypervolumeImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.optim import optimize_acqf
from botorch.sampling.normal import SobolQMCNormalSampler
from botorch.utils.transforms import normalize, unnormalize

def optimize_qnehvi_and_get_observation(model, train_x, train_obj, sampler):
    """Optimizes the qNEHVI acquisition function and returns new candidates."""
    acq_func = qNoisyExpectedHypervolumeImprovement(
        model=model,
        ref_point=problem.ref_point.tolist(),  # use known reference point
        X_baseline=normalize(train_x, problem.bounds),
        prune_baseline=True,  # prune baseline points that have estimated zero probability of being Pareto optimal
        sampler=sampler,
    )
    # optimize the acquisition function over the normalized design space
    candidates, _ = optimize_acqf(
        acq_function=acq_func,
        bounds=standard_bounds,
        q=BATCH_SIZE,
        num_restarts=NUM_RESTARTS,
        raw_samples=RAW_SAMPLES,  # used for the initialization heuristic
        options={"batch_limit": 5, "maxiter": 200},
        sequential=True,
    )
    # map the candidates back to the original (unnormalized) input space
    new_x = unnormalize(candidates.detach(), bounds=problem.bounds)
    return new_x

qnehvi_sampler = SobolQMCNormalSampler(sample_shape=torch.Size([MC_SAMPLES]))
fit_gpytorch_mll(mll_qnehvi)  # train the GP hyperparameters
new_x = optimize_qnehvi_and_get_observation(
    model_qnehvi, train_x_qnehvi, train_obj_qnehvi, qnehvi_sampler
)
# observe new (noisy) objective values at the candidates
new_obj_true = problem(new_x)
new_obj = new_obj_true + torch.randn_like(new_obj_true) * NOISE_SE
Hypervolume Improvement
The hypervolume improvement (HVI) is defined by the following formula, where \mathcal{P} is the current Pareto front, \mathcal{P}_0 is a set of new points, and r is the reference point.
\text{HVI}(\mathcal{P}_0 \mid \mathcal{P}, r) = \text{HV}(\mathcal{P} \cup \mathcal{P}_0 \mid r) - \text{HV}(\mathcal{P} \mid r)
Suppose we have a reference point at the lower left of the graph, and we want to find the Pareto front by maximizing two objectives. The blue area enclosed by the Pareto front and the reference point is the so-called hypervolume.
A point x_d that lies inside the hypervolume is said to be weakly “dominated” by a point x_p on the Pareto front:
x_p \succeq x_d
The problem setting is to maximize that area while approximating the black-box objectives with a Gaussian process. Since the Gaussian process is a stochastic model, the hypervolume is stochastic as well, so we want to compute its expectation.
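To make the hypervolume and HVI concrete, here is a small sketch using BoTorch's own utilities; the points and the reference point below are made up purely for illustration:

import torch
from botorch.utils.multi_objective.hypervolume import Hypervolume
from botorch.utils.multi_objective.pareto import is_non_dominated

# made-up 2-objective example (both objectives are maximized)
ref_point = torch.tensor([0.0, 0.0])
pareto_front = torch.tensor([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])

hv = Hypervolume(ref_point=ref_point)
volume = hv.compute(pareto_front)  # area between front and ref_point: 6.0

# HVI of a hypothetical new point: HV(P with the new point) - HV(P)
new_point = torch.tensor([[2.5, 2.5]])
joint = torch.cat([pareto_front, new_point])
joint_front = joint[is_non_dominated(joint)]  # (2.0, 2.0) is now dominated and drops out
hvi = hv.compute(joint_front) - volume  # 7.25 - 6.0 = 1.25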
Expected Hypervolume Improvement
As we saw in the previous section, the idea is to take the expectation of the hypervolume improvement under the Gaussian process posterior. The problem is that, in this noisy setting, the expectation cannot be computed analytically, so the implementation draws Monte Carlo (MC) samples from the posterior and averages the improvement over those samples. That is exactly the role the SobolQMCNormalSampler plays in the code.
\alpha_{\text{NEHVI}}(x) = \int \alpha_{\text{EHVI}}(x \mid \mathcal{P}_n) \, p(f \mid \mathcal{D}_n) \, df \\ \hat{\alpha}_{\text{NEHVI}}(x) = \frac{1}{N} \sum_{t=1}^{N} \text{HVI}\left(\tilde{f}_t(x) \mid \tilde{\mathcal{P}}_t\right)
Here \tilde{f}_t is the t-th posterior sample and \tilde{\mathcal{P}}_t is the Pareto front under that sample.
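To make the MC estimator concrete, here is a minimal sketch of the average in the second formula, written naively in terms of hypervolume improvements. The function mc_nehvi and its input format are my own assumptions, not BoTorch's API; the real implementation is far more efficient, reusing cached box decompositions of the non-dominated space:

import torch
from botorch.utils.multi_objective.hypervolume import Hypervolume
from botorch.utils.multi_objective.pareto import is_non_dominated

def mc_nehvi(f_samples, pareto_fronts, ref_point):
    """f_samples: N x m posterior draws at x; pareto_fronts: list of the N sampled fronts."""
    hv = Hypervolume(ref_point=ref_point)
    improvements = []
    for f_t, front in zip(f_samples, pareto_fronts):
        joint = torch.cat([front, f_t.unsqueeze(0)])
        joint_front = joint[is_non_dominated(joint)]
        # HVI(f_t(x) | P_t) = HV(P_t with the new point) - HV(P_t)
        improvements.append(hv.compute(joint_front) - hv.compute(front))
    # (1/N) * sum over posterior samples
    return torch.tensor(improvements).mean()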
Optimize Acquisition Function
Once you’ve decided which acquisition function to use, optimize_acqf optimizes it over the input space. It first evaluates the acquisition value (here the MC estimate of NEHVI) on a batch of quasi-random raw samples as an initialization heuristic, picks the most promising ones as restart points, and then refines them with gradient-based optimization, backpropagating through the MC samples. The result is the next candidate point(s) where sampling is expected to yield the largest hypervolume improvement.
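Below is a conceptual sketch of that multi-start scheme, not BoTorch's actual implementation (optimize_acqf refines restarts with scipy's L-BFGS-B rather than Adam); the function name, defaults, and the unit-cube bounds are my own simplifications:

import torch
from torch.quasirandom import SobolEngine

def multistart_maximize(acq, dim, raw_samples=512, num_restarts=10, steps=200):
    # 1) probe the acquisition surface with quasi-random raw samples
    X_raw = SobolEngine(dim, scramble=True).draw(raw_samples)
    with torch.no_grad():
        vals = acq(X_raw)
    # 2) keep the most promising raw samples as restart locations
    X = X_raw[vals.topk(num_restarts).indices].clone().requires_grad_(True)
    optimizer = torch.optim.Adam([X], lr=0.01)
    # 3) refine each restart by gradient ascent (backpropagation through acq)
    for _ in range(steps):
        optimizer.zero_grad()
        (-acq(X).sum()).backward()
        optimizer.step()
        with torch.no_grad():
            X.clamp_(0.0, 1.0)  # stay inside the unit-cube bounds
    with torch.no_grad():
        return X[acq(X).argmax()].detach().clone()

With qNEHVI, acq here would be the MC average from the previous section, so the gradient flows through the posterior samples themselves.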