Multi-Fidelity Estimation: API Cookbook

PyApprox Tutorial Library

A single reference for every multi-fidelity estimator in PyApprox: the universal workflow, per-estimator quick reference, ACVSearch for automatic selection, multi-output estimation, and troubleshooting.

How to Use This Tutorial

This is the single code reference for all multi-fidelity estimators in PyApprox. Every estimator follows the same workflow; the per-estimator sections show only what differs. Read the Universal Workflow once, then jump to the estimator you need.

I want to… Go to
See the full workflow end-to-end → Universal Workflow
Use a specific estimator → Estimator Quick Reference
Let PyApprox find the best estimator → ACVSearch In Depth
Estimate multiple QoIs jointly → Multi-Output Estimation
Choose which models to include → Ensemble Selection
Inspect allocations and DAGs → Visualization
Debug common problems → Troubleshooting

Shared Setup

All examples use the five-model polynomial benchmark. Replace benchmark with your own models and prior to apply any recipe to your problem.

import numpy as np
import matplotlib.pyplot as plt
from pyapprox.util.backends.numpy import NumpyBkd
from pyapprox.benchmarks.instances.multifidelity.polynomial_ensemble import (
    polynomial_ensemble_5model,
)
from pyapprox.statest.statistics import MultiOutputMean
from pyapprox.statest.mc_estimator import MCEstimator
from pyapprox.statest import (
    MLMCEstimator, MFMCEstimator, GMFEstimator, GRDEstimator, GISEstimator,
)
from pyapprox.statest.acv import default_allocator_factory
from pyapprox.statest.acv.search import ACVSearch
from pyapprox.statest.acv.strategies import (
    FixedRecursionStrategy, TreeDepthRecursionStrategy,
)
from pyapprox.statest.plotting import (
    plot_allocation, plot_estimator_variance_reductions,
)
from pyapprox.optimization.minimize.scipy.slsqp import ScipySLSQPOptimizer

bkd = NumpyBkd()
np.random.seed(42)

# SLSQP is more robust than the default trust-constr optimizer for ACV allocation
optimizer = ScipySLSQPOptimizer(maxiter=200)
allocator_factory = lambda est: default_allocator_factory(est, optimizer=optimizer)

benchmark = polynomial_ensemble_5model(bkd)
models    = benchmark.models()        # f_0 (HF) ... f_4 (cheapest)
variable  = benchmark.prior()
costs     = benchmark.costs()
nqoi      = models[0].nqoi()
nmodels   = len(models)

costs_np = bkd.to_numpy(costs)
print(f"{nmodels} models, {nqoi} QoI(s)")
for a, c in enumerate(costs_np):
    print(f"  model {a}: cost = {c:.5f}")
5 models, 1 QoI(s)
  model 0: cost = 1.00000
  model 1: cost = 0.10000
  model 2: cost = 0.01000
  model 3: cost = 0.00100
  model 4: cost = 0.00010

Universal Workflow

Every multi-fidelity estimator in PyApprox follows five stages. This section walks through each stage once using MFMCEstimator. All other estimators substitute a different class at Stage 2 — everything else is identical.

Stage 1: Pilot study

Evaluate all models at a shared set of pilot samples to estimate the cross-covariance.

N_pilot = 50
samples_pilot = variable.rvs(N_pilot)
vals_pilot = [m(samples_pilot) for m in models]

stat = MultiOutputMean(nqoi, bkd)
cov_pilot, = stat.compute_pilot_quantities(vals_pilot)
stat.set_pilot_quantities(cov_pilot)

# Inspect pilot correlations with HF model
cov_np = bkd.to_numpy(cov_pilot)
for a in range(1, nmodels):
    rho = cov_np[0, a] / np.sqrt(cov_np[0, 0] * cov_np[a, a])
    print(f"  ρ(f0, f{a}) = {rho:.4f}")
  ρ(f0, f1) = 0.9954
  ρ(f0, f2) = 0.9756
  ρ(f0, f3) = 0.9246
  ρ(f0, f4) = 0.8112
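These correlations already bound how much each model can help. For a single control variate with correlation ρ, the best possible variance reduction is 1/(1 − ρ²), ignoring evaluation cost. A quick standalone numpy check on the pilot correlations printed above (a rule-of-thumb calculation, not a PyApprox call):

```python
import numpy as np

# Pilot correlations with the HF model, copied from the output above
rhos = np.array([0.9954, 0.9756, 0.9246, 0.8112])

# Ideal single-control-variate variance reduction: 1 / (1 - rho^2)
# (ignores the cost of evaluating the LF model)
bounds = 1.0 / (1.0 - rhos**2)
for a, b in zip(range(1, 5), bounds):
    print(f"  f{a}: max reduction ≈ {b:.1f}x")
```

The cheapest model (ρ ≈ 0.81) can at best cut variance by ~3x on its own; the multi-model estimators below combine all four to do much better at the same cost.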
Warning: Always subtract the pilot cost

The pilot uses real budget. Pass total_budget - pilot_cost to the allocator.

total_budget = 500.0
pilot_cost   = float(costs_np.sum()) * N_pilot
remaining    = total_budget - pilot_cost
print(f"Pilot cost: {pilot_cost:.1f}  |  Remaining: {remaining:.1f}")
Pilot cost: 55.6  |  Remaining: 444.4
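A common sizing rule (a rule of thumb, not built into PyApprox) is to cap the pilot at roughly 10% of the total budget. For this benchmark's costs that cap works out as follows; note the tutorial's N_pilot = 50 slightly exceeds it, which is a judgment call:

```python
import math

total_budget = 500.0
costs = [1.0, 0.1, 0.01, 0.001, 0.0001]   # model costs from the shared setup

# Every pilot sample evaluates all models, so one pilot sample costs sum(costs)
cost_per_pilot_sample = sum(costs)
n_pilot_max = math.floor(0.1 * total_budget / cost_per_pilot_sample)
print(f"max pilot size at 10% of budget: {n_pilot_max}")
```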

Stage 2: Build estimator and allocate

est = MFMCEstimator(stat, costs)
allocator = default_allocator_factory(est)
result = allocator.allocate(remaining)
est.set_allocation(result)

print(f"Samples per model: {est.nsamples_per_model()}")
print(f"Predicted std:     {float(est.optimized_covariance()[0,0])**0.5:.6f}")
Samples per model: [   200   1313   6548  29506 170626]
Predicted std:     0.002432

Stage 3: Generate samples

samples_per_model = est.generate_samples_per_model(variable.rvs)
print(f"Sample shapes: {[s.shape for s in samples_per_model]}")
Sample shapes: [(1, 200), (1, 1313), (1, 6548), (1, 29506), (1, 170626)]

Stage 4: Evaluate models

values_per_model = [models[a](samples_per_model[a]) for a in range(nmodels)]

Stage 5: Compute estimate

result = est(values_per_model)

true_mean = float(bkd.to_numpy(benchmark.ensemble_means()[0, 0]))
print(f"Estimate:  {float(result):.6f}")
print(f"True mean: {true_mean:.6f}")
print(f"Error:     {abs(float(result) - true_mean):.6f}")
Estimate:  0.167531
True mean: 0.166667
Error:     0.000864
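A quick sanity check on this run: the realized error should usually fall within about two predicted standard deviations. Using the numbers printed above:

```python
error, pred_std = 0.000864, 0.002432   # from the output above

# Standardized error: values much larger than ~2 suggest the pilot
# covariance (and hence the predicted variance) is unreliable
z = error / pred_std
print(f"standardized error: {z:.2f}")
```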

Compare against MC

stat_mc = MultiOutputMean(nqoi, bkd)
stat_mc.set_pilot_quantities(cov_pilot[:1, :1])
mc_est = MCEstimator(stat_mc, costs[:1])
mc_est.allocate_samples(remaining)

mc_var  = float(mc_est.optimized_covariance()[0, 0])
mf_var  = float(est.optimized_covariance()[0, 0])
print(f"MC std:  {mc_var**0.5:.6f}")
print(f"MF std:  {mf_var**0.5:.6f}")
print(f"Variance reduction: {mc_var / mf_var:.1f}×")
MC std:  0.011500
MF std:  0.002432
Variance reduction: 22.4×
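Another way to read the 22.4x figure: matching the multi-fidelity std with plain MC would need roughly 22x the budget. A back-of-envelope check using the printed numbers (simple arithmetic, not a PyApprox call):

```python
mc_std, mf_std = 0.011500, 0.002432   # stds printed in the comparison above
remaining = 444.4                     # budget after the pilot

vr = (mc_std / mf_std) ** 2           # variance reduction factor
mc_budget_needed = remaining * vr     # MC budget for the same variance
print(f"{vr:.1f}x -> MC would need a budget of ~{mc_budget_needed:.0f}")
```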

Estimator Quick Reference

Each tab shows only what differs from the universal workflow. Replace the Stage 2 block with the code below; Stages 1 and 3–5 are unchanged.

Classical control variate. Requires the known LF population mean — rare in practice. Use ACV instead when the LF mean is unknown.

from pyapprox.statest import CVEstimator

# Only works with known LF mean (set via benchmark here)
lf_means = bkd.to_numpy(benchmark.ensemble_means())[1:]
stat_cv = MultiOutputMean(nqoi, bkd)
stat_cv.set_pilot_quantities(cov_pilot[:2, :2])

est_cv = CVEstimator(stat_cv, costs[:2], lf_means[:1])
est_cv.allocate_samples(remaining)
print(f"CV predicted std: {float(est_cv.optimized_covariance()[0,0])**0.5:.6f}")
CV predicted std: 0.001152

Multi-Level Monte Carlo. Decomposes the HF mean into telescoping level corrections. Best when models form a natural refinement hierarchy (mesh levels).

est_mlmc = MLMCEstimator(stat, costs)
mlmc_allocator = default_allocator_factory(est_mlmc)
mlmc_result = mlmc_allocator.allocate(remaining)
est_mlmc.set_allocation(mlmc_result)

N_alloc = est_mlmc.nsamples_per_model()
print(f"MLMC samples per level: {[int(n) for n in N_alloc]}")
print(f"MLMC predicted std: {float(est_mlmc.optimized_covariance()[0,0])**0.5:.6f}")
MLMC samples per level: [225, 1179, 5161, 23213, 261464]
MLMC predicted std: 0.002513
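The telescoping decomposition can be demonstrated in a few lines of plain numpy. The sketch below uses monomial stand-ins f_a(x) = x^(5−a) on U(0, 1) for the five models (an assumption about the benchmark's form; the HF mean E[x⁵] = 1/6 does match the tutorial output):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100_000)

# Monomial stand-ins for the five benchmark models (assumed form)
f = [lambda x, p=5 - a: x**p for a in range(5)]

# Telescoping identity MLMC exploits: E[f0] = E[f4] + sum_l E[f_l - f_{l+1}]
estimate = np.mean(f[4](x)) + sum(np.mean(f[l](x) - f[l + 1](x)) for l in range(4))
print(f"telescoped estimate: {estimate:.4f}  (exact: {1/6:.4f})")

# Correction stds are far smaller than std(f0) and shrink toward the HF end,
# so MLMC can push most samples onto the cheap coarse levels
stds = [float(np.std(f[l](x) - f[l + 1](x))) for l in range(4)]
print("correction stds:", [f"{s:.3f}" for s in stds])
```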

Multi-Fidelity Monte Carlo. Nested sample structure with closed-form allocation. Fast and robust for well-correlated hierarchies.

est_mfmc = MFMCEstimator(stat, costs)
mfmc_allocator = default_allocator_factory(est_mfmc)
mfmc_result = mfmc_allocator.allocate(remaining)
est_mfmc.set_allocation(mfmc_result)

print(f"MFMC samples per model: {[int(n) for n in est_mfmc.nsamples_per_model()]}")
print(f"MFMC predicted std: {float(est_mfmc.optimized_covariance()[0,0])**0.5:.6f}")
MFMC samples per model: [200, 1313, 6548, 29506, 170626]
MFMC predicted std: 0.002432
Note: Validity condition

MFMC requires correlations and costs to satisfy a monotonicity condition. If it is violated, the allocator raises a ValueError (see Troubleshooting). Remove the offending model and retry, or use ACVSearch to find a better structure.
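The condition (from Peherstorfer, Willcox and Gunzburger, 2016) can be checked by hand from the pilot correlations and costs. The sketch below is a standalone numpy check, not a PyApprox API:

```python
import numpy as np

# Pilot correlations rho_{0,a} (rho_{0,0} = 1) and model costs from the setup
rho = np.array([1.0, 0.9954, 0.9756, 0.9246, 0.8112])
cost = np.array([1.0, 0.1, 0.01, 0.001, 0.0001])

r2 = np.append(rho**2, 0.0)   # convention: rho_{0,M+1} = 0
ok = True
for i in range(1, 5):
    lhs = cost[i - 1] / cost[i]
    rhs = (r2[i - 1] - r2[i]) / (r2[i] - r2[i + 1])
    ok &= lhs > rhs
    print(f"  model {i}: cost ratio {lhs:.1f} > {rhs:.3f} -> {lhs > rhs}")
print("MFMC condition satisfied:", bool(ok))
```

For this benchmark every check passes, which is why the MFMC allocation above succeeds.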

Best Linear Unbiased Estimator via GLS. Solves an SDP for the optimal allocation. Especially powerful when models share subsets of inputs.

from pyapprox.statest import MLBLUEEstimator
from pyapprox.statest.groupacv import MLBLUESPDAllocationOptimizer

# Define which model subsets share samples
# Default: every model evaluated at its own set + HF shared
model_subsets = [[0, 1], [0, 2], [0, 3], [0, 4]]

est_mlblue = MLBLUEEstimator(stat, costs, model_subsets=model_subsets)
mlblue_allocator = MLBLUESPDAllocationOptimizer(est_mlblue)
mlblue_result = mlblue_allocator.optimize(remaining)
est_mlblue.set_allocation(mlblue_result)

print(f"MLBLUE predicted std: {float(est_mlblue.optimized_covariance()[0,0])**0.5:.6f}")
MLBLUE predicted std: 0.011513

ACVSearch In Depth

ACVSearch automatically selects the best estimator structure for a given problem and budget. This is the recommended approach unless you have a specific reason to use a named estimator.

Reasons to use a named estimator instead: (a) you know from theory which estimator is best, (b) you have many models and cannot afford enumeration time, (c) you specifically need MLMC (which is not an ACV estimator).

Search all families (default)

search_all = ACVSearch(stat, costs, allocator_factory=allocator_factory)
result_all = search_all.search(remaining)
best = result_all.estimator

var_best = float(bkd.to_numpy(best.optimized_covariance())[0, 0])
print(f"Best estimator: {best.__class__.__name__}")
print(f"Predicted std:  {var_best**0.5:.6f}")
Best estimator: GMFEstimator
Predicted std:  0.002432

Restrict to specific families

search_gmf = ACVSearch(stat, costs, estimator_classes=[GMFEstimator, GRDEstimator],
                       allocator_factory=allocator_factory)
result_gmf = search_gmf.search(remaining)
best_gmf = result_gmf.estimator

print(f"Best (GMF+GRD): {best_gmf.__class__.__name__}")
print(f"Predicted std:  {float(bkd.to_numpy(best_gmf.optimized_covariance())[0,0])**0.5:.6f}")
Best (GMF+GRD): GMFEstimator
Predicted std:  0.002432

Restrict search depth (fast for many models)

search_shallow = ACVSearch(
    stat, costs,
    recursion_strategy=TreeDepthRecursionStrategy(max_depth=2),
    allocator_factory=allocator_factory,
)
result_shallow = search_shallow.search(remaining)
best_shallow = result_shallow.estimator

print(f"Best (depth≤2): {best_shallow.__class__.__name__}")
print(f"Predicted std:  {float(bkd.to_numpy(best_shallow.optimized_covariance())[0,0])**0.5:.6f}")
Best (depth≤2): GMFEstimator
Predicted std:  0.001352

Fix a specific recursion index

# Star topology: every LF model corrects HF directly (= ACVMF)
ri_star = tuple([0] * (nmodels - 1))

search_fixed = ACVSearch(
    stat, costs,
    recursion_strategy=FixedRecursionStrategy(ri_star),
    allocator_factory=allocator_factory,
)
result_fixed = search_fixed.search(remaining)

print(f"Fixed γ={list(ri_star)}: {result_fixed.estimator.__class__.__name__}")
print(f"Predicted std: "
      f"{float(bkd.to_numpy(result_fixed.estimator.optimized_covariance())[0,0])**0.5:.6f}")
Fixed γ=[0, 0, 0, 0]: GMFEstimator
Predicted std: 0.001352
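A recursion index is just a parent pointer per LF model: γ[a−1] names the model whose estimate LF model a corrects. A plain-Python sketch of how the tuple maps to DAG edges (illustrative of the encoding, not PyApprox internals):

```python
def recursion_index_to_edges(gamma):
    """Edges (parent, child) of the estimator DAG encoded by a recursion index."""
    return [(parent, child) for child, parent in enumerate(gamma, start=1)]

print(recursion_index_to_edges((0, 0, 0, 0)))  # star: every LF model corrects f0
print(recursion_index_to_edges((0, 1, 2, 3)))  # chain: an MLMC-like hierarchy
```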

Inspecting search results

After any search, use the result to compare candidates.

fig, ax = plt.subplots(figsize=(6, 3.5))
plot_estimator_variance_reductions([best], ["ACVSearch best"], ax)
ax.set_title("Variance reduction vs MC", fontsize=11)
plt.tight_layout()
plt.show()
Figure 1: Variance reduction of the best estimator found by ACVSearch vs MC at the same budget.

Multi-Output Estimation

When the HF model produces multiple QoIs, estimating them jointly can reduce variance for each individual QoI compared to running separate scalar ACVs.

from pyapprox.benchmarks.instances.multifidelity.multioutput_ensemble import (
    multioutput_ensemble_3x3,
)

bm_mo    = multioutput_ensemble_3x3(bkd)
costs_mo = bm_mo.costs()
nqoi_mo  = 2   # use first 2 of 3 QoIs
nmod_mo  = bm_mo.nmodels()
full_nqoi_mo = bm_mo.models()[0].nqoi()

# Extract 2-QoI covariance subset
cov_full = bm_mo.ensemble_covariance()
idx = []
for a in range(nmod_mo):
    for q in range(nqoi_mo):
        idx.append(a * full_nqoi_mo + q)
cov_mo = cov_full[np.ix_(idx, idx)]

# MOACV: joint estimation
stat_mo = MultiOutputMean(nqoi_mo, bkd)
stat_mo.set_pilot_quantities(cov_mo)
search_mo = ACVSearch(stat_mo, costs_mo, allocator_factory=allocator_factory)
result_mo = search_mo.search(100.0)
cov_moacv = bkd.to_numpy(result_mo.estimator.optimized_covariance())

# SOACV: independent scalar ACV per QoI
for qi in range(nqoi_mo):
    (cov_qi,) = stat_mo.get_pilot_quantities_subset(
        nmod_mo, nqoi_mo, list(range(nmod_mo)), [qi]
    )
    stat_qi = MultiOutputMean(1, bkd)
    stat_qi.set_pilot_quantities(cov_qi)
    search_qi = ACVSearch(stat_qi, costs_mo, allocator_factory=allocator_factory)
    result_qi = search_qi.search(100.0)
    var_so = float(bkd.to_numpy(result_qi.estimator.optimized_covariance())[0, 0])
    var_mo = cov_moacv[qi, qi]
    gain = (var_so - var_mo) / var_so * 100
    print(f"QoI {qi}: SOACV std={var_so**0.5:.5f}  MOACV std={var_mo**0.5:.5f}  gain={gain:.1f}%")
QoI 0: SOACV std=0.02358  MOACV std=0.01228  gain=72.9%
QoI 1: SOACV std=0.00948  MOACV std=0.00302  gain=89.8%

Ensemble Selection

When your model library contains models that may hurt rather than help, use AllSubsetsStrategy to search over model subsets.

from pyapprox.statest import AllSubsetsStrategy

search_ens = ACVSearch(
    stat, costs,
    estimator_classes=[GMFEstimator],
    model_strategy=AllSubsetsStrategy(max_models=4),
    allocator_factory=allocator_factory,
)
result_ens = search_ens.search(remaining, allow_failures=True)
best_ens = result_ens.estimator

print(f"Best subset estimator: {best_ens.__class__.__name__}")
print(f"Predicted std: {float(bkd.to_numpy(best_ens.optimized_covariance())[0,0])**0.5:.6f}")
print(f"Candidates evaluated: {result_ens.candidates_evaluated()}")
print(f"Candidates successful: {result_ens.candidates_successful()}")
Best subset estimator: GMFEstimator
Predicted std: 0.002512
Candidates evaluated: 14
Candidates successful: 14
Tip: Sizing max_models

Start with max_models = min(nmodels, 4). Increase if the variance-by-size curve is still decreasing at the cap.
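The 14 candidates reported above are consistent with keeping the HF model in every subset and enumerating non-empty LF subsets up to the cap (an assumption about how AllSubsetsStrategy counts candidates):

```python
from math import comb

n_lf = 4          # LF models available (models 1-4)
max_models = 4    # cap including HF, so at most 3 LF models per subset

# Non-empty LF subsets of size 1 .. max_models - 1
n_candidates = sum(comb(n_lf, k) for k in range(1, max_models))
print(n_candidates)   # 4 + 6 + 4 = 14
```

This count grows combinatorially with the number of models, which is why capping max_models matters for large libraries.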

Visualization

Allocation matrices

fig, ax = plt.subplots(figsize=(10, 5))
plot_allocation(best, ax, show_partition_sizes=True)
ax.set_title("Best estimator — allocation matrix", fontsize=11)
plt.tight_layout()
plt.show()
Figure 2: Allocation matrix for the best ACVSearch estimator. Each row is a partition; each column pair is a sample set. Numbers show partition sizes.

Troubleshooting

MFMC validity condition failure

ValueError: Validity condition violated for model 3

The MFMC allocation requires correlations and costs to satisfy a monotonicity condition. Fix: remove the offending model, or use ACVSearch which searches over all valid structures automatically.

Near-singular pilot covariance

If the pilot covariance is nearly singular (small eigenvalues), the optimal weights become unstable. Fix: increase N_pilot, or remove highly collinear models.

# Check pilot covariance condition number
eigvals = np.linalg.eigvalsh(cov_np)
print(f"Condition number: {eigvals.max() / eigvals.min():.1e}")
print(f"Smallest eigenvalue: {eigvals.min():.2e}")
Condition number: 4.3e+06
Smallest eigenvalue: 8.13e-08
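If a larger pilot is not affordable, one standard remedy (generic covariance estimation practice, not a PyApprox function) is diagonal shrinkage, which trades a little bias in the control-variate weights for a much better-conditioned covariance:

```python
import numpy as np

# A deliberately ill-conditioned 3x3 covariance (two nearly collinear models)
cov = np.array([[1.0, 0.999, 0.5],
                [0.999, 1.0, 0.5],
                [0.5, 0.5, 1.0]])

def shrink(cov, lam):
    """Blend the covariance toward its diagonal by a factor lam in [0, 1]."""
    return (1.0 - lam) * cov + lam * np.diag(np.diag(cov))

for lam in (0.0, 0.01, 0.1):
    print(f"lam={lam:5}: cond = {np.linalg.cond(shrink(cov, lam)):.1e}")
```

Even a few percent of shrinkage cuts the condition number by orders of magnitude; pass the shrunken matrix to set_pilot_quantities in place of the raw pilot estimate.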

Optimizer convergence (trust-constr vs SLSQP)

The default trust-constr optimizer can stall when the optimal allocation sits on a constraint boundary. Fix: pass an SLSQP optimizer to default_allocator_factory.

# Pass optimizer to default_allocator_factory
slsqp = ScipySLSQPOptimizer(maxiter=200)
slsqp_factory = lambda est: default_allocator_factory(est, optimizer=slsqp)

# Use with ACVSearch
search_slsqp = ACVSearch(stat, costs, allocator_factory=slsqp_factory)
result_slsqp = search_slsqp.search(remaining)
print(f"SLSQP result std: "
      f"{float(bkd.to_numpy(result_slsqp.estimator.optimized_covariance())[0,0])**0.5:.6f}")
SLSQP result std: 0.002432

Budget too small

If the allocator returns all samples on the HF model (no LF models used), the budget is too small for multi-fidelity estimation to help. The pilot cost alone may consume most of the budget. Fix: increase the total budget or reduce N_pilot.
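A quick feasibility check before running anything (a rule of thumb, not a PyApprox function): if the pilot alone consumes more than ~20% of the budget, multi-fidelity estimation is unlikely to pay off at that budget.

```python
total_budget, pilot_cost = 500.0, 55.6   # figures from the shared setup

frac = pilot_cost / total_budget
print(f"pilot fraction of budget: {frac:.0%}")
if frac > 0.2:
    print("pilot consumes >20% of budget: shrink N_pilot or raise the budget")
```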