Multi-Fidelity Estimation: API Cookbook

PyApprox Tutorial Library

A single reference for every multi-fidelity estimator in PyApprox: the universal workflow, per-estimator quick reference, ACVSearch for automatic selection, multi-output estimation, and troubleshooting.

How to Use This Tutorial

This is the single code reference for all multi-fidelity estimators in PyApprox. Every estimator follows the same workflow; the per-estimator sections show only what differs. Read the Universal Workflow once, then jump to the estimator you need.

I want to… Go to
See the full workflow end-to-end → Universal Workflow
Use a specific estimator → Estimator Quick Reference
Let PyApprox find the best estimator → ACVSearch In Depth
Estimate multiple QoIs jointly → Multi-Output Estimation
Choose which models to include → Ensemble Selection
Inspect allocations and DAGs → Visualization
Debug common problems → Troubleshooting

Shared Setup

All examples use the five-model polynomial benchmark. Replace benchmark with your own models and prior to apply any recipe to your problem.

import numpy as np
import matplotlib.pyplot as plt
from pyapprox.util.backends.numpy import NumpyBkd
from pyapprox.benchmarks.instances.multifidelity.polynomial_ensemble import (
    polynomial_ensemble_5model,
)
from pyapprox.statest.statistics import MultiOutputMean
from pyapprox.statest.mc_estimator import MCEstimator
from pyapprox.statest import (
    MLMCEstimator, MFMCEstimator, GMFEstimator, GRDEstimator, GISEstimator,
)
from pyapprox.statest.acv import default_allocator_factory
from pyapprox.statest.acv.search import ACVSearch
from pyapprox.statest.acv.strategies import (
    FixedRecursionStrategy, TreeDepthRecursionStrategy,
)
from pyapprox.statest.plotting import (
    plot_allocation, plot_estimator_variance_reductions,
)
from pyapprox.optimization.minimize.scipy.slsqp import ScipySLSQPOptimizer

bkd = NumpyBkd()
np.random.seed(42)

# SLSQP is more robust than the default trust-constr optimizer for ACV allocation
optimizer = ScipySLSQPOptimizer(maxiter=200)
allocator_factory = lambda est: default_allocator_factory(est, optimizer=optimizer)

benchmark = polynomial_ensemble_5model(bkd)
models    = benchmark.models()        # f_0 (HF) ... f_4 (cheapest)
variable  = benchmark.prior()
costs     = benchmark.costs()
nqoi      = models[0].nqoi()
nmodels   = len(models)

costs_np = bkd.to_numpy(costs)
print(f"{nmodels} models, {nqoi} QoI(s)")
for a, c in enumerate(costs_np):
    print(f"  model {a}: cost = {c:.5f}")
5 models, 1 QoI(s)
  model 0: cost = 1.00000
  model 1: cost = 0.10000
  model 2: cost = 0.01000
  model 3: cost = 0.00100
  model 4: cost = 0.00010

Universal Workflow

Every multi-fidelity estimator in PyApprox follows five stages. This section walks through each stage once using MFMCEstimator. All other estimators substitute a different class at Stage 2 — everything else is identical.

Stage 1: Pilot study

Evaluate all models at a shared set of pilot samples to estimate the cross-covariance.

N_pilot = 50
samples_pilot = variable.rvs(N_pilot)
vals_pilot = [m(samples_pilot) for m in models]

stat = MultiOutputMean(nqoi, bkd)
cov_pilot, = stat.compute_pilot_quantities(vals_pilot)
stat.set_pilot_quantities(cov_pilot)

# Inspect pilot correlations with HF model
cov_np = bkd.to_numpy(cov_pilot)
for a in range(1, nmodels):
    rho = cov_np[0, a] / np.sqrt(cov_np[0, 0] * cov_np[a, a])
    print(f"  ρ(f0, f{a}) = {rho:.4f}")
  ρ(f0, f1) = 0.9954
  ρ(f0, f2) = 0.9756
  ρ(f0, f3) = 0.9246
  ρ(f0, f4) = 0.8112
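These correlations already bound how much each model can help. For a single control variate with correlation ρ, the best possible variance reduction is 1/(1 − ρ²), ignoring evaluation cost. A quick standalone numpy check on the pilot correlations printed above (a rule-of-thumb calculation, not a PyApprox call):

```python
import numpy as np

# Pilot correlations with the HF model, copied from the output above
rhos = np.array([0.9954, 0.9756, 0.9246, 0.8112])

# Ideal single-control-variate variance reduction: 1 / (1 - rho^2)
# (ignores the cost of evaluating the LF model)
bounds = 1.0 / (1.0 - rhos**2)
for a, b in zip(range(1, 5), bounds):
    print(f"  f{a}: max reduction ≈ {b:.1f}x")
```

The cheapest model (ρ ≈ 0.81) can at best cut variance by ~3x on its own; the multi-model estimators below combine all four to do much better at the same cost.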
Warning: Always subtract the pilot cost

The pilot uses real budget. Pass total_budget - pilot_cost to the allocator.

total_budget = 500.0
pilot_cost   = float(costs_np.sum()) * N_pilot
remaining    = total_budget - pilot_cost
print(f"Pilot cost: {pilot_cost:.1f}  |  Remaining: {remaining:.1f}")
Pilot cost: 55.6  |  Remaining: 444.4
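A common sizing rule (a rule of thumb, not built into PyApprox) is to cap the pilot at roughly 10% of the total budget. For this benchmark's costs that cap works out as follows; note the tutorial's N_pilot = 50 slightly exceeds it, which is a judgment call:

```python
import math

total_budget = 500.0
costs = [1.0, 0.1, 0.01, 0.001, 0.0001]   # model costs from the shared setup

# Every pilot sample evaluates all models, so one pilot sample costs sum(costs)
cost_per_pilot_sample = sum(costs)
n_pilot_max = math.floor(0.1 * total_budget / cost_per_pilot_sample)
print(f"max pilot size at 10% of budget: {n_pilot_max}")
```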

Stage 2: Build estimator and allocate

est = MFMCEstimator(stat, costs)
allocator = default_allocator_factory(est)
result = allocator.allocate(remaining)
est.set_allocation(result)

print(f"Samples per model: {est.nsamples_per_model()}")
print(f"Predicted std:     {float(est.optimized_covariance()[0,0])**0.5:.6f}")
Samples per model: [   200   1313   6548  29506 170626]
Predicted std:     0.002432

Stage 3: Generate samples

samples_per_model = est.generate_samples_per_model(variable.rvs)
print(f"Sample shapes: {[s.shape for s in samples_per_model]}")
Sample shapes: [(1, 200), (1, 1313), (1, 6548), (1, 29506), (1, 170626)]

Stage 4: Evaluate models

values_per_model = [models[a](samples_per_model[a]) for a in range(nmodels)]

Stage 5: Compute estimate

result = est(values_per_model)

true_mean = float(bkd.to_numpy(benchmark.ensemble_means()[0, 0]))
print(f"Estimate:  {float(result):.6f}")
print(f"True mean: {true_mean:.6f}")
print(f"Error:     {abs(float(result) - true_mean):.6f}")
Estimate:  0.167531
True mean: 0.166667
Error:     0.000864
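A quick sanity check on this run: the realized error should usually fall within about two predicted standard deviations. Using the numbers printed above:

```python
error, pred_std = 0.000864, 0.002432   # from the output above

# Standardized error: values much larger than ~2 suggest the pilot
# covariance (and hence the predicted variance) is unreliable
z = error / pred_std
print(f"standardized error: {z:.2f}")
```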

Compare against MC

stat_mc = MultiOutputMean(nqoi, bkd)
stat_mc.set_pilot_quantities(cov_pilot[:1, :1])
mc_est = MCEstimator(stat_mc, costs[:1])
mc_est.allocate_samples(remaining)

mc_var  = float(mc_est.optimized_covariance()[0, 0])
mf_var  = float(est.optimized_covariance()[0, 0])
print(f"MC std:  {mc_var**0.5:.6f}")
print(f"MF std:  {mf_var**0.5:.6f}")
print(f"Variance reduction: {mc_var / mf_var:.1f}×")
MC std:  0.011500
MF std:  0.002432
Variance reduction: 22.4×
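Another way to read the 22.4x figure: matching the multi-fidelity std with plain MC would need roughly 22x the budget. A back-of-envelope check using the printed numbers (simple arithmetic, not a PyApprox call):

```python
mc_std, mf_std = 0.011500, 0.002432   # stds printed in the comparison above
remaining = 444.4                     # budget after the pilot

vr = (mc_std / mf_std) ** 2           # variance reduction factor
mc_budget_needed = remaining * vr     # MC budget for the same variance
print(f"{vr:.1f}x -> MC would need a budget of ~{mc_budget_needed:.0f}")
```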

Estimator Quick Reference

Each tab shows only what differs from the universal workflow. Replace the Stage 2 block with the code below; Stages 1 and 3–5 are unchanged.

Classical control variate. Requires the known LF population mean — rare in practice. Use ACV instead when the LF mean is unknown.

from pyapprox.statest import CVEstimator

# Only works with known LF mean (set via benchmark here)
lf_means = bkd.to_numpy(benchmark.ensemble_means())[1:]
stat_cv = MultiOutputMean(nqoi, bkd)
stat_cv.set_pilot_quantities(cov_pilot[:2, :2])

est_cv = CVEstimator(stat_cv, costs[:2], lf_means[:1])
est_cv.allocate_samples(remaining)
print(f"CV predicted std: {float(est_cv.optimized_covariance()[0,0])**0.5:.6f}")
CV predicted std: 0.001152

Multi-Level Monte Carlo. Decomposes the HF mean into telescoping level corrections. Best when models form a natural refinement hierarchy (mesh levels).

est_mlmc = MLMCEstimator(stat, costs)
mlmc_allocator = default_allocator_factory(est_mlmc)
mlmc_result = mlmc_allocator.allocate(remaining)
est_mlmc.set_allocation(mlmc_result)

N_alloc = est_mlmc.nsamples_per_model()
print(f"MLMC samples per level: {[int(n) for n in N_alloc]}")
print(f"MLMC predicted std: {float(est_mlmc.optimized_covariance()[0,0])**0.5:.6f}")
MLMC samples per level: [225, 1179, 5161, 23213, 261464]
MLMC predicted std: 0.002513
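The telescoping decomposition can be demonstrated in a few lines of plain numpy. The sketch below uses monomial stand-ins f_a(x) = x^(5−a) on U(0, 1) for the five models (an assumption about the benchmark's form; the HF mean E[x⁵] = 1/6 does match the tutorial output):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100_000)

# Monomial stand-ins for the five benchmark models (assumed form)
f = [lambda x, p=5 - a: x**p for a in range(5)]

# Telescoping identity MLMC exploits: E[f0] = E[f4] + sum_l E[f_l - f_{l+1}]
estimate = np.mean(f[4](x)) + sum(np.mean(f[l](x) - f[l + 1](x)) for l in range(4))
print(f"telescoped estimate: {estimate:.4f}  (exact: {1/6:.4f})")

# Correction stds are far smaller than std(f0) and shrink toward the HF end,
# so MLMC can push most samples onto the cheap coarse levels
stds = [float(np.std(f[l](x) - f[l + 1](x))) for l in range(4)]
print("correction stds:", [f"{s:.3f}" for s in stds])
```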

Multi-Fidelity Monte Carlo. Nested sample structure with closed-form allocation. Fast and robust for well-correlated hierarchies.

est_mfmc = MFMCEstimator(stat, costs)
mfmc_allocator = default_allocator_factory(est_mfmc)
mfmc_result = mfmc_allocator.allocate(remaining)
est_mfmc.set_allocation(mfmc_result)

print(f"MFMC samples per model: {[int(n) for n in est_mfmc.nsamples_per_model()]}")
print(f"MFMC predicted std: {float(est_mfmc.optimized_covariance()[0,0])**0.5:.6f}")
MFMC samples per model: [200, 1313, 6548, 29506, 170626]
MFMC predicted std: 0.002432
Note: Validity condition

MFMC requires correlations and costs to satisfy a monotonicity condition. If it is violated, the allocator raises a ValueError (see Troubleshooting). Remove the offending model and retry, or use ACVSearch to find a better structure.
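The condition (from Peherstorfer, Willcox and Gunzburger, 2016) can be checked by hand from the pilot correlations and costs. The sketch below is a standalone numpy check, not a PyApprox API:

```python
import numpy as np

# Pilot correlations rho_{0,a} (rho_{0,0} = 1) and model costs from the setup
rho = np.array([1.0, 0.9954, 0.9756, 0.9246, 0.8112])
cost = np.array([1.0, 0.1, 0.01, 0.001, 0.0001])

r2 = np.append(rho**2, 0.0)   # convention: rho_{0,M+1} = 0
ok = True
for i in range(1, 5):
    lhs = cost[i - 1] / cost[i]
    rhs = (r2[i - 1] - r2[i]) / (r2[i] - r2[i + 1])
    ok &= lhs > rhs
    print(f"  model {i}: cost ratio {lhs:.1f} > {rhs:.3f} -> {lhs > rhs}")
print("MFMC condition satisfied:", bool(ok))
```

For this benchmark every check passes, which is why the MFMC allocation above succeeds.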

Best Linear Unbiased Estimator via GLS. Solves an SDP for the optimal allocation. Especially powerful when models share subsets of inputs.

from pyapprox.statest import MLBLUEEstimator
from pyapprox.statest.groupacv import MLBLUESPDAllocationOptimizer

# Define which model subsets share samples
# Default: every model evaluated at its own set + HF shared
model_subsets = [[0, 1], [0, 2], [0, 3], [0, 4]]

est_mlblue = MLBLUEEstimator(stat, costs, model_subsets=model_subsets)
mlblue_allocator = MLBLUESPDAllocationOptimizer(est_mlblue)
mlblue_result = mlblue_allocator.optimize(remaining)
est_mlblue.set_allocation(mlblue_result)

print(f"MLBLUE predicted std: {float(est_mlblue.optimized_covariance()[0,0])**0.5:.6f}")
MLBLUE predicted std: 0.011513

ACVSearch In Depth

ACVSearch automatically selects the best estimator structure for a given problem and budget. This is the recommended approach unless you have a specific reason to use a named estimator.

Reasons to use a named estimator instead: (a) you know from theory which estimator is best, (b) you have many models and cannot afford enumeration time, (c) you specifically need MLMC (which is not an ACV estimator).

Search all families (default)

search_all = ACVSearch(stat, costs, allocator_factory=allocator_factory)
result_all = search_all.search(remaining)
best = result_all.estimator

var_best = float(bkd.to_numpy(best.optimized_covariance())[0, 0])
print(f"Best estimator: {best.__class__.__name__}")
print(f"Predicted std:  {var_best**0.5:.6f}")
Best estimator: GMFEstimator
Predicted std:  0.002432

Restrict to specific families

search_gmf = ACVSearch(stat, costs, estimator_classes=[GMFEstimator, GRDEstimator],
                       allocator_factory=allocator_factory)
result_gmf = search_gmf.search(remaining)
best_gmf = result_gmf.estimator

print(f"Best (GMF+GRD): {best_gmf.__class__.__name__}")
print(f"Predicted std:  {float(bkd.to_numpy(best_gmf.optimized_covariance())[0,0])**0.5:.6f}")
Best (GMF+GRD): GMFEstimator
Predicted std:  0.002432

Restrict search depth (fast for many models)

search_shallow = ACVSearch(
    stat, costs,
    recursion_strategy=TreeDepthRecursionStrategy(max_depth=2),
    allocator_factory=allocator_factory,
)
result_shallow = search_shallow.search(remaining)
best_shallow = result_shallow.estimator

print(f"Best (depth≤2): {best_shallow.__class__.__name__}")
print(f"Predicted std:  {float(bkd.to_numpy(best_shallow.optimized_covariance())[0,0])**0.5:.6f}")
Best (depth≤2): GMFEstimator
Predicted std:  0.001352

Fix a specific recursion index

# Star topology: every LF model corrects HF directly (= ACVMF)
ri_star = tuple([0] * (nmodels - 1))

search_fixed = ACVSearch(
    stat, costs,
    recursion_strategy=FixedRecursionStrategy(ri_star),
    allocator_factory=allocator_factory,
)
result_fixed = search_fixed.search(remaining)

print(f"Fixed γ={list(ri_star)}: {result_fixed.estimator.__class__.__name__}")
print(f"Predicted std: "
      f"{float(bkd.to_numpy(result_fixed.estimator.optimized_covariance())[0,0])**0.5:.6f}")
Fixed γ=[0, 0, 0, 0]: GMFEstimator
Predicted std: 0.001352
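A recursion index is just a parent pointer per LF model: γ[a−1] names the model whose estimate LF model a corrects. A plain-Python sketch of how the tuple maps to DAG edges (illustrative of the encoding, not PyApprox internals):

```python
def recursion_index_to_edges(gamma):
    """Edges (parent, child) of the estimator DAG encoded by a recursion index."""
    return [(parent, child) for child, parent in enumerate(gamma, start=1)]

print(recursion_index_to_edges((0, 0, 0, 0)))  # star: every LF model corrects f0
print(recursion_index_to_edges((0, 1, 2, 3)))  # chain: an MLMC-like hierarchy
```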

Inspecting search results

After any search, use the result to compare candidates.

fig, ax = plt.subplots(figsize=(6, 3.5))
plot_estimator_variance_reductions([best], ["ACVSearch best"], ax)
ax.set_title("Variance reduction vs MC", fontsize=11)
plt.tight_layout()
plt.show()
Figure 1: Variance reduction of the best estimator found by ACVSearch vs MC at the same budget.

Multi-Output Estimation

When the HF model produces multiple QoIs, estimating them jointly can reduce variance for each individual QoI compared to running separate scalar ACVs.

from pyapprox.benchmarks.instances.multifidelity.multioutput_ensemble import (
    multioutput_ensemble_3x3,
)

bm_mo    = multioutput_ensemble_3x3(bkd)
costs_mo = bm_mo.costs()
nqoi_mo  = 2   # use first 2 of 3 QoIs
nmod_mo  = bm_mo.nmodels()
full_nqoi_mo = bm_mo.models()[0].nqoi()

# Extract 2-QoI covariance subset
cov_full = bm_mo.ensemble_covariance()
idx = []
for a in range(nmod_mo):
    for q in range(nqoi_mo):
        idx.append(a * full_nqoi_mo + q)
cov_mo = cov_full[np.ix_(idx, idx)]

# MOACV: joint estimation
stat_mo = MultiOutputMean(nqoi_mo, bkd)
stat_mo.set_pilot_quantities(cov_mo)
search_mo = ACVSearch(stat_mo, costs_mo, allocator_factory=allocator_factory)
result_mo = search_mo.search(100.0)
cov_moacv = bkd.to_numpy(result_mo.estimator.optimized_covariance())

# SOACV: independent scalar ACV per QoI
for qi in range(nqoi_mo):
    (cov_qi,) = stat_mo.get_pilot_quantities_subset(
        nmod_mo, nqoi_mo, list(range(nmod_mo)), [qi]
    )
    stat_qi = MultiOutputMean(1, bkd)
    stat_qi.set_pilot_quantities(cov_qi)
    search_qi = ACVSearch(stat_qi, costs_mo, allocator_factory=allocator_factory)
    result_qi = search_qi.search(100.0)
    var_so = float(bkd.to_numpy(result_qi.estimator.optimized_covariance())[0, 0])
    var_mo = cov_moacv[qi, qi]
    gain = (var_so - var_mo) / var_so * 100
    print(f"QoI {qi}: SOACV std={var_so**0.5:.5f}  MOACV std={var_mo**0.5:.5f}  gain={gain:.1f}%")
QoI 0: SOACV std=0.02358  MOACV std=0.01228  gain=72.9%
QoI 1: SOACV std=0.00948  MOACV std=0.00302  gain=89.8%

Ensemble Selection

When your model library contains models that may hurt rather than help, use AllSubsetsStrategy to search over model subsets.

from pyapprox.statest import AllSubsetsStrategy

search_ens = ACVSearch(
    stat, costs,
    estimator_classes=[GMFEstimator],
    model_strategy=AllSubsetsStrategy(max_models=4),
    allocator_factory=allocator_factory,
)
result_ens = search_ens.search(remaining, allow_failures=True)
best_ens = result_ens.estimator

print(f"Best subset estimator: {best_ens.__class__.__name__}")
print(f"Predicted std: {float(bkd.to_numpy(best_ens.optimized_covariance())[0,0])**0.5:.6f}")
print(f"Candidates evaluated: {result_ens.candidates_evaluated()}")
print(f"Candidates successful: {result_ens.candidates_successful()}")
Best subset estimator: GMFEstimator
Predicted std: 0.002512
Candidates evaluated: 14
Candidates successful: 14
Tip: Sizing max_models

Start with max_models = min(nmodels, 4). Increase if the variance-by-size curve is still decreasing at the cap.
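The 14 candidates reported above are consistent with keeping the HF model in every subset and enumerating non-empty LF subsets up to the cap (an assumption about how AllSubsetsStrategy counts candidates):

```python
from math import comb

n_lf = 4          # LF models available (models 1-4)
max_models = 4    # cap including HF, so at most 3 LF models per subset

# Non-empty LF subsets of size 1 .. max_models - 1
n_candidates = sum(comb(n_lf, k) for k in range(1, max_models))
print(n_candidates)   # 4 + 6 + 4 = 14
```

This count grows combinatorially with the number of models, which is why capping max_models matters for large libraries.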

Visualization

Allocation matrices

fig, ax = plt.subplots(figsize=(10, 5))
plot_allocation(best, ax, show_partition_sizes=True)
ax.set_title("Best estimator — allocation matrix", fontsize=11)
plt.tight_layout()
plt.show()
Figure 2: Allocation matrix for the best ACVSearch estimator. Each row is a partition; each column pair is a sample set. Numbers show partition sizes.

Troubleshooting

MFMC validity condition failure

ValueError: Validity condition violated for model 3

The MFMC allocation requires correlations and costs to satisfy a monotonicity condition. Fix: remove the offending model, or use ACVSearch which searches over all valid structures automatically.

Near-singular pilot covariance

If the pilot covariance is nearly singular (small eigenvalues), the optimal weights become unstable. Fix: increase N_pilot, or remove highly collinear models.

# Check pilot covariance condition number
eigvals = np.linalg.eigvalsh(cov_np)
print(f"Condition number: {eigvals.max() / eigvals.min():.1e}")
print(f"Smallest eigenvalue: {eigvals.min():.2e}")
Condition number: 4.3e+06
Smallest eigenvalue: 8.13e-08
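If a larger pilot is not affordable, one standard remedy (generic covariance estimation practice, not a PyApprox function) is diagonal shrinkage, which trades a little bias in the control-variate weights for a much better-conditioned covariance:

```python
import numpy as np

# A deliberately ill-conditioned 3x3 covariance (two nearly collinear models)
cov = np.array([[1.0, 0.999, 0.5],
                [0.999, 1.0, 0.5],
                [0.5, 0.5, 1.0]])

def shrink(cov, lam):
    """Blend the covariance toward its diagonal by a factor lam in [0, 1]."""
    return (1.0 - lam) * cov + lam * np.diag(np.diag(cov))

for lam in (0.0, 0.01, 0.1):
    print(f"lam={lam:5}: cond = {np.linalg.cond(shrink(cov, lam)):.1e}")
```

Even a few percent of shrinkage cuts the condition number by orders of magnitude; pass the shrunken matrix to set_pilot_quantities in place of the raw pilot estimate.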

Optimizer convergence (trust-constr vs SLSQP)

The default trust-constr optimizer can stall when the optimal allocation sits on a constraint boundary. Fix: pass an SLSQP optimizer to default_allocator_factory.

# Pass optimizer to default_allocator_factory
slsqp = ScipySLSQPOptimizer(maxiter=200)
slsqp_factory = lambda est: default_allocator_factory(est, optimizer=slsqp)

# Use with ACVSearch
search_slsqp = ACVSearch(stat, costs, allocator_factory=slsqp_factory)
result_slsqp = search_slsqp.search(remaining)
print(f"SLSQP result std: "
      f"{float(bkd.to_numpy(result_slsqp.estimator.optimized_covariance())[0,0])**0.5:.6f}")
SLSQP result std: 0.002432

Budget too small

If the allocator returns all samples on the HF model (no LF models used), the budget is too small for multi-fidelity estimation to help. The pilot cost alone may consume most of the budget. Fix: increase the total budget or reduce N_pilot.
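A quick feasibility check before running anything (a rule of thumb, not a PyApprox function): if the pilot alone consumes more than ~20% of the budget, multi-fidelity estimation is unlikely to pay off at that budget.

```python
total_budget, pilot_cost = 500.0, 55.6   # figures from the shared setup

frac = pilot_cost / total_budget
print(f"pilot fraction of budget: {frac:.0%}")
if frac > 0.2:
    print("pilot consumes >20% of budget: shrink N_pilot or raise the budget")
```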