Monte Carlo Budget Estimation

PyApprox Tutorial Library

Using pilot samples and the PyApprox estimator API to predict the number of samples needed for a target accuracy.

Learning Objectives

After completing this tutorial, you will be able to:

  • Explain why the MSE formulas from the previous tutorial cannot be evaluated directly in practice
  • Use pilot samples to estimate the moments (\(\boldsymbol{\Sigma}\), \(\mathbf{W}\), \(\mathbf{B}\)) that appear in the MSE formulas
  • Use the PyApprox estimator API (MultiOutputMeanAndVariance, MCEstimator, allocate_samples) to compute the sample budget for a target accuracy
  • Recognize that the budget estimate is itself uncertain and understand how pilot sample size affects its reliability

Prerequisites

Complete Estimator Accuracy and Mean Squared Error before this tutorial.

The Practical Catch

The previous tutorial derived that the estimator covariance for simultaneously estimating the mean and covariance of \(K\) QoIs is:

\[ \text{Cov}[\hat{\mathbf{Q}}_M] = \begin{bmatrix} \frac{1}{M}\boldsymbol{\Sigma} & \frac{1}{M}\mathbf{B} \\[4pt] \frac{1}{M}\mathbf{B}^\top & \frac{1}{M(M-1)}\mathbf{V} + \frac{1}{M}\mathbf{W} \end{bmatrix} \]

To use this formula — for example, to determine how many samples \(M\) we need so that the estimator variance drops below a target — we need to know \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\). But these are properties of the QoI distribution, which is exactly what we are trying to characterize. We don’t know them yet.

This is the bootstrapping problem: to plan a Monte Carlo experiment, we need information that only the experiment itself can provide.

The standard solution is a two-stage approach:

  1. Pilot stage: run a small, cheap Monte Carlo experiment to estimate \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\).
  2. Budget stage: plug these estimates into the MSE formulas to determine the total number of samples \(M\) needed for a target accuracy.
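As a minimal sketch of this two-stage logic for a single scalar QoI (the toy model, pilot size, and target variance below are illustrative choices, and the budget rule \(M = \sigma^2/\varepsilon^2\) applies to mean estimation only):

```python
import numpy as np

rng = np.random.default_rng(0)
model = lambda x: np.sin(2 * np.pi * x)  # hypothetical cheap scalar model

# Stage 1 (pilot): estimate the QoI variance from a small batch
pilot_values = model(rng.uniform(size=30))
sigma2_hat = pilot_values.var(ddof=1)

# Stage 2 (budget): for the mean, Var[mu_hat] = sigma^2 / M, so a target
# estimator variance of eps^2 needs roughly M = sigma^2 / eps^2 samples
target_var = 1e-3
M = int(np.ceil(sigma2_hat / target_var))
print(f"Pilot variance estimate: {sigma2_hat:.3f}, budget: {M} samples")
```

The rest of the tutorial carries out exactly this plan, but for multiple QoIs and multiple statistics at once, using the PyApprox API instead of hand-rolled formulas.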

Setup

Tip: Simplified model for comparison

As in the previous tutorial, we use a lightweight analytical model with \(K = 3\) QoIs so that exact moments are available for verification. The same workflow applies to any model, including expensive PDE solvers.

\[ \mathbf{q}(x) = \begin{bmatrix} \sqrt{11}\, x^5 \\[4pt] x^4 + \cos(2.2\pi x) \\[4pt] \sin(2\pi x) \end{bmatrix}, \quad x \sim U[0,1] \]

import numpy as np
import matplotlib.pyplot as plt
from pyapprox.util.backends.numpy import NumpyBkd
from pyapprox.benchmarks.instances.multifidelity.multioutput_ensemble import (
    psd_multioutput_ensemble_3x3,
)
from pyapprox.statest.statistics import (
    MultiOutputMean,
    MultiOutputVariance,
    MultiOutputMeanAndVariance,
)
from pyapprox.statest.mc_estimator import MCEstimator

bkd = NumpyBkd()
bm = psd_multioutput_ensemble_3x3(bkd)
model = bm.models()[0]       # highest-fidelity model
variable = bm.prior()

nqoi = model.nqoi()          # 3 QoIs
cost_per_eval = 2.1

print(f"QoIs: {nqoi}")
print(f"Cost per evaluation: {cost_per_eval}")
QoIs: 3
Cost per evaluation: 2.1

Stage 1: Pilot Samples

We draw a modest number of pilot samples \(M_{\text{pilot}}\), evaluate the model, and estimate the three moment quantities from the resulting QoI values.

M_pilot = 50
np.random.seed(42)
samples_pilot = variable.rvs(M_pilot)
values_pilot = model(samples_pilot)  # shape (nqoi, M_pilot)

print(f"Pilot samples: {M_pilot}")
print(f"Values shape: {values_pilot.shape}")
Pilot samples: 50
Values shape: (3, 50)

Estimating \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\)

From the pilot QoI values, we compute:

  • \(\hat{\boldsymbol{\Sigma}}\): the sample covariance of the QoI values (\(K \times K\))
  • \(\hat{\mathbf{W}}\): the sample covariance of the centered outer products \((\mathbf{q}^{(i)} - \hat{\boldsymbol{\mu}})^{\otimes 2}\) (\(K^2 \times K^2\))
  • \(\hat{\mathbf{B}}\): the sample covariance between the QoI values and their centered outer products (\(K \times K^2\))
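Conceptually, all three quantities are sample covariances that can be formed with plain NumPy. The sketch below uses stand-in random data, and the exact normalization conventions inside PyApprox may differ; it is meant only to show the shapes and structure:

```python
import numpy as np

rng = np.random.default_rng(42)
K, M = 3, 50
q = rng.standard_normal((K, M))   # stand-in pilot QoI values, shape (K, M)

mu_hat = q.mean(axis=1, keepdims=True)
dq = q - mu_hat                    # centered QoI values

# Sigma_hat: sample covariance of the QoI values, (K, K)
Sigma_hat = dq @ dq.T / (M - 1)

# Outer products (q_i - mu)(q_i - mu)^T, flattened to length K^2 per sample
outer = np.einsum("km,lm->klm", dq, dq).reshape(K * K, M)
outer_c = outer - outer.mean(axis=1, keepdims=True)

# W_hat: sample covariance of the centered outer products, (K^2, K^2)
W_hat = outer_c @ outer_c.T / (M - 1)

# B_hat: cross-covariance of QoI values and outer products, (K, K^2)
B_hat = dq @ outer_c.T / (M - 1)

print(Sigma_hat.shape, W_hat.shape, B_hat.shape)
```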

The MultiOutputMeanAndVariance statistic class provides compute_pilot_quantities to compute all three from raw model evaluations:

stat_mv = MultiOutputMeanAndVariance(nqoi, bkd)
cov_pilot, W_pilot, B_pilot = stat_mv.compute_pilot_quantities([values_pilot])

print("Pilot estimate of Sigma:")
print(cov_pilot)
print(f"\nW shape: {W_pilot.shape}")
print(f"B shape: {B_pilot.shape}")
Pilot estimate of Sigma:
[[ 0.64592504  0.48595219 -0.2841006 ]
 [ 0.48595219  0.74333168 -0.164864  ]
 [-0.2841006  -0.164864    0.48075183]]

W shape: (9, 9)
B shape: (3, 9)

Note: What are we estimating?

  • \(\hat{\boldsymbol{\Sigma}}\) captures how the three QoIs co-vary — needed for the mean–mean block of the estimator covariance.
  • \(\hat{\mathbf{W}}\) captures the fourth-order moment structure — needed for the variance–variance block.
  • \(\hat{\mathbf{B}}\) captures the third-order (skewness-like) structure — needed for the mean–variance cross block.

With only \(M_{\text{pilot}} = 50\) samples, these estimates are rough. The fourth-order moments in \(\hat{\mathbf{W}}\) are particularly noisy because they depend on tail behavior that a small sample may not capture.

Stage 2: Budget Computation with the PyApprox API

PyApprox provides an estimator API that encapsulates the MSE formulas and budget computation. The workflow is:

  1. Create a stat object that defines which statistics we want to estimate.
  2. Set the pilot moment estimates on the stat object.
  3. Create an estimator (here, MCEstimator).
  4. Call allocate_samples with a target computational cost to determine the optimal number of samples.

# Step 1-2: Create stat and set pilot quantities
stat = MultiOutputMeanAndVariance(nqoi, bkd)
stat.set_pilot_quantities(cov_pilot, W_pilot, B_pilot)

# Step 3: Create the MC estimator
est = MCEstimator(stat, cost_per_eval)

# Step 4: Allocate samples for a target cost
target_cost = 500.0
est.allocate_samples(target_cost)
print(f"Target cost: {target_cost}")
print(f"Allocated samples: {est._rounded_nsamples_per_model}")
print(f"Actual cost: {float(est._rounded_nsamples_per_model[0]) * cost_per_eval:.1f}")
Target cost: 500.0
Allocated samples: [238]
Actual cost: 499.8

Understanding the Output

The allocate_samples call determines how many model evaluations to run. For single-model Monte Carlo, this is simply \(M = \lfloor \text{target\_cost} / \text{cost\_per\_eval} \rfloor\). Because the number of samples must be an integer, the actual cost is generally less than the target — no partial evaluations are allowed. The power of this API becomes apparent with multi-fidelity methods (covered in later tutorials), where it optimally distributes samples across models of different cost and accuracy.
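For the single-model case, the arithmetic can be checked by hand (numbers reused from the run above):

```python
import math

cost_per_eval = 2.1
target_cost = 500.0

# Integer sample count that fits inside the budget; no partial evaluations
M = math.floor(target_cost / cost_per_eval)
actual_cost = M * cost_per_eval
print(M, round(actual_cost, 1))  # 238 499.8, matching the allocation above
```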

We can also inspect the predicted estimator covariance at this allocation:

predicted_est_cov = est.optimized_covariance()
nstats = stat.nstats()
print(f"Number of statistics: {nstats}")
print(f"Predicted estimator covariance shape: {predicted_est_cov.shape}")
print(f"\nDiagonal (individual estimator variances):")
print(bkd.diag(predicted_est_cov))
Number of statistics: 9
Predicted estimator covariance shape: (9, 9)

Diagonal (individual estimator variances):
[0.00271397 0.00312324 0.00201997 0.00790421 0.00421419 0.00212952
 0.00045293 0.00087081 0.00055071]

Running the Allocated Experiment

With the budget determined, we generate samples, evaluate the model, and compute the statistics:

# Generate samples according to the allocation
np.random.seed(123)
samples_allocated = est.generate_samples_per_model(variable.rvs)[0]
values_allocated = model(samples_allocated)

# Compute the statistics
result = est(values_allocated)
mean_est = result[:nqoi]
cov_entries = result[nqoi:]  # lower-triangle entries of covariance

print(f"Estimated mean vector: {mean_est}")
print(f"Estimated covariance entries (lower triangle): {cov_entries}")
Estimated mean vector: [ 0.50055343  0.18947408 -0.01458151]
Estimated covariance entries (lower triangle): [ 0.61420488  0.51494343  0.79141596 -0.27236203 -0.23945126  0.49904325]

Verifying the Predicted Accuracy

The predicted estimator covariance tells us how precise we expect the estimates to be. We can verify this empirically by repeating the experiment many times and checking that the actual spread matches the prediction.

Figure 1 runs 200 independent experiments at the allocated budget and compares the empirical estimator covariance with the prediction.

Figure 1: Verification of the budget prediction. Histograms of mean estimates from 200 independent experiments at the allocated budget, with the predicted standard deviation (red dashed) and empirical standard deviation (green) overlaid.
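The same check can be sketched in a self-contained way using the third QoI, \(\sin(2\pi x)\), whose exact variance under \(x \sim U[0,1]\) is \(1/2\); the sample count 238 is reused from the allocation above, and the toy model stands in for the benchmark:

```python
import numpy as np

rng = np.random.default_rng(7)
model = lambda x: np.sin(2 * np.pi * x)   # stand-in for the third QoI
M = 238                                    # allocated sample count from above

# Predicted std of the sample mean: sigma / sqrt(M), with sigma^2 = 1/2 exact
predicted_std = np.sqrt(0.5) / np.sqrt(M)

# Empirical std over 200 independent repetitions of the experiment
means = np.array([model(rng.uniform(size=M)).mean() for _ in range(200)])
empirical_std = means.std(ddof=1)

print(predicted_std, empirical_std)
```

The two numbers agree to within the sampling noise of 200 repetitions, which is the same agreement Figure 1 displays graphically.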

The Budget Depends on the Target Statistic

The estimator covariance depends on which statistics we estimate. Estimating the mean alone is cheaper than estimating the mean and covariance simultaneously, because the latter requires accurate fourth-order moments.

Figure 2 shows the predicted trace of the estimator covariance as a function of the computational budget for three choices: mean only, variance only, and mean + variance.

Figure 2: Predicted estimation error (trace of estimator covariance) vs. computational budget for three statistic choices. Estimating the variance requires a larger budget than the mean for the same accuracy, and estimating both simultaneously is the most expensive.
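The gap between the two statistics can be illustrated for a single Gaussian QoI, where the leading-order estimator variances are known in closed form: \(\sigma^2/M\) for the sample mean and \(2\sigma^4/(M-1)\) for the unbiased sample variance. This is an illustrative sketch, not the PyApprox computation:

```python
import numpy as np

rng = np.random.default_rng(3)
M, reps = 500, 2000
x = rng.standard_normal((reps, M))   # stand-in Gaussian QoI samples

# Spread of the mean estimator vs. the variance estimator across repetitions
var_of_mean = x.mean(axis=1).var(ddof=1)
var_of_var = x.var(axis=1, ddof=1).var(ddof=1)

# Theory for N(0, 1): Var[mu_hat] = 1/M, Var[s^2] = 2/(M-1) -- about twice as large
print(var_of_mean, 1 / M)
print(var_of_var, 2 / (M - 1))
```

For a Gaussian, the variance estimator thus needs roughly twice the budget of the mean estimator for the same accuracy; for heavier-tailed QoIs the gap is larger, since \(\text{Var}[s^2]\) grows with the fourth moment.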

Pilot Size Matters

The budget estimate is only as good as the pilot estimates of \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\). A small pilot produces noisy moment estimates, which in turn produce unreliable budget predictions.

Figure 3 demonstrates this. We repeat the entire two-stage workflow 100 times, each with a fresh pilot of size \(M_{\text{pilot}}\), and record the predicted estimator standard deviation for \(\hat{\mu}_1\). With \(M_{\text{pilot}} = 20\), the predictions scatter widely; with \(M_{\text{pilot}} = 200\), they concentrate near the true value.

Figure 3: Sensitivity of the budget prediction to pilot size. Each point is the predicted standard deviation of \(\hat{\mu}_1\) from an independent pilot experiment. Small pilots (\(M_{\text{pilot}} = 20\)) produce highly variable predictions; larger pilots (\(M_{\text{pilot}} = 200\)) are much more stable. The true value (from the benchmark’s exact moments) is shown as a dashed red line.
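A self-contained sketch of this effect, again using \(\sin(2\pi x)\) as a stand-in QoI and the allocated sample count 238 from earlier (the helper `predicted_std` is hypothetical, not part of the PyApprox API):

```python
import numpy as np

rng = np.random.default_rng(11)
reps = 100

def predicted_std(m_pilot):
    # Predicted std of the sample mean at a fixed budget M = 238,
    # recomputed from `reps` independent pilots of size m_pilot
    pilot = np.sin(2 * np.pi * rng.uniform(size=(reps, m_pilot)))
    sigma2_hat = pilot.var(axis=1, ddof=1)
    return np.sqrt(sigma2_hat / 238)

spread_small = predicted_std(20).std(ddof=1)
spread_large = predicted_std(200).std(ddof=1)
print(spread_small, spread_large)  # small pilots scatter far more
```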

Important: Practical guidance on pilot size

There is no universal rule for \(M_{\text{pilot}}\). Choose 20–50 samples if you can afford them and use the pilot samples as the first batch of the final experiment. The total budget is \(M_{\text{pilot}} + M_{\text{additional}}\), so pilot samples are never wasted.

There are advanced methods for optimally balancing pilot size with accuracy of the final estimator (Dixon et al. 2025).
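The reuse arithmetic can be sketched directly (toy scalar model; the counts 50 and 238 are reused from earlier in the tutorial):

```python
import numpy as np

rng = np.random.default_rng(5)
model = lambda x: np.sin(2 * np.pi * x)  # toy scalar stand-in for the model

M_pilot, M_total = 50, 238
pilot_vals = model(rng.uniform(size=M_pilot))            # pilot batch
extra_vals = model(rng.uniform(size=M_total - M_pilot))  # only the remainder is new

# The pilot batch doubles as the first batch of the final experiment,
# so only M_total - M_pilot additional evaluations are paid for
all_vals = np.concatenate([pilot_vals, extra_vals])
print(all_vals.size, all_vals.mean())
```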

The Complete Two-Stage Workflow

Putting it all together, the practical MC workflow for a target accuracy is:

# --- Full workflow ---

# 1. Choose target accuracy and statistic
target_cost = 1000.0
stat_full = MultiOutputMeanAndVariance(nqoi, bkd)

# 2. Run pilot
M_pilot_full = 100
np.random.seed(999)
samples_pilot_full = variable.rvs(M_pilot_full)
values_pilot_full = model(samples_pilot_full)

# 3. Estimate moments from pilot
cov_hat, W_hat, B_hat = stat_full.compute_pilot_quantities([values_pilot_full])
stat_full.set_pilot_quantities(cov_hat, W_hat, B_hat)

# 4. Create estimator and allocate budget
est_full = MCEstimator(stat_full, cost_per_eval)
est_full.allocate_samples(target_cost)
print(f"Allocated samples: {est_full._rounded_nsamples_per_model}")
print(f"Actual cost: {float(est_full._rounded_nsamples_per_model[0]) * cost_per_eval:.1f} "
      f"(target: {target_cost})")

# 5. Generate samples and evaluate model
np.random.seed(1000)
samples_final = est_full.generate_samples_per_model(variable.rvs)[0]
values_final = model(samples_final)

# 6. Compute statistics
result_full = est_full(values_final)
mean_final = result_full[:nqoi]
cov_entries_final = result_full[nqoi:]

print(f"\nEstimated mean:  {mean_final}")
print(f"Estimated covariance (lower triangle): {cov_entries_final}")

# 7. Report predicted accuracy
pred_cov_full = est_full.optimized_covariance()
for k in range(nqoi):
    print(f"Predicted std of mean(q_{k+1}): {float(bkd.sqrt(pred_cov_full[k,k])):.6f}")
Allocated samples: [476]
Actual cost: 999.6 (target: 1000.0)

Estimated mean:  [0.56878555 0.33459039 0.04283921]
Estimated covariance (lower triangle): [ 0.75843799  0.58913625  0.84125025 -0.32098129 -0.26696309  0.4798086 ]
Predicted std of mean(q_1): 0.041797
Predicted std of mean(q_2): 0.042768
Predicted std of mean(q_3): 0.032772

Visualizing the Estimator Covariance

The estimator covariance matrix at the allocated budget shows the block structure derived in the previous tutorial.

from pyapprox.tutorial_figures._multifidelity_advanced import plot_est_cov_final

# Build labels for the 9 statistics: 3 means + 6 lower-triangle covariance
stat_labels = [rf"$\hat{{\mu}}_{{{k+1}}}$" for k in range(nqoi)]
for j in range(nqoi):
    for k in range(j + 1):
        stat_labels.append(rf"$\hat{{\Sigma}}_{{{j+1}{k+1}}}$")

# Convert to numpy for plotting
pred_cov_plot = bkd.to_numpy(pred_cov_full)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
plot_est_cov_final(pred_cov_plot, stat_labels, nstats, nqoi, target_cost,
                   ax1, ax2, fig)
plt.tight_layout()
plt.show()
Figure 4: Estimator covariance matrix for simultaneous mean and covariance estimation at the allocated budget. The block structure from the previous tutorial is visible: mean-mean (top-left), mean-variance (off-diagonal), variance-variance (bottom-right).

Key Takeaways

  • The MSE formulas require knowledge of the QoI moments (\(\boldsymbol{\Sigma}\), \(\mathbf{W}\), \(\mathbf{B}\)), which must be estimated from pilot samples
  • The PyApprox estimator API encapsulates this workflow: define the statistic → set pilot quantities → allocate budget → generate samples → compute estimates
  • The pilot size matters: small pilots produce noisy moment estimates, leading to unreliable budget predictions — especially for variance estimation, which depends on fourth-order moments
  • Pilot samples can be reused as part of the final experiment, so they are never wasted
  • The predicted estimator covariance matrix provides a pre-computation accuracy forecast that can be verified empirically

Exercises

  1. Run the complete two-stage workflow with target_cost = 100 and target_cost = 10000. Compare the predicted estimator standard deviations. Does the improvement match the expected \(1/\sqrt{M}\) scaling?

  2. Modify the workflow to estimate the mean only (not mean + variance). How does the required budget compare for the same target accuracy? Why is it cheaper?

  3. In Figure 3, the histograms for small pilots are wide and sometimes produce budget predictions that are far too optimistic. Explain why an underestimate of \(\mathbf{W}\) leads to an overly optimistic (too small) budget for variance estimation.

  4. (Challenge) Implement a simple adaptive scheme: start with \(M_{\text{pilot}} = 30\), estimate the budget, run the experiment, then use all samples as a new pilot and re-estimate the budget. Does the budget estimate stabilize after one or two iterations?

Next Steps

Continue with:

References

Dixon, Thomas, Alex Gorodetsky, John Jakeman, Akil Narayan, and Yiming Xu. 2025. “Optimally Balancing Exploration and Exploitation to Automate Multi-Fidelity Statistical Estimation.” https://arxiv.org/abs/2505.09828.