Monte Carlo Budget Estimation

PyApprox Tutorial Library

Using pilot samples and the PyApprox estimator API to predict the number of samples needed for a target accuracy.

Learning Objectives

After completing this tutorial, you will be able to:

  • Explain why the MSE formulas from the previous tutorial cannot be evaluated directly in practice
  • Use pilot samples to estimate the moments (\(\boldsymbol{\Sigma}\), \(\mathbf{W}\), \(\mathbf{B}\)) that appear in the MSE formulas
  • Use the PyApprox estimator API (MultiOutputMeanAndVariance, MCEstimator, allocate_samples) to compute the sample budget for a target accuracy
  • Recognize that the budget estimate is itself uncertain and understand how pilot sample size affects its reliability

Prerequisites

Complete Estimator Accuracy and Mean Squared Error before this tutorial.

The Practical Catch

The previous tutorial derived that the estimator covariance for simultaneously estimating the mean and covariance of \(K\) QoIs is:

\[ \text{Cov}[\hat{\mathbf{Q}}_M] = \begin{bmatrix} \frac{1}{M}\boldsymbol{\Sigma} & \frac{1}{M}\mathbf{B} \\[4pt] \frac{1}{M}\mathbf{B}^\top & \frac{1}{M(M-1)}\mathbf{V} + \frac{1}{M}\mathbf{W} \end{bmatrix} \]

To use this formula — for example, to determine how many samples \(M\) we need so that the estimator variance drops below a target — we need to know \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\). But these are properties of the QoI distribution, which is exactly what we are trying to characterize. We don’t know them yet.

This is the bootstrapping problem: to plan a Monte Carlo experiment, we need information that only the experiment itself can provide.

The standard solution is a two-stage approach:

  1. Pilot stage: run a small, cheap Monte Carlo experiment to estimate \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\).
  2. Budget stage: plug these estimates into the MSE formulas to determine the total number of samples \(M\) needed for a target accuracy.
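As a minimal sketch of this two-stage logic for a single scalar QoI (the toy model, pilot size, and target variance below are illustrative choices, and the budget rule \(M = \sigma^2/\varepsilon^2\) applies to mean estimation only):

```python
import numpy as np

rng = np.random.default_rng(0)
model = lambda x: np.sin(2 * np.pi * x)  # hypothetical cheap scalar model

# Stage 1 (pilot): estimate the QoI variance from a small batch
pilot_values = model(rng.uniform(size=30))
sigma2_hat = pilot_values.var(ddof=1)

# Stage 2 (budget): for the mean, Var[mu_hat] = sigma^2 / M, so a target
# estimator variance of eps^2 needs roughly M = sigma^2 / eps^2 samples
target_var = 1e-3
M = int(np.ceil(sigma2_hat / target_var))
print(f"Pilot variance estimate: {sigma2_hat:.3f}, budget: {M} samples")
```

The rest of the tutorial carries out exactly this plan, but for multiple QoIs and multiple statistics at once, using the PyApprox API instead of hand-rolled formulas.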

Setup

Tip: Simplified model for comparison

As in the previous tutorial, we use a lightweight analytical model with \(K = 3\) QoIs so that exact moments are available for verification. The same workflow applies to any model, including expensive PDE solvers.

\[ \mathbf{q}(x) = \begin{bmatrix} \sqrt{11}\, x^5 \\[4pt] x^4 + \cos(2.2\pi x) \\[4pt] \sin(2\pi x) \end{bmatrix}, \quad x \sim U[0,1] \]

import numpy as np
import matplotlib.pyplot as plt
from pyapprox.util.backends.numpy import NumpyBkd
from pyapprox.benchmarks.instances.multifidelity.multioutput_ensemble import (
    psd_multioutput_ensemble_3x3,
)
from pyapprox.statest.statistics import (
    MultiOutputMean,
    MultiOutputVariance,
    MultiOutputMeanAndVariance,
)
from pyapprox.statest.mc_estimator import MCEstimator

bkd = NumpyBkd()
bm = psd_multioutput_ensemble_3x3(bkd)
model = bm.models()[0]       # highest-fidelity model
variable = bm.prior()

nqoi = model.nqoi()          # 3 QoIs
cost_per_eval = 2.1

print(f"QoIs: {nqoi}")
print(f"Cost per evaluation: {cost_per_eval}")
QoIs: 3
Cost per evaluation: 2.1

Stage 1: Pilot Samples

We draw a modest number of pilot samples \(M_{\text{pilot}}\), evaluate the model, and estimate the three moment quantities from the resulting QoI values.

M_pilot = 50
np.random.seed(42)
samples_pilot = variable.rvs(M_pilot)
values_pilot = model(samples_pilot)  # shape (nqoi, M_pilot)

print(f"Pilot samples: {M_pilot}")
print(f"Values shape: {values_pilot.shape}")
Pilot samples: 50
Values shape: (3, 50)

Estimating \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\)

From the pilot QoI values, we compute:

  • \(\hat{\boldsymbol{\Sigma}}\): the sample covariance of the QoI values (\(K \times K\))
  • \(\hat{\mathbf{W}}\): the sample covariance of the centered outer products \((\mathbf{q}^{(i)} - \hat{\boldsymbol{\mu}})^{\otimes 2}\) (\(K^2 \times K^2\))
  • \(\hat{\mathbf{B}}\): the sample covariance between the QoI values and their centered outer products (\(K \times K^2\))
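Conceptually, all three quantities are sample covariances that can be formed with plain NumPy. The sketch below uses stand-in random data, and the exact normalization conventions inside PyApprox may differ; it is meant only to show the shapes and structure:

```python
import numpy as np

rng = np.random.default_rng(42)
K, M = 3, 50
q = rng.standard_normal((K, M))   # stand-in pilot QoI values, shape (K, M)

mu_hat = q.mean(axis=1, keepdims=True)
dq = q - mu_hat                    # centered QoI values

# Sigma_hat: sample covariance of the QoI values, (K, K)
Sigma_hat = dq @ dq.T / (M - 1)

# Outer products (q_i - mu)(q_i - mu)^T, flattened to length K^2 per sample
outer = np.einsum("km,lm->klm", dq, dq).reshape(K * K, M)
outer_c = outer - outer.mean(axis=1, keepdims=True)

# W_hat: sample covariance of the centered outer products, (K^2, K^2)
W_hat = outer_c @ outer_c.T / (M - 1)

# B_hat: cross-covariance of QoI values and outer products, (K, K^2)
B_hat = dq @ outer_c.T / (M - 1)

print(Sigma_hat.shape, W_hat.shape, B_hat.shape)
```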

The MultiOutputMeanAndVariance statistic class provides compute_pilot_quantities to compute all three from raw model evaluations:

stat_mv = MultiOutputMeanAndVariance(nqoi, bkd)
cov_pilot, W_pilot, B_pilot = stat_mv.compute_pilot_quantities([values_pilot])

print("Pilot estimate of Sigma:")
print(cov_pilot)
print(f"\nW shape: {W_pilot.shape}")
print(f"B shape: {B_pilot.shape}")
Pilot estimate of Sigma:
[[ 0.64592504  0.48595219 -0.2841006 ]
 [ 0.48595219  0.74333168 -0.164864  ]
 [-0.2841006  -0.164864    0.48075183]]

W shape: (9, 9)
B shape: (3, 9)

Note: What are we estimating?

  • \(\hat{\boldsymbol{\Sigma}}\) captures how the three QoIs co-vary — needed for the mean–mean block of the estimator covariance.
  • \(\hat{\mathbf{W}}\) captures the fourth-order moment structure — needed for the variance–variance block.
  • \(\hat{\mathbf{B}}\) captures the third-order (skewness-like) structure — needed for the mean–variance cross block.

With only \(M_{\text{pilot}} = 50\) samples, these estimates are rough. The fourth-order moments in \(\hat{\mathbf{W}}\) are particularly noisy because they depend on tail behavior that a small sample may not capture.

Stage 2: Budget Computation with the PyApprox API

PyApprox provides an estimator API that encapsulates the MSE formulas and budget computation. The workflow is:

  1. Create a stat object that defines which statistics we want to estimate.
  2. Set the pilot moment estimates on the stat object.
  3. Create an estimator (here, MCEstimator).
  4. Call allocate_samples with a target computational cost to determine the optimal number of samples.

# Step 1-2: Create stat and set pilot quantities
stat = MultiOutputMeanAndVariance(nqoi, bkd)
stat.set_pilot_quantities(cov_pilot, W_pilot, B_pilot)

# Step 3: Create the MC estimator
est = MCEstimator(stat, cost_per_eval)

# Step 4: Allocate samples for a target cost
target_cost = 500.0
est.allocate_samples(target_cost)
print(f"Target cost: {target_cost}")
print(f"Allocated samples: {est._rounded_nsamples_per_model}")
print(f"Actual cost: {float(est._rounded_nsamples_per_model[0]) * cost_per_eval:.1f}")
Target cost: 500.0
Allocated samples: [238]
Actual cost: 499.8

Understanding the Output

The allocate_samples call determines how many model evaluations to run. For single-model Monte Carlo, this is simply \(M = \lfloor \text{target\_cost} / \text{cost\_per\_eval} \rfloor\). Because the number of samples must be an integer, the actual cost is generally less than the target — no partial evaluations are allowed. The power of this API becomes apparent with multi-fidelity methods (covered in later tutorials), where it optimally distributes samples across models of different cost and accuracy.
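For the single-model case, the arithmetic can be checked by hand (numbers reused from the run above):

```python
import math

cost_per_eval = 2.1
target_cost = 500.0

# Integer sample count that fits inside the budget; no partial evaluations
M = math.floor(target_cost / cost_per_eval)
actual_cost = M * cost_per_eval
print(M, round(actual_cost, 1))  # 238 499.8, matching the allocation above
```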

We can also inspect the predicted estimator covariance at this allocation:

predicted_est_cov = est.optimized_covariance()
nstats = stat.nstats()
print(f"Number of statistics: {nstats}")
print(f"Predicted estimator covariance shape: {predicted_est_cov.shape}")
print(f"\nDiagonal (individual estimator variances):")
print(bkd.diag(predicted_est_cov))
Number of statistics: 9
Predicted estimator covariance shape: (9, 9)

Diagonal (individual estimator variances):
[0.00271397 0.00312324 0.00201997 0.00790421 0.00421419 0.00212952
 0.00045293 0.00087081 0.00055071]

Running the Allocated Experiment

With the budget determined, we generate samples, evaluate the model, and compute the statistics:

# Generate samples according to the allocation
np.random.seed(123)
samples_allocated = est.generate_samples_per_model(variable.rvs)[0]
values_allocated = model(samples_allocated)

# Compute the statistics
result = est(values_allocated)
mean_est = result[:nqoi]
cov_entries = result[nqoi:]  # lower-triangle entries of covariance

print(f"Estimated mean vector: {mean_est}")
print(f"Estimated covariance entries (lower triangle): {cov_entries}")
Estimated mean vector: [ 0.50055343  0.18947408 -0.01458151]
Estimated covariance entries (lower triangle): [ 0.61420488  0.51494343  0.79141596 -0.27236203 -0.23945126  0.49904325]

Verifying the Predicted Accuracy

The predicted estimator covariance tells us how precise we expect the estimates to be. We can verify this empirically by repeating the experiment many times and checking that the actual spread matches the prediction.

Figure 1 runs 200 independent experiments at the allocated budget and compares the empirical estimator covariance with the prediction.

Figure 1: Verification of the budget prediction. Histograms of mean estimates from 200 independent experiments at the allocated budget, with the predicted standard deviation (red dashed) and empirical standard deviation (green) overlaid.
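The same check can be sketched in a self-contained way using the third QoI, \(\sin(2\pi x)\), whose exact variance under \(x \sim U[0,1]\) is \(1/2\); the sample count 238 is reused from the allocation above, and the toy model stands in for the benchmark:

```python
import numpy as np

rng = np.random.default_rng(7)
model = lambda x: np.sin(2 * np.pi * x)   # stand-in for the third QoI
M = 238                                    # allocated sample count from above

# Predicted std of the sample mean: sigma / sqrt(M), with sigma^2 = 1/2 exact
predicted_std = np.sqrt(0.5) / np.sqrt(M)

# Empirical std over 200 independent repetitions of the experiment
means = np.array([model(rng.uniform(size=M)).mean() for _ in range(200)])
empirical_std = means.std(ddof=1)

print(predicted_std, empirical_std)
```

The two numbers agree to within the sampling noise of 200 repetitions, which is the same agreement Figure 1 displays graphically.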

The Budget Depends on the Target Statistic

The estimator covariance depends on which statistics we estimate. Estimating the mean alone is cheaper than estimating the mean and covariance simultaneously, because the latter requires accurate fourth-order moments.

Figure 2 shows the predicted trace of the estimator covariance as a function of the computational budget for three choices: mean only, variance only, and mean + variance.

Figure 2: Predicted estimation error (trace of estimator covariance) vs. computational budget for three statistic choices. Estimating the variance requires a larger budget than the mean for the same accuracy, and estimating both simultaneously is the most expensive.
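The gap between the two statistics can be illustrated for a single Gaussian QoI, where the leading-order estimator variances are known in closed form: \(\sigma^2/M\) for the sample mean and \(2\sigma^4/(M-1)\) for the unbiased sample variance. This is an illustrative sketch, not the PyApprox computation:

```python
import numpy as np

rng = np.random.default_rng(3)
M, reps = 500, 2000
x = rng.standard_normal((reps, M))   # stand-in Gaussian QoI samples

# Spread of the mean estimator vs. the variance estimator across repetitions
var_of_mean = x.mean(axis=1).var(ddof=1)
var_of_var = x.var(axis=1, ddof=1).var(ddof=1)

# Theory for N(0, 1): Var[mu_hat] = 1/M, Var[s^2] = 2/(M-1) -- about twice as large
print(var_of_mean, 1 / M)
print(var_of_var, 2 / (M - 1))
```

For a Gaussian, the variance estimator thus needs roughly twice the budget of the mean estimator for the same accuracy; for heavier-tailed QoIs the gap is larger, since \(\text{Var}[s^2]\) grows with the fourth moment.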

Pilot Size Matters

The budget estimate is only as good as the pilot estimates of \(\boldsymbol{\Sigma}\), \(\mathbf{W}\), and \(\mathbf{B}\). A small pilot produces noisy moment estimates, which in turn produce unreliable budget predictions.

Figure 3 demonstrates this. We repeat the entire two-stage workflow 100 times, each with a fresh pilot of size \(M_{\text{pilot}}\), and record the predicted estimator standard deviation for \(\hat{\mu}_1\). With \(M_{\text{pilot}} = 20\), the predictions scatter widely; with \(M_{\text{pilot}} = 200\), they concentrate near the true value.

Figure 3: Sensitivity of the budget prediction to pilot size. Each point is the predicted standard deviation of \(\hat{\mu}_1\) from an independent pilot experiment. Small pilots (\(M_{\text{pilot}} = 20\)) produce highly variable predictions; larger pilots (\(M_{\text{pilot}} = 200\)) are much more stable. The true value (from the benchmark’s exact moments) is shown as a dashed red line.
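A self-contained sketch of this effect, again using \(\sin(2\pi x)\) as a stand-in QoI and the allocated sample count 238 from earlier (the helper `predicted_std` is hypothetical, not part of the PyApprox API):

```python
import numpy as np

rng = np.random.default_rng(11)
reps = 100

def predicted_std(m_pilot):
    # Predicted std of the sample mean at a fixed budget M = 238,
    # recomputed from `reps` independent pilots of size m_pilot
    pilot = np.sin(2 * np.pi * rng.uniform(size=(reps, m_pilot)))
    sigma2_hat = pilot.var(axis=1, ddof=1)
    return np.sqrt(sigma2_hat / 238)

spread_small = predicted_std(20).std(ddof=1)
spread_large = predicted_std(200).std(ddof=1)
print(spread_small, spread_large)  # small pilots scatter far more
```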

Important: Practical guidance on pilot size

There is no universal rule for \(M_{\text{pilot}}\). Choose 20–50 samples if you can afford them and use the pilot samples as the first batch of the final experiment. The total budget is \(M_{\text{pilot}} + M_{\text{additional}}\), so pilot samples are never wasted.

There are advanced methods for optimally balancing pilot size with accuracy of the final estimator (Dixon et al. 2025).
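The reuse arithmetic can be sketched directly (toy scalar model; the counts 50 and 238 are reused from earlier in the tutorial):

```python
import numpy as np

rng = np.random.default_rng(5)
model = lambda x: np.sin(2 * np.pi * x)  # toy scalar stand-in for the model

M_pilot, M_total = 50, 238
pilot_vals = model(rng.uniform(size=M_pilot))            # pilot batch
extra_vals = model(rng.uniform(size=M_total - M_pilot))  # only the remainder is new

# The pilot batch doubles as the first batch of the final experiment,
# so only M_total - M_pilot additional evaluations are paid for
all_vals = np.concatenate([pilot_vals, extra_vals])
print(all_vals.size, all_vals.mean())
```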

The Complete Two-Stage Workflow

Putting it all together, the practical MC workflow for a target accuracy is:

# --- Full workflow ---

# 1. Choose target accuracy and statistic
target_cost = 1000.0
stat_full = MultiOutputMeanAndVariance(nqoi, bkd)

# 2. Run pilot
M_pilot_full = 100
np.random.seed(999)
samples_pilot_full = variable.rvs(M_pilot_full)
values_pilot_full = model(samples_pilot_full)

# 3. Estimate moments from pilot
cov_hat, W_hat, B_hat = stat_full.compute_pilot_quantities([values_pilot_full])
stat_full.set_pilot_quantities(cov_hat, W_hat, B_hat)

# 4. Create estimator and allocate budget
est_full = MCEstimator(stat_full, cost_per_eval)
est_full.allocate_samples(target_cost)
print(f"Allocated samples: {est_full._rounded_nsamples_per_model}")
print(f"Actual cost: {float(est_full._rounded_nsamples_per_model[0]) * cost_per_eval:.1f} "
      f"(target: {target_cost})")

# 5. Generate samples and evaluate model
np.random.seed(1000)
samples_final = est_full.generate_samples_per_model(variable.rvs)[0]
values_final = model(samples_final)

# 6. Compute statistics
result_full = est_full(values_final)
mean_final = result_full[:nqoi]
cov_entries_final = result_full[nqoi:]

print(f"\nEstimated mean:  {mean_final}")
print(f"Estimated covariance (lower triangle): {cov_entries_final}")

# 7. Report predicted accuracy
pred_cov_full = est_full.optimized_covariance()
for k in range(nqoi):
    print(f"Predicted std of mean(q_{k+1}): {float(bkd.sqrt(pred_cov_full[k,k])):.6f}")
Allocated samples: [476]
Actual cost: 999.6 (target: 1000.0)

Estimated mean:  [0.56878555 0.33459039 0.04283921]
Estimated covariance (lower triangle): [ 0.75843799  0.58913625  0.84125025 -0.32098129 -0.26696309  0.4798086 ]
Predicted std of mean(q_1): 0.041797
Predicted std of mean(q_2): 0.042768
Predicted std of mean(q_3): 0.032772

Visualizing the Estimator Covariance

The estimator covariance matrix at the allocated budget shows the block structure derived in the previous tutorial.

from pyapprox.tutorial_figures._multifidelity_advanced import plot_est_cov_final

# Build labels for the 9 statistics: 3 means + 6 lower-triangle covariance
stat_labels = [rf"$\hat{{\mu}}_{{{k+1}}}$" for k in range(nqoi)]
for j in range(nqoi):
    for k in range(j + 1):
        stat_labels.append(rf"$\hat{{\Sigma}}_{{{j+1}{k+1}}}$")

# Convert to numpy for plotting
pred_cov_plot = bkd.to_numpy(pred_cov_full)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
plot_est_cov_final(pred_cov_plot, stat_labels, nstats, nqoi, target_cost,
                   ax1, ax2, fig)
plt.tight_layout()
plt.show()
Figure 4: Estimator covariance matrix for simultaneous mean and covariance estimation at the allocated budget. The block structure from the previous tutorial is visible: mean-mean (top-left), mean-variance (off-diagonal), variance-variance (bottom-right).

Key Takeaways

  • The MSE formulas require knowledge of the QoI moments (\(\boldsymbol{\Sigma}\), \(\mathbf{W}\), \(\mathbf{B}\)), which must be estimated from pilot samples
  • The PyApprox estimator API encapsulates this workflow: define the statistic → set pilot quantities → allocate budget → generate samples → compute estimates
  • The pilot size matters: small pilots produce noisy moment estimates, leading to unreliable budget predictions — especially for variance estimation, which depends on fourth-order moments
  • Pilot samples can be reused as part of the final experiment, so they are never wasted
  • The predicted estimator covariance matrix provides a pre-computation accuracy forecast that can be verified empirically

Exercises

  1. Run the complete two-stage workflow with target_cost = 100 and target_cost = 10000. Compare the predicted estimator standard deviations. Does the improvement match the expected \(1/\sqrt{M}\) scaling?

  2. Modify the workflow to estimate the mean only (not mean + variance). How does the required budget compare for the same target accuracy? Why is it cheaper?

  3. In Figure 3, the histograms for small pilots are wide and sometimes produce budget predictions that are far too optimistic. Explain why an underestimate of \(\mathbf{W}\) leads to an overly optimistic (too small) budget for variance estimation.

  4. (Challenge) Implement a simple adaptive scheme: start with \(M_{\text{pilot}} = 30\), estimate the budget, run the experiment, then use all samples as a new pilot and re-estimate the budget. Does the budget estimate stabilize after one or two iterations?

Next Steps

Continue with:

References

Dixon, Thomas, Alex Gorodetsky, John Jakeman, Akil Narayan, and Yiming Xu. 2025. “Optimally Balancing Exploration and Exploitation to Automate Multi-Fidelity Statistical Estimation.” https://arxiv.org/abs/2505.09828.