Pilot Studies

PyApprox Tutorial Library

The bootstrapping problem at the heart of multi-fidelity estimation: you need the model covariance to plan the experiment, but computing the covariance requires running the models — and spending budget.

Learning Objectives

After completing this tutorial, you will be able to:

  • Articulate the bootstrapping problem: why any ACV estimator requires pilot data before its optimal allocation can be computed
  • Explain the two competing costs of a pilot study: covariance estimation error and budget consumed
  • Sketch the MSE-vs-pilot-size curve and identify its characteristic U-shape when pilot cost is accounted for
  • State the practical rule of thumb for pilot sizing on the polynomial benchmark

Prerequisites

Complete any of the concept tutorials and review the API Cookbook before this tutorial. Pilot studies are the final practical ingredient needed to run the full multi-fidelity workflow on a real problem where population statistics are unknown.

The Bootstrapping Problem

Every ACV estimator we have built so far has one hidden assumption: the model covariance matrix \(\boldsymbol{\Sigma}\) is known. This assumption is used in two places:

  1. Sample allocation: allocate_samples(P) solves an optimisation problem that depends on \(\boldsymbol{\Sigma}\). Without it, you cannot determine how many high-fidelity (HF) and low-fidelity (LF) samples to take.
  2. Control variate coefficients: the optimal weights \(\mathbf{H}^*\) or \(\boldsymbol{\alpha}^*\) depend on \(\boldsymbol{\Sigma}\).

In all previous tutorials, \(\boldsymbol{\Sigma}\) was supplied via set_pilot_quantities using the population value from the benchmark. In practice, \(\boldsymbol{\Sigma}\) is unknown — if you already knew the moments of \(f_0\), you wouldn’t need to estimate them.

The solution is a pilot study: evaluate all \(M+1\) models at a small shared set of \(N_p\) samples, use the resulting data to estimate \(\boldsymbol{\Sigma}\), then use that estimate to plan and run the main estimator. This introduces a circular dependency: you need moments to plan the experiment, but computing moments requires running the experiment.
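The pilot idea is easy to sketch with plain NumPy. The three polynomial models below are illustrative stand-ins, not the benchmark's actual definitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three illustrative fidelity levels of one quantity of interest
# (hypothetical stand-ins, not PyApprox's benchmark models).
def f0(x): return x**5 + x**2   # high fidelity
def f1(x): return x**5          # medium fidelity
def f2(x): return x**3          # low fidelity

N_p = 20                              # pilot size
x_pilot = rng.uniform(-1, 1, N_p)     # shared pilot inputs

# Evaluate every model at the SAME pilot samples (paired evaluations).
pilot_values = np.vstack([f0(x_pilot), f1(x_pilot), f2(x_pilot)])

# Sample covariance estimate Sigma_hat, shape (3, 3).
cov_hat = np.cov(pilot_values)
print(cov_hat.shape)   # (3, 3)
```

Note that `np.cov` uses the unbiased \(1/(N_p-1)\) normalisation by default; either convention is fine for planning purposes.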

Two Competing Costs

The pilot study involves a genuine trade-off between two costs:

Cost 1: Covariance estimation error. A small pilot (\(N_p\) small) produces a noisy \(\hat{\boldsymbol{\Sigma}}\). The sample allocation and control variate coefficients derived from \(\hat{\boldsymbol{\Sigma}}\) are sub-optimal. The resulting estimator has higher variance than the oracle (population-covariance) version.

Cost 2: Budget consumed. The pilot uses real compute. If the total budget is \(P\) and the pilot costs \(P_p = N_p \sum_\alpha C_\alpha\), then only \(P - P_p\) remains for the main estimator. A large pilot leaves too little budget for the estimator itself.

Figure 1 illustrates this tension on the three-model polynomial benchmark.

Figure 1: MSE (relative to single-fidelity MC MSE) vs pilot size \(N_p\) for MFMC on the three-model polynomial benchmark at total budget \(P=100\). Left: Pilot cost is not deducted from the estimator budget — only covariance estimation error matters, so MSE decreases monotonically with \(N_p\). Right: Pilot cost is deducted — too large a pilot starves the main estimator, and MSE has a minimum at an intermediate \(N_p\).

The left panel of Figure 1 shows that ignoring pilot cost gives a monotonically decreasing MSE — more pilot samples always help with covariance estimation. The right panel (the realistic case) shows a U-shape: too few pilot samples give a bad covariance estimate; too many leave nothing for the main estimator. The minimum is the optimal pilot size.
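The right-panel behaviour can be reproduced qualitatively with a toy MSE model: one term for covariance estimation error that decays like \(1/N_p\), plus a main-estimator variance term that grows as the pilot eats the budget. The constants below are invented for illustration and are not fitted to the benchmark:

```python
import numpy as np

def toy_mse(N_p, P=100.0, c=1.11, a=5.0):
    """Toy MSE model (illustrative only): covariance-error term a/N_p
    plus a main-estimator variance term 1/(P - pilot cost)."""
    P_main = P - N_p * c       # budget left after a pilot of size N_p
    if P_main <= 0:
        return np.inf          # pilot consumed the entire budget
    return a / N_p + 1.0 / P_main

Ns = np.arange(4, 80)
mses = np.array([toy_mse(n) for n in Ns])
n_star = Ns[np.argmin(mses)]   # interior minimum: the toy N_p*
print(n_star)
```

The minimiser of this toy curve plays the role of \(N_p^*\); the real curve's minimum depends on how fast the covariance estimation error actually decays for the benchmark.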

The Full Two-Stage Workflow

The pseudocode below summarises the two stages of a multi-fidelity estimation campaign. The key detail is that the pilot cost must be subtracted from the total budget before the main estimator allocates its samples.

Stage 1 — Pilot
  1. Choose pilot size  Np  (rule of thumb: 2–5 × (M + 1))
  2. Draw Np shared input samples
  3. Evaluate every model at those samples → pilot outputs
  4. Compute sample covariance  Σ̂  from pilot outputs
  5. Record pilot cost  Pp = Np × Σ Cα

Stage 2 — Main Estimator
  6. Subtract pilot cost from total budget:  P_main = P − Pp
  7. Set pilot quantities:      stat.set_pilot_quantities(Σ̂)
  8. Allocate main samples:     estimator.allocate_samples(P_main)
  9. Generate sample sets:      estimator.generate_samples(variable)
 10. Evaluate models:           estimator.evaluate_samples(models)
 11. Compute estimate:          result = estimator(values)

Always subtract pilot cost — allocating the full budget \(P\) as if the pilot were free leads to an over-optimistic allocation and wastes the budget spent on the pilot.

See the API Cookbook → Universal Workflow for a runnable code version of these steps.

What the Pilot Provides

The pilot study provides three things:

  1. Covariance estimate \(\hat{\boldsymbol{\Sigma}}\): used to set pilot quantities via stat.set_pilot_quantities(cov_hat).
  2. Sample allocation: allocate_samples(P - P_p) is called on the remaining budget after subtracting the pilot cost.
  3. Model cost estimates: the median wall-clock time per model over the pilot samples is used as \(C_\alpha\) when physical timing (not synthetic ratios) is needed.
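Item 3 can be implemented with a simple timing loop. `median_cost` and `slow_model` below are hypothetical helpers for illustration, not part of PyApprox:

```python
import time
import numpy as np

def median_cost(model, pilot_inputs):
    """Estimate C_alpha as the median wall-clock time per pilot evaluation."""
    times = []
    for x in pilot_inputs:
        t0 = time.perf_counter()
        model(x)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

# Illustrative: a deliberately slow "model" that sleeps for ~1 ms per call.
def slow_model(x):
    time.sleep(0.001)
    return x**2

c = median_cost(slow_model, np.linspace(0, 1, 10))
print(c)
```

The median is preferred over the mean because it is robust to one-off slowdowns (cold caches, I/O hiccups) during the pilot.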

All three are needed before the main estimator can be run.

What Makes a Good Pilot Design?

Sample count \(N_p\). The minimum needed for a non-singular covariance estimate is \(N_p > M + 1\) (one more than the number of models). A reliable estimate typically requires \(N_p \geq 2(M+1)\) to \(5(M+1)\). For three models (\(M=2\)) and \(P=100\), \(N_p = 10\)–\(20\) is usually sufficient when models have moderate correlation (\(\rho \sim 0.9\)). Weaker correlations require more pilot samples because the relevant off-diagonal entries of \(\hat{\boldsymbol{\Sigma}}\) are harder to estimate.
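These counting rules fit in a few lines. `pilot_size_bounds` is a hypothetical helper that takes the number of models \(M+1\) directly:

```python
def pilot_size_bounds(num_models):
    """Rule-of-thumb pilot sizes for an ensemble of num_models = M + 1 models.

    Returns (minimum for a non-singular covariance estimate,
             recommended lower bound, recommended upper bound).
    """
    minimum = num_models + 1          # N_p > M + 1, i.e. N_p >= (M + 1) + 1
    return minimum, 2 * num_models, 5 * num_models

print(pilot_size_bounds(3))   # (4, 6, 15)
```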

Shared samples. All models must be evaluated at the same pilot sample points. This is essential: the cross-covariance estimate \(\hat{\sigma}_{\alpha\beta} = \frac{1}{N_p}\sum_n (f_\alpha^{(n)} - \bar{f}_\alpha)(f_\beta^{(n)} - \bar{f}_\beta)\) requires paired evaluations.
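A quick NumPy experiment shows why pairing matters: with shared samples the cross-covariance estimate converges to the true value, whereas unpaired evaluations estimate the covariance of two independent draws, which is zero. The two models are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 5000)        # shared pilot inputs
f0, f1 = x**5 + x**2, x**5          # two correlated models (illustrative)

# Paired evaluations: the cross-covariance is well estimated.
paired = np.cov(f0, f1)[0, 1]

# Unpaired: evaluate the second model at DIFFERENT samples.
x_other = rng.uniform(-1, 1, 5000)
unpaired = np.cov(f0, x_other**5)[0, 1]

print(round(paired, 3), round(unpaired, 3))   # paired ~ 0.09, unpaired ~ 0
```

With unpaired samples the off-diagonal entries of \(\hat{\boldsymbol{\Sigma}}\) carry no information, so the estimator's control variate weights collapse towards zero and the multi-fidelity gain is lost.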

Independent from main study. Pilot samples are separate from the main estimator’s sample partitions. Including pilot samples in the main allocation introduces an optimism bias.

Key Takeaways

  • Every ACV estimator requires a pilot study to estimate \(\boldsymbol{\Sigma}\) before it can be planned or run
  • A small pilot introduces covariance estimation error; a large pilot consumes too much of the total budget
  • The realistic MSE-vs-pilot-size curve has a U-shape with a well-defined optimal \(N_p^*\) (Figure 1, right panel)
  • All models must be evaluated at the same pilot sample points; pilot samples are independent of the main estimator’s samples
  • As a rule of thumb, \(N_p \approx 2\text{–}5\,(M+1)\) is a good starting point for problems with strong correlations (\(\rho \geq 0.8\))

Exercises

  1. From Figure 1 (right), estimate the optimal pilot size \(N_p^*\) for this benchmark. At this \(N_p^*\), what fraction of the total budget \(P=100\) does the pilot consume?

  2. Why must all models be evaluated at the same pilot sample points? What goes wrong if each model is evaluated at different samples?

  3. Suppose the pilot covariance estimate is available but very noisy (\(N_p = 5\)). You observe that allocate_samples assigns 0 samples to one model. What is the likely cause, and how can you diagnose it?

  4. For a 10-model ensemble at budget \(P=500\), using the rule of thumb \(N_p \approx 5(M+1)\), what is the pilot budget \(P_p\) if all models have equal cost \(C=1\)? What fraction of \(P\) does this represent?

Tip

Ready to try this? See API Cookbook → Universal Workflow.

Next Steps

  • Pilot Studies Analysis — MSE decomposition, sensitivity of optimal allocation to covariance estimation error, and how the U-shaped curve shifts with budget and correlation
  • API Cookbook — End-to-end two-stage workflow in PyApprox: pilot → plan → run