General Approximate Control Variates

PyApprox Tutorial Library

How using all low-fidelity models as direct control variates for the high-fidelity model breaks the CV-1 variance ceiling that limits MLMC and MFMC.


Learning Objectives

After completing this tutorial, you will be able to:

Explain why MLMC and MFMC both plateau at the one-model CV-1 variance ceiling
Contrast indirect correction (MLMC/MFMC recursive chain) with direct correction (every LF model corrects $f_0$ simultaneously)
Predict which variance ceiling is reachable for a given model hierarchy
Identify the settings in which switching from MFMC to a general ACV estimator pays off

Prerequisites

Complete Multi-Fidelity Monte Carlo before this tutorial.

The Ceiling That MFMC Cannot Break

The MFMC Concept tutorial showed that both MFMC and MLMC plateau at a hard variance floor as LF samples grow:

\[ \min\,\mathbb{V}[\hat{\mu}_0^{\text{MFMC}}] \;\xrightarrow{r\to\infty}\; \frac{\sigma_0^2}{N_0}(1 - \rho_{0,1}^2). \]

Here $r$ is the ratio of LF to HF samples. This limit is the CV-1 ceiling: the best variance reduction achievable if the mean of the single most informative LF model were known exactly, which is equivalently a floor on the estimator variance. Adding more LF samples or more LF models beyond $f_1$ does not lower this floor; it only makes the estimator converge to it faster.
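As a quick numerical illustration of the limit above, the following sketch evaluates $\frac{\sigma_0^2}{N_0}(1 - \rho_{0,1}^2)$ for a few correlations. The function name `cv1_variance_floor` and the values of $\sigma_0^2$, $N_0$ and $\rho_{0,1}$ are illustrative assumptions, not part of PyApprox or the benchmark used in this tutorial.

```python
def cv1_variance_floor(sigma0_sq: float, n0: int, rho01: float) -> float:
    """Limiting MFMC/MLMC variance: sigma_0^2 / N_0 * (1 - rho_{0,1}^2)."""
    return sigma0_sq / n0 * (1.0 - rho01**2)


# Illustrative values only (assumptions, not taken from the tutorial).
sigma0_sq, n0 = 1.0, 10
mc_variance = sigma0_sq / n0  # plain Monte Carlo variance with N_0 samples

for rho01 in (0.5, 0.9, 0.99):
    floor = cv1_variance_floor(sigma0_sq, n0, rho01)
    # The ratio depends only on rho_{0,1}; no amount of extra LF data changes it.
    print(f"rho_01={rho01:4.2f}  floor / MC variance = {floor / mc_variance:.4f}")
```

The ratio printed is $1 - \rho_{0,1}^2$, so even $\rho_{0,1} = 0.9$ leaves roughly a fifth of the Monte Carlo variance untouched.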

The reason is structural. In both MLMC and MFMC, models $f_\alpha$ for $\alpha \geq 2$ act as control variates for $f_{\alpha-1}$, not for $f_0$. They improve the correction provided by $f_1$, but each model in the chain can only reduce variance in the step above it. Only $f_1$ is wired directly into the HF estimator $\hat{\mu}_0(\mathcal{Z}_0)$. This is the indirect correction structure.

What if instead every LF model’s correction were applied directly to $\hat{\mu}_0$?

Direct vs Indirect Correction

The structural difference is visible before any algebra.

MFMC / MLMC (indirect): The correction chain runs $f_M \to f_{M-1} \to \cdots \to f_1 \to f_0$. Each model sharpens the correction provided by the model above it in the hierarchy. Only $f_1$ directly reduces the variance of $\hat{\mu}_0(\mathcal{Z}_0)$.

General ACV — direct correction: Every LF model simultaneously corrects $f_0$. The estimator is

\[ \hat{\mu}_0^{\text{ACV}} = \hat{\mu}_0(\mathcal{Z}_0) + \sum_{\alpha=1}^{M} \eta_\alpha \bigl(\hat{\mu}_\alpha(\mathcal{Z}_\alpha^*) - \hat{\mu}_\alpha(\mathcal{Z}_\alpha)\bigr), \tag{1}\]

where $\hat{\mu}_\alpha(\mathcal{Z}_\alpha^*) - \hat{\mu}_\alpha(\mathcal{Z}_\alpha)$ is the correction term for model $\alpha$: it is unbiased for zero, so the estimator remains unbiased for any weights $\eta_\alpha$.

This looks identical to the MFMC estimator — and algebraically it is. What differs is the sample-set structure. In MFMC, $\mathcal{Z}_\alpha^* = \mathcal{Z}_{\alpha-1}$ (each model anchors on the model above it in the chain). In the ACVMF estimator, $\mathcal{Z}_\alpha^* = \mathcal{Z}_0$ for every $\alpha$: all LF models share the HF sample set as their comparison point. Each correction is therefore directly correlated with $\hat{\mu}_0(\mathcal{Z}_0)$, and each one can reduce HF variance independently.
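The two wirings can be compared on a toy problem. The sketch below evaluates Equation 1 with nested sample sets, once with $\mathcal{Z}_\alpha^* = \mathcal{Z}_{\alpha-1}$ (MFMC-style) and once with $\mathcal{Z}_\alpha^* = \mathcal{Z}_0$ (ACVMF-style), and checks empirically that both remain unbiased for arbitrary weights. The monomial models, sample counts, weights, and the helper `acv_estimate` are hypothetical choices made for illustration; this is not the PyApprox implementation.

```python
import numpy as np

# Hypothetical toy models of z ~ U(0, 1); f_0 plays the role of the HF model.
models = [lambda z: z**5, lambda z: z**4, lambda z: z**3]
exact_mean_f0 = 1.0 / 6.0  # E[z^5] for z ~ U(0, 1)


def acv_estimate(z, n, star_of, eta):
    """Evaluate Equation 1 on nested sample sets.

    z       : 1-D array of samples; Z_a is the prefix z[:n[a]]
    n       : sample counts with n[0] <= n[1] <= ...
    star_of : star_of[a] = b means Z_a^* = Z_b (entry 0 is unused)
    eta     : weights eta_1..eta_M (arbitrary here, not optimized)
    """
    estimate = models[0](z[: n[0]]).mean()
    for a in range(1, len(models)):
        mu_star = models[a](z[: n[star_of[a]]]).mean()
        mu_full = models[a](z[: n[a]]).mean()
        estimate += eta[a - 1] * (mu_star - mu_full)
    return estimate


rng = np.random.default_rng(0)
n = [10, 50, 250]    # assumed sample counts
eta = [-0.8, -0.3]   # assumed weights; any values keep the estimator unbiased
wirings = {"MFMC": [None, 0, 1], "ACVMF": [None, 0, 0]}

results = {name: [] for name in wirings}
for _ in range(2000):
    z = rng.uniform(size=n[-1])
    for name, star_of in wirings.items():
        results[name].append(acv_estimate(z, n, star_of, eta))

for name, values in results.items():
    print(f"{name:5s} mean={np.mean(values):.4f}  variance={np.var(values):.2e}")
print(f"exact mean of f_0 = {exact_mean_f0:.4f}")
```

The only difference between the two runs is the `star_of` list, which mirrors the point made above: the estimator formula is shared and the sample-set structure is what changes. The weights here are not optimal, so the printed variances should not be read as a performance comparison.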

Figure 1 makes this structural difference visual.

Figure 1: Sample-set wiring for four models ($f_0$ HF, $f_1$–$f_3$ LF). Left: MFMC — corrections are chained; only $f_1$ directly touches $f_0$’s sample set $\mathcal{Z}_0$ (red). Right: ACVMF — every LF model uses $\mathcal{Z}_0$ as its anchor, so all three corrections are directly correlated with $\hat{\mu}_0(\mathcal{Z}_0)$. Unlike MFMC, ACV does not require a fidelity hierarchy among the LF models.

Breaking the Ceiling

When all LF models correct $f_0$ directly and have access to enough exclusive samples, the variance of the ACVMF estimator converges to the multi-model CV limit — the reduction achievable if all LF means were known exactly simultaneously. This limit depends on the joint correlation between $f_0$ and all LF models, not just the pairwise correlation with $f_1$.

Figure 2 shows this on the five-model polynomial benchmark.

Figure 2: Variance / MC variance vs total cost. MLMC and MFMC plateau at the CV-1 ceiling (set by $\rho_{0,1}$ alone). ACVMF converges toward the much lower CV-4 ceiling that exploits all four LF models simultaneously. CV-$k$ limits (dashed) are the theoretical minima if the means of the $k$ most informative LF models were known exactly.

The green ACVMF curve converges to the CV-4 limit: the reduction possible when all four LF models simultaneously correct $f_0$. MFMC and MLMC are bounded by the much higher CV-1 limit regardless of how many additional LF samples are added.

Why? Because in Equation 1 with $\mathcal{Z}_\alpha^* = \mathcal{Z}_0$ for every $\alpha$, the optimal weights exploit the full joint covariance between $\hat{\mu}_0(\mathcal{Z}_0)$ and all $M$ corrections simultaneously. The resulting variance reduction — shown in General ACV Analysis to be a multi-model Schur complement — is at least as large as using any single correction alone, and often much larger.
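To make the Schur-complement statement concrete, the sketch below computes the ratio of the CV-$k$ limit to the Monte Carlo variance, $1 - \mathbf{c}_{0k}^\top \mathbf{C}_{kk}^{-1} \mathbf{c}_{0k} / \sigma_0^2$, from a model covariance matrix. The function `cv_limit_ratio` is a hypothetical helper written for this illustration (it is not claimed to match any PyApprox function), the covariance is that of assumed monomial models $z^5, \ldots, z^1$ with $z \sim U(0,1)$, and the first $k$ LF models are taken in listed order rather than ranked by informativeness.

```python
import numpy as np


def cv_limit_ratio(cov, k):
    """CV-k variance limit divided by MC variance, using LF models 1..k.

    cov is the (M+1) x (M+1) covariance of (f_0, f_1, ..., f_M).
    """
    c0k = cov[0, 1 : k + 1]            # covariances between f_0 and the k LF models
    ckk = cov[1 : k + 1, 1 : k + 1]    # covariance among the k LF models
    return 1.0 - c0k @ np.linalg.solve(ckk, c0k) / cov[0, 0]


# Assumed toy ensemble: f_a(z) = z**powers[a], z ~ U(0, 1).
powers = np.array([5, 4, 3, 2, 1])
psum = powers[:, None] + powers[None, :]
# cov(z^p, z^q) = 1/(p+q+1) - 1/((p+1)(q+1))
cov = 1.0 / (psum + 1) - 1.0 / np.outer(powers + 1, powers + 1)

ratios = [cv_limit_ratio(cov, k) for k in range(1, len(powers))]
for k, ratio in enumerate(ratios, start=1):
    print(f"CV-{k} limit / MC variance = {ratio:.3e}")
```

Each additional LF model can only lower (or leave unchanged) the limit, which is the "at least as large as any single correction" property stated above.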

Note, however, that Figure 2 also shows MFMC outperforming ACVMF at small total costs. Moreover, the plot above holds the HF sample count fixed at $N_0 = 1$. When the optimizer is free to jointly choose $N_0$ and the LF partition sizes, MFMC can outperform ACVMF on problems whose models form a natural hierarchy ordered by correlation per unit cost — for example, models obtained by successive mesh refinement. In such hierarchies the chained correction structure of MFMC aligns with the cost–accuracy ordering, and the indirect path through $f_1$ is already highly efficient. The general ACV framework pays off most when the LF models do not form a hierarchy but are still correlated with the HF model.

The Allocation Problem

As with the two-model ACV estimator, the goal is to minimize estimator variance subject to a computational budget. The estimator’s sample design is built from $M+1$ independent partitions $\mathcal{P}_0, \mathcal{P}_1, \ldots, \mathcal{P}_M$ with sample counts $m_0, m_1, \ldots, m_M$. Which models are evaluated on which partitions is determined by the allocation matrix — the structural choice that distinguishes MLMC, MFMC, and ACVMF. For ACVMF with four models, the allocation matrix assigns:

$\mathcal{P}_0$ ($m_0$ samples): all models $\{f_0, f_1, f_2, f_3\}$ — the shared HF partition
$\mathcal{P}_1$ ($m_1$ samples): models $\{f_1, f_2, f_3\}$ — LF only
$\mathcal{P}_2$ ($m_2$ samples): models $\{f_2, f_3\}$
$\mathcal{P}_3$ ($m_3$ samples): model $\{f_3\}$ only

The total number of samples for model $\alpha$ is the sum across all partitions it appears in. For ACVMF: $N_0 = m_0$, $N_1 = m_0 + m_1$, $N_2 = m_0 + m_1 + m_2$, and so on — the cheapest model accumulates the most samples.
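The bookkeeping above can be written as a matrix product. In the sketch below, `member` is a simplified model-by-partition membership matrix chosen for illustration; it is an assumption and not necessarily the layout PyApprox uses for its allocation matrix. The partition sizes are arbitrary example values.

```python
import numpy as np

# member[a, k] = 1 if model f_a is evaluated on partition P_k.
# Rows: f_0..f_3, columns: P_0..P_3 (the four-model assignment listed above).
member = np.array([
    [1, 0, 0, 0],   # f_0: P_0 only
    [1, 1, 0, 0],   # f_1: P_0, P_1
    [1, 1, 1, 0],   # f_2: P_0, P_1, P_2
    [1, 1, 1, 1],   # f_3: every partition
])

m = np.array([10, 40, 160, 640])  # assumed partition sizes m_0..m_3

# N_a is the sum of m_k over the partitions that model a appears in.
N = member @ m
print(N)  # [ 10  50 210 850]
```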

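The total cost is \[ P = \sum_{k=0}^{M} m_k\, c^k, \] where $c^k = \sum_{\alpha \in \mathcal{P}_k} c_\alpha$ is the per-sample cost of partition $k$: the sum of the costs $c_\alpha$ of every model evaluated on that partition. The superscript in $c^k$ is a partition index, not a power.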

Writing $r_k = m_k / m_0$ for the partition ratios (with $r_0 = 1$), the budget determines $m_0$: \[ m_0 = \frac{P}{c^0 + \sum_{k=1}^{M} r_k\, c^k}. \]
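A small worked example of the budget relation, under assumed values: the model costs, partition ratios, and budget below are made up for illustration, and `member` is the same simplified membership matrix (model $\alpha$ evaluated on partitions $0, \ldots, \alpha$) rather than a PyApprox object.

```python
import numpy as np

# member[a, k] = 1 if model f_a is evaluated on partition P_k (four-model ACVMF layout).
member = np.tril(np.ones((4, 4)))

model_costs = np.array([1.0, 0.1, 0.01, 0.001])   # assumed per-sample costs c_0..c_3
partition_costs = member.T @ model_costs          # c^k: cost of one sample in P_k

ratios = np.array([1.0, 4.0, 16.0, 64.0])         # r_0 = 1, r_1..r_3 assumed
budget = 179.5                                    # assumed total budget P

# m_0 = P / (c^0 + sum_k r_k c^k); real-valued here, rounding is left out.
m0 = budget / (ratios @ partition_costs)
m = m0 * ratios
total_cost = m @ partition_costs

print("partition costs:", partition_costs)
print(f"m_0 = {m0:.2f}, partition sizes = {np.round(m, 2)}")
print(f"total cost = {total_cost:.2f}")
```

In practice the partition sizes must be integers, so a real allocation would round $m_0$ down and spend slightly less than $P$.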

The weights $\boldsymbol{\eta}$ are optimal in closed form for any fixed allocation (see General ACV Analysis). After plugging in $\boldsymbol{\eta}^*$, the allocation problem reduces to

\[ \min_{r_1, \ldots, r_M \geq 0} \; \mathbb{V}\!\left[\hat{\mu}_0^{\text{ACV}}\right]_{\boldsymbol{\eta}=\boldsymbol{\eta}^*} \quad \text{subject to} \quad m_0\!\left(c^0 + \sum_{k=1}^{M} r_k\, c^k\right) \leq P, \tag{2}\]

an $M$-dimensional optimization over the partition ratios. Unlike the two-model case — where the single ratio $r$ has a closed-form optimum — the many-model allocation requires numerical optimization. PyApprox uses a chained optimizer (differential evolution for global exploration followed by trust-constr for local refinement) by default; see the API Cookbook for configuration details.

The key trade-off is the same as in the two-model case, replicated across partitions: enlarging any partition tightens the corrections it supports but leaves fewer resources for $m_0$, which controls the baseline HF variance $\sigma_0^2 / m_0$. The optimizer balances these competing effects jointly.
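The trade-off can be explored with a crude search over the partition ratios. The sketch below assumes nested sample sets with $\mathcal{Z}_\alpha^* = \mathcal{Z}_0$, for which $\mathrm{Cov}[\Delta_\alpha, \Delta_\beta] = C_{\alpha\beta}\,(1/N_0 - 1/\min(N_\alpha, N_\beta))$ and $\mathrm{Cov}[\hat{\mu}_0(\mathcal{Z}_0), \Delta_\alpha] = C_{0\alpha}\,(1/N_0 - 1/N_\alpha)$, and plugs in the optimal weights. It uses a plain random search as a stand-in for the chained optimizer mentioned above, treats sample counts as real numbers, and runs on an assumed three-model monomial ensemble with made-up costs. The helper names are hypothetical.

```python
import numpy as np


def acvmf_variance(cov, n):
    """Variance of Equation 1 with optimal weights, nested sets, Z_a^* = Z_0."""
    n = np.asarray(n, dtype=float)
    n0, n_lf = n[0], n[1:]
    # Covariance among the correction terms Delta_a = mu_a(Z_0) - mu_a(Z_a).
    cov_dd = cov[1:, 1:] * (1.0 / n0 - 1.0 / np.minimum.outer(n_lf, n_lf))
    # Covariance between the HF sample mean and each correction term.
    cov_0d = cov[0, 1:] * (1.0 / n0 - 1.0 / n_lf)
    return cov[0, 0] / n0 - cov_0d @ np.linalg.solve(cov_dd, cov_0d)


def sample_counts(lf_ratios, costs, budget):
    """Model sample counts N_a implied by ratios r_1..r_M and the budget."""
    nmodels = len(costs)
    member = np.tril(np.ones((nmodels, nmodels)))  # model a on partitions 0..a
    partition_costs = member.T @ costs
    r = np.concatenate(([1.0], lf_ratios))
    m0 = budget / (r @ partition_costs)
    return member @ (m0 * r)


# Assumed toy ensemble f_a(z) = z**powers[a], z ~ U(0, 1), with made-up costs.
powers = np.array([5, 4, 3])
psum = powers[:, None] + powers[None, :]
cov = 1.0 / (psum + 1) - 1.0 / np.outer(powers + 1, powers + 1)
costs = np.array([1.0, 0.1, 0.01])
budget = 100.0

rng = np.random.default_rng(0)
candidates = 10.0 ** rng.uniform(-1.0, 3.0, size=(2000, len(costs) - 1))

best_var, best_ratios = np.inf, None
for lf_ratios in candidates:
    n = sample_counts(lf_ratios, costs, budget)
    if n[0] < 1.0:  # require at least one HF sample
        continue
    var = acvmf_variance(cov, n)
    if var < best_var:
        best_var, best_ratios = var, lf_ratios

mc_variance = cov[0, 0] * costs[0] / budget  # all budget spent on f_0
print("best ratios r_1, r_2:", np.round(best_ratios, 2))
print(f"ACVMF variance / MC variance = {best_var / mc_variance:.3e}")
```

A random search is used only to keep the example dependency-free; a gradient-based or global optimizer would normally refine the result, and integer rounding of the sample counts is ignored here.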

Key Takeaways

Both MLMC and MFMC plateau at the one-model CV-1 ceiling; only $f_1$ directly reduces $\hat{\mu}_0$ variance in either estimator
The ceiling is structural — a consequence of the recursive correction chain, not suboptimal weights
General ACV estimators (e.g. ACVMF) route every LF model’s correction directly to $\hat{\mu}_0$, approaching the multi-model CV-$M$ ceiling
The payoff is largest when several moderately correlated LF models exist and the gap between CV-1 and CV-$M$ is large
The gap can be estimated from the pilot covariance alone, using a `cv_limit`-style calculation of the CV-1 and CV-$M$ limits, before committing to any sample budget

Exercises

From Figure 2, at approximately what total cost does ACVMF achieve half the variance of MFMC at the same cost?
Suppose $\rho_{0,1} = 0.95$ and $\rho_{0,\alpha} = 0.6$ for all $\alpha \geq 2$. Compute the CV-1 and CV-$M$ limits as $M$ grows from 1 to 5. At what $M$ does the incremental gain become less than 1% of MC variance?
Explain in one sentence why fixing $\mathcal{Z}_\alpha^* = \mathcal{Z}_0$ for every $\alpha$ in Equation 1 is the structural change that breaks the CV-1 ceiling.

Next Steps

General ACV Analysis — Derive the optimal weight matrix $\mathbf{H}^*$, the minimum covariance formula, and visualise allocation matrices for MLMC, MFMC, ACVMF, and ACVIS

Tip

Ready to try this? See API Cookbook → ACVSearch.

References

[GGEJJCP2020] A. Gorodetsky, S. Geraci, M. Eldred, J. Jakeman. A generalized approximate control variate framework for multifidelity uncertainty quantification. Journal of Computational Physics, 408:109257, 2020. DOI: https://doi.org/10.1016/j.jcp.2020.109257
[PWGSIAM2016] B. Peherstorfer, K. Willcox, M. Gunzburger. Optimal model management for multifidelity Monte Carlo estimation. SIAM Journal on Scientific Computing, 38(5):A3163–A3194, 2016. DOI: https://doi.org/10.1137/15M1046472
