Control Variate Monte Carlo

PyApprox Tutorial Library

Reducing Monte Carlo estimator variance using a correlated low-fidelity model with a known statistic.

Learning Objectives

After completing this tutorial, you will be able to:

Explain why a correlated low-fidelity model can reduce the variance of a Monte Carlo (MC) estimator
Write down the Control Variate Monte Carlo (CVMC) estimator and identify its free parameter $\eta$
State the optimal $\eta$ and the resulting variance reduction in terms of model correlation
Identify when CVMC helps and when it does not

Prerequisites

Complete the Monte Carlo Sampling tutorial and the Estimator Accuracy and MSE tutorial before this one.

Motivation

The Estimator Accuracy tutorial showed that the variance of the MC mean estimator is $\sigma^2_\alpha / N$, where $\sigma^2_\alpha$ is the variance of the high-fidelity model output $f_\alpha$ and $N$ is the number of samples. Reducing this variance requires either more samples — which is expensive — or a smarter estimator.

Control Variate Monte Carlo (CVMC) is the simplest example of a smarter estimator. The idea: if we have access to a cheap model $f_\kappa$ that is correlated with $f_\alpha$ and whose mean $\mu_\kappa = \mathbb{E}_\theta[f_\kappa(\boldsymbol{\theta})]$ is known exactly, we can use the cheap model to cancel a large fraction of the MC error.

Figure 1 illustrates this with two models of a 2D input. The surface plots show that $f_\alpha$ and $f_\kappa$ have similar shape — when one is large, so is the other. The scatter plot on the right confirms that model outputs are tightly correlated ($\rho \approx 0.9$).

Figure 1: Left: overlaid response surfaces of $f_\alpha$ (blue) and $f_\kappa$ (orange) over $[-1,1]^2$, showing similar shape with a gap between them. Right: scatter plot of 100 random evaluations confirming tight output correlation $\rho$.

The CVMC Estimator

The standard MC estimator of $\mu_\alpha = \mathbb{E}_\theta[f_\alpha(\boldsymbol{\theta})]$ is

\[ \hat{\mu}_\alpha = \frac{1}{N} \sum_{k=1}^{N} f_\alpha(\boldsymbol{\theta}^{(k)}). \]

The CVMC estimator adds a correction term built from $f_\kappa$:

\[ \hat{\mu}_\alpha^{\text{CV}} = \hat{\mu}_\alpha + \eta \left( \hat{\mu}_\kappa - \mu_\kappa \right) \tag{1}\]

where $\hat{\mu}_\kappa = \frac{1}{N}\sum_{k=1}^N f_\kappa(\boldsymbol{\theta}^{(k)})$ is the MC mean of the low-fidelity model evaluated on the same $N$ samples, and $\eta$ is a scalar weight we are free to choose.

A critical feature of CVMC is that both models are evaluated at exactly the same set of input samples. Figure 2 illustrates this: on the left, every input sample (orange ring) has a high-fidelity evaluation (cyan dot) overlaid on top of it. On the right, the response curves show that the two models move together at these shared sample locations — the dashed connectors pair HF and LF values to highlight the correlation that the correction term exploits.

Figure 2: CVMC sampling: every input sample is evaluated by both models. Left: input space with $N$ shared sample locations — each orange ring (LF) has a cyan dot (HF) on top. Right: the high-fidelity (cyan) and low-fidelity (orange) response curves with sampled values; dashed connectors pair the HF/LF values at each shared location, showing the correlation ($\rho \approx 0.88$) that makes the correction effective.

The correction term $\hat{\mu}_\kappa - \mu_\kappa$ has mean zero — it is pure MC error in the low-fidelity estimate. Adding it to $\hat{\mu}_\alpha$ does not introduce bias. But if the errors in $\hat{\mu}_\alpha$ and $\hat{\mu}_\kappa$ are correlated, choosing $\eta$ with the right sign causes the correction to partially cancel the error in $\hat{\mu}_\alpha$.
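As a minimal sketch of Equation 1, the NumPy snippet below builds the CVMC estimate for a pair of toy models. The models, the helper name `cvmc_mean`, and the sample size are assumptions made for illustration; they are not the tutorial's benchmark or the PyApprox API. The weight uses the standard plug-in form $\eta = -\mathrm{Cov}(f_\alpha, f_\kappa)/\mathbb{V}[f_\kappa]$, estimated here from the same samples for simplicity, which introduces a small bias that a careful implementation would avoid with a pilot sample.

```python
import numpy as np


def cvmc_mean(f_hi, f_lo, mu_lo, eta):
    """CVMC estimate: MC mean of f_hi plus eta * (MC mean of f_lo - known mean)."""
    return np.mean(f_hi) + eta * (np.mean(f_lo) - mu_lo)


# Hypothetical toy models (not from the tutorial), with theta ~ U[-1, 1].
def f_alpha(theta):
    return np.exp(theta)  # stands in for the high-fidelity model


def f_kappa(theta):
    return 1.0 + theta + 0.5 * theta**2  # Taylor approximation as low-fidelity model


mu_kappa = 7.0 / 6.0  # exact: E[1 + theta + theta^2 / 2] = 1 + 0 + (1/3) / 2

rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=1000)  # one shared sample set
hi, lo = f_alpha(theta), f_kappa(theta)  # both models at the same samples

# Plug-in weight eta = -Cov(f_alpha, f_kappa) / Var(f_kappa).
cov = np.cov(hi, lo)  # 2x2 sample covariance matrix
eta = -cov[0, 1] / cov[1, 1]

mc_estimate = np.mean(hi)
cv_estimate = cvmc_mean(hi, lo, mu_kappa, eta)
true_mean = np.sinh(1.0)  # exact mean of exp(theta) on U[-1, 1]

print(f"eta        = {eta:.3f}")
print(f"MC error   = {abs(mc_estimate - true_mean):.2e}")
print(f"CVMC error = {abs(cv_estimate - true_mean):.2e}")
```

The weight comes out negative because the toy models are positively correlated: when $\hat{\mu}_\kappa$ overshoots $\mu_\kappa$, $\hat{\mu}_\alpha$ tends to overshoot too, and the correction pulls it back.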

Variance Reduction Depends on Correlation

The key result (derived in Control Variate Analysis) is that with the optimal choice of $\eta$, the CVMC estimator variance satisfies

\[ \mathbb{V}[\hat{\mu}_\alpha^{\text{CV}}] = \mathbb{V}[\hat{\mu}_\alpha] \left(1 - \rho^2_{\alpha\kappa}\right) \tag{2}\]

where $\rho_{\alpha\kappa} = \mathrm{Corr}(f_\alpha, f_\kappa)$ is the correlation between the two model outputs. The factor $(1 - \rho^2_{\alpha\kappa})$ is always between 0 and 1, so with the optimal $\eta$ and the same number of samples $N$, the CVMC variance never exceeds the MC variance.

Figure 3 shows this relationship. Near-perfectly correlated models ($|\rho| \approx 1$) can reduce variance by orders of magnitude; uncorrelated models ($\rho \approx 0$) offer no benefit.

Figure 3: CVMC variance reduction factor $(1 - \rho^2)$ as a function of model correlation $\rho$. A correlation of $|\rho| = 0.9$ reduces variance by $81\%$; $|\rho| = 0.99$ reduces it by about $98\%$.
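The arithmetic behind the curve is easy to tabulate. The short sketch below evaluates the factor from Equation 2 at a few correlations; the function name `variance_factor` and the chosen values of $\rho$ are illustrative assumptions. The last column, $1/(1-\rho^2)$, is the number of plain MC samples that one CVMC sample is worth at equal variance.

```python
def variance_factor(rho):
    """Ratio Var[CVMC] / Var[MC] at the optimal eta (Equation 2)."""
    return 1.0 - rho**2


for rho in (0.0, 0.5, 0.9, 0.99):
    factor = variance_factor(rho)
    print(
        f"rho = {rho:4.2f}   factor = {factor:6.4f}   "
        f"reduction = {100.0 * (1.0 - factor):5.1f}%   "
        f"MC samples per CVMC sample = {1.0 / factor:6.1f}"
    )
```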

What CVMC Looks Like in Practice

Figure 4 shows the distribution of 1000 independent MC and CVMC mean estimates for a pair of models with $\rho \approx 0.9$. Both histograms are centered on the true mean, consistent with CVMC being unbiased. The CVMC histogram is much narrower: with the optimal $\eta$, Equation 2 predicts a standard deviation smaller by a factor of about $\sqrt{1 - 0.9^2} \approx 0.44$.

Figure 4: Distribution of 1000 independent MC and CVMC estimates of $\mu_\alpha$. Both are centered on the true mean (black line), but CVMC has far smaller spread. The low-fidelity model has correlation $\rho \approx 0.9$ with the high-fidelity model.
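A repeated-trials experiment of this kind can be sketched in a few lines of NumPy. The version below is an assumption-laden stand-in, not the code behind Figure 4: it replaces the tutorial's benchmark with two jointly Gaussian toy outputs of unit variance and correlation $\rho = 0.9$, for which the low-fidelity mean is 0 and the optimal weight is $\eta^* = -\rho$. The trial count and sample size are also arbitrary choices.

```python
import numpy as np

rho = 0.9         # assumed correlation between the two toy outputs
n_samples = 50    # N samples per estimate
n_trials = 2000   # independent repetitions of each estimator
rng = np.random.default_rng(1)

# Toy outputs: unit variance, correlation rho, both true means equal to 0.
z1 = rng.standard_normal((n_trials, n_samples))
z2 = rng.standard_normal((n_trials, n_samples))
f_hi = z1
f_lo = rho * z1 + np.sqrt(1.0 - rho**2) * z2
mu_lo = 0.0  # known low-fidelity mean

# With unit variances, eta* = -Cov / Var = -rho.
eta = -rho

mc = f_hi.mean(axis=1)                       # one MC estimate per trial
cv = mc + eta * (f_lo.mean(axis=1) - mu_lo)  # one CVMC estimate per trial

ratio = cv.var(ddof=1) / mc.var(ddof=1)
print(f"mean of MC estimates:     {mc.mean(): .4f}")
print(f"mean of CVMC estimates:   {cv.mean(): .4f}")
print(f"empirical variance ratio: {ratio:.3f}")
print(f"predicted 1 - rho^2:      {1.0 - rho**2:.3f}")
```

Both sets of estimates should scatter around the true mean of 0, and the empirical variance ratio should land near the predicted $1 - \rho^2$, up to sampling noise from the finite number of trials.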

The Allocation Problem

Every multi-fidelity estimator faces the same core problem: given a computational budget $P$, choose the free parameters of the estimator to minimize its variance. In the general case this takes the form

\[ \min_{\boldsymbol{\theta}} \;\mathbb{V}[\hat{\mu}(\boldsymbol{\theta})] \quad \text{subject to} \quad \mathrm{Cost}(\boldsymbol{\theta}) \leq P, \tag{3}\]

where $\boldsymbol{\theta}$ collects all tunable parameters — sample counts, weights, and any structural choices — and $\mathrm{Cost}(\boldsymbol{\theta})$ is the total computational cost of evaluating the estimator.

For CVMC the free parameters are the sample count $N$ and the weight $\eta$. Because both models are evaluated at every sample, the total cost is

\[ P = N\,(c_\alpha + c_\kappa), \tag{4}\]

where $c_\alpha$ and $c_\kappa$ are the per-sample costs of the two models. The optimal weight $\eta^*$ is determined by the model covariance (see Control Variate Analysis) and does not depend on $N$. The only remaining decision is $N$ itself, and the budget constraint pins it directly:

\[ N^* = \left\lfloor \frac{P}{c_\alpha + c_\kappa} \right\rfloor. \tag{5}\]

No numerical optimization is required: for CVMC, the allocation problem in Equation 3 has a closed-form solution.

The closed form hides one feature worth seeing directly. Because both models are evaluated at the same $N$ samples, the budget split between them is fixed entirely by their per-sample cost ratio $c_\alpha : c_\kappa$; there is no freedom to sample the cheap model more often. Figure 5 shows the consequence for an expensive high-fidelity model: at $c_\alpha:c_\kappa = 10:1$ the high-fidelity evaluations consume $10/11 \approx 91\%$ of the budget even though both models run the identical $N^*$ times. This lock-step sampling is exactly the restriction that Approximate Control Variates relaxes by letting the low-fidelity model take additional samples of its own. For CVMC with two models and a known $\mu_\kappa$, however, the lock-step allocation is optimal: extra low-fidelity samples would have nothing left to estimate.

Figure 5: CVMC budget allocation for $c_\alpha=1$, $c_\kappa=0.1$, $P=100$, giving $N^*=\lfloor P/(c_\alpha+c_\kappa)\rfloor = 90$. Both models are evaluated the same $N^*$ times (labeled inside each bar), but the bar height is computational cost $N^*\cdot c_{\text{model}}$, so the expensive high-fidelity model dominates the budget. The split is pure cost ratio: CVMC cannot sample the cheap model more often than the expensive one.
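The numbers in Figure 5 follow directly from Equations 4 and 5. The sketch below reproduces them; the helper name `cvmc_allocation` is hypothetical and the code is plain arithmetic, not a PyApprox call.

```python
import math


def cvmc_allocation(budget, cost_hi, cost_lo):
    """Return N* from Equation 5 and the cost spent on each model."""
    n_star = math.floor(budget / (cost_hi + cost_lo))
    spent_hi = n_star * cost_hi  # high-fidelity share of the budget
    spent_lo = n_star * cost_lo  # low-fidelity share of the budget
    return n_star, spent_hi, spent_lo


# Values from Figure 5: c_alpha = 1, c_kappa = 0.1, P = 100.
n_star, spent_hi, spent_lo = cvmc_allocation(100.0, 1.0, 0.1)
share_hi = spent_hi / (spent_hi + spent_lo)

print(f"N* = {n_star}")
print(f"high-fidelity cost = {spent_hi:.1f}, low-fidelity cost = {spent_lo:.1f}")
print(f"high-fidelity share of spent budget = {share_hi:.3f}")
```

Because of the floor in Equation 5, a small part of the budget can go unspent. When $P$ is an exact multiple of $c_\alpha + c_\kappa$, floating-point rounding in the division may also push the quotient just below an integer, so a production version might add a small tolerance before flooring.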

This is the simplest possible case. In the ACV estimator, the LF sample ratio $r$ becomes a second free parameter and the allocation requires solving a one-dimensional optimization. In the many-model extensions and group ACV, the number of free parameters grows with the number of models and subsets, and the allocation becomes a high-dimensional constrained optimization problem.

When Does CVMC Help?

CVMC requires two ingredients:

A low-fidelity model $f_\kappa$ with a known mean $\mu_\kappa$. If $\mu_\kappa$ is not known analytically, it must be estimated — introducing additional error. That case is handled by Approximate Control Variates.
High correlation $|\rho_{\alpha\kappa}|$ between the models. The variance factor $(1 - \rho^2)$ falls slowly at first: $|\rho| = 0.5$ removes only $25\%$ of the variance, while $|\rho| = 0.9$ removes $81\%$. A weakly correlated low-fidelity model offers little benefit.

The variance reduction is entirely due to cancellation of correlated errors, not a reduction in work — CVMC evaluates both models $N$ times, at cost $N(c_\alpha + c_\kappa)$ per Equation 4.
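Combining Equations 2 and 4 gives a quick equal-budget check, sketched below under the assumptions that $\mu_\kappa$ is available at no cost, $\eta$ is optimal, and integer rounding of $N$ is ignored. With budget $P$, plain MC affords $P/c_\alpha$ samples and has variance $\sigma^2_\alpha c_\alpha / P$, while CVMC affords $P/(c_\alpha + c_\kappa)$ samples and has variance $\sigma^2_\alpha (1 - \rho^2)(c_\alpha + c_\kappa)/P$. CVMC therefore wins only when $(1 - \rho^2)(c_\alpha + c_\kappa)/c_\alpha < 1$. The function name is hypothetical.

```python
def equal_budget_variance_ratio(rho, cost_hi, cost_lo):
    """Var[CVMC] / Var[MC] when both estimators spend the same budget.

    Assumes the optimal eta, a known low-fidelity mean, and ignores
    integer rounding of the sample counts.
    """
    return (1.0 - rho**2) * (cost_hi + cost_lo) / cost_hi


# Strongly correlated, cheap low-fidelity model: ratio well below 1.
print(equal_budget_variance_ratio(rho=0.9, cost_hi=1.0, cost_lo=0.1))

# Weakly correlated, equally expensive low-fidelity model: ratio above 1,
# so plain MC is the better use of the budget.
print(equal_budget_variance_ratio(rho=0.2, cost_hi=1.0, cost_lo=1.0))
```

A ratio below 1 means the control variate pays for itself; a ratio above 1 means the budget spent on the low-fidelity model would have been better spent on more high-fidelity samples.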

Key Takeaways

CVMC adds a zero-mean correction $\eta(\hat{\mu}_\kappa - \mu_\kappa)$ to the MC estimator, so the estimator remains unbiased for any fixed $\eta$
With the optimal $\eta$, the variance reduction factor is $(1 - \rho^2_{\alpha\kappa})$, determined entirely by the correlation between models
$|\rho| \approx 1$ gives near-perfect variance cancellation; $\rho \approx 0$ gives no benefit
CVMC requires $\mu_\kappa$ to be known; when it is not, use Approximate Control Variates

Tip

Ready to try this? See API Cookbook → CVEstimator.

Exercises

If $\rho = 0.7$, by what factor does CVMC reduce the variance compared to plain MC? How many fewer samples would you need to achieve the same standard error?
Suppose $f_\kappa$ is the true model $f_\alpha$ itself (i.e., $\rho = 1$). What does $\hat{\mu}^{\text{CV}}_\alpha$ reduce to? Is this useful in practice?
The correction term is $\eta(\hat{\mu}_\kappa - \mu_\kappa)$. Explain in words why a negative $\eta$ is appropriate when the models are positively correlated ($\rho > 0$).

Next Steps

Control Variate Analysis — Derive the optimal $\eta$ and the $1 - \rho^2$ result from first principles
API Cookbook — Use the PyApprox CVMC API on a real model
Approximate Control Variates — What to do when $\mu_\kappa$ is unknown
