Control Variate Monte Carlo

PyApprox Tutorial Library

Reducing Monte Carlo estimator variance using a correlated low-fidelity model with a known statistic.

Learning Objectives

After completing this tutorial, you will be able to:

  • Explain why a correlated low-fidelity model can reduce the variance of a Monte Carlo (MC) estimator
  • Write down the Control Variate Monte Carlo (CVMC) estimator and identify its free parameter \(\eta\)
  • State the optimal \(\eta\) and the resulting variance reduction in terms of model correlation
  • Identify when CVMC helps and when it does not

Prerequisites

Complete Monte Carlo Sampling and Estimator Accuracy and MSE before this tutorial.

Motivation

The Estimator Accuracy tutorial showed that the variance of the MC mean estimator is \(\sigma^2_\alpha / N\), where \(\sigma^2_\alpha\) is the variance of the high-fidelity model output \(f_\alpha\) and \(N\) is the number of samples. Reducing this variance requires either more samples — which is expensive — or a smarter estimator.

Control Variate Monte Carlo (CVMC) is the simplest example of a smarter estimator. The idea: if we have access to a cheap model \(f_\kappa\) that is correlated with \(f_\alpha\) and whose mean \(\mu_\kappa = \mathbb{E}_\theta[f_\kappa(\boldsymbol{\theta})]\) is known exactly, we can use the cheap model to cancel a large fraction of the MC error.

Figure 1 illustrates this with two models of a 2D input. The surface plots show that \(f_\alpha\) and \(f_\kappa\) have similar shape — when one is large, so is the other. The scatter plot on the right confirms that model outputs are tightly correlated (\(\rho \approx 0.9\)).

Figure 1: Left: overlaid response surfaces of \(f_\alpha\) (blue) and \(f_\kappa\) (orange) over \([-1,1]^2\), showing similar shape with a gap between them. Right: scatter plot of 100 random evaluations confirming tight output correlation \(\rho\).

The CVMC Estimator

The standard MC estimator of \(\mu_\alpha = \mathbb{E}_\theta[f_\alpha(\boldsymbol{\theta})]\) is

\[ \hat{\mu}_\alpha = \frac{1}{N} \sum_{k=1}^{N} f_\alpha(\boldsymbol{\theta}^{(k)}). \]

The CVMC estimator adds a correction term built from \(f_\kappa\):

\[ \hat{\mu}_\alpha^{\text{CV}} = \hat{\mu}_\alpha + \eta \left( \hat{\mu}_\kappa - \mu_\kappa \right) \tag{1}\]

where \(\hat{\mu}_\kappa = \frac{1}{N}\sum_{k=1}^N f_\kappa(\boldsymbol{\theta}^{(k)})\) is the MC mean of the low-fidelity model evaluated on the same \(N\) samples, and \(\eta\) is a scalar weight we are free to choose.

A critical feature of CVMC is that both models are evaluated at exactly the same set of input samples. Figure 2 illustrates this: on the left, every input sample (orange ring) has a high-fidelity evaluation (cyan dot) overlaid on top of it. On the right, the response curves show that the two models move together at these shared sample locations — the dashed connectors pair HF and LF values to highlight the correlation that the correction term exploits.

Figure 2: CVMC sampling: every input sample is evaluated by both models. Left: input space with \(N\) shared sample locations — each orange ring (LF) has a cyan dot (HF) on top. Right: the high-fidelity (cyan) and low-fidelity (orange) response curves with sampled values; dashed connectors pair the HF/LF values at each shared location, showing the correlation (\(\rho \approx 0.88\)) that makes the correction effective.

The correction term \(\hat{\mu}_\kappa - \mu_\kappa\) has mean zero — it is pure MC error in the low-fidelity estimate. Adding it to \(\hat{\mu}_\alpha\) does not introduce bias. But if the errors in \(\hat{\mu}_\alpha\) and \(\hat{\mu}_\kappa\) are correlated, choosing \(\eta\) with the right sign causes the correction to partially cancel the error in \(\hat{\mu}_\alpha\).

Variance Reduction Depends on Correlation

The key result (derived in Control Variate Analysis) is that with the optimal choice of \(\eta\), the CVMC estimator variance satisfies

\[ \mathbb{V}[\hat{\mu}_\alpha^{\text{CV}}] = \mathbb{V}[\hat{\mu}_\alpha] \left(1 - \rho^2_{\alpha\kappa}\right) \tag{2}\]

where \(\rho_{\alpha\kappa} = \mathrm{Corr}(f_\alpha, f_\kappa)\) is the correlation between the two model outputs. The factor \((1 - \rho^2_{\alpha\kappa})\) is always between 0 and 1, so CVMC always reduces (or at worst equals) the MC variance.

Figure 3 shows this relationship. Near-perfectly correlated models (\(|\rho| \approx 1\)) can reduce variance by orders of magnitude; uncorrelated models (\(\rho \approx 0\)) offer no benefit.

Figure 3: CVMC variance reduction factor \((1 - \rho^2)\) as a function of model correlation \(\rho\). A correlation of \(|\rho| = 0.9\) reduces variance by \(81\%\); \(|\rho| = 0.99\) reduces it by \(99\%\).

What CVMC Looks Like in Practice

Figure 4 shows the distribution of 1000 independent MC and CVMC mean estimates for a pair of models with \(\rho \approx 0.9\). Both estimators are centered on the true mean, confirming that CVMC is unbiased. But the CVMC histogram is dramatically narrower.

Figure 4: Distribution of 1000 independent MC and CVMC estimates of \(\mu_\alpha\). Both are centered on the true mean (black line), but CVMC has far smaller spread. The low-fidelity model has correlation \(\rho \approx 0.9\) with the high-fidelity model.

The Allocation Problem

Every multi-fidelity estimator faces the same core problem: given a computational budget \(P\), choose the free parameters of the estimator to minimize its variance. In the general case this takes the form

\[ \min_{\boldsymbol{\theta}} \;\mathbb{V}[\hat{\mu}(\boldsymbol{\theta})] \quad \text{subject to} \quad \mathrm{Cost}(\boldsymbol{\theta}) \leq P, \tag{3}\]

where \(\boldsymbol{\theta}\) collects all tunable parameters — sample counts, weights, and any structural choices — and \(\mathrm{Cost}(\boldsymbol{\theta})\) is the total computational cost of evaluating the estimator.

For CVMC the free parameters are the sample count \(N\) and the weight \(\eta\). Because both models are evaluated at every sample, the total cost is

\[ P = N\,(c_\alpha + c_\kappa), \tag{4}\]

where \(c_\alpha\) and \(c_\kappa\) are the per-sample costs of the two models. The optimal weight \(\eta^*\) is determined by the model covariance (see Control Variate Analysis) and does not depend on \(N\). The only remaining decision is \(N\) itself, and the budget constraint pins it directly:

\[ N^* = \left\lfloor \frac{P}{c_\alpha + c_\kappa} \right\rfloor. \tag{5}\]

There is no numerical optimization required — the allocation problem Equation 3 has a closed-form solution.

The closed form hides one feature worth seeing directly. Because both models are evaluated at the same \(N\) samples, the budget split between them is fixed entirely by their per-sample cost ratio \(c_\alpha : c_\kappa\) — there is no freedom to sample the cheap model more often. Figure 5 shows the consequence for an expensive high-fidelity model: at \(c_\alpha:c_\kappa = 10:1\) the high-fidelity evaluations consume roughly nine-tenths of the budget even though both models run the identical \(N^*\) times. This lock-step sampling is exactly the restriction that Approximate Control Variates relaxes by letting the low-fidelity model take additional samples of its own; however, this allocation is optimal for two models.

Figure 5: CVMC budget allocation for \(c_\alpha=1\), \(c_\kappa=0.1\), \(P=100\), giving \(N^*=\lfloor P/(c_\alpha+c_\kappa)\rfloor = 90\). Both models are evaluated the same \(N^*\) times (labeled inside each bar), but the bar height is computational cost \(N^*\cdot c_{\text{model}}\), so the expensive high-fidelity model dominates the budget. The split is pure cost ratio: CVMC cannot sample the cheap model more often than the expensive one.

This is the simplest possible case. In the ACV estimator, the LF sample ratio \(r\) becomes a second free parameter and the allocation requires solving a one-dimensional optimization. In the many-model extensions and group ACV, the number of free parameters grows with the number of models and subsets, and the allocation becomes a high-dimensional constrained optimization problem.

When Does CVMC Help?

CVMC requires two ingredients:

  1. A low-fidelity model \(f_\kappa\) with a known mean \(\mu_\kappa\). If \(\mu_\kappa\) is not known analytically, it must be estimated — introducing additional error. That case is handled by Approximate Control Variates.

  2. High correlation \(|\rho_{\alpha\kappa}|\) between the models. The variance reduction \((1 - \rho^2)\) is only substantial when \(|\rho| \gtrsim 0.5\). A weakly correlated low-fidelity model offers little benefit.

The variance reduction is entirely due to cancellation of correlated errors, not a reduction in work — CVMC evaluates both models \(N\) times, at cost \(N(c_\alpha + c_\kappa)\) per Equation 4.

Key Takeaways

  • CVMC adds a zero-mean correction \(\eta(\hat{\mu}_\kappa - \mu_\kappa)\) to the MC estimator; the correction is unbiased by construction
  • With the optimal \(\eta\), the variance reduction factor is \((1 - \rho^2_{\alpha\kappa})\), determined entirely by the correlation between models
  • \(|\rho| \approx 1\) gives near-perfect variance cancellation; \(\rho \approx 0\) gives no benefit
  • CVMC requires \(\mu_\kappa\) to be known; when it is not, use Approximate Control Variates
Tip

Ready to try this? See API Cookbook → CVEstimator.

Exercises

  1. If \(\rho = 0.7\), by what factor does CVMC reduce the variance compared to plain MC? How many fewer samples would you need to achieve the same standard error?

  2. Suppose \(f_\kappa\) is the true model \(f_\alpha\) itself (i.e., \(\rho = 1\)). What does \(\hat{\mu}^{\text{CV}}_\alpha\) reduce to? Is this useful in practice?

  3. The correction term is \(\eta(\hat{\mu}_\kappa - \mu_\kappa)\). Explain in words why a negative \(\eta\) is appropriate when the models are positively correlated (\(\rho > 0\)).

Next Steps