Two Model Control Variate Monte Carlo

This tutorial describes how to implement and deploy control variate Monte Carlo sampling to compute the statistics of the output of a high-fidelity model using a lower-fidelity model with a known mean. The information presented here builds upon the tutorial Monte Carlo Quadrature. We will focus on estimation of a single statistic for now, but control variates can be used to estimate multiple statistics simultaneously.

Let us introduce a model $f_{κ}$ with known statistic $Q_{κ}$. We can use this model to estimate the mean of $f_{α}$ via [LMWOR1982]

Q_{α}^{CV} (Z_{N}) = Q_{α} (Z_{N}) + η (Q_{κ} (Z_{N}) - Q_{κ})
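As a minimal sketch of this estimator for the mean, assume a toy high-fidelity model $f_{α}(z) = e^{z}$ and a toy low-fidelity model $f_{κ}(z) = 1 + z$ with $z$ uniform on $[0, 1]$, so that $Q_{κ} = 1.5$ is known exactly. The models, the helper name `cv_mean_estimate` and the weight `eta=-1.0` are illustrative assumptions and are not part of PyApprox.

```python
import numpy as np


def cv_mean_estimate(values_hf, values_lf, lf_mean, eta):
    """Control variate estimate of the high-fidelity mean.

    Adds eta times the error in the low-fidelity sample mean
    to the high-fidelity sample mean.
    """
    return np.mean(values_hf) + eta * (np.mean(values_lf) - lf_mean)


rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, 100)
values_hf = np.exp(z)   # toy f_alpha, exact mean e - 1
values_lf = 1.0 + z     # toy f_kappa, exact mean 1.5 (assumed known)

mc_estimate = np.mean(values_hf)
cv_estimate = cv_mean_estimate(values_hf, values_lf, 1.5, eta=-1.0)
print("MC estimate  ", mc_estimate)
print("CVMC estimate", cv_estimate)
print("Exact mean   ", np.e - 1)
```

Setting `eta=0` recovers the plain MC estimate, and both models are evaluated at the same samples $Z_{N}$.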

Here $η$ is a free parameter which can be optimized to reduce the variance of this so-called control variate estimator, which is given by

\begin{aligned}
V [Q_{α}^{CV} (Z_{N})] & = V [Q_{α} (Z_{N}) + η (Q_{κ} (Z_{N}) - Q_{κ})] \\
& = V [Q_{α} (Z_{N})] + η^{2} V [(Q_{κ} (Z_{N}) - Q_{κ})] + 2 η \mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})] \\
& = V [Q_{α} (Z_{N})] \left( 1 + η^{2} \frac{V [(Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} + 2 η \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} \right) .
\end{aligned}

The second line follows from the formula for the variance of a sum of random variables, $V [X + η Y] = V [X] + η^{2} V [Y] + 2 η \mathrm{Cov} [X, Y]$.
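The variance-of-a-sum identity used in this expansion can be checked numerically. The sketch below uses two correlated Gaussian samples and an arbitrary weight, all of which are assumptions chosen only for illustration. The identity also holds exactly for sample moments, provided the same normalization (`ddof=1`) is used throughout.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.8 * x + 0.6 * rng.normal(size=1000)  # correlated with x
eta = -0.5                                 # arbitrary weight

cov = np.cov(x, y)  # 2x2 sample covariance matrix (ddof=1)
# Subtracting a constant from y (here 3.0) does not change the variance
lhs = np.var(x + eta * (y - 3.0), ddof=1)
rhs = cov[0, 0] + eta**2 * cov[1, 1] + 2 * eta * cov[0, 1]
print(lhs, rhs)
```

The two printed values agree to floating point precision, which also illustrates why the known constant $Q_{κ}$ drops out of the variance and covariance terms.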

We can measure the change in MSE of the control variate estimator relative to the single-model MC estimator by looking at the ratio of the CVMC and MC estimator variances. This variance reduction ratio is

γ = \frac{V [Q_{α}^{CV} (Z_{N})]}{V [Q_{α} (Z_{N})]} = \left( 1 + η^{2} \frac{V [(Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} + 2 η \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} \right)

and can be minimized by setting its derivative with respect to $η$ to zero and solving for $η$, i.e.

\begin{aligned}
\frac{d}{d η} γ & = 2 η \frac{V [(Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} + 2 \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} = 0 \\
& ⟹ η V [(Q_{κ} (Z_{N}) - Q_{κ})] + \mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})] = 0 \\
& ⟹ η = - \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]}{V [(Q_{κ} (Z_{N}) - Q_{κ})]} = - \frac{\mathrm{Cov} [Q_{α} (Z_{N}), Q_{κ} (Z_{N})]}{V [Q_{κ} (Z_{N})]}
\end{aligned}
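As a quick numerical check of this minimizer, the sketch below evaluates $γ$ on a grid of $η$ values and compares the grid minimizer with the closed-form expression. The variance and covariance values are assumed purely for illustration, and the names `var_hf`, `var_lf` and `cov_hf_lf` are hypothetical.

```python
import numpy as np

# Assumed estimator moments, chosen so that cov^2 <= var_hf * var_lf
var_hf, var_lf, cov_hf_lf = 2.0, 0.5, 0.8


def gamma(eta):
    """Variance reduction ratio as a function of the weight eta."""
    return 1 + eta**2 * var_lf / var_hf + 2 * eta * cov_hf_lf / var_hf


eta_opt = -cov_hf_lf / var_lf               # closed-form minimizer
etas = np.linspace(-4, 1, 5001)             # grid with spacing 1e-3
eta_grid = etas[np.argmin(gamma(etas))]     # brute-force minimizer
print("closed form:", eta_opt, " grid search:", eta_grid)
print("gamma at optimum:", gamma(eta_opt))
```

Because $γ$ is a convex quadratic in $η$, the stationary point is the unique global minimizer.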

With this choice

\begin{aligned}
γ & = 1 + \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]^{2}}{V [(Q_{κ} (Z_{N}) - Q_{κ})]^{2}} \frac{V [(Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} \\
& \quad - 2 \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]}{V [(Q_{κ} (Z_{N}) - Q_{κ})]} \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]}{V [Q_{α} (Z_{N})]} \\
& = 1 + \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]^{2}}{V [(Q_{κ} (Z_{N}) - Q_{κ})] V [Q_{α} (Z_{N})]} - 2 \frac{\mathrm{Cov} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]^{2}}{V [(Q_{κ} (Z_{N}) - Q_{κ})] V [Q_{α} (Z_{N})]} \\
& = 1 - \mathrm{Cor} [Q_{α} (Z_{N}), (Q_{κ} (Z_{N}) - Q_{κ})]^{2} \\
& = 1 - \mathrm{Cor} [Q_{α} (Z_{N}), Q_{κ} (Z_{N})]^{2}
\end{aligned}

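When estimating the mean with independent samples, the variance and covariance of the sample means are $N^{- 1}$ times the variance and covariance of the models, so we obtain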

\begin{aligned}
η & = - \frac{\mathrm{Cov} [Q_{α} (Z_{N}), Q_{κ} (Z_{N})]}{V [Q_{κ} (Z_{N})]} \\
& = - \frac{N^{- 1} \mathrm{Cov} [f_{α}, f_{κ}]}{N^{- 1} V [f_{κ}]} \\
& = - \frac{\mathrm{Cov} [f_{α}, f_{κ}]}{V [f_{κ}]}
\end{aligned}

which we can plug back into $γ$ to give

\begin{aligned}
γ & = 1 - \mathrm{Cor} [Q_{α} (Z_{N}), Q_{κ} (Z_{N})]^{2} \\
& = 1 - \mathrm{Cor} [f_{α}, f_{κ}]^{2}
\end{aligned}

and so

V [Q_{α}^{CV} (Z_{N})] = V [Q_{α} (Z_{N})] \left( 1 - \mathrm{Cor} [f_{α}, f_{κ}]^{2} \right)
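In practice the model covariance needed for the optimal weight $η = - \mathrm{Cov} [f_{α}, f_{κ}] / V [f_{κ}]$ may not be known. One option, sketched here under assumptions, is to estimate it from a separate set of pilot samples. The toy models $f_{α}(z) = e^{z}$ and $f_{κ}(z) = 1 + z$ with $z$ uniform on $[0, 1]$ are illustrative choices for which the exact weight, $- 6 (3 - e)$, can be computed by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
z_pilot = rng.uniform(0.0, 1.0, 10000)  # pilot samples, assumed affordable

# Row 0 is the toy high-fidelity model, row 1 the toy low-fidelity model
cov_pilot = np.cov(np.exp(z_pilot), 1.0 + z_pilot)
eta_pilot = -cov_pilot[0, 1] / cov_pilot[1, 1]

# Exact value for these toy models: Cov = (3 - e)/2 and V[f_kappa] = 1/12
eta_exact = -6.0 * (3.0 - np.e)
print("pilot estimate:", eta_pilot, " exact:", eta_exact)
```

Estimating the weight from the same samples that are later used in the estimator generally introduces a bias, which is why this sketch keeps the pilot samples separate.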

Thus, if two highly correlated models (one with a known mean) are available, then we can drastically reduce the MSE of our estimate of the unknown mean. Similar reductions can be obtained for other statistics such as variance. However, when estimating variance, the estimator variance reduction ratio will no longer depend only on the correlation between the models but also on higher-order moments.
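To give a feel for the size of the reduction when estimating the mean, the sketch below tabulates $γ = 1 - \mathrm{Cor} [f_{α}, f_{κ}]^{2}$ for a few assumed correlations, together with the number of plain MC samples that would be needed to match the variance of a CVMC estimate built from 100 samples. The helper names are hypothetical, and the count ignores the cost of evaluating the low-fidelity model.

```python
def variance_reduction(correlation):
    """Ratio of CVMC to MC estimator variance for the mean, with optimal eta."""
    return 1.0 - correlation**2


def equivalent_mc_samples(nsamples, correlation):
    """MC sample count whose variance V[f]/N matches CVMC with nsamples."""
    return nsamples / variance_reduction(correlation)


for rho in [0.5, 0.9, 0.99]:
    print(f"correlation={rho:4.2f}  gamma={variance_reduction(rho):.4f}  "
          f"equivalent MC samples={equivalent_mc_samples(100, rho):.0f}")
```

The reduction is modest for moderate correlations and only becomes dramatic as the correlation approaches one.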

Again consider the tunable model ensemble. The correlation between the models $f_{0}$ and $f_{1}$ can be tuned by varying $θ_{1}$. For a given choice of $θ_{1}$, let us compute a single realization of the CVMC estimate of $Q_{0} = E [f_{0}]$.

First let us set up the problem and compute a single estimate using CVMC.

import numpy as np
import matplotlib.pyplot as plt
from pyapprox.benchmarks import setup_benchmark

np.random.seed(1)
shifts = [.1, .2]
benchmark = setup_benchmark(
    "tunable_model_ensemble", theta1=np.pi/2*.95, shifts=shifts)
model = benchmark.fun

nsamples = int(1e2)
samples = benchmark.variable.rvs(nsamples)
values0 = model.m0(samples)
values1 = model.m1(samples)
cov = benchmark.covariance
# Optimal weight for estimating the mean: eta = -Cov[f0, f1] / V[f1]
eta = -cov[0, 1]/cov[1, 1]
# cov_mc = np.cov(values0,values1)
# eta_mc = -cov_mc[0, 1]/cov_mc[1, 1]
exact_integral_f0, exact_integral_f1 = 0, shifts[0]
cv_mean = values0.mean()+eta*(values1.mean()-exact_integral_f1)
print('MC difference squared =', (values0.mean()-exact_integral_f0)**2)
print('CVMC difference squared =', (cv_mean-exact_integral_f0)**2)

MC difference squared = 0.01473604359753749
CVMC difference squared = 5.954528881712521e-05

Now let's look at the statistical properties of the CVMC estimator.

ntrials = 1000
means = np.empty((ntrials, 2))
for ii in range(ntrials):
    samples = benchmark.variable.rvs(nsamples)
    values0 = model.m0(samples)
    values1 = model.m1(samples)
    means[ii, 0] = values0.mean()
    means[ii, 1] = values0.mean()+eta*(values1.mean()-exact_integral_f1)

print("Theoretical variance reduction",
      1-cov[0, 1]**2/(cov[0, 0]*cov[1, 1]))
print("Achieved variance reduction",
      means[:, 1].var(axis=0)/means[:, 0].var(axis=0))

Theoretical variance reduction 0.055234554161570304
Achieved variance reduction 0.05749985629263281
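The dependence of the achieved reduction on the model correlation can be explored with a self-contained sketch that does not rely on the benchmark. It assumes a toy pair of models built from standard normal variables, with unit variances, a prescribed correlation `rho` and a known low-fidelity mean of zero. These choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
ntrials, nsamples = 2000, 100
achieved = {}
for rho in [0.5, 0.9, 0.99]:
    x = rng.normal(size=(ntrials, nsamples))
    y = rng.normal(size=(ntrials, nsamples))
    values_hf = x                                   # toy f_alpha
    values_lf = rho * x + np.sqrt(1 - rho**2) * y   # toy f_kappa, correlation rho
    eta = -rho                                      # -Cov/V with Cov=rho and V=1
    mc_means = values_hf.mean(axis=1)               # one MC estimate per trial
    cv_means = mc_means + eta * (values_lf.mean(axis=1) - 0.0)
    achieved[rho] = cv_means.var() / mc_means.var()
    print(f"rho={rho:4.2f}  theoretical={1 - rho**2:.4f}  "
          f"achieved={achieved[rho]:.4f}")
```

Each row of the sample arrays is one independent trial, so the variances over rows estimate the estimator variances in the same way as the loop over trials above.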

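The following plot compares histograms of the MC and CVMC estimates of $E [f_{0}]$. Unlike the MC estimator of $E [f_{1}]$, which is biased when used as an estimate of $E [f_{0}]$, the CVMC estimator is unbiased, and it has a smaller variance than the MC estimator of $E [f_{0}]$.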

fig, ax = plt.subplots()
textstr = '\n'.join(
    [r'$\mathbb{E}[Q_{0}(\mathcal{Z}_N)]=\mathrm{%.2e}$' % means[:, 0].mean(),
     r'$\mathbb{V}[Q_{0}(\mathcal{Z}_N)]=\mathrm{%.2e}$' % means[:, 0].var(),
     r'$\mathbb{E}[Q_{0}^\mathrm{CV}(\mathcal{Z}_N)]=\mathrm{%.2e}$' % (
         means[:, 1].mean()),
     r'$\mathbb{V}[Q_{0}^\mathrm{CV}(\mathcal{Z}_N)]=\mathrm{%.2e}$' % (
         means[:, 1].var())])
ax.hist(means[:, 0], bins=ntrials//100, density=True, alpha=0.5,
        label=r'$Q_{0}(\mathcal{Z}_N)$')
ax.hist(means[:, 1], bins=ntrials//100, density=True, alpha=0.5,
        label=r'$Q_{0}^\mathrm{CV}(\mathcal{Z}_N)$')
ax.axvline(x=0, c='k', label=r'$\mathbb{E}[Q_0]$')
props = {'boxstyle': 'round', 'facecolor': 'white', 'alpha': 1}
ax.text(0.6, 0.75, textstr, transform=ax.transAxes, bbox=props)
_ = ax.legend(loc='upper left')