Bayesian OED: The Double-Loop Estimator

PyApprox Tutorial Library

How the double-loop Monte Carlo estimator approximates EIG by treating the marginal evidence as a nested average over prior samples.

Learning Objectives

After completing this tutorial, you will be able to:

  • State the mutual-information form of EIG and identify the term that makes it expensive to evaluate
  • Explain why the inner loop approximates the marginal evidence and what determines its accuracy
  • Describe the joint sampling requirement for the outer loop
  • Trace data flow through the double-loop algorithm and identify which arrays must be reused across loops

Prerequisites

Complete Learning from Experiments: Bayesian OED before this tutorial.

Problem Setup

We observe \(K\) candidate measurements. Each sensor \(k\) has a design weight \(w_k\) that scales its effective precision. The observation model is:

\[ y_k \mid \boldsymbol{\theta}, w_k \;\sim\; \mathcal{N}\!\left(h_k(\boldsymbol{\theta}),\; \gamma_k / w_k\right), \]

where \(h_k(\boldsymbol{\theta}) = [\mathbf{f}(\boldsymbol{\theta})]_k\) is the forward model at sensor \(k\) and \(\gamma_k\) is the base noise variance. The \(K\)-sensor joint log-likelihood is:

\[ \log p(\mathbf{y} \mid \boldsymbol{\theta}, \mathbf{w}) = -\tfrac{1}{2}\,\mathbf{r}^\top \mathbf{W}^{1/2} \boldsymbol{\Gamma}^{-1} \mathbf{W}^{1/2} \mathbf{r} + C_{\mathbf{w}}, \]

where \(\mathbf{r} = \mathbf{f}(\boldsymbol{\theta}) - \mathbf{y}\), \(\mathbf{W} = \mathrm{Diag}[\mathbf{w}]\), \(\boldsymbol{\Gamma} = \mathrm{Diag}[\boldsymbol{\gamma}]\), and

\[ C_{\mathbf{w}} = \tfrac{1}{2}\log\lvert\mathbf{W}^{1/2}\boldsymbol{\Gamma}^{-1}\mathbf{W}^{1/2}\rvert = \tfrac{1}{2}\sum_{k=1}^{K}\log(w_k/\gamma_k). \]

The weights satisfy \(\mathbf{w} \in [0,1]^K\) with \(\sum_k w_k = 1\).
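As a concrete sketch, the weighted log-likelihood above can be written in a few lines of NumPy. The function and argument names here are illustrative, not PyApprox API, and the \(-\tfrac{K}{2}\log 2\pi\) constant is dropped to match the \(C_{\mathbf{w}}\) convention (it cancels in the EIG difference):

```python
import numpy as np

def weighted_loglike(f_theta, y, w, gamma):
    """Log-density of y ~ N(f(theta), Gamma / W), up to the -(K/2) log(2*pi)
    constant, matching the C_w convention in the text."""
    r = f_theta - y                           # residual r = f(theta) - y
    # W^{1/2} Gamma^{-1} W^{1/2} is diagonal with entries w_k / gamma_k
    quad = 0.5 * np.sum(w / gamma * r**2)
    c_w = 0.5 * np.sum(np.log(w / gamma))     # C_w = (1/2) sum_k log(w_k / gamma_k)
    return c_w - quad
```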

Reparameterized Outer Samples

Because \(w_k\) sets the noise variance to \(\gamma_k/w_k\), the correct way to generate an outer observation is:

\[ \mathbf{y}^{(m)}(\mathbf{w}) = \mathbf{f}(\boldsymbol{\theta}^{(m)}) + \mathbf{W}^{-1/2}\boldsymbol{\Gamma}^{1/2}\boldsymbol{\varepsilon}^{(m)}, \quad \boldsymbol{\varepsilon}^{(m)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_K). \]

Drawing \(\boldsymbol{\varepsilon}^{(m)}\) from \(\mathcal{N}(\mathbf{0}, \mathbf{I}_K)\) and scaling by \(\mathbf{W}^{-1/2}\boldsymbol{\Gamma}^{1/2}\) reparameterizes the observation: the randomness stays in a fixed standard-normal draw, while \(\mathbf{y}^{(m)}\) becomes a differentiable function of \(\mathbf{w}\). Why this matters for gradients is covered in Bayesian OED: Gradients and the Reparameterization Trick; for estimation alone, the key point is to scale outer noise by \(\mathbf{W}^{-1/2}\boldsymbol{\Gamma}^{1/2}\), not raw \(\boldsymbol{\Gamma}^{1/2}\).
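A minimal sketch of this sampling step, assuming the forward model has already been evaluated at the outer prior samples (names are illustrative):

```python
import numpy as np

def sample_outer_observations(f_vals, w, gamma, rng):
    """Draw Y = F + W^{-1/2} Gamma^{1/2} E with E ~ N(0, I).

    f_vals : (K, M) forward-model evaluations at the M outer prior samples.
    Returns the observations Y and the standard-normal draws E, so the
    same randomness can be reused if w changes."""
    eps = rng.standard_normal(f_vals.shape)   # fixed standard-normal draws
    scale = np.sqrt(gamma / w)[:, None]       # per-sensor std = sqrt(gamma_k / w_k)
    return f_vals + scale * eps, eps
```

Keeping `eps` separate from the scaling is exactly what makes the observation a deterministic, differentiable function of `w` for a fixed random draw.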

The Double-Loop Estimator

Why an Inner Loop?

The mutual information form of EIG is:

\[ \text{EIG}(\mathbf{w}) = \mathbb{E}_{p(\boldsymbol{\theta},\mathbf{y}\mid\mathbf{w})} \!\left[ \log p(\mathbf{y}\mid\boldsymbol{\theta},\mathbf{w}) - \log p(\mathbf{y}\mid\mathbf{w}) \right]. \]

The first term — the log-likelihood — is cheap to evaluate for any \((\boldsymbol{\theta}, \mathbf{y})\) pair. The second term — the log marginal evidence — requires integrating out all parameters:

\[ p(\mathbf{y}\mid\mathbf{w}) = \int p(\mathbf{y}\mid\boldsymbol{\theta},\mathbf{w})\,p(\boldsymbol{\theta})\,d\boldsymbol{\theta}. \]

This integral is analytically intractable for nonlinear forward models. The inner loop approximates it by averaging over \(N_\text{in}\) freshly drawn prior samples:

\[ p(\mathbf{y}\mid\mathbf{w}) \approx \frac{1}{N_\text{in}}\sum_{n=1}^{N_\text{in}} p(\mathbf{y}\mid\boldsymbol{\theta}^{(n)},\mathbf{w}). \]

The Estimator

Substituting the inner-loop approximation and averaging over \(M\) outer samples:

\[ \widehat{\text{EIG}}(\mathbf{w}) = \frac{1}{M} \sum_{m=1}^{M} \left\{ \log p\!\left(\mathbf{y}^{(m)} \mid \boldsymbol{\theta}^{(m)}, \mathbf{w}\right) - \log\!\left[ \frac{1}{N_\text{in}} \sum_{n=1}^{N_\text{in}} p\!\left(\mathbf{y}^{(m)} \mid \boldsymbol{\theta}^{(n)}, \mathbf{w}\right) \right] \right\}. \]

The outer samples \((\boldsymbol{\theta}^{(m)}, \boldsymbol{\varepsilon}^{(m)})\) are drawn jointly from the model. The inner samples \(\boldsymbol{\theta}^{(n)}\) are drawn independently from the same prior. The observation \(\mathbf{y}^{(m)}\) is reused across all \(N_\text{in}\) inner likelihood evaluations — this is what makes the estimator a nested loop rather than two independent sweeps.

Figure 1: Double-loop estimator flow. Outer loop (blue, left) draws (θ,ε) jointly and forms y(w) via the reparameterization. The observation y^(m) is passed to the inner loop (orange, right), which draws independent θ* samples and averages their likelihoods to approximate the marginal evidence. The outer log-likelihood ℓₘ and inner log-evidence feed the final EIG estimate (green, bottom).

Computational Steps

Outer loop (\(M\) samples):

  1. Draw \(\boldsymbol{\theta}^{(m)} \sim p(\boldsymbol{\theta})\) for \(m=1,\ldots,M\).

  2. Evaluate \(\mathbf{F} = [\mathbf{f}(\boldsymbol{\theta}^{(1)}), \ldots, \mathbf{f}(\boldsymbol{\theta}^{(M)})]\).

  3. Draw \(\boldsymbol{\mathcal{E}} \in \mathbb{R}^{K \times M}\) with columns \(\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_K)\) and form \(\mathbf{Y} = \mathbf{F} + \mathbf{W}^{-1/2}\boldsymbol{\Gamma}^{1/2}\boldsymbol{\mathcal{E}}\).

  4. Compute per-sample log-likelihood via Cholesky:

    1. \(\mathbf{R} = \mathbf{Y} - \mathbf{F}\)
    2. Factorize \(\boldsymbol{\Gamma} = \mathbf{L}\mathbf{L}^\top\)
    3. \(\mathbf{P} = \mathbf{L}^{-1} \mathbf{W}^{1/2} \mathbf{R}\)
    4. \(\ell_m = C_{\mathbf{w}} - \tfrac{1}{2}[\mathbf{P}^\top\mathbf{P}]_{mm}\)

Inner loop (\(N_\text{in}\) samples, reusing \(\mathbf{Y}\)):

  1. Draw \(\boldsymbol{\theta}^{(n)} \sim p(\boldsymbol{\theta})\) for \(n=1,\ldots,N_\text{in}\) (independent of the outer draws).
  2. Evaluate \(\mathbf{F}^\star = [\mathbf{f}(\boldsymbol{\theta}^{(1)}), \ldots, \mathbf{f}(\boldsymbol{\theta}^{(N_\text{in})})]\), the forward model at the inner samples.
  3. Compute residuals \(R^\star_{k,m,n} = Y_{k,m} - F^\star_{k,n}\) for all \((m,n)\).
  4. Compute log-likelihoods, then accumulate the inner evidence via stable log-sum-exp:

\[ \log\hat{p}(\mathbf{y}^{(m)}\mid\mathbf{w}) = \log\!\sum_{n=1}^{N_\text{in}}\exp(\ell_n^{(m)}) - \log N_\text{in}. \]

Final estimate: \(\widehat{\text{EIG}} = M^{-1}\sum_m [\ell_m - \log\hat{p}(\mathbf{y}^{(m)}\mid\mathbf{w})]\).
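The steps above can be sketched end-to-end as a vectorized NumPy/SciPy implementation. The `forward` and `sample_prior` interfaces are assumptions made for this sketch, not the PyApprox API:

```python
import numpy as np
from scipy.special import logsumexp

def double_loop_eig(forward, sample_prior, w, gamma, M, N_in, rng):
    """Double-loop Monte Carlo EIG estimate.

    forward      : maps a (nvars, n) sample array to (K, n) outputs
    sample_prior : sample_prior(n, rng) returns (nvars, n) prior draws
    """
    prec = w / gamma                                 # diagonal of W Gamma^{-1}
    c_w = 0.5 * np.sum(np.log(prec))                 # C_w

    # Outer loop: joint (theta, eps) draws, reparameterized observations
    F = forward(sample_prior(M, rng))                # (K, M)
    eps = rng.standard_normal(F.shape)
    Y = F + np.sqrt(1.0 / prec)[:, None] * eps       # W^{-1/2} Gamma^{1/2} eps
    ell_outer = c_w - 0.5 * np.sum(prec[:, None] * (F - Y)**2, axis=0)   # (M,)

    # Inner loop: independent prior draws, reusing every column of Y
    F_in = forward(sample_prior(N_in, rng))          # (K, N_in)
    R = Y[:, :, None] - F_in[:, None, :]             # residuals, (K, M, N_in)
    ell_inner = c_w - 0.5 * np.einsum("k,kmn->mn", prec, R**2)           # (M, N_in)
    log_ev = logsumexp(ell_inner, axis=1) - np.log(N_in)                 # (M,)

    return np.mean(ell_outer - log_ev)
```

Note how `Y` is broadcast against all `N_in` inner forward evaluations at once; the `(K, M, N_in)` residual array is the memory cost of vectorizing the nested loop.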

Note: Log-sum-exp is essential

Direct summation in probability space underflows to zero at realistic sample sizes. Always compute the inner average as \(\log(\text{sum of exp})\) then subtract \(\log N_\text{in}\), using the log-sum-exp trick for stability.

Key Takeaways

  • The inner loop exists solely to approximate the intractable marginal evidence \(p(\mathbf{y}\mid\mathbf{w})\); the outer loop controls the Monte Carlo variance of the full EIG estimate.
  • Outer observations must use the \(\mathbf{w}\)-dependent noise scaling: \(\mathbf{y}^{(m)} = \mathbf{f}(\boldsymbol{\theta}^{(m)}) + \mathbf{W}^{-1/2}\boldsymbol{\Gamma}^{1/2}\boldsymbol{\varepsilon}^{(m)}\).
  • The observation \(\mathbf{y}^{(m)}\) is reused across all \(N_\text{in}\) inner evaluations; inner samples are drawn independently.
  • Log-sum-exp is required when accumulating the inner average.
  • Convergence depends on both \(M\) (variance) and \(N_\text{in}\) (bias); see Bayesian OED: Verifying Convergence for empirical rates.

Exercises

  1. In step 4 of the inner loop, why must log-sum-exp be computed over axis \(n\) (inner samples) rather than axis \(m\) (outer samples)?

  2. Suppose the forward model is linear, \(\mathbf{f}(\boldsymbol{\theta}) = \mathbf{A}\boldsymbol{\theta}\), and the prior is Gaussian. Show that the inner-loop average converges to the exact marginal as \(N_\text{in}\to\infty\) and write down what \(p(\mathbf{y}\mid\mathbf{w})\) equals analytically.

  3. The inner average \(\hat{p}(\mathbf{y}^{(m)}\mid\mathbf{w})\) is an unbiased estimate of \(p(\mathbf{y}^{(m)}\mid\mathbf{w})\), but its logarithm is not. Use Jensen's inequality to determine whether \(\widehat{\text{EIG}}\) over- or underestimates the true EIG for finite \(N_\text{in}\), and explain why the bias vanishes as \(N_\text{in}\to\infty\).

  4. What goes wrong if inner samples \(\boldsymbol{\theta}^{(n)}\) are reused from the outer loop instead of drawn independently?

Next Steps