Sobol Sensitivity Analysis

PyApprox Tutorial Library

Variance-based sensitivity indices for identifying important parameters

Learning Objectives

After completing this tutorial, you will be able to:

  • Explain variance-based sensitivity analysis
  • Interpret first-order and total-order Sobol indices
  • Estimate sensitivity indices using Monte Carlo
  • Identify the most important parameters in a model

Prerequisites

Complete Sensitivity Analysis before this tutorial.

Why Sensitivity Analysis?

Sensitivity Analysis (SA) answers: Which parameters matter most?

This helps us:

  • Focus resources on reducing important uncertainties
  • Simplify models by fixing unimportant parameters
  • Understand model behavior and interactions

Variance-Based Sensitivity

The idea: decompose output variance into contributions from each parameter.

Total variance: \[ \Var_{\params}[q] = \Var_{\theta_1}[\E_{\theta_{\sim 1}}[q | \theta_1]] + \E_{\theta_1}[\Var_{\theta_{\sim 1}}[q | \theta_1]] \]

The first term measures how much knowing \(\theta_1\) reduces variance.

Sobol Indices

First-Order Index

The first-order Sobol index \(\Sobol{i}\) measures the fraction of variance explained by parameter \(\theta_i\) alone:

\[ \Sobol{i} = \frac{\Var_{\theta_i}[\E_{\theta_{\sim i}}[q | \theta_i]]}{\Var_{\params}[q]} \]

  • \(\Sobol{i} \approx 1\): Parameter \(i\) explains nearly all variance
  • \(\Sobol{i} \approx 0\): Parameter \(i\) has little direct effect

Total-Order Index

The total-order Sobol index \(\SobolT{i}\) includes interactions with other parameters:

\[ \SobolT{i} = 1 - \frac{\Var_{\theta_{\sim i}}[\E_{\theta_i}[q | \theta_{\sim i}]]}{\Var_{\params}[q]} \]

where \(\theta_{\sim i}\) means all parameters except \(\theta_i\).

  • \(\SobolT{i}\) includes direct effects AND interactions
  • Always \(\SobolT{i} \geq \Sobol{i}\)
  • If \(\SobolT{i} \approx \Sobol{i}\): few interactions involving \(\theta_i\)

Setup

import numpy as np
import matplotlib.pyplot as plt
from pyapprox.util.backends.numpy import NumpyBkd
from pyapprox.benchmarks import lotka_volterra_3species

bkd = NumpyBkd()
benchmark = lotka_volterra_3species(bkd)
gt = benchmark.ground_truth()

# QoI: Species 1 final population
qoi_func = benchmark.qoi_function(functional="endpoint_0")
print(f"Model has {qoi_func.nvars()} parameters")
Model has 12 parameters

Estimating Sobol Indices

We’ll use a simple Monte Carlo approach. For a more efficient implementation, see PyApprox’s built-in Sobol estimators.

Generate Sample Matrices

The Sobol method requires two independent sample matrices:

np.random.seed(42)

N = 1000  # Base sample size
d = 12    # Number of parameters
lower, upper = 0.3, 0.7

# Two independent sample matrices
A = lower + (upper - lower) * np.random.rand(d, N)
B = lower + (upper - lower) * np.random.rand(d, N)

print(f"Matrix A shape: {A.shape}")
print(f"Matrix B shape: {B.shape}")
Matrix A shape: (12, 1000)
Matrix B shape: (12, 1000)

Evaluate Base Cases

# Evaluate f(A) and f(B)
f_A = bkd.to_numpy(qoi_func(bkd.asarray(A))).flatten()
f_B = bkd.to_numpy(qoi_func(bkd.asarray(B))).flatten()

print(f"f(A) mean: {np.mean(f_A):.4f}")
print(f"f(B) mean: {np.mean(f_B):.4f}")
f(A) mean: 0.6353
f(B) mean: 0.6412

Create Mixed Matrices

For each parameter \(i\), create a matrix \(C_i\) that equals \(B\) except column \(i\) comes from \(A\):

def make_C_matrix(A, B, i):
    """Create C_i matrix: B with column i from A."""
    C = B.copy()
    C[i, :] = A[i, :]
    return C

# Example: C_0 has parameter 0 from A, all others from B
C_0 = make_C_matrix(A, B, 0)
print(f"C_0 differs from B only in row 0:")
print(f"  B[0, :5] = {B[0, :5]}")
print(f"  C_0[0, :5] = {C_0[0, :5]}")
print(f"  A[0, :5] = {A[0, :5]}")
C_0 differs from B only in row 0:
  B[0, :5] = [0.56172253 0.33201303 0.39693192 0.60947174 0.51147434]
  C_0[0, :5] = [0.44981605 0.68028572 0.59279758 0.53946339 0.36240746]
  A[0, :5] = [0.44981605 0.68028572 0.59279758 0.53946339 0.36240746]

Compute First-Order Indices

# Estimate variance
var_total = np.var(np.concatenate([f_A, f_B]))

first_order = np.zeros(d)
for i in range(d):
    C_i = make_C_matrix(A, B, i)
    f_C_i = bkd.to_numpy(qoi_func(bkd.asarray(C_i))).flatten()

    # Jansen estimator for first-order index
    first_order[i] = np.mean(f_B * (f_C_i - f_A)) / var_total

print("First-order Sobol indices (S_i):")
for i, s in enumerate(first_order):
    print(f"  θ_{i:2d}: {s:.4f}")
First-order Sobol indices (S_i):
  θ_ 0: 0.8804
  θ_ 1: 1.0620
  θ_ 2: 1.0815
  θ_ 3: 0.7431
  θ_ 4: 0.8226
  θ_ 5: 0.9421
  θ_ 6: 1.0186
  θ_ 7: 0.9968
  θ_ 8: 1.0563
  θ_ 9: 1.0462
  θ_10: 1.0244
  θ_11: 1.0356

Compute Total-Order Indices

total_order = np.zeros(d)
for i in range(d):
    C_i = make_C_matrix(A, B, i)
    f_C_i = bkd.to_numpy(qoi_func(bkd.asarray(C_i))).flatten()

    # Jansen estimator for total-order index
    total_order[i] = 0.5 * np.mean((f_A - f_C_i)**2) / var_total

print("Total-order Sobol indices (S^T_i):")
for i, st in enumerate(total_order):
    print(f"  θ_{i:2d}: {st:.4f}")
Total-order Sobol indices (S^T_i):
  θ_ 0: 0.9545
  θ_ 1: 1.0279
  θ_ 2: 1.0174
  θ_ 3: 0.8314
  θ_ 4: 0.7547
  θ_ 5: 0.8646
  θ_ 6: 1.0135
  θ_ 7: 0.9796
  θ_ 8: 1.0243
  θ_ 9: 1.0068
  θ_10: 0.9911
  θ_11: 1.0150

Visualizing Results

from pyapprox.tutorial_figures._sensitivity import plot_sobol_indices

# Parameter labels
labels = [f'r_{i}' for i in range(1, 4)] + \
         [f'a_{i//3+1}{i%3+1}' for i in range(9)]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_sobol_indices(first_order, total_order, labels, d, axes)
plt.tight_layout()
plt.show()
Figure 1: Sobol index estimates for the 3-species Lotka–Volterra model (QoI: species 1 final population). Left: first-order indices \(S_i\) showing direct parameter contributions. Right: comparison of first-order and total-order indices—large gaps (e.g. interaction coefficients) indicate parameter interactions.

Interpreting Results

# Sort parameters by total-order importance
importance_order = np.argsort(total_order)[::-1]

print("Parameters ranked by importance (total-order):")
print("-" * 50)
print(f"{'Rank':<6}{'Param':<8}{'S_i':<10}{'S^T_i':<10}{'Interaction':<12}")
print("-" * 50)

for rank, i in enumerate(importance_order[:6]):  # Top 6
    interaction = total_order[i] - first_order[i]
    print(f"{rank+1:<6}{labels[i]:<8}{first_order[i]:<10.4f}{total_order[i]:<10.4f}{interaction:<12.4f}")
Parameters ranked by importance (total-order):
--------------------------------------------------
Rank  Param   S_i       S^T_i     Interaction 
--------------------------------------------------
1     r_2     1.0620    1.0279    -0.0341     
2     a_23    1.0563    1.0243    -0.0319     
3     r_3     1.0815    1.0174    -0.0641     
4     a_33    1.0356    1.0150    -0.0206     
5     a_21    1.0186    1.0135    -0.0051     
6     a_31    1.0462    1.0068    -0.0394     

Key Observations

# Sum of first-order indices
sum_first = np.sum(first_order)
print(f"\nSum of first-order indices: {sum_first:.4f}")

# If sum ≈ 1, model is mostly additive (few interactions)
# If sum << 1, strong interactions exist

if sum_first > 0.9:
    print("  → Model is approximately additive (few interactions)")
elif sum_first > 0.7:
    print("  → Moderate interactions present")
else:
    print("  → Strong parameter interactions")

Sum of first-order indices: 11.7096
  → Model is approximately additive (few interactions)

Sensitivity by Parameter Group

Let’s aggregate sensitivity by parameter type:

# Group indices
growth_rates = [0, 1, 2]  # r_1, r_2, r_3
competition_diag = [3, 6, 9]  # a_11, a_22, a_33 (self-competition)
competition_off = [4, 5, 7, 8, 10, 11]  # off-diagonal (inter-species)

group_sensitivity = {
    'Growth rates': np.sum(total_order[growth_rates]),
    'Self-competition': np.sum(total_order[competition_diag]),
    'Inter-competition': np.sum(total_order[competition_off]),
}

print("Aggregated sensitivity by parameter group:")
for group, sens in group_sensitivity.items():
    print(f"  {group}: {sens:.4f}")
Aggregated sensitivity by parameter group:
  Growth rates: 2.9998
  Self-competition: 2.8517
  Inter-competition: 5.6294
from pyapprox.tutorial_figures._sensitivity import plot_group_pie

fig, ax = plt.subplots(figsize=(8, 6))
plot_group_pie(group_sensitivity, ax)
plt.show()
Figure 2: Total-order sensitivity aggregated by parameter group. Growth rates (\(r_i\)) and self-competition (\(a_{ii}\)) dominate; inter-species competition (\(a_{ij}, i \neq j\)) contributes the remainder.

Practical Implications

Based on the sensitivity analysis:

# Find unimportant parameters (S^T < 0.01)
unimportant = np.where(total_order < 0.01)[0]
print(f"\nParameters with negligible influence (S^T < 0.01):")
if len(unimportant) > 0:
    for i in unimportant:
        print(f"  {labels[i]}: S^T = {total_order[i]:.4f}")
else:
    print("  None (all parameters are somewhat important)")

# Find dominant parameters (S_i > 0.1)
dominant = np.where(first_order > 0.1)[0]
print(f"\nDominant parameters (S_i > 0.1):")
if len(dominant) > 0:
    for i in dominant:
        print(f"  {labels[i]}: S_i = {first_order[i]:.4f}")
else:
    print("  None (no single parameter dominates)")

Parameters with negligible influence (S^T < 0.01):
  None (all parameters are somewhat important)

Dominant parameters (S_i > 0.1):
  r_1: S_i = 0.8804
  r_2: S_i = 1.0620
  r_3: S_i = 1.0815
  a_11: S_i = 0.7431
  a_12: S_i = 0.8226
  a_13: S_i = 0.9421
  a_21: S_i = 1.0186
  a_22: S_i = 0.9968
  a_23: S_i = 1.0563
  a_31: S_i = 1.0462
  a_32: S_i = 1.0244
  a_33: S_i = 1.0356

Summary of Sobol Indices

Index Formula Interpretation
\(\Sobol{i}\) \(\frac{\Var_{\theta_i}[\E_{\theta_{\sim i}}[q|\theta_i]]}{\Var_{\params}[q]}\) Direct effect of \(\theta_i\) alone
\(\SobolT{i}\) \(1 - \frac{\Var_{\theta_{\sim i}}[\E_{\theta_i}[q|\theta_{\sim i}]]}{\Var_{\params}[q]}\) Total effect including interactions
\(\SobolT{i} - \Sobol{i}\) - Interaction effects involving \(\theta_i\)

Key Takeaways

  • Sobol indices decompose variance by parameter contribution
  • First-order \(\Sobol{i}\): direct effect of parameter \(i\)
  • Total-order \(\SobolT{i}\): total effect including interactions
  • Sum of \(\Sobol{i}\) indicates how additive the model is
  • Focus uncertainty reduction on high \(\SobolT{i}\) parameters

Exercises

  1. (Easy) Compute Sobol indices for Species 2 final population. How do the important parameters differ from Species 1?

  2. (Medium) Create a convergence plot showing how the Sobol index estimates for the top 3 parameters stabilize as N increases from 100 to 2000.

  3. (Challenge) Compute Sobol indices for the “max” QoI functional (maximum population over time). Compare to the endpoint indices. What changes in the importance ranking?

Next Steps

Continue with: