API Reference

pypolymix splits stochastic surrogate modeling into three composable layers:

  1. Surrogate models (pypolymix.surrogate_models): deterministic forward models that expect an input tensor and a tensor of parameters.
  2. Parameter groups (pypolymix.parameter_groups): variational families with associated priors defined over blocks of surrogate parameters.
  3. Stochastic model (pypolymix.StochasticModel): glues a surrogate to one or more parameter groups and exposes a familiar PyTorch nn.Module.

Stochastic Model

Wrap any surrogate in the StochasticModel framework and provide a list of parameter groups whose samples are concatenated before being fed to the surrogate.

import torch
from pypolymix import StochasticModel, parameter_groups, surrogate_models

surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
group = parameter_groups.IIDGaussianGroup("nn", surrogate.num_params())
model = StochasticModel(surrogate, [group])

x = torch.linspace(-1, 1, 32).unsqueeze(-1)
y = model(x, num_samples=8)  # (8, 32, 1)
loss = y.mean() + model.distribution_loss()
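
The distribution_loss term plugs directly into an ELBO-style objective. Below is a minimal training-loop sketch, assuming a squared-error data term with illustrative targets and hyperparameters (none of these are part of the API; StochasticModel exposes parameters() because it is an nn.Module):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
y_obs = torch.sin(3 * x)  # hypothetical targets for illustration

for step in range(1000):
    optimizer.zero_grad()
    y_pred = model(x, num_samples=8)             # (8, 32, 1)
    data_term = ((y_pred - y_obs) ** 2).mean()   # Monte Carlo average over draws
    loss = data_term + model.distribution_loss() # data term + KL-style penalty
    loss.backward()
    optimizer.step()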

Bases: Module

Wrap a deterministic surrogate model with sampled parameters.

Example
>>> import torch
>>> from pypolymix import StochasticModel, parameter_groups, surrogate_models
>>> surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
>>> groups = [parameter_groups.IIDGaussianGroup("nn", surrogate.num_params())]
>>> model = StochasticModel(surrogate, groups)
>>> x = torch.linspace(-1, 1, 32).unsqueeze(-1)
>>> y = model(x, num_samples=4)  # (4, 32, 1)
Parameters:
  • surrogate_model (SurrogateModel) –

    Base surrogate that consumes input x and sampled parameters.

  • parameter_groups (Union[ParameterGroup, Iterable[ParameterGroup]]) –

    Parameter-group module or iterable of modules whose draws are concatenated to produce the full parameter vector expected by surrogate_model.

distribution_loss

distribution_loss()

Return the sum of KL/cross-entropy terms provided by every parameter group.

forward

forward(x, num_samples=1)

Evaluate the surrogate under randomly drawn parameters.

Parameters:
  • x (Tensor) –

    Input tensor passed to the surrogate model.

  • num_samples (int, default: 1 ) –

Number of independent parameter draws; the surrogate is evaluated once per draw and the results are stacked along the first output dimension.

Example

Evaluate a function at x using 10 random draws of the parameters:

>>> y = model(x, num_samples=10)
>>> y.shape
torch.Size([10, x.shape[0], surrogate_model.num_outputs])

num_params

num_params()

Return the total number of scalar parameters managed across groups.

sample_parameters

sample_parameters(num_samples=1)

Draw parameter samples from each group and concatenate them.

Parameters:
  • num_samples (int, default: 1 ) –

    Number of Monte Carlo samples per parameter group.

Returns:
  • Tensor

    Tensor with shape (num_samples, surrogate_model.num_params()).
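
Continuing the StochasticModel example above, a quick shape check (output shown symbolically, as in the forward example):

>>> theta = model.sample_parameters(num_samples=5)
>>> theta.shape
torch.Size([5, surrogate.num_params()])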

Parameter Groups

Parameter groups describe how parameters are sampled and regularised. They can be mixed (e.g. deterministic biases and stochastic weights) by instantiating multiple groups and passing them to the same stochastic model.
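
Because draws from the groups are concatenated in list order to form the surrogate's full parameter vector, mixing amounts to splitting that vector into blocks. A minimal sketch, continuing the imports from the introduction (the 50/50 split and group names are illustrative; which surrogate weights land in which block follows the surrogate's own parameter ordering):

surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
n = surrogate.num_params()
groups = [
    parameter_groups.IIDGaussianGroup("stochastic_block", n // 2),    # sampled
    parameter_groups.DeterministicGroup("point_block", n - n // 2),   # point estimate
]
model = StochasticModel(surrogate, groups)
assert model.num_params() == n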

DeterministicGroup

Use when you want point estimates for a parameter block while still leveraging the same interface as the stochastic groups.

Bases: ParameterGroup

Parameter group for deterministic inference.

Use this when optimisation should learn a single point estimate rather than sampling from a posterior approximation.

Example
>>> from pypolymix.parameter_groups import DeterministicGroup
>>> group = DeterministicGroup("weights", num_params=3)
>>> theta = group.sample_parameters(2)
>>> theta.shape
torch.Size([2, 3])

distribution_loss

distribution_loss()

Return the negative log prior density evaluated at the current point.

sample_parameters

sample_parameters(num_samples=1)

Return the same parameter vector repeated num_samples times.

variational_distribution

variational_distribution()

Raise an error: a deterministic group has no variational distribution.

IIDGaussianGroup

Independent Normal posterior with per-parameter mean and (log) std that supports reparameterised sampling for variational inference.

Bases: ParameterGroup

I.i.d. Gaussian variational family: q = Normal(mean, std).

Example
>>> from pypolymix.parameter_groups import IIDGaussianGroup
>>> group = IIDGaussianGroup("weights", num_params=4)
>>> samples = group.sample_parameters(16)
>>> samples.shape
torch.Size([16, 4])

std property

std

Positive standard deviation obtained by exponentiating the learned log-std parameter.

sample_parameters

sample_parameters(num_samples=1)

Draw num_samples parameter vectors via rsample so that gradients flow through the reparameterization.

variational_distribution

variational_distribution()

Return an independent normal distribution over all parameters.

GaussianGroup

Full-covariance Gaussian variational family parameterised by a Cholesky factor, useful when posterior correlations cannot be ignored.

Bases: ParameterGroup

Full-covariance Gaussian variational family with a learnable Cholesky factor.

This is useful when posterior correlations between parameters are important.

Example
>>> from pypolymix.parameter_groups import GaussianGroup
>>> group = GaussianGroup("weights", num_params=2)
>>> group.variational_distribution().rsample().shape
torch.Size([2])

sample_parameters

sample_parameters(num_samples=1)

Draw num_samples reparameterized samples from the full-covariance Gaussian.

variational_distribution

variational_distribution()

Return torch.distributions.MultivariateNormal with scale_tril.

LowRankGaussianGroup

Low-rank plus diagonal Gaussian family that captures the dominant correlations with a configurable rank while keeping memory close to O(d).

Bases: ParameterGroup

Gaussian family with a low-rank plus diagonal covariance approximation.

The covariance matrix is parameterized as U U^T + diag(d) with rank(U) controlled by rank. This captures the dominant correlations without the O(d^2) parameters and compute cost of a full Cholesky factor.

Example
>>> from pypolymix.parameter_groups import LowRankGaussianGroup
>>> group = LowRankGaussianGroup("weights", num_params=8, rank=3)
>>> samples = group.sample_parameters(4)
>>> samples.shape
torch.Size([4, 8])
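
A draw from this family can be built from two independent standard-normal vectors, which is the identity the parameterization exploits. The sketch below illustrates the math, not the group's internal code:

import torch

mean = torch.zeros(8)
U = 0.1 * torch.randn(8, 3)   # low-rank factor, rank 3
d = 0.01 * torch.ones(8)      # positive diagonal term

# theta = mean + U @ eps1 + sqrt(d) * eps2 has covariance U U^T + diag(d)
eps1, eps2 = torch.randn(3), torch.randn(8)
theta = mean + U @ eps1 + d.sqrt() * eps2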

LangevinGroup

Implicit posterior sampler driven by unadjusted Langevin dynamics: theta <- theta + step_size * score(theta) + sqrt(2 * step_size) * noise.

LangevinGroup keeps the same ParameterGroup interface and can be mixed with the other groups inside StochasticModel.

The score model is passed in as any SurrogateModel satisfying:

  • score_model.num_inputs == num_params
  • score_model.num_outputs == num_params

NeuralNetwork is the most common choice for this role.

Bases: ParameterGroup

Parameter group based on unadjusted Langevin dynamics.

The group learns a score model s(theta) and generates samples by iterating:

theta_{k+1} = theta_k + step_size * s(theta_k) + sqrt(2 * step_size) * xi_k

where xi_k ~ Normal(0, I).
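
A plain-torch sketch of this iteration with a generic score function (the step size, step count, and initialisation here are illustrative, not the group's defaults):

import torch

def langevin_sample(score, theta0, step_size=1e-3, num_steps=100):
    # Unadjusted Langevin dynamics:
    # theta_{k+1} = theta_k + step_size * s(theta_k) + sqrt(2 * step_size) * xi_k
    theta = theta0.clone()
    for _ in range(num_steps):
        xi = torch.randn_like(theta)
        theta = theta + step_size * score(theta) + (2 * step_size) ** 0.5 * xi
    return theta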

Example
>>> from pypolymix.parameter_groups import LangevinGroup
>>> from pypolymix.surrogate_models import NeuralNetwork
>>> score_model = NeuralNetwork(num_inputs=6, num_outputs=6, width=16, depth=2)
>>> group = LangevinGroup("coeffs", num_params=6, score_model=score_model)
>>> samples = group.sample_parameters(num_samples=8)
>>> samples.shape
torch.Size([8, 6])

distribution_loss

distribution_loss()

Monte Carlo estimate of -E_q[log p(theta)] under recent particles.

sample_parameters

sample_parameters(num_samples=None)

Draw parameter samples by running Langevin dynamics.

variational_distribution

variational_distribution()

Langevin sampling defines an implicit posterior, not an analytic distribution.

Priors

All parameter groups accept a Prior object that creates a torch.distributions.Distribution on demand. Priors can therefore share learnable buffers or be reused across groups.
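
For example, a sparsity-inducing prior could be attached to a stochastic group along these lines (the prior keyword name is an assumption for illustration; check the group constructors for the exact argument):

from pypolymix import parameter_groups

prior = LaplacePrior(loc=0.0, scale=1e-1)
# prior= is a hypothetical keyword; consult the ParameterGroup signature
group = parameter_groups.IIDGaussianGroup("weights", num_params=16, prior=prior)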

IIDGaussianPrior

Bases: Prior

Independent Gaussian prior with per-parameter mean and standard deviation.

Example

>>> import torch
>>> prior = IIDGaussianPrior(mean=0.0, std=0.5)
>>> prior.distribution(torch.Size([4]), None, None).sample().shape
torch.Size([4])

distribution

distribution(event_shape, device, dtype)

Return Independent(N(mean, std), 1) with broadcasted parameters.

GaussianPrior

Bases: Prior

Full-covariance Gaussian prior N(mu, Sigma).

Users must provide either covariance_matrix or scale_tril when instantiating the prior; the other argument should be None.

Example
>>> import torch
>>> import torch.distributions as td
>>> mean = torch.zeros(2)
>>> cov = torch.eye(2)
>>> prior = GaussianPrior(mean, covariance_matrix=cov)
>>> isinstance(prior.distribution(torch.Size([2]), None, None), td.MultivariateNormal)
True

distribution

distribution(event_shape, device, dtype)

Validate tensor shapes and build a multivariate normal distribution.

LaplacePrior

Bases: Prior

IID Laplace prior that encourages sparsity.

Example
>>> import torch
>>> prior = LaplacePrior(loc=0.0, scale=1e-1)
>>> prior.distribution(torch.Size([3]), None, None).sample().shape
torch.Size([3])

distribution

distribution(event_shape, device, dtype)

Return Independent(Laplace(loc, scale), 1) with broadcasted params.

Surrogate Models

Surrogates implement the deterministic mapping from (x, params) to outputs. They are ordinary PyTorch modules, but operate on batched parameter samples.
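
Because the contract is just num_params() plus forward(x, params) returning (num_samples, batch_size, num_outputs), custom surrogates are straightforward. A hedged sketch of a linear surrogate, assuming SurrogateModel can be subclassed from pypolymix.surrogate_models with this interface:

import torch
from pypolymix.surrogate_models import SurrogateModel  # assumed location

class LinearSurrogate(SurrogateModel):
    def __init__(self, num_inputs, num_outputs=1):
        super().__init__()
        self.num_inputs, self.num_outputs = num_inputs, num_outputs

    def num_params(self):
        return (self.num_inputs + 1) * self.num_outputs  # weights + biases

    def forward(self, x, params):
        # x: (batch_size, num_inputs); params: (num_samples, num_params)
        n_w = self.num_inputs * self.num_outputs
        W = params[:, :n_w].reshape(-1, self.num_inputs, self.num_outputs)
        b = params[:, n_w:]
        return x @ W + b.unsqueeze(1)  # (num_samples, batch_size, num_outputs)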

NeuralNetwork

Fully-connected MLP whose weights/biases are supplied dynamically via sampled parameters.

Bases: SurrogateModel

Neural network driven by sampled parameters.

Example
>>> import torch
>>> from pypolymix.surrogate_models import NeuralNetwork
>>> surrogate = NeuralNetwork(num_inputs=2, num_outputs=1, width=8, depth=2)
>>> surrogate.num_params()
105
>>> params = torch.randn(3, surrogate.num_params())
>>> x = torch.randn(5, 2)
>>> surrogate(x, params).shape
torch.Size([3, 5, 1])
Parameters:
  • num_inputs (int) –

    Dimensionality of x.

  • num_outputs (int, default: 1 ) –

    Number of response dimensions.

  • width (int, default: 16 ) –

    Hidden-layer width.

  • depth (int, default: 1 ) –

    Number of hidden layers.

  • activation (Callable, default: relu ) –

    Callable applied after each hidden linear block.
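
For the example above (num_inputs=2, width=8, depth=2, num_outputs=1), and assuming depth counts hidden layers with a bias in every linear block, the count works out as (2·8 + 8) + (8·8 + 8) + (8·1 + 1) = 24 + 72 + 9 = 105, matching surrogate.num_params().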

forward

forward(x, params)

Evaluate the neural network for multiple parameter samples in parallel.

Parameters:
  • x (Tensor) –

    Tensor of shape (batch_size, num_inputs)

  • params (Tensor) –

    Tensor of shape (num_samples, num_params)

Returns:
  • y (Tensor) –

    Tensor of shape (num_samples, batch_size, num_outputs)

num_params

num_params()

Return the number of scalar parameters implied by the architecture.

PolynomialChaosExpansion

Legendre polynomial chaos expansion with configurable dimension, degree, and number of outputs.

Bases: SurrogateModel

Polynomial chaos expansion with Legendre basis.

Example
>>> import torch
>>> from pypolymix.surrogate_models import PolynomialChaosExpansion
>>> surrogate = PolynomialChaosExpansion(num_inputs=1, degree=2)
>>> params = torch.randn(5, surrogate.num_params())
>>> x = torch.linspace(-1, 1, 20).unsqueeze(-1)
>>> surrogate(x, params).shape
torch.Size([5, 20, 1])

num_terms property

num_terms

Return the number of terms in the total-order polynomial expansion.
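
For a total-order basis this is the standard count binomial(num_inputs + degree, degree), assuming all multi-indices with total degree at most degree are included. A quick check against the example above:

>>> from math import comb
>>> comb(1 + 2, 2)  # num_inputs=1, degree=2
3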

forward

forward(x, params)

Evaluate the polynomial chaos expansion.

Parameters:
  • x (Tensor) –

    Sample points of shape (batch_size, num_inputs)

  • params (Tensor) –

    Coefficient tensor of shape (num_samples, num_params)

Returns:
  • y (Tensor) –

    PCE evaluations (num_samples, batch_size, num_outputs)

num_params

num_params()

Return num_terms * num_outputs.

Mixture Components

The mixture module contains both the gating network and the full Mixture-of-Experts surrogate, enabling scalable ensembles driven by sampled parameters.
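
Since MixtureOfExperts is itself a SurrogateModel, it can be wrapped in StochasticModel like any other surrogate. A sketch combining the pieces (the single IID Gaussian group covering all parameters is an illustrative choice; GatingNetwork and MixtureOfExperts are referenced bare, as in the examples below, since their import path is the mixture module):

import torch
from pypolymix import StochasticModel, parameter_groups, surrogate_models

experts = [surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1, width=4) for _ in range(2)]
gating = GatingNetwork(num_inputs=1, num_experts=len(experts))
moe = MixtureOfExperts(experts, gating)
group = parameter_groups.IIDGaussianGroup("moe", moe.num_params())
model = StochasticModel(moe, [group])
y = model(torch.randn(5, 1), num_samples=3)  # (3, 5, 1)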

GatingNetwork

Bases: SurrogateModel

Gating network that outputs mixture weights via softmax.

Example
>>> import torch
>>> gating = GatingNetwork(num_inputs=1, num_experts=3, width=8)
>>> params = torch.randn(2, gating.num_params())
>>> x = torch.zeros(4, 1)
>>> gating(x, params).shape
torch.Size([2, 4, 3])

forward

forward(x, params)

Compute mixture weights.

Parameters:
  • x (Tensor) –

    (batch_size, num_inputs)

  • params (Tensor) –

    (num_samples, num_params)

Returns:
  • Tensor

    Gating weights: (num_samples, batch_size, num_experts)

MixtureOfExperts

Bases: SurrogateModel

Mixture of Experts surrogate model compatible with stochastic parameter sampling.

Example
>>> import torch
>>> from pypolymix.surrogate_models import NeuralNetwork
>>> experts = [NeuralNetwork(num_inputs=1, num_outputs=1, width=4) for _ in range(2)]
>>> gating = GatingNetwork(num_inputs=1, num_experts=len(experts))
>>> moe = MixtureOfExperts(experts, gating)
>>> params = torch.randn(3, moe.num_params())
>>> x = torch.randn(5, 1)
>>> moe(x, params).shape
torch.Size([3, 5, 1])
Parameters:
  • experts (List[SurrogateModel]) –

    List of expert surrogate models.

  • gating_network (SurrogateModel) –

    A surrogate model returning mixture weights.

forward

forward(x, params)

Vectorized forward pass through all experts and the gating network.

Parameters:
  • x (Tensor) –

    (batch_size, num_inputs)

  • params (Tensor) –

    (num_samples, total_params)

Returns:
  • Tensor

    Tensor of shape (num_samples, batch_size, num_outputs) containing the expert ensemble prediction for each draw of params.

get_expert_outputs

get_expert_outputs(x, params)

Evaluate each expert with the parameter slice assigned to it.

Returns:
  • Tensor

    Tensor with shape (num_samples, num_experts, batch_size, num_outputs).

get_gating_weights

get_gating_weights(x, params)

Compute the mixture weights produced by the gating network.
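
A hedged sketch of how these helpers recombine into the forward prediction, reusing moe, x, and params from the example above and mirroring the shapes documented here (the actual forward implementation may differ):

weights = moe.get_gating_weights(x, params)  # (num_samples, batch_size, num_experts)
outputs = moe.get_expert_outputs(x, params)  # (num_samples, num_experts, batch_size, num_outputs)
y = torch.einsum("sbe,sebo->sbo", weights, outputs)  # weighted sum over experts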

num_params

num_params()

Total number of scalar parameters across all experts and gating network.