API Reference
pypolymix splits stochastic surrogate modeling into three composable layers:
- Surrogate models (pypolymix.surrogate_models): deterministic forward models that expect an input tensor and a tensor of parameters.
- Parameter groups (pypolymix.parameter_groups): variational families with associated priors defined over blocks of surrogate parameters.
- Stochastic model (pypolymix.StochasticModel): glues a surrogate to one or more parameter groups and exposes a familiar PyTorch nn.Module interface.
Stochastic Model
Wrap any surrogate in the StochasticModel framework and provide a list of
parameter groups whose samples are concatenated before being fed to the surrogate.
```python
import torch

from pypolymix import StochasticModel, parameter_groups, surrogate_models

surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
group = parameter_groups.IIDGaussianGroup("nn", surrogate.num_params())
model = StochasticModel(surrogate, [group])

x = torch.linspace(-1, 1, 32).unsqueeze(-1)
y = model(x, num_samples=8)  # (8, 32, 1)
loss = y.mean() + model.distribution_loss()
```
StochasticModel
Bases: Module
Wrap a deterministic surrogate model with sampled parameters.
Example
>>> import torch
>>> from pypolymix import StochasticModel, parameter_groups, surrogate_models
>>> surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
>>> groups = [parameter_groups.IIDGaussianGroup("nn", surrogate.num_params())]
>>> model = StochasticModel(surrogate, groups)
>>> x = torch.linspace(-1, 1, 32).unsqueeze(-1)
>>> y = model(x, num_samples=4) # (4, 32, 1)
distribution_loss
distribution_loss()
Return the sum of KL/cross-entropy terms provided by every parameter group.
forward
forward(x, num_samples=1)
Evaluate the surrogate under randomly drawn parameters.
Example
Evaluate a function at x using 10 random draws of the parameters:
>>> y = model(x, num_samples=10)
>>> y.shape
torch.Size([10, x.shape[0], surrogate_model.num_outputs])
num_params
num_params()
Return the total number of scalar parameters managed across groups.
sample_parameters
sample_parameters(num_samples=1)
Draw parameter samples from each group and concatenate them.
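Putting the pieces together, a minimal training sketch. The MSE data term, sample count, and optimiser settings are illustrative choices rather than a prescribed objective, and the loop assumes each group registers its variational parameters on the module so that model.parameters() collects them:

```python
import torch

from pypolymix import StochasticModel, parameter_groups, surrogate_models

# Hypothetical 1D regression data standing in for a real dataset.
x_train = torch.linspace(-1, 1, 64).unsqueeze(-1)
y_train = torch.sin(3 * x_train)

surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
group = parameter_groups.IIDGaussianGroup("nn", surrogate.num_params())
model = StochasticModel(surrogate, [group])

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    optimizer.zero_grad()
    y = model(x_train, num_samples=8)             # (8, 64, 1)
    data_loss = (y - y_train).square().mean()     # Monte Carlo data fit
    loss = data_loss + model.distribution_loss()  # add KL/cross-entropy terms
    loss.backward()
    optimizer.step()
```

Here distribution_loss() contributes the regularisation from the parameter groups, so the sum plays the role of a negative ELBO up to constants and weighting.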
Parameter Groups
Parameter groups describe how parameters are sampled and regularised. They can be mixed (e.g. deterministic biases and stochastic weights) by instantiating multiple groups and passing them to the same stochastic model.
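As a sketch of such a mix, the two-way split below is purely illustrative; the group sizes just need to sum to surrogate.num_params(), with samples concatenated in the order the groups are listed:

```python
from pypolymix import StochasticModel, parameter_groups, surrogate_models

surrogate = surrogate_models.NeuralNetwork(num_inputs=1, num_outputs=1)
n = surrogate.num_params()

# Illustrative split: a stochastic block for most parameters and a
# deterministic block for the last one. Sizes must sum to n.
groups = [
    parameter_groups.IIDGaussianGroup("weights", n - 1),
    parameter_groups.DeterministicGroup("bias", 1),
]
model = StochasticModel(surrogate, groups)
```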
DeterministicGroup
Use when you want point estimates for a parameter block while still leveraging the same interface as the stochastic groups.
Bases: ParameterGroup
Parameter group for deterministic inference.
Use this when optimisation should learn a single point estimate rather than sampling from a posterior approximation.
Example
>>> group = DeterministicGroup("weights", num_params=3)
>>> theta = group.sample_parameters(2)
>>> theta.shape
torch.Size([2, 3])
distribution_loss
distribution_loss()
Return the negative log prior density evaluated at the current point.
sample_parameters
sample_parameters(num_samples=1)
Return the same parameter vector repeated num_samples times.
variational_distribution
variational_distribution()
Raise an error: a deterministic group has no variational distribution to return.
IIDGaussianGroup
Independent Normal posterior with per-parameter mean and (log) std that supports reparameterised sampling for variational inference.
Bases: ParameterGroup
I.i.d. Gaussian variational family: q = Normal(mean, std).
Example
>>> group = IIDGaussianGroup("weights", num_params=4)
>>> samples = group.sample_parameters(16)
>>> samples.shape
torch.Size([16, 4])
std (property)
Positive standard deviation, obtained by exponentiating the learnable log-std parameter.
sample_parameters
sample_parameters(num_samples=1)
Draw num_samples parameter vectors via rsample so gradients flow through the reparameterization.
variational_distribution
variational_distribution()
Return an independent normal distribution over all parameters.
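In torch terms this is a factorised Normal wrapped in Independent so that the parameter axis becomes the event dimension; a standalone illustration (not the group's internals):

```python
import torch
import torch.distributions as td

# Factorised Normal over 4 parameters; the trailing axis is the event dim.
q = td.Independent(td.Normal(torch.zeros(4), torch.ones(4)), 1)
samples = q.rsample((16,))  # samples.shape == torch.Size([16, 4])
```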
GaussianGroup
Full-covariance Gaussian variational family parameterised by a Cholesky factor, useful when posterior correlations cannot be ignored.
Bases: ParameterGroup
Full-covariance Gaussian variational family with a learnable Cholesky factor.
This is useful when posterior correlations between parameters are important.
Example
>>> from pypolymix.parameter_groups import GaussianGroup
>>> group = GaussianGroup("weights", num_params=2)
>>> group.variational_distribution().rsample().shape
torch.Size([2])
sample_parameters
sample_parameters(num_samples=1)
Draw num_samples reparameterized samples from the full-covariance Gaussian.
variational_distribution
variational_distribution()
Return torch.distributions.MultivariateNormal with scale_tril.
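For reference, a standalone construction of such a distribution (illustrative values, not the group's internals):

```python
import torch
import torch.distributions as td

# Full-covariance Gaussian via its Cholesky factor: Cov = L @ L.T.
mean = torch.zeros(2)
L = torch.tensor([[1.0, 0.0], [0.5, 1.0]])  # lower-triangular, positive diagonal
q = td.MultivariateNormal(mean, scale_tril=L)
samples = q.rsample((16,))  # samples.shape == torch.Size([16, 2])
```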
LowRankGaussianGroup
Low-rank plus diagonal Gaussian family that captures the largest correlations with a configurable rank while keeping memory cost close to O(d).
Bases: ParameterGroup
Gaussian family with a low-rank plus diagonal covariance approximation.
The covariance matrix is parameterized as U U^T + diag(d) with rank(U)
controlled by rank. This captures the dominant correlations without the
O(d^2) parameters and compute cost of a full Cholesky factor.
Example
>>> group = LowRankGaussianGroup("weights", num_params=8, rank=3)
>>> samples = group.sample_parameters(4)
>>> samples.shape
torch.Size([4, 8])
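Independent of the library's internals, the reparameterised sampling such a family relies on can be written out directly: theta = mean + U eps_r + sqrt(d) * eps_d has covariance U U^T + diag(d):

```python
import torch

num_params, rank, num_samples = 8, 3, 4
mean = torch.zeros(num_params)
U = 0.1 * torch.randn(num_params, rank)  # low-rank factor
d = torch.full((num_params,), 1e-2)      # positive diagonal term

eps_r = torch.randn(num_samples, rank)
eps_d = torch.randn(num_samples, num_params)
theta = mean + eps_r @ U.T + d.sqrt() * eps_d  # (num_samples, num_params)
```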
LangevinGroup
Implicit posterior sampler driven by unadjusted Langevin dynamics:
theta <- theta + step_size * score(theta) + sqrt(2 * step_size) * noise.
LangevinGroup keeps the same ParameterGroup interface and can be mixed with
the other groups inside StochasticModel.
The score model is passed in as any SurrogateModel satisfying:

- score_model.num_inputs == num_params
- score_model.num_outputs == num_params
NeuralNetwork is the most common choice for this role.
Bases: ParameterGroup
Parameter group based on unadjusted Langevin dynamics.
The group learns a score model s(theta) and generates samples by iterating:
theta_{k+1} = theta_k + step_size * s(theta_k) + sqrt(2 * step_size) * xi_k
where xi_k ~ Normal(0, I).
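A single update written out (illustrative, not the group's implementation):

```python
import torch

def ula_step(theta, score_fn, step_size=1e-3):
    # theta_{k+1} = theta_k + step_size * s(theta_k) + sqrt(2 * step_size) * xi_k
    noise = torch.randn_like(theta)
    return theta + step_size * score_fn(theta) + (2 * step_size) ** 0.5 * noise
```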
Example
>>> from pypolymix.surrogate_models import NeuralNetwork
>>> score_model = NeuralNetwork(num_inputs=6, num_outputs=6, width=16, depth=2)
>>> group = LangevinGroup("coeffs", num_params=6, score_model=score_model)
>>> samples = group.sample_parameters(num_samples=8)
>>> samples.shape
torch.Size([8, 6])
distribution_loss
distribution_loss()
Monte Carlo estimate of -E_q[log p(theta)] under recent particles.
sample_parameters
sample_parameters(num_samples=None)
Draw parameter samples by running Langevin dynamics.
variational_distribution
variational_distribution()
Langevin sampling defines an implicit posterior, not an analytic distribution.
Priors
All parameter groups accept a Prior object that
creates a torch.distributions.Distribution on demand. Priors can therefore
share learnable buffers or be reused across groups.
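Wiring a prior into a group might look like the sketch below; note that both the import path pypolymix.priors and the prior keyword argument are assumptions, so check the actual signatures in your install:

```python
from pypolymix import parameter_groups
from pypolymix.priors import LaplacePrior  # hypothetical import path

prior = LaplacePrior(loc=0.0, scale=1e-1)
# Hypothetical `prior` keyword; the real argument name may differ.
group = parameter_groups.IIDGaussianGroup("weights", num_params=8, prior=prior)
```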
IIDGaussianPrior
Bases: Prior
Independent Gaussian prior with per-parameter mean and standard deviation.
Example
>>> prior = IIDGaussianPrior(mean=0.0, std=0.5)
>>> prior.distribution(torch.Size([4]), None, None).sample().shape
torch.Size([4])
distribution
distribution(event_shape, device, dtype)
Return Independent(N(mean, std), 1) with broadcasted parameters.
GaussianPrior
Bases: Prior
Full-covariance Gaussian prior N(mu, Sigma).
Users must provide either covariance_matrix or scale_tril when
instantiating the prior; the other argument should be None.
Example
>>> mean = torch.zeros(2)
>>> cov = torch.eye(2)
>>> prior = GaussianPrior(mean, covariance_matrix=cov)
>>> isinstance(prior.distribution(torch.Size([2]), None, None), td.MultivariateNormal)
True
distribution
distribution(event_shape, device, dtype)
Validate tensor shapes and build a multivariate normal distribution.
LaplacePrior
Bases: Prior
IID Laplace prior that encourages sparsity.
Example
>>> prior = LaplacePrior(loc=0.0, scale=1e-1)
>>> prior.distribution(torch.Size([3]), None, None).sample().shape
torch.Size([3])
distribution
distribution(event_shape, device, dtype)
Return Independent(Laplace(loc, scale), 1) with broadcasted params.
Surrogate Models
Surrogates implement the deterministic mapping from (x, params) to outputs.
They are ordinary PyTorch modules, but operate on batched parameter samples.
NeuralNetwork
Fully-connected MLP whose weights/biases are supplied dynamically via sampled parameters.
Bases: SurrogateModel
Neural network driven by sampled parameters.
Example
>>> surrogate = NeuralNetwork(num_inputs=2, num_outputs=1, width=8, depth=2)
>>> surrogate.num_params()
105
>>> params = torch.randn(3, surrogate.num_params())
>>> x = torch.randn(5, 2)
>>> surrogate(x, params).shape
torch.Size([3, 5, 1])
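The value 105 is consistent with two hidden layers of width 8: (2*8 + 8) + (8*8 + 8) + (8*1 + 1) = 24 + 72 + 9 weights and biases.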
forward
forward(x, params)
Evaluate the neural network for multiple parameter samples in parallel.
num_params
num_params()
Return the number of scalar parameters implied by the architecture.
PolynomialChaosExpansion
Legendre polynomial chaos expansion with configurable dimension, degree, and number of outputs.
Bases: SurrogateModel
Polynomial chaos expansion with Legendre basis.
Example
>>> surrogate = PolynomialChaosExpansion(num_inputs=1, degree=2)
>>> params = torch.randn(5, surrogate.num_params())
>>> x = torch.linspace(-1, 1, 20).unsqueeze(-1)
>>> surrogate(x, params).shape
torch.Size([5, 20, 1])
num_terms (property)
Calculate the number of terms in the total order polynomial expansion.
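For a total-order truncation this count is the binomial coefficient C(num_inputs + degree, degree); a quick check:

```python
from math import comb

# e.g. num_inputs=1, degree=2 -> 3 Legendre terms: 1, x, (3x^2 - 1)/2.
num_inputs, degree = 1, 2
num_terms = comb(num_inputs + degree, degree)  # 3
```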
forward
forward(x, params)
Evaluate the polynomial chaos expansion.
num_params
num_params()
Return num_terms * num_outputs.
Mixture Components
The mixture module contains both the gating network and the full Mixture-of-Experts surrogate, enabling scalable ensembles driven by sampled parameters.
GatingNetwork
Bases: SurrogateModel
Gating network that outputs mixture weights via softmax.
Example
>>> gating = GatingNetwork(num_inputs=1, num_experts=3, width=8)
>>> params = torch.randn(2, gating.num_params())
>>> x = torch.zeros(4, 1)
>>> gating(x, params).shape
torch.Size([2, 4, 3])
forward
forward(x, params)
Compute mixture weights.
MixtureOfExperts
Bases: SurrogateModel
Mixture of Experts surrogate model compatible with stochastic parameter sampling.
Example
>>> from pypolymix.surrogate_models import NeuralNetwork
>>> experts = [NeuralNetwork(num_inputs=1, num_outputs=1, width=4) for _ in range(2)]
>>> gating = GatingNetwork(num_inputs=1, num_experts=len(experts))
>>> moe = MixtureOfExperts(experts, gating)
>>> params = torch.randn(3, moe.num_params())
>>> x = torch.randn(5, 1)
>>> moe(x, params).shape
torch.Size([3, 5, 1])
forward
forward(x, params)
Vectorized forward pass through all experts and the gating network.
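Conceptually the combination is y(x) = sum_k w_k(x) f_k(x); a shape-level sketch with dummy tensors (not the library's internals):

```python
import torch

# Dummy tensors: 3 parameter samples, 5 points, 2 experts, 1 output.
expert_outputs = torch.randn(3, 5, 2, 1)                 # per-expert predictions
weights = torch.softmax(torch.randn(3, 5, 2), dim=-1)    # gating weights
y = (weights.unsqueeze(-1) * expert_outputs).sum(dim=2)  # (3, 5, 1)
```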
get_expert_outputs
get_expert_outputs(x, params)
Evaluate each expert with the parameter slice assigned to it.
get_gating_weights
get_gating_weights(x, params)
Compute the mixture weights produced by the gating network.
num_params
num_params()
Return the total number of scalar parameters across all experts and the gating network.