Developer Reference Guide

Drivers

src.prime_run.main(setupfile)[source]

Driver script to run MCMC parameter inference for a multi-wave epidemic model. Currently supports up to three infection curves.

To run this script:

python <path-to-this-directory>/prime_run.py <name-of-json-input-file>

Parameters
setupfile: string

JSON input file describing the observation data, filtering options, MCMC options, and postprocessing options. See “setup_template.json” for a detailed example.

src.prime_plot_data.main(setupfile)[source]

Plot raw and filtered data for the region specified in the setupfile.

Parameters
setupfile: string

json file (.json) that includes the region name. The file “regionname.dat” must exist in a path accessible to this script.

src.prime_plotKDE.main(filename)[source]

Plots 1D and 2D marginal kernel density estimates based on MCMC samples

Parameters
filename: string

json file (.json) including run setup information and postprocessing information for an MCMC run. It should specify the name of the file containing the MCMC chain

or

pickle file (.pkl) with a dictionary containing the KDE distributions. This file is generated by running this script with a json file (see above).

src.prime_compute_info_criteria.main(setupfile)[source]

This script postprocesses data from PRIME to compute statistical information, including:
  • AIC: Akaike Information Criterion

  • BIC: Bayesian Information Criterion

  • CRPS: Continuous Ranked Probability Score

Results are saved in “info_criteria.txt”.

Parameters
setupfile: string

json file (.json) including run setup information and postprocessing information for an MCMC run. It should specify the name of the file containing the MCMC chain

src.prime_compute_distance_correlation.main(setupfile)[source]

Computes and saves distance correlations based on samples. The distance correlation matrix is saved in “distanceCorr.txt”

Parameters
setupfile: string

json file (.json) including run setup information and postprocessing information for an MCMC run. It should specify the name of the file containing the MCMC chain

Epidemiological Model

src.prime_model.modelPred(state, params, is_cdf=False)[source]

Evaluates the PRIME model for a set of model parameters; specific model settings (e.g. date range, other control knobs, etc) are specified via the "params" dictionary

Parameters
state: python list or numpy array

model parameters

params: dictionary

detailed settings for the epidemiological model

is_cdf: boolean (optional, default False)

estimate the epidemiological curve based on the CDF of the incubation model (True) or via the formulation that employs the PDF of the incubation model (False)

Returns
Ncases: numpy array

daily counts for people turning symptomatic

src.prime_infection.infection(state, params)[source]
Compute infection curve for multi-wave epidemics
  • this function is currently used by the post-processing script to push-forward the posterior into a set of infection curves that are consistent with the observed cases

Parameters
state: python list or numpy array

model parameters

params: dictionary

detailed settings for the epidemiological model

Returns
dates: numpy array

list of dates for which the infection rates were computed

infections: numpy array

infection rate values corresponding to the list of dates

src.prime_infection.infection_rate(time, qshape, qscale, inftype)[source]

Infection rate (gamma or log-normal distribution)

Parameters
time: float, list, or numpy array

instances in time for the evaluation of the infection_rate model

qshape: float

shape parameter

qscale: float

scale parameter

inftype: string

infection rate type (“gamma” for Gamma distribution, otherwise the Log-normal distribution)

Returns
vals: numpy array

infection rates corresponding to the time values provided as input parameters
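
The two density shapes named above can be illustrated with a standard-library sketch. This is not the PRIME source: the name `infection_rate_sketch` is hypothetical, and the log-normal parameterization (qshape as the log-space standard deviation, qscale as the median) is an assumption that should be checked against the actual implementation.

```python
import math

def infection_rate_sketch(time, qshape, qscale, inftype="gamma"):
    """Evaluate a Gamma or log-normal PDF at one or more time values."""
    times = [time] if isinstance(time, (int, float)) else list(time)
    vals = []
    for t in times:
        if t <= 0.0:
            # Sketch convention: return zero density for non-positive times
            vals.append(0.0)
        elif inftype == "gamma":
            # Gamma PDF with shape qshape and scale qscale, built in log space
            logp = ((qshape - 1.0) * math.log(t) - t / qscale
                    - math.lgamma(qshape) - qshape * math.log(qscale))
            vals.append(math.exp(logp))
        else:
            # ASSUMPTION: log-normal with sigma = qshape and median = qscale
            z = (math.log(t) - math.log(qscale)) / qshape
            vals.append(math.exp(-0.5 * z * z)
                        / (t * qshape * math.sqrt(2.0 * math.pi)))
    return vals

print(infection_rate_sketch([1.0, 5.0, 10.0], 4.0, 2.0, "gamma"))
```

Working in log space for the Gamma branch avoids overflow in the Gamma function for large shape values.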

src.prime_incubation.incubation_fcn(time, incubation_median, incubation_sigma, is_cdf=False)[source]

Computes the incubation rate

Parameters
time: float, list, or numpy array

instances in time for the evaluation of the incubation rate model

incubation_median: float

median of the incubation rate model

incubation_sigma: float

standard deviation of the incubation rate model

is_cdf: boolean (optional, default False)

select either the CDF of the incubation rate model (True) or its PDF (False)

Returns
vals: numpy array

incubation rates corresponding to the time values provided as input parameters
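
A minimal sketch of the PDF/CDF switch described above, assuming the incubation model is a log-normal distribution whose sigma is the standard deviation in log space. Both the distribution choice and the name `incubation_sketch` are assumptions made for illustration, not details confirmed by this reference.

```python
import math

def incubation_sketch(time, incubation_median, incubation_sigma, is_cdf=False):
    """Log-normal incubation model: CDF if is_cdf is True, otherwise PDF."""
    times = [time] if isinstance(time, (int, float)) else list(time)
    mu = math.log(incubation_median)  # log-space mean of a log-normal is log(median)
    vals = []
    for t in times:
        if t <= 0.0:
            vals.append(0.0)  # no probability mass at non-positive times
            continue
        z = (math.log(t) - mu) / incubation_sigma
        if is_cdf:
            vals.append(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
        else:
            vals.append(math.exp(-0.5 * z * z)
                        / (t * incubation_sigma * math.sqrt(2.0 * math.pi)))
    return vals

print(incubation_sketch([2.0, 5.0, 10.0], 5.0, 0.4, is_cdf=True))
```

Under this assumption the CDF evaluated at the median is exactly one half, which gives a quick sanity check on the parameterization.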

Bayesian Inference

src.prime_posterior.logpost(state, params)[source]

Compute log-posterior density values; this function assumes the likelihood is a product of independent Gaussian distributions

Parameters
state: python list or numpy array

model parameters

params: dictionary

detailed settings for the epidemiological model

Returns
llik: float

natural logarithm of the likelihood density

lpri: float

natural logarithm of the prior density
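
For a product of independent Gaussians, the log-likelihood reduces to a sum of squared residuals plus a normalization term. The sketch below shows that sum under the assumption of a single, known standard deviation `sigma`; the function name and the constant-sigma error model are illustrative and may differ from what logpost uses internally.

```python
import math

def gaussian_loglik(observed, predicted, sigma):
    """Sum of independent Gaussian log-densities with a common std. deviation."""
    n = len(observed)
    sq_resid = sum((y - m) ** 2 for y, m in zip(observed, predicted))
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - 0.5 * sq_resid / sigma ** 2)

# Hypothetical daily counts and model predictions, for illustration only
llik = gaussian_loglik([10.0, 12.0, 15.0], [9.5, 12.5, 14.0], sigma=2.0)
lpri = 0.0  # flat prior, an assumption of this sketch
print(llik, lpri)
```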

src.prime_posterior.logpost_negb(state, params)[source]

Compute log-posterior density values; this function assumes the likelihood is a product of negative-binomial distributions

Parameters
state: python list or numpy array

model parameters

params: dictionary

detailed settings for the epidemiological model

Returns
llik: float

natural logarithm of the likelihood density

lpri: float

natural logarithm of the prior density
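
A negative-binomial likelihood is commonly written in terms of a mean and a dispersion parameter. The sketch below uses that mean/dispersion form; the parameter name `r` and this particular parameterization are assumptions, since the reference does not state which form logpost_negb uses.

```python
import math

def negbin_loglik(observed, predicted, r):
    """Sum of negative-binomial log-probabilities with mean mu and dispersion r."""
    total = 0.0
    for y, mu in zip(observed, predicted):
        total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1.0)
                  + r * math.log(r / (r + mu))
                  + y * math.log(mu / (r + mu)))
    return total

# Hypothetical counts and strictly positive model means, for illustration only
print(negbin_loglik([10, 12, 15], [9.5, 12.5, 14.0], r=5.0))
```

In this form the variance is mu + mu**2 / r, so smaller `r` allows more overdispersion relative to a Poisson model.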

src.prime_posterior.logpost_poisson(state, params)[source]

Compute log-posterior density values; this function assumes the likelihood is a product of Poisson distributions

Parameters
state: python list or numpy array

model parameters

params: dictionary

detailed settings for the epidemiological model

Returns
llik: float

natural logarithm of the likelihood density

lpri: float

natural logarithm of the prior density
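
For independent Poisson observations, the log-likelihood is the sum of y*log(lambda) - lambda - log(y!). The sketch below evaluates that sum with the standard library; the function name is hypothetical and the model predictions are assumed to be strictly positive.

```python
import math

def poisson_loglik(observed, predicted):
    """Sum of independent Poisson log-probabilities with means `predicted`."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1.0)
               for y, lam in zip(observed, predicted))

# Hypothetical counts and model means, for illustration only
print(poisson_loglik([10, 12, 15], [9.5, 12.5, 14.0]))
```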

src.prime_mcmc.ammcmc(opts, cini, likTpr, lpinfo)[source]

Adaptive Metropolis Markov Chain Monte Carlo

Parameters
opts: dictionary of parameters
  • nsteps : no. of mcmc steps

  • nburn : no. of mcmc steps for burn-in (proposal fixed to initial covariance)

  • nadapt : adapt every nadapt steps after nburn

  • nfinal : stop adapting after nfinal steps

  • inicov : initial covariance

  • coveps : small additive factor to ensure covariance matrix is positive definite (only added to diagonal if covariance matrix is singular without it)

  • burnsc : factor to scale up/down proposal if acceptance rate is too high/low

  • gamma : factor to multiply proposed jump size with in the chain past the burn-in phase (Reduce this factor to get a higher acceptance rate. Defaults to 1.0)

  • spllo : lower bounds for chain samples

  • splhi : upper bounds for chain samples

  • rnseed : Optional seed for the random number generator (must be an integer >= 0). If not specified, the random number seed is not fixed and every chain will be different.

  • tmpchn : Optional; if present, the chain state is saved every ‘ofreq’ steps to an ascii file. The filename is randomly generated if tmpchn is set to ‘tmpchn’; otherwise it is set to the string passed through this option. If not present, chain states are not saved while the MCMC run is in progress.

cini: starting mcmc state
likTpr: log-posterior function; it takes two input parameters as follows
  • the first parameter is a 1D array containing the chain state at which the posterior will be evaluated

  • the second parameter contains settings the user can pass to this function; see below info for ‘lpinfo’

  • this function is expected to return log-Likelihood and log-Prior values (in this order)

lpinfo: info to be passed to the log-posterior function

this object can be of any type (e.g. None, scalar, list, array, dictionary, etc) as long as it is consistent with settings expected inside the ‘likTpr’ function

Returns
mcmcRes: results dictionary
  • ‘chain’ : chain samples (nsteps x chain dimension)

  • ‘cmap’ : MAP estimate

  • ‘pmap’ : MAP log posterior

  • ‘accr’ : overall acceptance rate

  • ‘accb’ : fraction of samples inside bounds

  • ‘rejAll’ : overall no. of samples rejected

  • ‘rejOut’ : no. of samples rejected due to being outside bounds

  • ‘minfo’ : meta_info, acceptance probability, log likelihood, log prior

  • ‘final_cov’ : the covariance matrix at the end of the run
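
The calling convention described above can be sketched as follows. The toy target, the option values, and the name `log_posterior` are all illustrative assumptions; only the option keys and the argument order come from this reference. The real sampler may expect NumPy arrays rather than plain lists for ‘inicov’, ‘spllo’, ‘splhi’, and the starting state.

```python
def log_posterior(state, info):
    """Toy log-posterior: independent Gaussians with a flat prior.

    Returns (log-likelihood, log-prior), in that order, matching the
    contract documented for the 'likTpr' argument.
    """
    llik = -0.5 * sum(((x - m) / s) ** 2
                      for x, m, s in zip(state, info["mean"], info["std"]))
    lpri = 0.0  # flat prior, an assumption of this sketch
    return llik, lpri

# Settings forwarded unchanged to log_posterior as its second argument
lpinfo = {"mean": [1.0, -2.0], "std": [0.5, 2.0]}
cini = [0.0, 0.0]  # starting chain state

# Option keys follow the documented list; every value is an illustrative guess
opts = {
    "nsteps": 10000, "nburn": 1000, "nadapt": 100, "nfinal": 8000,
    "inicov": [[0.1, 0.0], [0.0, 0.1]],
    "coveps": 1e-10, "burnsc": 5, "gamma": 1.0,
    "spllo": [-10.0, -10.0], "splhi": [10.0, 10.0],
    "rnseed": 2020,
}

# Inside a PRIME checkout the sampler would then be called as:
#   from src.prime_mcmc import ammcmc
#   res = ammcmc(opts, cini, log_posterior, lpinfo)
#   print(res["cmap"], res["accr"])
print(log_posterior(cini, lpinfo))
```

Evaluating the callback once at the starting state, as in the last line, is a cheap way to confirm that it returns a two-element result before launching a long chain.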

Statistical Utilities

src.prime_stats.computeAICandBIC(run_setup, verbose=0)[source]

Compute Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)

Parameters
run_setup: dictionary

run settings; see the Examples section in the manual

Returns
AIC: float

Akaike Information Criterion

BIC: float

Bayesian Information Criterion
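
Both criteria follow from the maximized log-likelihood via the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The sketch below applies those formulas directly; the function name and its arguments are hypothetical and do not mirror how computeAICandBIC extracts these quantities from run_setup.

```python
import math

def aic_bic(max_loglik, n_params, n_obs):
    """Return (AIC, BIC) from a maximized log-likelihood."""
    aic = 2.0 * n_params - 2.0 * max_loglik
    bic = n_params * math.log(n_obs) - 2.0 * max_loglik
    return aic, bic

# Hypothetical values: 3 parameters fitted to 100 daily observations
print(aic_bic(-120.0, 3, 100))
```

Lower values indicate a better trade-off between fit and model complexity; BIC penalizes extra parameters more strongly once n exceeds about 7 observations.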

src.prime_stats.computeCRPS(run_setup)[source]

Compute Continuous Ranked Probability Score (CRPS)

Parameters
run_setup: dictionary

run settings; see the Examples section in the manual

Returns
CRPS: float
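
For a finite ensemble of predictions, the CRPS of a single observation is often estimated as E|X - y| - 0.5 E|X - X'|. The sketch below implements that ensemble estimator; whether computeCRPS uses this estimator, and how it aggregates over days, is not stated here, so treat it as an illustration of the score only.

```python
def crps_ensemble(ensemble, observation):
    """Ensemble estimate of the CRPS for a single scalar observation."""
    n = len(ensemble)
    # Mean absolute error between ensemble members and the observation
    term1 = sum(abs(x - observation) for x in ensemble) / n
    # Mean absolute difference between all pairs of ensemble members
    term2 = sum(abs(a - b) for a in ensemble for b in ensemble) / (n * n)
    return term1 - 0.5 * term2

# Hypothetical posterior-predictive samples for one day
print(crps_ensemble([1.0, 2.0, 3.0], 2.0))
```

With a single ensemble member the score reduces to the absolute error, which is why CRPS is often read as a probabilistic generalization of it.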

src.prime_stats.distcorr(spl)[source]

Compute distance correlation between random vectors

Parameters
spl: numpy array [number of samples x number of variables]

first dimension is the number of samples, second dimension is the number of random vectors

Returns
2D array of distance correlations between pairs of random vectors; only entries with 0 <= j < i < number of random vectors are populated

References:

http://en.wikipedia.org/wiki/Distance_correlation
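
The sample distance correlation from the reference above can be sketched with double-centered pairwise distance matrices. This version handles scalar variables only, takes a list of columns rather than a samples-by-variables array, and uses hypothetical function names; it illustrates the statistic and the lower-triangular output layout, not the PRIME implementation.

```python
import math

def _centered_distances(x):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]  # equals the column means, d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation between two equally long 1D samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0.0 else 0.0

def distcorr_matrix(columns):
    """Fill only the entries with j < i, as in the return value described above."""
    nvar = len(columns)
    out = [[0.0] * nvar for _ in range(nvar)]
    for i in range(nvar):
        for j in range(i):
            out[i][j] = distance_correlation(columns[i], columns[j])
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.5]
y = [2.0 * v + 1.0 for v in x]
print(distcorr_matrix([x, y]))
```

The cost is quadratic in the number of samples, so long chains are usually thinned before computing this statistic.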

src.prime_stats.getKDE(spl, nskip=0, nthin=1, npts=100, bwfac=1.0)[source]

Compute 1D and 2D marginal PDFs via Kernel Density Estimate

Parameters
spl: numpy array

MCMC chain [number of samples x number of parameters]

nskip: int

number of initial samples to skip when sampling the MCMC chain

nthin: int

use every ‘nthin’ samples

npts: int

number of grid points

bwfac: double

bandwidth factor

Returns
dict: dictionary with results

  • ‘x1D’ : list of numpy arrays with grids for the 1D PDFs

  • ‘p1D’ : list of numpy arrays with 1D PDFs

  • ‘x2D’ : list of numpy arrays of x-axis grids for the 2D PDFs

  • ‘y2D’ : list of numpy arrays of y-axis grids for the 2D PDFs

  • ‘p2D’ : list of numpy arrays containing 2D PDFs
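
To show how nskip, nthin, npts, and bwfac interact, here is a one-dimensional Gaussian KDE sketch for a single chain column. The bandwidth rule (Scott's rule scaled by bwfac), the grid padding of three bandwidths, and the name `kde_1d` are assumptions of this sketch and are not taken from getKDE.

```python
import math

def kde_1d(samples, nskip=0, nthin=1, npts=100, bwfac=1.0):
    """Gaussian KDE of one chain column; returns (grid, pdf) as lists."""
    data = list(samples)[nskip::nthin]  # drop burn-in, then thin
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    h = bwfac * std * n ** (-0.2)  # ASSUMPTION: Scott's rule scaled by bwfac
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    grid = [lo + (hi - lo) * i / (npts - 1) for i in range(npts)]
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    pdf = [norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in data)
           for x in grid]
    return grid, pdf

# Hypothetical chain column, for illustration only
samples = [0.1 * i for i in range(50)]
grid, pdf = kde_1d(samples, nskip=10, nthin=2, npts=100)
print(grid[0], grid[-1], max(pdf))
```

Values of bwfac above 1 smooth the estimate, while values below 1 follow the samples more closely.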

General Utilities

src.prime_utils.compute_error_weight(error_info, days)[source]

Compute an array with the specified weighting for the daily cases data. The weights follow either linear or Gaussian expressions, with higher weights for recent data and lower weights for older data.

Parameters
error_info: list

(error_type, min_wgt, [tau]): error_type is either ‘linear’ or ‘gaussian’, min_wgt is the minimum weight, and tau is the standard deviation of the exponential term if a Gaussian formulation is chosen.

days: int

length of the weights array

Returns
error_weight: numpy array

array of weights
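
One way to realize the weighting described above is sketched below. The exact expressions are assumptions: the linear ramp from min_wgt to 1 and the Gaussian decay in the age of each data point are plausible readings of the description, not formulas confirmed by this reference, and the function name is hypothetical.

```python
import math

def error_weight_sketch(error_info, days):
    """Weights rising from min_wgt (oldest day) toward 1 (most recent day)."""
    error_type, min_wgt = error_info[0], error_info[1]
    weights = []
    for i in range(days):
        age = days - 1 - i  # 0 for the most recent day
        if error_type == "linear":
            # ASSUMPTION: straight ramp between the oldest and newest day
            frac = 1.0 - age / (days - 1) if days > 1 else 1.0
        elif error_type == "gaussian":
            # ASSUMPTION: Gaussian decay in age with standard deviation tau
            tau = error_info[2]
            frac = math.exp(-0.5 * (age / tau) ** 2)
        else:
            raise ValueError("error_type must be 'linear' or 'gaussian'")
        weights.append(min_wgt + (1.0 - min_wgt) * frac)
    return weights

print(error_weight_sketch(("linear", 0.2), 5))
print(error_weight_sketch(("gaussian", 0.2, 3.0), 5))
```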

src.prime_utils.prediction_filename(run_setup)[source]

Generate informative name for hdf5 file with prediction data

Parameters
run_setup: dictionary

detailed settings for the epidemiological model

Returns
filename: string

file name ending with a .h5 extension

src.prime_utils.runningAvg(f, nDays)[source]

Apply nDays running average to the input f

Parameters
f: numpy array

array (with daily data for this project) to be filtered

nDays: int

window width for the running average

Returns
favg: numpy array

filtered data
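
A running average can be sketched as below, assuming a trailing window that is shortened at the start of the series. The window alignment and edge handling used by runningAvg are not documented here, so both choices (and the function name) are assumptions.

```python
def running_avg_sketch(f, n_days):
    """Trailing n_days average; early entries use the shorter available window."""
    out = []
    for i in range(len(f)):
        window = f[max(0, i - n_days + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical daily counts, for illustration only
print(running_avg_sketch([1.0, 2.0, 3.0, 4.0], 2))
```

A centered window would introduce no lag but needs future data at the end of the series, which matters when the most recent days carry the highest weight.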