3. Bayesian Framework

Given data, \({\boldsymbol{y}}\), in the form of time-series of daily counts and the model predictions \({\boldsymbol{n}}\) for the number of new symptomatic counts daily for the same time range, we employ a Bayesian framework to calibrate the epidemiological model parameters. The discrepancy between the data and the model is written as

\begin{equation} \boldsymbol{y} = \boldsymbol{n}(\Theta)+\epsilon \label{eq:discrepancy} \end{equation}

Here, \(\Theta\) is a vector of model parameters, and \({\epsilon}\) represents the statistical discrepancy between the model and the data. The elements of \(\Theta\) depend on either the number of epidemic waves being modeled or on the number of adjacent regions. In the current version the framework can either model multiple epidemic waves for one region or forecast the dynamics of one wave for multiple regions

The error model encapsulates, in this context, both errors in the observations as well as errors due to imperfect modeling choices. The observation errors include variations due to testing capabilities as well as errors when tests are interpreted. Values for the vector of parameters \(\Theta\) is estimated in the form of a multivariate PDF via Bayes theorem

\begin{equation} p(\Theta\vert \boldsymbol{y})\propto p(\boldsymbol{y}\vert\Theta) p(\Theta) \label{eq:bayes} \end{equation}

where \(p(\Theta\vert\boldsymbol{y})\) is the posterior distribution we are seeking after observing the data \({\boldsymbol{y}}\), \(p(\boldsymbol{y}\vert\Theta)\) is the likelihood of observing the data \({\boldsymbol{y}}\) for a particular set of values for the model parameters \(\Theta\), and \(p(\Theta)\) encapsulates any prior information available for the model parameters. Bayesian methods are well-suited for dealing with heterogeneous sources of uncertainty, in this case from our modeling assumptions, i.e. model and parametric uncertainties, as well as the communicated daily counts of COVID-19 new cases, i.e. experimental errors.

3.1. Multiple Epidemic Waves

For the single region/multiple epidemic waves case, \(\Theta\) can be represented as

\begin{equation} \Theta=\Theta^{(1)}\cup\Theta^{(2)}\cup\ldots\cup\Theta^{(K)}\cup\Theta^{(\epsilon)} \label{eq:param_vec_def} \end{equation}

where \(\Theta^{(j)}\) are the parameter for the j-th wave of infections, K is the number of waves and \(\Theta^{(\epsilon)}\) are parameters for the error model. The parameters for each wave are given by

\begin{equation} \Theta^{(j)}=\{\Delta t_j,N_j,k_j,\theta_j\} \end{equation}

For the first epidemic wave \(\Delta t_1=t_0\). The error model consists of two componets, i.e. \(\Theta^{(\epsilon)}=\{\sigma_a,\sigma_m\}\).

3.2. Multiple Regions

For the single region/multiple epidemic waves case, \(\Theta\) can be similarly represented as a concatenated vectors of parameters as above where \(\Theta^{(j)}\) now represents the set of parameters for region j. The component \(\Theta^{(\epsilon)}\) is now given by

\begin{equation} \Theta^{(\epsilon)}=\{\tau_{\Phi}^2,\lambda,\sigma_a,\sigma_m\} \end{equation}

where \(\tau_{\Phi}^2\) and \(\lambda\) are components of the correlation model between adjacent regions.

3.3. Likelihood Construction

The library provides options for both deterministic and stochastic formulations for the incubation model. In the former case the mean and standard deviation of the incubation model are fixed at their nominal values and the model prediction \(n_i\) for day \(t_i\) is a scalar value that depends on \(\Theta\) only. In the latter case, the incubation model is stochastic with mean and standard deviation of its natural logarithm treated as Student’s t and \(\chi^2\) random variables, respectively, as discussed in Incubation Rate Model Section. Let us denote the underlying independent random variables by \({\boldsymbol \xi}=\{\xi_\mu,\xi_\sigma,\}\). The model prediction \(n_j({\boldsymbol{\xi}})\) for day j is now a random variable induced by \({\boldsymbol \xi}\) plugged into Single Wave Model or Multi-wave Model and \({\boldsymbol n}({\boldsymbol\xi})\) is a random vector.

3.3.1. Deterministic Incubation Model

PRIME provides options for both Gaussian and negative binomial formulations for the statistical discrepancy \(\epsilon\) between \({\boldsymbol n}\) and \({\boldsymbol y}\).

3.3.1.1. Gaussian distribution

In the first approach we assume \(\epsilon\) has a zero-mean Multivariate Normal (MVN) distribution. Given the sparsity of data, correlations across time are currently neglected. For the single region case the liklihood \(p({\boldsymbol y}\vert\Theta)\) is computed as

\begin{equation} p({\boldsymbol y}\vert\Theta)=\prod_{i=1}^D\pi_{n_i(\Theta)}(y_i) =(2\pi)^{-D/2}\prod_{i=1}^D \sigma_i^{-1} \exp\left(-\frac{(y_i-n_i)^2}{2\sigma_i^2}\right) \label{eq:deticg} \end{equation}

with

\begin{equation} \sigma_i=\sigma_a+\sigma_m\, n_i(\Theta) \end{equation}

The additive, \(\sigma_a\), and multiplicative, \(\sigma_a\), components of the error model \(\Theta^{(\epsilon)}=\{\sigma_a,\sigma_a\}\) will be inferred jointly with the model parameters. In practive, PRIME infers the logarithm of these parameters to ensure they remain positive.

For the multiple region case we will assume that the discrepancies are independent over the temporal axis and correlated in space, i.e.

\begin{equation} p({\boldsymbol y}\vert\Theta)=\prod_{i=1}^{D}\frac{1}{(2\pi)^{N_r/2} \mathrm{det}(\Sigma_i^{1/2})}\exp\left(-\frac{1}{2}({\boldsymbol y}_i-{\boldsymbol n}_i) \Sigma_i^{-1}({\boldsymbol y}_i-{\boldsymbol n}_i)^T\right) \label{eq:lik} \end{equation}

Here \(\Sigma_i\) is the block in the large covariance matrix (that spans over \(D\) days of observations) that corresponds to the predictions for day i. Per the BYM model, we will model the discrepancy \(({\boldsymbol y}_i-{\boldsymbol n}_i)={\boldsymbol\epsilon}_i\) with two components i.e., \({\boldsymbol\epsilon}_i={\boldsymbol\epsilon}_{i,1}+{\boldsymbol\epsilon}_{i,2}\). The first component will be modeled with a pCAR to capture spatial auto-correlation. The second component defined as for the single region model captures random, temporally independent, reporting errors and any model shortcomings. The covariance matrix is given by [Cressie2008]

\begin{equation} \Sigma_i = P^{-1}+\mathrm{diag}\left(\sigma_a+\sigma_m Y^{(p)}_i\right)^2, \label{eqn:covmat} \end{equation}

where \(P\) is the precision matrix associated with the Gaussian Markov Random Field (GMRF) model assumed to account for the spatial correlations between adjacent regions (a proper Conditional Auto-Regressive (pCAR) model [MacNab2022]). The precision matrix is defined as

\begin{equation} P=\frac{1}{\tau_{\phi}^2}\left(\mathrm{diag}\{g_1,g_2,\ldots,g_{N_r}\}-\lambda_{\phi} W\right) \end{equation}

Here, \(g_j\) is the number of regions adjacent to region j, and W is a matrix that encodes the relative topology of the regions considered in the joint inference, with entries defined as

\begin{equation} w_{jj}=0\,\textrm{and}\, w_{jk}=\begin{cases} 1 & \textrm{if regions j and k are adjacent,}\\ 0 & \textrm{otherwise.} \end{cases} \end{equation}

Thus P defines a pCAR spatial model with row-standardisation and is a function of the “spatial coefficients” \(\{\tau_{\phi}^2, \lambda_{\phi}\}\), which will also have to be estimated from the data. The dimensionality of the inverse problem scales with \(N_{R}\) and is limited by the scalability of the inversion method. We will use \(N_{R}\leq 3\) and consider inferences using the following setups:

two adjacent counties (i.e., \(N_{R}=2\)), i.e. Bernalillo & Santa Fe and Bernalillo & Valencia. For these cases the covariance matrix \(P^{-1}\) corresponding to the GMRF model is given by

\begin{equation} P^{-1} = \frac{\tau_{\phi}^2}{1-\lambda_{\phi}^2} \begin{bmatrix} 1 & \lambda_{\phi} \\ \lambda_{\phi} & 1 \\ \end{bmatrix} \end{equation}

three counties (i.e., \(N_{R}=3\)), Bernalillo, Santa Fe, and Valencia, jointly. Bernalillo is adjacent to the other two counties but Santa Fe and Valencia do not share a border. The GMRF covariance matrix \(P^{-1}\) is given by

\begin{equation} P^{-1} = \frac{\tau_{\phi}^2}{2\left(1-\lambda_{\phi}^2\right)} \begin{bmatrix} 1 & \lambda_{\phi} & \lambda_{\phi} \\ \lambda_{\phi} & 2-\lambda_{\phi}^2 & \lambda_{\phi}^2 \\ \lambda_{\phi} & \lambda_{\phi}^2 & 2-\lambda_{\phi}^2 \\ \end{bmatrix} \end{equation}

3.3.1.2. Negative-binomial distribution

The second approach assumes a negative-binomial distribution for the discrepancy between data and model predictions. The negative-binomial distribution is commonly used in epidemiology to model overly dispersed data, e.g. in case where the standard deviation exceeds the mean [Lloyd2007]. For this modeling choice, the likelihood of observing the data given a choice for the model parameters is given by

\begin{equation} p({\boldsymbol y}\vert\Theta)=\prod_{i=1}^D\pi_{n_i(\Theta)}(y_i) =\prod_{i=1}^D \binom{y_i+\alpha-1}{\alpha-1} \left(\frac{\alpha}{\alpha+n_i(\Theta)}\right)^\alpha \left(\frac{n_i(\Theta)}{\alpha+n_i(\Theta)}\right)^{y_i} \label{eq:deticnb} \end{equation}

where \(\alpha>0\) is the dispersion parameter, and

\begin{equation} \binom{y_i+\alpha-1}{\alpha-1}=\frac{\Gamma(y_i+\alpha)}{\Gamma(\alpha)\Gamma(y_i+1)} \end{equation}

is the binomial coefficient. For simulations employing a negative binomial distribution of discrepancies, the logarithm of the dispersion parameter \(\alpha\) will be inferred jointly with the other model parameters.

3.3.2. Stochastic Incubation Model

For the {it stochastic incubation model} the likelihood reads as

\begin{equation} p({\boldsymbol y}\vert\Theta)=\pi_{{\boldsymbol n}(\Theta),{\boldsymbol\xi}} ({\boldsymbol y})\approx\prod_{i=1}^D\pi_{n_i(\Theta),{\boldsymbol\xi}}(y_i) \label{eq:detfullpost} \end{equation}

The second expression in the right-hand side above assumes independence of the discrepancies between different days. Unlike the deterministic incubation model, the likelihood components for each day \(\pi_{n_i(\Theta),{\boldsymbol\xi}}(y_i)\) are not analytically tractable anymore since they now incorporate contributions from \({\boldsymbol \xi}\), i.e. from the variability of the parameters of the incubation model. One can evaluate the likelihood via kernel density estimation (KDE) by sampling \({\boldsymbol \xi}\) for each sample of \(\Theta\), and combining these samples with samples of the assumed discrepancy \(\epsilon\), in order to arrive at an estimate of \(\pi_{n_i(\Theta),{\boldsymbol\xi}}(y_i)\). In fact, by sampling a single value of \({\boldsymbol \xi}\) for each sample of \(\Theta\), one achieves an unbiased estimate of the likelihood \(\pi_{n_i(\Theta),{\boldsymbol\xi}}(y_i)\), and given the independent-component assumption, it also leads to an unbiased estimate of the full likelihood \(\pi_{{\boldsymbol n}(\Theta),{\boldsymbol\xi}}({\boldsymbol y})\).

3.4. Posterior Distribution

A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the posterior density \(p(\Theta\vert\boldsymbol{y})\). MCMC is a class of techniques that allows sampling from a posterior distribution by constructing a Markov Chain that has the posterior as its stationary distribution. In particular, PRIME uses a an adaptive Metropolis algorithm [Haario2001]. Given the construction corresponding to the stochastic incubation model presented above, we employ the unbiased estimate of the approximate likelihood. This is the essence of the pseudo-marginal MCMC algorithm [Andrieu2009] guaranteeing that the accepted MCMC samples correspond to the posterior distribution. At each MCMC step we draw a random sample \(\xi\) from its distribution, and then we estimate the likelihood in a way similar to the deterministic incubation model.

Fig. 3.1 displays 1D and 2D joint marginal distributions based on two-wave model results. We use the Raftery-Lewis diagnostic [Raftery1992] to determine the number of MCMC samples required for converged statistics corresponding to stationary posterior distributions for \(\Theta\). The required number of samples is of the order \(o(10^5-10^6)\) depending on the geographical region employed in the inference. The resulting Effective Sample Size [Kass1998] varies between 8,000 and 15,000 samples depending on each parameter which is sufficient to estimate joint distributions for the model parameters.

_images/CA_0819_kde.pdf — Fig. 3.1 1D and 2D joint marginal distributions the components of \(\Theta=\{t_0,N_1,k_1,\theta_1,\Delta t_2,N_2,k_2,\theta_2,\log\sigma_a, \log\sigma_m\}\) for data from California up to 2020-08-19. Distance correlations for each pair of parameters is displayed above each joint marginal distribution.

3.4.1. Posterior Predictive Tests

We will employ Bayesian posterior-predictive distributions [Lynch2004] to assess the predictive skill of the statistical model. The Bayesian posterior-predictive distribution, defined below is computed by marginalization of the likelihood over the posterior distribution of model parameters \(\Theta\):

\begin{equation} p_{\mathrm{pp}}\left(\boldsymbol{y}^{\mathrm{(pp)}}\vert\boldsymbol{y}\right)=\int_{\boldsymbol{\Theta}} p(\boldsymbol{y}^{\mathrm{(pp)}}\vert\Theta) p(\Theta\vert\boldsymbol{y}) d\Theta. \label{eq:ppd} \end{equation}

The posterior predictive distribution \(p_{\mathrm{pp}}\left(\boldsymbol{y}^{\mathrm{(pp)}}\vert\boldsymbol{y}\right)\) is estimated through sampling, using the parameter samples readily available from the MCMC exploration of the parameter space, i.e. similar to results shown in Fig.~ref{fig:mcmc}. Typically we subsample the MCMC chain to about 10-15K samples that will be used to generate posterior predictive statistics. After the model evaluations \(\boldsymbol{y}=\boldsymbol{n}(\Theta)\) are completed, we add random noise consistent with the likelihood model settings presented in ref{sec:lk}. The resulting samples are used to compute summary statistics corresponding to \(p_{\mathrm{pp}}\left(\boldsymbol{y}^{\mathrm{(pp)}}\vert\boldsymbol{y}\right)\).

The posterior-predictive distribution results can be used in hindcast mode, to check how well the model follows the data, and for short-term forecasts for the spread dynamics of this disease. In the hindcast regime, the infection rate is convolved with the incubation rate model to generate statistics for \(\boldsymbol{y}^{\mathrm{(pp)}}\) that will be compared against \(\boldsymbol{y}\), the data used to infer the model parameters. The same functional form can be used to generate statistics for \(\boldsymbol{y}^{\mathrm{(pp)}}\) beyond the set of dates for which data was available. We limit these forecasts to 7–10 days as our infection rate model does not count for changes in social dynamics that can significantly impact the epidemic over a longer timerange.

3.4.2. Model Selection

Quantitative comparisons between models can be made with several metrics defined in the following sections.

3.4.2.1. Information Criteria

The Akaike Information Criteria (AIC) [Akaike1974] is defined as

\begin{equation} AIC = 2 m_{\Theta} - 2 \ln (L_{max}), \label{eq:AIC} \end{equation}

where \(m_{\Theta}\) is the number of parameters in \(\Theta\) and \(L_{max}\) is the maximum value of the likelihood \(p(\boldsymbol{y}\vert\Theta)\). This is estimated by the maximum likelihood in the MCMC chain. Given a choice of models, the model with the smallest AIC value is considered to be the highest quality model.

The Bayesian Information Criteria (BIC) [Schwarz1978] is defined as

\begin{equation} BIC = m_{\Theta} \ln(d) - 2 \ln (L_{max}), \label{eq:BIC} \end{equation}

where \(d\) is the number of observations, equal to the length of the array \(\boldsymbol{n}\). Given a choice of models, the model with the smallest BIC value is considered to be the highest quality model.

3.4.2.2. Continuous Ranked Probability Score

The Continuous Ranked Probability Score (CRPS) [Gneiting2007] measures the difference between the CDF of the provided data and that of the forecast/predicted data, i.e., data generated based on the posterior predictive distribution. It is computed by summing up marginal distributions for each day that data is available

\begin{equation} CRPS = \frac{1}{d} \sum_{j=1}^{d} \int_{-\infty}^{\infty} \left(\mathcal{F}_{pp,j} \left(y^{(pp)}_j| \boldsymbol{y} \right) - \mathcal{H}_{y_j}\left(y^{(pp)}_j \right) \right)^2 dy^{(pp)}_j, \label{eq:crps} \end{equation}

where \(y^{(pp)}_j \equiv y^{(pp)}(t_j)\) is new daily case predictions on day \(j\) obtained via the posterior-predictive distribution, \(y_j \equiv y(t_j)\) is new daily case data on day \(j\) and \(\mathcal{F}_{pp,j}\) is the 1-D marginal posterior predictive CDF for day \(j\) computed using 1-D marginal posterior predictive distributions

\begin{equation} \mathcal{F}_{pp,j}( y^{(pp)}_j | \boldsymbol{n} ) = \int_{-\infty}^{y^{(pp)}_j} p_{pp,j} \left(y^{(pp)'}_j | \boldsymbol{n} \right) dy^{(pp)'}_j \end{equation}

where

\begin{equation} p_{pp,j} \left(y^{(pp)}_j | \boldsymbol{n} \right) = \int p_{pp}(\boldsymbol{y}^{(pp)} | \boldsymbol{y}) d\boldsymbol{y}^{(pp)}_{\sim j} \end{equation}

is the marginal 1-D posterior predictive density corresponding to day \(j\), based on \(p_{pp}\left( \boldsymbol{y}^{(pp)} | \boldsymbol{y} \right)\). Here, \(d\boldsymbol{y}^{(pp)}_{\sim j} \equiv dy^{(pp)}_1 \cdots dy^{(pp)}_{j-1} dy^{(pp)}_{j+1} \cdots dy^{(pp)}_d\). The CDF of the provided case data \(\boldsymbol{y}\) is approximated as a Heaviside function centered at \(y_j\), \(\mathcal{H}_{y_j}(y^{(pp)}_j) = 1_{y^{(pp)}_j \ge y_j}\). Like AIC and BIC, the model with the smallest value of CRPS is considered to be of higher quality than other models.

[Akaike1974]

Akaike, H., A new look at the statistical model identification, Annals of Statistics (2009)

[Andrieu2009]

Andrieu C., Roberts G.O., The pseudo-marginal approach for efficient Monte Carlo computations, Annals of Statistics (2009)

[Cressie2008]

Cressie N., Johannesson G., TFixed rank kriging for very large spatial data sets, Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2008)

[Gneiting2007]

Gneiting T., Raftery A., Strictly Proper Scoring Rules, Prediction, and Estimation (2007)

[Haario2001]

Haario H., Saksman E., Tamminen J., An adaptive Metropolis algorithm, Bernoulli (2001)

[Kass1998]

Raftery A.E., Lewis S., Markov Chain Monte Carlo in Practice: A Roundtable Discussion, The American Statistician (1998)