Error model#


Historical context#

The multiplicative error model used in MAPIT was chosen based on the widespread and commonplace use within the IAEA. It was recognized that as early as the IAEA’s founding in 1957 that there would be a need to account for nuclear material in facilities. This created the subsequent need for statistical methods to estimate uncertainties in measurements. One key component for the selection of an error model is the necessity to propagate error from many measurements for downstream statistical analyses. The multiplicative error model arose from the need to propagate measurements and perform top-down (i.e., empirical) uncertainty quantification using both in-field IAEA measurements and operator declared measurements. The specific values used in the multiplicative error models are determined using a variety of techniques that changes based on the context of the underlying data. We refer the reader to supplementary reading section for more information on the history and determination of the multiplicative error model.

Theory#

No measurement, except counting, is completely accurate which results in a non-zero measurement error. This is the reason that robust statistical analysis of the material balance is required; if material in a facility were exactly known, detecting loss of that material would be trivial. Statistical analyses in material accountancy often assumes a multiplicative error model (describe in the following equation)

\[ \widetilde{x}_t = x_t (1 + r_t + s_t) \]

Where:

  • \(\widetilde{x}_t\) is the observed value (i.e., what is actually measured) at time \(t\)

  • \(x\) is the true, but unknowable value at time \(t\)

  • \(r_t\) is relative random error of \(x\)

    • Specifically \(r_t\) is a random variate of the distribution \(\mathcal{N}(0, \delta_r^2)\) where \(\delta_r^2\) is the random relative standard deviation.

  • \(s_t\) is the relative systematic error of \(x\)

    • Specifically, \(s_t\) is a random variate of the distribution \(\mathcal{N}(0, \delta_s^2)\) where \(\delta_s^2\) is the systematic relative standard deviation.

Here, random error refers to sources of error that can be reduced through repeated measurements of the same item. Systematic errors refers to short-term biases that are generally irreducible. These systematic biases can arise from a variety of sources such as calibration errors. Regardless of the measurement type, errors are characterized by a mean zero normal distribution with non-zero standard deviation. The distributions characterizing the random and systematic error can vary based on a variety of factors such as measurement type, measurement system, and even the specific isotope measured.

Tip

Systematic errors behave as a bias. Consequently, the systematic variate, \(S_t\), applied to the true value \(x_t\), from the multiplicative model described above is not updated at every timestep. This contrasts with the random variate which is updated at each time step. The systematic variate is held constant and only updated on a periodic basis that corresponds to a specified calibration period. The specifics of the calibration period are measurement system specific.

Important

As of version 1.4.6, there is no functionality to specify a calibration period directly. The API can be combined with data manipulation to simulate different calibrations by slicing the data and passing it to MAPIT.core.Preprocessing.SimErrors.

By default, MAPIT assumes a single calibration for the length of the dataset that does not vary with time. For example, a time series with 1000 steps will be assumed to have a single systematic variate, drawn from a distribution described by the user supplied error matrix, that is applied to every time step and does not change with time. A new feature is planned for FY25 that will add the capability to specify a calibration period.

Implementation#

Important

Note that this function is not intended to be used standalone through direct calls, rather, it is designed to be called through the MBArea class of StatsProcessor.

The multiplicative error model described in the introduction assumes that there is a single iteration and location. For example, the model in the introduction might express the simulated error for a single input key measurement point. There might be multiple key measurement points in practice resulting an error model with location \(p\) and time \(t\) such that:

\[ \widetilde{x}_{p,t} = x_{p,t} (1 + r_{p,t} + s_{p,t}) \]

It is often desireable to consider simulated statistics and calculate error for multiple iterations to help determine performance statistics of a safeguards system even if, in practice, only a single iteration is measured. The multiplicative error model can be further expanded such that an iterative dimension, \(n\), that reflects independent draws of the underlying random and systematic error distributions, is also considered. Note that there is no iteration added to the unobservable true value, \(x_{p,t}\) as it is not a random variate.

\[ \widetilde{x}_{n,p,t} = x_{p,t} (1 + r_{n,p,t} + s_{n,p,t}) \]

For simplicity, assume that the systematic error does not have a calibration period and applies for an entire iteration (i.e., the same bias is applied for all time steps of an iteration). One naive implementation of the multiplicative model might then be as follows:

for n in range(len(iterations)):
    for p in range(len(locations)):
        sysError = np.random.normal(loc = 0, scale = sysSTD)

        for t in range(len(timesteps)):
            randomError = np.random.normal(loc = 0, scale = randSTD)
            x_observed[n, p, t] = x_true[p, t] * (1 + randomError + sysError)

This approach is valid, but scales poorly. MAPIT vectorizes both the iteration and timestep dimension on a per location basis. Each location might have a different sample rate, so it is not possible to develop a fully vectorized implementation. The multiplicative model, in a vectorized form, can first be expressed as follows when vectorizing the time dimensions:

\[ \widetilde{\boldsymbol{x}}_\boldsymbol{n,p} = \boldsymbol{x}_{p} (1 + \boldsymbol{r}_{n,p} + \boldsymbol{s}_{n,p}) \]

Additionally, the iterative dimension can be vectorized resulting in the following:

\[ \widetilde{\boldsymbol{X}}_\boldsymbol{p} = \boldsymbol{x}_{p} (1 + \boldsymbol{R}_{p} + \boldsymbol{S}_{p}) \]

Note

While the random error component, \(R_{n,p,t}\) is sampled at every time step, sensor setup might complicated the specification of the systematic error component. It is assumed here that the systematic error changes with location but not time as no calibration time is assumed (\(S_{n,p,t=0} = S_{n,p,t=50} = S_{n,p,t=t}\)).

The implementation of this error model is performed in MAPIT by MAPIT.core.Preprocessing.SimErrors starting on line 352:

      ErrorMatrix (numpy array): Matrix containing error values (RSD) for each location.

The function generates iterations simulated realizations of measurements (i.e., the iteration dimension \(n\)) for each list entry. Each iteration represents a possible result of measuring at the specific key measurement point (i.e., location dimension \(p\)) represented by the list entry.

First, a list of arrays is initialized to hold the errors calculated by the function (Lines 408-409).

408    
409    # If random RSD is zero, set the random variate array to zero

Each entry in the AppliedError list is an array with shape (iterations \(n\), time steps \(t\)) and presumably refers to a different location in a measurement type. For example, the first entry in the list might be a (time steps \(t\), 1) shaped array of data collected at the first input key measurement point. Since each list entry might have a different number of time steps, the arrays are stored in a list rather than being concatenated. The list and constituent arrays are preinitalized.

The main calculation loop occurs between lines 442 and 479:

442        ErrorMatrix = np.array([[0.1, 0.2], [0.3, 0.4]])
443        iterations = 100
444
445        result = SimErrors(rawData, ErrorMatrix, iterations)
446
447        print(result[0].shape)
448        >>> (100, 10)
449
450
451      Args:
452        rawData (list of ndarray): Raw data to apply errors to, list of 2D ndarrays. Each entry in the list should correspond to a different location and the shape of ndarray in the list should be [MxN] where M is the sample dimension (number of samples) and N is the elemental dimension, if applicable. If only considering one element, each ndarray in the rawData list should be [Mx1].
453
454        ErrorMatrix (ndarray): 2D ndarray of shape [Mx2] describing the relative standard deviation to apply to ``rawData``. M sample dimension in each input array and should be identical to M described in  ``rawData``. The second dimension (e.g., 2) refers to the random and systematic error respectively such that ``ErrorMatrix[0,0]`` refers to the random relative standard deviation of the first location and ``ErrorMatrix[0,1]`` refers to the systematic relative standard deviation. 
455
456        iterations (int): Number of iterations to calculate
457
458        GUIObject (obj, default=None): GUI object for internal MAPIT use
459
460        doTQDM (bool, default=True): Controls the use of TQDM progress bar for command line or notebook operation. 
461
462        batchSize (int, default=10): Batch size for parallel processing.
463
464        dopar (bool, default=False): Controls the use of parallel processing.
465
466        times (list of ndarray, default=None): List of ndarrays of shape [Mx1] describing the time of each sample in the rawData. Required if ``calibrationPeriod`` is provided.
467
468        calibrationPeriod (list of float, default=None): List of floats of length M describing the calibration period for each location in rawData. Required if ``times`` is provided.
469
470
471      Returns:
472        list: List of arrays identical in shape to ``rawData``. A list is returned so that each location can have a different sample rate.
473
474    """
475
476    if GUIObject is not None:
477      doTQDM = False
478
479    # times and calibrationPeriod must be present

SimErrors has a parameter batchSize that controls the number of iterations that are calculated at once. The most efficient implementation would calculate all iterations at once using a single matrix calculation. However, this could consume more memory than available in some scenarios, so the batchSize parameter is provided. The code tries to batch iterations into batchSize chunks. If iterations is not equally divisible by batchSize, then an extra remRuns sized calculation is performed after the all iterations/batchSize chunks are calculated.

MAPIT specifically uses vectorized representations to more efficiently calculated the simulated error model. The sections below map the model components to the relevant code expressions.

Model Component

Code Expression

\(\boldsymbol{r}_{n=1:\textrm{batch},p} \)

randRSD = np.random.normal(size=(batchSize, rawData[i].shape[0], 1), loc=0, scale=ErrorMatrix[i,0])

\(\boldsymbol{s}_{n=1:\textrm{batch},p} \)

sysRSD = np.random.normal(size=(batchSize, 1, 1), loc=0, scale=ErrorMatrix[i,1])

\(\boldsymbol{x}_{p}\)

rawData

\(\widetilde{\boldsymbol{x}}_{n=1:\textrm{batch},p}\)

AppliedError[i][startIdx:endIdx,] = rawData[i][:,0].reshape((1,-1)) * (1+sysRSD+randRSD).reshape((batchSize,-1))

Important

Note that sysRSD has a shape of 1 for dim 1 (i.e., sysRSD.shape[1] = 1) as a single random variate from the underlying distribution is applied to all time steps for location i. Numpy broadcasting ensures that the shape of \(\boldsymbol{S_{n,p}}\) is the same for all \(t\). This will be configurable in a future update.

Additional code present in this section is used to support GUI changes, such as updating of a progress bar update, and is not important to the core multiplicative error model calculation.

Further reading#