Error model#
Historical context#
The multiplicative error model used in MAPIT was chosen based on the widespread and commonplace use within the IAEA. It was recognized that as early as the IAEA’s founding in 1957 that there would be a need to account for nuclear material in facilities. This created the subsequent need for statistical methods to estimate uncertainties in measurements. One key component for the selection of an error model is the necessity to propagate error from many measurements for downstream statistical analyses. The multiplicative error model arose from the need to propagate measurements and perform top-down (i.e., empirical) uncertainty quantification using both in-field IAEA measurements and operator declared measurements. The specific values used in the multiplicative error models are determined using a variety of techniques that changes based on the context of the underlying data. We refer the reader to supplementary reading section for more information on the history and determination of the multiplicative error model.
Theory#
No measurement, except counting, is completely accurate which results in a non-zero measurement error. This is the reason that robust statistical analysis of the material balance is required; if material in a facility were exactly known, detecting loss of that material would be trivial. Statistical analyses in material accountancy often assumes a multiplicative error model (describe in the following equation)
Where:
\(\widetilde{x}_t\) is the observed value (i.e., what is actually measured) at time \(t\)
\(x\) is the true, but unknowable value at time \(t\)
\(r_t\) is relative random error of \(x\)
Specifically \(r_t\) is a random variate of the distribution \(\mathcal{N}(0, \delta_r^2)\) where \(\delta_r^2\) is the random relative standard deviation.
\(s_t\) is the relative systematic error of \(x\)
Specifically, \(s_t\) is a random variate of the distribution \(\mathcal{N}(0, \delta_s^2)\) where \(\delta_s^2\) is the systematic relative standard deviation.
Here, random error refers to sources of error that can be reduced through repeated measurements of the same item. Systematic errors refers to short-term biases that are generally irreducible. These systematic biases can arise from a variety of sources such as calibration errors. Regardless of the measurement type, errors are characterized by a mean zero normal distribution with non-zero standard deviation. The distributions characterizing the random and systematic error can vary based on a variety of factors such as measurement type, measurement system, and even the specific isotope measured.
Tip
Systematic errors behave as a bias. Consequently, the systematic variate, \(S_t\), applied to the true value \(x_t\), from the multiplicative model described above is not updated at every timestep. This contrasts with the random variate which is updated at each time step. The systematic variate is held constant and only updated on a periodic basis that corresponds to a specified calibration period. The specifics of the calibration period are measurement system specific.
Important
As of version 1.4.6, there is no functionality to specify a calibration period directly. The API can be combined with data manipulation to simulate different calibrations by slicing the data and passing it to MAPIT.core.Preprocessing.SimErrors.
By default, MAPIT assumes a single calibration for the length of the dataset that does not vary with time. For example, a time series with 1000 steps will be assumed to have a single systematic variate, drawn from a distribution described by the user supplied error matrix, that is applied to every time step and does not change with time. A new feature is planned for FY25 that will add the capability to specify a calibration period.
Implementation#
Important
Note that this function is not intended to be used standalone through direct calls, rather, it is designed to be called through the MBArea class of StatsProcessor.
The multiplicative error model described in the introduction assumes that there is a single iteration and location. For example, the model in the introduction might express the simulated error for a single input key measurement point. There might be multiple key measurement points in practice resulting an error model with location \(p\) and time \(t\) such that:
It is often desireable to consider simulated statistics and calculate error for multiple iterations to help determine performance statistics of a safeguards system even if, in practice, only a single iteration is measured. The multiplicative error model can be further expanded such that an iterative dimension, \(n\), that reflects independent draws of the underlying random and systematic error distributions, is also considered. Note that there is no iteration added to the unobservable true value, \(x_{p,t}\) as it is not a random variate.
For simplicity, assume that the systematic error does not have a calibration period and applies for an entire iteration (i.e., the same bias is applied for all time steps of an iteration). One naive implementation of the multiplicative model might then be as follows:
for n in range(len(iterations)):
    for p in range(len(locations)):
        sysError = np.random.normal(loc = 0, scale = sysSTD)
        for t in range(len(timesteps)):
            randomError = np.random.normal(loc = 0, scale = randSTD)
            x_observed[n, p, t] = x_true[p, t] * (1 + randomError + sysError)
This approach is valid, but scales poorly. MAPIT vectorizes both the iteration and timestep dimension on a per location basis. Each location might have a different sample rate, so it is not possible to develop a fully vectorized implementation. The multiplicative model, in a vectorized form, can first be expressed as follows when vectorizing the time dimensions:
Additionally, the iterative dimension can be vectorized resulting in the following:
Note
While the random error component, \(R_{n,p,t}\) is sampled at every time step, sensor setup might complicated the specification of the systematic error component. It is assumed here that the systematic error changes with location but not time as no calibration time is assumed (\(S_{n,p,t=0} = S_{n,p,t=50} = S_{n,p,t=t}\)).
The implementation of this error model is performed in MAPIT by MAPIT.core.Preprocessing.SimErrors:
The function generates iterations simulated realizations of measurements (i.e., the iteration dimension \(n\)) for each list entry. Each iteration represents a possible result of measuring at the specific key measurement point (i.e., location dimension \(p\)) represented by the list entry.
First, a list of arrays is initialized to hold the errors calculated by the function (Lines 408-409).
534    for i in range(0, len(rawData)):
535        AppliedError.append(np.zeros((iterations, rawData[i].shape[0]),dtype=np.float32))
Each entry in the AppliedError list is an array with shape (iterations \(n\), time steps \(t\)) and presumably refers to a different location in a measurement type. For example, the first entry in the list might be a (time steps \(t\), 1) shaped array of data collected at the first input key measurement point. Since each list entry might have a different number of time steps, the arrays are stored in a list rather than being concatenated. The list and constituent arrays are preinitalized.
The main calculation loop occurs as follows:
568    for i in range(0, len(rawData)):
569
570
571      if iterations > batchSize:
572        outerloop = int(np.floor(iterations/batchSize))
573        remruns = iterations%batchSize
574
575
576        for j in range(0,outerloop):
577          startIdx = j*batchSize
578          endIdx = startIdx+batchSize
579
580          (randRSD, sysRSD) = calcBatchError(calibrationPeriod, ErrorMatrix, batchSize, times, i, rawData[i].shape[0])
581          AppliedError[i][startIdx:endIdx,] = rawData[i][:,0].reshape((1,-1)) * (1+sysRSD+randRSD).reshape((batchSize,-1))
582
583          if GUIObject is not None:
584            totalloops = (outerloop+1)*len(rawData)
585            GUIObject.progress.emit(loopcounter / totalloops*100)
586            loopcounter+=1
587          
588          if doTQDM and not dopar:
589            update(1)
590        
591        # if there are any batches left over, do them now
592        if remruns > 0:
593          (randRSD, sysRSD) = calcBatchError(calibrationPeriod, ErrorMatrix, remruns, times, i, rawData[i].shape[0])
594          AppliedError[i][endIdx:,] = rawData[i][:,0].reshape((1,-1)) * (1+sysRSD+randRSD).reshape((remruns,-1))
595
596          if GUIObject is not None:
597              totalloops = (outerloop+1)*len(rawData)
598              GUIObject.progress.emit(loopcounter / totalloops*100)
599              loopcounter+=1 
600          
601          if doTQDM and not dopar:
602            update(1)
603      
604      else:
605          (randRSD, sysRSD) = calcBatchError(calibrationPeriod, ErrorMatrix, iterations, times, i, rawData[i].shape[0])
606          AppliedError[i] = rawData[i][:,0].reshape((1,-1)) * (1+sysRSD+randRSD).reshape((iterations,-1))
607
608          if GUIObject is not None:
609            GUIObject.progress.emit(i/len(rawData)*100)
610          
611          if doTQDM and not dopar:
612            update(1)
613
SimErrors has a parameter batchSize that controls the number of iterations that are calculated at once. The most efficient implementation would calculate all iterations at once using a single matrix calculation. However, this could consume more memory than available in some scenarios, so the batchSize parameter is provided. The code tries to batch iterations into batchSize chunks. If iterations is not equally divisible by batchSize, then an extra remRuns sized calculation is performed after the all iterations/batchSize chunks are calculated.
MAPIT specifically uses vectorized representations to more efficiently calculated the simulated error model. The sections below map the model components to the relevant code expressions.
Model Component  | 
Code Expression  | 
|---|---|
\(\boldsymbol{r}_{n=1:\textrm{batch},p} \)  | 
  | 
\(\boldsymbol{s}_{n=1:\textrm{batch},p} \)  | 
  | 
\(\boldsymbol{x}_{p}\)  | 
  | 
\(\widetilde{\boldsymbol{x}}_{n=1:\textrm{batch},p}\)  | 
  | 
Important
Note that sysRSD has a shape of 1 for dim 1 (i.e., sysRSD.shape[1] = 1) as a single random variate from the underlying distribution is applied to all time steps for location i. Numpy broadcasting ensures that the shape of \(\boldsymbol{S_{n,p}}\) is the same for all \(t\). This will be configurable in a future update.
Additional code present in this section is used to support GUI changes, such as updating of a progress bar update, and is not important to the core multiplicative error model calculation.
Further reading#
- 
Discussion about the development of the multiplicative error model and how UQ models like GUM relate
 International Target Values for Measurement Uncertainties in Safeguarding Nuclear Materials
Reference values for \(\delta_r\) and \(\delta_s\) for different measurement systems and materials
Near Real Time Accountancy (IAEA STR-403)
Finalized but not yet released by IAEA at time of writing
Handbook of Nuclear Engineering: Proliferation Resistance and Safeguards
Specifically the section on “Statistics for Accountancy”