SITMUF#
Historical context#
The statistical properties of the MUF sequence have been studied extensively, and as early as the 1980s it was noted that successive material balance periods are correlated. Pike and Woods were the first to develop a concept called ITMUF (Independent Transformed MUF) and later SITMUF (Standardized Independent Transformed MUF). The SITMUF sequence is a transformed material balance sequence whose mean is approximately zero and whose standard deviation is approximately one. There are two key advantages to performing statistical testing on such an independent sequence:
- Alarm thresholds for SITMUF depend only on the sequence length, not on the form or properties of the MUF covariance.
  - This alleviates the need to determine alarm thresholds strictly through simulation.
- In a near-real-time accountancy context, the variance of SITMUF decreases as the approximate covariance of the MUF sequence approaches the true value.
  - Consequently, the detection probability of SITMUF increases over time, often exceeding that of the MUF sequence alone.
The transformation from MUF to SITMUF was quite difficult until Picard showed that the SITMUF transform can be easily expressed using the Cholesky decomposition. A series of papers from the late 1980s onward showed that applying Page’s trend test to SITMUF performs well for a wide range of potential material loss scenarios when the loss pattern is not known. Page’s trend test on SITMUF has been frequently used as an effective test in nuclear material accountancy.
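For reference, Page’s test is a simple one-sided cumulative sum (CUSUM) procedure. The sketch below is illustrative only (not MAPIT’s implementation), assuming an already-computed SITMUF sequence and uncalibrated placeholder values for the reference value `k` and alarm threshold `h`:

```python
import numpy as np

def pages_test(sitmuf, k=0.5, h=5.0):
    """Page's one-sided CUSUM test applied to a SITMUF sequence.

    Returns the test statistic at each balance period and the index of the
    first alarm (or None if the threshold h is never exceeded).
    """
    s = 0.0
    stats = []
    alarm_at = None
    for i, x in enumerate(sitmuf):
        s = max(0.0, s + x - k)        # accumulate positive drift above the reference value k
        stats.append(s)
        if alarm_at is None and s > h:
            alarm_at = i               # first balance period exceeding the threshold
    return np.array(stats), alarm_at
```

In practice, `h` is chosen to meet a target false alarm probability for the sequence length.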
Theory#
Recall that the muf sequence is defined as the ordered collection of individual material balances,

\[\textbf{muf} = \left[\text{mbp}_1, \text{mbp}_2, \ldots, \text{mbp}_n\right]\]

with each entry mbp\(_i\) denoting the material balance (beginning inventory plus inputs, minus outputs and ending inventory) for period \(i\).
Since the error models are normally distributed, it is generally assumed that the individual muf values (i.e., mbp\(_i\)) and the muf sequence (i.e., muf) are also normally distributed. Consequently, the muf sequence can be thought of as following a multivariate normal distribution such that:

\[\textbf{muf} \sim \mathcal{N}\left(\boldsymbol{\mu}, \boldsymbol{\Sigma}\right)\]
The covariance matrix contains the covariance between the different material balances in the sequence. For example, consider the entry \(\sigma_{2n}^2\) of the covariance matrix below. This term is the covariance between material balances \(2\) and \(n\).

\[\boldsymbol{\Sigma} =
\begin{bmatrix}
\sigma_{11}^2 & \sigma_{12}^2 & \cdots & \sigma_{1n}^2 \\
\sigma_{21}^2 & \sigma_{22}^2 & \cdots & \sigma_{2n}^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1}^2 & \sigma_{n2}^2 & \cdots & \sigma_{nn}^2
\end{bmatrix} \tag{1}\]
Note
The covariance matrix \(\boldsymbol{\Sigma}\) is symmetric (i.e., \(\sigma_{1,2} = \sigma_{2,1}\)).
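As a quick illustration (not part of MAPIT), sequences with this structure can be drawn directly from a multivariate normal distribution; the covariance matrix below is arbitrary and used only to show the symmetric, correlated structure:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4                                    # number of balance periods
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)          # symmetric, positive-definite covariance (illustrative only)
mu = np.zeros(n)                         # zero mean under the no-loss hypothesis

muf = rng.multivariate_normal(mu, Sigma, size=10000)   # simulated muf sequences
print(np.allclose(Sigma, Sigma.T))                     # True: the covariance matrix is symmetric
print(np.cov(muf, rowvar=False).round(1))              # sample covariance approaches Sigma
```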
The SITMUF statistic is the muf sequence transformed to have a mean of zero and a standard deviation of one. Conceptually, this corresponds to subtracting the sequence mean and dividing by the sequence standard deviation. The independent material balance sequence (ITMUF) is obtained by estimating the sequence mean through the conditional expectation given all previously observed muf values:

\[\text{ITMUF}_i = \text{muf}_i - E\left[\text{muf}_i \mid \text{muf}_1, \ldots, \text{muf}_{i-1}\right]\]
SITMUF is then ITMUF divided by its conditional standard deviation:

\[\text{SITMUF}_i = \frac{\text{ITMUF}_i}{\sqrt{\text{Var}\left(\text{muf}_i \mid \text{muf}_1, \ldots, \text{muf}_{i-1}\right)}}\]
Using the expression for the conditional expectation of a multivariate normal distribution together with the covariance expression (1), and assuming a zero unconditional mean under the no-loss hypothesis, the conditional mean and variance become:

\[E\left[\text{muf}_i \mid \text{muf}_1, \ldots, \text{muf}_{i-1}\right] = \boldsymbol{\Sigma}_{i,1:i-1}\,\boldsymbol{\Sigma}_{1:i-1,\,1:i-1}^{-1}\,\textbf{muf}_{1:i-1}\]

\[\text{Var}\left(\text{muf}_i \mid \text{muf}_1, \ldots, \text{muf}_{i-1}\right) = \sigma_{ii}^2 - \boldsymbol{\Sigma}_{i,1:i-1}\,\boldsymbol{\Sigma}_{1:i-1,\,1:i-1}^{-1}\,\boldsymbol{\Sigma}_{1:i-1,\,i}\]
Picard showed that a convenient way to calculate the above expression is through the Cholesky decomposition. Since Picard fully derives the relationship between the expression above and the Cholesky decomposition, we refrain from repeating that work here. The final expression for SITMUF becomes

\[\text{SITMUF}_i = \left[\boldsymbol{\text{C}_i}^{-1}\,\textbf{muf}_{1:i}\right]_i\]
Where \( \boldsymbol{\text{C}_i}\) is the lower triangular Cholesky factor of the covariance matrix.
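A minimal sketch of the batch form of this transform (simplified relative to MAPIT’s NRTA-style implementation described below), assuming `muf` is a length-\(n\) array of material balances and `Sigma` is the corresponding \(n \times n\) covariance matrix:

```python
import numpy as np

def sitmuf_transform(muf, Sigma):
    """Standardized, independent transform of a muf sequence via the Cholesky factor."""
    C = np.linalg.cholesky(Sigma)     # lower-triangular factor such that Sigma = C @ C.T
    return np.linalg.solve(C, muf)    # SITMUF = C^{-1} muf
```

When the covariance model matches the true error structure, the resulting sequence has approximately zero mean, unit variance, and no correlation between balance periods.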
Calculation of the covariance matrix#
The covariance matrix itself is often calculated using relative standard deviations, similar to the calculation for \(\sigma\)muf. In fact, the diagonal terms (i.e., \(\Sigma_{1,1}, \Sigma_{2,2}, \ldots\)) are the variances of the individual material balances (the covariance of material balance \(i\) with itself is its variance). Recall the expression that was derived for \(\sigma\)muf.
Note
The covariance terms that appear in the variance expression are also included in the covariance matrix calculation.
The off-diagonal entries are calculated in a similar manner but contain more terms; the off-diagonal entry \(\Sigma_{i,j}\) is the covariance between material balances \(i\) and \(j\).
Following the same rules used to derive \(\sigma\)muf leads to the expression for the off-diagonal covariance:
Where
Note
The goal of the SITMUF transform is to produce a standardized, independent sequence.
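Although MAPIT builds the covariance matrix analytically from the error models, the structure described above can also be illustrated empirically by estimating the covariance across Monte Carlo error realizations. The sketch below uses purely illustrative numbers; the shared systematic error is what produces the off-diagonal covariance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for an applied-error simulation: 5000 realizations of a
# 3-period material balance sequence with a systematic error shared across periods
iterations, n = 5000, 3
systematic = rng.normal(scale=0.5, size=(iterations, 1))    # shared systematic component
random_err = rng.normal(scale=1.0, size=(iterations, n))    # independent random component
mb_samples = systematic + random_err

Sigma_hat = np.cov(mb_samples, rowvar=False)   # (n, n) sample covariance matrix
sigma_muf = np.sqrt(np.diag(Sigma_hat))        # diagonal: per-balance standard deviations
print(Sigma_hat.round(2))                      # off-diagonals reflect the shared systematic error
```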
Discussion#
The Cholesky-based approach above has the property that the variance of SITMUF decreases with time as the estimated covariance matrix approaches the true value. This is the implementation used in MAPIT; however, one could also perform a yearly SITMUF transform, in which the transform is applied only after the covariance matrix has been well approximated from a year’s worth of material balances.
Implementation#
Important
Note that this function is not intended to be used standalone through direct calls; rather, it is designed to be called through the `MBArea` class of `StatsProcessor`.
The SITMUF implementation in MAPIT is particularly computationally intensive because an “NRTA”-style calculation is used. In a simulation setting, the entire covariance matrix could be calculated at once with entries from all balance periods. However, an “NRTA”-style calculation performs the SITMUF transform with a reduced covariance matrix that grows as new observations are added. This results in a decrease in the variance of SITMUF over time.
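Schematically, the growing-matrix idea can be sketched as follows (simplified: in MAPIT the covariance entries themselves are also recalculated at each balance as new information arrives, whereas `Sigma` is assumed fixed here):

```python
import numpy as np

def sitmuf_nrta(muf, Sigma):
    """NRTA-style SITMUF: at balance P, factor only the leading PxP covariance block."""
    n = len(muf)
    sitmuf = np.zeros(n)
    for P in range(1, n + 1):
        C = np.linalg.cholesky(Sigma[:P, :P])             # covariance block grows over time
        sitmuf[P - 1] = np.linalg.solve(C, muf[:P])[-1]   # newest standardized, independent value
    return sitmuf
```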
The covariance matrix is an \(N \times N\) matrix at the \(N\)-th material balance period. Since MAPIT adopts the “NRTA”-style calculation, all \(N \times N\) entries must be updated at each balance period, which simulates the arrival of new information. The MAPIT SITMUF calculation is not well vectorized because the \(N \times N\) matrix must be resized and recalculated at each balance. The calculation starts by looping over balance periods and over each entry in the covariance matrix at balance \(P\):
691 np.min(
692 (time1,time2,time3)))
The variables for the different times have different meanings than in the expressions defined above. This is for legacy purposes and to improve alignment with the original papers. The following table describes the mapping between the derived expressions and the associated code:
| Model Component | Code Expression |
|---|---|
| Balance \(i\) | `I`, `I_time` |
| Balance \(j\) | `IPrime`, `IPrime_time` |
For simplicity, the diagonal and off-diagonal terms are broken into multiple components.
Diagonal terms#
Term 1
728
729 if doTQDM and ispar == False:
730 pbar = tqdm(desc="SITMUF", total=int(totalloops), leave=True, bar_format = "{desc:10}: {percentage:06.2f}% |{bar}| [Elapsed: {elapsed} || Remaining: {remaining}]", ncols=None)
731 #pbar = tqdm(desc="", total=int(totalloops), leave=True, bar_format = "{desc:10}: {n_fmt}/{total_fmt} ({percentage:03.2f}%) |{bar}| [Elapsed: {elapsed} || Remaining: {remaining}]")
Term 2
748 if doTQDM:
749 if ispar == False:
750 pbar.update(1)
751
752 #it's easier to implement SITMUF using the actual indicies
753 #from the statistical papers rather than the python loop variables
754 #so this translates python loop variables to indicides and times
Term 3
742 if doTQDM:
743 if ispar == False:
744 pbar.update(1)
745
746 #it's easier to implement SITMUF using the actual indicies
747 #from the statistical papers rather than the python loop variables
748 #so this translates python loop variables to indicides and times
749 #more closely aligned to the papers
750 I = j + 1
751 IPrevious = j - 1
752 IPrime = P
753 IPrimePrevious = P-1
754
755 I_time = float(I*MBP)
756 IPrevious_time = float(IPrevious*MBP)
757 IPrime_time = float(IPrime*MBP)
758 IPrimePrevious_time = float(IPrimePrevious*MBP)
759
Term 4
742 if doTQDM:
743 if ispar == False:
744 pbar.update(1)
745
746 #it's easier to implement SITMUF using the actual indicies
747 #from the statistical papers rather than the python loop variables
748 #so this translates python loop variables to indicides and times
749 #more closely aligned to the papers
750 I = j + 1
751 IPrevious = j - 1
752 IPrime = P
753 IPrimePrevious = P-1
754
755 I_time = float(I*MBP)
756 IPrevious_time = float(IPrevious*MBP)
757 IPrime_time = float(IPrime*MBP)
758 IPrimePrevious_time = float(IPrimePrevious*MBP)
759
Term 5
739 if decompStatus == 1:
740
741 for P in range(1,int(MBPs)):
742 for j in range(0,P):
743
744 if GUIObject is not None:
Off-diagonal terms#
Term 1
808
809 if j != 0:
810 for k in range(len(inventoryAppliedError)):
811 locMatrixRow = k + len(inputAppliedError)
812 startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) -IPrevious_time).argmin()
813 endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()
814
815 term4 += inventoryAppliedError[k][:,startIdx]**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
816 term5 += inventoryAppliedError[k][:,startIdx] * inventoryAppliedError[k][:,endIdx] * ErrorMatrix[locMatrixRow, 1]**2
817
818 covmatrix[:,j,j] = term1 + term2 + term3 + term4 - 2 * term5
Term 2
808
809 if j != 0:
810 for k in range(len(inventoryAppliedError)):
811 locMatrixRow = k + len(inputAppliedError)
812 startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) -IPrevious_time).argmin()
813 endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()
814
815 term4 += inventoryAppliedError[k][:,startIdx]**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
816 term5 += inventoryAppliedError[k][:,startIdx] * inventoryAppliedError[k][:,endIdx] * ErrorMatrix[locMatrixRow, 1]**2
817
818 covmatrix[:,j,j] = term1 + term2 + term3 + term4 - 2 * term5
819
820 #------------ Off-diagonal terms ------------#
821 else:
822 term1 = np.zeros((iterations,))
823 term2 = np.zeros((iterations,))
824 term3 = np.zeros((iterations,))
825 term4 = np.zeros((iterations,))
826 term5 = np.zeros((iterations,))
827
828 term3a = np.zeros((iterations,))
Term 3
808
809 if j != 0:
810 for k in range(len(inventoryAppliedError)):
811 locMatrixRow = k + len(inputAppliedError)
812 startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) -IPrevious_time).argmin()
813 endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()
814
815 term4 += inventoryAppliedError[k][:,startIdx]**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
816 term5 += inventoryAppliedError[k][:,startIdx] * inventoryAppliedError[k][:,endIdx] * ErrorMatrix[locMatrixRow, 1]**2
817
818 covmatrix[:,j,j] = term1 + term2 + term3 + term4 - 2 * term5
819
820 #------------ Off-diagonal terms ------------#
821 else:
822 term1 = np.zeros((iterations,))
823 term2 = np.zeros((iterations,))
824 term3 = np.zeros((iterations,))
825 term4 = np.zeros((iterations,))
826 term5 = np.zeros((iterations,))
827
828 term3a = np.zeros((iterations,))
829 term3b = np.zeros((iterations,))
830 term3c = np.zeros((iterations,))
831
832 term4a = np.zeros((iterations,))
833 term4b = np.zeros((iterations,))
834 term4c = np.zeros((iterations,))
835
836 term5a = np.zeros((iterations,))
837 term5b = np.zeros((iterations,))
838 term5c = np.zeros((iterations,))
Term 4
787 for k in range(len(outputAppliedError)):
788
789 logicalInterval = np.logical_and(processedOutputTimes[k] >= IPrevious_time,processedOutputTimes[k] <= I_time).reshape((-1,))
790 locMatrixRow = k + len(inputAppliedError) + len(inventoryAppliedError)
791
792 if outputTypes[k] == 'continuous':
793 term2 += AuxFunctions.trapSum(logicalInterval,processedOutputTimes[k],outputAppliedError[k])**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
794 elif outputTypes[k] == 'discrete':
Term 5
797 raise Exception("outputTypes[j] is not 'continuous' or 'discrete'")
798
799 #------------ Inventory terms ------------#
800 for k in range(len(inventoryAppliedError)):
801 locMatrixRow = k+len(inputAppliedError)
802
803 startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - IPrevious_time).argmin()
804 endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()
805