Covariance Matrix#

Historical context#

In general, a covariance matrix describes the pairwise relationships between the elements of a random vector, here taken to follow a multivariate normal distribution. It is a square, symmetric matrix of size $n \times n$, where $n$ is the number of elements in the distribution, and is defined as follows:

Σ = \begin{bmatrix} σ_{11} & σ_{12} & σ_{13} & \dots & σ_{1 n} \\ σ_{21} & σ_{22} & σ_{23} & \dots & σ_{2 n} \\ σ_{31} & σ_{32} & σ_{33} & \dots & σ_{3 n} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ σ_{n 1} & σ_{n 2} & σ_{n 3} & \dots & σ_{n n} \end{bmatrix}
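As a minimal illustration (not part of MAPIT), the sketch below estimates a covariance matrix from draws of a made-up three-element normal vector and checks that it is square and symmetric. The matrix `true_cov`, the seed, and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance of a 3-element random vector (symmetric, positive definite)
true_cov = np.array([[4.0, 1.2, 0.0],
                     [1.2, 2.0, 0.5],
                     [0.0, 0.5, 1.0]])

# Each row is one draw of the vector, each column one element of the vector
samples = rng.multivariate_normal(mean=np.zeros(3), cov=true_cov, size=5000)

# rowvar=False tells NumPy that variables are stored in columns
sample_cov = np.cov(samples, rowvar=False)

print(sample_cov.shape)                      # n x n
print(np.allclose(sample_cov, sample_cov.T)) # symmetric
```

The diagonal of `sample_cov` holds the sample variance of each element, and the off-diagonal entries hold the pairwise covariances.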

The covariance matrix for the material balance sequence is of particular importance for several statistical tests, including SITMUF and GEMUF. The exact matrix cannot be known in practice, but it can be estimated from the expected sensor performance.

Recall that the muf sequence is defined as follows:

muf = \{ {muf}_{0}, {muf}_{1}, \dots, {muf}_{n} \}

With

{muf}_{i} = \sum_{l \in l_{0}} \int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} I_{t, l} - \sum_{l \in l_{1}} \int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} O_{t, l} - \sum_{l \in l_{2}} (C_{i, l} - C_{i - 1, l})
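A minimal sketch of this expression, assuming every location shares one time grid, flows are integrated with the trapezoidal rule, and inventories are read at the sample closest to each balance time. The function names and the test data are hypothetical and are not taken from MAPIT.

```python
import numpy as np

def integrate(times, flow, t_start, t_end):
    """Trapezoidal integral of `flow` using the samples inside [t_start, t_end]."""
    mask = (times >= t_start) & (times <= t_end)
    t, y = times[mask], flow[mask]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def material_balance(i, mbp, times, input_flows, output_flows, inventories):
    """muf_i = inputs - outputs - inventory change between MBP_{i-1} and MBP_i."""
    t0, t1 = mbp[i - 1], mbp[i]
    total_in = sum(integrate(times, f, t0, t1) for f in input_flows)
    total_out = sum(integrate(times, f, t0, t1) for f in output_flows)
    # Inventory is taken at the sample nearest to each balance time
    idx0 = np.abs(times - t0).argmin()
    idx1 = np.abs(times - t1).argmin()
    delta_inv = sum(c[idx1] - c[idx0] for c in inventories)
    return total_in - total_out - delta_inv

times = np.arange(0.0, 21.0)          # hypothetical time grid
mbp = [0.0, 10.0, 20.0]               # balance period boundaries
inputs = [np.full_like(times, 2.0)]   # one input stream, constant rate
outputs = [np.full_like(times, 1.5)]  # one output stream, constant rate

# Inventory that absorbs the full difference: the balance closes
closed = material_balance(1, mbp, times, inputs, outputs, [0.5 * times])
# Inventory that grows more slowly: some material is unaccounted for
open_ = material_balance(1, mbp, times, inputs, outputs, [0.4 * times])
print(closed, open_)
```

With error-free data the first balance closes, while the second leaves a nonzero muf because the inventory does not account for all of the net input.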

It’s generally assumed that, since the error models are normally distributed, the individual muf values (i.e., ${muf}_{i}$) and the muf sequence (i.e., muf) are also normally distributed. Consequently, the muf sequence can be thought of as a multivariate normal distribution such that:

muf \sim N (μ, Σ)

The covariance matrix contains the covariance between different material balances in the sequence. For example, consider the entry $σ_{2 n}^{2}$ of the covariance matrix below. This term is the covariance between material balances $2$ and $n$.

Σ = \begin{bmatrix} σ_{11}^{2} & σ_{12}^{2} & \dots & σ_{1 n}^{2} \\ σ_{21}^{2} & σ_{22}^{2} & \dots & σ_{2 n}^{2} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ σ_{n 1}^{2} & σ_{n 2}^{2} & \dots & σ_{n n}^{2} \end{bmatrix} = \begin{bmatrix} Σ_{i - 1} & σ_{i - 1} \\ σ_{i - 1}^{T} & σ_{i, i} \end{bmatrix} \tag{1}
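The block form on the right-hand side can be sketched with NumPy. The numbers below are hypothetical. They only show how the matrix for one more balance period is assembled from the previous matrix, a column of new covariances, and the new variance.

```python
import numpy as np

# Hypothetical covariance matrix after two balance periods (Σ_{i-1})
Sigma_prev = np.array([[2.0, 0.3],
                       [0.3, 1.5]])

# Hypothetical covariances of the new balance with the two earlier balances (σ_{i-1})
sigma_vec = np.array([[0.1],
                      [0.4]])

# Hypothetical variance of the new balance (σ_{i,i})
sigma_ii = np.array([[1.8]])

# Assemble the 3 x 3 matrix from its four blocks
Sigma = np.block([[Sigma_prev, sigma_vec],
                  [sigma_vec.T, sigma_ii]])
print(Sigma)
```

Because the lower-left block is the transpose of the upper-right block, the assembled matrix stays symmetric as it grows.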

The covariance matrix itself is often calculated using relative standard deviations, similar to the calculation for $σ$ muf. In fact, the diagonal terms (i.e., $Σ_{1, 1}, Σ_{2, 2}, \dots$) are the variances of the material balances (the covariance of material balance $i$ with itself is its variance). There are two key expressions: one for the diagonal entries (i.e., $σ_{x, x}$) and one for the off-diagonal entries (i.e., $σ_{x, x^{'}}$). Both are obtained by applying variance and covariance rules and propagating the terms.

Recall the expression that was derived for $σ$ muf, which is used for the diagonal of the covariance matrix:

\begin{aligned} σ_{i, i}^{2} \approx & \sum_{l \in l_{0}} ({(\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} I_{t, l})}^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ + & \sum_{l \in l_{1}} ({(\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} O_{t, l})}^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ + & \sum_{l \in l_{2}} ((C_{i, l})^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ + & \sum_{l \in l_{2}} ((C_{i - 1, l})^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ - & \sum_{l \in l_{2}} (2 C_{i - 1, l} C_{i, l} (δ_{S, l})^{2}) \end{aligned}

The off-diagonal is calculated in a similar manner, but has more terms. The off-diagonal is the covariance between material balance $i$ and $j$ .

\begin{aligned} σ_{i, j}^{2} \approx \operatorname{cov} ( & \sum_{l \in l_{0}} \int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} I_{t, l} - \sum_{l \in l_{1}} \int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} O_{t, l} - \sum_{l \in l_{2}} (C_{i, l} - C_{i - 1, l}), \\ & \sum_{l \in l_{0}} \int_{t = {MBP}_{j - 1}}^{{MBP}_{j}} I_{t, l} - \sum_{l \in l_{1}} \int_{t = {MBP}_{j - 1}}^{{MBP}_{j}} O_{t, l} - \sum_{l \in l_{2}} (C_{j, l} - C_{j - 1, l})) \end{aligned}

Following the same rules used to derive $σ$ muf leads to the expression for the covariance off diagonal:

\begin{aligned} σ_{i, j}^{2} \approx & \sum_{l \in l_{2}} ((C_{i, l} C_{j, l} + C_{i - 1, l} C_{j - 1, l}) * (δ_{S, l})^{2}) \\ - & \sum_{l \in l_{2}} ((C_{i, l} C_{j - 1, l}) * ((δ_{S, l})^{2} + P (j - 1 == i) * (δ_{R, l})^{2})) \\ - & \sum_{l \in l_{2}} ((C_{i - 1, l} C_{j, l}) * ((δ_{S, l})^{2} + P (i - 1 == j) * (δ_{R, l})^{2})) \\ + & \sum_{l \in l_{0}} ((\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} I_{t, l}) (\int_{t = {MBP}_{j - 1}}^{{MBP}_{j}} I_{t, l}) (δ_{S, l})^{2}) \\ + & \sum_{l \in l_{1}} ((\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} O_{t, l}) (\int_{t = {MBP}_{j - 1}}^{{MBP}_{j}} O_{t, l}) (δ_{S, l})^{2}) \end{aligned}

Where $P (x)$ is an indicator that equals 1 when the condition $x$ is true and 0 otherwise:

P (x) \equiv \begin{cases} 0 & x \text{ is false} \\ 1 & x \text{ is true} \end{cases}
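A minimal sketch of the off-diagonal expression, assuming the integrated transfers and the inventories for balances $i$ and $j$ are already available as arrays with one entry per location. The function name, the argument names, and the column order of the error arrays (column 0 for $δ_{R}$, column 1 for $δ_{S}$) are assumptions made for illustration. The Boolean comparisons play the role of the indicator $P$.

```python
import numpy as np

def off_diagonal_covariance(i, j, in_i, in_j, out_i, out_j,
                            c_i, c_im1, c_j, c_jm1,
                            err_in, err_out, err_inv):
    """Covariance between balances i and j (i != j).

    in_*/out_*: integrated transfers per location for balance i or j.
    c_i, c_im1, c_j, c_jm1: inventories at MBP_i, MBP_{i-1}, MBP_j, MBP_{j-1}.
    err_*: arrays of shape (n_locations, 2) holding (delta_R, delta_S).
    """
    inv_r2, inv_s2 = err_inv[:, 0] ** 2, err_inv[:, 1] ** 2

    cov = np.sum((c_i * c_j + c_im1 * c_jm1) * inv_s2)
    # The random error only contributes when both factors are the same measurement
    cov -= np.sum(c_i * c_jm1 * (inv_s2 + (j - 1 == i) * inv_r2))
    cov -= np.sum(c_im1 * c_j * (inv_s2 + (i - 1 == j) * inv_r2))
    # Transfers in different periods share only the systematic error
    cov += np.sum(in_i * in_j * err_in[:, 1] ** 2)
    cov += np.sum(out_i * out_j * err_out[:, 1] ** 2)
    return float(cov)

err = np.array([[0.01, 0.01]])  # hypothetical 1 % random and 1 % systematic error
a = np.array
cov_12 = off_diagonal_covariance(1, 2, a([100.0]), a([100.0]), a([90.0]), a([90.0]),
                                 a([50.0]), a([40.0]), a([60.0]), a([50.0]),
                                 err, err, err)
cov_21 = off_diagonal_covariance(2, 1, a([100.0]), a([100.0]), a([90.0]), a([90.0]),
                                 a([60.0]), a([50.0]), a([50.0]), a([40.0]),
                                 err, err, err)
print(cov_12, cov_21)
```

Swapping the roles of $i$ and $j$ gives the same value, as expected for a symmetric matrix.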

Implementation#

The covariance matrix is an $N \times N$ matrix at the $N$-th material balance period. Since MAPIT adopts the “NRTA” style calculation, all $N \times N$ entries must be updated at each balance period, which simulates the arrival of new information. The MAPIT SITMUF calculation is not well vectorized, as the $N \times N$ matrix must be resized and recalculated at each balance. The calculation starts by looping over the balance periods and over each entry in the covariance matrix at the current balance period:

AuxFunctions.py

    for currentMB in range(1, int(totalMBPs)):
        for j in range(0, currentMB):
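To illustrate why this approach is costly, the sketch below rebuilds the whole matrix at each balance period. `entry` is a hypothetical callback standing in for the diagonal and off-diagonal expressions. It is not a MAPIT function, and the loop body is an assumption about the general pattern rather than a copy of `AuxFunctions.py`.

```python
import numpy as np

def build_sequence(total_mbps, entry):
    """Return one covariance matrix per balance period, each rebuilt from scratch."""
    matrices = []
    for current_mb in range(1, int(total_mbps)):
        # The matrix grows by one row and one column at every balance period
        cov = np.zeros((current_mb, current_mb))
        for j in range(current_mb):
            cov[j, j] = entry(j, j)                   # diagonal: variance
            for j_prime in range(j):
                value = entry(j, j_prime)             # off-diagonal: covariance
                cov[j, j_prime] = cov[j_prime, j] = value
        matrices.append(cov)
    return matrices

# Hypothetical entry function: unit variance, covariance decaying with distance
matrices = build_sequence(4, lambda i, j: 1.0 if i == j else 0.5 ** abs(i - j))
print([m.shape for m in matrices])
```

Since every matrix is recomputed in full, the number of entry evaluations grows roughly with the cube of the number of balance periods.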

The time variables in the code use different names than the indices in the expressions defined above. This is for legacy reasons and to improve alignment with the papers. The following table maps the derived expressions to the associated code:

Model Component	Code Expression
Balance $i$	`I`
Balance $j$	`IPrime`

For simplicity, the diagonal and off-diagonal terms are broken into multiple components.

Diagonal terms#

\begin{aligned} σ_{i, i}^{2} \approx & \sum_{l \in l_{0}} ({(\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} I_{t, l})}^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ + & \sum_{l \in l_{1}} ({(\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} O_{t, l})}^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ + & \sum_{l \in l_{2}} ((C_{i, l})^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ + & \sum_{l \in l_{2}} ((C_{i - 1, l})^{2} * ((δ_{R, l})^{2} + (δ_{S, l})^{2})) \\ - & \sum_{l \in l_{2}} (2 C_{i - 1, l} C_{i, l} (δ_{S, l})^{2}) \end{aligned}
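As a rough illustration of the five terms in this expression, the sketch below evaluates them for already-integrated transfers and inventories. The function and argument names are hypothetical. The error arrays are assumed to hold $δ_{R}$ in column 0 and $δ_{S}$ in column 1, which mirrors how `ErrorMatrix` is indexed in the excerpts that follow.

```python
import numpy as np

def diagonal_variance(in_int, out_int, c_now, c_prev, err_in, err_out, err_inv):
    """Variance of one material balance.

    in_int/out_int: integrated inputs/outputs per location over the balance period.
    c_now/c_prev: inventories per location at MBP_i and MBP_{i-1}.
    err_*: arrays of shape (n_locations, 2) holding (delta_R, delta_S).
    """
    def total(err):
        # Combined relative variance: random plus systematic
        return err[:, 0] ** 2 + err[:, 1] ** 2

    term1 = np.sum(in_int ** 2 * total(err_in))
    term2 = np.sum(out_int ** 2 * total(err_out))
    term3 = np.sum(c_now ** 2 * total(err_inv))
    term4 = np.sum(c_prev ** 2 * total(err_inv))
    # Successive inventories share the systematic error only
    term5 = np.sum(c_prev * c_now * err_inv[:, 1] ** 2)
    return float(term1 + term2 + term3 + term4 - 2 * term5)

err = np.array([[0.01, 0.01]])  # hypothetical 1 % random and 1 % systematic error
variance = diagonal_variance(np.array([100.0]), np.array([90.0]),
                             np.array([50.0]), np.array([40.0]),
                             err, err, err)
print(variance)
```

As in the MAPIT excerpts, the factor 2 on the last term is applied only when the terms are summed.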

Term 1
AuxFunctions.py

                for k in range(len(inputAppliedError)):

                    logicalInterval = np.logical_and(processedInputTimes[k] >= IPrevious_time,processedInputTimes[k] <= I_time).reshape((-1,))  #select the indices for the relevant time

                    if inputTypes[k] == 'continuous':
                        term1 += trapSum(logicalInterval,processedInputTimes[k],inputAppliedError[k]) **2 * (ErrorMatrix[k, 0]**2 + ErrorMatrix[k, 1]**2)
                    elif inputTypes[k] == 'discrete':
                        term1 += (inputAppliedError[k][:, logicalInterval].sum(axis=1)**2 * (ErrorMatrix[k, 0]**2 + ErrorMatrix[k, 1]**2 ))
                    else:
                        raise Exception("inputTypes[k] is not 'continuous' or 'discrete'")
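The two branches above differ only in how the transfer over the balance period is totaled. The following sketch shows one possible way to do that. `window_total` is a hypothetical stand-in, and the trapezoidal rule is an assumption, since the implementation of `trapSum` is not shown here. The data layout (one row per iteration, one column per time step) is inferred from the indexing in the excerpt.

```python
import numpy as np

def window_total(times, values, t_start, t_end, kind):
    """Total transfer per iteration over [t_start, t_end].

    times: 1-D array of sample times.
    values: array of shape (n_iterations, n_times).
    kind: 'discrete' (batch amounts) or 'continuous' (flow rates).
    """
    mask = np.logical_and(times >= t_start, times <= t_end)
    if kind == 'discrete':
        # Batch transfers: add up the amounts that fall inside the window
        return values[:, mask].sum(axis=1)
    if kind == 'continuous':
        # Flow rates: integrate over time with the trapezoidal rule
        t, y = times[mask], values[:, mask]
        return np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(t), axis=1)
    raise ValueError("kind must be 'continuous' or 'discrete'")

times = np.arange(0.0, 11.0)
values = np.full((2, times.size), 3.0)   # two iterations, constant value 3

discrete_total = window_total(times, values, 0.0, 10.0, 'discrete')
continuous_total = window_total(times, values, 0.0, 10.0, 'continuous')

# Contribution of this location to term 1, with hypothetical 1 % errors
term1 = continuous_total ** 2 * (0.01 ** 2 + 0.01 ** 2)
print(discrete_total, continuous_total, term1)
```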

Term 2
AuxFunctions.py

                for k in range(len(outputAppliedError)):

                    logicalInterval = np.logical_and(processedOutputTimes[k] >= IPrevious_time,processedOutputTimes[k] <= I_time).reshape((-1,))
                    locMatrixRow = k + len(inputAppliedError) + len(inventoryAppliedError)                      

                    if outputTypes[k] == 'continuous':
                        term2 += trapSum(logicalInterval,processedOutputTimes[k],outputAppliedError[k])**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
                    elif outputTypes[k] == 'discrete':
                        term2 += (outputAppliedError[k][:, logicalInterval].sum(axis=1)**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2 ))
                    else:
                        raise Exception("outputTypes[k] is not 'continuous' or 'discrete'")

Term 3
AuxFunctions.py

                for k in range(len(inventoryAppliedError)):
                    locMatrixRow = k+len(inputAppliedError)

                    startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - IPrevious_time).argmin()
                    endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()

                    term3 += inventoryAppliedError[k][:,endIdx]**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)


                if j != 0:
                    for k in range(len(inventoryAppliedError)):
                        locMatrixRow = k + len(inputAppliedError)
                        startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) -IPrevious_time).argmin()
                        endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()

                        term4 += inventoryAppliedError[k][:,startIdx]**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
                        term5 += inventoryAppliedError[k][:,startIdx] * inventoryAppliedError[k][:,endIdx] * ErrorMatrix[locMatrixRow, 1]**2

                covmatrix[:,j,j] = term1 + term2 + term3 + term4 - 2 * term5

Term 4
AuxFunctions.py

                if j != 0:
                    for k in range(len(inventoryAppliedError)):
                        locMatrixRow = k + len(inputAppliedError)
                        startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) -IPrevious_time).argmin()
                        endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()

                        term4 += inventoryAppliedError[k][:,startIdx]**2 * (ErrorMatrix[locMatrixRow, 0]**2 + ErrorMatrix[locMatrixRow, 1]**2)
                        # term5 and the final assignment follow; see the full listing under Term 3

Term 5
AuxFunctions.py

The factor 2 is included later, when the terms are added and assigned to the covariance matrix.

                if j != 0:
                    for k in range(len(inventoryAppliedError)):
                        locMatrixRow = k + len(inputAppliedError)
                        startIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) -IPrevious_time).argmin()
                        endIdx = np.abs(processedInventoryTimes[k].reshape((-1,)) - I_time).argmin()

                        # term4 is accumulated in this loop as well; see the full listing under Term 3
                        term5 += inventoryAppliedError[k][:,startIdx] * inventoryAppliedError[k][:,endIdx] * ErrorMatrix[locMatrixRow, 1]**2

                covmatrix[:,j,j] = term1 + term2 + term3 + term4 - 2 * term5

Off-diagonal terms#

\begin{aligned} σ_{i, j}^{2} \approx & \sum_{l \in l_{0}} ((\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} I_{t, l}) (\int_{t = {MBP}_{j - 1}}^{{MBP}_{j}} I_{t, l}) (δ_{S, l})^{2}) \\ + & \sum_{l \in l_{1}} ((\int_{t = {MBP}_{i - 1}}^{{MBP}_{i}} O_{t, l}) (\int_{t = {MBP}_{j - 1}}^{{MBP}_{j}} O_{t, l}) (δ_{S, l})^{2}) \\ + & \sum_{l \in l_{2}} ((C_{i, l} C_{j, l} + C_{i - 1, l} C_{j - 1, l}) * (δ_{S, l})^{2}) \\ - & \sum_{l \in l_{2}} ((C_{i, l} C_{j - 1, l}) * ((δ_{S, l})^{2} + P (j - 1 == i) * (δ_{R, l})^{2})) \\ - & \sum_{l \in l_{2}} ((C_{i - 1, l} C_{j, l}) * ((δ_{S, l})^{2} + P (i - 1 == j) * (δ_{R, l})^{2})) \end{aligned}