Preprocessing



MAPIT.core.Preprocessing.SimErrors(rawData, ErrorMatrix, iterations, GUIObject=None, doTQDM=True, batchSize=10, dopar=False, bar=None, times=None, calibrationPeriod=None)

Function to add simulated measurement error. Supports variable sample rates. Assumes the traditional multiplicative measurement error model:

\(M_{i,j} = T(1+R_{i,j}+S_j)\)

Random errors: \(R_{i,j} \sim \mathcal{N}(0,{\delta_R}_j^2)\)

Systematic errors: \(S_{j} \sim \mathcal{N}(0,{\delta_S}_j^2)\)

where \(M_{i,j}\) is the measured value, \(T\) is the true value, \(i\) is the measurement time, and \(j\) is the location.
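The model above can be sketched in a few lines of NumPy for a single location. This is an illustration of the equation only, not MAPIT's implementation; the function name `apply_multiplicative_error` and its arguments are hypothetical.

```python
import numpy as np

def apply_multiplicative_error(true_values, rand_rsd, sys_rsd, rng):
    """Illustrative only: M_i = T_i * (1 + R_i + S) for one location."""
    # Random error: a fresh draw for every measurement time i
    random_err = rng.normal(0.0, rand_rsd, size=true_values.shape)
    # Systematic error: a single draw shared by all samples at this location
    systematic_err = rng.normal(0.0, sys_rsd)
    return true_values * (1.0 + random_err + systematic_err)

rng = np.random.default_rng(0)
true_values = np.full(5, 100.0)
measured = apply_multiplicative_error(true_values, rand_rsd=0.01, sys_rsd=0.02, rng=rng)

# With no random error, every sample carries the same relative bias
biased_only = apply_multiplicative_error(true_values, 0.0, 0.02, np.random.default_rng(1))
```

Because the systematic term is drawn once, it shifts all samples at a location by the same relative amount, while the random term scatters each sample independently.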

Example:

import numpy as np
from MAPIT.core.Preprocessing import SimErrors

# Two locations, each with 10 samples of a single element
rawData = [np.random.rand(10, 1), np.random.rand(10, 1)]

# [location1 (random, systematic), location2 (random, systematic)]
ErrorMatrix = np.array([[0.1, 0.2], [0.3, 0.4]])
iterations = 100

result = SimErrors(rawData, ErrorMatrix, iterations)

print(result[0].shape)
# (100, 10)
Parameters:
  • rawData (list of ndarray) – Raw data to apply errors to, as a list of 2D ndarrays. Each entry in the list corresponds to a different location, and each ndarray should have shape [MxN], where M is the sample dimension (number of samples) and N is the elemental dimension, if applicable. If only one element is considered, each ndarray in the rawData list should be [Mx1].

  • ErrorMatrix (ndarray) – 2D ndarray of shape [Lx2] describing the relative standard deviations to apply to rawData, where L is the number of locations (i.e., the length of the rawData list). The second dimension (of size 2) holds the random and systematic errors respectively, such that ErrorMatrix[0,0] is the random relative standard deviation of the first location and ErrorMatrix[0,1] is the systematic relative standard deviation of the first location.

  • iterations (int) – Number of error realizations (iterations) to simulate.

  • GUIObject (obj, default=None) – GUI object for internal MAPIT use

  • doTQDM (bool, default=True) – Controls the use of TQDM progress bar for command line or notebook operation.

  • batchSize (int, default=10) – Batch size for parallel processing.

  • dopar (bool, default=False) – Controls the use of parallel processing.

  • times (list of ndarray, default=None) – List of ndarrays, one per location, each of shape [Mx1], describing the time of each sample in rawData. Required if calibrationPeriod is provided.

  • calibrationPeriod (list of float, default=None) – List of floats, one per location in rawData, describing the calibration period for each location. Required if times is provided.

Returns:

List of arrays with one entry per location in rawData. In the example above, each entry has shape (iterations, M). A list is returned so that each location can have a different sample rate.

Return type:

list
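The shape requirements in the parameter list can be checked before calling SimErrors. The sketch below is a hypothetical helper (`check_sim_inputs` is not part of MAPIT) and assumes, following the example above, that ErrorMatrix has one (random, systematic) row per location and that times and calibrationPeriod must be supplied together.

```python
import numpy as np

def check_sim_inputs(rawData, ErrorMatrix, times=None, calibrationPeriod=None):
    """Hypothetical helper: validates the documented input shapes."""
    n_loc = len(rawData)
    if any(arr.ndim != 2 for arr in rawData):
        raise ValueError("each rawData entry must be a 2D array of shape [MxN]")
    # Assumption: one (random, systematic) row per location
    if ErrorMatrix.shape != (n_loc, 2):
        raise ValueError("ErrorMatrix must have shape [number of locations x 2]")
    # times and calibrationPeriod are documented as requiring each other
    if (times is None) != (calibrationPeriod is None):
        raise ValueError("times and calibrationPeriod must be provided together")
    if times is not None:
        if len(times) != n_loc or len(calibrationPeriod) != n_loc:
            raise ValueError("times and calibrationPeriod need one entry per location")
        for arr, t in zip(rawData, times):
            if t.shape != (arr.shape[0], 1):
                raise ValueError("each times entry must be [Mx1], matching rawData")
    return n_loc

# Two locations with different sample rates (10 and 20 samples)
rawData = [np.random.rand(10, 1), np.random.rand(20, 1)]
ErrorMatrix = np.array([[0.1, 0.2], [0.3, 0.4]])
times = [np.arange(10.0).reshape(-1, 1), np.arange(20.0).reshape(-1, 1)]
calibrationPeriod = [5.0, 10.0]

n_locations = check_sim_inputs(rawData, ErrorMatrix, times, calibrationPeriod)
```

Running a check like this first turns a shape mismatch into an explicit error message rather than a failure inside the simulation.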

MAPIT.core.Preprocessing.calcBatchError(calibrationPeriod, ErrorMatrix, batchSize, times, loc, dim0shape)

Calculate batch error for a given location.

Parameters:
  • calibrationPeriod (numpy array or None) – Calibration period for each location.

  • ErrorMatrix (numpy array) – Matrix containing error values (RSD) for each location.

  • batchSize (int) – Size of the batch.

  • times (numpy array) – Array of time values for each location.

  • loc (int) – Index of the current location.

  • dim0shape (int) – Shape of the first dimension of the raw data array.

Returns:
  • randRSD (numpy array) – Random error array.

  • sysRSD (numpy array) – Systematic error array.
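The documentation does not spell out how calibrationPeriod enters the error calculation. One plausible reading, shown here purely as an assumption, is that a new systematic error is drawn at the start of each calibration period. The function `systematic_error_by_period` is hypothetical and is not MAPIT's calcBatchError.

```python
import numpy as np

def systematic_error_by_period(times, calibration_period, sys_rsd, rng):
    """Hypothetical sketch: one systematic draw per calibration period."""
    t = np.asarray(times, dtype=float).ravel()
    # Index of the calibration period each sample falls into (0, 1, 2, ...)
    period_idx = np.floor(t / calibration_period).astype(int)
    # Assumption: systematic error is re-sampled once per period
    draws = rng.normal(0.0, sys_rsd, size=period_idx.max() + 1)
    # Expand back so every sample carries the draw for its period
    return draws[period_idx]

rng = np.random.default_rng(0)
times = np.arange(10.0).reshape(-1, 1)  # samples at t = 0, 1, ..., 9
sys_err = systematic_error_by_period(times, calibration_period=5.0, sys_rsd=0.02, rng=rng)
```

Under this reading, samples at t = 0 to 4 share one systematic error and samples at t = 5 to 9 share another.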