pecos.metrics module

The metrics module contains functions that summarize the results of a quality control analysis or compute quantities that may be useful in that analysis.

qci(mask, tfilter=None)[source]

Compute the quality control index (QCI) for each column, defined as:

$$QCI = \frac{\sum_{t \in T} X_{dt}}{|T|}$$

where $T$ is the set of timestamps in the analysis, $X_{dt}$ is a data point for column $d$ at time $t$ that passed all quality control tests (so the sum counts the passing points), and $|T|$ is the number of data points in the analysis.

Parameters:
  • mask (pandas DataFrame) – Test results mask, returned from pm.mask

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Quality control index

Return type:

pandas Series
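
The arithmetic behind the QCI can be illustrated for a single column with a dependency-free sketch. This is not the pecos implementation: `qci_column`, `mask_column`, and `time_filter` are hypothetical names, a plain list of booleans stands in for one column of the mask, and the sketch assumes the time filter keeps the timestamps marked True.

```python
# Hypothetical stand-in for one column of a test results mask:
# True = passed every quality control test, False = failed at least one.
def qci_column(passed, keep=None):
    """Fraction of retained points that passed all tests."""
    if keep is None:
        keep = [True] * len(passed)
    # Assumption: the filter retains timestamps whose value is True.
    retained = [p for p, k in zip(passed, keep) if k]
    return sum(retained) / len(retained)

mask_column = [True, True, False, True]
time_filter = [True, True, True, False]  # drop the last timestamp

qci_all = qci_column(mask_column)                    # 3 passing / 4 points
qci_filtered = qci_column(mask_column, time_filter)  # 2 passing / 3 points
```

A QCI of 1 would mean every retained point passed all tests; lower values indicate a larger share of flagged data.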

rmse(data1, data2, tfilter=None)[source]

Compute the root mean squared error (RMSE) for each column, defined as:

$$RMSE = \sqrt{\frac{\sum (\mathrm{data1} - \mathrm{data2})^2}{n}}$$

where data1 and data2 are time series and n is the number of data points.

Parameters:
  • data1 (pandas DataFrame) – Data

  • data2 (pandas DataFrame) – Data. The column names in data1 must match the column names in data2.

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Root mean squared error

Return type:

pandas Series
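
As a worked instance of the RMSE definition, the following sketch evaluates the formula for one pair of columns using only the standard library. The function name `rmse_column` and the sample values are illustrative assumptions, not part of pecos.

```python
import math

def rmse_column(series1, series2):
    """Root mean squared error between two equal-length sequences."""
    if len(series1) != len(series2):
        raise ValueError("both series need the same number of points")
    squared_errors = [(a - b) ** 2 for a, b in zip(series1, series2)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

measured = [1.0, 2.0, 3.0, 4.0]
modeled = [1.0, 2.0, 5.0, 4.0]

# Only the third point differs (by 2), so RMSE = sqrt(4 / 4) = 1.0
error = rmse_column(measured, modeled)
```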

time_integral(data, tfilter=None)[source]

Compute the time integral (F) for each column, defined as:

$$F = \int f \, dt$$

where f is a column of data and dt is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapezoid. Results are given in [original data units]*seconds. NaN values are set to 0 before integration.

Parameters:
  • data (pandas DataFrame) – Data

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Integral

Return type:

pandas Series
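
The trapezoidal rule and the NaN handling described above can be sketched for one column without NumPy. The helper `trapezoid_integral` and the sample data are hypothetical; timestamps are assumed to be expressed as elapsed seconds.

```python
import math

def trapezoid_integral(values, seconds):
    """Trapezoidal-rule integral of values sampled at the given times (s)."""
    # Mirror the documented behaviour: NaN values are treated as 0.
    clean = [0.0 if math.isnan(v) else v for v in values]
    total = 0.0
    for i in range(1, len(clean)):
        dt = seconds[i] - seconds[i - 1]
        # Area of the trapezoid between two consecutive observations
        total += 0.5 * (clean[i] + clean[i - 1]) * dt
    return total

power_w = [10.0, 20.0, float("nan"), 20.0]  # watts, one missing reading
elapsed_s = [0, 60, 120, 180]               # seconds

# 900 + 600 + 600 = 2100 watt-seconds (joules)
energy = trapezoid_integral(power_w, elapsed_s)
```

Because the result carries the original units multiplied by seconds, integrating power in watts gives energy in joules. Replacing a NaN with 0 lowers the area of both neighbouring trapezoids, which is worth keeping in mind when gaps are long.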

time_derivative(data, tfilter=None)[source]

Compute the derivative (f’) of each column, defined as:

$$f' = \frac{df}{dt}$$

where f is a column of data and dt is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.

Parameters:
  • data (pandas DataFrame) – Data

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Derivative of the data

Return type:

pandas DataFrame
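
A minimal sketch of the central-difference scheme, assuming evenly spaced observations: interior points use the two neighbours, and the end points fall back to one-sided differences, which is what numpy.gradient does with its default settings. The function name and data are illustrative only.

```python
def central_difference(values, dt):
    """Derivative of evenly spaced samples; dt is the time step in seconds."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    derivative = []
    for i in range(n):
        if i == 0:
            # Forward difference at the first point
            slope = (values[1] - values[0]) / dt
        elif i == n - 1:
            # Backward difference at the last point
            slope = (values[-1] - values[-2]) / dt
        else:
            # Central difference at interior points
            slope = (values[i + 1] - values[i - 1]) / (2 * dt)
        derivative.append(slope)
    return derivative

temperature = [0.0, 60.0, 240.0, 540.0]  # one reading per minute
rate = central_difference(temperature, dt=60)  # units per second
```

The output has one value per input timestamp, which is consistent with the function returning a DataFrame rather than a Series.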

probability_of_detection(observed, actual, tfilter=None)[source]

Compute probability of detection (PD) for each column, defined as:

$$PD = \frac{TP}{TP + FN}$$

where TP is the number of true positives and FN is the number of false negatives.

Parameters:
  • observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask

  • actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). The column names in observed must match the column names in actual.

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Probability of detection

Return type:

pandas Series
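
The counting behind PD can be sketched for one column as follows. The text above does not say which class counts as "positive"; this sketch assumes that an anomalous point (False) is the positive class, so a true positive is a real anomaly that was flagged. That convention, the helper `confusion_counts`, and the sample data are all assumptions made for illustration.

```python
def confusion_counts(observed, actual):
    """Count TP, FN, FP, TN with anomalous (False) as the positive class.

    Assumption: "positive" means anomalous; True means background.
    """
    tp = fn = fp = tn = 0
    for obs, act in zip(observed, actual):
        if not act and not obs:
            tp += 1  # real anomaly, flagged
        elif not act and obs:
            fn += 1  # real anomaly, missed
        elif act and not obs:
            fp += 1  # normal point, flagged
        else:
            tn += 1  # normal point, passed
    return tp, fn, fp, tn

observed = [True, False, False, True, True, False]
actual = [True, False, True, True, False, False]

tp, fn, fp, tn = confusion_counts(observed, actual)
pd_value = tp / (tp + fn)  # 2 detected out of 3 real anomalies
```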

false_alarm_rate(observed, actual, tfilter=None)[source]

Compute false alarm rate (FAR) for each column, defined as:

$$FAR = \frac{TN}{TN + FP}$$

where TN is the number of true negatives and FP is the number of false positives.

Parameters:
  • observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask

  • actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). The column names in observed must match the column names in actual.

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

False alarm rate

Return type:

pandas Series
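
The sketch below evaluates the expression exactly as written above for one column, and also computes its complement, FP/(TN+FP), which is the quantity commonly called the false positive rate. As an assumption not confirmed by the text, anomalous (False) is treated as the positive class, so true negatives and false positives are both counted among points whose actual condition is background (True). Variable names and data are illustrative.

```python
observed = [True, False, False, True, True, False]
actual = [True, False, True, True, False, False]
pairs = list(zip(observed, actual))

# Assumption: True = background = "negative" class.
tn = sum(1 for obs, act in pairs if act and obs)      # normal, passed
fp = sum(1 for obs, act in pairs if act and not obs)  # normal, flagged

documented_expression = tn / (tn + fp)  # TN / (TN + FP) as written above
complement = fp / (tn + fp)             # FP / (TN + FP)
```

The two quantities always sum to 1, so it is worth checking against the function's actual output which of them a reported value corresponds to.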