pecos.metrics module

The metrics module contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis.

qci(mask, tfilter=None)

Compute the quality control index (QCI) for each column, defined as:

$QCI = \frac{\sum_{t \in T} X_{dt}}{|T|}$

where $T$ is the set of timestamps in the analysis, $X_{dt}$ is a data point for column $d$ at time $t$ that passed all quality control tests, and $|T|$ is the number of data points in the analysis.

Parameters:

mask (pandas DataFrame) – Test results mask, returned from pm.mask
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Quality control index

Return type:

pandas Series
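The QCI formula above can be sketched in plain pandas. This is an illustration of the definition, not the library's implementation; the mask values, column names, and time filter below are made up for the example.

```python
import pandas as pd

# Hypothetical mask: True means the data point passed every quality control test
index = pd.date_range("2024-01-01", periods=4, freq="D")
mask = pd.DataFrame(
    {"A": [True, True, False, True], "B": [True, True, True, True]},
    index=index,
)

# QCI per column: number of passing points divided by the number of timestamps
qci_all = mask.sum() / len(mask)

# Optional time filter: keep only the timestamps flagged True, then recompute
tfilter = pd.Series([True, False, True, True], index=index)
filtered = mask.loc[tfilter]
qci_filtered = filtered.sum() / len(filtered)

print(qci_all)
print(qci_filtered)
```

Column A fails one of four points, so its unfiltered QCI is 0.75. With the filter applied, only three timestamps remain in $T$, one of which fails.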

rmse(data1, data2, tfilter=None)

Compute the root mean squared error (RMSE) for each column, defined as:

$RMSE = \sqrt{\frac{\sum (data_{1} - data_{2})^{2}}{n}}$

where $data_{1}$ and $data_{2}$ are time series and $n$ is the number of data points.

Parameters:

data1 (pandas DataFrame) – Data
data2 (pandas DataFrame) – Data. Note, the column names in data1 must equal the column names in data2
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Root mean squared error

Return type:

pandas Series
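As a minimal sketch of the RMSE definition, the per-column value can be reproduced with pandas and NumPy. The two DataFrames are hypothetical and share the same column names, as the parameter note requires; this is not the library's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs with matching column names
data1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [0.0, 0.0, 0.0]})
data2 = pd.DataFrame({"A": [1.0, 2.0, 5.0], "B": [3.0, 4.0, 0.0]})

# Squared differences, averaged over the n rows of each column, then square-rooted
rmse_manual = np.sqrt(((data1 - data2) ** 2).mean())

print(rmse_manual)
```

For column A the squared differences are 0, 0, and 4, so the result is $\sqrt{4/3}$.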

time_integral(data, tfilter=None)

Compute the time integral (F) for each column, defined as:

$F = \int f \, dt$

where $f$ is a column of data and $dt$ is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapezoid. Results are given in [original data units]*seconds. NaN values are set to 0 for integration.

Parameters:

data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns:

Integral

Return type:

pandas Series
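The description above can be sketched as follows: convert the index to elapsed seconds, replace NaN with 0, and apply the trapezoidal rule per column. The data is hypothetical, and the fallback to numpy.trapz is an assumption added here so the sketch also runs on NumPy releases that predate numpy.trapezoid.

```python
import numpy as np
import pandas as pd

# numpy.trapezoid exists in NumPy 2.0 and later; older releases provide numpy.trapz
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Hypothetical data sampled once per minute
index = pd.to_datetime(
    ["2024-01-01 00:00:00", "2024-01-01 00:01:00", "2024-01-01 00:02:00"]
)
data = pd.DataFrame(
    {"A": [1.0, 1.0, 1.0], "B": [0.0, 2.0, 4.0], "C": [1.0, np.nan, 1.0]},
    index=index,
)

# Elapsed time in seconds since the first observation
t = (data.index - data.index[0]).total_seconds().to_numpy()

# NaN values are set to 0 before integrating, as described above
filled = data.fillna(0)
integral = pd.Series(
    {col: trapezoid(filled[col].to_numpy(), x=t) for col in filled.columns}
)

print(integral)
```

Column A is constant at 1 over 120 seconds, so its integral is 120 in [original data units]*seconds. Column C shows the effect of treating the missing midpoint as 0.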

time_derivative(data, tfilter=None)

Compute the derivative (f’) of each column, defined as:

$f' = \frac{df}{dt}$

where $f$ is a column of data and $dt$ is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.

Parameters:

data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns:

Derivative of the data

Return type:

pandas DataFrame
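A minimal sketch of the derivative, assuming the index is converted to elapsed seconds and passed to numpy.gradient as the coordinate array. The data is hypothetical and the code is an illustration rather than the library's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical data sampled once per minute
index = pd.to_datetime(
    ["2024-01-01 00:00:00", "2024-01-01 00:01:00", "2024-01-01 00:02:00"]
)
data = pd.DataFrame(
    {"A": [0.0, 60.0, 120.0], "B": [0.0, 60.0, 240.0]},
    index=index,
)

# Elapsed time in seconds since the first observation
t = (data.index - data.index[0]).total_seconds().to_numpy()

# Central differences at interior points, one-sided differences at the two ends
derivative = pd.DataFrame(
    {col: np.gradient(data[col].to_numpy(), t) for col in data.columns},
    index=data.index,
)

print(derivative)
```

Column A rises by 1 unit per second throughout. Column B shows how numpy.gradient falls back to one-sided differences at the first and last timestamps, where no central difference is available.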

probability_of_detection(observed, actual, tfilter=None)

Compute probability of detection (PD) for each column, defined as:

$PD = \frac{TP}{TP + FN}$

where $TP$ is the number of true positives and $FN$ is the number of false negatives.

Parameters:

observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual
tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns:

Probability of detection

Return type:

pandas Series
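The PD formula can be illustrated by counting confusion-matrix entries directly. The text above does not say which class counts as "positive"; this sketch assumes that a positive is an anomalous point (False), which may differ from the library's convention. The input values are hypothetical.

```python
import pandas as pd

# True = background, False = anomalous, following the parameter descriptions
observed = pd.DataFrame({"A": [True, False, False, True, False]})
actual = pd.DataFrame({"A": [True, False, True, False, False]})

# Assumption: "positive" means anomalous (False)
tp = (~observed & ~actual).sum()  # flagged anomalous and actually anomalous
fn = (observed & ~actual).sum()   # actually anomalous but not flagged

prob_detect = tp / (tp + fn)

print(prob_detect)
```

Under this assumption, two of the three actual anomalies are flagged, so PD for column A is 2/3.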

false_alarm_rate(observed, actual, tfilter=None)

Compute false alarm rate (FAR) for each column, defined as:

$FAR = \frac{TN}{TN + FP}$

where $TN$ is the number of true negatives and $FP$ is the number of false positives.

Parameters:

observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual
tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns:

False alarm rate

Return type:

pandas Series