pecos.metrics module
The metrics module contains metrics that describe the quality control analysis or compute quantities that may be useful in the analysis.
- qci(mask, tfilter=None)
Compute the quality control index (QCI) for each column, defined as:
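$$QCI = \dfrac{\sum_{t \in T} X_{dt}}{|T|}$$
where $T$ is the set of timestamps in the analysis, $X_{dt}$ is a data point for column $d$ and time $t$ that passed all quality control tests (counted as 1 if it passed, 0 otherwise), and $|T|$ is the number of data points in the analysis.
- Parameters: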
mask (pandas DataFrame) – Test results mask, returned from pm.mask
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns:
Quality control index
- Return type:
pandas Series
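As a rough illustration of the QCI formula, the sketch below computes, per column, the fraction of timestamps whose mask value is True. `qci_sketch` is a hypothetical helper written for this example, not the pecos implementation; it assumes `mask` is a boolean DataFrame (True = passed all tests) and that `tfilter`, if given, is a boolean Series aligned with `mask.index`.

```python
import pandas as pd


def qci_sketch(mask, tfilter=None):
    """Hypothetical QCI helper (not pecos code): share of True values per column."""
    if tfilter is not None:
        # Keep only the timestamps selected by the time filter
        mask = mask[tfilter]
    # Number of passing data points divided by the number of timestamps |T|
    return mask.sum() / len(mask.index)


index = pd.date_range("2024-01-01", periods=4, freq="60min")
mask = pd.DataFrame(
    {"A": [True, True, False, True], "B": [True, False, False, True]},
    index=index,
)
print(qci_sketch(mask))  # A: 0.75, B: 0.5
```

Column A passes 3 of 4 tests and column B passes 2 of 4, so the QCI values are 0.75 and 0.5.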
- rmse(data1, data2, tfilter=None)
Compute the root mean squared error (RMSE) for each column, defined as:
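$$RMSE = \sqrt{\dfrac{\sum (x_1 - x_2)^2}{n}}$$
where $x_1$ is a time series (a column of data1), $x_2$ is a time series (the matching column of data2), and $n$ is the number of data points.
- Parameters: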
data1 (pandas DataFrame) – Data
data2 (pandas DataFrame) – Data. The column names in data1 must equal the column names in data2.
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns:
Root mean squared error
- Return type:
pandas Series
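A minimal sketch of the RMSE formula, computed column by column. `rmse_sketch` is a hypothetical helper, not the pecos implementation, and it assumes the two DataFrames share the same index and column names and contain no missing values.

```python
import numpy as np
import pandas as pd


def rmse_sketch(data1, data2, tfilter=None):
    """Hypothetical RMSE helper (not pecos code), computed per column."""
    if tfilter is not None:
        data1 = data1[tfilter]
        data2 = data2[tfilter]
    # Subtraction aligns on index and column names, which must match
    diff = data1 - data2
    n = len(diff.index)
    # Square, average over the n data points, then take the square root
    return np.sqrt((diff ** 2).sum() / n)


index = pd.date_range("2024-01-01", periods=4, freq="60min")
data1 = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [0.0, 0.0, 0.0, 0.0]}, index=index)
data2 = pd.DataFrame({"A": [2.0, 3.0, 4.0, 5.0], "B": [3.0, -3.0, 3.0, -3.0]}, index=index)
print(rmse_sketch(data1, data2))  # A: 1.0, B: 3.0
```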
- time_integral(data, tfilter=None)
Compute the time integral (F) for each column, defined as:
$$F = \int f(t)\,dt$$
where $f(t)$ is a column of data and $dt$ is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapezoid. Results are given in [original data units]*seconds. NaN values are set to 0 before integration.
- Parameters:
data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns:
Integral
- Return type:
pandas Series
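The trapezoidal rule described above can be sketched as follows. `time_integral_sketch` is a hypothetical helper, not the pecos implementation; it writes the trapezoidal rule out explicitly instead of calling numpy.trapezoid (so it does not depend on the NumPy version), assumes a DatetimeIndex, and measures time in seconds from the first timestamp.

```python
import numpy as np
import pandas as pd


def time_integral_sketch(data, tfilter=None):
    """Hypothetical time-integral helper (not pecos code), per column."""
    if tfilter is not None:
        data = data[tfilter]
    # Elapsed time in seconds relative to the first timestamp
    t = (data.index - data.index[0]).total_seconds().to_numpy()
    dt = np.diff(t)
    # NaN values are set to 0 before integration
    y = data.fillna(0).to_numpy(dtype=float)
    # Trapezoidal rule: sum over i of 0.5 * (y[i] + y[i+1]) * dt[i], per column
    areas = 0.5 * (y[1:] + y[:-1]) * dt[:, None]
    return pd.Series(areas.sum(axis=0), index=data.columns)


index = pd.date_range("2024-01-01", periods=3, freq="60min")
data = pd.DataFrame({"A": [2.0, 2.0, 2.0], "B": [0.0, 2.0, np.nan]}, index=index)
print(time_integral_sketch(data))  # A: 14400.0, B: 7200.0 (units * seconds)
```

Column A is a constant 2 over 7200 s, giving 14400; in column B the trailing NaN is treated as 0.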
- time_derivative(data, tfilter=None)
Compute the derivative (f’) of each column, defined as:
$$f' = \dfrac{df(t)}{dt}$$
where $f(t)$ is a column of data and $dt$ is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.
- Parameters:
data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns:
Derivative of the data
- Return type:
pandas DataFrame
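A minimal sketch of the derivative computation using numpy.gradient with the sample times (in seconds) passed as coordinates. `time_derivative_sketch` is a hypothetical helper, not the pecos implementation, and it assumes a DatetimeIndex with at least two timestamps.

```python
import numpy as np
import pandas as pd


def time_derivative_sketch(data, tfilter=None):
    """Hypothetical time-derivative helper (not pecos code), per column."""
    if tfilter is not None:
        data = data[tfilter]
    # Elapsed time in seconds, so results are in [data units]/second
    t = (data.index - data.index[0]).total_seconds().to_numpy()
    # np.gradient uses central differences at interior points and one-sided
    # differences at the two end points; passing t handles uneven spacing
    deriv = {
        col: np.gradient(data[col].to_numpy(dtype=float), t)
        for col in data.columns
    }
    return pd.DataFrame(deriv, index=data.index)


index = pd.date_range("2024-01-01", periods=3, freq="min")
data = pd.DataFrame({"A": [0.0, 60.0, 120.0], "B": [0.0, 60.0, 240.0]}, index=index)
print(time_derivative_sketch(data))
# A: 1.0 at every timestamp; B: 1.0, 2.0, 3.0 (up to floating-point rounding)
```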
- probability_of_detection(observed, actual, tfilter=None)
Compute probability of detection (PD) for each column, defined as:
$$PD = \dfrac{TP}{TP + FN}$$
where $TP$ is the number of true positives and $FN$ is the number of false negatives.
- Parameters:
observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). The column names in observed must equal the column names in actual.
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns:
Probability of detection
- Return type:
pandas Series
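A sketch of the PD calculation on boolean masks. It assumes that "positive" means anomalous (False), following the True = background, False = anomalous convention above; that mapping and the helper name `probability_of_detection_sketch` are assumptions for illustration, not the pecos implementation.

```python
import pandas as pd


def probability_of_detection_sketch(observed, actual, tfilter=None):
    """Hypothetical PD helper (not pecos code); positive class = anomalous (False)."""
    if tfilter is not None:
        observed = observed[tfilter]
        actual = actual[tfilter]
    # True positive: flagged anomalous and actually anomalous
    tp = (~observed & ~actual).sum()
    # False negative: marked background but actually anomalous
    fn = (observed & ~actual).sum()
    # A column with no actual anomalies gives 0/0, i.e. NaN
    return tp / (tp + fn)


index = pd.date_range("2024-01-01", periods=5, freq="60min")
actual = pd.DataFrame(
    {"A": [True, False, False, False, True], "B": [False, True, True, True, True]},
    index=index,
)
observed = pd.DataFrame(
    {"A": [True, False, True, False, False], "B": [False, True, True, True, True]},
    index=index,
)
print(probability_of_detection_sketch(observed, actual))  # A: 2/3, B: 1.0
```

In column A, two of the three actual anomalies are flagged; column B detects its single anomaly.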
- false_alarm_rate(observed, actual, tfilter=None)
Compute false alarm rate (FAR) for each column, defined as:
$$FAR = 1 - \dfrac{TN}{TN + FP} = \dfrac{FP}{TN + FP}$$
where $TN$ is the number of true negatives and $FP$ is the number of false positives.
- Parameters:
observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). The column names in observed must equal the column names in actual.
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns:
False alarm rate
- Return type:
pandas Series
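A matching sketch for the false alarm rate, again assuming that "positive" means anomalous (False), so a false positive is a background point flagged as anomalous. `false_alarm_rate_sketch` is a hypothetical helper, not the pecos implementation.

```python
import pandas as pd


def false_alarm_rate_sketch(observed, actual, tfilter=None):
    """Hypothetical FAR helper (not pecos code); positive class = anomalous (False)."""
    if tfilter is not None:
        observed = observed[tfilter]
        actual = actual[tfilter]
    # True negative: marked background and actually background
    tn = (observed & actual).sum()
    # False positive (false alarm): flagged anomalous but actually background
    fp = (~observed & actual).sum()
    return fp / (tn + fp)


index = pd.date_range("2024-01-01", periods=5, freq="60min")
actual = pd.DataFrame(
    {"A": [True, False, False, False, True], "B": [False, True, True, True, True]},
    index=index,
)
observed = pd.DataFrame(
    {"A": [True, False, True, False, False], "B": [False, True, True, True, True]},
    index=index,
)
print(false_alarm_rate_sketch(observed, actual))  # A: 0.5, B: 0.0
```

Column A has two background points, one of which is falsely flagged; column B raises no false alarms.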