pecos.metrics module
The metrics module contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis.
- pecos.metrics.qci(mask, tfilter=None)
Compute the quality control index (QCI) for each column, defined as:
\(QCI=\dfrac{\sum_{t\in T}X_{dt}}{|T|}\)
where \(T\) is the set of timestamps in the analysis, \(X_{dt}\) is a data point for column \(d\) at time \(t\) that passed all quality control tests, and \(|T|\) is the number of data points in the analysis.
- Parameters:
mask (pandas DataFrame) – Test results mask, returned from pm.mask
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns:
pandas Series – Quality control index
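The QCI formula above can be sketched in plain pandas. This is a minimal illustration, not the pecos implementation: `qci_sketch` is a hypothetical name, and the mask is assumed to be a boolean DataFrame in which True marks a data point that passed all quality control tests.

```python
import pandas as pd


def qci_sketch(mask, tfilter=None):
    """Hypothetical helper: fraction of passing data points per column."""
    if tfilter is not None:
        # Keep only the timestamps selected by the boolean time filter
        mask = mask[tfilter]
    # Numerator: count of True values per column; denominator: |T|
    return mask.sum(axis=0) / len(mask)


index = pd.date_range("2024-01-01", periods=4, freq="min")
mask = pd.DataFrame(
    {"A": [True, True, False, True], "B": [True, True, True, True]},
    index=index,
)
qci_result = qci_sketch(mask)
```

In this example one of the four points in column A failed a test, so its QCI is 0.75, while column B has a QCI of 1.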
- pecos.metrics.rmse(data1, data2, tfilter=None)
Compute the root mean squared error (RMSE) for each column, defined as:
\(RMSE=\sqrt{\dfrac{\sum{(data_1-data_2)^2}}{n}}\)
where \(data_1\) and \(data_2\) are time series and \(n\) is the number of data points.
- Parameters:
data1 (pandas DataFrame) – Data
data2 (pandas DataFrame) – Data. Note, the column names in data1 must equal the column names in data2
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns:
pandas Series – Root mean squared error
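As a rough illustration of the RMSE definition, the following sketch evaluates the formula column by column with pandas and numpy. It is not the pecos implementation: `rmse_sketch` is a hypothetical name, and \(n\) is assumed to be the number of non-missing pairs in each column.

```python
import numpy as np
import pandas as pd


def rmse_sketch(data1, data2, tfilter=None):
    """Hypothetical helper: root mean squared error per column."""
    if tfilter is not None:
        # Apply the same boolean time filter to both DataFrames
        data1 = data1[tfilter]
        data2 = data2[tfilter]
    squared_error = (data1 - data2) ** 2
    # mean() divides by the number of non-missing values (assumed n)
    return np.sqrt(squared_error.mean(axis=0))


index = pd.date_range("2024-01-01", periods=4, freq="min")
data1 = pd.DataFrame({"A": [3.0, -3.0, 3.0, -3.0], "B": [1.0, 2.0, 3.0, 4.0]}, index=index)
data2 = pd.DataFrame({"A": [0.0, 0.0, 0.0, 0.0], "B": [1.0, 2.0, 3.0, 4.0]}, index=index)
rmse_result = rmse_sketch(data1, data2)
```

Both DataFrames share the same column names, as the parameter description requires; otherwise the subtraction would align on mismatched columns and produce NaN.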
- pecos.metrics.time_integral(data, tfilter=None)
Compute the time integral (F) for each column, defined as:
\(F=\int{fdt}\)
where \(f\) is a column of data and \(dt\) is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapz. Results are given in [original data units]*seconds. NaN values are set to 0 for integration.
- Parameters:
data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns:
pandas Series – Integral
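The trapezoidal rule described above can be written out by hand to show what the integral measures. This sketch is an assumption-laden illustration rather than the pecos code: `time_integral_sketch` is a hypothetical name, the index is assumed to be a DatetimeIndex, the time filter is omitted, and the trapezoids are summed explicitly instead of calling numpy.trapz.

```python
import numpy as np
import pandas as pd


def time_integral_sketch(data):
    """Hypothetical helper: trapezoidal time integral per column, in units*seconds."""
    # Elapsed time in seconds since the first timestamp
    t = np.asarray((data.index - data.index[0]).total_seconds(), dtype=float)
    dt = np.diff(t)
    result = {}
    for col in data.columns:
        # NaN values are set to 0 before integrating, as documented above
        y = data[col].fillna(0).to_numpy(dtype=float)
        # Each trapezoid has area (mean of adjacent values) * time step
        result[col] = float(np.sum((y[1:] + y[:-1]) / 2.0 * dt))
    return pd.Series(result)


index = pd.date_range("2024-01-01", periods=3, freq="min")
data = pd.DataFrame(
    {"A": [1.0, 1.0, 1.0], "B": [0.0, 2.0, 4.0], "C": [1.0, np.nan, 1.0]},
    index=index,
)
integral_result = time_integral_sketch(data)
```

With a 60-second step, the constant column A integrates to 120, the ramp in column B to 240, and column C to 60 because its missing value is treated as 0.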
- pecos.metrics.time_derivative(data, tfilter=None)
Compute the derivative (f’) of each column, defined as:
\(f'=\dfrac{df}{dt}\)
where \(f\) is a column of data and \(dt\) is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.
- Parameters:
data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns:
pandas DataFrame – Derivative of the data
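A minimal sketch of the derivative calculation, assuming a DatetimeIndex and no time filter, is shown below. `time_derivative_sketch` is a hypothetical name and this is not the pecos implementation; it only illustrates passing elapsed seconds to numpy.gradient so that the result is expressed per second.

```python
import numpy as np
import pandas as pd


def time_derivative_sketch(data):
    """Hypothetical helper: derivative of each column, in units/second."""
    # Elapsed time in seconds since the first timestamp
    t = np.asarray((data.index - data.index[0]).total_seconds(), dtype=float)
    # numpy.gradient uses central differences in the interior
    # and one-sided differences at the two end points
    derivative = {
        col: np.gradient(data[col].to_numpy(dtype=float), t)
        for col in data.columns
    }
    return pd.DataFrame(derivative, index=data.index)


index = pd.date_range("2024-01-01", periods=4, freq="min")
data = pd.DataFrame(
    {"A": [0.0, 60.0, 120.0, 180.0], "B": [5.0, 5.0, 5.0, 5.0]},
    index=index,
)
derivative_result = time_derivative_sketch(data)
```

Column A rises by 60 units every 60 seconds, so its derivative is 1 unit per second at every timestamp, and the constant column B has a derivative of 0.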
- pecos.metrics.probability_of_detection(observed, actual, tfilter=None)
Compute probability of detection (PD) for each column, defined as:
\(PD=\dfrac{TP}{TP+FN}\)
where \(TP\) is the number of true positives and \(FN\) is the number of false negatives.
- Parameters:
observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns:
pandas Series – Probability of detection
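The sketch below shows one way the counts in the PD formula could be obtained from two boolean masks. It rests on an assumption the text above does not state: that a "positive" is an anomalous point, that is, a False entry. `probability_of_detection_sketch` is a hypothetical name, the time filter is omitted, and this is not the pecos implementation.

```python
import pandas as pd


def probability_of_detection_sketch(observed, actual):
    """Hypothetical helper: TP / (TP + FN) per column.

    Assumes False marks an anomalous ("positive") point in both masks.
    """
    # True positive: anomalous in reality and flagged as anomalous
    tp = (~observed & ~actual).sum(axis=0)
    # False negative: anomalous in reality but flagged as background
    fn = (observed & ~actual).sum(axis=0)
    return tp / (tp + fn)


index = pd.date_range("2024-01-01", periods=5, freq="min")
actual = pd.DataFrame({"A": [True, False, False, True, False]}, index=index)
observed = pd.DataFrame({"A": [True, False, True, True, False]}, index=index)
pd_result = probability_of_detection_sketch(observed, actual)
```

Here three points are truly anomalous and two of them were flagged, giving a PD of 2/3. A column with no actual anomalies has a zero denominator, so this sketch returns NaN for it.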
- pecos.metrics.false_alarm_rate(observed, actual, tfilter=None)
Compute false alarm rate (FAR) for each column, defined as:
\(FAR=\dfrac{TN}{TN+FP}\)
where \(TN\) is the number of true negatives and \(FP\) is the number of false positives.
- Parameters:
observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns:
pandas Series – False alarm rate