pecos.monitoring module
The monitoring module contains the PerformanceMonitoring class used to run quality control tests and store results. The module also contains individual functions that can be used to run quality control tests.
- class pecos.monitoring.PerformanceMonitoring
Bases: object
PerformanceMonitoring class
- property data
Data used in quality control analysis, added to the PerformanceMonitoring object using add_dataframe.
- property mask
Boolean mask indicating whether each data point passed the quality control tests. True = data point passed all tests, False = data point failed at least one test.
- property cleaned_data
Cleaned data set. Data that failed a quality control test are replaced by NaN.
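The relationship between mask and cleaned_data described above can be illustrated with plain pandas. This is a minimal sketch, not the pecos implementation; the DataFrame contents and the hand-built mask are hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01", periods=4, freq="min")
data = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]}, index=index
)

# Hypothetical mask: True = passed all tests, False = failed at least one test
mask = pd.DataFrame(True, index=data.index, columns=data.columns)
mask.loc[index[1], "A"] = False
mask.loc[index[3], "B"] = False

# Points where the mask is False are replaced by NaN; all others are kept
cleaned_data = data.where(mask)
print(cleaned_data)
```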
- add_dataframe(data)
Add data to the PerformanceMonitoring object
- Parameters:
data (pandas DataFrame) – Data to add to the PerformanceMonitoring object, indexed by datetime
- add_translation_dictionary(trans)
Add translation dictionary to the PerformanceMonitoring object
- Parameters:
trans (dictionary) – Translation dictionary
- add_time_filter(time_filter)
Add a time filter to the PerformanceMonitoring object
- Parameters:
time_filter (pandas DataFrame with a single column or pandas Series) – Time filter containing boolean values for each time index. True = keep time index in the quality control results, False = remove time index from the quality control results.
- check_timestamp(frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)
Check time series for missing, non-monotonic and duplicate timestamps
- Parameters:
frequency (int or float) – Expected time series frequency, in seconds
expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used
expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.
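The three timestamp conditions named above (non-monotonic, duplicate, and missing) can be sketched with plain pandas. This is only an illustration of the idea for the exact_times=True case. The order of the steps, the choice to keep the first duplicate, and the sample index are assumptions, not a description of pecos internals.

```python
import pandas as pd

frequency = 60  # expected spacing, in seconds

# Hypothetical index with a duplicate (00:01), an out-of-order time
# (00:02 after 00:03), and a missing time (00:04)
index = pd.to_datetime([
    "2024-01-01 00:00:00",
    "2024-01-01 00:01:00",
    "2024-01-01 00:01:00",
    "2024-01-01 00:03:00",
    "2024-01-01 00:02:00",
    "2024-01-01 00:05:00",
])
data = pd.DataFrame({"A": range(6)}, index=index)

# Non-monotonic timestamps: record the condition, then sort
is_monotonic = data.index.is_monotonic_increasing
data = data.sort_index()

# Duplicate timestamps: record them, then keep the first occurrence
duplicates = data.index[data.index.duplicated(keep="first")]
data = data[~data.index.duplicated(keep="first")]

# Missing timestamps: build the expected regular grid and reindex onto it
expected = pd.date_range(
    data.index.min(), data.index.max(), freq=pd.Timedelta(seconds=frequency)
)
missing = expected.difference(data.index)
data = data.reindex(expected)

print(is_monotonic, list(duplicates), list(missing))
```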
- check_range(bound, key=None, min_failures=1)
Check for data that is outside expected range
- Parameters:
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
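A range test with an optional lower or upper bound can be sketched as follows. The helper name range_mask is hypothetical, and treating the bounds as inclusive is an assumption; the sketch only shows how None disables one side of the test.

```python
import pandas as pd


def range_mask(data, bound):
    """Return True where values lie inside [lower, upper].

    A bound of None disables that side of the test. NaN values compare
    False against both bounds, so they pass in this sketch.
    """
    lower, upper = bound
    failed = pd.DataFrame(False, index=data.index, columns=data.columns)
    if lower is not None:
        failed |= data < lower
    if upper is not None:
        failed |= data > upper
    return ~failed


index = pd.date_range("2024-01-01", periods=3, freq="min")
data = pd.DataFrame({"A": [-1.0, 0.5, 2.0]}, index=index)

both_sides = range_mask(data, [0, 1])
upper_only = range_mask(data, [None, 1])
print(both_sides["A"].tolist(), upper_only["A"].tolist())
```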
- check_increment(bound, key=None, increment=1, absolute_value=True, min_failures=1)
Check data increments using the difference between values
- Parameters:
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
increment (int, optional) – Time step shift used to compute difference, default = 1
absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
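The increment test can be pictured as a bound check on the difference between each value and the value a fixed number of time steps earlier. The sketch below uses pandas diff for this; the sample data and bounds are hypothetical, and the code is not taken from pecos.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=5, freq="min")
data = pd.DataFrame({"A": [1.0, 1.0, 1.5, 5.0, 5.0]}, index=index)

increment = 1
absolute_value = True
bound = [0.0001, 2.0]  # hypothetical [lower bound, upper bound]

# Difference between each value and the value `increment` steps earlier
diff = data.diff(periods=increment)
if absolute_value:
    diff = diff.abs()

# True = increment within bounds. The first row has no earlier value,
# so its difference is NaN and it passes in this sketch.
mask = ~((diff < bound[0]) | (diff > bound[1]))
print(mask["A"].tolist())
```

With these bounds, the lower bound catches unchanged values (an increment of 0) and the upper bound catches the jump from 1.5 to 5.0.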
- check_delta(bound, window, key=None, direction=None, min_failures=1)
Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window
- Parameters:
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float) – Size of the rolling window (in seconds) used to compute delta
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
direction (str, optional) –
Options = ‘positive’, ‘negative’, or None
If direction is positive, then only identify positive deltas (the min occurs before the max)
If direction is negative, then only identify negative deltas (the max occurs before the min)
If direction is None, then identify both positive and negative deltas
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
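The delta described above (max minus min within a rolling window) can be computed with a time-based pandas rolling window. This is a simplified sketch: it flags only the timestamp at the end of each window, ignores the direction option, and uses min_periods=3 so that incomplete windows are not tested. All of these are assumptions made for illustration, and the data and bounds are hypothetical.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=7, freq="min")
data = pd.DataFrame({"A": [1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 10.0]}, index=index)

window = 180  # seconds; covers three 1-minute samples
bound = [0.5, 5.0]  # hypothetical [lower bound, upper bound]

# Time-based rolling window; windows with fewer than 3 samples give NaN
rolling = data.rolling(f"{window}s", min_periods=3)
delta = rolling.max() - rolling.min()

# A delta below the lower bound suggests stagnant data; a delta above the
# upper bound suggests an abrupt change. NaN deltas pass.
mask = ~((delta < bound[0]) | (delta > bound[1]))
print(delta["A"].tolist())
print(mask["A"].tolist())
```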
- check_outlier(bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)
Check for outliers using normalized data within a rolling window
The upper and lower bounds are specified in standard deviations. Data are normalized using (data - mean)/std.
- Parameters:
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column), default = None
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
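The normalization (data - mean)/std can be sketched for the window=None case, where each column is normalized by its own mean and standard deviation over the whole data set. The sample values and bounds are hypothetical, and the rolling-window and streaming variants are not shown.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=10, freq="min")
data = pd.DataFrame(
    {"A": [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.0, 1.1, 0.9, 8.0]}, index=index
)

bound = [-2.0, 2.0]  # in standard deviations
absolute_value = False

# Column-by-column normalization over the entire data set
normalized = (data - data.mean()) / data.std()
if absolute_value:
    normalized = normalized.abs()

# True = normalized value within bounds
mask = ~((normalized < bound[0]) | (normalized > bound[1]))
print(mask["A"].tolist())
```

Only the last value (8.0) lies more than two standard deviations from the column mean, so it is the only point flagged.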
- check_missing(key=None, min_failures=1)
Check for missing data
- Parameters:
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
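The effect of min_failures, which is shared by all of the tests, can be sketched on a missing-data check: a failure is reported only when it belongs to a run of at least min_failures consecutive failures. The run-length logic below is an illustrative assumption about how such a filter could work, not the pecos implementation.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01", periods=7, freq="min")
series = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0], index=index)
min_failures = 2

# A data point fails the test when it is missing
failed = series.isna()

# Give each run of consecutive identical values its own id,
# then attach the length of its run to every point
run_id = (failed != failed.shift()).cumsum()
run_length = failed.groupby(run_id).transform("size")

# Report only failures that sit in a long enough run
reported = failed & (run_length >= min_failures)
print(reported.tolist())
```

The isolated NaN at the second timestamp is not reported, while the run of three NaN values is.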
- check_corrupt(corrupt_values, key=None, min_failures=1)
Check for corrupt data
- Parameters:
corrupt_values (list of int or floats) – List of corrupt data values
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
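A corrupt-value test amounts to matching data against a list of sentinel values. The sketch below uses pandas isin; the sentinel values -999 and 99999 are hypothetical examples of what a data logger might record.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=3, freq="min")
data = pd.DataFrame(
    {"A": [1.0, -999.0, 3.0], "B": [99999.0, 2.0, 3.0]}, index=index
)
corrupt_values = [-999.0, 99999.0]  # hypothetical sentinel values

# True = value is not one of the corrupt values
mask = ~data.isin(corrupt_values)

# Corrupt values replaced by NaN
cleaned = data.where(mask)
print(cleaned)
```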
- check_custom_static(quality_control_func, key=None, min_failures=1, error_message=None)
Use custom functions that operate on the entire dataset at once to perform quality control analysis
- Parameters:
quality_control_func (function) – Function that operates on self.df and returns a mask and metadata
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
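A custom static function receives the whole data set and returns a mask and metadata. The sketch below shows one possible shape for such a function. The name median_distance_check, its limit argument, and the exact call signature are assumptions made for illustration; the function is called directly here rather than through pecos.

```python
import pandas as pd


def median_distance_check(data, limit=5.0):
    """Hypothetical quality control function.

    Flags values that lie more than `limit` away from the column median.
    Returns a boolean mask (True = pass) and the distances as metadata.
    """
    distance = (data - data.median()).abs()
    mask = distance <= limit
    metadata = distance
    return mask, metadata


index = pd.date_range("2024-01-01", periods=5, freq="min")
data = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 50.0]}, index=index)

mask, metadata = median_distance_check(data)
print(mask["A"].tolist())
print(metadata["A"].tolist())
```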
- check_custom_streaming(quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)
Check for anomalous data using a streaming framework that removes anomalous data from the history after each timestamp. A custom quality control function is supplied by the user to determine if the data is anomalous.
- Parameters:
quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.
window (int or float) – Size of the rolling window (in seconds) used to define history. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
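The streaming idea, in which each new point is judged against a history from which earlier anomalies have been removed, can be sketched with an explicit loop over a single pandas Series. The function is_normal, its threshold, and the loop structure are hypothetical. The rebase option is not modeled, and this is not the pecos implementation.

```python
import numpy as np
import pandas as pd


def is_normal(history, value, threshold=3.0):
    """Hypothetical quality control function for the latest data point.

    Returns (passed, z_score). Points with too little history pass.
    """
    clean = history.dropna()
    if len(clean) < 3:
        return True, np.nan
    z = abs(value - clean.mean()) / clean.std()
    return bool(z <= threshold), float(z)


index = pd.date_range("2024-01-01", periods=8, freq="min")
series = pd.Series([1.0, 1.1, 0.9, 1.0, 10.0, 1.05, 0.95, 1.0], index=index)
window = 300  # seconds of history used for each decision

history = series.copy()
mask = pd.Series(True, index=index)

for t in index:
    start = t - pd.Timedelta(seconds=window)
    past = history[(history.index >= start) & (history.index < t)]
    passed, _ = is_normal(past, series.loc[t])
    mask.loc[t] = passed
    if not passed:
        # Remove the anomalous point so it does not distort later decisions
        history.loc[t] = np.nan

print(mask.tolist())
```

Because the value 10.0 is removed from the history as soon as it is flagged, the points that follow are compared against a history of normal values only.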
- pecos.monitoring.check_timestamp(data, frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)
Check time series for missing, non-monotonic and duplicate timestamps
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
frequency (int or float) – Expected time series frequency, in seconds
expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used
expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_range(data, bound, key=None, min_failures=1)
Check for data that is outside expected range
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_increment(data, bound, key=None, increment=1, absolute_value=True, min_failures=1)
Check data increments using the difference between values
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
increment (int, optional) – Time step shift used to compute difference, default = 1
absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_delta(data, bound, window, key=None, direction=None, min_failures=1)
Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float) – Size of the rolling window (in seconds) used to compute delta
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
direction (str, optional) –
Options = ‘positive’, ‘negative’, or None
If direction is positive, then only identify positive deltas (the min occurs before the max)
If direction is negative, then only identify negative deltas (the max occurs before the min)
If direction is None, then identify both positive and negative deltas
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_outlier(data, bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)
Check for outliers using normalized data within a rolling window
The upper and lower bounds are specified in standard deviations. Data are normalized using (data - mean)/std.
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column), default = None
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_missing(data, key=None, min_failures=1)
Check for missing data
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_corrupt(data, corrupt_values, key=None, min_failures=1)
Check for corrupt data
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
corrupt_values (list of int or floats) – List of corrupt data values
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns:
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_custom_static(data, quality_control_func, key=None, min_failures=1, error_message=None)
Use custom functions that operate on the entire dataset at once to perform quality control analysis
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
quality_control_func (function) – Function that operates on self.df and returns a mask and metadata
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
- Returns:
dictionary – Results include cleaned data, mask, test results summary, and metadata
- pecos.monitoring.check_custom_streaming(data, quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)
Check for anomalous data using a streaming framework that removes anomalous data from the history after each timestamp. A custom quality control function is supplied by the user to determine if the data is anomalous.
- Parameters:
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.
window (int or float) – Size of the rolling window (in seconds) used to define history. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
- Returns:
dictionary – Results include cleaned data, mask, test results summary, and metadata