Let's see the effects on two different case studies:

Case Study 1: threshold-based anomaly detection on sensor data
Case Study 2: a report of aggregated customer data

Case Study 1: Imputation for threshold-based anomaly detection

In a classic threshold-based solution for anomaly detection, a threshold calculated from the mean and variance of the original data is applied to the sensor data to generate an alarm. If the missing values are imputed with a fixed value, e.g. zero, this distorts the mean and variance used to define the threshold. The result is likely to be a wrong estimate of the alarm threshold and some expensive downtime. Here, imputing the missing values with the mean of the available values is the right way to go.

Case Study 2: Imputation for aggregated customer data

In a classic reporting exercise on customer data, the number of customers and the total revenue for each geographical area of the business need to be aggregated and visualized, for example via bar charts. The customer dataset has missing values for those areas where the business has not started or has not picked up, so no customers and no business have been recorded yet. In this case, using the mean of the available numbers to impute the missing values would make up customers and revenues where neither customers nor revenues are present. The right way to go here is to impute the missing values with a fixed value of zero.

In both cases, it is our knowledge of the process that suggests the right way to impute missing values. In the case of sensor data, missing values are due to a malfunction of the measuring machine, so real numerical values exist but were simply not recorded. In the case of the customer dataset, missing values appear where there is nothing to measure yet.
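To make the contrast between the two case studies concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sensor readings, the area names and revenue figures are invented, missing values are represented as `None`, and the alarm threshold is taken to be the mean plus three sample standard deviations, which is only one common way of deriving a threshold from mean and variance.

```python
from statistics import mean, stdev


def impute(values, fill):
    """Replace missing entries (None) with a fixed fill value."""
    return [fill if v is None else v for v in values]


def alarm_threshold(values, k=3.0):
    """Hypothetical threshold rule: mean + k * sample standard deviation."""
    return mean(values) + k * stdev(values)


# Case study 1: invented sensor readings; None marks a reading the
# malfunctioning sensor failed to record.
sensor = [20.0, 21.0, None, 19.0, None, 20.0]
observed = [v for v in sensor if v is not None]
sensor_mean = mean(observed)  # 20.0

# Mean imputation keeps the threshold close to the observed readings;
# zero imputation drags the mean down and inflates the variance.
threshold_mean_fill = alarm_threshold(impute(sensor, sensor_mean))
threshold_zero_fill = alarm_threshold(impute(sensor, 0.0))

# Case study 2: invented revenue per area; None marks an area where
# the business has not started yet.
revenue = {"North": 1200.0, "South": 800.0, "East": None, "West": None}
known = [v for v in revenue.values() if v is not None]

# Zero imputation reports only revenue that exists; mean imputation
# invents revenue for areas with no business.
total_zero_fill = sum(impute(list(revenue.values()), 0.0))  # 2000.0
total_mean_fill = sum(impute(list(revenue.values()), mean(known)))  # 4000.0

print(f"threshold, mean fill: {threshold_mean_fill:.2f}")
print(f"threshold, zero fill: {threshold_zero_fill:.2f}")
print(f"total revenue, zero fill: {total_zero_fill}")
print(f"total revenue, mean fill: {total_mean_fill}")
```

With these made-up numbers, zero imputation roughly doubles the sensor threshold, so genuine anomalies could pass unnoticed, while mean imputation doubles the reported revenue by crediting areas that have no customers.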