Some assumptions
The date field gives the day on which the data was recorded
Some observations
Data is unique by device ID+date (1 exception)
Once a device fails, it doesn’t come back into operation (5 exceptions)
The devices have a daily entry (in 99.82% of the cases)
There are close to 1,200 devices, of which 9.1% failed in operation
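Checks of this kind can be sketched with pandas. This is a minimal illustration on toy data, not the analysis behind the figures above; the column names `device`, `date` and `failure` are assumptions, since the slides do not name the columns.

```python
import pandas as pd

# Toy stand-in data. Column names `device`, `date`, `failure` are assumed.
df = pd.DataFrame({
    "device": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "date": pd.to_datetime([
        "2015-01-01", "2015-01-02", "2015-01-03",
        "2015-01-01", "2015-01-02", "2015-01-04",
        "2015-01-01", "2015-01-02",
    ]),
    "failure": [0, 0, 1, 0, 1, 0, 0, 0],
})
df = df.sort_values(["device", "date"])

# 1. Uniqueness by device + date: rows that repeat an earlier key.
duplicate_rows = int(df.duplicated(subset=["device", "date"]).sum())

# 2. Devices that report again after a failure.
#    failed_before = number of failures strictly before the current row.
failed_before = df.groupby("device")["failure"].cumsum() - df["failure"]
devices_back = df.loc[failed_before > 0, "device"].nunique()

# 3. Daily coverage: observed rows over the days between each device's
#    first and last record.
span = df.groupby("device")["date"].agg(["min", "max", "count"])
expected_days = (span["max"] - span["min"]).dt.days + 1
coverage = span["count"].sum() / expected_days.sum()

# 4. Share of devices with at least one failure.
failed_share = df.groupby("device")["failure"].max().mean()
```

In the toy data, device B reports once more after failing and skips one day, so both the "comes back" count and the coverage gap are non-trivial.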
More observations
Using 5 of these attributes (IDs 2, 3, 4, 7 and 8), we can segment out the 83% of
the data in which all of these attributes are equal to 0.
This segment has an event rate of 2 basis points, whereas the remaining data has
an event rate of 40 basis points
The attribute values (except for attribute 1) are almost monotonically
increasing (all other attributes have only sparse cases of a decrease in value for a device over time)
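The segmentation and the monotonicity check can be illustrated as follows. This is a sketch on toy data under stated assumptions: the column names (`attribute2`, `failure`, and so on) are hypothetical, and the event rate is taken to be the share of rows with a failure, expressed in basis points (1 basis point = 0.01%).

```python
import pandas as pd

# Hypothetical column names for attributes 2, 3, 4, 7 and 8.
zero_cols = ["attribute2", "attribute3", "attribute4", "attribute7", "attribute8"]

# Toy stand-in data, not the device data from the slides.
df = pd.DataFrame({
    "device": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "2015-01-01", "2015-01-02", "2015-01-03",
        "2015-01-01", "2015-01-02", "2015-01-03",
    ]),
    "attribute2": [0, 0, 5, 0, 0, 0],
    "attribute3": [0, 0, 0, 0, 1, 0],
    "attribute4": [0, 0, 0, 0, 0, 0],
    "attribute7": [0, 0, 8, 0, 0, 0],
    "attribute8": [0, 0, 8, 0, 0, 0],
    "failure":    [0, 0, 1, 0, 0, 0],
}).sort_values(["device", "date"])

# Segment: rows where all five attributes are zero.
all_zero = (df[zero_cols] == 0).all(axis=1)
segment_share = all_zero.mean()


def event_rate_bps(failures: pd.Series) -> float:
    """Share of rows with a failure, in basis points."""
    return float(failures.mean() * 10_000)


rate_all_zero = event_rate_bps(df.loc[all_zero, "failure"])
rate_rest = event_rate_bps(df.loc[~all_zero, "failure"])

# Monotonicity: share of day-over-day changes within a device that are decreases.
daily_change = df.groupby("device")[zero_cols].diff()
decrease_share = (daily_change < 0).sum() / daily_change.notna().sum()
```

A `decrease_share` close to zero for a column corresponds to the "almost monotonically increasing" behaviour described above.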
Feature Creation
Created features for the difference between the current attribute value
and its value on the previous day (absolute and percentage difference)
Created features for the difference between the current attribute value
and its value when the device came into operation
Created a feature for the number of days a device has been live
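The three feature families can be sketched with pandas group operations. The column names `device`, `date` and `attribute2` are assumptions, and treating a zero previous value as an undefined percentage change (NaN) is a choice made for this illustration, not something stated in the slides.

```python
import numpy as np
import pandas as pd

# Toy stand-in data with assumed column names.
df = pd.DataFrame({
    "device": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime([
        "2015-01-01", "2015-01-02", "2015-01-03",
        "2015-01-01", "2015-01-02",
    ]),
    "attribute2": [0, 4, 6, 10, 15],
})
df = df.sort_values(["device", "date"]).reset_index(drop=True)

col = "attribute2"

# Previous record of the same device (previous day when entries are daily).
prev = df.groupby("device")[col].shift(1)

# Absolute and percentage difference versus the previous day.
# A previous value of zero makes the ratio undefined, so it is left as NaN.
df[f"{col}_diff_prev"] = df[col] - prev
df[f"{col}_pct_prev"] = (df[col] - prev) / prev.replace(0, np.nan) * 100

# Difference versus the value on the device's first recorded day.
df[f"{col}_diff_first"] = df[col] - df.groupby("device")[col].transform("first")

# Number of days the device has been live.
first_seen = df.groupby("device")["date"].transform("min")
df["days_live"] = (df["date"] - first_seen).dt.days
```

The same three lines per attribute can be wrapped in a loop over all attribute columns; the first row of each device has no previous day, so its day-over-day features are NaN and need a fill strategy before modeling.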
Modeling Methods and Results
Class balancing techniques can be tried. The simplest one, random
oversampling, has been used
Logistic regression was used as the classification technique
AUROC = 0.820
AUPR = 0.082
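A pipeline of this shape can be sketched with NumPy and scikit-learn. It runs on synthetic data and will not reproduce the scores above. Several details are assumptions not stated in the slides: oversampling is applied to the training split only, the split is a stratified 70/30 hold-out, and AUPR is estimated with `average_precision_score`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, imbalanced stand-in data (a few percent positives).
n = 4000
X = rng.normal(size=(n, 3))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 4.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Random oversampling: draw minority-class rows with replacement until both
# classes have the same number of training rows. The test split is left
# untouched so the metrics reflect the original class mix.
pos_idx = np.flatnonzero(y_train == 1)
neg_idx = np.flatnonzero(y_train == 0)
extra = rng.choice(pos_idx, size=len(neg_idx) - len(pos_idx), replace=True)
train_idx = np.concatenate([neg_idx, pos_idx, extra])
X_bal, y_bal = X_train[train_idx], y_train[train_idx]

model = LogisticRegression(max_iter=1000)
model.fit(X_bal, y_bal)

# Score the held-out rows and compute both ranking metrics.
scores = model.predict_proba(X_test)[:, 1]
auroc = roc_auc_score(y_test, scores)
aupr = average_precision_score(y_test, scores)
```

With a rare positive class, AUPR is best read against the positive rate of the evaluation data, which is the score a random ranking would achieve; a small absolute AUPR can still be a large lift over that baseline.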