Taller I: Estudio de Casos en el Análisis de... (PDF)
Introduction
01-07-19
Case Studies
Status    | Diagnosis
----------|---------------------------------------------
Normal    | Normal
Alert     | Wear of components
Abnormal  | Silica contamination
          | Water contamination
          | Lubricant contamination
          | Silica contamination and wear of components
Thermography
Problem Statement
• For almost every type of temporal data, this is done with the
discrete Fourier transform:
$F(\omega) = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi \omega n / N}$
• In Python, this is easily done with the fft function from the
scipy.fftpack library.
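A minimal sketch of this (the signal and its parameters are illustrative, not from the case study):

```python
import numpy as np
from scipy.fftpack import fft

# Illustrative signal: a 50 Hz sine sampled at 1 kHz for 1 second
fs = 1000
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)

# Discrete Fourier transform of the signal
F = fft(x)
freqs = np.linspace(0, fs, len(x), endpoint=False)

# The magnitude spectrum peaks at the 50 Hz component
peak = freqs[np.argmax(np.abs(F[: len(x) // 2]))]
print(peak)  # 50.0
```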
$B_i = \frac{1}{N_{i+1} - N_i} \sum_{j=N_i}^{N_{i+1}-1} \mathrm{abs}(F_j)$
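This band-averaging of the spectrum can be sketched as follows (the spectrum and the band edges $N_i$ are illustrative):

```python
import numpy as np

# Illustrative magnitude spectrum and band edges N_0, N_1, ...
rng = np.random.default_rng(0)
F = np.abs(np.fft.fft(rng.standard_normal(512)))
edges = [0, 64, 128, 256, 512]

# B_i: sum of abs(F_j) over j = N_i .. N_{i+1}-1, divided by the
# band width N_{i+1} - N_i, i.e. the band average
B = [F[lo:hi].mean() for lo, hi in zip(edges[:-1], edges[1:])]
print(len(B))  # 4 band features
```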
Kernel-PCA
• For non-linear data, a better way to uncover a lower-dimensional representation is to use a non-linear function (called a 'kernel') to transform the data into a higher-dimensional space and then apply PCA.
• Possible kernels are ['linear', 'rbf', 'poly', 'sigmoid', 'cosine']
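A short sketch with scikit-learn's KernelPCA (the dataset and gamma value are illustrative):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Kernel-PCA with the 'rbf' kernel, one of the kernels listed above
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (200, 2)
```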
K-Means
• It divides a set of 𝑛 datapoints into 𝑘 groups by minimizing the sum of squared distances between each point and its group's centroid
• Both the centroids and the group assignment of each point are iterated to find this minimum
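A minimal sketch with scikit-learn's KMeans (the data is synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic groups of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# K-means alternates between assigning each point to its nearest centroid
# and recomputing the centroids, minimizing within-cluster squared distances
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(sorted(np.bincount(km.labels_).tolist()))  # [50, 50]
```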
DBSCAN
• It is an algorithm similar to Mean Shift, but with some improvements:
1. It can identify outliers as noise
2. It can identify groups of arbitrary shape in the N-dimensional space
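Both improvements can be seen in a small sketch with scikit-learn's DBSCAN (synthetic data, illustrative parameters):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one far-away point that should be flagged as noise
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)),
               rng.normal(4, 0.2, (30, 2)),
               [[10.0, 10.0]]])

# Points labeled -1 are outliers (noise); clusters get labels 0, 1, ...
db = DBSCAN(eps=0.8, min_samples=5).fit(X)
print(sorted(set(db.labels_.tolist())))  # [-1, 0, 1]
```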
Gaussian Mixture
• One of the disadvantages of the K-means algorithm is that it can only form circular clusters
• With a Gaussian Mixture, each component has two parameters per dimension (mean and variance), which allows more flexibility
• For example, in 2-D, we can form elliptic-shaped clusters
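A sketch with scikit-learn's GaussianMixture (synthetic data; the 'diag' covariance matches the per-dimension mean-and-variance parametrization above):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# An elongated (elliptic) group plus a round one
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, [3.0, 0.3], (100, 2)),
               rng.normal(8, 0.5, (100, 2))])

# 'diag': one mean and one variance per dimension per component,
# so clusters are no longer restricted to circular shapes
gm = GaussianMixture(n_components=2, covariance_type='diag',
                     random_state=0).fit(X)
labels = gm.predict(X)
print(len(set(labels.tolist())))  # 2
```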
Robust Covariance
Isolation Forest
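The slides illustrate Isolation Forest graphically; a minimal sketch using scikit-learn's IsolationForest on synthetic data (the contamination value is illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Normal operating points plus one clearly anomalous observation
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               [[6.0, 6.0]]])

# Anomalies are isolated with fewer random splits; predict() returns
# -1 for anomalies and +1 for normal points
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
pred = iso.predict(X)
print(pred[-1])  # -1: the far-away point is flagged
```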
• It tries to find the smallest sphere that can contain all the datapoints:

$\min_{r,\,c} \; r^2 + \frac{1}{n\nu} \sum_{i=1}^{n} \xi_i$

s.t. $\lVert x_i - c \rVert^2 \le r^2 + \xi_i$, for $i = 1, 2, \dots, n$
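scikit-learn's OneClassSVM solves a closely related ν-formulation; a sketch on synthetic data (nu and gamma values are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Data describing the normal condition only
rng = np.random.default_rng(0)
X_train = rng.normal(0, 0.5, (200, 2))

# nu plays the role of ν above: roughly the fraction of training points
# allowed to fall outside the learned boundary (the slack variables ξ_i)
oc = OneClassSVM(kernel='rbf', nu=0.05, gamma='scale').fit(X_train)
print(oc.predict([[0.0, 0.0]])[0], oc.predict([[5.0, 5.0]])[0])  # 1 -1
```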
• Accuracy: percentage of elements correctly classified:

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$

$Precision = \frac{TP}{TP + FP}$

$Recall = \frac{TP}{TP + FN}$

$F1 = 2 \, \frac{Precision \times Recall}{Precision + Recall}$
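A small worked example of these metrics, with hypothetical labels and predictions:

```python
# Hypothetical ground truth and predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

TP = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
TN = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 4
FP = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
FN = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 6/8 = 0.75
precision = TP / (TP + FP)                   # 2/3
recall = TP / (TP + FN)                      # 2/3
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, round(f1, 3))  # 0.75 0.667
```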
K-Neighbors
(Figure: with k = 3 the new point is classified as Red Triangle; with k = 5, as Blue Square)
Decision Trees
• Using an attribute-selection metric (Gini or Entropy), an attribute is chosen and converted into a decision node of the tree. The data is divided into two sub-groups according to that attribute
• This process is repeated until there are no more attributes or every datapoint is already assigned to a class
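A minimal sketch with scikit-learn's DecisionTreeClassifier (the Iris dataset stands in for the case-study data):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion selects the attribute-selection metric: 'gini' or 'entropy'
tree = DecisionTreeClassifier(criterion='gini', random_state=0).fit(X, y)

# The tree keeps splitting until every training point is in a pure leaf
print(tree.score(X, y))  # 1.0
```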
Random Forests
• The Random Forest algorithm is an ensemble of decision trees, where each one is trained on a sub-group of the original training dataset
• To classify a new point, it is fed to every decision tree, and the corresponding health state is defined as the most repeated prediction class of the forest
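A sketch of this majority-vote ensemble with scikit-learn (the dataset, split, and parameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Each tree is trained on a bootstrap sub-sample of the training set;
# a new point gets the class predicted most often across the forest
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(len(rf.estimators_))  # 100 trees in the forest
```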
• One v/s One: For each pair of classes, a classifier is built. This means building $\frac{k(k-1)}{2}$ classifiers. The predicted class is the one returned most frequently among the classifiers
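With scikit-learn, this pairwise scheme can be sketched via OneVsOneClassifier (dataset and base classifier are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# Iris has k = 3 classes, so k(k-1)/2 = 3 pairwise classifiers are built
X, y = load_iris(return_X_y=True)
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)
print(len(ovo.estimators_))  # 3
```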
(Diagram: the Data is split into a training set, a validation set, and a testing set; the model is trained on the training set, predictions are checked on the validation set, and the Final Model receives a Final Evaluation on the testing set)
Cross-Validation
• If the available data is limited, dividing it into three subsets could produce a training set that is too small
• Cross-validation is useful in these cases:
• In this strategy, the data is separated into a training and a testing dataset
• Then, the training dataset is divided into 𝑘 subsets and the following training/validation scheme is performed:
• A subset is chosen. The model is trained with the 𝑘 − 1 remaining subsets
• The chosen subset is used to test the model. Metrics are generated and stored
• This procedure is repeated for each subset
• Finally, the resulting metrics for each subset are averaged to produce a validation metric
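The scheme above can be sketched with scikit-learn's cross_val_score (the model and number of folds are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k = 5 folds: each fold is used once for validation while the model is
# trained on the remaining 4; the per-fold metrics are then averaged
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(len(scores), round(scores.mean(), 2))
```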
Selection of Hyper-Parameters
• Hyper-parameters are parameters of the model that are not learned from the data

Algorithm                   | Hyper-Parameters
----------------------------|--------------------------------------------
K-nearest neighbors (k-NN)  | Number of neighbors
Decision Tree               | Selection criterion: Gini or Entropy
Random Forest               | Number of trees, selection criterion
Support Vector Classifier   | Kernel, penalization parameter C, kernel parameters
Selection of Hyper-Parameters
Two strategies for selecting hyperparameters:
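One widely used strategy is an exhaustive grid search combined with cross-validation; a sketch assuming scikit-learn's GridSearchCV (the candidate values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Every candidate value of the hyper-parameter is evaluated by
# cross-validation; the best-scoring one is kept
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': [1, 3, 5, 7, 9]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```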
Thermography: Fault Identification in Pumps and Electrical Cabinets
Juan Tapia F.
Enrique López Droguett
Thermography
• For the inspection of mechanical equipment (pumps and electrical cabinets), infrared cameras are used to determine their operating condition and prevent failures
• The analyst must verify that the elements of a piece of equipment (e.g., oil, grease, varnish of motor coils) do not exceed the maximum and minimum design temperatures of each component
Thermography
• Pump and Electrical Cabinets
Thermography
• Outputs of the thermal camera are:
• One Fluke file (IS2 format). This file contains the following:
• One original (R,G,B) 3-channel image
• One thermal (R,G,B) 3-channel image
• One data matrix with the temperature of each pixel
Thermography
• Electromagnetic Spectrum

Thermography
• Temperature data
Thermography
• Image processing
• One image is a 2-D signal
• Each image has 3 channels (Red, Green, Blue)
• Each channel is a data matrix
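A minimal sketch of this structure (a synthetic array stands in for an image loaded from disk; with OpenCV one would load it via cv2.imread):

```python
import numpy as np

# Synthetic 4x4, 3-channel image; a real one would come from cv2.imread
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255  # fill the first channel

# One image is a 2-D signal with 3 channels; each channel is a matrix
c0, c1, c2 = img[..., 0], img[..., 1], img[..., 2]
print(img.shape, c0.shape)  # (4, 4, 3) (4, 4)
```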
Image Processing
K-Means Clustering
Given an M×N image, we thus have M×N pixels, each consisting of three components: Red, Green, and Blue
Exercise – Part 1
Using K-means clustering, identify the colors in the images according to the visualization
To-do list:
• Install the OpenCV library
• Implement K-means
• Find the best number of clusters to represent the temperatures in the images
(Pipeline: K-Means with N clusters → histogram → dominant colors)
Exercise – Part 2
Using the temperature matrix of each image, identify the areas with the highest temperature to classify the images into three conditions:
• Low temperature
• In operation
• High temperature
To-do list:
• Install the OpenCV library
(Figure: Estimate Position)
The k-means algorithm assigns each pixel in our image to the closest cluster. We then take the number of clusters and build a histogram of the number of pixels assigned to each cluster, counting the pixels that belong to each one.
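These steps can be sketched as follows (a synthetic pixel array stands in for a real image; the variable names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# 100 "pixels" of two dominant colors: 70 reddish and 30 bluish
pixels = np.vstack([np.tile([200, 30, 30], (70, 1)),
                    np.tile([30, 30, 200], (30, 1))]).astype(float)

# Assign each pixel to its closest cluster, then count the pixels per
# cluster to build the histogram of dominant colors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
hist = (np.bincount(km.labels_, minlength=2) / len(pixels)).tolist()
print(sorted(hist))  # [0.3, 0.7]
```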
Thresholding
# Classify the image condition from the min/max temperatures,
# covering all values with no gaps between the thresholds
if minVal < 10:
    result = 'Low Temperature'
elif maxVal <= 200:
    result = 'Normal Operation'
else:
    result = 'High Temperature'
print(result)
System Diagram
System Equipment
The following equipment is considered:
Process Sensors
Dataset Characteristics
Problem 1
Problem 2