Article info

Keywords: Thermal image; Feature selection; Fault diagnostics; Intelligent diagnosis system

Abstract

This study presents a new intelligent diagnosis system for the classification of different machine conditions using data obtained from infrared thermography. In the first stage of the proposed system, the two-dimensional discrete wavelet transform is used to decompose the thermal image. However, the data obtained from this stage are ordinarily of high dimensionality, which degrades classification performance. To surmount this problem, a feature selection tool based on the Mahalanobis distance and the relief algorithm is employed in the second stage to select the salient features that characterize the machine conditions, thereby enhancing the classification accuracy. The data received from the second stage are subsequently passed to the intelligent diagnosis system, in which support vector machines and linear discriminant analysis are used as classifiers. The results show that the proposed system is able to assist in diagnosing different machine conditions.
© 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.08.004
A. MD. Younus, B.-S. Yang / Expert Systems with Applications 39 (2012) 2082–2091

reduction, which is an essential data preprocessing technique for classification tasks, traditionally falls into two categories: feature extraction and feature selection.

In machine fault diagnosis, there have been numerous approaches to dimensionality reduction, such as independent component analysis (ICA), principal component analysis (PCA) (Widodo & Yang, 2007), genetic algorithms (Siedlecki & Sklansky, 1989), and the relief algorithm (RA) (Kira & Rendell, 1994).

In the case of image processing techniques, our previous work proposed a histogram-based feature extraction technique for 2D thermal images to diagnose machine faults (Younus & Yang, 2010). In this study, feature selection based on the Mahalanobis distance (MD) and RA is investigated with the aim of improving the classification performance. Subsequent to the dimensionality reduction procedure, models for the classification or diagnosis task are selected in the next stage. These models cover a wide range of approaches, from model-based to pattern recognition-based. Among these approaches, machine fault diagnosis systems based on artificial intelligence (AI) techniques have become popular, and numerous methods have been employed, for instance support vector machines (SVM), multi-agent fusion systems, expert systems, artificial neural networks, and fuzzy logic (Grossmann & Morlet, 1984; Niu, Han, Yang, & Tan, 2007; Patel, Khokhar, & Jamieson, 1996). Similarly, a diagnosis system based on AI techniques, in association with suitable signal processing and feature selection techniques, is investigated in this study. The AI diagnosis models used here include linear discriminant analysis and SVM, which classify the different machine conditions (normal, misalignment, mass unbalance, and bearing fault) from thermal images. The results of the proposed system can greatly assist in diagnosing the different machine conditions.

2. Architecture of the proposed system

The proposed intelligent diagnosis system (IDS) consists of the following consecutive procedures, as shown in Fig. 1: image decomposition, feature calculation (which represents the obtained data as features), feature selection, and classification. Thermal images capturing the machine conditions (normal, misalignment, mass unbalance, and bearing fault) are used as the input to this system. Initially, the measured data are fed into the 2D-DWT to calculate the wavelet coefficients. These large data sets are then passed through the feature calculation module, where feature sets are obtained using different statistical feature algorithms; for example, standard deviation, mean absolute deviation, kurtosis, and skewness are adopted in this study. After this procedure, however, the received data are normally of high dimension and contain a large number of redundant features. If these data are input directly into the classifiers, the performance decreases significantly. Therefore, a feature selection algorithm should be employed to choose, from the whole feature set, the appropriate features that characterize the machine conditions, and to transform the existing features into a lower dimensional space. In the IDS, the feature selection module is built in two steps, so that only a selected number of feature sets are used to obtain accurate results regarding the machine conditions. The first step finds the levels that contain significant features. Subsequently, the features obtained in the first step are combined into a data sheet, from which the feature sets are found by applying RA in the second step. Finally, the selected features from the different levels of coefficients are combined and input to the classifiers. Two different classifiers are embedded in the IDS to evaluate the system performance.

[Fig. 1. Architecture of IDS: IR data → discrete wavelet transform (levels 1 to n) → feature extraction → per-level feature selection → combined feature selection by relief algorithm → classifiers 1 to n → decision-making of machine conditions.]
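The staged pipeline described above can be sketched end to end. The following is a minimal Python sketch, not the authors' implementation: `decompose` stands in for the 2D-DWT stage with a crude block-average pyramid, `calc_features` computes only two of the six statistical features, and `select_features` stands in for the two-step MD/relief selection; all names and data are hypothetical.

```python
import numpy as np

def decompose(image, levels=3):
    """Stand-in for the 2D-DWT stage: per level, halve the resolution by
    2x2 block averaging (an approximation-only pyramid, not a real DWT)."""
    coeffs, current = [], image
    for _ in range(levels):
        h, w = current.shape
        current = current[: h - h % 2, : w - w % 2]
        current = (current[0::2, 0::2] + current[1::2, 0::2]
                   + current[0::2, 1::2] + current[1::2, 1::2]) / 4.0
        coeffs.append(current)
    return coeffs

def calc_features(coeff):
    """Statistical features of one coefficient array (a subset of the six
    used in the paper: standard deviation and mean absolute deviation)."""
    flat = coeff.ravel()
    return np.array([flat.std(), np.abs(flat - flat.mean()).mean()])

def select_features(feature_matrix, keep):
    """Stand-in for the two-step MD/relief selection: keep given columns."""
    return feature_matrix[:, keep]

# one synthetic 8x8 "thermal image" per condition (hypothetical data)
rng = np.random.default_rng(0)
images = {c: rng.normal(loc=t, scale=1.0, size=(8, 8))
          for c, t in [("normal", 0.0), ("misalignment", 2.0)]}

feature_rows = []
for name, img in images.items():
    per_level = [calc_features(c) for c in decompose(img)]
    feature_rows.append(np.concatenate(per_level))
X = select_features(np.vstack(feature_rows), keep=[0, 2, 4])
print(X.shape)  # (2, 3): two samples, three selected features
```

The point of the sketch is the data flow: per-level coefficients become per-level feature vectors, which are concatenated and then pruned before classification.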
Table 1
Specification of thermal camera and fault simulator.

Table 2
Detailed descriptions of image data.

Machine condition   No. of data files
Normal              30
Misalignment        30
Bearing fault       30
Mass unbalance      30

Dimension of image data: 320 × 240; selected dimension: 158 × 25; total data files: 120; features used: 6; total feature dimension (in each level): 120 × 6.
according to the experimental condition. In this study, all conditions, i.e. normal, mass unbalance, misalignment, and bearing fault, used the same setting parameters to accomplish the experiment.

As mentioned above, the aim of this study is to analyze the different types of machine conditions. In the normal condition of the machine, the speed of the motor was increased gradually up to 900 rpm. This speed was held for five minutes, during which the machine reaches its stable condition, and then the process of data acquisition began. The other experiments (misalignment, mass unbalance, and bearing fault) were carried out likewise.

Figs. 3 and 4 show one of the original thermal images of a machine condition and the gray level value at each pixel of this image, respectively. Fig. 5 presents the temperature value at each pixel of the thermal image in matrix form. The detailed descriptions of the image data in the four machine condition experiments, and the reduction of the data size from a 320 × 240 to a 158 × 25 array, are shown in Table 2. The reason for cropping the original image to the region of interest is to obtain a reduced dimension for further processing by the signal processing technique.

4. Signal processing and feature extraction

4.1. Discrete wavelet transform (DWT)

The 2D-DWT is a recent mathematical tool in the field of 2D signal processing. The important function of the wavelet is obtained by a high pass filter (HPF) and a cascade of sub-sampled low pass filters (LPF). The LPF gives the smoothing effect, known as the approximation coefficients, while the HPF gives the detail coefficients. The process of decomposing the sequence into two sub-sequences with half the resolution can be iterated on either the lower or the higher band. To achieve a better resolution at lower frequencies, the scheme is commonly iterated on the lower band. The output from the lower band of the kth stage, w_h^k(n), is the input for stage
k + 1 of the wavelet decomposition. In general, k stages of wavelet decomposition result in a (k + 1)-band wavelet decomposition of the original s(n). This can be represented as follows:

w_h^{k+1}(n) = \sum_{i=-\infty}^{+\infty} h(i) \, w_h^k(2n - i)    (1)

w_g^{k+1}(n) = \sum_{i=-\infty}^{+\infty} g(i) \, w_h^k(2n - i)    (2)

In order to apply wavelet decompositions to images, 2D extensions of wavelets are required. This can be achieved by the use of non-separable or separable wavelets; the latter is considered in this study. A separable filter implies that filtering can be performed in one dimension (rows), followed by filtering in the other dimension (columns). A 2D wavelet transform can be computed with a separable extension of the one-dimensional (1D) decomposition algorithm (Mallat, 1989), as shown in Fig. 6. First, we convolve the rows of s(n, m) with a 1D filter, retain every other column, convolve the columns of the resulting signals with another 1D filter, and retain every other row. Further stages of the 2D wavelet decomposition can be computed by recursively applying the procedure to the LPF LL band (see Fig. 6) of the previous stage. In general, k stages of wavelet decomposition result in a (3k + 1)-band wavelet decomposition of the original image s(m, n) (Thulasiraman, Khokhar, Heber, & Gao, 2004). The decomposition algorithm starts with the signal s, of dimension n by m, next calculates the coefficients of the approximation (A1), horizontal detail (HD1), vertical detail (VD1), and diagonal detail (DD1), then those of A2, HD2, VD2, and DD2, and so on. A 1D signal, by contrast, is decomposed into two components: A1 and the detail coefficients (D1).

[Fig. 6. Separable 2D wavelet decomposition: the rows of the signal sAj are low- and high-pass filtered and downsampled by 2; the columns of each result are then filtered and downsampled, producing the approximation sAj+1 and the horizontal sDj+1(h), vertical sDj+1(v), and diagonal sDj+1(d) detail coefficients.]

In this paper, all levels of decomposition and all coefficients have been considered in the analysis to find significant results for the machine conditions. Obtaining good feature data from the large amount of data is a great challenge for the classification of different classes of data. Finding the proper level of coefficients from the decomposition data is an objective that is part of a successful implementation of the IDS.
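The separable row/column scheme of Fig. 6 can be sketched in NumPy. This is an illustrative sketch using orthonormal Haar filters rather than the bior-3.5 wavelet used in the paper, and the function names are ours, not from the study.

```python
import numpy as np

def dwt2_haar(s):
    """One level of a separable 2D DWT (Haar filters): filter and
    downsample the rows, then the columns, giving the approximation A
    and the horizontal/vertical/diagonal details (HD, VD, DD)."""
    lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low pass h(i)
    hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high pass g(i)

    def analyze(x, f):
        # convolve each row of x with f and keep every other sample
        return np.apply_along_axis(
            lambda r: np.convolve(r, f)[1::2][: len(r) // 2], 1, x)

    low = analyze(s, lo)        # row-filtered low band
    high = analyze(s, hi)       # row-filtered high band
    A = analyze(low.T, lo).T    # then filter the columns of each band
    HD = analyze(low.T, hi).T
    VD = analyze(high.T, lo).T
    DD = analyze(high.T, hi).T
    return A, HD, VD, DD

s = np.arange(16.0).reshape(4, 4)
A, HD, VD, DD = dwt2_haar(s)
# orthonormal Haar preserves energy across the four sub-bands
total = sum((b ** 2).sum() for b in (A, HD, VD, DD))
print(np.isclose(total, (s ** 2).sum()))  # True
```

Recursing on `A` (the LL band) yields the (3k + 1)-band decomposition described in the text.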
4.2. Feature extraction
5. Feature selection

The data dimensionality and the quality of the features indicate the number of steps required in feature selection. In this work, MD is used to find the proper data source in the first step, and RA is proposed for obtaining appropriate feature sets from among the large-dimensional feature sets in the second step.

5.1. Mahalanobis distance (MD)

The distance between two N-dimensional points is scaled by the statistical variation in each component of the point. It is a useful way of determining the similarity of an unknown sample set to a known one. As an example, if x and y are two points from the same distribution, which has covariance matrix C, then the MD is given by d:

d = \sqrt{(x - y)^T C^{-1} (x - y)}    (3)

[Fig. 7. Histogram presentations of different conditions in different levels: approximation coefficients a1, a2, and a3 at levels 1–3 for the normal, misalignment, mass unbalance, and bearing fault conditions.]

5.2. Relief algorithm (RA)

The key idea of the original RA is to estimate the quality of features that have weights greater than the thresholds. These thresholds are determined by using the difference of a feature value between a given instance and the two nearest instances, based on the near-hit and near-miss approaches using the Euclidean distance (Kira & Rendell, 1994; Park & Kwon, 2007).

Let the feature set be denoted by F:

F = \{f_1, f_2, \ldots, f_p\}    (4)

An instance X is denoted by a p-dimensional vector:

X = [x_1, x_2, \ldots, x_p]    (5)

where x_j is the value of feature f_j of X.

Given training data S, a sample size m, and a threshold of feature relevancy \tau, relief detects those features which are statistically relevant to the target concept; \tau encodes a relevance threshold (0 \le \tau \le 1). Consider that the scale of every feature is either nominal or numerical. Differences of feature values between two instances X and Y are defined by the following function D.

When x_k and y_k are nominal,

D(x_k, y_k) = 0 if x_k and y_k are the same; 1 if x_k and y_k are different    (6)

When x_k and y_k are numerical,

D(x_k, y_k) = (x_k - y_k) / nu_k    (7)

where nu_k is a normalization unit to normalize the values of D into the interval [0, 1]. Only Eq. (7) is considered in this work, because all features are numerical and require normalization. RA picks a sample composed of m triplets of an instance X, its near-hit instance, and its near-miss instance. RA uses the p-dimensional Euclidean distance for selecting the near-hit and near-miss. An instance is called a near-hit of X if it belongs to the close neighborhood of X and also to the same category as X; an instance is defined as a near-miss if it belongs to the close neighborhood of X but not to the same category as X. RA calls a routine to update the feature weight vector W for every sample triplet and determines the average feature weight (the relevance of all the features to the target concept). Finally, RA selects those features whose average weights (relevance levels) are above the given threshold \tau.

RA is valid only when:

- the relevance level is large for relevant features and small for irrelevant features, and
- \tau can be chosen to retain relevant features and discard irrelevant features.

The input and output of RA are as follows:

Input: a vector space for training instances with the values of attributes and class values.
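The Mahalanobis distance of Eq. (3) translates directly into NumPy. This is a generic sketch, not code from the study:

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance of Eq. (3): d = sqrt((x-y)^T C^{-1} (x-y))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# identity covariance reduces the MD to the Euclidean distance
C = np.eye(2)
print(mahalanobis([0.0, 0.0], [3.0, 4.0], C))  # 5.0

# with unequal component variances the same pair is scaled differently
C2 = np.array([[4.0, 0.0], [0.0, 1.0]])
print(mahalanobis([0.0, 0.0], [3.0, 4.0], C2))  # sqrt(9/4 + 16) ≈ 4.272
```

The second call shows the "scaled by the statistical variation in each component" property: the high-variance first component contributes less to the distance.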
[Fig. 8(a, b). Presentation of features: scatter plots of skewness, entropy, kurtosis, and mean absolute deviation (MAD) versus standard deviation (STD) for the misalignment, mass unbalance, bearing fault, and normal conditions, at decomposition levels 1 and 2.]
Output: a vector space for training instances with the weight W of each attribute.
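The near-hit/near-miss weight update described above can be sketched as follows. This is a simplified, deterministic variant (the original RA samples m random triplets; here every instance is visited once), with `tau` playing the role of the relevance threshold τ; it is an illustration, not the study's code.

```python
import numpy as np

def relief(X, y, tau):
    """One-pass relief sketch: for each instance, find its near-hit and
    near-miss by Euclidean distance, update every feature weight with
    diff(miss)^2 - diff(hit)^2 (diffs normalized as in Eq. (7)), and
    keep features whose average weight exceeds the threshold tau."""
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)  # nu_k of Eq. (7)
    span[span == 0] = 1.0
    w = np.zeros(p)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        diff_hit = (X[i] - X[hit]) / span
        diff_miss = (X[i] - X[miss]) / span
        w += diff_miss ** 2 - diff_hit ** 2
    w /= n
    return w, np.flatnonzero(w > tau)

# feature 0 separates the classes, feature 1 is pure noise
rng = np.random.default_rng(1)
X = np.column_stack([np.r_[np.zeros(20), np.ones(20)], rng.random(40)])
y = np.r_[np.zeros(20), np.ones(20)]
w, kept = relief(X, y, tau=0.2)
print(kept)  # [0]: only the informative feature survives
```

Relevant features score high because their values agree with the near-hit and disagree with the near-miss; irrelevant features cancel out toward zero.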
6. Classification

The estimated feature sets were applied to two types of classifiers, SVM and LDA, to evaluate the performance of the IDS.

6.1. Support vector machines (SVM)

SVMs (Cristianini & Taylor, 2000; Vapnik & Chapelle, 1999) are a set of related supervised learning methods used for classification and regression, based on statistical learning theory. The fundamental concept of employing SVMs in the classification problem is to map the training data into a feature space with the aid of a kernel function. In practical applications, compared with other classifiers, this technique can achieve a good recognition rate with few training samples. The kernel function is an important parameter of the SVM classifier; choices include linear, polynomial, Gaussian, radial basis function (RBF), and sigmoid functions.

6.2. Linear discriminant analysis (LDA)

LDA (Duda, Hart, & Stork, 2001) is a popular method for reducing the feature dimension, and it also handles the classification problem as a classifier. It projects features from the parametric space to the feature space through a linear transformation matrix. This classifier can be computed efficiently in the linear case, even with large data sets.

7.1. Wavelet decomposition procedure

In the decomposition of the thermal image data from the different condition machines, the bi-orthogonal (bior-3.5) wavelet of degree 3.5 and a decomposition level of 3 are applied. The reason for choosing a decomposition level of 3 is the dimension of the thermal image data: there are no data left to decompose beyond the selected level. By performing the decomposition, four kinds of wavelet coefficients can be obtained from each class of machine condition data. Among these coefficients (A, HD, DD, and VD), the approximation coefficients are used for the subsequent feature calculation.
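Of the two classifiers, LDA is simple enough to sketch directly in NumPy (a working SVM would additionally need a quadratic-programming solver, so it is omitted here). The following two-class Fisher discriminant on synthetic data is an illustration, not the study's implementation; the class labels are only examples.

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Two-class Fisher LDA: project onto w = Sw^{-1} (m1 - m0), where Sw
    is the within-class scatter, and threshold at the projected midpoint."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0.T, bias=True) * len(X0)
          + np.cov(X1.T, bias=True) * len(X1))  # within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)            # linear transformation
    c = w @ (m0 + m1) / 2.0                     # decision threshold
    return w, c

def fisher_lda_predict(w, c, X):
    return (X @ w > c).astype(int)

rng = np.random.default_rng(2)
X0 = rng.normal([0.0, 0.0], 0.3, size=(30, 2))  # e.g. "normal" class
X1 = rng.normal([2.0, 2.0], 0.3, size=(30, 2))  # e.g. "misalignment" class
w, c = fisher_lda_fit(X0, X1)
pred = fisher_lda_predict(w, c, np.vstack([X0, X1]))
acc = (pred == np.r_[np.zeros(30, int), np.ones(30, int)]).mean()
print(acc)  # expect ~1.0 on this well-separated synthetic data
```

The within-class scatter in `Sw` is exactly the "expected covariance of each of the classes" mentioned in Section 7.4, and no hyperparameters need tuning, matching the "NA" entries for LDA in Table 6.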
[Fig. 8(c), continued. Feature scatter plots (skewness, entropy, kurtosis, and MAD versus STD) at decomposition level 3 for the four machine conditions.]
Table 3
Average Mahalanobis distance from the different classes. (M: mean, SK: skewness, EN: entropy, KU: kurtosis, MA: mean absolute deviation.)

Table 4
Features discarded, MD ≤ 50. (N: irrelevant feature (IF) = 0; Y: relevant feature (RF) = 1; AF: accepted feature; NAF: not accepted feature.)

Table 6
Classifier parameters.

Classifier   Range of parameters                          Preferred parameters
SVM          Kernel function: linear, RBF, polynomial     Kernel function: RBF, polynomial
             C = 1, 10, 100, 1000                         C = 10
             γ = 2^-3, 2^-2, 2^-1, 2^0, 2^1               γ = 2^-2
             Method: one-against-one, one-against-all     Method: one-against-all
LDA          NA                                           NA

Table 7
Error estimation with no feature selection. (Columns: SVM testing error, number of SVs, estimation time; LDA testing error, estimation time (s).)

Six features, namely standard deviation, mean, entropy, skewness, kurtosis, and mean absolute deviation, are extracted from the 2D thermal image data for the four different machine conditions. In each condition, there are thirty samples. These features are depicted in Fig. 8, where the wavelet decomposition coefficients of all machine conditions from levels 1 to 3 are used. All features are plotted versus the standard deviation to find features that significantly distinguish the machine conditions. First, consider the coefficients of level 1 in Fig. 8(a). In this case, the machine conditions are clearly separated in skewness versus standard deviation; however, the remaining features are either scattered or overlap each other against the standard deviation. All features at level 2 are apparently well separated, which indicates that all machine conditions can be recognized easily by using the coefficients of level 2, as shown in Fig. 8(b). Entropy and mean absolute deviation versus standard deviation cannot provide any decision about the machine conditions at the coefficients of level 3, because they coincide with each other (Fig. 8(c)).
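The six statistical features listed above can be computed on any coefficient vector as follows. The paper does not give its exact entropy definition, so a normalized-histogram Shannon entropy is assumed here; the function is a sketch, not the authors' code.

```python
import numpy as np

def statistical_features(x, bins=16):
    """The six features used in this study, computed on a coefficient
    vector x: mean (M), standard deviation (STD), mean absolute
    deviation (MAD), skewness (SK), kurtosis (KU), and entropy (EN).
    Entropy is estimated from a normalized histogram (an assumption)."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    mad = np.abs(x - mu).mean()
    skew = ((x - mu) ** 3).mean() / sd ** 3
    kurt = ((x - mu) ** 4).mean() / sd ** 4
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / len(x)
    entropy = -(p * np.log2(p)).sum()
    return {"M": mu, "STD": sd, "MAD": mad, "SK": skew,
            "KU": kurt, "EN": entropy}

# a symmetric two-valued signal has mean 0, unit STD, and kurtosis 1
f = statistical_features(np.r_[np.ones(8), -np.ones(8)])
print(f["M"], f["STD"], f["KU"])  # 0.0 1.0 1.0
```

Applied per decomposition level and per sample, this yields the 120 × 6 feature matrices of Table 2 that the selection stage then prunes.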
7.3. Feature selection
operation was justified so as to clamp each feature weight into the interval between −1 and 1.

7.4. Classification error

The relevant parameter setup for the classifiers is shown in Table 6, where the optimal classifier parameters are also given. With the optimal parameters, the RBF kernel is used as the basis function of the SVM, which involves the two parameters C and γ; as the optimal values of these arguments, C and γ are set to 10 and 2^-2, respectively. LDA projects features from the parametric space into the feature space through a linear transformation matrix. In general, there are no specific parameters in LDA that need to be optimized. Within-class and between-class scatter are used to formulate criteria for class separation; the within-class scatter is the expected covariance of each of the classes.

The classification errors of the IDS when no feature selection process is used are presented in Table 7. Obviously, the use of all features at each level of decomposition yields the same classification error for both SVM and LDA. Features of levels 1, 2, and 3 show the same testing error, number of support vectors (SVs), and estimation time with the SVM classifier. A better result is found with LDA than with SVM.

Table 8 gives the results of the IDS in which an MD-based feature selection module is included. It indicates that only two sets of features remain in level 1 after applying the MD feature selection process with a threshold value; 4 and 3 sets of features are obtained from levels 2 and 3, respectively. Applying the LDA and SVM classifiers in this situation, the lowest classification error is obtained from the features of levels 2 and 3 using LDA, whilst the best classification result for SVM is found at level 2. Evidently, the classification performance of each individual classifier of the IDS including the feature selection process has been improved.

In the case where both MD and RA are applied to the 18-dimensional feature sets, all feature sets of level 1 are discarded in the first step by the MD-based feature selection. In the second step, a resulting 7-dimensional feature set is found through RA with a relevance threshold τ of 0.008. Hence, the classification test error is reduced in comparison with applying only one feature selection process. If the threshold τ is 0.012, a 3-dimensional feature set is input to each individual classifier, and the best classification accuracy is obtained by LDA. The result is presented in Table 9. Evidently, the more feature selection processing is used, the higher the classification accuracy attained.

8. Conclusions

In this study, a new intelligent diagnosis system is proposed to classify four machine conditions, namely normal, misalignment, mass unbalance, and bearing fault, by using infrared thermal images. This system consists of four consecutive procedures: two-dimensional discrete wavelet transform (2D-DWT), feature calculation, feature selection, and classification. The 2D-DWT is first implemented to determine the wavelet coefficients. These coefficients are then utilized in the feature calculation procedure to obtain the machine's features, which are standard deviation, mean, entropy, skewness, kurtosis, and mean absolute deviation. The feature selection based on the Mahalanobis distance and the relief algorithm retains the significant features to enhance the performance of the classifiers, support vector machines and linear discriminant analysis, in the next procedure. The classification results indicate that this system could be employed to assist in monitoring machine conditions and diagnosing machine faults.

References

Brigham, E. O. (1988). The fast Fourier transform and its applications. Englewood Cliffs: Prentice-Hall International.
Cristianini, N., & Taylor, J. S. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York: John Wiley & Sons.
Gonzalez, R. C., & Woods, R. E. (1993). Digital image processing (3rd ed.). MA: Addison-Wesley.
Grossmann, A., & Morlet, J. (1984). Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM Journal of Mathematical Analysis, 15, 723–736.
Kira, K., & Rendell, L. (1994). A practical approach to feature selection. In Proceedings of the 9th international workshop on machine learning (pp. 171–182).
Lee, S. K., & White, P. R. (1997). Higher-order time-frequency analysis and its application to fault detection in rotating machinery. Mechanical Systems and Signal Processing, 11(4), 637–650.
Lei, Y., He, Z., & Zi, Y. (2009). Application of an intelligent classification method to mechanical fault diagnosis. Expert Systems with Applications, 36, 9941–9948.
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 674–693.
Niu, G., Han, T., Yang, B. S., & Tan, A. C. C. (2007). Multi-agent decision fusion for motor fault diagnosis. Mechanical Systems and Signal Processing, 21(3), 1285–1299.
Park, H., & Kwon, H. C. (2007). Extended relief algorithm in instance-based feature filtering. In 6th international conference on advanced language processing and web information technology (pp. 123–128).
Patel, J. N., Khokhar, A. A., & Jamieson, L. H. (1996). Scalability of 2-D wavelet transform algorithms: Analytical and experimental results on coarse-grained parallel computers. In Proceedings of the 1996 VLSI signal processing workshop (pp. 376–385).
Saxena, A., & Saad, A. (2007). Evolving an artificial neural network classifier for condition monitoring of rotating mechanical systems. Applied Soft Computing, 7(1), 441–454.
Siedlecki, W., & Sklansky, J. (1989). A note on genetic algorithms for large-scale feature selection. Pattern Recognition Letters, 10, 246–259.
Thulasiraman, P., Khokhar, A. A., Heber, G., & Gao, G. R. (2004). A fine-grain load-adaptive algorithm of the 2D discrete wavelet transform for multithreaded architectures. Journal of Parallel and Distributed Computing, 64, 68–78.
Toutountzakis, T., Tan, C. K., & Mba, D. (2005). Application of acoustic emission to seeded gear fault detection. NDT & E International, 38(1), 27–36.
Vapnik, V., & Chapelle, O. (1999). Bounds on error expectation for SVM. In Advances in large margin classifiers. MIT Press.
Widodo, A., & Yang, B. S. (2007). Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Systems with Applications, 33, 241–250.
Yang, B. S., Lim, D. S., & Tan, A. C. C. (2005). VIBEX: An expert system for vibration fault diagnosis of rotating machinery using decision tree and decision table. Expert Systems with Applications, 28(4), 735–742.
Younus, A. M., & Yang, B. S. (2010). Wavelet coefficient of thermal image analysis for machine fault diagnosis. In Proceedings of the IEEE prognostics & system health management conference, mu3033.
Younus, A. M., Widodo, A., & Yang, B. S. (2010). Evaluation of thermography image data for machine fault diagnosis. Nondestructive Testing and Evaluation, 25(3), 231–247.