
Expert Systems with Applications 39 (2012) 2082–2091

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Intelligent fault diagnosis of rotating machinery using infrared thermal image


Ali MD. Younus, Bo-Suk Yang *
Department of Mechanical & Automotive Engineering, Pukyong National University, San 100, Yongdang-dong, Nam-gu, Busan 608-739, South Korea

Keywords: Thermal image; Feature selection; Fault diagnostics; Intelligent diagnosis system

Abstract: This study presents a new intelligent diagnosis system for the classification of different machine conditions using data obtained from infrared thermography. In the first stage of the proposed system, the two-dimensional discrete wavelet transform is used to decompose the thermal image. However, the data obtained from this stage are ordinarily of high dimensionality, which reduces classification performance. To surmount this problem, a feature selection tool based on the Mahalanobis distance and the relief algorithm is employed in the second stage to select the salient features that characterize the machine conditions, thereby enhancing the classification accuracy. The data from the second stage are subsequently fed to the intelligent diagnosis system, in which support vector machines and linear discriminant analysis are used as classifiers. The results of the proposed system are able to assist in diagnosing different machine conditions.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +82 51 629 6152; fax: +82 51 629 6150. E-mail address: bsyang@pknu.ac.kr (B.-S. Yang).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.08.004

1. Introduction

Rotating machinery covers a broad range of mechanical equipment and plays a significant role in industrial applications. Owing to the necessity of increasing production, rotating machinery is essentially required to run continuously without interruption and to achieve an extended machine life (Saxena & Saad, 2007), because its failures have become very costly and time consuming. Hence, a great emphasis on machine fault diagnosis is necessary to increase availability and to avoid personal casualties as well as economic losses (Lei, He, & Zi, 2009). However, with the rapid development of science and technology, rotating machinery in modern industry is growing larger, more precise, and more automated, and its potential faults are becoming more difficult to detect. Therefore, the need to increase fault diagnosis capability against possible failures has attracted numerous researchers in recent years. Different approaches for condition monitoring and diagnosis of rotating machinery have been profitably proposed in the literature, such as acoustic emission (Toutountzakis, Tan, & Mba, 2005), vibration analysis (Yang, Lim, & Tan, 2005), frequency analysis (Lee & White, 1997), and other methods which are suitable only for a particular machine.

Infrared thermography is a non-contact and non-intrusive temperature measuring technique with the advantages of not altering the surface temperature and of displaying the real-time temperature distribution. It has been exploited in many industrial and research fields, among others meteorology, environment, medicine, architecture, and engineering, wherever temperature represents a key parameter. The measurement principle is based on the fact that any physical object radiates energy at infrared wavelengths, i.e. within the infrared range of the electromagnetic spectrum. A thermal camera can measure and visualize the emitted infrared radiation, so the surface temperature distribution is recorded in the form of a thermogram. Based on this characteristic, thermal imaging is currently applied in the machine condition monitoring and diagnosis field.

In order to monitor machine conditions as well as diagnose machine faults, signal processing techniques are first required to process the data acquired from the machine. One of the earliest of these techniques, frequently used for processing one-dimensional signals, is the fast Fourier transform (FFT). FFT is also employed as an image processing tool to handle two-dimensional (2D) signals, i.e. X-ray, synthetic aperture radar (SAR), magnetic resonance imaging (MRI), RGB images, etc. (Brigham, 1988; Gonzalez & Woods, 1993). Currently, the wavelet transform has received much consideration in the image processing field. An outstanding variant of this technique is the 2D discrete wavelet transform (2D-DWT), commonly used as a decomposition algorithm. However, the data obtained from the decomposition process are rarely usable directly, owing to their huge dimensionality, which causes difficulties not only in data storage but also in data processing for the next procedure. Representing the data as features is an effective solution to this problem. Feature representation, or dimensionality reduction, is a process of extracting the useful information so as to remove artifacts and reduce the dimensionality. However, it must preserve as much as possible the characteristic features which indicate the conditions and faults of the machine. Dimensionality

reduction, which is an essential data preprocessing technique for classification tasks, traditionally falls into two categories: feature extraction and feature selection.

In machine fault diagnosis, there have been numerous approaches to dimensionality reduction, such as independent component analysis (ICA), principal component analysis (PCA) (Widodo & Yang, 2007), genetic algorithms (Siedlecki & Sklansky, 1989), and the relief algorithm (RA) (Kira & Rendell, 1994).

In the case of using image processing techniques, our previous work proposed a histogram-based feature extraction technique for 2D thermal images to diagnose machine faults (Younus & Yang, 2010). In the present study, feature selection based on the Mahalanobis distance (MD) and RA is investigated with the aim of improving the classification performance. Subsequent to the dimensionality reduction procedure, selecting the models for the classification or diagnosis task is carried out in the next stage. These models cover a wide range of approaches, varying from model-based to pattern recognition-based. Among these approaches, machine fault diagnosis systems based on artificial intelligence (AI) techniques have become popular, and numerous methods have been employed, for instance support vector machines (SVM), multi-agent fusion systems, expert systems, artificial neural networks, and fuzzy logic (Grossmann & Morlet, 1984; Niu, Han, Yang, & Tan, 2007; Patel, Khokhar, & Jamieson, 1996). Similarly, a diagnosis system based on AI techniques in association with suitable signal processing and feature selection techniques is investigated in this study. The AI diagnosis models used here include linear discriminant analysis and SVM for classifying the different machine conditions, namely normal, misalignment, mass unbalance, and bearing fault, from thermal images. The results of the proposed system can greatly assist in diagnosing the different machine conditions.

2. Architecture of the proposed system

The proposed intelligent diagnosis system (IDS) consists of consecutive procedures: image decomposition, feature calculation, which is used to represent the obtained data as features, feature selection, and classification, as shown in Fig. 1. Thermal images capturing the machine conditions, which are normal condition, misalignment, mass unbalance, and bearing fault, are utilized as the input to this system. Initially, the measured data are fed into the 2D-DWT to calculate the wavelet coefficients. Then these huge data are passed through the feature calculation module, where feature sets are obtained using different statistical feature algorithms; for example, standard deviation, mean absolute deviation, kurtosis, skewness, and others are adopted in this study. However, after this procedure, the received data are normally of high dimension and contain a large number of redundant features. If these data are directly input into the classifiers, the performance will be significantly decreased. Therefore, a feature selection algorithm should be employed in order to choose, from the whole feature sets, the appropriate features which can characterize the machine conditions, and to transform the existing features into a lower dimensional space. In the IDS, the feature selection module is built in two steps, so that only a selected number of feature sets are used to obtain accurate results regarding the machine conditions. The first step is applied to find the levels that contain significant features. Subsequently, the features obtained from the first step are combined into a data sheet, from which the feature sets are found by applying RA in the second step. Finally, the selected features from the different levels of coefficients are combined and then input to the classifiers. Two different classifiers are embedded in the IDS to evaluate the system performance.

[Fig. 1 shows the processing chain: IR data; discrete wavelet transform (levels 1 to n); feature extraction; per-level feature selection; combined feature selection by relief algorithm; classifiers 1 to n; decision-making of machine conditions.]

Fig. 1. Architecture of IDS.

Fig. 2. Experimental setup.
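The stage sequence of Fig. 1 can be condensed into a short, runnable sketch. A Haar-like pairwise average stands in for the paper's bior-3.5 low-pass filter, and the feature list is truncated to three statistics; everything here, the toy image included, is illustrative rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dwt2_level(img):
    """One Haar-like separable approximation step: average row pairs,
    then column pairs (a stand-in for the bior-3.5 low-pass filter)."""
    rows = (img[0::2, :] + img[1::2, :]) / 2.0      # low-pass + downsample rows
    return (rows[:, 0::2] + rows[:, 1::2]) / 2.0    # low-pass + downsample columns

def extract_features(band):
    """Three of the statistical features used by the IDS: std, mean, MAD."""
    return np.array([band.std(), band.mean(),
                     np.abs(band - band.mean()).mean()])

# Toy 320 x 240 array standing in for a thermogram.
image = rng.normal(loc=300.0, scale=5.0, size=(240, 320))

features = []
band = image
for _ in range(3):                  # three decomposition levels, as in the paper
    band = dwt2_level(band)
    features.append(extract_features(band))
feature_vector = np.concatenate(features)
print(feature_vector.shape)         # one 9-dimensional feature vector
```

In the full system this vector would then pass through the two-step feature selection before reaching the classifiers.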



Table 1
Specification of thermal camera and fault simulator.

Thermal camera (FLIR-A 40 series):
- Solid state, uncooled microbolometer detector, 7.5–13 μm
- Storage temperature range of −40 °C to +70 °C
- Solid object materials and surface treatments exhibit emissivity ranging from approximately 0.1 to 0.95
- For short distance, humidity is the default value of 50%
- Thermal sensitivity of 0.08 °C at 30 °C
- IP 40 (determined by connector type)

Fault simulator:
- Shaft diameter: 30 mm
- Bearing: two ball bearings
- Bearing housings: two bearing housings; aluminum horizontally split bracket for simple and easy changes, tapped to accept transducer mount
- Bearing housing base: completely movable using jack bolts for easy misalignment in all three planes
- Rotors: two rotors, 6 in. diameter, with two rows of tapped holes at every 20° (with lip for introducing unbalance force)

Fig. 3. Original thermal image.

3. Experimental setup and image acquisition

To validate the proposed system, an experiment was carried out using a fault simulator which consists of a driving motor, shaft, disks, a PC for saving data, and a thermal camera, as shown in Fig. 2. The short shaft of 30 mm diameter, supported by two ball bearings at its ends, was attached to the shaft of the motor through a flexible coupling, which minimizes the effects of misalignment and the transmission of vibration from the motor. Using the coupling, the misalignment condition on the fault simulator can be adjusted. In order to create the unbalance condition on the fault simulator, disks with many available thread holes for adding extra mass were attached to the shaft. A variable-speed DC motor (0.5 HP) with speed up to 3450 rpm was used as the drive. Table 1 shows the main specifications of the thermal camera and the fault simulator. The camera used in the experiments is a long-wave infrared camera from FLIR with a thermal sensitivity of 0.08 °C at 30 °C.

The thermal camera is the key device, and some of its parameters require setting because of their importance for data acquisition, especially for thermal image data. These parameters are set once and applied throughout the operation of the thermal camera, because all of the machine's materials are considered similar. The most important parameter is emissivity; the other parameters are relative humidity, scale temperature, focal length of the camera, and distance, as indicated in Table 1. All of these parameters are chosen

Fig. 4. Gray level value at each pixel in thermal image.



Fig. 5. Temperature at each pixel of the thermal image (Kelvin scale).

Table 2
Detailed descriptions of image data.

Machine condition   No. of data files
Normal              30
Misalignment        30
Bearing fault       30
Mass unbalance      30

For all conditions: dimension of image data 320 × 240; selected dimension 158 × 25; total data files 120; features used 6; total feature dimension (in each level) 120 × 6.
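Table 2's reduction from the 320 × 240 image to a 158 × 25 region of interest is, in effect, an array crop. A minimal sketch; the paper does not give the crop offsets, so `top` and `left` (and the row/column orientation) are placeholders:

```python
import numpy as np

# Toy grey-level thermogram: 240 rows x 320 columns (a 320 x 240 image).
image = np.arange(240 * 320, dtype=np.float64).reshape(240, 320)

# Hypothetical region of interest covering the machine hot spot.
top, left = 100, 80
roi = image[top:top + 25, left:left + 158]   # 25 rows x 158 columns

print(roi.shape)   # (25, 158)
```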

according to the experimental conditions. In this study, all conditions, i.e. normal, mass unbalance, misalignment, and bearing fault, used the same setting parameters to accomplish the experiment.

As mentioned above, the aim of this study is to analyze the different types of machine conditions. In the normal condition of the machine, the speed of the motor was increased gradually up to 900 rpm. This speed was held for five minutes, by which time the machine reaches its stable condition, and then the process of data acquisition began. The other experiments (misalignment, mass unbalance, and bearing fault) were carried out likewise.

Figs. 3 and 4 show one of the original thermal images of a machine condition and the gray level value at each pixel of this image, respectively. Fig. 5 presents the temperature value at each pixel of the thermal image in matrix form. The detailed descriptions of the image data in the four machine condition experiments, and the reduction of the data size from a 320 × 240 to a 158 × 25 array, are shown in Table 2. The reason for cropping the original image to the region of interest is to obtain a reduced dimension for further processing by the signal processing technique.

4. Signal processing and feature extraction

4.1. Discrete wavelet transform (DWT)

The 2D-DWT is a relatively recent mathematical tool in the field of 2D signal processing. The important function of the wavelet is obtained by a high pass filter (HPF) and the cascade of a sub-sampled low pass filter (LPF). The LPF gives the smoothing effect known as the approximation coefficients, while the HPF gives the detail coefficients. The process of decomposing a sequence into two sub-sequences with half the resolution can be iterated on either the lower band or the higher band. To achieve a better resolution at lower frequencies, the scheme is commonly iterated on the lower band. The output from the lower band of the kth stage, w_h^k(n), is the input for stage

[Fig. 6 shows the separable 2D decomposition: the rows of the signal sA_j are low-pass or high-pass filtered and the columns down-sampled (keep even-index columns); the columns of each result are then filtered and the rows down-sampled (keep even-index rows), yielding the approximation sA_{j+1} and the horizontal sD_{j+1}(h), vertical sD_{j+1}(v), and diagonal sD_{j+1}(d) detail coefficients.]

Fig. 6. Decomposition algorithm of DWT.
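The row-then-column procedure of Fig. 6 can be sketched directly with NumPy. Haar filters stand in for the bior-3.5 pair actually used in the paper (longer filters would change only the band sizes), and the tiny 4 × 4 input is purely illustrative:

```python
import numpy as np

# Haar analysis filters (stand-ins for the paper's bior-3.5 pair).
LO = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass
HI = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass

def filter_rows(x, f):
    """Convolve each row with f, then keep every other column."""
    out = np.array([np.convolve(row, f, mode="full") for row in x])
    return out[:, 1::2]

def dwt2_step(s):
    """One stage of the separable 2D DWT of Fig. 6."""
    lo = filter_rows(s, LO)            # row filtering, low band
    hi = filter_rows(s, HI)            # row filtering, high band
    cA = filter_rows(lo.T, LO).T       # approximation
    cV = filter_rows(lo.T, HI).T       # vertical detail
    cH = filter_rows(hi.T, LO).T       # horizontal detail
    cD = filter_rows(hi.T, HI).T       # diagonal detail
    return cA, cH, cV, cD

s = np.arange(16.0).reshape(4, 4)
cA, cH, cV, cD = dwt2_step(s)
print(cA)   # 2 x 2 block averages of the 4 x 4 input
```

Iterating `dwt2_step` on `cA` reproduces the recursion on the LL band described below Fig. 6.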



k + 1 of the wavelet decomposition. In general, k stages of wavelet decomposition result in a (k + 1)-band wavelet decomposition of the original s(n). This can be represented as follows:

    w_h^{k+1}(n) = \sum_{i=-\infty}^{+\infty} h(i) w_h^k(2n - i)    (1)

    w_g^{k+1}(n) = \sum_{i=-\infty}^{+\infty} g(i) w_h^k(2n - i)    (2)

In order to apply wavelet decomposition to images, 2D extensions of wavelets are required. This can be achieved by the use of non-separable or separable wavelets; the latter is considered in this study. A separable filter implies that filtering can be performed in one dimension (rows), followed by filtering in the other dimension (columns). A 2D wavelet transform can be computed with a separable extension of the one-dimensional (1D) decomposition algorithm (Mallat, 1989), as shown in Fig. 6. First, we convolve the rows of s(n, m) with a 1D filter, retain every other column, convolve the columns of the resulting signals with another 1D filter, and retain every other row. Further stages of the 2D wavelet decomposition can be computed by recursively applying the procedure to the LPF LL band (see Fig. 6) of the previous stage. In general, k stages of wavelet decomposition result in a (3k + 1)-band wavelet decomposition of the original image s(m, n) (Thulasiraman, Khokhar, Heber, & Gao, 2004). The decomposition algorithm starts with the signal s, of dimensions n by m, and calculates the coefficients of the approximation (A1), horizontal detail (HD1), vertical detail (VD1), and diagonal detail (DD1), then those of A2, HD2, VD2, and DD2, and so on. A 1D signal, by contrast, is decomposed into two components: A1 and the detail coefficients (D1).

In this paper, all levels of decomposition and all coefficients have been considered in the analysis to find significant results for the machine conditions. Obtaining good feature data from this large amount of data is a great challenge for the classification of the different classes. Finding the proper level of coefficients from the decomposition data is an objective and part of a successful implementation of the IDS.
4.2. Feature extraction

As mentioned in the previous section, feature extraction is applied where the input data are too large for processing; the input data are then transformed into a reduced representative set of features. The features from the 2D coefficients of the DWT are calculated according to Younus, Widodo, and Yang (2010).
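The six statistics used later in the paper can be computed from a coefficient array in a few lines. A sketch: the entropy definition is an assumption here (Shannon entropy of a 32-bin histogram), since the paper does not spell out its formula:

```python
import numpy as np

def features(coeffs):
    """Standard deviation, mean, entropy, skewness, kurtosis, and mean
    absolute deviation of a wavelet coefficient array (Section 4.2)."""
    x = np.asarray(coeffs, dtype=np.float64).ravel()
    mu, sd = x.mean(), x.std()
    centred = x - mu
    skew = (centred ** 3).mean() / sd ** 3
    kurt = (centred ** 4).mean() / sd ** 4
    mad = np.abs(centred).mean()
    hist, _ = np.histogram(x, bins=32)         # assumed entropy definition
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([sd, mu, entropy, skew, kurt, mad])

rng = np.random.default_rng(1)
f = features(rng.normal(size=(120, 160)))      # toy coefficient band
print(f.shape)   # (6,)
```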

5. Feature selection

The data dimensionality and the quality of the features indicate the number of steps required in feature selection. In this work, MD is used to find the proper data source in the first step, and RA is proposed to obtain appropriate feature sets among the large dimension of feature sets in the second step.

5.1. Mahalanobis distance (MD)

The distance between two N-dimensional points is scaled by the statistical variation in each component of the points. It is a useful way of determining the similarity of an unknown sample set to a known one. For example, if x and y are two points from the same distribution which has covariance matrix C, then the MD is given by d:

    d = ((x - y)^T C^{-1} (x - y))^{1/2}    (3)

[Fig. 7 shows histograms of the approximation coefficients at levels 1 to 3 for the normal, misalignment, mass unbalance, and bearing fault conditions.]

Fig. 7. Histogram presentations of different conditions in different levels.

5.2. Relief algorithm (RA)

The key idea of the original RA is to estimate the quality of features that have weights greater than the thresholds. These

thresholds are determined by using the difference of a feature value between a given instance and its two nearest instances, based on near-hit and near-miss approaches using the Euclidean distance (Kira & Rendell, 1994; Park & Kwon, 2007).

Let the feature set be denoted by F:

    F = {f_1, f_2, ..., f_p}    (4)

An instance X is denoted by a p-dimensional vector

    X = [x_1, x_2, ..., x_p]    (5)

where x_j is the value of feature f_j of X.

Given training data S, sample size m, and a threshold of feature relevancy τ, relief detects those features which are statistically relevant to the target concept; τ encodes a relevance threshold (0 ≤ τ ≤ 1). Consider that the scale of every feature is either nominal or numerical. Differences of feature values between two instances X and Y are defined by the following function D. When x_k and y_k are nominal,

    D(x_k, y_k) = 0 if x_k and y_k are the same; 1 if they are different    (6)

When x_k and y_k are numerical,

    D(x_k, y_k) = (x_k - y_k) / nu_k    (7)

where nu_k is a normalization unit that normalizes the values of D into the interval [0, 1]. Only Eq. (7) is considered in this work, because all features are numerical and require normalization. RA picks a sample composed of m triplets of an instance X, its near-hit instance, and its near-miss instance, using the p-dimensional Euclidean distance to select the near-hit and near-miss. An instance is called a near-hit of X if it belongs to the close neighborhood of X and to the same category as X; an instance is defined as a near-miss if it belongs to the close neighborhood of X but not to the same category as X. RA calls a routine to update the feature weight vector W for every sample triplet and determines the average feature weight (the relevance of all the features to the target concept). Finally, RA selects those features whose average weights (relevance levels) are above the given threshold τ.

RA is valid only when
- the relevance level is large for relevant features and small for irrelevant features, and
- τ can be chosen to retain relevant features and discard irrelevant features.

The input and output of RA are as follows:
- Input: a vector space of training instances with the attribute values and class values.
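The near-hit/near-miss weight update can be sketched compactly. This follows the original Relief scheme cited above; the toy data set (one class-separating feature among three random ones) and the threshold value are invented for illustration:

```python
import numpy as np

def relief(X, y, n_samples, tau, rng):
    """Original Relief (Kira & Rendell, 1994): weight each feature by how
    well it separates a sampled instance from its near-miss relative to
    its near-hit. Assumes X is already scaled to [0, 1]."""
    n, p = X.shape
    w = np.zeros(p)
    for i in rng.choice(n, size=n_samples, replace=False):
        d = np.linalg.norm(X - X[i], axis=1)   # Euclidean distances
        d[i] = np.inf                          # exclude the instance itself
        same = y == y[i]
        hit = np.flatnonzero(same)[np.argmin(d[same])]       # near-hit
        miss = np.flatnonzero(~same)[np.argmin(d[~same])]    # near-miss
        w += (X[i] - X[miss]) ** 2 - (X[i] - X[hit]) ** 2
    w /= n_samples
    return w, np.flatnonzero(w > tau)          # weights, retained features

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.uniform(size=(80, 4))                   # three irrelevant features
X[:, 0] = 0.8 * y + 0.1 * rng.uniform(size=80)  # one class-separating feature
w, kept = relief(X, y, n_samples=40, tau=0.05, rng=rng)
print(int(w.argmax()))                          # index of the strongest feature
```

The informative feature receives a large average weight because its near-miss differences are consistently large while its near-hit differences are small.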

[Fig. 8(a): skewness, entropy, kurtosis, and mean absolute deviation (MAD) plotted versus standard deviation (STD) at level 1 for the misalignment, mass unbalance, bearing fault, and normal conditions.]

Fig. 8. Presentation of features.

[Fig. 8(b): MAD, skewness, entropy, and kurtosis versus STD at level 2 for the four machine conditions.]

Fig. 8 (continued)

- Output: a vector space of training instances with the weight W of each attribute.

6. Classification

The estimated feature sets were applied to two types of classifiers, SVM and LDA, to evaluate the performance of the IDS.

6.1. Support vector machines (SVMs)

SVMs (Cristianini & Taylor, 2000; Vapnik & Chapelle, 1999) are a set of related supervised learning methods used for classification and regression, based on statistical learning theory. The fundamental concept of employing SVMs in a classification problem is to map the training data into a feature space with the aid of a kernel function. In practical applications, compared with other classifiers, this technique can achieve a good recognition rate with few training samples. The kernel function is an important parameter of the SVM classifier; common choices include linear, polynomial, Gaussian radial basis function (RBF), and sigmoid functions.

6.2. Linear discriminant analysis (LDA)

LDA (Duda, Hart, & Stork, 2001) is a popular method for reducing the feature dimension, and it also serves as a classifier in classification problems. It projects features from the parametric space to the feature space through a linear transformation matrix. This classifier can be computed efficiently in the linear case, even with large data sets.

7. Results and discussions

7.1. Wavelet decomposition procedure

In the decomposition of the thermal image data from the different machine conditions, bi-orthogonal (bior-3.5) wavelets of degree 3.5 and a decomposition level of 3 are applied. The reason for choosing decomposition level 3 is the dimension of the thermal image data: no data remain for decomposition after the selected level. By performing the decomposition, four kinds of wavelet coefficients are obtained from each class of machine condition data. Among these coefficients (A, HD, DD and VD), the approximation coefficients
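The two classifiers of Section 6 can be exercised with scikit-learn on synthetic feature vectors. The class layout (four conditions, thirty samples each) follows the paper, while the three-dimensional feature values and cluster centres are invented; the SVM uses the preferred parameters reported in Table 6 (RBF kernel, C = 10, gamma = 2^-2):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-in for the selected feature sets: 4 machine conditions,
# 30 samples each, well-separated cluster centres.
centers = np.array([[0.0, 0, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]])
X = np.vstack([c + rng.normal(size=(30, 3)) for c in centers])
y = np.repeat(np.arange(4), 30)

# RBF-kernel SVM with the preferred Table 6 parameters.
svm = SVC(kernel="rbf", C=10.0, gamma=0.25).fit(X, y)
lda = LinearDiscriminantAnalysis().fit(X, y)

print(svm.score(X, y), lda.score(X, y))   # training accuracy of each classifier
```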

[Fig. 8(c): skewness, entropy, kurtosis, and MAD versus STD at level 3 for the four machine conditions.]

Fig. 8 (continued)

Table 3
Average Mahalanobis distance from different classes.

      Normal vs misalignment          Normal vs mass unbalance      Normal vs bearing fault
      M     SK   EN   KU   MA         M    SK   EN   KU   MA        M    SK    EN   KU    MA
L1    163   94   39   175  2417       34   139  3.4  26   15        731  191   758  3195  9053
L2    1550  129  53   238  2414       29   69   53   10   81        133  918   73   3466  3228
L3    1602  485  103  41   5661       23   118  4.6  169  185       41   7749  165  339   1329

M: mean, SK: skewness, EN: entropy, KU: kurtosis, MA: mean absolute deviation.
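Distances like those in Table 3 follow from Eq. (3). A minimal sketch; the covariance matrix and the two feature samples below are made up, not taken from the paper's data:

```python
import numpy as np

def mahalanobis(x, y, cov):
    """MD of Eq. (3): distance between two points scaled by covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical covariance estimated from, e.g., the normal-condition
# feature set, and two (STD, skewness) feature samples.
cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])
d = mahalanobis([12.0, 3.1], [14.0, 2.7], cov)
print(round(d, 3))   # -> 1.585
```

Averaging such distances over all sample pairs of two classes gives one cell of Table 3.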

Table 4
Features discarded, MD ≤ 50.

      Normal vs misalignment          Normal vs mass unbalance      Normal vs bearing fault
      M     SK   EN   KU   MA         M    SK   EN   KU   MA        M    SK    EN   KU    MA
L1    163   94   IF   175  2417       IF   139  IF   IF   IF        731  191   758  3195  9053
L2    1550  129  53   238  2414       IF   69   53   IF   81        133  918   73   3466  3228
L3    1602  485  103  IF   5661       IF   118  IF   169  185       IF   7749  165  339   1329

IF: irrelevant feature (discarded).

which passed through the LPF are considered for feature extraction, because the low frequency signals contain the most important part of the original signal. However, wavelet coefficients other than the approximation may also be useful in machine condition diagnosis. The histogram of the approximation coefficients separates the machine conditions by either the ranges of the histograms or the amplitudes of the coefficients. Obviously, the normal, misalignment, and bearing fault conditions can be categorized by ranges: the peak values of these machine conditions lie in different ranges, which is the distinguishing feature of the machine resonances. In the case of bearing fault, the coefficients show a larger maximum amplitude than the other machine conditions because of the variation in image contrast of the hot spot in the region of interest. Fig. 7 shows the coefficients of only one sample out of the thirty or more samples for each machine condition; the results obtained for the other samples are similar.

7.2. Feature extraction

Six features, namely standard deviation, mean, entropy, skewness, kurtosis, and mean absolute deviation, are extracted from the 2D thermal image data for the four machine conditions; in each condition there are thirty samples. These features are depicted in Fig. 8, where the wavelet decomposition coefficients of all machine conditions from level 1 to level 3 are used. All features are plotted versus standard deviation to find the significantly distinguishing features of the machine conditions. First, consider the level-1 coefficients in Fig. 8(a): the machine conditions are clearly separated in skewness versus standard deviation, whereas the remaining features are either scattered or overlap each other against standard deviation. All features at level 2 are apparently well separated, which indicates that all machine conditions can be recognized easily using the level-2 coefficients, as shown in Fig. 8(b). Entropy and mean absolute deviation versus standard deviation cannot provide any decision about the machine conditions at the level-3 coefficients, because they coincide with each other (Fig. 8(c)).

7.3. Feature selection

7.3.1. Mahalanobis distance
The distances among the features are measured using the MD. Here, the MD between two classes is estimated, with the normal condition and the standard deviation feature taken as the reference; all distances of the different conditions are calculated against the normal condition. After finding the MD, a relevance level has to be selected to decide whether the features satisfy it. The average distances are shown in Table 3, where the relevance limit is applied, and Table 4 shows the irrelevant features that have been discarded according to the relevance level.

Table 5
Feature selection strategy.

Sample No.                    f1    f2    f3    f4    f5    f6    f7
1                             N     N     Y     N     N     Y     N
2                             Y     N     Y     Y     Y     N     Y
3                             Y     N     Y     Y     N     Y     Y
4                             N     Y     Y     Y     N     Y     N
5                             Y     Y     N     Y     N     Y     Y
6                             Y     Y     N     Y     Y     Y     Y
7                             Y     N     Y     N     N     Y     N
8                             Y     Y     Y     Y     Y     Y     Y
9                             Y     Y     Y     Y     N     Y     N
10                            Y     Y     Y     N     Y     Y     Y
% of IF                       0.2   0.4   0.2   0.3   0.6   0.1   0.4
% of RF                       0.8   0.6   0.8   0.7   0.4   0.9   0.6
Accepted feature (RF > 0.70)  AF    NAF   AF    NAF   NAF   AF    NAF

N: irrelevant feature (IF) = 0, Y: relevant feature (RF) = 1, AF: accepted feature, NAF: not accepted feature.

Table 6
Classifier parameters.

Classifier  Range of parameter                          Preference parameter
SVM         Kernel function: linear, RBF, polynomial    Kernel function: RBF, polynomial
            C = 1, 10, 100, 1000                        C = 10
            gamma = 2^-3, 2^-2, 2^-1, 2^0, 2^1          gamma = 2^-2
            Method: one-against-one, one-against-all    Method: one-against-all
LDA         NA                                          NA

Table 7
Error estimation with no feature selection.

         SVM                                             LDA
         Testing error  No. of SVs  Estimation time (s)  Testing error  Estimation time (s)
Level 1  0.75           45          0.047                0.50           0.011
Level 2  0.75           45          0.047                0.50           0.012
Level 3  0.75           45          0.047                0.50           0.012

Table 8
Error estimation with one feature selection process.

         Selected      SVM                                             LDA
Data     feature sets  Testing error  No. of SVs  Estimation time (s)  Testing error  Estimation time (s)
Level 1  2             0.26           15          0.017                0.23           0.011
Level 2  4             0.22           15          0.034                0.15           0.013
Level 3  3             0.23           45          0.021                0.15           0.011

Table 9
Error estimation with two feature selection processes.

Data (from     Total selected  SVM                                             LDA
levels 2 & 3)  feature sets    Testing error  No. of SVs  Estimation time (s)  Testing error  Estimation time (s)
τ = 0.008      7               0.20           45          0.050                0.12           0.11
τ = 0.012      3               0.17           15          0.021                0.09           0.11

7.3.2. Relief algorithm
In this feature selection process, four classes of data, each decomposed by the DWT into three levels, are used. The quality of the feature weights W_i is estimated from the data sets by RA. The major task of RA is to iteratively estimate the feature weights according to their ability to discriminate between neighboring patterns. A pattern x is randomly selected, and then the two nearest neighbors of x are found, one from the same class (near-hit) and one from a different class (near-miss). After calculating the feature qualities, the threshold τ is applied to extract the desirable data sets. Then the irrelevant features (IF) and relevant features (RF) are counted in order to select the feature sets that will be retained for classification. Herein, in deciding on a feature set, the RF should be above 70% of the total features in the feature set and the IF below 30%. For example, the feature sets f1, f2, ..., f7 are shown in Table 5; each feature set has ten feature values. For all of the data sets, no preprocessing is performed other than a simple scaling of each feature value to be between 0 and 1, as required in relief. The scaling

operation was justified in as to clamp each feature weight into the receive the machine’s features which are standard deviation, mean,
interval between 1 and 1. entropy, skewness, kurtosis, and mean absolute deviation. The
feature selection based on Mahalanobis distance and relief
7.4. Classification error algorithm attains the significant features to enhance the perfor-
mance of the classifiers which are support vector machine and lin-
The relevant parameter setup for the classifier is shown in Table ear discriminant analysis in the next procedure. The classification
6 where the optimal parameters of classifier are shown. With opti- results indicate that this system could be employed to assist in
mum parameters of classifier, the RBF kernel is used as the basis monitoring machine condition and diagnosing machine faults.
function of SVM which consists of two parameters C and c. As
optimal value of these arguments, C and c parameters are defined
References
with values 10 and 22, respectively. LDA projects features from
parametric space into feature space through a linear transform Brigham, E. O. (1988). The Fast Fourier Transform and its Applications. Englewood
matrix. In general, there are no specific parameters in LDA that Cliffs. Prentice-Hall International, Inc.
need to be optimized. Within-class and between-class scatter are Cristianini, N., & Taylor, J. S. (2000). An Introduction to Support Vector Machines and
Other Kernel-Based Learning Methods. Cambridge University Press: Cambridge.
used to formulate criteria for class separation. Within-class scatter Duda, R. O., Hart, P. E., & Stork, D. G. (2001). In Pattern Classification. New York: John
is the expected covariance of each of the classes. Wiley Sons.
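The within-class and between-class scatter mentioned above can be computed as in the following sketch; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class scatter S_w (summed covariance of each class about
    its own mean) and between-class scatter S_b for labelled data X."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)         # scatter within class c
        diff = (mc - mean).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)       # scatter of class means
    return S_w, S_b
```

A common class-separation criterion built from these is to take the leading eigenvectors of inv(S_w) @ S_b as the columns of the linear transform matrix.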
The classification errors of the IDS when no feature selection process is used are presented in Table 7. Evidently, the use of all features at each level of decomposition yields the same classification error for both SVM and LDA. Features of levels 1, 2 and 3 are found to give the same testing error, number of support vectors (SVs) and estimation time with the SVM classifier. Better results are found with LDA than with SVM.
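For reference, the RBF kernel and the resulting SVM decision function have the following generic form; this is a sketch, not the authors' code, and the helper names are ours.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)."""
    sq = (np.square(X1).sum(1)[:, None] + np.square(X2).sum(1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq)

def svm_decision(x, support_vectors, dual_coef, bias, gamma):
    """f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b for a trained
    two-class SVM; only the support vectors enter the sum, which is
    why the number of SVs matters for estimation time."""
    k = rbf_kernel(support_vectors, x[None, :], gamma).ravel()
    return float(dual_coef @ k + bias)
```

The count of support vectors reported in Tables 7 to 9 is exactly the number of terms in this sum, so fewer SVs means a cheaper decision function.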
Table 8 presents the results of the IDS when the MD-based feature selection module is included. It indicates that only two sets of features remain in level 1 after applying the MD feature selection process with a threshold value, while 4 and 3 sets of features are obtained from levels 2 and 3, respectively. Applying the LDA and SVM classifiers in this situation, the lowest classification error is obtained from the features of levels 2 and 3 by using LDA, whilst the best classification result with SVM is found at level 2. Evidently, the classification performance of each individual classifier of the IDS including the feature selection process has been improved.
When both MD and RA are applied to the 18-dimensional feature sets, all feature sets of level 1 are discarded in the first step by the MD-based feature selection. In the second step, a seven-dimensional feature set is obtained through the RA when the relevance threshold τ is 0.008. Hence, the classification test error is reduced in comparison with the case where only one feature selection process is applied. If the threshold τ is 0.012, a three-dimensional feature set is input to each individual classifier and the best classification accuracy is obtained by LDA. The results are presented in Table 9. Evidently, the more feature selection processes are used, the higher the classification accuracy attained.
8. Conclusions

In this study, a new intelligent diagnosis system is proposed to classify four machine conditions, namely normal, misalignment, mass unbalance and bearing fault, by using infrared thermal images. The system consists of four subsequent procedures: two-dimensional discrete wavelet transform (2D-DWT), feature calculation, feature selection, and classification. The 2D-DWT is first applied to determine the wavelet coefficients. These coefficients are then used in the feature calculation procedure to obtain the machine's features, which are standard deviation, mean, entropy, skewness, kurtosis, and mean absolute deviation. The feature selection based on Mahalanobis distance and the relief algorithm retains the significant features to enhance the performance of the classifiers, support vector machine and linear discriminant analysis, in the next procedure. The classification results indicate that this system could be employed to assist in monitoring machine condition and diagnosing machine faults.
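The front end of the pipeline summarized in the conclusions, a 2D-DWT followed by statistical feature calculation, can be sketched as follows. This minimal numpy-only illustration uses an unnormalized Haar wavelet and assumes even image dimensions; the paper's mother wavelet and exact feature formulas may differ, and all names are ours.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT (averaging convention): returns the
    approximation sub-band LL and the detail sub-bands (LH, HL, HH)."""
    a = (img[0::2] + img[1::2]) / 2.0        # low-pass over rows
    d = (img[0::2] - img[1::2]) / 2.0        # high-pass over rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def features(coeffs):
    """Statistical features of one coefficient sub-band, in the order
    standard deviation, mean, entropy, skewness, kurtosis and mean
    absolute deviation."""
    x = coeffs.ravel().astype(float)
    mu, sd = x.mean(), x.std()
    total = np.abs(x).sum()
    p = np.abs(x) / total if total else np.zeros_like(x)
    nz = p[p > 0]
    ent = float(-(nz * np.log2(nz)).sum()) if nz.size else 0.0
    skew = float(((x - mu) ** 3).mean() / sd ** 3) if sd else 0.0
    kurt = float(((x - mu) ** 4).mean() / sd ** 4) if sd else 0.0
    mad = float(np.abs(x - mu).mean())
    return [float(sd), float(mu), ent, skew, kurt, mad]
```

Applying haar_dwt2 recursively to the LL sub-band yields the multi-level decomposition, and features() applied per sub-band produces the feature vectors fed to the selection stage.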