
Expert Systems with Applications

Expert Systems with Applications 32 (2007) 299–312

Combination of independent component analysis and support vector machines for intelligent faults diagnosis of induction motors
Achmad Widodo, Bo-Suk Yang *, Tian Han
School of Mechanical Engineering, Pukyong National University, San 100, Yongdang-dong, Nam-gu, Busan 608-739, South Korea

Abstract

This paper studies the application of independent component analysis (ICA) and support vector machines (SVMs) to the detection and diagnosis of induction motor faults. ICA is used for feature extraction and data reduction from the original features. Principal component analysis (PCA) is also applied in the feature extraction process for comparison with ICA. The SVMs are trained using the sequential minimal optimization algorithm, and a multi-class SVM-based classification strategy is applied to identify the faults. The effect of the choice of kernel function on classification performance is also presented to show the characteristics of each kernel function. Various scenarios are examined using data sets of vibration and stator current signals from experiments, and the results are compared to find the best classification performance. © 2005 Elsevier Ltd. All rights reserved.
Keywords: Fault diagnosis; Independent component analysis; Principal component analysis; Support vector machines; Feature extraction; Induction motor; Vibration signal; Current signal

1. Introduction

Induction motors play an important role as prime movers in manufacturing, the process industry and transportation because of their reliability and simple construction. Although induction motors are reliable, the possibility of unexpected faults is unavoidable (Yang, Jeong, Oh, & Tan, 2004). Robustness and reliability are very important for guaranteeing good operating condition. Therefore, condition monitoring of induction motors has received considerable attention in recent years. Early fault diagnosis and condition monitoring can reduce consequential damage, breakdown maintenance and spare-parts inventories. Moreover, they can prolong machine life and increase machine performance and availability. Many researchers have proposed techniques and systems for the diagnosis process. Various techniques

Corresponding author. Tel.: +82 51 620 1604; fax: +82 51 620 1405. E-mail address: (B.-S. Yang).

have been applied, including motor current signature analysis (Thomson & Fenger, 2001), electromagnetic torque measurement (Thollon, Jammal, & Grellet, 1993), acoustic analysis (Lee, Nelson, Scarton, Teng, & Azizi-Ghannad, 1994), and partial discharge (Stone, Sedding, & Costello, 1996). However, the most popular techniques are vibration analysis and stator current analysis because of their easy measurability, high accuracy and reliability. Intelligent systems for condition monitoring and fault diagnosis are widely used in many areas. Support vector machines (SVMs), as well as neural networks, have been extensively employed to solve classification problems. SVMs have been reported to succeed in classifying human cytochrome P450 enzymes (Kriegl, Arnhold, Beck, & Fox, 2005), and in financial analysis (Min & Lee, 2005), chemical processes (Guo, Xie, Wang, & Zhang, 2003), biomedical applications (Chan & Lee, 2002), image processing and face recognition (Antonini, Popovici, & Thiran, 2005), and so on. In machine condition monitoring and fault diagnosis, some researchers have used SVMs as a tool for classifying faults, for example, ball bearing faults (Jack & Nandi, 2002), gear faults (Samanta, 2004),

0957-4174/$ - see front matter 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.11.031



condition classification of small reciprocating compressors (Yang, Hwang, Kim, & Tan, 2005), cavitation detection in butterfly valves (Yang, Hwang, Ko, & Lee, 2005) and induction motors (Yang, Han, & Hwang, 2005).

For an SVM to classify well, the input data for the classifier need special preparation. Many methods have been developed for preparing the input data. Recently, the use of feature extraction and feature selection for data preparation before classification has received considerable attention (Cao, Chua, Chong, Lee, & Gu, 2003). One reason is that the large volume of data and the many features obtained from an experiment cannot be fed directly into the classifier, because doing so decreases its performance. Therefore, feature extraction and feature selection are needed to avoid redundancy. For example, in our induction motor experiments, we acquired data from the three-phase stator current signals and the three-direction vibration signals as the original classifier inputs. Several feature parameters are then calculated from the time and frequency domains. However, too many features can cause the curse of dimensionality, since irrelevant and redundant features degrade the performance of the classifier. In pattern recognition, the curse of dimensionality implies that the number of training samples must grow exponentially with the number of features in order to learn an accurate model. Therefore, reducing the number of features by extracting or selecting only the relevant and useful ones is desirable. There are two ways to reduce the dimensionality: feature extraction and feature selection. Feature extraction means transforming the existing features into a lower dimensional space, and feature selection means selecting a subset of the existing features without any transformation (Han, Son, & Yang, 2005).
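The difference between the two reduction routes can be made concrete with a small sketch. The function names, the sample values and the projection matrix below are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
def select_features(sample, keep_indices):
    """Feature selection: keep a subset of the original features unchanged."""
    return [sample[i] for i in keep_indices]


def extract_features(sample, projection):
    """Feature extraction: build new features, each a linear combination
    (one row of `projection`) of all the original features."""
    return [sum(w * v for w, v in zip(row, sample)) for row in projection]


# Hypothetical sample with three original features.
sample = [2.0, 4.0, 6.0]

# Selection keeps features 0 and 2 exactly as measured.
selected = select_features(sample, [0, 2])

# Extraction maps three features to two new ones (illustrative weights).
projection = [[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]]
extracted = extract_features(sample, projection)
```

Both results have two dimensions, but only the selected features are still original measurements; PCA and ICA, discussed below, are particular ways of choosing the projection.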
Most feature extraction techniques are based on linear methods such as principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA). After feature extraction, the extracted features sometimes still contain high noise and irrelevant or redundant information. Feature selection can solve the problem of irrelevant information in the feature space. The benefits of feature selection include a reduction in the amount of data needed for learning, improved classification accuracy, a more compact and easily understood knowledge base, and reduced execution time (Kumar, Jayaraman, & Kulkarni, 2005). Feature selection methods include conditional entropy (Lehrman, Rechester, & White, 1997), genetic algorithms (GA) (Jack & Nandi, 2002), and the distance evaluation technique (Yang, Han, & An, 2004; Yang et al., 2005).

Examples of the use of PCA and ICA are as follows. PCA with statistical process control has been employed to enhance the discrimination between features from undamaged and damaged structures (Sohn, Czarnecki, & Farrar, 2000), and has been implemented for visualization and dimension reduction in damage detection (Worden & Manson, 1999). In addition, a number of applications of ICA have been reported in image processing (Antonini et al., 2005), biomedical signal processing (Vigario, 1997), finance (Back & Weigend, 1998) and medicine (Biswall & Ulmer, 1999). The use of ICA in machine condition monitoring and fault detection has been reported in the fields of structural damage detection (Zang, Friswell, & Imregun, 2004) and submersible pumps (Ypma & Pajunen, 1999). However, there are still relatively few real engineering applications of ICA in machine condition monitoring and fault diagnosis.

In this paper, the integration of ICA and SVMs is proposed for the condition monitoring and fault diagnosis of induction motors. ICA is selected for feature extraction because of its reliability in extracting relevant and useful features. Because the distance evaluation technique is simple and reliable, this paper adopts it as the feature selection method. Multi-class SVM classification is needed because real-world problems such as induction motor diagnosis involve many fault classes, e.g., bearing fault, bowed rotor, mechanical unbalance, misalignment, broken rotor bar and stator faults.

2. Independent component analysis (ICA)

ICA is a technique that transforms a multivariate random signal into a signal whose components are mutually independent in the complete statistical sense. Recently this technique has been shown to be able to extract independent components from mixed signals. Here independence means that the information carried by one component cannot be inferred from the others. Statistically, this means that the joint probability of independent quantities is obtained as the product of the probabilities of each of them.
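Written out for $n$ components, the independence condition takes the following form. The density symbols $p$ and $p_i$ are notation assumed here for illustration and do not appear in the original text.

```latex
% Joint density of mutually independent components s_1, ..., s_n
% factorizes into the product of the marginal densities.
p(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} p_i(s_i)
```

Uncorrelatedness alone is weaker than this condition: it only requires the second-order cross moments to vanish, which is why ICA needs a measure beyond covariance.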
A generic ICA model can be written as

$$ x = As, \tag{1} $$

where $A$ is an unknown full-rank matrix, called the mixing matrix, $s$ is the independent component (IC) data matrix, and $x$ is the measured variable data matrix. The basic problem of ICA is to estimate the independent component matrix $s$ or the mixing matrix $A$ from the measured data matrix $x$ without any knowledge of $s$ or $A$. ICA algorithms normally find the independent components of a data set by minimizing or maximizing some measure of independence. Cardoso (1998) reviewed solutions to the ICA problem based on various information-theoretic criteria, such as mutual information, negentropy and maximum entropy, as well as the maximum likelihood approach. The fixed-point algorithm is used here because of its suitability for handling raw time-domain data and its good convergence properties. This algorithm is now described briefly.

The first step is to pre-whiten the measured data vector $x$ by a linear transformation, to produce a vector $\tilde{x}$ whose elements are mutually uncorrelated and all have unit variance. Singular value decomposition (SVD) of the covariance matrix $C = E[xx^T]$ yields

$$ C = \Psi \Sigma \Psi^T, \tag{2} $$

where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ is a diagonal matrix of singular values and $\Psi$ is the associated singular vector matrix. Then, the vector $\tilde{x}$ can be expressed as

$$ \tilde{x} = \Sigma^{-1/2} \Psi^T x = QAs = Bs, \tag{3} $$

with $Q = \Sigma^{-1/2} \Psi^T$ and $B = QA$, where $B$ is an orthogonal matrix as verified by the following relation:

$$ E\{\tilde{x}\tilde{x}^T\} = B\,E\{ss^T\}\,B^T = BB^T = I. \tag{4} $$

An advantage of using an SVD-based technique is the possibility of noise reduction by discarding singular values smaller than a given threshold. We have therefore reduced the problem of finding an arbitrary full-rank matrix $A$ to the simpler problem of finding an orthogonal matrix $B$, which has fewer parameters to estimate as a result of the orthogonality constraint.

The second step is to employ the fixed-point algorithm. Define a separating matrix $W$ that transforms the measured data vector $x$ to a vector $y$, such that all elements $y_i$ are mutually uncorrelated and have unit variance. The fixed-point algorithm then determines $W$ by maximizing the absolute value of the kurtosis of $y$. The vector $y$ has the properties required for the independent components, thus

$$ \hat{s} = y = Wx. \tag{5} $$

From Eq. (3), we can estimate $s$ as follows:

$$ \hat{s} = B^T \tilde{x} = B^T Q x. \tag{6} $$

From Eqs. (5) and (6), the relation between $W$ and $B$ can be expressed as

$$ W = B^T Q. \tag{7} $$

To calculate $B$, each column vector $b_i$ is initialized and then updated so that the $i$th independent component $\hat{s}_i = b_i^T \tilde{x}$ has large non-Gaussianity. Hyvärinen and Oja (2000) showed, using the central limit theorem, that non-Gaussianity represents independence. There are two common measures of non-Gaussianity: kurtosis and negentropy. Kurtosis is sensitive to outliers. Negentropy, on the other hand, is based on the information-theoretic quantity of (differential) entropy. Entropy is a measure of the average uncertainty in a random variable, and the differential entropy $H$ of a random variable $y$ with density $f(y)$ is defined as

$$ H(y) = -\int f(y) \log f(y)\, dy. \tag{8} $$

A Gaussian variable has maximum entropy among all random variables with equal variance (Hyvärinen & Oja, 2000). In order to obtain a measure of non-Gaussianity that is zero for a Gaussian variable, the negentropy $J$ is defined as follows:

$$ J(y) = H(y_{gauss}) - H(y), \tag{9} $$

where $y_{gauss}$ is a Gaussian random variable with the same variance as $y$. Negentropy is non-negative and measures the departure of $y$ from Gaussianity. However, estimating negentropy using Eq. (9) would require an estimate of the probability density function. To estimate negentropy efficiently, simpler approximations have been suggested, such as

$$ J(y) \approx \left[ E\{G(y)\} - E\{G(v)\} \right]^2, \tag{10} $$

where $y$ is assumed to be of zero mean and unit variance, $v$ is a Gaussian variable of zero mean and unit variance, and $G$ is any non-quadratic function. By choosing $G$ wisely, one obtains good approximations of negentropy. A number of functions for $G$ are

$$ G_1(t) = \frac{1}{a_1} \log \cosh(a_1 t), \tag{11} $$

$$ G_2(t) = \exp(-a_2 t^2 / 2), \tag{12} $$

$$ G_3(t) = t^4, \tag{13} $$

where $1 \leq a_1 \leq 2$ and $a_2 \approx 1$. Among these three functions, $G_1$ is a good general-purpose contrast function and was therefore selected for use in the present study. Based on this approximate form of the negentropy, Hyvärinen (1998, 1999) introduced a very simple and highly efficient fixed-point algorithm for ICA (available in the FastICA MATLAB package), calculated over the sphered zero-mean vector $\tilde{x}$. This algorithm calculates one column of the matrix $B$ and allows the identification of one independent component; the corresponding independent component can then be found using Eq. (6). The algorithm is repeated to calculate each independent component.

3. Principal component analysis (PCA)

PCA is a statistical technique that linearly transforms an original set of variables into a substantially smaller set of uncorrelated variables that represents most of the information in the original set (Jolliffe, 1986). It can be viewed as a classical method of multivariate statistical analysis for achieving dimensionality reduction. Because a small set of uncorrelated variables is much easier to understand and use in further analysis than a larger set of correlated variables, this data compression technique has been widely applied in virtually every substantive area, including cluster analysis, visualization of high-dimensional data, regression, data compression and pattern recognition.

Given a set of centered input vectors $x_t$ ($t = 1, \ldots, l$, with $\sum_{t=1}^{l} x_t = 0$), each of which is of dimension $m$, $x_t = (x_t(1), x_t(2), \ldots, x_t(m))^T$, usually with $m < l$, PCA linearly transforms each vector $x_t$ into a new one $s_t$ by

$$ s_t = U^T x_t, \tag{14} $$

where $U$ is the $m \times m$ orthogonal matrix whose $i$th column, $u_i$, is the eigenvector of the sample covariance matrix

$$ C = \frac{1}{l} \sum_{t=1}^{l} x_t x_t^T. \tag{15} $$

In other words, PCA first solves the eigenvalue problem

$$ \lambda_i u_i = C u_i, \quad i = 1, \ldots, m, \tag{16} $$

where $\lambda_i$ is one of the eigenvalues of $C$ and $u_i$ is the corresponding eigenvector. Based on the estimated $u_i$, the components of $s_t$ are then calculated as the orthogonal transformations of $x_t$:

$$ s_t(i) = u_i^T x_t, \quad i = 1, \ldots, m. \tag{17} $$

The new components are called principal components. By using only the first several eigenvectors, sorted in descending order of the eigenvalues, the number of principal components in $s_t$ can be reduced; this gives PCA its dimensionality-reduction property. The principal components have the following properties: the $s_t(i)$ are uncorrelated, they have sequentially maximum variances, and the mean squared approximation error in representing the original inputs by the first several principal components is minimal.

4. Support vector machines (SVMs)

SVMs are a relatively new computational learning method based on the statistical learning theory presented by Vapnik (1999). In SVMs, the original input space is mapped into a high-dimensional dot product space called the feature space, and in the feature space the optimal hyperplane is determined so as to maximize the generalization ability of the classifier. The optimal hyperplane is found by exploiting optimization theory and respecting insights provided by statistical learning theory.

SVMs have the potential to handle very large feature spaces, because they are trained in such a way that the dimension of the classified vectors does not influence the performance of an SVM as strongly as it influences the performance of a conventional classifier. That is why SVMs are considered especially efficient in large classification problems. This also benefits fault classification, because the number of features used as the basis of fault diagnosis need not be limited. SVM-based classifiers are also claimed to have good generalization properties compared with conventional classifiers, because training an SVM classifier minimizes the so-called structural misclassification risk, whereas traditional classifiers are usually trained to minimize the empirical risk. The performance of SVMs in various classification tasks is reviewed, e.g., in Cristianini and Shawe-Taylor (2000).

Given input data $x_i$ ($i = 1, 2, \ldots, M$), where $M$ is the number of samples, the samples are assumed to belong to two classes, namely the positive class and the negative class, with labels $y_i = 1$ for the positive class and $y_i = -1$ for the negative class. In the case of linearly separable data, it is possible to determine the hyperplane $f(x) = 0$ that separates the given data:

$$ f(x) = w^T x + b = \sum_{j=1}^{M} w_j x_j + b = 0, \tag{18} $$

where $w$ is an $M$-dimensional vector and $b$ is a scalar. The vector $w$ and the scalar $b$ define the position of the separating hyperplane. The decision function $\mathrm{sign}(f(x))$ is used to classify input data into either the positive or the negative class. A separating hyperplane must satisfy the constraints

$$ f(x_i) \geq 1 \ \text{ if } y_i = 1, \qquad f(x_i) \leq -1 \ \text{ if } y_i = -1, \tag{19} $$

or, in a single expression,

$$ y_i f(x_i) = y_i (w^T x_i + b) \geq 1 \quad \text{for } i = 1, 2, \ldots, M. \tag{20} $$
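Constraints (19) and (20) can be checked numerically for any candidate hyperplane. The following is a minimal sketch in plain Python; the toy samples, the weight vector and the function names are hypothetical and serve only to illustrate Eqs. (18) and (20).

```python
def decision_value(w, b, x):
    """f(x) = w^T x + b, as in Eq. (18)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def classify(w, b, x):
    """Decision function sign(f(x)); f(x) = 0 is assigned to the positive class."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def satisfies_margin(w, b, samples, labels):
    """True when every sample fulfils y_i * f(x_i) >= 1, i.e. Eq. (20)."""
    return all(y * decision_value(w, b, x) >= 1
               for x, y in zip(samples, labels))


# Hypothetical, linearly separable toy data in two dimensions.
samples = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
labels = [1, 1, -1, -1]

# A hyperplane that separates the data with the required margin ...
ok = satisfies_margin((0.5, 0.5), 0.0, samples, labels)
# ... and one that separates the data but violates Eq. (20).
too_flat = satisfies_margin((0.1, 0.1), 0.0, samples, labels)
```

The second hyperplane still classifies every toy sample correctly, which shows that Eq. (20) is stricter than mere separation: it also fixes the scale of $w$ relative to the closest samples.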

The separating hyperplane that creates the maximum distance between the plane and the nearest data, i.e., the maximum margin, is called the optimal separating hyperplane. An example of the optimal hyperplane for two data sets is presented in Fig. 1, in which a series of data points for two different classes is shown: black squares for the negative class and white circles for the positive class. The SVM tries to place a linear boundary between the two classes and to orientate it in such a way that the margin, represented by the dotted lines, is maximized. In other words, the SVM orientates the boundary so that the distance between the boundary and the nearest data point in each class is maximal. The boundary is then placed in the middle of this margin between the two points. The nearest data points, which are used to define the margin, are called support vectors and are represented by the grey circles and squares. Once the support vectors have been selected, the rest of the feature set is not required, as the support vectors contain all the

Fig. 1. Classification of two classes using SVM.



information needed to define the classifier. From the geometry, the geometrical margin is found to be $\|w\|^{-2}$. Taking into account the noise with slack variables $\xi_i$ and the error penalty $C$, the optimal hyperplane separating the data can be obtained as a solution to the following optimization problem:

$$ \text{minimize} \quad \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{M} \xi_i \tag{21} $$

$$ \text{subject to} \quad y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, M, \tag{22} $$

where $\xi_i$ measures the distance between the margin and the examples $x_i$ that lie on the wrong side of the margin. The calculation can be simplified by converting the problem, with the Kuhn–Tucker conditions, into the equivalent Lagrangian dual problem:

$$ \text{minimize} \quad L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{M} \alpha_i y_i (w \cdot x_i + b) + \sum_{i=1}^{M} \alpha_i. \tag{23} $$

The task is to minimize Eq. (23) with respect to $w$ and $b$, while requiring the derivatives of $L$ with respect to $\alpha$ to vanish. At the optimal point, we have the following saddle point equations:

$$ \frac{\partial L}{\partial w} = 0, \qquad \frac{\partial L}{\partial b} = 0, \tag{24} $$

which translate into

$$ w = \sum_{i=1}^{M} \alpha_i y_i x_i, \qquad \sum_{i=1}^{M} \alpha_i y_i = 0. \tag{25} $$

From Eq. (25), we find that $w$ is contained in the subspace spanned by the $x_i$. Substituting Eq. (25) into Eq. (23), we get the dual quadratic optimization problem

$$ \text{maximize} \quad L(\alpha) = \sum_{i=1}^{M} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \tag{26} $$

$$ \text{subject to} \quad \alpha_i \geq 0, \quad i = 1, \ldots, M, \qquad \sum_{i=1}^{M} \alpha_i y_i = 0. \tag{27} $$

Thus, by solving the dual optimization problem, one obtains the coefficients $\alpha_i$, which are required to express the $w$ that solves Eq. (21). This leads to the non-linear decision function

$$ f(x) = \mathrm{sign}\left( \sum_{i,j=1}^{M} \alpha_i y_i (x_i \cdot x_j) + b \right). \tag{28} $$

SVMs can also be used in non-linear classification tasks with the application of kernel functions. The data to be classified are mapped onto a high-dimensional feature space, where linear classification is possible. Using the non-linear vector function $\Phi(x) = (\phi_1(x), \ldots, \phi_l(x))$ to map the $n$-dimensional input vector $x$ onto the $l$-dimensional feature space, the linear decision function in dual form is given by

$$ f(x) = \mathrm{sign}\left( \sum_{i,j=1}^{M} \alpha_i y_i \, \Phi^T(x_i) \Phi(x_j) + b \right). \tag{29} $$

Working in the high-dimensional feature space enables the expression of complex functions, but it also generates problems: computational problems occur because of the large vectors, and overfitting arises because of the high dimensionality. The latter problem can be solved by using a kernel function. A kernel is a function that returns the dot product of the feature space mappings of the original data points, stated as $K(x_i, x_j) = \Phi^T(x_i) \Phi(x_j)$. When a kernel function is applied, learning in the feature space does not require explicit evaluation of $\Phi$, and the decision function becomes

$$ f(x) = \mathrm{sign}\left( \sum_{i,j=1}^{M} \alpha_i y_i K(x_i, x_j) + b \right). \tag{30} $$

Any function that satisfies Mercer's theorem (Vapnik, 1999) can be used as a kernel function to compute a dot product in feature space. Different kernel functions are used in SVMs, such as linear, polynomial and Gaussian RBF. The selection of an appropriate kernel function is very important, since the kernel defines the feature space in which the training set examples will be classified. The definition of a legitimate kernel function is given by Mercer's theorem: the function must be continuous and positive definite. In this work, linear, polynomial and Gaussian RBF functions were evaluated; they are formulated in Table 1.

Table 1
Formulation of kernel functions

Kernel          $K(x, x_j)$
Linear          $x^T x_j$
Polynomial      $(\gamma x^T x_j + r)^d$, $\gamma > 0$
Gaussian RBF    $\exp(-\|x - x_j\|^2 / 2\gamma^2)$

4.1. Multi-class classification

The above discussion deals with binary classification, where the class labels can take only two values: 1 and −1. In real-world problems, however, there are often more than two classes; for example, in fault diagnosis of rotating machinery there are several fault classes, such as mechanical unbalance, misalignment and bearing faults. The earliest implementation of SVM multi-class classification is the one-against-all method. It constructs $k$ SVM models, where $k$ is the number of classes. The $i$th SVM is trained with all of the examples in the $i$th class given positive labels and all the other examples given negative labels. Thus, given $l$ training data $(x_1, y_1), \ldots, (x_l, y_l)$, where $x_i \in R^n$, $i = 1, \ldots, l$, and $y_i \in \{1, \ldots, k\}$ is the class of $x_i$, the $i$th SVM solves the following problem:

$$ \text{minimize} \quad \frac{1}{2} \|w^i\|^2 + C \sum_{j=1}^{l} \xi^i_j \tag{31} $$


subject to

$$ (w^i)^T \phi(x_j) + b^i \geq 1 - \xi^i_j, \quad \text{if } y_j = i, \tag{32} $$

$$ (w^i)^T \phi(x_j) + b^i \leq -1 + \xi^i_j, \quad \text{if } y_j \neq i, \tag{33} $$

$$ \xi^i_j \geq 0, \quad j = 1, \ldots, l, \tag{34} $$

where the training data $x_i$ are mapped to a higher dimensional space by the function $\phi$ and $C$ is the penalty parameter. Minimizing Eq. (31) means maximizing $2/\|w^i\|$, the margin between the two groups of data. When the data are not separable, the penalty term $C \sum_{j=1}^{l} \xi^i_j$ serves to reduce the number of training errors.

Another major method is the one-against-one method. It constructs $k(k-1)/2$ classifiers, each of which is trained on data from two classes. For training data from the $i$th and the $j$th classes, we solve the following binary classification problem:

$$ \text{minimize} \quad \frac{1}{2} \|w^{ij}\|^2 + C \sum_{t} \xi^{ij}_t \tag{35} $$

subject to

$$ (w^{ij})^T \phi(x_t) + b^{ij} \geq 1 - \xi^{ij}_t, \quad \text{if } y_t = i, \tag{36} $$

$$ (w^{ij})^T \phi(x_t) + b^{ij} \leq -1 + \xi^{ij}_t, \quad \text{if } y_t = j, \tag{37} $$

$$ \xi^{ij}_t \geq 0. \tag{38} $$

There are different methods for future testing after all $k(k-1)/2$ classifiers are constructed. After some tests, the decision is made using the following strategy: if $\mathrm{sign}((w^{ij})^T \phi(x) + b^{ij})$ says $x$ is in the $i$th class, the vote for the $i$th class is increased by one; otherwise, the vote for the $j$th class is increased by one. Then $x$ is predicted to be in the class with the largest vote. This voting approach is also called the Max Wins strategy.

4.2. Sequential minimal optimization (SMO)

Vapnik (1982) describes a method that uses the projected conjugate gradient algorithm to solve the SVM-QP problem, known as chunking. The chunking algorithm uses the fact that the value of the quadratic form is unchanged if the rows and columns of the matrix that correspond to zero Lagrange multipliers are removed. Chunking therefore greatly reduces the size of the matrix, from the number of training examples squared to approximately the number of non-zero Lagrange multipliers squared. However, chunking still cannot handle large-scale training problems, since even this reduced matrix cannot fit into memory. Osuna, Freund, and Girosi (1997) proved a theorem that suggests a whole new set of QP algorithms for SVMs. The theorem proves that the large QP problem can be broken down into a series of smaller QP sub-problems. Sequential minimal optimization (SMO), proposed by Platt (1999), is a simple algorithm that solves the SVM-QP problem without any additional matrix storage and without numerical QP optimization steps. It decomposes the overall QP problem into QP sub-problems, using Osuna's theorem to ensure convergence. In this paper SMO is used as the solver; detailed descriptions can be found in Platt (1999).

In order to solve for the two Lagrange multipliers $\alpha_1$, $\alpha_2$, SMO first computes the constraints on these multipliers and then solves for the constrained minimum. For convenience, all quantities that refer to the first multiplier have the subscript 1, while all quantities that refer to the second multiplier have the subscript 2. The new values of these multipliers must lie on a line in $(\alpha_1, \alpha_2)$ space and in the box defined by $0 \leq \alpha_1, \alpha_2 \leq C$:

$$ \alpha_1 y_1 + \alpha_2 y_2 = \alpha_1^{old} y_1 + \alpha_2^{old} y_2 = \text{constant}. \tag{39} $$

Without loss of generality, the algorithm first computes the second Lagrange multiplier $\alpha_2^{new}$ and then uses it to obtain $\alpha_1^{new}$. The box constraint $0 \leq \alpha_1, \alpha_2 \leq C$, together with the linear equality constraint $\sum \alpha_i y_i = 0$, provides a more restrictive constraint on the feasible values of $\alpha_2^{new}$. The boundaries of the feasible region for $\alpha_2$ are as follows:

$$ \text{If } y_1 \neq y_2: \quad L = \max(0, \alpha_2^{old} - \alpha_1^{old}), \quad H = \min(C, C + \alpha_2^{old} - \alpha_1^{old}); \tag{40} $$

$$ \text{If } y_1 = y_2: \quad L = \max(0, \alpha_1^{old} + \alpha_2^{old} - C), \quad H = \min(C, \alpha_1^{old} + \alpha_2^{old}). \tag{41} $$

The second derivative of the objective function along the diagonal line can be expressed as

$$ \eta = K(x_1, x_1) + K(x_2, x_2) - 2K(x_1, x_2). \tag{42} $$

Under normal circumstances, the objective function is positive definite, there is a minimum along the direction of the linear equality constraint, and $\eta$ is greater than zero. In this case, SMO computes the minimum along the direction of the constraint:

$$ \alpha_2^{new} = \alpha_2^{old} + \frac{y_2 (E_1^{old} - E_2^{old})}{\eta}, \tag{43} $$

where $E_i$ is the prediction error on the $i$th training example. As a next step, the constrained minimum is found by clipping the unconstrained minimum to the ends of the line segment:

$$ \alpha_2^{new,clipped} = \begin{cases} H, & \text{if } \alpha_2^{new} \geq H, \\ \alpha_2^{new}, & \text{if } L < \alpha_2^{new} < H, \\ L, & \text{if } \alpha_2^{new} \leq L. \end{cases} \tag{44} $$

Now, let $s = y_1 y_2$. The value of $\alpha_1^{new}$ is computed from the new (clipped) $\alpha_2$:

$$ \alpha_1^{new} = \alpha_1^{old} + s (\alpha_2^{old} - \alpha_2^{new,clipped}). \tag{45} $$
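The analytic step for one pair of multipliers, Eqs. (40)–(45), can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and argument layout are hypothetical, the errors are taken as $E_i = f(x_i) - y_i$, and the case $\eta \leq 0$ is simply skipped rather than handled as in Platt (1999).

```python
def smo_pair_update(a1, a2, y1, y2, e1, e2, k11, k22, k12, c):
    """One analytic SMO step for the multiplier pair (a1, a2).

    e1, e2 are the prediction errors E_i on the two examples and
    k11, k22, k12 the kernel values K(x1,x1), K(x2,x2), K(x1,x2).
    Returns (a1_new, a2_new_clipped), or None when no step is taken
    (a simplification assumed for this sketch).
    """
    # Feasible segment for a2, Eqs. (40) and (41).
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)

    # Second derivative along the constraint line, Eq. (42).
    eta = k11 + k22 - 2.0 * k12
    if eta <= 0.0 or low >= high:
        return None

    # Unconstrained minimum, Eq. (43), then clipping, Eq. (44).
    a2_new = a2 + y2 * (e1 - e2) / eta
    a2_clipped = min(high, max(low, a2_new))

    # Keep a1*y1 + a2*y2 constant, Eq. (45).
    s = y1 * y2
    a1_new = a1 + s * (a2 - a2_clipped)
    return a1_new, a2_clipped


# Hypothetical example: x1 = (1, 0) with y1 = +1 and x2 = (-1, 0) with
# y2 = -1, linear kernel, all multipliers and the threshold initially zero,
# so f(x) = 0 and E1 = -1, E2 = +1.
step = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 1.0, -1.0, c=1.0)
clipped = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 1.0, -1.0, c=0.25)
```

In the first call the unconstrained minimum lies inside the box, so no clipping occurs; in the second call the smaller penalty $C$ caps both multipliers at the upper bound $H$.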


Solving Eq. (26) for the Lagrange multipliers does not determine the threshold $b$ of the SVM, so $b$ must be computed separately. The following thresholds $b_1$ and $b_2$ are valid when the new $\alpha_1$ and $\alpha_2$, respectively, are not at the bounds, because they force the output of the SVM to be $y_1$ and $y_2$ when the input is $x_1$ and $x_2$, respectively:

$$ b_1 = E_1 + y_1 (\alpha_1^{new} - \alpha_1^{old}) K(x_1, x_1) + y_2 (\alpha_2^{new,clipped} - \alpha_2^{old}) K(x_1, x_2) + b^{old}, \tag{46} $$

$$ b_2 = E_2 + y_1 (\alpha_1^{new} - \alpha_1^{old}) K(x_1, x_2) + y_2 (\alpha_2^{new,clipped} - \alpha_2^{old}) K(x_2, x_2) + b^{old}. \tag{47} $$

When both $b_1$ and $b_2$ are valid, they are equal. When both new Lagrange multipliers are at bound and $L$ is not equal to $H$, all thresholds in the interval between $b_1$ and $b_2$ are consistent with the Karush–Kuhn–Tucker conditions, which are necessary and sufficient conditions for an optimal point of a positive definite QP problem. In this case, SMO chooses the threshold to be halfway between $b_1$ and $b_2$ (Platt, 1999).

5. Proposed system for faults diagnosis

In our work, the detection and diagnosis of faults in induction motors from vibration and current signatures is treated as a pattern recognition problem. It consists of data acquisition, signal processing, feature extraction and selection (including feature reduction), and fault diagnosis. A novel fault diagnosis method for induction motors, based on ICA, the distance evaluation technique and multi-class SVM classification, is shown in Fig. 2. The procedure of the proposed system can be summarized as follows:

Step 1: Data acquisition is carried out, followed by feature calculation using statistical feature parameters from the time domain and the frequency domain.
Step 2: The features are extracted using the ICA algorithm to reduce the dimensionality. This step removes irrelevant features, which are redundant and can even degrade the performance of the classifier.
Step 3: Feature selection is performed using the distance evaluation technique. This method is chosen for its simplicity and reliability.
Step 4: The classification process for diagnosing faults is carried out using SVMs based on multi-class classification.

5.1. Experiment and data acquisition

The experiment was conducted using a test rig that consists of a motor, pulley, belt, shaft, and a fan with changeable blade angle that represents the load, as shown in Fig. 3. Six induction motors of 0.5 kW, 60 Hz, 4-pole were used to create the data. One of the motors is in normal (healthy) condition and serves as a benchmark for comparison with the faulty conditions. The conditions of the faulty motors are described in Fig. 4 and Table 2. Three AC current probes and three accelerometers were used to measure the stator currents of the three-phase power supply and the vibration signals in the horizontal, vertical and axial directions for evaluating the fault diagnosis system. The maximum frequency of the signals used was 5 kHz and the number of sampled data points was 16,384.

5.2. Feature calculation

A total of 78 features (13 parameters for each of six signals) are calculated. Ten of the parameters come from the time domain: mean, RMS, shape factor, skewness, kurtosis, crest factor, entropy error, entropy estimation, histogram lower and histogram upper. The other three come from the frequency domain: RMS frequency, frequency center and root variance frequency. All of them are calculated from the vibration and current signals.

Fig. 2. The proposed faults diagnosis system.

Fig. 3. Test rig and experiment.

Fig. 4. Faults on the induction motors.

Table 2
The description of faulty motors

Fault condition     Fault description                             Others
Broken rotor bar    No. of broken bars: 12 ea                     Total number of 34 bars
Bowed rotor         Maximum bowed shaft deflection: 0.075 mm      Air-gap: 0.25 mm
Faulty bearing      A spalling on outer raceway                   #6203
Rotor unbalance     Unbalance mass (8.4 g) on the rotor
Eccentricity        Parallel and angular misalignments            Adjusting the bearing pedestal
Phase unbalance     Add resistance on one phase                   8.4%

Table 3
Feature parameters

Signals      Position
Vibration    Vertical, Horizontal, Axial
Current      Phase A, Phase B, Phase C

Feature parameters (calculated for every signal)
Time domain         Mean; RMS; Shape factor; Skewness; Kurtosis; Crest factor; Entropy error; Entropy estimation; Histogram lower; Histogram upper
Frequency domain    Root mean square frequency; Frequency center; Root variance frequency
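Several of the time-domain parameters in Table 3 have widely used textbook definitions, which can be sketched as follows. The paper does not state the formulas it used, so the definitions below are assumptions: moments are normalized by the number of samples, kurtosis is not reduced by 3, the shape factor is RMS divided by the mean absolute value, and the crest factor is the peak absolute value divided by RMS. The entropy and histogram parameters are omitted because their definitions cannot be recovered from the text.

```python
import math


def time_domain_features(signal):
    """Six time-domain parameters of Table 3 under assumed textbook definitions.

    The signal must not be constant, otherwise the standard deviation
    is zero and skewness and kurtosis are undefined.
    """
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    mean_abs = sum(abs(v) for v in signal) / n
    peak = max(abs(v) for v in signal)
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    skewness = sum((v - mean) ** 3 for v in signal) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in signal) / (n * std ** 4)
    return {
        "mean": mean,
        "rms": rms,
        "shape_factor": rms / mean_abs,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "crest_factor": peak / rms,
    }


# Hypothetical test signal: one period of a unit sine wave, 1000 samples.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
sine_features = time_domain_features(sine)
```

Applying the same function to each of the six measured signals, together with the remaining parameters, would give one feature vector per measurement; for a pure sine wave the crest factor is $\sqrt{2}$ and the kurtosis is 1.5, which gives a quick sanity check of an implementation.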



at the three directions and three-phase current signals. The full set of feature parameters is shown in Table 3.

5.3. Feature extraction

Basically, feature extraction is a mapping of data from a higher-dimensional space into a lower-dimensional space. This step is intended to avoid the curse of dimensionality. ICA and PCA were used to reduce the feature dimensionality while retaining 95% of the variation of the eigenvalues. In this work, feature extraction produced 24 independent components (ICs) and principal components (PCs) based on the eigenvalues. Feature extraction with ICA and PCA thus transforms the original features into components that are independent and uncorrelated, respectively. The first three independent and principal components are plotted in Figs. 5 and 6. It can be observed that the clusters for the eight conditions are well separated. Nevertheless, ICA performs better than PCA in clustering each condition: feature extraction using ICA separates almost all conditions without overlap, except normal condition and phase unbalance, while PCA produces overlap among phase unbalance, rotor unbalance and broken rotor bar, and also between angular misalignment and parallel misalignment.

5.4. Feature selection

To select the optimal ICs and PCs that best represent the condition of the induction motors, a feature selection method based on the distance evaluation technique is presented (Yang et al., 2004; Yang et al., 2005).

Fig. 5. Feature extraction using ICA.

Legend for Figs. 5 and 6: angular misalignment, bowed rotor, broken rotor bar, bearing fault, rotor unbalance, normal condition, parallel misalignment, phase unbalance.

Fig. 6. Feature extraction using PCA.
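The 95% eigenvalue criterion mentioned in Section 5.3 can be illustrated with a small helper that counts how many components are needed to reach a given share of the total eigenvalue sum. This is a sketch under the assumption that the eigenvalues of the feature covariance matrix are already available; the function name and the example values are hypothetical and are not taken from the experiments.

```python
def components_for_variance(eigenvalues, fraction=0.95):
    """Smallest number of leading components whose eigenvalues
    sum to at least `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)


# Made-up eigenvalues for illustration only.
print(components_for_variance([6.0, 3.0, 0.6, 0.4]))
```

Here the first two eigenvalues account for 90% of the total and the first three for 96%, so three components are retained at the 95% level.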

Let the joint feature set of $C$ condition-patterns $a_1, a_2, \ldots, a_C$ be

$$\{q^{(i,k)},\ i = 1, \ldots, C;\ k = 1, \ldots, N_i\},$$

where $q^{(i,k)}$ is the $k$th feature of $a_i$, and $N_i$ is the number of features in $a_i$. The average distance of all features in $a_i$ can be determined as follows:

$$D_i = \frac{1}{2}\,\frac{1}{N_i}\sum_{j=1}^{N_i}\frac{1}{N_i-1}\sum_{k=1}^{N_i}\left|q^{(i,j)}-q^{(i,k)}\right|. \tag{49}$$

The average distance of $D_i$, $i = 1, 2, \ldots, C$, is

$$D_a = \frac{1}{C}\sum_{i=1}^{C} D_i. \tag{50}$$

Introducing Eq. (49) into Eq. (50) yields

$$D_a = \frac{1}{C}\sum_{i=1}^{C}\frac{1}{N_i-1}\sum_{k=1}^{N_i}\left|q^{(i,k)}-\bar{q}^{(i)}\right|, \tag{51}$$

where $\bar{q}^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} q^{(i,k)}$ is the mean of all features in $a_i$. The average distance of the $C$ different condition-patterns $a_1, a_2, \ldots, a_C$ is

$$D_b = \frac{1}{C}\sum_{i=1}^{C}\left|\bar{q}^{(i)}-\bar{q}\right|, \tag{52}$$

where $\bar{q} = \frac{1}{C}\sum_{i=1}^{C}\frac{1}{N_i}\sum_{k=1}^{N_i} q^{(i,k)}$.

When the average distance $D_a$ inside a given condition-pattern is smaller and the average distance $D_b$ between different condition-patterns is bigger, the features represent the condition-patterns well. The evaluation criterion for optimal features is defined as

$$d_A = \frac{D_a}{D_b}. \tag{53}$$
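The distance terms of Eqs. (51) and (52) can be computed for a single candidate feature as sketched below. The sketch assumes that `groups` holds the values of that one feature grouped by condition-pattern, which is one reading of the notation above. Eq. (53) is written as $D_a/D_b$, while the selection rule favours a small $D_a$, a large $D_b$ and a larger criterion; the hypothetical `rank_features` helper therefore scores each feature by $D_b/D_a$, and that orientation is an assumption of this sketch.

```python
def distance_terms(groups):
    """Return (D_a, D_b) for one feature.

    groups[i] lists the values of the feature for condition-pattern i.
    """
    c = len(groups)
    means = [sum(g) / len(g) for g in groups]
    # Eq. (51): average within-pattern distance to the pattern mean.
    d_a = sum(
        sum(abs(v - m) for v in g) / (len(g) - 1)
        for g, m in zip(groups, means)
    ) / c
    # Eq. (52): average distance of the pattern means to the overall mean,
    # where the overall mean is the mean of the pattern means.
    grand_mean = sum(means) / c
    d_b = sum(abs(m - grand_mean) for m in means) / c
    return d_a, d_b


def rank_features(features, keep):
    """Indices of the `keep` features with the largest D_b / D_a."""
    scores = []
    for index, groups in enumerate(features):
        d_a, d_b = distance_terms(groups)
        scores.append((d_b / d_a, index))
    scores.sort(reverse=True)
    return [index for _, index in scores[:keep]]
```

Each pattern needs at least two values, and a feature with zero within-pattern spread would need separate handling because $D_a$ appears in a denominator.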


Fig. 8. Distance evaluation criteria of PCs.

Table 4
Selected ICs and PCs after feature selection

| Feature type | Selected components |
|---|---|
| Independent components (ICs) | 5, 10, 13, 14, 15, 18, 19 |
| Principal components (PCs) | 1, 2, 3, 4, 6, 13, 16 |

So, according to the larger values of the distance evaluation criterion $d_A$, the optimal features can be selected from the original feature sets. The results of feature selection using the distance evaluation technique are shown in Figs. 7 and 8. These figures show that 24 ICs and PCs result from the feature extraction process. Usually 5–12 parameters are sufficient to perform the calculation and provide sufficient accuracy (Yang, Lim, & An, 2000). Applying the distance evaluation technique retains the 7 ICs and 7 PCs that have the largest distance evaluation criteria. The best ICs and PCs from feature selection are presented in Table 4.

5.5. Training and classification

In this study, the RBF and polynomial kernels are used as the basic kernel functions of the SVMs. There are two parameters associated with these kernels: C and γ. In addition, the polynomial kernel has a parameter d, the degree of the polynomial. The upper bound C of the penalty term and the kernel parameter γ play a crucial role in the performance of SVMs; improper selection of C, γ and d can cause overfitting or underfitting. Nevertheless, there is a simple guideline for choosing proper kernel parameters using cross-validation, suggested by Hsu, Chang, and Lin (2003). The goal of this guideline is to identify good values of C and γ so that the classifier can accurately classify the input data.

In m-fold cross-validation, we first divide the training set into m subsets of equal size. Sequentially, one subset is tested using the classifier trained on the remaining (m − 1) subsets. Thus, each instance of the whole training set is predicted once, and the cross-validation accuracy is the percentage of data that are correctly classified. The cross-validation procedure can prevent the overfitting problem. In this paper, we use 10-fold cross-validation to search for the proper kernel parameters d, C and γ. Basically, all pairs (C, γ) for the RBF kernel and all triples (d, C, γ) for the polynomial kernel are tried, and the one with the best cross-validation accuracy is selected. In this work, the search ranges were $C = \{2^0, 2^1, \ldots, 2^7\}$ and $\gamma = \{2^{-3}, 2^{-2}, \ldots, 2^{3}\}$.
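The m-fold cross-validation and grid search described above can be sketched as follows. The sketch is classifier-agnostic: `make_trainer(c, gamma)` is a hypothetical factory that returns a training function, which in turn returns a prediction function; an SVM implementation would be plugged in there. The deterministic fold assignment is a simplification, and shuffling or stratifying the samples first would be a reasonable refinement.

```python
from itertools import product


def parameter_grid():
    """All (C, gamma) pairs with C in 2^0..2^7 and gamma in 2^-3..2^3."""
    c_values = [2.0 ** e for e in range(0, 8)]
    gamma_values = [2.0 ** e for e in range(-3, 4)]
    return list(product(c_values, gamma_values))


def fold_indices(n_samples, m):
    """Split sample indices into m folds of (nearly) equal size."""
    folds = [[] for _ in range(m)]
    for i in range(n_samples):
        folds[i % m].append(i)
    return folds


def cross_validation_accuracy(samples, labels, train_fn, m=10):
    """Percentage of samples classified correctly when each fold is
    predicted by a classifier trained on the other m - 1 folds."""
    correct = 0
    for held_out in fold_indices(len(samples), m):
        held = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train_fn(train_x, train_y)
        correct += sum(1 for i in held_out if predict(samples[i]) == labels[i])
    return 100.0 * correct / len(samples)


def grid_search(samples, labels, make_trainer, m=10):
    """Return (best accuracy, C, gamma); the first best pair wins ties."""
    best = None
    for c, gamma in parameter_grid():
        acc = cross_validation_accuracy(samples, labels, make_trainer(c, gamma), m)
        if best is None or acc > best[0]:
            best = (acc, c, gamma)
    return best


print(len(parameter_grid()))  # 8 values of C times 7 values of gamma
```

After the search, the classifier would be retrained on the whole training set with the selected pair.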

Fig. 7. Distance evaluation criteria of ICs.



The SVMs-based multi-class classification is applied using the one-against-one and one-against-all methods. These methods are clearly explained in Hsu and Lin (2002). The scenarios of the training and classification process are as follows. First, the SVMs-based multi-class classifier is trained on the original features, without feature extraction and feature selection. Second, the input data for SVM training are replaced by the data obtained after feature extraction by PCA and ICA. Furthermore, the kernel function is varied to show the characteristics of each kernel function and its performance in faults classification; in this work, we employed polynomial and Gaussian RBF kernel functions. Third, we repeat the whole training and classification process with kernel parameter selection. Finally, the results of training and faults classification are compared to show the best results of the system.

6. Results and discussion

The results of this study are shown in Tables 4–6. In these tables, we list the kernel function, the multi-class classification strategy, the classification rate for training and testing, the number of support vectors and the training time. The classification rate (%) is the ratio of correct classifications to the total number of training or testing samples, respectively.

6.1. Effect of feature extraction and selection

In Table 5, the classification process is performed on the original feature set without feature extraction and selection. The classification rates of this process range from 75.0% to 97.5%. The poor performance of this classification is due to the existence of irrelevant and useless features. Many irrelevant features are a burden and tend to decrease the performance of the classifier. Then, as shown in Tables 6 and 7, the classification rate with PCA and ICA feature extraction ranges from 97.5% to 100%, which is better than the previous classification without feature extraction and selection. By using ICA and PCA feature extraction, the useful features are extracted from the original feature sets. Furthermore, the number of support vectors (SVs) decreases due to feature extraction. In this case, the classification process using ICA feature extraction needs fewer SVs than PCA feature extraction and the original features. This can be explained by the fact that ICA

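Table 5
Fault classification using original feature and SVM due to kernel and multi-class classification

| Kernel | Multi-class strategy | Classification rate, training (%) | Classification rate, testing (%) | Number of SVs | Training time (s) |
|---|---|---|---|---|---|
| Polynomial (d = 1) | One vs. one | 89.2 | 90.0 | 93 | 0.48 |
| Polynomial (d = 1) | One vs. all | 77.5 | 75.0 | 103 | 0.86 |
| Polynomial (d = 2) | One vs. one | 91.7 | 90.0 | 94 | 0.52 |
| Polynomial (d = 2) | One vs. all | 81.7 | 80.0 | 95 | 0.56 |
| Polynomial (d = 3) | One vs. one | 93.3 | 97.5 | 93 | 0.56 |
| Polynomial (d = 3) | One vs. all | 80.8 | 85.0 | 94 | 1.00 |
| Polynomial (d = 4) | One vs. one | 94.2 | 97.5 | 94 | 0.48 |
| Polynomial (d = 4) | One vs. all | 80.0 | 98.5 | 94 | 0.98 |
| Gaussian RBF (γ = 2.19) | One vs. one | 92.5 | 90.0 | 99 | 0.32 |
| Gaussian RBF (γ = 2.19) | One vs. all | 77.0 | 72.5 | 110 | 0.47 |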

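Table 6
Fault classification using PCA and SVM due to kernel and multi-class classification

| Kernel | Multi-class strategy | Classification rate, training (%) | Classification rate, testing (%) | Number of SVs | Training time (s) |
|---|---|---|---|---|---|
| Polynomial (d = 1) | One vs. one | 100 | 100 | 79 | 0.52 |
| Polynomial (d = 1) | One vs. all | 99.17 | 97.5 | 68 | 2.39 |
| Polynomial (d = 2) | One vs. one | 100 | 100 | 77 | 0.55 |
| Polynomial (d = 2) | One vs. all | 100 | 100 | 84 | 2.17 |
| Polynomial (d = 3) | One vs. one | 100 | 100 | 72 | 0.48 |
| Polynomial (d = 3) | One vs. all | 100 | 97.5 | 93 | 1.69 |
| Polynomial (d = 4) | One vs. one | 100 | 100 | 73 | 0.53 |
| Polynomial (d = 4) | One vs. all | 100 | 97.5 | 96 | 2.37 |
| Gaussian RBF (γ = 2.19) | One vs. one | 100 | 100 | 84 | 0.41 |
| Gaussian RBF (γ = 2.19) | One vs. all | 100 | 100 | 80 | 0.90 |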

Table 7
Fault classification using ICA and SVM due to kernel and multi-class classification

| Kernel | Multi-class strategy | Classification rate, training (%) | Classification rate, testing (%) | Number of SVs | Training time (s) |
|---|---|---|---|---|---|
| Polynomial (d = 1) | One vs. one | 100 | 100 | 48 | 0.31 |
| Polynomial (d = 1) | One vs. all | 99.17 | 100 | 45 | 5.17 |
| Polynomial (d = 2) | One vs. one | 100 | 100 | 49 | 0.34 |
| Polynomial (d = 2) | One vs. all | 100 | 97.5 | 45 | 0.92 |
| Polynomial (d = 3) | One vs. one | 100 | 100 | 45 | 0.32 |
| Polynomial (d = 3) | One vs. all | 100 | 97.5 | 47 | 1.26 |
| Polynomial (d = 4) | One vs. one | 100 | 100 | 46 | 0.34 |
| Polynomial (d = 4) | One vs. all | 100 | 97.5 | 52 | 1.15 |
| Gaussian RBF (γ = 2.19) | One vs. one | 100 | 100 | 56 | 0.23 |
| Gaussian RBF (γ = 2.19) | One vs. all | 100 | 100 | 50 | 0.26 |


finds components that are not merely uncorrelated but independent. Independent components are more useful for classification than uncorrelated components. The reason is that the negentropy in ICA can take into account the higher-order information of the original inputs better than PCA, which uses the sample covariance matrix. Moreover, the effect of feature selection can be observed from the distance evaluation criteria of the ICs and PCs. Fig. 7 shows that the variance of the distance criterion among the ICs is relatively high, which indicates useful IC features; a larger variance of the distance evaluation criteria is significant for the classification process. From Fig. 8 we can see that the variance of the distance criterion among the PCs is relatively low, except for the first PC. This indicates that the performance of the PCs is lower than that of the ICs in the classification process.

6.2. Effect of kernel function

The effect of the choice of kernel function is also studied. The performance of an SVM depends to a great extent on the choice of the kernel function used to transform data from the input space to a higher-dimensional feature space. The choice of kernel function is data dependent, and there are no definite rules governing a choice that yields satisfactory performance. Tables 5–7 present the results of SVMs with the kernel functions defined in Table 1. In these tables, d is the degree of the polynomial and γ is the width parameter of the RBF kernel. The parameter C does not appear in these tables because it is needed only in the calculation process as the penalty term. In the first classification, we do not optimize the kernel parameters. First the polynomial kernel function was used, and then the Gaussian RBF kernel. The RBF kernel is very popular and is often claimed to be the best kernel for classification. For this kernel, two parameters determine the performance in training and testing: C and γ. Therefore, the selection of proper kernel parameters C and γ is very important for good performance. In this paper, we performed training and testing both without and with kernel parameter selection. The effect of kernel parameter selection is explained in the next section.

With respect to kernel selection, the classification performance in training and testing tends to increase with the polynomial and RBF kernels, respectively, as can be seen in Tables 5–7. The kernel parameters used for the polynomial kernel are d = 1, C = 10 and γ = 1, whereas for the RBF kernel we used C = 10 and γ = 2.19. In Table 5, the performance of the RBF kernel using the one-against-all strategy is lower than the others. This is caused by improper RBF kernel parameters C and γ. Also, in Table 5 we used the original features without feature extraction and selection. That is why the performance of the RBF kernel in Table 5 is the lowest.

6.3. Effect of kernel parameters selection

There are three parameters associated with the polynomial kernel (d, C, γ) and two parameters for the RBF kernel (C, γ). It is not known beforehand which values of d, C and γ are best for a given problem; consequently, some kind of model selection or parameter search must be employed. This study conducts a 10-fold cross-validation to find the best values of d, C and γ. Combinations of d, C and γ are tried and the one with the lowest cross-validation error is picked. For the RBF kernel we searched the ranges $C = \{2^0, 2^1, \ldots, 2^7\}$ and $\gamma = \{2^{-3}, 2^{-2}, \ldots, 2^{3}\}$, so there are 56 pairs of (C, γ) to be evaluated. For the polynomial kernel we evaluated triples (d, C, γ) from the ranges $d = \{1, 2, 3, 4\}$, $C = \{2^0, 2^1, \ldots, 2^7\}$ and $\gamma = \{2^{-3}, 2^{-2}, \ldots, 2^{3}\}$. The polynomial kernel thus has more hyper-parameters than the RBF kernel. The complete results of kernel parameter selection are summarized in Table 8. After the optimal parameters were found, the whole training set was trained again to generate the final classifier.
This study performs the training process using the polynomial and RBF kernels on all of the data: original features, PCA feature extraction and ICA feature extraction. The performance of the polynomial and RBF kernels after kernel parameter selection is presented in Tables 9–11.
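For orientation, the two kernel families compared in this section can be sketched as plain functions. Table 1 of the paper gives the authoritative definitions; the forms below follow a common LIBSVM-style parameterisation, $(\gamma\, x^{T}y + r)^d$ and $\exp(-\gamma \lVert x - y \rVert^2)$, which is an assumption here and may differ from Table 1, in particular in how γ enters the RBF kernel. The offset `coef0` is likewise a hypothetical parameter.

```python
import math


def polynomial_kernel(x, y, d, gamma, coef0=1.0):
    """(gamma * <x, y> + coef0) ** d for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** d


def rbf_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

With d = 1, γ = 1 and `coef0 = 0`, the polynomial form reduces to the ordinary inner product, that is, a linear kernel.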

Table 8
Selected kernel parameter

| Data | Polynomial kernel (d, C, γ), one vs. one | Polynomial kernel (d, C, γ), one vs. all | RBF kernel (C, γ), one vs. one | RBF kernel (C, γ), one vs. all |
|---|---|---|---|---|
| Original feature | $(3, 2^7, 2^0)$ | $(4, 2^7, 2^0)$ | $(2^7, 2^3)$ | $(2^6, 2^3)$ |
| PCA feature extraction | $(1, 2^2, 2^0)$ | $(1, 2^0, 2^0)$ | $(2^0, 2^1)$ | $(2^0, 2^2)$ |
| ICA feature extraction | $(1, 2^5, 2^0)$ | $(1, 2^1, 2^0)$ | $(2^1, 2^0)$ | $(2^1, 2^0)$ |

As shown in Tables 9–11, the performance of the classification process is increased by kernel parameter selection, compared with Tables 5–7 where no kernel parameter selection was used. In Table 9, the training classification rates for the polynomial kernel are lower than for the RBF kernel with both the one-against-one and one-against-all strategies, even though the degrees of the polynomial are 3 and 4, respectively. This is presumably due to the poor quality of the input data without feature extraction, so that the curse of dimensionality decreases the performance of the classifier. However, in Tables 10 and 11, the classification rate reaches 100% using the polynomial kernel owing to the good quality of the input data after feature extraction.

In Table 10, the classification rates of each kernel function are high; almost all classification rates reach 100%. Generally, the one-against-one strategy is better than one-against-all, as listed in the table. As shown in Table 10, feature extraction using PCA increases the classification performance compared with Table 9 (no feature extraction), because PCA finds the uncorrelated components of the input data space, which are more useful for classification. Moreover, kernel parameter selection further improves the performance. The proper triples (d, C, γ) for the polynomial kernel are $(1, 2^2, 2^0)$ and $(1, 2^0, 2^0)$ for one-against-one and one-against-all, respectively. Although the degree of the polynomial is 1, the performance is high (100%). For the RBF kernel, the proper pairs (C, γ) are $(2^0, 2^1)$ and $(2^0, 2^2)$ for one-against-one and one-against-all, respectively. The classification rates are also high (100% and 99.97%). This is evidence that kernel parameter selection is very important for good performance. Furthermore, the use of proper kernel parameters overcomes the problems of underfitting and overfitting, so that the best classification is obtained.

Finally, faults classification using ICA feature extraction is presented in Table 11. This table shows the best performance in faults classification among the methods considered. From this table we can see that the performance of all kernel functions is 100% in fault

Table 9
Fault classification using original feature and selected kernel parameter

| Kernel | Multi-class approach | Classification rate, training (%) | Classification rate, testing (%) | Number of SVs | Training time (s) |
|---|---|---|---|---|---|
| Polynomial (d, C, γ) | One vs. one $(3, 2^7, 2^0)$ | 99.98 | 100 | 47 | 0.031 |
| Polynomial (d, C, γ) | One vs. all $(4, 2^7, 2^0)$ | 98.30 | 100 | 60 | 0.125 |
| Gaussian RBF (C, γ) | One vs. one $(2^7, 2^3)$ | 100 | 100 | 41 | 0.032 |
| Gaussian RBF (C, γ) | One vs. all $(2^6, 2^3)$ | 100 | 100 | 43 | 0.078 |

Table 10
Fault classification using PCA feature extraction and selected kernel parameter

| Kernel | Multi-class approach | Classification rate, training (%) | Classification rate, testing (%) | Number of SVs | Training time (s) |
|---|---|---|---|---|---|
| Polynomial (d, C, γ) | One vs. one $(1, 2^2, 2^0)$ | 100 | 100 | 47 | 0.031 |
| Polynomial (d, C, γ) | One vs. all $(1, 2^0, 2^0)$ | 100 | 100 | 91 | 0.063 |
| Gaussian RBF (C, γ) | One vs. one $(2^0, 2^1)$ | 100 | 99.97 | 71 | 0.016 |
| Gaussian RBF (C, γ) | One vs. all $(2^0, 2^2)$ | 100 | 100 | 80 | 0.063 |

Table 11
Fault classification using ICA feature extraction and selected kernel parameter

| Kernel | Multi-class approach | Classification rate, training (%) | Classification rate, testing (%) | Number of SVs | Training time (s) |
|---|---|---|---|---|---|
| Polynomial (d, C, γ) | One vs. one $(1, 2^5, 2^0)$ | 100 | 100 | 42 | 0.031 |
| Polynomial (d, C, γ) | One vs. all $(1, 2^1, 2^0)$ | 100 | 100 | 79 | 0.062 |
| Gaussian RBF (C, γ) | One vs. one $(2^1, 2^0)$ | 100 | 100 | 64 | 0.015 |
| Gaussian RBF (C, γ) | One vs. all $(2^1, 2^0)$ | 100 | 100 | 79 | 0.063 |



classification. This shows that feature extraction using ICA is the best method among them, because ICA seeks components that are not merely uncorrelated but independent, which is more useful for classification. In addition, kernel parameter selection using cross-validation gives excellent classification performance. The selected triples (d, C, γ) for the polynomial kernel are $(1, 2^5, 2^0)$ and $(1, 2^1, 2^0)$ for one-against-one and one-against-all, respectively. A polynomial kernel of degree one thus produces the best performance. For the RBF kernel, the selected pair (C, γ) is $(2^1, 2^0)$ for both one-against-one and one-against-all. The classification rate is 100% in both cases, which is better than with PCA feature extraction and the original features.

7. Conclusion

In this paper, we applied the combination of ICA and SVMs to intelligent fault diagnosis of induction motors. ICA and PCA were both successfully applied for feature extraction; however, the clustering of features using ICA is better than that using PCA. Feature extraction is an important step in the fault classification process because it can remove redundancy and avoid the curse of dimensionality. After feature extraction, we performed feature selection to remove irrelevant and useless features. The distance evaluation technique was employed for its simplicity and reliability. SVMs-based multi-class classification was applied to classify the faults. The results show that SVMs achieved high classification performance with both multi-class strategies: one-against-one and one-against-all. In particular, we used 10-fold cross-validation to choose the optimal values of the kernel parameters C and γ, which are very important in SVM model selection. By selecting proper parameter values through cross-validation, we could build a classification model with high performance and accuracy. To show the importance of feature extraction and kernel parameter selection, we trained the SVMs on the input data without and with feature extraction, followed by kernel parameter selection. The results show that ICA feature extraction combined with kernel parameter selection gave the best faults classification. According to this result, the combination of ICA and SVMs can serve as a promising alternative for intelligent faults diagnosis in the future.

Acknowledgements

This work was supported by the Center of Advanced Environmentally Friendly Energy System, Pukyong National University, South Korea (Project number: R122005-001-01006-0). Also, this work was partially supported by the Brain Korea 21 Project in 2005.

References

Antonini, G., Popovici, V., & Thiran, J. P. (2005). Independent component analysis and support vector machine for face feature extraction. Signal Processing Institute, Swiss Federal Institute of Technology Lausanne. Available from
Back, A. D., & Weigend, A. S. (1998). A first application of independent components analysis to extracting structure from stock returns. International Journal of Neural System, 8(4), 473–484.
Biswall, B. B., & Ulmer, J. L. (1999). Blind source separation of multiple signal sources of MRI data sets using independent components analysis. Journal of Computer Assisted Tomography, 23(2), 265–271.
Cao, L. J., Chua, K. S., Chong, W. K., Lee, H. P., & Gu, Q. M. (2003). A comparison of PCA, KPCA and ICA for dimensional reduction in support vector machine. Neurocomputing, 55, 321–336.
Cardoso, J. F. (1998). Blind signal separation: statistical principles. Proceedings of the IEEE, 86(10), 2009–2020.
Chan, K., & Lee, T. W. (2002). Comparison of machine learning and traditional classifiers in glaucoma diagnosis. IEEE Transactions on Biomedical Engineering, 49(9), 963–974.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge: Cambridge University Press.
Guo, M., Xie, L., Wang, S. Q., & Zhang, J. M. (2003). Research on an integrated ICA-SVM based framework for fault diagnosis. IEEE Journal, 2710–2715.
Han, T., Son, J. D., & Yang, B. S. (2005). Fault diagnosis system of induction motors using feature extraction, feature selection and classification algorithm. Proceedings of VS Tech2005.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University. Available from guide.pdf.
Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transaction on Neural Network, 13(2), 415–425.
Hyvärinen, A. (1998). New approximations of differential entropy for independent component analysis and projection pursuit. Advances in Neural Information Processing System, 10, 273–279.
Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10, 626–634.
Hyvärinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4–5), 411–430.
Jack, L. B., & Nandi, A. K. (2002). Fault detection using support vector machines and artificial neural network, augmented by genetic algorithms. Mechanical System and Signal Processing, 16, 373–390.
Jolliffe, I. T. (1986). Principal component analysis. New York: Springer.
Kriegl, J. M., Arnhold, T., Beck, B., & Fox, T. (2005). A support vector machine approach to classify human cytochrome P450 3A4 inhibitors. Journal of Computer-Aided Molecular Design, 19, 189–201.
Kumar, R., Jayaraman, V. K., & Kulkarni, B. D. (2005). An SVM classifier incorporating simultaneous noise reduction and feature selection: illustrative case examples. Pattern Recognition, 38, 41–49.
Lee, Y. S., Nelson, J. K., Scarton, H. A., Teng, D., & Azizi-Ghannad, S. (1994). An acoustic diagnostic technique for use with electric machine insulation. IEEE Transactions on Dielectrics and Electrical Insulation, 9, 1186–1193.
Lehrman, M., Rechester, A. B., & White, R. B. (1997). Symbolic analysis of chaotic signals and turbulent fluctuation. Physical Review Letters, 78(1), 645–657.
Min, J. H., & Lee, Y. C. (2005). Bankruptcy using support vector machine with optimal choice of kernel function parameters. Experts System with Application, 28, 603–614.
Osuna, E., Freund, R. R., & Girosi, F. F. (1997). Improved training algorithm for support vector machines. Proceeding of IEEE Neural Networks for Signal Processing, 276–285.
Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf et al. (Eds.), Advances in kernel methods-support vector learning (pp. 185–208). Cambridge: MIT Press.
Samanta, B. (2004). Gear fault detection using artificial neural networks and support vector machines with genetic algorithms. Mechanical Systems and Signal Processing, 18(3), 625–644.
Sohn, H., Czarnecki, J. A., & Farrar, C. R. (2000). Structural health monitoring using statistical process control. Journal of Structural Engineering, 126(1), 1356–1363.
Stone, G. C., Sedding, H. G., & Costello, M. J. (1996). Application of partial discharge testing to motor and generator stator winding maintenance. IEEE Transactions on Industry Application, 32, 459–464.
The FastICA MATLAB Package. Available from http://www.cis.hut.fi/projects/ica/fastica.
Thollon, F., Jammal, A., & Grellet, G. (1993). Asynchronous motor cage fault detection through electromagnetic torque measurement. European Transactions on Electrical Power, 3, 375–378.
Thomson, W. T., & Fenger, M. (2001). Current signature analysis to detect induction motor faults. IEEE Transactions on Industry Applications, 7, 26–34.
Vapnik, V. N. (1982). Estimation dependences based on empirical data. Berlin: Springer-Verlag.
Vapnik, V. N. (1999). The nature of statistical learning theory. New York: Springer.
Vigario, R. (1997). Extraction of ocular artifacts from EEG using independent components analysis. Electroencephalography and Clinical Neurophysiology, 103(3), 395–404.
Worden, K., & Manson, G. (1999). Visualization and dimension reduction of high-dimensional data for damage detection. IMAC, 17, 1576–1585.
Yang, B. S., Han, T., & An, J. L. (2004). ART-KOHONEN neural network for faults diagnosis of rotating machinery. Mechanical System and Signal Processing, 18, 645–657.
Yang, B. S., Han, T., & Hwang, W. W. (2005). Fault diagnosis of rotating machinery based on multi-class support vector machines. Journal of Mechanical Science and Technology, 19, 845–858.
Yang, B. S., Hwang, W. W., Kim, D. J., & Tan, A. C. (2005). Condition classification of small reciprocating compressor for refrigerators using artificial neural networks and support vector machines. Mechanical System and Signal Processing, 19, 371–390.
Yang, B. S., Hwang, W. W., Ko, M. H., & Lee, S. J. (2005). Cavitation detection of butterfly valve using support vector machines. Journal of Sound Vibration, 287(1–2), 25–43.
Yang, B. S., Jeong, S. K., Oh, Y. M., & Tan, A. C. C. (2004). Case-based reasoning system with Petri nets for induction motor fault diagnosis. Expert Systems with Applications, 27, 301–311.
Yang, B. S., Lim, D. S., & An, J. L. (2000). Vibration diagnostic system of rotating machinery using artificial neural network and wavelet transform. Proceeding of 13th International Congress on COMADEM, Houston, USA, pp. 12–20.
Ypma, A., & Pajunen, A. P. (1999). Rotating machine vibration analysis with second order independent components analysis. Proceeding of the Workshop on ICA and Signal Separation, pp. 37–42.
Zang, C., Friswell, M. I., & Imregun, M. (2004). Structural damage detection using independent components analysis. Structural Health Monitoring, 3(1), 69–83.