
Neurocomputing 128 (2014) 249–257


Real-time fault diagnosis for gas turbine generator systems using extreme learning machine
Pak Kin Wong a,*, Zhixin Yang a, Chi Man Vong b, Jianhua Zhong a

a Department of Electromechanical Engineering, University of Macau, Macau, China
b Department of Computer and Information Science, University of Macau, Macau, China

Article info

Article history: Received 10 September 2012; received in revised form 19 March 2013; accepted 25 March 2013; available online 5 November 2013.

Keywords: Real-time fault diagnosis; Gas turbine generator system; Extreme learning machine; Wavelet packet transform; Time-domain statistical features; Kernel principal component analysis

Abstract

A real-time fault diagnostic system is very important for maintaining the operation of the gas turbine generator system (GTGS) in power plants, where any abnormal situation will interrupt the electricity supply. The GTGS is complicated and has many types of component faults. To prevent interruption of the electricity supply, a reliable and quick-response framework for real-time fault diagnosis of the GTGS is necessary. As the architecture and the learning algorithm of extreme learning machine (ELM) are simple and effective respectively, ELM can identify faults quickly and precisely as compared with traditional identification techniques such as support vector machines (SVM). This paper therefore proposes a new application of ELM for building a real-time fault diagnostic system in which data pre-processing techniques are integrated. In terms of data pre-processing, wavelet packet transform and time-domain statistical features are proposed for extraction of vibration signal features. Kernel principal component analysis is then applied to further reduce the redundant features in order to shorten the fault identification time and improve accuracy. To evaluate the system performance, a comparison between ELM and the prevailing SVM on fault detection was conducted. Experimental results show that the proposed diagnostic framework can detect component faults much faster than SVM, while ELM is competitive with SVM in accuracy. This paper is also the first in the literature that explores the superiority of the fault identification time of ELM. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

The gas turbine generator system (GTGS) is commonly used in many power plants. The main components of a GTGS are: power turbine, gearbox, flywheel and asynchronous generator. In the first phase of the GTGS, the power turbine is driven by the exhaust gas; the output of the turbine then drives a gearbox that connects to the flywheel, which maintains a constant moment of inertia to protect the generator from a sudden stop. Finally, the rotating flywheel drives the asynchronous generator to generate the electric power. The system is designed to run 24 h per day. Any abnormal situation of the GTGS will interrupt the electricity supply and cause enormous economic loss. Traditional manual inspection of the GTGS can hardly accomplish the fault-monitoring task because the GTGS is complicated and also cannot be arbitrarily stopped. To ensure that the power plant runs smoothly, development of a suitable real-time fault monitoring system for the GTGS is necessary.

* Corresponding author. Tel.: +853 83974956; fax: +853 28838314. E-mail address: (P.K. Wong).

In recent years, research on the development of intelligent real-time monitoring systems for the GTGS and rotating machinery has become very active. A real-time fault diagnostic system for a steam turbine generator based on a hierarchical artificial neural network was presented in [1]. An online condition monitoring and diagnostic system for feed rolls was developed in [2]. This system measures the bearing vibration signals and judges the feed roll condition automatically according to the diagnostic rules stored in a computer. In the literature [3–5], expert systems and knowledge base systems for fault diagnosis of turbine generators were developed. In a nutshell, modern methods for fault diagnosis of GTGSs and rotating machinery usually rely on the procedures of (1) processing of the vibration signal and (2) fault identification/classification [1–9]. In terms of processing of the vibration signal, the signal contains high-dimensional data and is surrounded by a lot of irrelevant and redundant information, so it cannot be easily fed into the real-time fault diagnostic system. The high-dimensional data will also degrade the accuracy and fault identification time of the diagnostic system. Therefore, extracting the useful information from the vibration signal is desirable. Currently, there are several common feature extraction techniques, such as wavelet packet transform (WPT) [7,8,10], independent component analysis [11] and time-domain



P.K. Wong et al. / Neurocomputing 128 (2014) 249 257

Nomenclature

$b_i$: threshold of the ith hidden node
C: regularization parameter of SVM
$D_{\text{KPCA-TRAIN}}$: feature set selected by KPCA
$D_{\text{Proc-TRAIN}}$: representative features of training data set after offline data pre-processing
$D_{\text{TRAIN}}$: training data set
$D_{\text{WT-TRAIN}}$: feature set extracted by WPT and TDSF
d: hyperparameter of polynomial kernel for KPCA
d: hyperparameter of polynomial kernel for SVM
$E_i$: prediction error for the ith test case
ELM: extreme learning machine
GTGS: gas turbine generator system
h: number of hidden nodes of ELM
H: hidden layer output matrix
$H^{\dagger}$: Moore–Penrose pseudo inverse of matrix H
j: number of classes for classification
KPCA: kernel principal component analysis
K(.): kernel function of KPCA
L: level of WPT decomposition
$L_{\text{ELM}}$: level of WPT decomposition for ELM classifier
$L_{\text{SVM}}$: level of WPT decomposition for SVM classifier
$P_{\text{ELM}}$: hyperparameter of KPCA for ELM classifier
$P_{\text{SVM}}$: hyperparameter of KPCA for SVM classifier
q: number of test cases
R: radius of RBF kernel for KPCA
R: radius of RBF kernel for SVM
SVM: support vector machine
TDSF: time-domain statistical features
$t_n$: nth output pattern
$t_{n,j}$: jth binary output in the nth output pattern
$w_i$: weight vector connecting the ith hidden node and input nodes
WPT: wavelet packet transform
$x_i$: ith raw signal data
$x_{\text{KPCA}}$: feature set selected by KPCA for unseen real-time signal
$x_{\max}$: upper limit of feature
$x_{\min}$: lower limit of feature
$x_n$: nth set of features in $D_{\text{Proc-TRAIN}}$
$x_{\text{new}}$: unseen real-time signal
$x_{\text{Proc}}$: representative features of unseen real-time signal after data pre-processing
$x_{\text{WT}}$: feature set extracted by WPT and TDSF for unseen real-time signal
y: normalized feature
$z_r(x_{\text{WT}})$: transformed variables $z_r$ for vector $x_{\text{WT}}$
$\alpha_{r,I}$: Ith element in eigenvector corresponding to the rth largest eigenvalue
$\beta_i$: weight vector connecting the ith hidden node and output nodes
$\lambda_r$: rth eigenvalue of KPCA

statistical features (TDSF) [12,13]. The above literature reveals that WPT and TDSF are commonly used for rotating machinery, so both techniques are considered in this project. After feature extraction, there may still be some irrelevant and redundant information in the extracted features. To resolve this problem, a feature selection method should be employed to remove irrelevant and redundant information so that the amount of data can be reduced, resulting in improved diagnostic accuracy. The available feature selection approaches include the compensation distance evaluation technique (CDET) [13], kernel principal component analysis (KPCA) [14] and genetic algorithm (GA) based methods [15,16]. Although CDET and GA-based methods provide a good solution, the optimal threshold in CDET is difficult to set and the result of GA is unrepeatable. In other words, when a GA is run twice, two different results will be obtained. For this reason, KPCA is considered in this study. Regarding identification/classification techniques, traditional neural networks, such as the multi-layer perceptron (MLP), were commonly used for fault diagnosis of the steam turbine and rotating machinery [10,17,20]. However, MLP has many drawbacks, such as local minima, time-consuming determination of the optimal network structure, and risk of over-fitting. To date, a number of researchers have already applied support vector machines (SVM) to diagnose rotating machine faults and other engineering diagnosis problems [8,9,11,18–21], and have shown that SVM is superior to traditional ANN [20,22–24]. The major advantages of SVM are global optimum and higher generalization capability [20,24]. Recently, an emerging machine learning method called extreme learning machine (ELM) has been developed [25,26], which has a simple structure and an efficient learning algorithm.
The learning speed of ELM is extremely fast while it has higher generalization than gradient-descent based learning (e.g. the back-propagation method) [32]. In addition, the issues of local minima, improper learning rate and over-fitting suffered in traditional ANN are overcome [27]. A few studies have already

demonstrated the use of ELM for engineering applications. An approach based on wavelet transform and ELM for fault identification in a series compensated transmission line was presented in [28]. Reference [29] presented a multi-stage ELM for fault diagnosis on hydraulic tube testers. Nevertheless, the application of ELM to real-time fault diagnosis of rotating machinery is still very rare. Moreover, there are three major challenges in the development of real-time fault diagnostic systems for the GTGS. The first one is that a huge amount of data is collected from the real-time monitoring system, and the data are multivariate and nonlinear. It is believed that the existing data pre-processing techniques can help considerably. The second challenge is that there are many classes of faults in the GTGS. The final one is the demand for a short fault identification time. It is well-known that only a few seconds of power interruption may cause many kinds of losses, such as losses of money and computer data, for a modern city. Therefore, if any fault exists, the diagnostic system should be able to detect the fault immediately and then send an alarm signal to inform the control center to start a spare generator in order to avoid any power interruption. Currently, SVM is the most popular classifier for diagnostic systems. However, recent studies showed that ELM tends to have better scalability and achieve much better generalization performance at much faster learning speed than traditional SVM [25,26]. Moreover, traditional SVM usually requires at least two hyperparameters to be specified by users, whereas ELM needs only a single parameter setting, which makes it easy and efficient to use. Besides, ELM is a multi-input and multi-output structure, whereas SVM is a multi-input and single-output structure, so ELM is easier to implement for multi-class problems. With the aforesaid advantages of ELM, this paper proposes a new application of ELM for building a real-time fault diagnostic system for the GTGS.
This paper is organized as follows: Section 2 presents the proposed diagnostic framework and the techniques involved in the framework. Experimental setup and sample data acquisition



with a simulated GTGS are discussed in Section 3. Section 4 discusses the experimental results of ELM and its comparison with SVM. Finally, a conclusion is given in Section 5.

2. Proposed fault diagnostic system

2.1. General workflow

The flowchart of the proposed real-time fault diagnostic system for the GTGS is shown in Fig. 1. The proposed system consists of three components, namely, (1) diagnostic model; (2) real-time signal acquisition system; and (3) fault identification system. In the construction of the diagnostic model, the training data set, $D_{\text{TRAIN}}$, is first obtained from experiments. After offline data pre-processing, the representative features of the training data set, $D_{\text{Proc-TRAIN}}$, are employed to construct a diagnostic model based on ELM or SVM. The real-time data acquisition system uses accelerometers to record the time-dependent vibration data of the GTGS. The unseen real-time signal, $x_{\text{new}}$, for diagnosis is instantaneously processed by the data pre-processing approaches, i.e., $x_{\text{new}}$ is transformed into $x_{\text{Proc}}$, which is to be identified by the real-time diagnostic model. The diagnostic decision is made based on the output of the diagnostic model and a simple label mapping mechanism. The proposed data pre-processing approaches are utilized both in offline model training and in real-time diagnosis. Feature extraction and feature selection methods are employed to acquire the most important information of the vibration signal. WPT and TDSF are proposed for feature extraction, in which the training feature set, $D_{\text{WT-TRAIN}}$, and the features of the unseen real-time signal, $x_{\text{WT}}$, are generated. Then, KPCA is applied to further reduce the dimensions of $D_{\text{WT-TRAIN}}$ and $x_{\text{WT}}$. The results are saved as $D_{\text{KPCA-TRAIN}}$ and $x_{\text{KPCA}}$. In order to avoid domination of large feature values, every feature in $D_{\text{KPCA-TRAIN}}$ and $x_{\text{KPCA}}$ is normalized within [0,1]. The processed dataset and unseen real-time signal are named $D_{\text{Proc-TRAIN}}$ and $x_{\text{Proc}}$ respectively.

2.2. Wavelet packet transform

In the last two decades, wavelet transform (WT) has been widely applied in random signal processing. The transform of a signal is just

another form of representing the signal. It does not change the information content presented in the signal. In WT, a multi-resolution technique is used and different frequencies are analyzed with different resolutions in order to provide a time-frequency representation of the signal. The discrete wavelet transform (DWT) derives from the WT family, as does the wavelet packet transform (WPT). WPT is a generalization of wavelet decomposition that offers a richer signal analysis. In the decomposition of a signal by DWT, only the lower frequency band is decomposed, giving a right recursive binary tree structure whose right lobe represents the lower frequency band and whose left lobe represents the higher frequency band. In the corresponding WPT decomposition, the lower as well as the higher frequency bands are decomposed, giving a balanced binary tree structure. Therefore, the WPT has the same frequency bandwidths in each resolution, while DWT does not have this property. The WPT is thus suitable for processing non-stationary signals, like the vibration signal, because the same frequency bandwidths can provide good resolution regardless of high and low frequencies.

2.3. Kernel principal component analysis

Principal component analysis (PCA) is a popular statistical method for principal component feature extraction. PCA performs well in dimensionality reduction when the input variables are linearly correlated. However, for nonlinear cases, PCA cannot give good performance. Hence PCA is extended to a nonlinear version under the SVM formulation, called kernel PCA (KPCA), which has been used to solve many application problems [30,31]. KPCA involves solving the following set of equations:

$$\Omega \alpha = \lambda \alpha, \tag{1}$$

where $\Omega_{I,J} = K(x_{\text{WT}_I}, x_{\text{WT}_J})$ for I, J = 1, …, Q, Q is the number of training data for KPCA, $x_{\text{WT}_I}, x_{\text{WT}_J} \in R^s$ and s is the dimension of the training data. The vector $\alpha = [\alpha_1; \ldots; \alpha_Q]^T$ is the eigenvector of $\Omega$ and $\lambda \in R$ is the corresponding eigenvalue. The transformed variables (score variables) $z_r$ for vector x become

$$z_r(x_{\text{WT}}) = \sum_{I=1}^{Q} \alpha_{r,I} K(x_{\text{WT}_I}, x_{\text{WT}}), \tag{2}$$

where $\alpha_{r,I}$ is the Ith element in the eigenvector corresponding to the rth largest eigenvalue, r = 1 to p, and p is the largest number




Fig. 1. Proposed real-time fault diagnostic system for GTGS.



such that eigenvalue $\lambda_p$ of the eigenvector $\alpha_p$ is nonzero. Therefore, based on the p pairs of $(\lambda_r, \alpha_r)$, the input vector $x_{\text{WT}} \in R^s$ can be transformed to a nonlinearly uncorrelated variable $z = [z_1, \ldots, z_p]$ where $p \le s$. One more point to note is that the eigenvectors $\alpha_r$ should satisfy the normalization condition of unit length:

$$\alpha_r^T \alpha_r = \frac{1}{\lambda_r}, \quad r = 1, 2, \ldots, p, \tag{3}$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$. To produce a further reduced feature vector, a post-pruning procedure can be applied. That is, after acquiring the p pairs of $(\lambda_r, \alpha_r)$, all $\lambda_r$ are normalized to $\lambda'_r$, which satisfy the constraint $\sum_{r=1}^{p} \lambda'_r = 1$. Based on these normalized eigenvalues $\lambda'_r$, the smallest $\lambda'_r$ are deleted until $\sum_{r=1}^{l} \lambda'_r \ge 0.95$. With the index $l < p$, the eigenvectors $\alpha_r$ (r = 1 to l) are selected to produce a reduced feature vector which retains 95% of the information content in the transformed features. An information loss of 5% is a common rule of thumb for dimensionality reduction. The above concept is depicted in Fig. 2.

2.4. Extreme learning machine for classification

Extreme learning machine (ELM) is a recently available learning method for single-hidden-layer feedforward neural networks (SLFNs) [32]. The basic idea of ELM is that the model has only one hidden layer, and the parameters of this hidden layer, including the input weights and biases of the hidden nodes, need not be tuned. All hidden node parameters are assigned randomly, independent of the target function and the training data [26]. Afterwards, the output weights (linking the hidden layer to the output layer) are determined analytically using a Moore–Penrose generalized inverse [27]. Different from traditional learning algorithms for ANN, ELM tends to provide good generalization at extremely fast learning speed because of its simple and efficient learning algorithm. The approach for multiclass applications is to let ELM have multiple output nodes instead of a single output node. For j-class classification, the ELM classifier has j output nodes. The formulation of ELM is summarized as follows. Consider a pre-processed training data set $D = (x_n, t_n)$, n = 1, …, G, with $t_n = [t_{n,1}, \ldots, t_{n,j}, \ldots, t_{n,m}]^T$, $1 \le j \le m$ and $t_{n,j} \in \{0,1\}$. The classification of ELM with h hidden nodes and sigmoid activation function can be mathematically modeled as

$$f_j(x_{\text{Proc}}) = \sum_{i=1}^{h} \beta_i \, g(w_i \cdot x_{\text{Proc}} + b_i), \tag{4}$$

$$g(w_i \cdot x_{\text{Proc}} + b_i) = \frac{1}{1 + \exp(-(w_i \cdot x_{\text{Proc}} + b_i))}, \tag{5}$$

where $x_{\text{Proc}}$ is the feature vector of an unseen case after data pre-processing, $w_i$ is the weight vector connecting the ith hidden node and the input nodes, $\beta_i$ is the weight vector connecting the ith hidden node and the output nodes, and $b_i$ is the threshold of the ith hidden node. If an ELM with h hidden nodes can approximate G training data with zero error, it implies that there exist $\beta_i$, $w_i$ and $b_i$ such that

$$f_j(x_n) = \sum_{i=1}^{h} \beta_i \, g(w_i \cdot x_n + b_i) = H\beta, \tag{6}$$

where H is called the hidden layer output matrix, denoted by

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_G) \end{bmatrix} = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_h \cdot x_1 + b_h) \\ \vdots & & \vdots \\ g(w_1 \cdot x_G + b_1) & \cdots & g(w_h \cdot x_G + b_h) \end{bmatrix}, \tag{7}$$

and $\beta$ is given by

$$\beta = \begin{bmatrix} \beta_{1,1} & \cdots & \beta_{1,m} \\ \vdots & & \vdots \\ \beta_{h,1} & \cdots & \beta_{h,m} \end{bmatrix}. \tag{8}$$

The ith column of H is the ith hidden node output with respect to inputs $x_1, x_2, \ldots, x_G$. The nth row of H is the hidden layer feature mapping with respect to the nth input $x_n$. The output weight vector $\beta$ can be calculated by

$$\beta = H^{\dagger} T, \tag{9}$$

where $H^{\dagger}$ is the Moore–Penrose pseudo inverse of matrix H [33], and the output matrix T, which has one row per training pattern, is given below:

$$T = \begin{bmatrix} t_{1,1} & \cdots & t_{1,m} \\ \vdots & & \vdots \\ t_{G,1} & \cdots & t_{G,m} \end{bmatrix}. \tag{10}$$

Based on this learning algorithm, the training time can be extremely fast because only three calculation steps are required: (1) randomly assign the input weight $w_i$ and bias $b_i$; (2) calculate the hidden layer output matrix H; and (3) calculate the output weight $\beta$. The final classification result for a multi-class problem can be expressed as

$$\text{Label}(x_{\text{Proc}}) = \arg\max_{j \in \{1, \ldots, m\}} f_j(x_{\text{Proc}}). \tag{11}$$

3. Case study and experimental setup

To obtain representative sample data for model construction and to verify the effectiveness of the proposed real-time fault diagnostic framework, experiments were carried out. The details of the experiments are discussed in the following subsections, followed by the corresponding results and comparisons. All the proposed methods were implemented using Matlab R2008a and executed on a PC with a Core 2 Duo E6750 @ 2.13 GHz with 4 GB RAM onboard.

3.1. Test rig and sample data acquisition

The experiments were performed on a test rig as shown in Fig. 3, which simulates the GTGS of the Macau power plant. There are many components in the GTGS. As it is not realistic to implement the diagnostic system to monitor all the components in the GTGS in one study, this research selected the fault detection of a gearbox as a case study. The test rig can simulate many

Fig. 2. Selecting index l for dimensionality reduction using KPCA.

Fig. 3. Fault simulator for gas turbine generator system (labelled parts: prime mover; locations of tri-axial accelerometers).



common faults in the gearbox of the GTGS, such as unbalance, misalignment and gear crack. A total of nine cases, including eight common faults and one normal case in the gearbox of the gas turbine generator system, were simulated on the test rig in order to generate sample training and test data. Table 1 describes the nine cases in detail; the misalignment of the gearbox was simulated by adjusting the height of the gearbox with shims, and the mechanical unbalance was simulated by adding one eccentric mass on the output shaft. The sample vibration data were acquired by two tri-axial accelerometers located on the outer case of the gearbox as shown in Fig. 3. The accelerometers record the gearbox vibration signals along the horizontal and vertical directions respectively. The vibration signal in the axial direction is ignored, since the test rig uses spur gears in which the vibration along the axial direction is not obvious. The real-time data acquisition system was configured to automatically record the unseen vibration signals 24 h per day. To construct and test the diagnostic models, each simulated fault was repeated 33 times for sampling, and each time 2 s of vibration data was recorded with a sampling rate of 4096 Hz. The sampling rate was set to a frequency higher than the gear meshing frequency, which ensures that no signal is missed. In other words, every training sample for each case has (2 accelerometers × 2 measurement directions × 2 s × 4096 Hz) = 32,768 data points, and the total number of samples for determination of hyperparameters, offline training and testing of the fault diagnostic system is 297 (i.e. 9 cases × 33 samples). In this data set, the total number of samples for determination of hyperparameters is 180, while the numbers for offline model training and testing are 81 and 36 respectively. All these numbers of samples are equally divisible by the nine cases for fair training and evaluation.

3.2. Feature extraction by WPT and TDSF

Feature extraction is the determination of a feature vector from a signal with minimal loss of important information. A feature vector is usually a reduced-dimensional representation of that signal, which reduces the modeling complexity and computational cost. Through WPT, a set of $2^L$ sub-bands of a signal can be obtained, where L is the level of WPT decomposition. With reference to the literature [20,34], statistical characteristics can be used for representing signal sub-bands effectively so that the dimensionality of a feature vector extracted from vibration signals can be reduced. For each signal sub-band, the statistical characteristics include the following:

(1) Maximum of the wavelet coefficients.
(2) Minimum of the wavelet coefficients.
(3) Mean of the wavelet coefficients.
(4) Standard deviation of the wavelet coefficients.

In this case study, for a vibration signal of 8192 sample points, there are 8192 wavelet coefficients and $2^4 = 16$ sub-bands after L = 4 levels of WPT decomposition. However, using the above statistics, there are only $2^4 \times 4 = 64$ features, which greatly reduces the input complexity (from 8192 inputs to 64 inputs) for the next stage. For brevity, the term WPT hereafter refers to the process of wavelet packet decomposition together with the calculation of the above statistics. After decomposition by WPT, a time-domain statistical method is usually further employed to extract time-domain features of the raw signals, which provide the physical characteristics of time series data. For instance, references [8,12] applied time-domain statistical features, such as mean, standard deviation, skewness, crest factor and kurtosis, for fault detection on gear trains and low speed bearings respectively. In this study, 10 statistical time-domain features are employed to analyze the vibration signal. Table 2 presents the statistical time-domain features. After data pre-processing by WPT and TDSF, the number of extracted features is shown in Table 3.

3.3. Dimension reduction by KPCA

Although the useful features can be extracted by WPT and TDSF, the dimension of these extracted features is still high (about 168 to 552). Such a high dimension can degrade the diagnostic performance. To tackle this issue, KPCA is applied to obtain a small set of principal components of the extracted features. With the eigenvalues obtained from KPCA, the unimportant transformed features can be deleted. Therefore, only a limited number of the principal components are necessary and 95% of the information in the features can be retained.
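The KPCA reduction with 95% retention (Sections 2.3 and 3.3) can be illustrated with a minimal NumPy sketch. It follows Eqs. (1)–(3) as written, so the kernel matrix is not centred, and it uses the linear kernel by default. The function names `fit_kpca` and `transform_kpca`, the returned dictionary layout and the `1e-10` threshold for "nonzero" eigenvalues are assumptions made for illustration; the paper's own implementation was in Matlab and is not shown in the source.

```python
import numpy as np

def linear_kernel(a, b):
    """Linear kernel K(a, b) = a . b for all pairs of rows of a and b."""
    return a @ b.T

def fit_kpca(x_train, kernel=linear_kernel, retain=0.95):
    """Solve Omega alpha = lambda alpha and keep the leading l components."""
    omega = kernel(x_train, x_train)                  # Q x Q kernel matrix
    eigvals, eigvecs = np.linalg.eigh(omega)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    nonzero = eigvals > 1e-10 * eigvals[0]            # assumed "nonzero" test
    eigvals, eigvecs = eigvals[nonzero], eigvecs[:, nonzero]
    alphas = eigvecs / np.sqrt(eigvals)               # alpha^T alpha = 1/lambda
    share = np.cumsum(eigvals / eigvals.sum())        # normalized eigenvalues
    l = min(int(np.searchsorted(share, retain)) + 1, eigvals.size)
    return {"x_train": x_train, "alphas": alphas[:, :l],
            "eigvals": eigvals[:l], "kernel": kernel}

def transform_kpca(model, x):
    """Score variables z_r(x) = sum_I alpha_{r,I} K(x_I, x)."""
    k = model["kernel"](np.atleast_2d(x), model["x_train"])
    return k @ model["alphas"]
```

The same fitted model is applied to the training set and to each unseen signal, which mirrors how $D_{\text{KPCA-TRAIN}}$ and $x_{\text{KPCA}}$ are produced by one pre-processing chain.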
Table 2. Definition of common statistical features in time-domain.

Time-domain feature | Equation
1. Mean | $x_m = \frac{1}{N}\sum_{i=1}^{N} x_i$
2. Standard deviation | $x_{std} = \sqrt{\frac{\sum_{i=1}^{N}(x_i - x_m)^2}{N-1}}$
3. Root mean square | $x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$
4. Peak | $x_{pk} = \max |x_i|$
5. Skewness | $x_{ske} = \frac{\sum_{i=1}^{N}(x_i - x_m)^3}{(N-1)\,x_{std}^3}$
6. Kurtosis | $x_{kur} = \frac{\sum_{i=1}^{N}(x_i - x_m)^4}{(N-1)\,x_{std}^4}$
7. Crest factor | $CF = \frac{x_{pk}}{x_{rms}}$
8. Clearance factor | $CLF = \frac{x_{pk}}{\left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right)^2}$
9. Shape factor | $SF = \frac{x_{rms}}{\frac{1}{N}\sum_{i=1}^{N}|x_i|}$
10. Impulse factor | $IF = \frac{x_{pk}}{\frac{1}{N}\sum_{i=1}^{N}|x_i|}$

Note: $x_i$ represents a signal series for i = 1, 2, …, N, where N is the number of data points of a raw signal.

Table 1. Description of common defects of gearbox.

Label | Case | Type of fault
C1 | Normal | Normal
C2 | Structure failure | Unbalance
C3 | Structure failure | Looseness
C4 | Structure failure | Misalignment
C5 | Gear failure | Chipped tooth
C6 | Gear failure | Gear tooth broken
C7 | Gear failure | Gear crack
C8 | Simultaneous faults | Gear tooth broken combined with chipped tooth
C9 | Simultaneous faults | Gear crack combined with chipped tooth

Table 3. Features extracted under different decomposition levels of WPT.

WPT | Features obtained by WPT | TDSF | (Measurement directions × locations of accelerometers) | Total features
3 level | $2^3$ sub-bands × 4 | 10 | 2 × 2 | 168
4 level | $2^4$ sub-bands × 4 | 10 | 2 × 2 | 296
5 level | $2^5$ sub-bands × 4 | 10 | 2 × 2 | 552
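The feature counts in Table 3 can be reproduced with a short sketch of the WPT and TDSF feature extraction. Two points are assumptions rather than the paper's method: the decomposition below uses the Haar filter pair as a simple stand-in for the Daubechies wavelets (Db3 to Db5) tried in the study, and the sub-band standard deviation uses the N−1 form of Table 2. The function names are hypothetical, and the signal length must be divisible by $2^L$.

```python
import numpy as np

def haar_wpt(signal, level):
    """Full wavelet packet tree with Haar filters; returns 2**level sub-bands."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(level):
        next_bands = []
        for band in bands:
            even, odd = band[0::2], band[1::2]
            next_bands.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_bands.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        bands = next_bands
    return bands

def wpt_features(signal, level):
    """Max, min, mean and standard deviation of each sub-band."""
    feats = []
    for band in haar_wpt(signal, level):
        feats += [band.max(), band.min(), band.mean(), band.std(ddof=1)]
    return np.array(feats)

def tdsf_features(x):
    """The ten time-domain statistical features of Table 2, in table order."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.mean()
    std = x.std(ddof=1)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    abs_mean = np.mean(np.abs(x))
    skew = np.sum((x - mean) ** 3) / ((n - 1) * std ** 3)
    kurt = np.sum((x - mean) ** 4) / ((n - 1) * std ** 4)
    crest = peak / rms
    clearance = peak / np.mean(np.sqrt(np.abs(x))) ** 2
    shape = rms / abs_mean
    impulse = peak / abs_mean
    return np.array([mean, std, rms, peak, skew, kurt,
                     crest, clearance, shape, impulse])

def extract_features(channels, level):
    """Concatenate WPT and TDSF features over all measurement channels."""
    per_channel = [np.concatenate([wpt_features(c, level), tdsf_features(c)])
                   for c in channels]
    return np.concatenate(per_channel)
```

With four channels of 8192 points each (2 measurement directions × 2 accelerometer locations) and L = 4, `extract_features` returns (16 × 4 + 10) × 4 = 296 values, matching the 4-level row of Table 3.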



3.4. Normalization

To ensure that all features contribute evenly, all reduced features go through normalization. The interval of normalization is [0,1]. Each extracted feature is normalized by the following formula:

$$y = \frac{x_{\text{KPCA}} - x_{\min}}{x_{\max} - x_{\min}}, \tag{12}$$

where $x_{\text{KPCA}}$ is an output feature after going through KPCA and y is the result of normalization. After normalization, a processed training set $D_{\text{Proc-TRAIN}}$ is obtained. The classification techniques of ELM and SVM can then be employed to construct the fault classifier based on $D_{\text{Proc-TRAIN}}$.

3.5. Setup for evaluation

After model training, system evaluation can be carried out. A set of real-time test data was collected via the data acquisition system. When the real-time vibration data were recorded for testing, they also had to go through the three steps of feature extraction, dimension reduction and normalization. To assess the system fairly, the test data set includes equal numbers of the nine classes of cases. To evaluate the prediction performance of the classifiers, classification accuracy is used. The evaluation of classification accuracy is simple because it just compares the calculated class of an unseen input vector with its given target class. Given a test set of q cases, every test case $t_i$, for i = 1 to q, is passed to the ELM and SVM classifiers. If the calculated class of $t_i$ is not equal to its given target class, its corresponding error $E_i$ is set to 1; otherwise $E_i$ is set to 0. Finally, the sum of errors $E_i$ is divided by the total number of test cases q. Its complement gives the accuracy function as follows:

$$\text{Accuracy} = \left[ 1 - \frac{1}{q} \sum_{i=1}^{q} E_i \right] \times 100\%. \tag{13}$$
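The normalization of Eq. (12) and the accuracy measure of Eq. (13) can be sketched in a few lines of NumPy. The sketch assumes that $x_{\min}$ and $x_{\max}$ are taken per feature from the training set and then reused for unseen signals (so an unseen value may fall slightly outside [0,1]); the source does not state this explicitly. The function names are hypothetical.

```python
import numpy as np

def fit_minmax(train):
    """Per-feature lower and upper limits, taken from the training set."""
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(x, x_min, x_max):
    """Eq. (12): y = (x - x_min) / (x_max - x_min), feature by feature."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant features
    return (x - x_min) / span

def accuracy(predicted, target):
    """Eq. (13): E_i = 1 for a misclassified case, 0 otherwise."""
    errors = (np.asarray(predicted) != np.asarray(target)).astype(int)
    return (1.0 - errors.sum() / errors.size) * 100.0
```

For example, one wrong label out of four test cases gives an accuracy of 75%.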

4. Experimental results and discussion

Since there are many combinations of mother wavelets and kernels for KPCA, SVM and ELM, a set of experiments was carried out on the simulated gas turbine generator system to determine the best combination of the system parameters. In the phase of WPT, the mother wavelet and the level of decomposition L were selected by trial and error. In the family of mother wavelets, the Daubechies wavelet (Db) is the most popular one and was hence employed for the experiments. In this case study, three Daubechies wavelets (Db3, Db4 and Db5) were tried, and the range of L was set from 3 to 5. Moreover, three different kernel functions for KPCA and SVM, namely, linear, radial basis function (RBF) and polynomial, were tested. Different kernel functions have various hyperparameters for adjustment. However, it is very time-consuming to try different values of hyperparameters. To simplify the experiments, we tried the hyperparameter R of RBF based on $2^v$ where v = −3 to 3, and the hyperparameter d of the polynomial kernel was taken from 2 to 5. The procedure for kernel parameter selection is depicted in Fig. 4. The first module is used for determining the parameters for WPT and KPCA. As shown in the part Configuration of Data

Fig. 4. Construction of SVM/ELM fault diagnostic model.



Pre-processing, an independent dataset $D_{\text{TRAIN1}}$ of 90 sample signals was employed to determine the WPT decomposition level L, and the polynomial degree d or RBF radius R of KPCA. Then, 10-fold cross validation with the 90 sample signals was used to train and evaluate a standard diagnostic model for every combination of trial parameters. Since SVM and ELM were compared in this paper, both methods were adopted to construct a standard diagnostic model respectively. Under different combinations of L and d or R, all training samples were pre-processed and the corresponding SVM or ELM diagnostic models were constructed. The best combinations of L and d or R or null producing the highest average accuracies of the SVM and ELM diagnostic models were returned as $(L_{\text{SVM}}, P_{\text{SVM}})$ and $(L_{\text{ELM}}, P_{\text{ELM}})$ respectively. $L_{\text{SVM}}$ and $L_{\text{ELM}}$ represent the optimal L for the SVM and ELM models respectively, whereas $P_{\text{SVM}}$ and $P_{\text{ELM}}$ represent the optimal KPCA kernel parameter d or R or null for the SVM and ELM models respectively. For the standard model of SVM, the kernel was polynomial and all parameters were set to 1.0. For that of ELM, the sigmoid function was used and the number of hidden nodes was set to 10 for a trial. Note that the standard SVM and ELM models are designed for evaluation only; they are not used in the final fault identification model. In the KPCA processing, the polynomial kernel with d = 4 and the RBF kernel with R = 2 were finally selected according to experimental results not listed here. These two best sets of parameters for WPT and KPCA were then passed to the next module, Configuration of Fault Identification Model, where the hyperparameters of the SVM and ELM classifiers were determined. In this module, another independent dataset $D_{\text{TRAIN2}}$ of 90 samples was used. Again, 10-fold cross validation was applied to determine the best combinations of the hyperparameters.
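As an illustration of how an ELM classifier of the kind described in Section 2.4 might be trained and scored by k-fold cross validation, the following NumPy sketch implements the three training steps (random input weights and biases, hidden layer output matrix H, output weights from the pseudo-inverse) and the arg-max labelling. It is a sketch under stated assumptions, not the authors' Matlab code: the uniform range [−1, 1] for the random weights, the shuffling of folds and all function names are choices made here for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def one_hot(labels, m):
    """Binary target matrix T with one row per pattern and m columns."""
    return np.eye(m)[labels]

def train_elm(x, t, h, rng):
    """Three-step ELM training with h sigmoid hidden nodes."""
    w = rng.uniform(-1.0, 1.0, size=(x.shape[1], h))  # step 1: random weights
    b = rng.uniform(-1.0, 1.0, size=h)                # step 1: random biases
    hidden = sigmoid(x @ w + b)                       # step 2: matrix H
    beta = np.linalg.pinv(hidden) @ t                 # step 3: beta = pinv(H) T
    return w, b, beta

def predict_elm(model, x):
    """Label = index of the largest output node."""
    w, b, beta = model
    return np.argmax(sigmoid(x @ w + b) @ beta, axis=1)

def cross_val_accuracy(x, labels, h, k, rng):
    """Mean accuracy (%) of an ELM with h hidden nodes over k folds."""
    m = int(labels.max()) + 1
    folds = np.array_split(rng.permutation(len(labels)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_elm(x[train_idx], one_hot(labels[train_idx], m), h, rng)
        hits = predict_elm(model, x[test_idx]) == labels[test_idx]
        scores.append(hits.mean() * 100.0)
    return float(np.mean(scores))
```

Calling `cross_val_accuracy` for several candidate values of h and keeping the best one corresponds to the kind of search described for the number of hidden nodes; because the input weights are random, results vary with the seed of `rng`.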
Under $(L_{\text{SVM}}, P_{\text{SVM}})$, $D_{\text{TRAIN2}}$ was pre-processed so that SVM diagnostic models with different combinations of the hyperparameters (C, d or R or null) were constructed. The best combination (C, d or R or null) along with $(L_{\text{SVM}}, P_{\text{SVM}})$ was returned as the SVM parameter set. Similarly, the ELM parameter set {$L_{\text{ELM}}$, $P_{\text{ELM}}$, h} was returned. In order to ensure the best performance of the SVM classifier, the regularization parameter C of SVM was taken from the range of $10^u$ where u = 0 to 3, and three different kernel functions, linear, RBF and polynomial, were tried for SVM classification. According to the experimental results not listed here, SVM with the polynomial kernel and C = 10, d = 4, and SVM with the RBF kernel and C = 10, R = 4, show the best accuracies respectively. With the selected SVM and ELM parameter sets, the SVM and ELM diagnostic models were finally constructed with the last independent training set of 81 samples. These two diagnostic models were then verified based on the last 36 test samples. In this case study, the test samples are completely independent from training data sets, so the models

do not overt to the test data set. The results of the tests in Module (1) are presented in Table 4, Figs. 5 and 6. Table 4 indicates that KPCA with linear kernel combined with SVM or ELM shows the highest average diagnostic accuracies. Under the linear kernel, the mother wavelet Db4/L4 combined with SVM and ELM reveal the highest diagnostic accuracies respectively, which are highlighted in red in Table 4. Although the vibration signal is nonlinear, the generalization of KPCA with nonlinear kernel SVM with polynomial kernel (i.e. nonlinear kernel) in current application may be degenerated because the decision surface constructed under this combination (nonlinear kernel nonlinear kernel) may be over-complex according to the well-known principle of Occam's razor. Referring to the average accuracies in Table 4, this statement is veried for this application. Hence, it explains why KPCA with linear kernel has higher generalization than other nonlinear kernels, when combined with nonlinear classiers such as SVM and ELM. In a nutshell, Db4 with level 4 combined with the linear kernel of KPCA is the best combination for data pre-processing in this application.
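The SVM hyperparameter trials of Module (2) described above amount to enumerating a small grid. The sketch below lists such candidates under stated assumptions: C = 10^u for u = 0 to 3 follows the text, whereas the candidate lists for d and R are placeholders, since the full set of values tried is not reported. The function name `svm_candidates` is hypothetical.

```python
def svm_candidates(exponents=range(0, 4), d_values=(4,), r_values=(4,)):
    """Enumerate (kernel, C, kernel_parameter) trial settings for the SVM.

    C = 10**u for u = 0..3 as stated in the text. d_values and r_values are
    placeholder candidate lists (assumption); None stands for the "null"
    kernel parameter of the linear kernel.
    """
    candidates = []
    for u in exponents:
        C = 10 ** u
        candidates.append(("linear", C, None))
        candidates.extend(("polynomial", C, d) for d in d_values)
        candidates.extend(("rbf", C, r) for r in r_values)
    return candidates
```

Each candidate would then be scored by 10-fold cross validation on DTRAIN2, and the best-scoring setting per kernel retained.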
[Figure 5: bar chart of the accuracy of ELM (0% to 100%) for mother wavelets Db3 to Db5 with L from 3 to 5 (Db3/L3 to Db5/L5), with one series each for the linear, polynomial and RBF kernels.]

Fig. 5. Accuracies of ELM under the three kernels of KPCA.

[Figure 6: bar chart of the accuracy of SVM (0% to 100%) for mother wavelets Db3 to Db5 with L from 3 to 5 (Db3/L3 to Db5/L5), with one series each for the linear, polynomial and RBF kernels.]

Fig. 6. Accuracies of SVM under the three kernels of KPCA.

Table 5
Accuracies of ELM and SVM classifiers under the linear kernel of KPCA and Db4/L4.

Classifier                                       Accuracy (%)
SVM with linear kernel, C = 10                   93.33 ± 2.23
SVM with polynomial kernel, C = 10 and d = 4     97.77 ± 1.12
SVM with RBF kernel, C = 10 and R = 4            96.66 ± 1.12
ELM with h = 16                                  98.88 ± 2.22

Table 4
Accuracies (%) of ELM and SVM classifiers under various kernels of KPCA and mother wavelets.

Mother    Level   KPCA with linear kernel       KPCA with polynomial kernel   KPCA with RBF kernel
wavelet   L                                     and d = 4                     and R = 2
                  SVM            ELM            SVM            ELM            SVM            ELM
Db3       3       94.44 ± 1.12   95.55 ± 2.22   66.77 ± 3.33   70 ± 4.44      94.44 ± 2.22   92.22 ± 3.33
Db3       4       95.55 ± 1.22   96.66 ± 2.23   65.44 ± 4.44   74.44 ± 5.56   95.55 ± 1.12   93.33 ± 2.22
Db3       5       95.55 ± 2.22   94.44 ± 3.33   86.77 ± 3.33   86.66 ± 4.44   90 ± 2.23      86.66 ± 4.44
Db4       3       94.44 ± 0.04   96.66 ± 1.12   68.88 ± 4.45   70 ± 6.66      94.44 ± 1.12   93.33 ± 2.22
Db4       4       96.66 ± 1.12   97.77 ± 1.12   82.22 ± 1.12   84.44 ± 2.22   95.55 ± 1.12   93.33 ± 2.22
Db4       5       95.5 ± 2.22    94.44 ± 3.33   73.44 ± 2.22   82.22 ± 4.44   88.88 ± 2.22   88.88 ± 3.33
Db5       3       83.33 ± 2.22   86.66 ± 3.33   66.66 ± 4.45   72.22 ± 5.55   91.11 ± 2.22   89.89 ± 3.33
Db5       4       95.55 ± 1.12   96.66 ± 2.23   63.33 ± 3.33   76.66 ± 4.45   94.44 ± 1.12   91.11 ± 2.22
Db5       5       96.66 ± 0.04   96.66 ± 1.12   82.22 ± 1.12   82.22 ± 2.22   88.88 ± 3.33   86.66 ± 4.44
Average           94.19 ± 1.26   95.06 ± 2.23   72.86 ± 3.09   77.65 ± 4.44   92.59 ± 1.86   90.60 ± 3.08
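As a consistency check, the last row of Table 4 can be reproduced as the column-wise mean of the nine wavelet/level settings. The sketch below assumes the row order Db3/L3, Db3/L4, Db3/L5, Db4/L3, ..., Db5/L5 (the order used on the axes of Figs. 5 and 6) and omits the standard deviations; the names `TABLE4` and `column_averages` are illustrative only.

```python
from statistics import mean

# Mean accuracies (%) transcribed from Table 4. Assumed row order:
# Db3/L3, Db3/L4, Db3/L5, Db4/L3, Db4/L4, Db4/L5, Db5/L3, Db5/L4, Db5/L5.
TABLE4 = {
    ("linear", "SVM"): [94.44, 95.55, 95.55, 94.44, 96.66, 95.5, 83.33, 95.55, 96.66],
    ("linear", "ELM"): [95.55, 96.66, 94.44, 96.66, 97.77, 94.44, 86.66, 96.66, 96.66],
    ("polynomial", "SVM"): [66.77, 65.44, 86.77, 68.88, 82.22, 73.44, 66.66, 63.33, 82.22],
    ("polynomial", "ELM"): [70.0, 74.44, 86.66, 70.0, 84.44, 82.22, 72.22, 76.66, 82.22],
    ("rbf", "SVM"): [94.44, 95.55, 90.0, 94.44, 95.55, 88.88, 91.11, 94.44, 88.88],
    ("rbf", "ELM"): [92.22, 93.33, 86.66, 93.33, 93.33, 88.88, 89.89, 91.11, 86.66],
}


def column_averages(table):
    """Average each (KPCA kernel, classifier) column, rounded to two decimals."""
    return {key: round(mean(values), 2) for key, values in table.items()}


averages = column_averages(TABLE4)
best_column = max(averages, key=averages.get)  # column with the highest average
```

Under these assumptions the computed averages agree with the reported "Average" row, and the linear kernel of KPCA gives the highest average for both classifiers.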






Table 6
Performance of ELM with h = 16 and SVM with polynomial kernel (KPCA with linear kernel).

Mother wavelet                                        Db4
Level                                                 4
SVM, accuracy (%)                                     96.55 ± 0.0
SVM identification time for each test case (ms)       24 ± 0.0
ELM, accuracy (%)                                     98.22 ± 1.05
ELM identification time for each test case (ms)       2.7 ± 0.4
No. of features reduced                               27
Time saved by ELM over SVM                            88.75%

Remark: The kernels of SVM and ELM are the polynomial function with C = 10 and d = 4 and the sigmoid function, respectively.
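The time saving reported for Table 6 is consistent with a simple relative difference between the two identification times. The following sketch reproduces that arithmetic, and applies the same formula to the accuracies for comparison; the function name `relative_saving` is illustrative, and the interpretation of the saving as (t_SVM - t_ELM) / t_SVM is an assumption that matches the reported figure.

```python
def relative_saving(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100.0


svm_ms, elm_ms = 24.0, 2.7        # identification times from Table 6 (ms)
svm_acc, elm_acc = 96.55, 98.22   # accuracies from Table 6 (%)

time_saved = round(relative_saving(svm_ms, elm_ms), 2)                # 88.75 (%)
acc_gain_points = round(elm_acc - svm_acc, 2)                         # 1.67 percentage points
acc_gain_relative = round((elm_acc - svm_acc) / svm_acc * 100.0, 2)   # 1.73 (%), relative
```

The accuracy gain is 1.67 percentage points in absolute terms and 1.73% relative to the SVM accuracy.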

With the Db4/L4 mother wavelet and the linear kernel of KPCA fixed, the optimal SVM and ELM configurations were then determined. Table 5 shows the fault detection accuracies for the tests in Module (2), in which the optimal SVM parameters with the linear, polynomial and RBF kernels were determined, after many trials, to be (C = 10), (C = 10 and d = 4) and (C = 10 and R = 4) respectively. To determine the optimal number of hidden nodes for ELM, a simple trial strategy was employed. According to the theory of ELM, the number of hidden neurons can be bounded by the number of training samples. To obtain the best accuracy of ELM via cross-validation, the number of hidden nodes was therefore varied from 1 to the number of training samples (i.e., 81 in the current study). According to experimental results not listed here, the best number of hidden nodes is 16 in this application, which gives the best average fault detection accuracy. Table 5 also shows that the polynomial kernel gives the best accuracy among the SVM classifiers, but the accuracy of ELM is still slightly higher than that of all SVM classifiers. With the optimal configurations obtained from the above steps, the final SVM and ELM models were constructed (i.e. Module (3)), and a further evaluation test was conducted. Table 6 gives the experimental results of this evaluation test. In addition to the fault detection accuracy, the fault identification time is presented in Table 6 because it is a very important factor for real-time diagnosis problems. The fault identification time is counted from the instant of receiving a signal from the sensors to the time of reporting the classification result. In other words, the sensor delay time is not taken into account.
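The fault identification time defined above, from receipt of a signal to the reported classification result, could be measured per test case along the following lines. This is a hedged sketch rather than the measurement code used in the study: `classify` is a hypothetical callable assumed to cover both the pre-processing (WPT, statistical features, KPCA) and the classifier, and the helper names are illustrative.

```python
import time
from statistics import mean, stdev


def identification_times_ms(classify, signals, repeats=1):
    """Return the wall-clock time of classify(signal), in ms, for each signal.

    `classify` is a hypothetical end-to-end pipeline (pre-processing plus
    classification). Sensor delay is outside the timed region.
    """
    times = []
    for signal in signals:
        start = time.perf_counter()
        for _ in range(repeats):
            classify(signal)
        elapsed = time.perf_counter() - start
        times.append(elapsed / repeats * 1000.0)
    return times


def summarize(times):
    """Mean and sample standard deviation of the measured times (ms)."""
    spread = stdev(times) if len(times) > 1 else 0.0
    return mean(times), spread
```

Calling `summarize(identification_times_ms(pipeline, test_signals))` over the 36 test cases would give a "mean ± standard deviation" figure of the kind reported in Table 6.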
Interestingly, the overall accuracies of ELM and SVM in Table 6 are slightly lower than those in Table 5. This may be caused by the increase in the number of testing samples from 9 (ten-fold cross validation) to 36 (test data set) in this evaluation test. Table 6 reveals that 27 principal components are obtained with Db4, level 4, and the linear kernel of KPCA. The 27 principal components retain 95% of the fault information in the extracted features. Therefore, the number of reduced features is equal to 27. In other words, a raw signal of 32768 data points was transformed into a feature vector of 27 elements as the input of the classifiers. Table 6 also shows that the fault detection accuracy of ELM (98.22%) is slightly higher than that of SVM (96.55%), by 1.73% in relative terms (1.67 percentage points), while the fault identification times of ELM and SVM are 2.7 ms and 24 ms respectively. In an actual GTGS application, the real-time fault diagnostic system is required to analyze signals 24 h per day. In terms of fault identification time, ELM is better than SVM by 88.75%, i.e. (24 - 2.7)/24. Although the absolute difference in diagnostic time between SVM and ELM is not very significant in this case study, it will be very significant in a real situation because a practical real-time GTGS diagnostic system will analyze more sensor signals than the four used in this case study.

Experimental results show that the real-time fault diagnosis framework based on ELM is superior to that based on SVM. More specifically, ELM outperforms SVM in speed while, on average, ELM is competitive with SVM in accuracy. Two main reasons may explain why ELM is better than SVM. First, according to ELM theory, all the training data are linearly separable by a hyperplane passing through the origin of the ELM feature space, resulting in no bias in the optimization constraint of ELM and better generalization performance than SVM [25]. ELM has milder optimization constraints than SVM, and thus, compared to ELM, SVM tends to obtain suboptimal model parameters [25,26]. For this reason, ELM is slightly better than SVM in terms of fault detection accuracy. Secondly, from the viewpoint of classification model size, the model size of ELM is determined by the number of hidden nodes h, which is independent of the number of training samples. For SVM, however, the model size is determined by the number of support vectors NSV, which depends more or less on the size of the training data set. Under the best setting of this case study, h and NSV are 16 and 78 respectively. The classification model of ELM is therefore much smaller than that of SVM. The smaller the classification model, the shorter the execution time of fault detection. Therefore, ELM offers a more efficient fault detection time than SVM for real-time diagnosis of the GTGS.
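The model-size argument above can be made concrete with a sketch of the ELM prediction step for a single-hidden-layer network with sigmoid hidden nodes. Training is omitted, and all names (`elm_predict`, `elm_multiply_adds`) and the weight layout are assumptions made for illustration. The point is that the cost of one prediction grows with the number of hidden nodes h, not with the number of training samples.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def elm_predict(x, input_weights, biases, output_weights):
    """Classify feature vector x with a trained single-hidden-layer ELM.

    input_weights:  h rows of n weights (randomly assigned, then fixed)
    biases:         h hidden-node biases
    output_weights: h rows of m output weights (found by least squares)
    Returns the index of the class with the largest output score.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(input_weights, biases)
    ]
    n_classes = len(output_weights[0])
    scores = [
        sum(hj * beta_row[k] for hj, beta_row in zip(hidden, output_weights))
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)


def elm_multiply_adds(n_features, h, n_classes):
    """Multiply-accumulate count for one prediction: h * (n + m)."""
    return h * (n_features + n_classes)
```

By contrast, an SVM decision requires one kernel evaluation per support vector, so its cost scales with NSV (78 in this case study) rather than with h (16).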

5. Conclusions

This paper proposes a new application of ELM to real-time fault diagnosis of rotating machinery, and the system has been successfully developed to monitor the component conditions of the GTGS. In the proposed system, the data pre-processing employs wavelet packet transform and time-domain statistical features for feature extraction. Kernel principal component analysis is then applied to further reduce the feature dimension. Compared with the commonly adopted classifier, SVM, ELM can search for the optimal solution within the cube of the feature space without any other constraints. Therefore, ELM can produce slightly higher diagnostic accuracy than SVM. In addition, ELM generates a smaller classification model and takes less execution time than SVM, and this advantage of ELM has not yet been addressed in the open literature. In short, ELM is superior to SVM in terms of fault identification time, while ELM is competitive with SVM in fault detection accuracy. In the case study, the combination of the Db4 mother wavelet at decomposition level 4, the linear kernel of KPCA and 16 hidden nodes for ELM can detect various component faults in the gearbox of the GTGS with up to 98.22% accuracy in 2.7 ms. This result is very promising. As the proposed ELM fault diagnostic framework is generic, it could be applied to other condition monitoring applications in which the fault identification time is critical.

Acknowledgements

The authors would like to thank the University of Macau for its funding support under Grant numbers MYRG153(Y1-L2)-FST11-YZX, MYRG075(Y2-L2)-FST12-VCM and MYRG081(Y2-L2)-FST12-WPK.

References
[1] C.F. Yan, H. Zhang, A novel real-time fault diagnostic system for steam turbine generator set by using strata hierarchical artificial neural network, Energy Power Eng. 1 (2009) 07–16.
[2] J.J. Jeng, C.Y. Wei, An online condition monitoring and diagnosis system for feed rolls in the plate mill, J. Manuf. Sci. Eng. 124 (2002) 52–57.
[3] H.R. DePold, F.D. Gass, The application of expert systems and neural networks to gas turbine prognostics and diagnostics, J. Eng. Gas Turbines Power 121 (1999) 607–612.
[4] X. Wang, S.Z. Yang, A parallel distributed knowledge base system for turbine generator fault diagnosis, Artif. Intell. Eng. 10 (1996) 335–341.
[5] S.D.J. McArthur, J. McDonald, S.J. Shaw, A semiautomatic approach to deriving turbine generator diagnostic knowledge, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 37 (2007) 979–992.
[6] R.E. Abdel-Aal, M. Raashid, Using abductive machine learning for online vibration monitoring of turbo molecular pumps, Shock Vib. 6 (1999) 253–265.
[7] J.D. Wu, J.J. Chan, Faulted gear identification of a rotating machinery based on wavelet transform and artificial neural network, Expert Syst. Appl. 36 (2009) 8862–8875.
[8] Q. Hu, Z.G. He, Fault diagnosis of rotating machinery based on improved wavelet package transform and SVMs ensemble, Mech. Syst. Signal Process. 21 (2007) 688–705.
[9] B.S. Yang, T. Han, Fault diagnosis of rotating machinery based on multi-class support vector machines, J. Mech. Sci. Technol. 19 (2004) 846–859.
[10] J. Rafiee, F. Arvani, Intelligent condition monitoring of a gearbox using artificial neural network, Mech. Syst. Signal Process. 21 (2007) 1741754.
[11] A. Widodo, T. Han, Combination of independent component analysis and support vector machines for intelligent faults diagnosis of induction motors, Expert Syst. Appl. 32 (2007) 299–312.
[12] A. Widodo, B.S. Yang, Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors, Mech. Syst. Signal Process. 33 (2007) 241–250.
[13] Y.G. Lei, Z.G. He, Y.Y. Zi, X.F. Chen, New clustering algorithm-based fault diagnosis using compensation distance evaluation technique, Mech. Syst. Signal Process. 22 (2008) 419–435.
[14] Q.B. He, F. Kong, Subspace-based gearbox condition monitoring by kernel principal component analysis, Mech. Syst. Signal Process. 21 (2007) 1755–1772.
[15] C.L. Huang, C.J. Wang, A GA-based feature selection and parameters optimization for support vector machines, Expert Syst. Appl. 31 (2006) 231–240.
[16] B. Samanta, Artificial neural networks and genetic algorithms for gear fault detection, Mech. Syst. Signal Process. 18 (2004) 1273–1282.
[17] C.C. Ma, Y.Y. Wang, Fault diagnosis of power electronic system based on fault gradation and neural network group, Neurocomputing 72 (13–15) (2007) 2909–2912.
[18] C.M. Vong, P.K. Wong, W.F. Ip, Case-based expert system using wavelet packet transform and kernel-based feature manipulation for engine ignition system diagnosis, Eng. Appl. Artif. Intell. 24 (7) (2011) 1281–1294.
[19] J. Cheng, D. Yu, J. Tang, Y. Yang, Application of SVM and SVD technique based on EMD to the fault diagnosis of the rotating machinery, Shock Vib. 16 (2009) 89–98.
[20] C.M. Vong, P.K. Wong, Engine ignition signal diagnosis with wavelet packet transform and multi-class least squares support vector machines, Expert Syst. Appl. 21 (2011) 2560–2574.
[21] A. Widodo, B.S. Yang, Support vector machine in machine condition monitoring and fault diagnosis, Mech. Syst. Signal Process. 33 (2007) 241–250.
[22] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with genetic algorithms, Mech. Syst. Signal Process. 18 (2004) 625–644.
[23] P.K. Kankar, C. Satish, Fault diagnosis of ball bearings using machine learning methods, Expert Syst. Appl. 38 (2010) 1876–1886.
[24] S. Abbasion, A. Rafsanjani, Rolling element bearings multi-fault classification based on the wavelet de-noising and support vector machine, Mech. Syst. Signal Process. 21 (2007) 2933–2945.
[25] G.B. Huang, X.J. Ding, H.M. Zhou, Optimization method based extreme learning machine for classification, Neurocomputing 74 (2010) 155–163.
[26] G.B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 42 (2) (2012) 513–529.
[27] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: Proceedings of IEEE International Conference on Neural Networks, vol. 2, pp. 985–990, 2004.
[28] V. Malathi, N.S. Marimuthu, S. Baskar, K. Ramar, Application of extreme learning machine for series compensated transmission line protection, Eng. Appl. Artif. Intell. 24 (2011) 880–887.
[29] X.F. Hu, Z. Zhao, S. Wang, F.L. Wang, D.K. He, S.K. Wu, Multi-stage extreme learning machine for fault diagnosis on hydraulic tube tester, Neural Comput. Appl. 17 (2008) 339–403.
[30] C. Alzate, J. Suykens, Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 335–347.
[31] P.K. Wong, C.M. Vong, L.M. Tam, K. Li, Data preprocessing and modelling of electronically-controlled automotive engine power performance using kernel principal components analysis and least squares support vector machines, Int. J. Vehicle Syst. Modelling Test. 3 (2008) 312–330.
[32] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006) 489–501.
[33] R. Penrose, A generalized inverse for matrices, Proc. Cambridge Philos. Soc. 51 (1955) 406–413.
[34] E. Übeyli, Statistics over features of ECG signals, Expert Syst. Appl. 36 (2009) 8758–8767.

Pak-Kin Wong received the Ph.D. degree in Mechanical Engineering from The Hong Kong Polytechnic University, Hong Kong, China, in 1997. He is currently a Professor in the Department of Electromechanical Engineering and Associate Dean (Academic Affairs), Faculty of Science and Technology, University of Macau. His research interests include automotive engineering, fluid transmission and control, engineering applications of artificial intelligence, and mechanical vibration. He has published over 135 scientific papers in refereed journals, book chapters, and conference proceedings.

Zhixin Yang obtained his B.Eng. in Mechanical Engineering from the Huazhong University of Science and Technology, China, in 1992, and his Ph.D. in Industrial Engineering and Engineering Management from the Hong Kong University of Science and Technology in 2000. He is currently an Assistant Professor in the Department of Electromechanical Engineering, Faculty of Science and Technology, University of Macau. His research areas include design reuse, engineering applications of artificial intelligence, and manufacturing execution systems.

Chi-Man Vong received the M.S. and Ph.D. degrees in Software Engineering from the University of Macau in 2000 and 2005, respectively. He is currently an Associate Professor with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau. His research interests include machine learning methods and intelligent systems.

Jianhua Zhong received the M.S. degree in Electromechanical Engineering from the University of Macau, Macao, China, in 2011. He is currently pursuing a Ph.D. degree in Electromechanical Engineering at the University of Macau. His research interests include condition monitoring and fault diagnosis using machine learning methods.