
Expert Systems with Applications 36 (2009) 10854–10862


A case study on classification of features by fast single-shot multiclass PSVM using Morlet wavelet for fault diagnosis of spur bevel gear box
N. Saravanan *, K.I. Ramachandran
Department of Mechanical Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu 641105, India
* Corresponding author. Tel.: +91 4222656422; fax: +91 4222656274. E-mail address: n_saravanan@ettimadai.amrita.edu (N. Saravanan).

Keywords: Single-shot multiclass proximal support vector machine; Morlet wavelet; Statistical features; Fault detection; Bevel gear box

Abstract: This paper deals with the application of fast single-shot multiclass proximal support vector machine for fault diagnosis of a gear box consisting of twenty four classes. The condition of an inaccessible gear in an operating machine can be monitored using the vibration signal of the machine measured at some convenient location and further processed to unravel the significance of these signals. The statistical feature vectors from Morlet wavelet coefficients are classified using the J48 algorithm and the predominant features were fed as input for training and testing the multiclass proximal support vector machine. The efficiency and time consumption in classifying the twenty four classes all-at-once is reported.

© 2009 Published by Elsevier Ltd.

1. Introduction

Fault diagnosis is an important process in the preventive maintenance of a gear box, which avoids serious damage if defects occur to one of the gears during operation. Early detection of the defects, therefore, is crucial to prevent the system from malfunction that could cause damage or an entire system halt. Diagnosing a gear system by examining vibration signals is the most commonly used method for detecting gear failures. In the recent past, fault diagnosis of critical components using machine learning techniques like SVM and PSVM has been reported (Sugumaran, Muralidharan, & Ramachandran, 2007). The conventional methods for processing measured data comprise the frequency domain technique, time domain technique, and time-frequency domain technique. These methods have been widely employed to detect gear failures. The use of vibration analysis for gear fault diagnosis and monitoring has been widely investigated and its application in industry is well established (Cameron & Stuckey, 1994; Gadd & Mitchell, 1984; Leblanc, Dube, & Devereux, 1990). This is particularly reflected in the aviation industry, where the helicopter engine, drive trains and rotor systems are fitted with vibration sensors for component health monitoring.

According to Vapnik's formulation (Vapnik, 1999), in one-against-all support vector machines, an n-class problem is converted into n two-class problems and, for the ith two-class problem, class i is separated from the remaining classes. But with this formulation unclassifiable regions exist if we use the discrete decision function. We can resolve unclassifiable regions by determining the decision functions all-at-once (Vapnik, 1999).

A support vector machine is a training algorithm for learning classification and regression rules from data. It is emerging as one of the hottest and most fruitful learning methodologies in artificial intelligence. Fundamentally, SVMs are binary classification algorithms with strong theoretical foundations in statistical learning theory. Their ease of use, theoretical appeal, and remarkable performance have made them the system of choice for many learning problems. They have been applied very successfully in areas like optical character recognition, text classification, phoneme classification for speech understanding and synthesis, and medical data analysis, where once neural networks, fuzzy logic and other statistical methodologies ruled the roost.

For the past several years, several attempts (Crammer & Singer, 2001; Weston & Watkins, 1998) have been made to extend the binary classification SVM into a multiclass SVM, and this was generally considered a more complex problem than a binary class problem. Recently, Szedmak and Shawe-Taylor (2005), based on ideas from Micchelli and Pontil (2004, 2005), showed an implementation with computational complexity independent of the number of classes. In this paper, we show that the computational complexity can be further reduced without sacrificing accuracy if we formulate the problem as a non-standard SVM, as Fung and Mangasarian (2001) did for the binary classification problem. This approach requires only a linear algebra solver, which is freely available. Further, we show that the iterative single data algorithm (ISDA) proposed by Kecman, Huang, and Vogt (2005, chap. 12) for two-class classification can be easily extended to the multiclass problem. This approach reduces the computational complexity further, requiring only a very simple iterative procedure involving matrix addition and multiplication. Convergence of the algorithm is also guaranteed (Kecman et al., 2005). We also studied the performance of the multiclass proximal support vector machine with respect to time consumption and efficiency in classifying the various classes all-at-once.
The wavelet transform (WT) has attracted many researchers' attention recently. Wang and Mcfadden (1983) utilized the wavelet transform to represent all possible types of transients in vibration signals generated by faults in a gearbox. Petrille, Paya, Esat, and Badi (1995) proposed a neural network to diagnose a simple gear system after the data have been pre-processed by the wavelet transform. Boulahbal and Ismail (1997) used the wavelet transform to analyze the vibration signal from a gear system with pitting on the gear. The raw vibration signal in any mode from a single point on a machine is not a good indicator of the health or condition of a machine. Vibration is a vectorial parameter with three dimensions and requires to be measured at several carefully selected points.

This work deals with feature extraction and classification of vibration data of a bevel gear box system by the Morlet wavelet and the multiclass proximal support vector machine. The vibration signal from a piezoelectric transducer is captured for the following conditions: good bevel gear, bevel gear with tooth breakage (GTB), bevel gear with crack at root of the tooth (GTC), and bevel gear with face wear of the teeth (TFW) for various loading and lubrication conditions. Table 1 gives all the twenty four classes used for classification using the multiclass proximal support vector machine.

Wavelet transform is a time-frequency signal analysis method, which is widely used and well established. It has the local characteristic of the time domain as well as the frequency domain. In the processing of non-stationary signals, it presents better performance than the traditional Fourier analysis. Hence, the wavelet transform has potential application in gear box fault diagnosis, in which features are extracted from the wavelet transform coefficients of the vibration signals. The continuous wavelet transform (CWT) can put the fine partition ability of the wavelet transform to good use, and is quite suitable for gear box fault diagnosis. In this work, the coefficients of the Morlet wavelet were used for feature extraction. A group of statistical features like kurtosis, standard deviation, maximum value, etc., which are widely used in fault diagnostics, are extracted from the wavelet coefficients of the time domain signals. Selection of good features is an important phase in pattern recognition and requires detailed domain knowledge. The decision tree using the J48 algorithm was used for identifying the best features from a given set of samples. The selected features were fed as input to the SVM for classification.

1.1. Different phases of present work

The signals obtained are processed further for machine condition diagnosis as explained in the flow chart in Fig. 1.

Fig. 1. Flow chart for bevel gear box condition diagnosis.

Table 1
Various classes used for classification.

1  Good-dry-no load          13  GTC-dry-no load
2  Good-dry-full load        14  GTC-dry-full load
3  Good-half lub-no load     15  GTC-half lub-no load
4  Good-half lub-full load   16  GTC-half lub-full load
5  Good-full lub-no load     17  GTC-full lub-no load
6  Good-full lub-full load   18  GTC-full lub-full load
7  GTB-dry-no load           19  TFW-dry-no load
8  GTB-dry-full load         20  TFW-dry-full load
9  GTB-half lub-no load      21  TFW-half lub-no load
10 GTB-half lub-full load    22  TFW-half lub-full load
11 GTB-full lub-no load      23  TFW-full lub-no load
12 GTB-full lub-full load    24  TFW-full lub-full load

2. Experimental studies

The fault simulator with sensor is shown in Fig. 2 and the pinion and gear are shown in Fig. 3. A variable speed DC motor (0.5 hp) with speed up to 3000 rpm is the basic drive. A short shaft of 30 mm diameter is attached to the shaft of the motor through a flexible coupling; this is to minimize the effects of misalignment and transmission of vibration from the motor.

The shaft is supported at its ends through two needle bearings. From this shaft the drive is transmitted to the bevel gear box by means of a belt drive. The gear box is of dimension 150 mm x 170 mm x 120 mm; the full lubrication level is 110 mm and the half lubrication level is 60 mm. SAE 40 oil was used as the lubricant. An electromagnetic spring loaded disc brake was used to load the gear wheel. A torque level of 8 N-m was applied at the full load condition. The various defects are created in the pinion wheels and the mating gear wheel is not disturbed. With the sensor mounted on top of the gear box, vibration signals are obtained for the various conditions. The selected area is made flat and smooth to ensure effective coupling. A piezoelectric accelerometer (Dytran model) is mounted on the flat surface using the direct adhesive mounting technique. The accelerometer is connected to the signal-conditioning unit (DACTRAN FFT analyzer), where the signal goes through the charge amplifier and an analogue-to-digital converter (ADC). The vibration signal in digital form is fed to the computer through a USB port. The software RT pro-series that accompanies the signal-conditioning unit is used for recording the signals directly in the computer's secondary memory. The signal is then read from the memory, replayed and processed to extract different features.

Fig. 2. Fault simulator setup.
Fig. 3. Inner view of the bevel gear box.

Table 3
Gear wheel and pinion details.

Parameters                Gear wheel      Pinion wheel
No. of teeth              35              25
Module                    2.5             2.5
Normal pressure angle     20°             20°
Shaft angle               90°             90°
Top clearance             0.5 mm          0.5 mm
Addendum                  2.5 mm          2.5 mm
Whole depth               5.5 mm          5.5 mm
Chordal tooth thickness   3.93 −0.150 mm  3.92 −0.110 mm
Chordal tooth height      2.53 mm         2.55 mm
Material                  EN8             EN8
Fig. 4a. View of good pinion wheel.
2.1. Experimental procedure

In the present study, four pinion wheels whose details are as mentioned in Table 3 were used. One was a new wheel and was assumed to be free from defects. In the other three pinion wheels, defects were created using EDM in order to keep the size of the defect under control. The details of the various defects are depicted in Table 2 and their views are shown in Fig. 4.

The size of the defects is a little bigger than one can encounter in a practical situation; however, it is in line with work reported in the literature (Gadd & Mitchell, 1984). The vibration signal from the piezoelectric pickup mounted on the test bearing was taken, after allowing initial running of the bearing for some time.

The sampling frequency was 12,000 Hz and the sample length was 8192 for all speeds and all conditions. The sample length was chosen arbitrarily; however, the following points were considered. Statistical measures are more meaningful when the number of samples is larger. On the other hand, as the number of samples increases, the computation time increases. To strike a balance, a sample length of around 10,000 was chosen. In some feature extraction techniques that will be used with the same data, the number of samples has to be 2^n. The nearest 2^n to 10,000 is 8192 and hence it was taken as the sample length. Many trials were taken at the set speed and the vibration signal was stored. The raw vibration signals acquired from the gear box using the FFT analyzer for the various experimental conditions are shown in Fig. 5.

Fig. 4b. View of pinion wheel with face wear (GFW).
Fig. 4c. View of pinion wheel with tooth breakage (GTB).

Table 2
Details of faults under investigation.

Gears  Fault description              Dimension (mm)
G1     Good                           –
G2     Gear tooth breakage (GTB)      8
G3     Gear with crack at root (GTC)  0.8 x 0.5 x 20
G4     Gear with face wear            0.5

3. Wavelet-based feature extraction

After acquiring the vibration signals in the time domain, they are processed to obtain feature vectors. The continuous wavelet transform (CWT) is used for obtaining the wavelet coefficients of the signals. The statistical parameters of the wavelet coefficients are extracted, and these constitute the feature vectors.

The term wavelet means a small wave. It is the representation of a signal in terms of a finite-length or fast-decaying waveform known as the mother wavelet. This waveform is scaled and translated to match the input signal.

The continuous wavelet transform (Mallat, 1998) is defined as

\[
W(s,\tau) = \int_{-\infty}^{+\infty} f(t)\,\psi_{s,\tau}(t)\,dt, \qquad
\psi_{s,\tau}(t) = \frac{1}{\sqrt{|s|}}\,\psi\!\left(\frac{t-\tau}{s}\right),
\]

where \psi is a window function called the mother wavelet, s is a scale and \tau is a translation.

The term translation is related to the location of the window, as the window is shifted through the signal. This corresponds to the time information in the transform domain. But instead of a frequency parameter, we have a scale. Scaling, as a mathematical operation, either dilates or compresses a signal. Smaller scales correspond to dilated (or stretched out) signals and large scales correspond to compressed signals.

The wavelet series is simply a sampled version of the CWT, and the information it provides is highly redundant as far as the reconstruction of the signal is concerned. This redundancy, on the other hand, requires a significant amount of computation time and resources.

The multilevel 1D wavelet decomposition function available in Matlab is chosen with the Morlet wavelet specified. It returns the wavelet coefficients of signal X at scale N (Soman & Ramachandran, 2005). Fig. 6 shows the Morlet wavelet.

Fig. 6. Morlet wavelet (Mallat, 1998).

Sixty four scales are initially chosen to extract the Morlet wavelet coefficients of the signal data. The efficiencies of the sixty four scales of Morlet wavelets were obtained using the WEKA data mining software and the coefficients of the scale with the highest efficiency are considered for classification. Since the eighth scale gave the maximum efficiency of 96.5%, the statistical features corresponding to it were given as input to the J48 algorithm to determine the predominant features to be given as an input for training and classification using the SVM. Fig. 7 gives the efficiencies of all scales.
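The following is a minimal sketch of the Morlet-wavelet feature extraction described above, written in Python with PyWavelets' built-in real Morlet ('morl') used as a stand-in for the Matlab routine the authors mention; the scale count (64), record length (8192) and sampling frequency (12,000 Hz) follow the text, while the exact set and ordering of statistical features and the use of a synthetic record are assumptions for illustration only.

```python
# Hedged sketch: Morlet CWT coefficients at 64 scales, then one statistical
# feature vector per scale (kurtosis, standard deviation, standard error,
# sample variance, maximum, minimum), following Section 3 of the paper.
import numpy as np
import pywt
from scipy import stats

FS = 12_000          # sampling frequency (Hz), Section 2.1
N_SAMPLES = 8_192    # record length, Section 2.1
SCALES = np.arange(1, 65)   # sixty four scales, Section 3

def morlet_features(signal):
    """Return a (64, 6) array: one statistical feature vector per scale."""
    # pywt's 'morl' is the real Morlet wavelet; assumed equivalent in role
    # to the Morlet specified in the authors' Matlab decomposition.
    coefs, _ = pywt.cwt(signal, SCALES, 'morl', sampling_period=1.0 / FS)
    feats = []
    for c in coefs:                       # c = coefficients at one scale
        feats.append([
            stats.kurtosis(c),            # kurtosis
            np.std(c),                    # standard deviation
            stats.sem(c),                 # standard error (assumed std/sqrt(n))
            np.var(c),                    # sample variance
            np.max(c),                    # maximum value
            np.min(c),                    # minimum value
        ])
    return np.array(feats)

# Usage example with a simulated record standing in for a measured signal.
rng = np.random.default_rng(0)
record = rng.standard_normal(N_SAMPLES)
features = morlet_features(record)
print(features.shape)      # (64, 6); features[7] is the eighth-scale vector
```

Per the text, only the eighth-scale feature vector (the scale with 96.5% classification efficiency) is carried forward to the J48 feature-selection stage.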

Fig. 5a. Vibration signal for good pinion wheel under different lubrication and loading conditions.
Fig. 5b. Vibration signal for pinion wheel with teeth breakage under different lubrication and loading conditions.
Fig. 5c. Vibration signal for pinion wheel with crack at root under different lubrication and loading conditions.
Fig. 5d. Vibration signals for pinion wheel with teeth face wear under different lubrication and loading conditions.

4. Using J48 algorithm in the present work

A standard tree induced with C5.0 (or possibly ID3 or C4.5) consists of a number of branches, one root, a number of nodes and a number of leaves. One branch is a chain of nodes from the root to a leaf, and each node involves one attribute. The occurrence of an attribute in a tree provides information about the importance of the associated attribute, as explained by Peng, Flach, Brazdil, and Soares (2002). A decision tree is a tree-based knowledge representation methodology used to represent classification rules. The J48 algorithm (a WEKA implementation of the C4.5 algorithm) is widely used to construct decision trees, as explained by Sugumaran et al. (2007).

The decision tree algorithm has been applied to the problem under discussion. The input to the algorithm is the set of statistical features of the eighth-scale Morlet coefficients of the vibration signatures of all the twenty four classes. It is clear that the top node is the best node for classification. The other features in the nodes of the decision tree appear in descending order of importance. It is to be stressed here that only features that contribute to the classification appear in the decision tree and others do not. Features which have less discriminating capability can be consciously discarded by deciding on a threshold. This concept is made use of for selecting good features. The algorithm identifies the good features for the purpose of classification from the given training data set, and thus reduces the domain knowledge required to select good features for a pattern classification problem. Fig. 8 shows the sample decision tree obtained for good-dry-no load vs GTB, GTC, and TFW-dry-no load.

Fig. 8. Good-dry-no load vs GTB, GTC, TFW-dry-no load.

Based on the decision trees obtained for all twenty four classes, it was found that statistical features like standard error, kurtosis, sample variance and minimum value play a dominant role in feature classification using Morlet coefficients. These four predominant features were fed as input for training and testing of the multiclass proximal support vector machine used for further classification.
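As a hedged, minimal stand-in for the J48 (C4.5) step above: scikit-learn's DecisionTreeClassifier is a CART tree rather than C4.5, but with an entropy criterion it plays the same role of exposing which statistical features dominate the splits, so it can illustrate the ranking idea. The feature names and the synthetic data below are illustrative assumptions, not the authors' data.

```python
# Sketch of decision-tree-based feature ranking (CART as a stand-in for J48/C4.5).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURE_NAMES = ["std_error", "kurtosis", "sample_variance",
                 "minimum", "maximum", "std_dev"]

rng = np.random.default_rng(1)
X = rng.standard_normal((2400, len(FEATURE_NAMES)))   # 24 classes x 100 records (synthetic)
y = np.repeat(np.arange(24), 100)                      # class labels 0..23

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Read the tree the way the paper does: attributes nearer the root (higher
# importance) matter more; keep the top four for the MPSVM stage.
ranked = sorted(zip(tree.feature_importances_, FEATURE_NAMES), reverse=True)
top_four = [name for _, name in ranked[:4]]
print(top_four)
```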
5. Classification using multiclass proximal support vector machine

5.1. Notation

The notation that we use is summarized below. We follow the notation used in Szedmak and Shawe-Taylor (2005). Note that we assume every Hilbert space mentioned has finite dimension and is defined over the real numbers.

Hn is an arbitrary Hilbert space with dimension n.
Hx is a Hilbert space comprising the possible input vectors.
Hphi(x) is a Hilbert space comprising the feature vectors.
Hy is a Hilbert space comprising the label vectors.
W is a matrix representing a linear operator mapping from the feature space into the label space Hy.
<.,.> and ||.|| denote the inner product and the norm defined in the corresponding Hilbert space.
tr(W) denotes the trace of the matrix W, and dim(H) gives the dimension of the space H.
e is a vector of ones. All vectors and matrices are in bold.

5.2. Multiclass formulation of the SVM with vector output

In this section we will discuss the formulation given in Szedmak and Shawe-Taylor (2005) and then show that it is basically a variation of proximal support vector machines. To discuss the problem, we will first reproduce the problem formulation given in Szedmak and Shawe-Taylor (2005). First of all, we will see how the multiclass formulation and interpretation differ from classical binary SVMs. Firstly, class labels are vectors instead of the 1s and -1s of the binary SVM. Thus class labels in the binary SVM belong to a one-dimensional subspace, whereas for the multiclass SVM the class label belongs to a multi-dimensional subspace. Secondly, W, which defines the separating hyperplane in the binary SVM, is a vector; in the multiclass case, W is a matrix. We can imagine that the job of W in two-class SVMs is to map the data/feature vector into a one-dimensional subspace. In the multiclass SVM, the natural extension is then mapping the data/feature space into a vector label space whose defining bases are vectors. In other words, multiclass learning may be viewed as vector-labeled or vector-valued learning.

Now we give the formulation given in Szedmak and Shawe-Taylor (2005) and then the modification. Assume we have a sample S of pairs {(y_i, x_i) : y_i in Hy, x_i in Hx, i = 1, ..., m} independently and identically generated by an unknown multivariate distribution P. The support vector machine with vector output is realized on this sample by the following optimization problem:

Fig. 7. % Efficiency of Morlet wavelet coefficients versus Morlet wavelet scale.



\[
\begin{aligned}
\min\ & \tfrac{1}{2}\,\mathrm{tr}(W^{T}W) + C\,e^{T}\xi \\
& \{W \mid W : H_{\phi(x)} \rightarrow H_{y},\ W \text{ is a linear operator}\}, \\
& \{b \mid b \in H_{y}\},\ \text{bias vector}, \qquad \{\xi \mid \xi \in H_{m}\},\ \text{slack or error vector} \\
\text{subject to}\quad & \langle y_{i}, (W\phi(x_{i}) + b)\rangle \ge q_{i} - p_{i}\xi_{i},\quad i = 1,\dots,m, \\
& \xi \ge 0 \qquad (1)
\end{aligned}
\]

where 0 denotes the vector with all components 0. The real values q_i and p_i denote normalization constants that can be chosen from the set of values {1, ||y_i||, ||phi(x_i)||, ||y_i|| ||phi(x_i)||}, depending on the particular task. The bias term b can be set to zero because it has been shown in Kecman et al. (2005) that polynomial and RBF kernels do not require the bias term.

To understand the geometry of the problem better, first we let q_i and p_i be 1; then the magnitude of the error measured by the slack variables will be the same independently of the norm of the feature vectors. Introducing dual variables {alpha_i | i = 1, ..., m} for the margin constraints and based on the Karush–Kuhn–Tucker theory, we can express the linear operator W by using the tensor products of the output and the feature vectors, that is

\[
W = \sum_{i=1}^{m} \alpha_{i}\, y_{i}\, \phi(x_{i})^{T}. \qquad (2)
\]

The dual gives

\[
\begin{aligned}
\min\ & \sum_{i,j=1}^{m} \alpha_{i}\alpha_{j}\,
\underbrace{\langle \phi(x_{i}), \phi(x_{j})\rangle}_{K^{\phi}_{ij}}\,
\underbrace{\langle y_{i}, y_{j}\rangle}_{K^{y}_{ij}}
\;-\; \sum_{i=1}^{m} \alpha_{i} \qquad (3)\\
\text{subject to}\quad & \{\alpha_{i} \mid \alpha_{i} \in \mathbb{R}\},\\
& \sum_{i=1}^{m} (y_{i})_{t}\, \alpha_{i} = 0,\quad t = 1,\dots,\dim(H_{y}),\\
& C \ge \alpha_{i} \ge 0,\quad i = 1,\dots,m
\end{aligned}
\]

where we write the product of inner products in the objective as kernel items <phi(x_i), phi(x_j)> <y_i, y_j> = K^phi_ij K^y_ij, where K^phi_ij and K^y_ij stand for the elements of the kernel matrices for the feature vectors and for the label vectors, respectively. Hence, the vector labels are kernelized as well. The synthesized kernel is the element-wise product of the input and the output kernels, an operation that preserves positive semi-definiteness.

The main point to be noted in the above formulation (1) is the constraint equations

\[
\langle y_{i}, (W\phi(x_{i}))\rangle \ge 1 - \xi_{i},\quad i = 1,\dots,m; \qquad \xi_{i} \ge 0,\quad i = 1,\dots,m.
\]

Here, when we project W phi(x_i) onto y_i, we are restricting the resulting value to be always less than or equal to one. There seems to be no compelling reason for such a restriction. So if we allow <y_i, (W phi(x_i))> to take values around 1 (on both sides), the non-negativity restriction on xi_i goes out.

Further, the inequality constraint <y_i, (W phi(x_i))> >= 1 - xi_i, i = 1, ..., m, becomes the equality constraint <y_i, (W phi(x_i))> = 1 - xi_i, i = 1, ..., m.

The above change in turn necessitates the 1-norm minimization term C e^T xi in the objective function to take the two-norm form (1/2) C xi^T xi.

This finally leads to the formulation given below in (4), which basically can be thought of as an extension of the formulation given by Fung and Mangasarian (2001) for a two-class SVM.

\[
\begin{aligned}
\min\ & \tfrac{1}{2}\,\mathrm{tr}(W^{T}W) + \tfrac{1}{2}\,C\,\xi^{T}\xi \\
& \{W \mid W : H_{\phi(x)} \rightarrow H_{y},\ W \text{ is a linear operator}\}, \qquad \{\xi \mid \xi \in H_{m}\},\ \text{slack or error vector} \\
\text{subject to}\quad & \langle y_{i}, (W\phi(x_{i}) + b)\rangle = q_{i} - p_{i}\xi_{i},\quad i = 1,\dots,m. \qquad (4)
\end{aligned}
\]

On taking the Lagrangian we obtain

\[
L = \tfrac{1}{2}\,\mathrm{tr}(W^{T}W) + \tfrac{1}{2}\,C\,\xi^{T}\xi
- \sum_{i=1}^{m} \alpha_{i}\bigl(\langle y_{i}, W\phi(x_{i})\rangle - 1 + \xi_{i}\bigr)
\]

\[
\frac{\partial L}{\partial W} = W - \sum_{i=1}^{m} \alpha_{i}\, y_{i}\, \phi(x_{i})^{T} = 0
\;\Rightarrow\; W = \sum_{i=1}^{m} \alpha_{i}\, y_{i}\, \phi(x_{i})^{T} \qquad (5)
\]

\[
\frac{\partial L}{\partial \xi_{i}} = C\xi_{i} - \alpha_{i} = 0
\;\Rightarrow\; \xi_{i} = \frac{\alpha_{i}}{C},\qquad \xi = \frac{\alpha}{C}. \qquad (6)
\]

Substituting W from (5) and xi = alpha/C in the constraint in (4), we obtain (K^y_ij o K^phi_ij) alpha = e - alpha/C, where o represents element-wise multiplication. That is,

\[
K\alpha = e - \frac{\alpha}{C}, \qquad K = K^{y} \circ K^{\phi},
\]

\[
\left(K + \frac{I}{C}\right)\alpha = e,
\]

\[
Q\alpha = e, \qquad \text{where } Q = K + \frac{I}{C}, \qquad (7)
\]

\[
\alpha = Q^{-1} e.
\]

This leads to a closed-form solution for SVM training. Here alpha is unrestricted in sign and unbounded. Our argument is that in the formulation given in Szedmak and Shawe-Taylor (2005), the restriction put on the error variable is unwarranted, especially when we interpret that W maps data/feature points into the label space. The error that we allow while doing the mapping can have any sign. The impact and meaning of the restriction on the sign of the error variable are yet to be explored.
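A minimal NumPy sketch of the single-shot multiclass proximal SVM solved in closed form above follows: build the label kernel K^y and an RBF input kernel K^phi, synthesize K = K^y o K^phi element-wise, and solve (K + I/C) alpha = e. The one-hot vector labels, the RBF kernel choice and the use of C for the regularization constant are assumptions made for illustration; the paper does not spell out these exact choices.

```python
# Hedged sketch of the closed-form multiclass proximal SVM (Eq. (7)).
import numpy as np

def rbf_kernel(A, B, sigma):
    # Squared Euclidean distances, then Gaussian kernel of width sigma.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_mpsvm(X, y, n_classes, C=1.0, sigma=1.0):
    Y = np.eye(n_classes)[y]                    # vector (one-hot) labels, m x k
    K = (Y @ Y.T) * rbf_kernel(X, X, sigma)     # synthesized kernel K^y o K^phi
    alpha = np.linalg.solve(K + np.eye(len(X)) / C, np.ones(len(X)))
    return alpha, Y

def predict_mpsvm(X_train, Y, alpha, X_test, sigma=1.0):
    # <y_c, W phi(x)> = sum_i alpha_i <y_c, y_i> k(x_i, x); pick the best class.
    scores = Y.T @ (alpha[:, None] * rbf_kernel(X_train, X_test, sigma))
    return scores.argmax(axis=0)

# Tiny usage example with synthetic 24-class data in place of the gear features.
rng = np.random.default_rng(2)
X = rng.standard_normal((240, 4))
y = np.repeat(np.arange(24), 10)
alpha, Y = train_mpsvm(X, y, 24, C=1.0, sigma=1.0)
print(predict_mpsvm(X, Y, alpha, X[:5], sigma=1.0))
```

Training reduces to one linear solve, which is what makes the all-at-once classification of many classes cheap compared with building one SVM per class.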
6. Application of MSVM for problem at hand and results

The predominant statistical features selected from the eighth scale of the Morlet wavelet were given as input to the multiclass proximal support vector machine. As mentioned in Section 1, there are in total 24 classes, and all these classes are classified all-at-once using the multiclass proximal support vector machine. The kernel parameters for classification consist of Nu, Sigma, Tolerance and the number of iterations. Each class consists of 100 data sets and a 10-fold cross validation is used for testing. The number of iterations was fixed at 50 for the entire classification and the tolerance was set to 0.0001. Table 4 shows the results obtained using the multiclass proximal support vector machine for the different trials, together with the time taken for classification in each trial. Fig. 9 shows the % error for the different trials and Fig. 10 shows the time taken for classification in each trial.

Table 4
Multiclass proximal SVM results.

Trial No.  % Error    Time (s)   Nu     Sigma
1          40.83333   132.2188   0.1    0.5
2          40.41667   131.1563   0.2    0.5
3          39.66667   132.1094   0.3    0.5
4          39.33333   131.0781   0.4    0.5
5          39.33333   131.0156   0.5    0.5
6          39.45833   131.0781   0.6    0.5
7          39.58333   131.3125   0.7    0.5
8          39.125     131.0625   0.4    0.6
9          38.95833   131.7969   0.4    0.7
10         38.625     132.375    0.4    0.8
11         38.33333   130.9375   0.4    0.9
12         37.83333   131.3438   0.4    1
13         35.91667   132.0313   0.4    2
14         34.75      132.1406   0.4    3
15         33.79167   132.2188   0.4    4
16         32.66667   132.1875   0.4    5
17         31.91667   132.1094   0.4    6
18         31.66667   132.0469   0.4    7
19         31.54167   132.125    0.4    8
20         31.375     132.9531   0.4    9
21         30.875     132.2656   0.4    10
22         30.79167   132.2188   0.4    11
23         30.625     132.4531   0.4    12
24         30.20833   132.9375   0.4    13
25         29.95833   132.0938   0.4    14
26         29.79167   133.3906   0.4    15
27         29.58333   132.8125   0.4    16
28         29.375     131.9063   0.4    17
29         29.29167   131.5156   0.4    18
30         29.33333   131.9531   0.4    19
31         29.20833   132.25     0.4    20
32         29.08333   132.4688   0.4    21
33         29.25      133.5938   0.4    22
34         29.08333   132.0313   0.4    23
35         29.04167   132.1406   0.4    24
36         29         132.5938   0.4    25
37         28.95833   132.4219   0.4    26
38         28.875     132.1406   0.4    27
39         28.91667   132.5625   0.4    28
40         28.83333   132.8125   0.4    29
41         28.91667   133.0469   0.4    30
42         29         133.6719   0.4    31
43         28.875     134.25     0.4    32
44         28.83333   136.5156   0.4    33
45         28.70833   139.0469   0.4    34
46         28.79167   141        0.4    35
47         28.70833   140.5938   0.4    36
48         28.75      141.1406   0.4    37
49         28.70833   142.8594   0.4    38
50         28.54167   142.6719   0.4    39
51         28.45833   143.2031   0.4    40
52         28.5       143.625    0.4    41
53         28.5       144.9844   0.4    42
54         28.5       147.8125   0.4    43
55         28.54167   145.375    0.4    44
56         28.375     142.9688   0.4    45
57         28.375     141.2031   0.4    46
58         28.375     140.6563   0.4    47
59         28.29167   141.6406   0.4    48
60         28.29167   142.4531   0.4    49
61         28.25      142.0781   0.4    50
62         28.20833   142.0156   0.4    51
63         28.25      139.9531   0.4    52
64         28.20833   139.0313   0.4    53
65         28.29167   138.4844   0.4    54
66         28.25      139.0469   0.4    55
67         28.125     139.8438   0.4    56
68         28.04167   140.1406   0.4    57
69         27.875     140.4375   0.4    58
70         27.83333   139.9844   0.4    59
71         27.83333   141.2969   0.4    60
72         27.79167   143.2969   0.4    61
73         27.83333   146.2344   0.4    62
74         27.75      149.8594   0.4    63
75         27.66667   154.5156   0.4    64
76         27.625     158.9688   0.4    65
77         27.58333   163.5      0.4    66
78         27.58333   159.75     0.4    67
79         27.66667   156.6406   0.4    68
80         27.66667   153.7813   0.4    69
81         27.66667   151.3906   0.4    70
82         27.625     150.9375   0.4    71
83         27.58333   151.2656   0.4    72
84         27.625     151.6563   0.4    73
85         27.625     152        0.4    74
86         27.66667   152.5625   0.4    75
87         27.66667   150.9063   0.4    76
88         27.75      150.1094   0.4    77
89         27.875     161.9531   0.35   66
90         27.91667   160.375    0.3    66
91         27.79167   160.7656   0.45   66
92         27.54167   163.4063   0.5    66
93         27.83333   162.6563   0.55   66
94         28.0000    161.0938   0.6    66
95         27.54167   160.9844   0.5    67
96         27.58333   156.6563   0.5    68
97         27.58333   153.125    0.5    69
98         27.54167   158.5938   0.5    65
99         27.58333   154.0938   0.5    64
100        27.66667   149.3594   0.5    63
101        27.75      147.0469   0.5    62
102        27.58333   158.1875   0.5    68
103        27.625     151.0938   0.5    70
104        27.625     150.7031   0.5    71
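A hedged sketch of how one "trial" of Table 4 could be reproduced follows: for a given (Nu, Sigma) pair, estimate the 10-fold cross-validation error and the wall-clock training time. It reuses the train_mpsvm/predict_mpsvm functions from the earlier sketch and assumes that Nu enters as the regularization constant C; the paper does not state the exact mapping, so both the run_trial helper and that mapping are illustrative assumptions.

```python
# Sketch of one grid-search trial: 10-fold CV error (%) and elapsed time (s)
# for a candidate (Nu, Sigma) pair, mirroring the columns of Table 4.
import time
import numpy as np
from sklearn.model_selection import KFold

def run_trial(X, y, n_classes, nu, sigma):
    errors, start = [], time.perf_counter()
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        alpha, Y = train_mpsvm(X[train_idx], y[train_idx], n_classes, C=nu, sigma=sigma)
        pred = predict_mpsvm(X[train_idx], Y, alpha, X[test_idx], sigma=sigma)
        errors.append(np.mean(pred != y[test_idx]))
    return 100.0 * np.mean(errors), time.perf_counter() - start

# Example sweep of Sigma at fixed Nu, analogous to trials 12-88 of Table 4:
# for sigma in range(1, 78):
#     print(run_trial(X, y, 24, nu=0.4, sigma=sigma))
```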
7. Discussion

In this paper, we have shown that the multiclass proximal support vector machine simplifies the computation, and we have shed some light on the geometry of the multiclass formulation. It is clear from the results that with the multiclass proximal support vector machine the speed of classification is high compared to other approaches, and only a few kernel parameters (Nu and Sigma) were needed for classification. In this work, an error percentage of 27.541 was obtained for classifying a total of twenty four classes.

%Error Vs No. of Trails


41

39

37
% Error

35

33

31

29

27
0 10 20 30 40 50 60 70 80 90 100
No. of Trails

Fig. 9. % Error for different no. of trails.


10862 N. Saravanan, K.I. Ramachandran / Expert Systems with Applications 36 (2009) 10854–10862

Time Vs No. of Trails


165

160

155
Time (sec)

150

145

140

135

130
0 10 20 30 40 50 60 70 80 90 100
No. of Trails

Fig. 10. Time elapsed for training (sec) for different no. of trails.

8. Conclusion

Fault diagnosis of a gear box is one of the core research areas in the field of condition monitoring of rotating machines. A total of twenty four classes were classified using statistical features of the eighth-scale Morlet wavelet coefficients, and the multiclass proximal support vector machine was used for the further classification of features. An error of 27.541% was obtained in classifying the twenty four classes (given in Table 1) all-at-once. It is found that classification using the multiclass proximal support vector machine gives good results for large numbers of classes in less time. Better results than the one obtained may be possible if better kernel parameters can be found.

References

Boulahbal, D., Golnaraghi, M. F., & Ismail, F. (1997). In Proceedings of the DETC'97, 1997 ASME design engineering technical conference, DETC97/VIB-4009.
Cameron, B. G., & Stuckey, M. J. (1994). A review of transmission vibration monitoring at Westland Helicopter Ltd. In Proceedings of the 20th European rotorcraft forum, Paper 116 (pp. 116/1–116/16).
Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2, 265–292.
Fung, G., & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In KDD 2001: Seventh ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, August 26–29.
Gadd, P., & Mitchell, P. J. (1984). Condition monitoring of helicopter gearboxes using automatic vibration analysis techniques. AGARD CP 369, Gears and power transmission system for helicopter turboprops (pp. 29/1–29/10).
Kecman, V., Huang, T.-M., & Vogt, M. (2005). Iterative single data algorithm for training kernel machines from huge data sets: Theory and performance. In Support vector machines: Theory and applications. Studies in fuzziness and soft computing (Vol. 177). Springer-Verlag.
Leblanc, J. F. A., Dube, J. R. F., & Devereux, B. (1990). Helicopter gearbox vibration analysis in the Canadian forces – Applications and lessons. In Proceedings of the first international conference, gearbox noise and vibration (pp. 173–177). Cambridge, UK: IMechE. C404/023.
Mallat, S. (1998). A wavelet tour of signal processing. Academic Press.
Micchelli, C. A., & Pontil, M. (2004). Kernels for multi-task learning. In Proceedings of the 18th conference on neural information processing systems (NIPS'04).
Micchelli, C. A., & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17, 177–204.
Peng, Y. H., Flach, P. A., Brazdil, P., & Soares, C. (2002). Decision tree-based data characterization for meta-learning. In ECML/PKDD-2002 workshop IDDM, Helsinki, Finland.
Petrille, O., Paya, B., Esat, I. I., & Badi, M. N. M. (1995). In Proceedings of the energy-sources technology conference and exhibition: Structural dynamics and vibration, PD (Vol. 70, p. 97).
Soman, K. P., & Ramachandran, K. I. (2005). Insight into wavelets from theory to practice. Prentice-Hall of India Private Limited.
Sugumaran, V., Muralidharan, V., & Ramachandran, K. I. (2007). Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing. Mechanical Systems and Signal Processing, 21, 930–942.
Szedmak, S., & Shawe-Taylor, J. (2005). Multiclass learning at one-class complexity. Technical report, ISIS Group, Electronics and Computer Science.
Vapnik, V. N. (1999). The nature of statistical learning theory (2nd ed.). Springer-Verlag (pp. 138–146).
Wang, W. J., & McFadden, P. D. (1983). Early detection of gear failure by vibration analysis II: Interpretation of the time-frequency distribution using image processing techniques. Mechanical Systems and Signal Processing, 7(3), 205–215.
Weston, J., & Watkins, C. (1998). Multiclass support vector machines. Technical report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK.
