
International Journal of Machine Tools & Manufacture 43 (2003) 707–720

Drilling wear detection and classification using vibration signals and artificial neural network
Issam Abu-Mahfouz ∗
Penn State Harrisburg, Mechanical Engineering Technology, 777 West Harrisburg Pike, W-255 Olmsted Building, Middletown, PA 17057, USA

Received 15 July 2002; received in revised form 9 December 2002; accepted 15 January 2003


In automated flexible manufacturing systems the detection of tool wear during the cutting process is one of the most important
considerations. This study presents a comparison between several architectures of the multi-layer feed-forward neural network with
a back propagation training algorithm for tool condition monitoring (TCM) of twist drill wear. The algorithm utilizes vibration
signature analysis as the main and only source of information from the machining process. The objective of the proposed study is
to produce a TCM system that will lead to a more efficient and economical drilling tool usage. Five different drill wear conditions
were artificially introduced to the neural network for prediction and classification. The experimental procedure for acquiring vibration
data and extracting features in both the time and frequency domains to train and test the neural network models is detailed. It was
found that the frequency domain features, such as the averaged harmonic wavelet coefficients and the maximum entropy spectrum
peaks, are more efficient in training the neural network than the time domain statistical moments. The results demonstrate the
effectiveness and robustness of using the vibration signals in a supervised neural network for drill wear detection and classification.
© 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Process monitoring; Drilling; Neural network; Perceptron; Pattern recognition; Sensors; Supervised learning; Vibration analysis

∗ Tel.: +1-717-948-6361; fax: +1-717-948-6502.
E-mail address: (I. Abu-Mahfouz).

1. Introduction

The manufacturing community is always striving to reduce operating costs while trying to improve product quality and meeting or exceeding customer satisfaction. These goals are behind the drive towards automation and the use of high production unmanned equipment. Flexible manufacturing cells with untended machining operations require the development of reliable methods of on-line monitoring of their metal cutting operations. The metal cutting process is highly nonlinear involving such phenomena as plastic deformation, fracture, impact, continuous and intermittent multicontact points, friction, and wear. Direct visual inspection of the cutting edge during machining is not feasible because the workpiece and chips obstruct the view. Due to the complexity of the cutting process, it is often not possible to obtain a mathematical description of the relevant dynamics of the process. To overcome this difficulty indirect methods are required. Sensors are needed in these systems to identify unexpected failure. Sensory systems are increasingly playing a pivotal role in the realization of advanced automated manufacturing systems. However, it is difficult to decide on the best parameters to measure and on the analysis methods to adopt for the system under investigation. The cost of the sensory system is another important consideration and should be accounted for when designing an industrial monitoring system. Comprehensive discussions on the design and implementation of sensor-based tool wear monitoring systems can be found in [1,2].

The reason for acquiring the drill wear state information is to enhance the predictive capability to allow the machine operator to schedule tool change or regrind just in time to avoid underuse or overuse of tools, avoid shutdown of machines due to damage, and to minimize scrap or rework. On the other hand, drill wear affects the ability of the hole cutting system to satisfy specified performance characteristics, such as hole roundness, centering, burr formation at drill exit, and surface finish.


However, a more effective TCM system should not only be capable of detecting the existence of drill wear but should also be able to identify the types of wear and forecast the remaining useful life span of the worn tool. In most of the research conducted on drill condition monitoring [3–9], progressive flank wear was the dominant failure mode and has been extensively investigated. Although, flank wear is a good indication of the drill condition and has been used to indicate the severity of drill wear, other types of wear also have an equal or even greater influence on the hole quality and surface finish. Typically, the failure of a drill occurs by excessive wear on the flank, chisel wear, crater wear, outer corners wear, and fracture or chipping of the cutting lip or edge. Drill wear has a great influence on the dynamics of the drilling process. In the following a brief literature review of some of the work done in the field of drill wear monitoring is presented.

Li and Wu [3] introduced a fuzzy C-means clustering algorithm for on-line monitoring of four drill wear grades, ‘Initial’, ‘Small’, ‘Normal’, and ‘Severe’. The thrust force and torque were selected as the features relevant to the four drill wear states. The detection of the grade membership of the wear state ‘Severe’ was proposed as a control variable for drill replacement. The Flank wear area was also used by Liu and Wu [4] in a two-category linear classifier to indicate a usable or worn-out tool. The variations of the vertical acceleration and thrust force were chosen as their indices for drill wear. Govekar and Grabec [5] described an adaptive self-organizing neural network of the Kohonen type for classification of the drill flank wear. Barton and Thangaraj [6] used a neural network to integrate information extracted from the frequency domain of the machine spindle vibration signals to predict the quality of the hole. The cutting momentum and the feed force power spectrum were used as the sensory part of the input vector, while the descriptive part was encoded from the corresponding drill flank wear. Liu and Anantharaman [7] used back propagation with an adaptive activation-function slope neural network trained by the drilling thrust and torque to predict flank wear. They concluded that a 9 × 14 × 1 architecture yielded the best results. The thrust and torque signals were also used by Lin and Ting [8] for comparing several structures and parameters of a cumulative back-propagation algorithm with regression models for drill flank wear monitoring. Li et al. [9] presented a hybrid learning method to map the relationship between the features of cutting vibration and the drill flank wear condition. Their method was based on a neural network model with fuzzy logic trained by the r.m.s. of the frequency distribution of the vibration signals. They showed that the r.m.s. of the frequency bands increased as the flank wear increased. El-Wardany et al. [10] used the vibration signature in drill condition monitoring. They presented a study using the kurtosis of the time domain and the area under the power spectrum curve to monitor various types of drill wear. Liu and Chen [11] used a backpropagation neural network for on-line detection of drill wear to decide whether a drill was usable or not. The input vector comprised drill size, feed rate, spindle speed, and eight features representing four values of torque and thrust, the average, peak value, root mean square (rms), and the area under the time signal. Dimla et al. [12] gave a critical review of neural network methods used in the TCM problem in metal cutting.

Measuring the torque and thrust of the drilling process requires special instrumentation such as a dynamometer that in most cases needs special mounting fixtures. This can influence the dynamic and stiffness characteristics of the drilling system. Acoustic emission (AE) sensors are usually more expensive than most industrial accelerometers. AE sensors also have greater demands on sampling rate, noise filtering, data storage and retrieval memory, speed of processing and analysis. These greater demands are mainly due to the fact that AE sensors are used to pick up higher frequency signals resulting from material deformation, fracture, and chip breakage. Accelerometers are simple to operate and are very suitable for wear monitoring because they offer the following advantages:

• No effect on stiffness and damping properties of the drilling system;
• Can be easily mounted close to the cutting action, independent of tool or workpiece;
• When properly shielded, they have good resistance to coolants, chips, electromagnetic or thermal influences;
• Accelerometers are easily replaceable and are very cost-effective.

Artificial neural network algorithms are regarded as multivariate nonlinear analytical tools capable of recognizing patterns from noisy complex data and estimating their nonlinear relationships. Their major advantages include superior learning, noise suppression, and parallel data processing capabilities. The objective of this study is to demonstrate the performance of an Artificial Neural Network approach based on vibration signals for the detection, and characterization of twist drill wear and damage. In this paper the artificial neural network term will be referred to simply as ‘ANN’, and the feed-forward back propagation term will be referred to as ‘FFBP’.

2. Experimental set-up

Fig. 1 shows a schematic presentation of the experimental set-up. Conventional uncoated high-speed-steel
(HSS) twist drills with diameter 12.7 mm (0.5 inch) were used in the tests. Fig. 2 presents the nomenclature and geometry of a conventional twist drill. Five types of drill wear were artificially induced on the drill point, as shown in Fig. 3, and these are:

1. Chisel wear: normally occurs due to the very high shear and compressive stresses in the flow zone of the tool-workpiece interface acting at high temperatures, which causes erosion of the chisel edge.
2. Crater wear on the rake face of one cutting edge. Crater wear is due to high temperature conditions along the rake surface.
3. Flank wear on the two flank or clearance faces of the lips.
4. Fracture or breakage on one lip of the cutting edge.
5. Wear on both outside corners of the drill point. This is due to high friction (rubbing) and impact forces between the drill and the machined hole walls.

Fig. 1. Schematic diagram of the experimental set-up.

Fig. 2. Nomenclature and geometry of conventional twist drill.

Fig. 3. Five types of artificially induced drill wear. Drawings are not to scale (all dimensions are in mm).

Drilling experiments were performed on a Bridgeport
3-axis CNC vertical milling machine. The work material used in this research was a 1040 carbon steel. The work piece hardness was measured to be 235 BHN.

One accelerometer (model PCB IMI-607A61) with a sensitivity of 10.2 mV/(m/s²) and a measurement range of ±490 m/s² was used to measure the vibration signals. As shown in Fig. 1, the accelerometer was mounted on the clamping fixture of the work piece. The analog signals from this sensor were fed into a PCP 482-series charge amplifier and then to a National Instruments PCI data acquisition board, with a sampling rate capability of 1.2 MHz, for signal digitization and conditioning. A Pentium III (650 MHz and 256 MB RAM) was used for data collection, analysis and neural network training and testing. The vibration signals were recorded only during steady state cutting, i.e., not including the penetration and exit stages of the drilling process. Drilling tests were performed at four sets of cutting conditions listed in Table 1, which are used for off-line training and testing. Two other sets of cutting conditions were later used for generalization testing of the ANN. All experiments were performed under dry cutting conditions. The cutting speed and feed were two input features to the ANN.

Table 1
Cutting conditions used for the drill wear experiments

Use                                                        Speed (rpm)   Feed (mm/min)
Cutting conditions used for training and testing the ANN   300           400
                                                           400           300
                                                           600           200
                                                           900           150
Cutting conditions used only for testing the ANN           350           400
                                                           1000          130

3. Signal analysis and feature extraction (preprocessing)

Current economic aspects pertinent to production automation and untended machining place great constraints on tool condition monitoring techniques and practices. For best performance of the monitoring system, only those features which show a high sensitivity to tool wear and low sensitivity to process parameters should be utilized. In this study, vibration based fault symptoms were used for the drill wear detection and classification. Vibration analysis is widely accepted as a tool to monitor the operating conditions of a machine as it is nondestructive, reliable and permits continuous monitoring without intervening with the process. This section discusses the signal analysis and data reduction techniques used to extract the necessary features and time invariant indices used for ANN training and testing. The following indices were chosen for the feature vector:

3.1. Time-frequency by harmonic wavelets

A simplified form of the harmonic wavelet analysis, presented in [13], was utilized here as one method for feature extraction from the vibration signals of the drilling process. Time-frequency wavelet analysis is concerned with correlating an input signal $f(t)$ with a wavelet base function $\varphi(t)$. The wavelet coefficient $c(\tau)$ is defined by the following correlation equation

$$c(\tau) = \int_{-\infty}^{+\infty} f(t)\,\varphi^{*}(t-\tau)\,dt, \qquad (1)$$

where $\varphi^{*}(t)$ is the complex conjugate of $\varphi(t)$, $t$ is a time parameter, and $\tau$ is the center of $\varphi$ along the time axis $t$. Transforming Eq. (1) from the time domain to the frequency domain using the Fourier transform gives

$$C(\omega) = F(\omega)\,\varphi^{*}(\omega), \qquad (2)$$

where the following definitions apply

$$C(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} c(\tau)\,e^{-i\omega\tau}\,d\tau, \quad F(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} f(t)\,e^{-i\omega t}\,dt, \quad \text{and} \quad \varphi(\omega) = \int_{-\infty}^{+\infty} \varphi(t)\,e^{-i\omega t}\,dt. \qquad (3)$$

The Discrete Fast Fourier Transform (DFFT) algorithm was used in computing the above Fourier transforms. Following a similar method as in [13], a simple wavelet with a boxcar spectrum was used. This means that $\varphi(\omega)$ was zero everywhere except in a finite band of frequencies where it was assigned a value of unity. 256 wavelets, used within a segment of 4096 data points, equivalent to 2048 Nyquist frequency points, were calculated as follows:

1. The input $f(t)$ time history of the vibration signal was represented in segments of 4096 data points.
2. The DFFT algorithm was used to give the 2048 Fourier coefficients $F(\omega)$.
3. The following Fourier coefficient multiplication was performed: $C_i = F_i\,\varphi_i^{*}$, $i = 0$–$2047$.
4. The IFFT (inverse FFT) of the generated series was computed to obtain the $c_r(t)$, $r = 0$–$2047$.
5. 16 averaged wavelet coefficients were extracted from the result of step 4 by grouping and averaging each adjacent 128 coefficients $c_r(t)$.

The 16 averaged harmonic wavelet coefficients (HWCs) form a feature vector which served as an input pattern to the neural network. Fig. 4 displays the 16 HWCs as
a histogram under the FFT plot for two of the sample cutting conditions.

Fig. 4. FFT, 16 HWC, and Burg Power Spectra for 12.7 mm twist drill cutting at 300 mm/min, and 400 rpm, (a) new drill, (b) chisel wear, (c) crater wear, (d) flank wear, (e) edge fracture, and (f) outer corner wear (Cutting at 440 mm/min, and 570 rpm).

3.2. Maximum entropy spectral analysis (MESA)

The Burg algorithm [14,15], a parametric spectral estimation method, was used to give an estimate of the power spectral density (PSD) of the discrete-time vibration signals. The Burg method gives a smoother PSD presentation than, for example, the Welch method, and it is computationally efficient. In this study, the highest eight local maxima of the Burg PSD or MESA and their corresponding location in the frequency domain were presented as another input feature vector to the ANN. The MESA method computes the power spectral density from the Burg algorithm [15],

$$S_{xx}(f) = \frac{P_M}{2 f_s \left| 1 + \sum_{m=1}^{M} A_m \exp(-i\pi m f / f_s) \right|^{2}}, \qquad (4)$$

where $P_M$ is the residual power of the $M$th order autoregressive prediction error filter (PEF), with coefficients $A_m$, while $f_s$ is half the sampling frequency. After some
numerical experimentation with the vibration signals obtained for this study, the filter order $M$ in Eq. (4) was selected to be 32.

3.3. Statistical measures in the time domain

In order to quantify objectively the vibration signal in the time domain, it was evaluated by means of four statistical parameters. These parameters were the mean $\mu$, the variance $V$, the kurtosis $K$, and the skewness $S$, and they were calculated as follows:

The mean ($\mu$) calculates the vibration signal average for $N = 4096$ samples:

$$\mu = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad (5)$$

where $x_i$ was the instantaneous amplitude of the vibration signal.

The variance $V = \sigma^2$ of the data segment was normalized by $N-1$ (where $N = 4096$ samples), and it is the mean square value about the mean:

$$V = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}. \qquad (6)$$
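As a rough illustration of these time-domain measures, the following Python sketch computes the mean and the N-1 normalized variance of Eqs. (5) and (6) for one segment, together with the skewness and kurtosis named in this section. The function name `time_domain_features`, the synthetic Gaussian segment, and the use of biased (1/N) central moments for the skewness and kurtosis are assumptions made for the example, not details taken from the study.

```python
import numpy as np

def time_domain_features(x):
    """Return (mean, variance, skewness, kurtosis) for one vibration segment."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.sum() / n                       # Eq. (5)
    dev = x - mean
    variance = np.sum(dev ** 2) / (n - 1)    # Eq. (6), normalized by N-1
    # Assumption: standardized third and fourth moments use biased (1/N)
    # central moments; the paper does not spell out this normalization.
    m2 = np.mean(dev ** 2)
    skewness = np.mean(dev ** 3) / m2 ** 1.5
    kurtosis = np.mean(dev ** 4) / m2 ** 2
    return mean, variance, skewness, kurtosis

# Hypothetical data: Gaussian noise standing in for a 4096-sample record.
rng = np.random.default_rng(0)
segment = rng.normal(size=4096)
mu, var, skew, kurt = time_domain_features(segment)
```

For Gaussian noise the kurtosis should come out close to 3 and the skewness close to 0, which is a quick sanity check before feeding real vibration segments to the function.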
The skewness ($S$) is a measure of the asymmetry of the data around the sample mean. The skewness of a normal distribution (or any perfectly symmetric distribution) is zero. The skewness is the third statistical moment of a distribution and is defined as:

$$S = \frac{E(x-\mu)^3}{\sigma^3}, \qquad (7)$$

where $E(x)$ is the expected value of $x$.

The kurtosis ($K$) is a measure of how outlier-prone a distribution is [16]. The kurtosis of a normal distribution is 3. Distributions that are more outlier-prone than a normal distribution have a kurtosis greater than 3; distributions that are less outlier-prone have a kurtosis less than 3. The kurtosis of a distribution is the fourth statistical moment and is defined as:

$$K = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_i - \bar{x})^4}{\sigma^4}, \qquad (8)$$

where $\bar{x}$ is the average vibration amplitude or mean $\mu$. In [10] it was demonstrated that the kurtosis was rather sensitive to the occurrence of spikes or impulses in the time domain of the vibration signal.

Fig. 5 shows samples of the four statistical parameters listed above. In most cases these parameters tend to reach a constant value for a discrete time history segment of at least 6000 points. The four parameters at 8000 data samples for each time record were considered as input features to the ANN. The preprocessing techniques illustrated above were the result of extensive trials using various quantifying descriptions and were found to lead to the best network performance. One reason for combining different signal analysis subroutines was to reduce the sensitivity of the TCM system to variables other than tool wear condition. Another reason was that more than one type of wear might develop at different locations of the drill point at the same time.

Fig. 5. The influence of the number of vibration signal data points used in calculating the four statistical measures for the ANN (a) kurtosis, (b) skewness, (c) standard deviation, and (d) variance.

4. Artificial Neural Network (ANN)

Recent work in the field of artificial neural networks has proven that they can be particularly useful in the modeling of nonlinear mapping and in the recognition of distinctive features from incomplete or chaotic input data. The power of a neural model depends on how many neurons there are, how they are connected and how each neuron operates. Back propagation is a supervised algorithm consisting of the training and testing phases [17,18]. Training is carried out by presenting to the neural model a sequence of examples of the problem to be solved. The training set is sometimes called the input stimuli (or epoch). The number of epochs must be large enough to perform the training process adequately so that generalization behavior of the classifier is assured. These epochs are in the form of input data, together with the expected neural model output. In supervised learning, the model weights are repeatedly and adaptively modified to make the model’s output agree with that specified in the training data. The weighted sum is passed to a transfer function to calculate the output of the node. The output of the transfer function is then fed to the input of another node (neuron) in a neural model. Another, different, set of input-output facts, called the test set, which was not used in the training stage is then used to verify the performance of the ANN model.

Fig. 6. (a) Standard supervised training feed-forward neural network, and (b) the perceptron.

Fig. 6 (a) shows the architecture of the three-layer feed-forward back-propagation (FFBP) ANN used in this research. On the left is the layer of inputs, or branching nodes, which are not artificial neurons. Neurons in the input layer act as buffers for distributing the input feature data $x = (x_1,\ldots,x_N)$ to neurons in the hidden layer. The middle, or hidden layer contains artificial neural nodes, as does the output layer on the right. The hidden layer is used to process and connect the information from the input layer to the output layer only in a forward direction. The hidden layer performs feature extraction on the input data. Only one hidden layer was used in the present study. Each neuron in the hidden layer (Fig. 6 (b)) sums up its input signals after weighting them with the strengths of the respective connections $w_{nm}$ and computes its output $y_m$ as a function of the sum;

$$y_m = h\left(\sum_{n} w_{nm} x_n\right), \quad (m = 1,\ldots,M). \qquad (9)$$

The outputs of the neurons in the output layer $z_j$ are computed similarly,

$$z_j = g\left(\sum_{m} u_{mj} y_m\right), \quad (j = 1,\ldots,J). \qquad (10)$$

Here, we take $h = g = f$, where $f(\cdot)$ can be a simple threshold function or a sigmoidal, hyperbolic tangent or radial basis function. This ANN used a unipolar sigmoid (S-shape) non-linear activation function [18]

$$f(s) = \frac{1}{1 + \exp(-a(s-b))}, \qquad (11)$$

where $a$ is the decay (growth) rate, $b$ is the bias or the $s$-axis center of asymmetry. This function acts as a threshold activation to obtain a nonlinear ‘warping’ that pushes the output values toward the binary decision values 0 or 1. The exponential rate $a$ and the bias $b$ were fixed and not adapted in this algorithm. The synaptic weights $(w_{nm}, u_{mj})$ were initially selected at random between −0.5 and 0.5. During the training phase, the weights were repeatedly adjusted by some method (in this case the steepest descent method) to force each of the input exemplar feature vectors $x^{(q)}$, $(q = 1,\ldots,Q)$ to be mapped into an output vector $z^{(q)}$ closer to its correct target $t^{(k(q))}$ than to any other identifier $t^{(p)}$, $p \neq k$. This means that the error $e^{(q)} = |z^{(q)} - t^{(k(q))}|$ must be very small for each exemplar $q$. In this study the number of output neurodes $J$ was taken to be equal to the number of classes $K$, which in turn was equal to the number of drill states {$K = 6$: 5 drill wear states + the state of a new (sharp) drill}. All nodes of the target vector $t^{(k(q))}$ were put equal to 0.05 (≈ 0) except at the output neuron representing the correct drill wear type, which took a value of 0.95 (≈ 1). Using 0.05 for 0 and 0.95 for 1 causes less saturation and quicker convergence [18]. Using the steepest descent linearization for updating the weights $(w_{nm}, u_{mj})$ on the $(r + 1)$st iteration, gave
$$w_{nm}^{(r+1)} = w_{nm}^{(r)} - \eta_1\left(\frac{\partial E(w^{(r)},u^{(r)})}{\partial w_{nm}}\right) + \gamma_1\left[w_{nm}^{(r)} - w_{nm}^{(r-1)}\right],$$

$$u_{mj}^{(r+1)} = u_{mj}^{(r)} - \eta_2\left(\frac{\partial E(w^{(r)},u^{(r)})}{\partial u_{mj}}\right) + \gamma_2\left[u_{mj}^{(r)} - u_{mj}^{(r-1)}\right], \qquad (12)$$

where

$$E(w,u) = \sum_{j=1}^{J} (t_j - z_j)^2, \qquad (13)$$

was the partial sum-squared error, which needs to be minimized for any fixed $q$th exemplar pair $(x^{(q)}, t^{(q)})$, $(\eta_1, \eta_2)$ were the learning rates, and $(\gamma_1, \gamma_2)$ were the momentum coefficients. The weights were adaptively changed until the network ‘learned’ all of the input training patterns. This adaptation carried information about sensory features and drill wear descriptors, and therefore described the relations between them. In order to prevent the saturation of the activation function, the data sets used for training were normalized to fall in the interval (0,1) before they are introduced to the ANN input layer. The normalized input data ($x_i$) then becomes:

$$x_i = \frac{x_i^{old} - \min(x_i^{old})}{\max(x_i^{old}) - \min(x_i^{old})}. \qquad (14)$$
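The update rule and the input normalization above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the random toy data, the single learning rate `eta` and momentum coefficient `gamma` shared by both layers, the fixed slope a = 1 and bias b = 0, and the per-exemplar (online) update order are all choices made for the example.

```python
import numpy as np

def minmax_normalize(X):
    """Scale each feature column to [0, 1], following Eq. (14)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def sigmoid(s, a=1.0, b=0.0):
    """Unipolar sigmoid of Eq. (11); a and b stay fixed during training."""
    return 1.0 / (1.0 + np.exp(-a * (s - b)))

def train_ffbp(X, T, n_hidden=6, eta=0.1, gamma=0.4, sweeps=500, a=1.0, seed=0):
    """Steepest descent with momentum for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # w_nm
    U = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))   # u_mj
    dW_prev, dU_prev = np.zeros_like(W), np.zeros_like(U)
    history = []
    for _ in range(sweeps):
        total = 0.0
        for x, t in zip(X, T):
            y = sigmoid(x @ W, a)                  # Eq. (9)
            z = sigmoid(y @ U, a)                  # Eq. (10)
            total += np.sum((t - z) ** 2)          # Eq. (13)
            # Gradients of E with respect to the summed inputs of each layer.
            delta_z = -2.0 * (t - z) * a * z * (1.0 - z)
            delta_y = (U @ delta_z) * a * y * (1.0 - y)
            # Eq. (12): gradient step plus momentum on the previous change.
            dU = -eta * np.outer(y, delta_z) + gamma * dU_prev
            dW = -eta * np.outer(x, delta_y) + gamma * dW_prev
            U, W = U + dU, W + dW
            dU_prev, dW_prev = dU, dW
        history.append(total)
    return W, U, history

# Hypothetical data: six random feature vectors, one per drill state.
rng = np.random.default_rng(1)
X = minmax_normalize(rng.normal(size=(6, 8)))
T = np.full((6, 6), 0.05)      # soft targets: 0.05 stands for 0
np.fill_diagonal(T, 0.95)      # 0.95 stands for 1 at the correct class
W, U, history = train_ffbp(X, T)
```

The list `history` holds the summed squared error per sweep, so plotting it against the sweep index gives a curve comparable in spirit to an error-versus-iteration plot.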

The training parameters, namely the learning rate $\eta$ and momentum coefficient $\gamma$, were chosen by experimentation with different values in the range (0.01,0.5). The FFBP algorithm is summarized in Appendix B. The learning process was terminated (i.e., ANN had converged) when, for each $q$th exemplar vector, the calculated output $z^{(q)}$ was close to the desired target vector $t^{(q)}$ within an acceptable error threshold value ($E < E_{thr} = 0.05$). If not, the training was terminated after 5000 iterations of weight update. The instantaneous error $E$ for each input feature vector $x^{(q)}$ was calculated as:

$$E^{(q)} = \sum_{j=1}^{J} \left(t_j^{(q)} - z_j^{(q)}\right)^2. \qquad (15)$$

The weights were further updated through learning as more and more exemplar patterns $(q = 1,\ldots,Q)$ were given to the ANN.

Five connective structures, shown in Fig. 7, were taken as follows:

1. FFBP-I: Input only the 16 HWCs feature vector and branch to six hidden neurons.
2. FFBP-II: Input only eight Burg spectrum peaks and their frequencies and branch to six hidden neurons.
3. FFBP-III: Input only four statistical parameters and branch to six hidden neurons.
4. FFBP-ALL: Input all of the above parameters keeping the three groups (decoupled) between the input and hidden layer (18 hidden neurons).
5. FFBP-FULL: Input all parameters and make full connections (coupling) between the neurons in the input and hidden layers, as in Fig. 7 (c), (18 hidden neurons).

Fig. 7. Artificial Neural network architecture for (a) separate group training; FFBP-I, FFBP-II, FFBP-III, (b) decoupled (grouped) training FFBP-ALL, and (c) coupled (un-grouped) training FFBP-FULL.

The concatenated vector {(16 HWC), (eight MESA peaks, and their eight frequencies), (four statistical moments), (cutting speed), (cutting feed)}, comprising 38 components altogether, was used as an input to the FFBP-ALL and FFBP-FULL ANNs. The number of nodes in the hidden layer should be small enough to reduce noise and weight drift, but sufficiently large to increase mapping accuracy. In this study no attempt was made to determine the optimum number of neurons in the hidden layer $M$ or the number of hidden layers required for optimum performance of the ANN. However, it was proposed to use as few processing elements as possible. This was because the use of more processing elements in the hidden layer not only required a larger
number of training iterations to converge but also added CPU load for the tool failure diagnosis in real-time analysis. The number of neurons in the hidden layer for each feature vector was chosen to be equal to six, which was the same number as the classes in the target vector. No adjoining or pruning of hidden neurons was exercised during training. Each model used 100 epochs for training. Each epoch was composed of six feature vectors (six classes) and the corresponding target vectors (drill status).
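One way to picture the 38-component input and the difference between the decoupled (FFBP-ALL) and fully coupled (FFBP-FULL) structures is a boolean mask over the input-to-hidden weights. The sketch below is only an illustration: the names `build_input` and `connection_mask` are hypothetical, and the routing of cutting speed and feed to every hidden neuron in the decoupled case is an assumption, since the text does not say which hidden neurons receive them.

```python
import numpy as np

# Group sizes from the text; "mesa" holds 8 peaks plus their 8 frequencies.
GROUPS = {"hwc": 16, "mesa": 16, "stats": 4}
HIDDEN_PER_GROUP = 6

def build_input(hwc, mesa_peaks, mesa_freqs, stats, speed, feed):
    """Concatenate all features into the 38-component input vector."""
    vec = np.concatenate([hwc, mesa_peaks, mesa_freqs, stats, [speed, feed]])
    assert vec.size == 38
    return vec

def connection_mask(decoupled):
    """True where an input-to-hidden weight is allowed to exist."""
    n_in = sum(GROUPS.values()) + 2                  # 36 features + speed + feed
    n_hidden = HIDDEN_PER_GROUP * len(GROUPS)        # 18 hidden neurons
    if not decoupled:
        return np.ones((n_in, n_hidden), dtype=bool)     # fully coupled
    mask = np.zeros((n_in, n_hidden), dtype=bool)
    row = 0
    for g, size in enumerate(GROUPS.values()):
        cols = slice(g * HIDDEN_PER_GROUP, (g + 1) * HIDDEN_PER_GROUP)
        mask[row:row + size, cols] = True            # each group feeds its own six
        row += size
    mask[row:, :] = True   # assumption: speed and feed reach all hidden neurons
    return mask

x = build_input(np.zeros(16), np.zeros(8), np.zeros(8), np.zeros(4), 300, 400)
mask_all = connection_mask(decoupled=True)
mask_full = connection_mask(decoupled=False)
```

Multiplying a weight matrix (and its updates) element-wise by such a mask keeps the three feature groups separate during training, while the all-ones mask recovers the fully connected case.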

Fig. 9. Correct predictions of wear classes as a function of the number of feature vectors used for testing and further training.

5. Results and discussion

Drill wear introduces changes in the drill point geometry that lead to unbalanced cutting forces which subsequently cause a wandering motion and impacts between the drill and the wall of the hole. From Fig. 4, it is clear that various drill wear patterns result in different vibration signatures in the FFT, HWC, and the PSD plots. The erratic (aperiodic) nature of the cutting process dynamics is evident from the broadband nature of the FFT spectra. The frequency bands (0–1000 Hz), (around 2000 Hz), (3500–4500 Hz), and (7000–8500 Hz) show distinctive features that support separation between the wear classes. As discussed in the previous section, the FFT coefficients were not used in training the ANN; instead, the information manifested by the HWC and the Burg PSD was considered. This helps reduce the feature vector dimension, making training more practical and less expensive. Experimental results are presented to verify the feasibility of this intelligent tool failure diagnosis system in drilling operations.
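To make the reduction in feature-vector dimension concrete, the sketch below turns one 4096-sample segment into 16 averaged harmonic wavelet coefficients and 16 Burg (MESA) values, in place of 2048 Fourier coefficients. It follows the five harmonic wavelet steps and Eq. (4) only loosely and rests on several assumptions: 256 boxcar bands of 8 frequency bins each, averaging of coefficient magnitudes, a frequency axis expressed as a fraction of the Nyquist frequency, zero padding when fewer than eight peaks exist, and the hypothetical function names `hwc_features`, `burg_psd`, and `mesa_features`.

```python
import numpy as np

def hwc_features(segment, n_bands=256, n_features=16):
    """16 averaged harmonic wavelet coefficients from one segment."""
    x = np.asarray(segment, dtype=float)
    half = x.size // 2                          # 2048 for 4096 samples
    F = np.fft.fft(x)[:half] / x.size           # step 2: Fourier coefficients
    width = half // n_bands                     # bins per boxcar band (assumed)
    coeffs = np.empty(half, dtype=complex)
    for k in range(n_bands):
        band = slice(k * width, (k + 1) * width)
        coeffs[band] = np.fft.ifft(F[band])     # steps 3-4: boxcar is 1 in band
    # Step 5: average adjacent groups (128 coefficients each here).
    return np.abs(coeffs).reshape(n_features, -1).mean(axis=1)

def burg_psd(x, order=32, n_freq=1024):
    """Burg AR spectrum; frequencies are fractions of the Nyquist frequency."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    f, b = x[1:].copy(), x[:-1].copy()          # forward / backward errors
    a = np.array([1.0])                         # prediction error filter
    power = np.dot(x, x) / x.size
    for _ in range(order):
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        a = np.append(a, 0.0)
        a = a + k * a[::-1]                     # Levinson-type update
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
        power *= 1.0 - k * k                    # residual power
    freqs = np.arange(n_freq) / n_freq
    phase = np.exp(-1j * np.pi * np.outer(freqs, np.arange(order + 1)))
    return freqs, power / (2.0 * np.abs(phase @ a) ** 2)   # Eq. (4), f_s = 1

def mesa_features(x, n_peaks=8, order=32):
    """Heights and locations of the highest local maxima of the Burg PSD."""
    freqs, psd = burg_psd(x, order)
    inner = np.arange(1, psd.size - 1)
    is_peak = (psd[inner] > psd[inner - 1]) & (psd[inner] > psd[inner + 1])
    idx = inner[is_peak]
    idx = idx[np.argsort(psd[idx])[::-1][:n_peaks]]
    peaks, locs = np.zeros(n_peaks), np.zeros(n_peaks)
    peaks[:idx.size], locs[:idx.size] = psd[idx], freqs[idx]
    return np.concatenate([peaks, locs])

# Hypothetical signal: a noisy tone centered on FFT bin 1000.
rng = np.random.default_rng(0)
n = np.arange(4096)
tone = np.sin(2 * np.pi * 1000 * n / 4096) + 0.1 * rng.normal(size=4096)
features = np.concatenate([hwc_features(tone), mesa_features(tone)])
```

With this toy tone the largest HWC falls in the group covering FFT bin 1000, and the strongest Burg peak sits near the same frequency, so both feature sets point at the same band while using 32 numbers in total.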

Fig. 8. (a)–(e) Root mean square (RMS) error of the ANN output vs the number of iterations for different ANN architectures. (f) Number of iterations to reach RMS error < 0.05.

Fig. 8 shows the normalized root mean square (rms) error (Eq. (15)) of the output of the neural network against the number of iterations for different ANN models. The results were recorded during the early training phase (i.e., 10th epoch) and the error $E$ was recorded every 100 iterations up to 5000 iterations. The FFBP-III was able to converge much faster than all other models for all wear classes. Fig. 8 (f) summarizes the convergence speed of all the models by plotting the number of iterations each model took to reach a normalized value of $E$ ($E_{rms} < 0.05$). All models, with exception of FFBP-FULL, are seen to converge after 4000 iterations. The FFBP-FULL model took more iterations than the FFBP-ALL to converge within the specified error threshold. It can also be deduced that the speed of convergence depends somewhat on the type of wear. In general, the convergence was faster for the chisel and crater classes and slower for the edge and corner classes of drill wear. The FFBP-I, trained on the four statistical parameters, was not sensitive in its convergence speed to drill wear type. These phenomena were observed in most training epochs.

Table 2
Performance of FFBP with different network structures during testing phase of the ANN

ANN structure   Speed (rpm)   Feed (mm/min)   Percentage of correct predictions
                                              Chisel   Crater   Flank   Edge   Corner

FFBP-I          300           400             85       82       76      81     77
                400           300             88       81       75      82     78
                600           200             87       79       72      85     73
                900           150             83       78       72      77     71
                350           400             84       80       78      75     78
                1000          130             80       75       71      75     68

FFBP-II         300           400             92       88       87      90     88
                400           300             94       92       89      90     89
                600           200             94       91       89      91     88
                900           150             89       90       82      90     87
                350           400             88       89       85      88     82
                1000          130             89       88       79      86     84

FFBP-III        300           400             92       91       91      93     90
                400           300             95       92       90      93     90
                600           200             96       92       90      94     91
                900           150             93       93       90      95     90
                350           400             88       90       88      92     90
                1000          130             92       90       89      93     85

FFBP-           300           400             93       91       90      92     89
                400           300             96       94       91      95     91
                600           200             94       94       93      95     91
                900           150             94       92       92      93     90
                350           400             94       91       88      88     86
                1000          130             90       90       92      85     85

FFBP-           300           400             91       89       91      88     90
                400           300             92       91       91      89     87
                600           200             90       90       90      92     88
                900           150             95       90       86      92     87
                350           400             91       90       89      84     86
                1000          130             88       85       85      82     84

Table 2 presents percentages of correct predictions averaged over 100 test samples for each wear class. The test phase helps the ANN model to generalize and increases its declaration accuracy. The numbers indicate that the best performance was obtained by FFBP-ALL followed by FFBP-III and FFBP-FULL and then FFBP-II and FFBP-I, in that order. More correct classifications were made for the chisel and crater cases. On the other
718 I. Abu-Mahfouz / International Journal of Machine Tools & Manufacture 43 (2003) 707–720

hand, for the flank and corner wear classes, the ANN models sometimes missed or mixed up these two classes. It was noticed that the frequency response and the spectral measures of drills with the flank wear and outer corner wear shared some similarities, which could be the source of the prediction difficulty. In order to further verify the feasibility of diagnosing drill failure using a neural network, the five ANN architectures were tested with drilling data under conditions different from those used for training the network {(350 rpm, 400 mm/min)

Fig. 10. Online testing of the trained ANN; (a) corner wear, (b) flank wear, (c) crater wear, (d) chisel wear, and (e) edge breakage.

and (1000 rpm, 130 mm/min)}. The performance of all models was satisfactory, though less accurate than the performance for the four cutting conditions used for training. Fig. 9 shows other averaged results during the testing phase of the trained ANN. This figure shows the percentage of test-set presentations in which the ANN successfully identified the wear type, plotted against the number of test presentations. Test feature vectors were presented to the trained ANN at random. In the case of an incorrect prediction, the same feature vector was used for further training of the ANN. A total of 100 vectors, different from those used to construct Table 2, were used in this test-and-train phase. From these results it can be deduced that the accuracy of the trained ANN does not increase significantly after 60 test-and-train presentations. This number could be satisfactory for industrial implementation purposes. The size of the neural network shows no significant effect on the accuracy of tool wear estimation in this case. Overall, the results displayed in Fig. 9 agree with the predictions of Table 2.

For purposes of illustration, Fig. 10 shows the normalized output z(q) of the ANN after it was presented with I = 200 epochs. The uni-polar sigmoid activation function output ranged from 0 to 1, which formed a measure of the prediction weight for every wear condition. As presented in Fig. 10, the position of the output neuron with the maximum descriptor value determines the drill wear class to which the input signal feature vector was maximally correlated. Ideally, the types of wear that do not exist should show a neural output of zero, but here their neurons show values greater than zero. Nevertheless, the results were very acceptable for online tool monitoring. Thresholds to predict tool failure for hole quality and surface finish can be designated at appropriate output values (0.4 in Fig. 10). An alarm or machine control system can be triggered when one or more of the output neurons increase beyond this threshold. Tool-life prediction models and threshold parameter settings can therefore be automated as manufacturing process control features for drill replacement and/or regrind.

6. Conclusion

A multiple layer neural network has been successfully applied to twist drill wear detection and classification using supervised learning with experimentally obtained vibration data. The performance of different coupled and decoupled configurations of the ANN has been analyzed. The signals, collected from extensive experimentation, were analyzed using the following techniques: discrete harmonic wavelet transform, Burg power spectral density (PSD), and four statistical measures of the time domain. The ANN algorithm successfully mapped the vibration signals to the appropriate classes of drill wear. During the testing phase, a success rate of 100% was obtained for detection of the existence of drill wear, and a rate greater than 80% for success in drill type classification was realized. For some ANN configurations the rate of accurate classifications was greater than 90%. The decoupled neural networks have not been found to possess significant advantages over the fully connected networks. However, decoupling of input groups of data enhances the convergence rate during training. The ANN performance is found to be sensitive to the type of input data. The HWC and the Burg spectrum peaks helped the ANN to learn more about the drill wear than the four statistical moments, with the wavelet coefficients being superior to the PSD. The ANNs were found to accommodate changes in the cutting conditions satisfactorily. The results reveal that once the neural network was properly trained, it became a powerful and reliable tool in solving classification and pattern recognition problems such as in this drilling process monitoring application. The results strongly suggest that vibration signals have tremendous promise for tool condition monitoring and manufacturing process diagnostics. The models discussed in this work are also applicable to the monitoring of other machining processes, for example turning, milling, and grinding.

Appendix A

The following notation applies to the FFBP ANN:

$a_i$     Decay (growth) rates ($i = 1$ for the hidden layer and $i = 2$ for the output layer)
$b_i$     Biases ($i = 1$ for the hidden layer and $i = 2$ for the output layer)
$g_i$     Momentum coefficient
$I$       Number of epochs
$J$       Number of output nodes
$K$       Number of exemplar output target vectors (classes of drill wear)
$M$       Number of nodes in the hidden layer
$N$       Number of nodes in the input layer
$Q$       Number of exemplar features $\{x(q)\}_{q=1,\dots,Q}$ obtained from the processed signals picked up by the sensory system
$u_{mj}$  Synaptic weights on the input lines of the output layer
$w_{nm}$  Synaptic weights on the input lines of the hidden layer
$h_i$     Learning rate
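The output-decision rule described for Fig. 10 (the output neuron with the largest value names the wear class, and an alarm is raised when any output rises above a threshold such as the 0.4 level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `classify_wear`, the ordering of the five wear classes, and the sample output values are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed ordering of the five output neurons; the paper does not fix one here.
WEAR_CLASSES = ["chisel", "crater", "flank", "edge", "corner"]


def classify_wear(outputs: Sequence[float],
                  threshold: float = 0.4) -> Tuple[str, List[str]]:
    """Return the wear class of the largest output and every class above threshold."""
    if len(outputs) != len(WEAR_CLASSES):
        raise ValueError("expected one output value per wear class")
    # Position of the output neuron with the maximum value.
    best = max(range(len(outputs)), key=lambda j: outputs[j])
    # Classes whose sigmoid output exceeds the alarm threshold.
    alarms = [WEAR_CLASSES[j] for j, z in enumerate(outputs) if z > threshold]
    return WEAR_CLASSES[best], alarms


# Made-up sigmoid outputs in [0, 1], for illustration only.
z = [0.08, 0.12, 0.71, 0.05, 0.33]
predicted, alarms = classify_wear(z)
print(predicted, alarms)
```

Keeping the predicted class and the alarm list separate reflects the observation that non-existent wear types still produce outputs above zero: the maximum always yields a class, while the threshold decides whether that class warrants action.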

Appendix B

The FFBP back-propagation algorithm with a single hidden layer for the unipolar sigmoid:

1. Input $N$, $M$, $J$, $Q$, and $I$.
2. Set parameters, initially, as: $b_1 = N/2.0$; $b_2 = M/2.0$; $a_1 = 2.5$; $a_2 = 2.5$; $h_1 = 0.4$; $h_2 = 0.25$; $g_1 = 0.1$; $g_2 = 0.05$.
3. Generate initial weights $\{w_{nm}^{(0)}, u_{mj}^{(0)}\}$ randomly between $-0.5$ and $0.5$.
4. Adjust all weights via the steepest descent method as follows:

   for r = 1 to I do
     for q = 1 to Q do
       Start updating all weights:
       for m = 1 to M do
         for j = 1 to J do

           $$u_{mj}^{(r+1)} = u_{mj}^{(r)} + h_2\left\{(t_j(q) - z_j(q))\,[z_j(q)(1 - z_j(q))]\,y_m(q)\right\} + g_2\left[u_{mj}^{(r)} - u_{mj}^{(r-1)}\right];$$

         for n = 1 to N do

           $$w_{nm}^{(r+1)} = w_{nm}^{(r)} + h_1\left\{\sum_{j=1}^{J}(t_j(q) - z_j(q))\,[z_j(q)(1 - z_j(q))]\,u_{mj}^{(r)}\right\} \times [y_m(q)(1 - y_m(q))]\,[x_n(q)] + g_1\left[w_{nm}^{(r)} - w_{nm}^{(r-1)}\right].$$

References

[1] R. Du, M.A. Elbestawi, S.M. Wu, Automated monitoring of manufacturing processes, Part 1 and Part 2, ASME Journal of Engineering for Industry 107 (1995) 121–141.
[2] C.S. Leem, D.A. Dornfeld, Design and implementation of sensor-based tool-wear monitoring systems, Mechanical Systems and Signal Processing 10 (4) (1996) 439–458.
[3] P.G. Li, S.M. Wu, Monitoring drilling wear states by a fuzzy pattern recognition technique, ASME Journal of Engineering for Industry 110 (1994) 297–300.
[4] T.I. Liu, S.M. Wu, On-line detection of drill wear, ASME Journal of Engineering for Industry 112 (1990) 299–302.
[5] E. Govekar, I. Grabec, Self-organizing neural network application to drill wear classification, ASME Journal of Engineering for Industry 116 (1994) 233–238.
[6] R.P. Barton, A.R. Thangaraj, A neural network approach to drilled hole quality monitoring based on machine spindle vibrations, Trans. NAMRI/SME (1990) 232–239.
[7] T.I. Liu, K.S. Anantharaman, Intelligent classification and measurement of drill wear, ASME Journal of Engineering for Industry 116 (1994) 392–397.
[8] S.C. Lin, C.J. Ting, Drill wear monitoring using neural networks, International Journal of Machine Tools and Manufacture 36 (4) (1996) 465–475.
[9] X. Li, S. Dong, P.K. Venuvinod, Hybrid learning for tool wear monitoring, International Journal of Advanced Manufacturing Technology 16 (2000) 303–307.
[10] T.I. El-Wardany, D. Gao, M.A. Elbestawi, Tool condition monitoring in drilling using vibration signature analysis, International Journal of Machine Tools and Manufacture 36 (6) (1996) 687–
[11] T.I. Liu, W.Y. Chen, Intelligent detection of drill wear, Mechanical Systems and Signal Processing 12 (6) (1998) 863–873.
[12] D.E. Dimla Jr, P.M. Lister, N.J. Leighton, Neural network solutions to the tool condition monitoring problem in metal cutting: a critical review of methods, International Journal of Machine Tools and Manufacture 37 (9) (1997) 1219–1241.
[13] D.E. Newland, Ridge and phase identification in the frequency analysis of transient signals by harmonic wavelets, ASME Journal of Vibration and Acoustics 121 (1999) 149–155.
[14] S.L. Marple, Digital Spectral Analysis, Prentice Hall, Englewood Cliffs, NJ, USA, 1987, Chapter 7.
[15] T.M. Romberg, A.G. Cassar, R.W. Harris, A comparison of traditional Fourier and maximum entropy spectral method for vibration analysis, ASME Journal of Vibration, Acoustics, Stress and Reliability in Design 106 (1984) 36–39.
[16] Signal Processing Toolbox, For Use with MATLAB, User's Guide, The MathWorks, Inc., Natick, MA, USA, 1998.
[17] R.P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine 4 (2) (1987) 4–22.
[18] C.G. Looney, Pattern Recognition Using Neural Networks, Theory and Algorithms for Engineers and Scientists, Oxford University Press, New York, 1997.
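The weight-update procedure of Appendix B can be sketched in plain Python as shown below. This is a hedged illustration rather than the author's code. Several details are assumptions: the sigmoid form 1/(1 + exp(-a*s + b)) is not spelled out in Appendix B, the RMS error definition is a common choice that may differ from the paper's normalized error, the momentum term is implemented as the coefficient times the previous weight step, and the function names and the two-sample toy data are hypothetical. The parameter values (a, b, h, g) are those listed in step 2.

```python
import math
import random


def sigmoid(s, a, b):
    # Assumed unipolar sigmoid with rate a and bias b (form not given in Appendix B).
    return 1.0 / (1.0 + math.exp(-a * s + b))


def forward(x, w, u, a1, b1, a2, b2):
    # Hidden outputs y_m and network outputs z_j for one feature vector x.
    M, J = len(u), len(u[0])
    y = [sigmoid(sum(x[n] * w[n][m] for n in range(len(x))), a1, b1) for m in range(M)]
    z = [sigmoid(sum(y[m] * u[m][j] for m in range(M)), a2, b2) for j in range(J)]
    return y, z


def rms_error(X, T, w, u, params):
    # Assumed error measure: root mean square over all outputs and exemplars.
    total, count = 0.0, 0
    for x, t in zip(X, T):
        _, z = forward(x, w, u, *params)
        total += sum((tj - zj) ** 2 for tj, zj in zip(t, z))
        count += len(t)
    return math.sqrt(total / count)


def train_ffbp(X, T, M, epochs, seed=0):
    rng = random.Random(seed)
    N, J = len(X[0]), len(T[0])
    # Step 2: parameter values as listed in Appendix B.
    b1, b2 = N / 2.0, M / 2.0
    a1, a2 = 2.5, 2.5
    h1, h2 = 0.4, 0.25
    g1, g2 = 0.1, 0.05
    params = (a1, b1, a2, b2)
    # Step 3: initial weights drawn uniformly from [-0.5, 0.5].
    w = [[rng.uniform(-0.5, 0.5) for _ in range(M)] for _ in range(N)]
    u = [[rng.uniform(-0.5, 0.5) for _ in range(J)] for _ in range(M)]
    dw_prev = [[0.0] * M for _ in range(N)]  # previous steps, for momentum
    du_prev = [[0.0] * J for _ in range(M)]
    initial = rms_error(X, T, w, u, params)
    # Step 4: steepest descent with momentum, one exemplar at a time.
    for _ in range(epochs):
        for x, t in zip(X, T):
            y, z = forward(x, w, u, *params)
            d_out = [(t[j] - z[j]) * z[j] * (1.0 - z[j]) for j in range(J)]
            # Hidden-layer terms are computed before the output weights change.
            d_hid = [sum(d_out[j] * u[m][j] for j in range(J)) * y[m] * (1.0 - y[m])
                     for m in range(M)]
            for m in range(M):
                for j in range(J):
                    step = h2 * d_out[j] * y[m] + g2 * du_prev[m][j]
                    u[m][j] += step
                    du_prev[m][j] = step
            for n in range(N):
                for m in range(M):
                    step = h1 * d_hid[m] * x[n] + g1 * dw_prev[n][m]
                    w[n][m] += step
                    dw_prev[n][m] = step
    return w, u, initial, rms_error(X, T, w, u, params)


# Toy two-class problem, for illustration only (not drilling data).
X = [[1.0, 0.0], [0.0, 1.0]]
T = [[1.0, 0.0], [0.0, 1.0]]
w, u, e_start, e_end = train_ffbp(X, T, M=3, epochs=1000)
print(e_start, e_end)
```

In practice the loop would stop once the error falls below a chosen threshold (the paper uses Erms < 0.05 as its convergence criterion) instead of running a fixed number of epochs.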