
Systems Engineering - Theory & Practice

Volume 27, Issue 3, March 2007


Online English edition of the Chinese language journal

Cite this article as: SETP, 2007, 27(3): 98–104

Intelligent Prediction Method for Small-Batch Producing Quality Based on Fuzzy Least Square SVM

DONG Hua 1,2,*, YANG Shi-yuan 1, WU De-hui 3
1. Hefei University of Technology, Hefei 230009, China
2. Henan University, Kaifeng 475001, China
3. Jiujiang University, Jiujiang 332005, China

Abstract: After comparing the commonly used approaches to intelligent process prediction and their characteristics, this article proposes an intelligent quality prediction model for the small-batch producing process; the prediction procedure and algorithm are presented as well. A fuzzy least square support vector machine (FLS-SVM) is taken as the intelligent kernel of the model. On one hand, it handles small-sample learning well and avoids the disadvantages of artificial neural network prediction, such as over-training and weak generalization capability. On the other hand, it fuzzifies the samples with a membership function, so that optimal samples are chosen and history data follow the rule "the nearer, the heavier the weight". Extensive prediction experiments and comparisons with other common prediction methods show that the proposed method generalizes well, is built more rapidly, and is realized more easily. It offers a feasible way to predict and control small-batch machining processes online.
Key Words: small-batch; support vector machine (SVM); fuzzy least square support vector machine (FLS-SVM); quality prediction

Received date: January 4, 2006
* Corresponding author. E-mail: donghuafish@hotmail.com
Foundation item: Supported by the NSFC (No. 70672096)
Copyright © 2007, Systems Engineering Society of China. Published by Elsevier BV. All rights reserved.

1 Introduction

Statistical process control (SPC) is a passive method: products are inspected only after defective ones have already been produced, so SPC cannot prevent rejects very efficiently[1]. With the low reject rates (down to 6σ) required in the advanced manufacturing environment, in-process quality control based on prediction becomes a necessity.

Setting up a quality prediction model is the most important task in production. Expert-system inference learning, fuzzy mathematics, and other techniques have been used to build traditional prediction models[2-3]. However, the application of these techniques is still based on SPC. On one hand, statistics-based quality prediction methods need huge samples, so they suit large- or middle-scale production but cannot adapt to the small-batch producing pattern. On the other hand, the producing process and producing system are complex and dynamic, with characteristics such as multiple inputs, multiple outputs, nonlinearity, and time delay. These complex causal relationships create strong correlation among the miscellaneous quality data, so it is very difficult to describe an accurate mathematical model of the dynamic quality characteristics of the producing process[4]. Moreover, once the quality factors (4M1E) of the producing process change, the model must be re-established and re-analysed; that is to say, the flexibility of the traditional model is poor.

In recent years, with the development of artificial intelligence, the artificial neural network (ANN) has been widely used as a new intelligent prediction method. It has many merits, such as self-learning, fault tolerance, nonlinear and parallel distributed processing, noise handling, and tolerance of inadequate data, which make ANN-based quality prediction models stand out from traditional mathematical methods. However, the ANN also has congenital defects: it falls into local minima easily, and it generalizes weakly from few samples[5]. These defects make it difficult to meet the demands of quality prediction for advanced small-batch production.

The support vector machine (SVM) is based on the structural risk minimization (SRM) principle and overcomes the shortcoming that the ANN structure relies on the designer's experience: its topology is decided by the support vectors. It handles high dimensionality, local minima, and small samples well, combining the advantages of the ANN and of traditional models[6-7]. In particular, because the SVM generalizes very well from small samples, the method needs neither mass data nor a precise mathematical model of production. This makes high-precision prediction models possible under the flexible small-batch producing pattern. The least square SVM (LS-SVM) is developed from the standard SVM: it replaces the inequality constraints of the standard SVM with equality constraints, that is, it replaces quadratic programming with the solution of a linear system of equations. Thus it reduces computational complexity, speeds up solving, and enhances robustness to interference[8-9].

In actual production, quality parameters change constantly as the process runs. The short-term impact on future producing accuracy comes mainly not from history parameters but from the latest processing parameters. We call this rule "nearer data weigh heavier and farther data weigh lighter". This article introduces the concept of membership and presents a fuzzy LS-SVM prediction model based on time-domain membership. The model assigns different memberships according to the impact of each history datum, which reduces the influence of early data on the current producing process and improves the accuracy of real-time prediction. The model was studied on the producing process of a bearing's outer ring, and its prediction accuracy was compared with that of traditional models. The research demonstrates that the model is applicable and reasonable for the small-batch producing process.

2 Fuzzy least square support vector machine

Suppose a data set {x_i, y_i}, i = 1, 2, ..., N, is to be regressed, where x_i ∈ R^n and y_i ∈ R are the input and output of the system, respectively. The n-dimensional input samples are mapped from the original space into a high-dimensional space F by a nonlinear transformation ϕ(·), in which the best linear regression function is constructed:

    f(x) = ω^T ϕ(x) + b.    (1)

The standard SVM takes the ε-insensitive loss function as the risk minimization value, so the optimization target is:

    min (1/2) ω^T ω + c Σ_{i=1}^{N} (ξ_i + ξ_i*),
    s.t.  y_i − ω^T ϕ(x_i) − b ≤ ε + ξ_i,
          ω^T ϕ(x_i) + b − y_i ≤ ε + ξ_i*,
          ξ_i ≥ 0, ξ_i* ≥ 0,  i = 1, ..., N,    (2)

where c is called the equilibrium factor (usually valued 1), and ξ_i and ξ_i* are the errors on the training set: they indicate how far a sample lies beyond the fitting precision ε. The Lagrange function of Eq.(2) is:

    l(ω, b, ξ, ξ*) = (1/2) ω^T ω + c Σ_{i=1}^{N} (ξ_i + ξ_i*)
        − Σ_{i=1}^{N} α_i [ε + ξ_i − y_i + ω^T ϕ(x_i) + b]
        − Σ_{i=1}^{N} α_i* [ε + ξ_i* + y_i − ω^T ϕ(x_i) − b]
        − Σ_{i=1}^{N} (η_i ξ_i + η_i* ξ_i*),    (3)

where α_i, α_i* ≥ 0 and η_i, η_i* ≥ 0, i = 1, 2, ..., N.

With the help of the first partial derivatives, Eq.(3) can be transformed into its dual optimization problem. The samples x_i with (a_i − a_i*) ≠ 0 are the support vectors. The variable ω controls the complexity of the function and is a linear combination of the mapping function ϕ(·); therefore, the computational complexity of system identification by the SVM depends not on the dimension of the space but on the number of samples.

A kernel function is introduced in place of the nonlinear mapping ϕ(·):

    ψ(x_i, x_j) = ϕ(x_i)^T ϕ(x_j).    (4)

Eq.(3) is then transformed into the dual optimization problem:

    max J = −(1/2) Σ_{i,j=1}^{N} (α_i* − α_i)(α_j* − α_j) ψ(x_i, x_j)
            − ε Σ_{i=1}^{N} (α_i* + α_i) + Σ_{i=1}^{N} y_i (α_i* − α_i),
    s.t.  Σ_{i=1}^{N} (α_i − α_i*) = 0,
          α_i, α_i* ∈ [0, c].    (5)

The standard SVM regression model then becomes:

    y(x) = Σ_{i=1}^{N} (a_i − a_i*) ψ(x_i, x) + b,
    b = Σ_{SVs} (a_i − a_i*) [ψ(x_j, x_i) + ψ(x_k, x_i)],    (6)

where x_j and x_k are arbitrary support vectors.

The LS-SVM is an extension of the standard SVM. It selects the squared norm of the error ξ_i as the loss function, so the optimization becomes:

    min (1/2) ω^T ω + (γ/2) Σ_{i=1}^{N} ξ_i²,
    s.t.  y_i = ω^T ϕ(x_i) + b + ξ_i,  i = 1, 2, ..., N,    (7)

where the positive real number γ is a tuning constant: it strikes a compromise between the training errors and the model complexity, which yields better generalization ability. The larger the value of γ is, the smaller the error of the regression model will be. Unlike the standard SVM, the loss function of the LS-SVM changes the inequality constraints into equality constraints. The Lagrange function is introduced as follows:

    L(ω, b, ξ, a) = (1/2) ω^T ω + (γ/2) Σ_{i=1}^{N} ξ_i²
        − Σ_{i=1}^{N} a_i [ω^T ϕ(x_i) + b + ξ_i − y_i],    (8)

where a_i, i = 1, ..., N, are the Lagrange multipliers. The best a and b can be acquired from the KKT conditions:

    ∂L/∂ω = 0  →  ω = Σ_{i=1}^{N} a_i ϕ(x_i),
    ∂L/∂b = 0  →  Σ_{i=1}^{N} a_i = 0,
    ∂L/∂ξ_i = 0  →  a_i = γ ξ_i,
    ∂L/∂a_i = 0  →  ω^T ϕ(x_i) + b + ξ_i − y_i = 0.    (9)

The optimization can then be transformed into the linear system:

    [ 0    Θ^T          ] [ b ]   [ 0 ]
    [ Θ    Ω + γ^(-1) I ] [ a ] = [ y ].    (10)
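Eqs.(7)–(10) reduce LS-SVM training to a single linear solve. The following Python/NumPy fragment is an illustrative sketch of that reading, not the authors' Matlab implementation: it builds the kernel matrix Ω with an RBF kernel, solves the bordered system of Eq.(10), and evaluates the regression model of Eq.(11); the kernel width sigma and the toy sine data are arbitrary choices for demonstration.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=0.5):
    # psi(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=500.0, sigma=0.5):
    # Solve Eq.(10): [[0, Theta^T], [Theta, Omega + I/gamma]] [b; a] = [0; y]
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                                   # Theta^T
    A[1:, 0] = 1.0                                   # Theta
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                           # a, b

def lssvm_predict(X_train, a, b, X_new, sigma=0.5):
    # Eq.(11): y(x) = sum_i a_i * psi(x_i, x) + b
    return rbf_kernel(X_new, X_train, sigma) @ a + b

# toy demonstration on a smooth curve
X = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
y = np.sin(X).ravel()
a, b = lssvm_fit(X, y)
y_hat = lssvm_predict(X, a, b, X)
```

With gamma = 500 (the tuning-constant value used in section 4) the regularization term I/γ is small, so the training residuals ξ_i = a_i/γ stay tight, which is the "larger γ, smaller error" trade-off described above.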

where y = [y_1, ..., y_N]^T; Θ = [1, ..., 1]^T; a = [a_1, ..., a_N]^T; and Ω is a square matrix with elements Ω_ij = ψ(x_i, x_j) = ϕ(x_i)^T ϕ(x_j), in which ψ(·) is a symmetric function satisfying Mercer's condition.

The LS-SVM regression model is:

    y(x) = Σ_{i=1}^{N} a_i ψ(x_i, x) + b.    (11)

Lin et al. introduced fuzzy membership into the SVM to fuzzify the input sample set and presented the concept of the fuzzy SVM[10], in order to solve the sensitivity of the SVM to isolated points. We introduce this idea into the LS-SVM: a fuzzy membership µ_i is attached to each sample of the LS-SVM, fuzzifying the input sample set into {x_i, y_i, µ_i}, i = 1, 2, ..., N, with 0 ≤ µ_i ≤ 1. Thus, the optimization objective of Eq.(7) is rewritten as:

    min (1/2) ω^T ω + (γ/2) Σ_{i=1}^{N} µ_i ξ_i²,
    s.t.  y_i = ω^T ϕ(x_i) + b + ξ_i,  i = 1, 2, ..., N.    (12)

The Lagrange function is structured as follows:

    L(ω, b, ξ, a) = (1/2) ω^T ω + (γ/2) Σ_{i=1}^{N} µ_i ξ_i²
        − Σ_{i=1}^{N} a_i [ω^T ϕ(x_i) + b + ξ_i − y_i].    (13)

According to the optimum conditions, the matrix equation is structured as:

    [ 0    Θ^T               ] [ b ]   [ 0 ]
    [ Θ    Ω + (γ µ_i)^(-1) I ] [ a ] = [ y ],    (14)

where y, a, Θ, Ω, and I have the same meaning as in the LS-SVM expression, and the diagonal regularization entry for the i-th sample is 1/(γ µ_i).

The estimation function of the fuzzy LS-SVM can be obtained by solving matrix Eq.(14). Comparing Eq.(14) with Eq.(10), the fuzzy membership µ_i is the only difference, so we call this method the fuzzy least square support vector machine (FLS-SVM).

3 Quality prediction model based on FLS-SVM for the small-batch producing process

3.1 Time-sequence prediction model

The commonly used quality analysis and prediction models nowadays are usually based on SPC. Generally, about 20 to 50 process data are extracted from the quality control charts to set up the training sample set. The sample set is then studied by an intelligent model (such as an artificial neural network), which masters the potential rules of the process quality to predict the quality in the future.

A small-batch process produces few products, usually within the dozens, so it is difficult to provide that many samples. Therefore, this research decreases the number of continuous data (called the data width) extracted from the control chart from 20–50 to 3–8. The specific procedure is: establish a data window on the quality control chart to set up the training sample set, and keep the window width unchanged.

The data window moves with the process. When a new datum enters the window on one side, another datum leaves the window on the other side at each move, so the number of data in the window (set as n) stays the same. The data in the window are taken as the n-dimensional input vector of a sample.

Suppose the current position of the time window is i; the input vector x_i of the sample is the data from i to i + n − 1, and the output is y_i = z_{i+n}. First, move the window to get the next training sample {x_{i+1}, y_{i+1}}, which is put into the LS-SVM learning machine to obtain the regression parameters a and b. Second, put a and b into the LS-SVM prediction machine while extracting the history data z_N ~ z_{N+n−1} from the time window; take them as the input x_N to get the prediction response ŷ_N, which is the prediction value ẑ_{N+n} at moment N + n. Third, take ẑ_{N+n} as the real value at moment N + n to obtain the next prediction value at moment N + n + 1. The process of quality prediction based on the LS-SVM is shown in Figure 1.

Figure 1. Quality prediction for the small-batch process based on FLS-SVM: (a) modeling method based on FLS-SVM; (b) prediction method based on FLS-SVM. [diagram not reproduced]

3.2 Fuzzy membership of the process sample sequence

Generally, for a sample sequence, the importance of history data and their impact on future data increase from far to near. Different memberships are therefore assigned to the history samples according to their positions in the time domain: nearer data samples are given larger memberships, while farther data samples are given smaller ones. Through this fuzzification, nearer samples are enhanced and farther samples are weakened.

To determine the fuzzy membership of the history samples, two schemes were designed in our research: one is an exponential (index) distribution in the time domain, and the other is a linear distribution in the time domain.
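The moving-window sampling of section 3.1 and the membership-weighted system of Eq.(14) can be sketched together in Python/NumPy. This is an illustrative reading of the method, not the authors' implementation: the window width n = 5 and γ = 500 follow the settings reported later in section 4, the RBF kernel width sigma is an arbitrary choice, and the linearly increasing memberships are a placeholder for the schemes of section 3.2.

```python
import numpy as np

def make_window_samples(z, n):
    # Section 3.1: input x_i = (z_i, ..., z_{i+n-1}), output y_i = z_{i+n}
    z = np.asarray(z, dtype=float)
    X = np.array([z[i:i + n] for i in range(len(z) - n)])
    y = z[n:]
    return X, y

def flssvm_fit(X, y, mu, gamma=500.0, sigma=0.5):
    # Eq.(14): the LS-SVM system of Eq.(10), but with the diagonal
    # regularization 1/gamma replaced by 1/(gamma * mu_i) per sample
    N = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Omega = np.exp(-d2 / (2.0 * sigma ** 2))        # RBF kernel matrix
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.diag(1.0 / (gamma * np.asarray(mu)))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                           # parameters a, b

# first eight deviations of Table 1 (mm), window width n = 5
z = [0.38, 0.37, 0.38, 0.39, 0.42, 0.41, 0.40, 0.38]
X, y = make_window_samples(z, n=5)      # yields 3 training samples
mu = np.linspace(0.3, 1.0, len(y))      # nearer samples weigh heavier
a, b = flssvm_fit(X, y, mu)
```

A sample with a small membership µ_i gets a large ridge entry 1/(γµ_i), so its error ξ_i is penalized less and it influences the fit less, which is exactly the "farther data weigh lighter" behaviour the model is designed for.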

Table 1. Real deviation sequence for bearings' outer ring size (Unit: mm)
No.       1     2     3     4     5     6     7     8     9     10    11    12    13    14
∆Ds max   0.38  0.37  0.38  0.39  0.42  0.41  0.40  0.38  0.36  0.42  0.44  0.40  0.37  0.40
No.       15    16    17    18    19    20    21    22    23    24    25    26    27    28
∆Ds max   0.42  0.45  0.42  0.41  0.40  0.42  0.41  0.44  0.43  0.43  0.42  0.43  0.44  0.46
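For reference, the sequence of Table 1 and the split used in the experiments of section 4 (the first 25 work pieces for modeling, the last 3 as prediction targets) can be written down directly; a small Python sketch:

```python
import numpy as np

# Largest diameter deviations of the 28 outer rings from Table 1 (mm)
dev = np.array([
    0.38, 0.37, 0.38, 0.39, 0.42, 0.41, 0.40, 0.38, 0.36, 0.42,
    0.44, 0.40, 0.37, 0.40, 0.42, 0.45, 0.42, 0.41, 0.40, 0.42,
    0.41, 0.44, 0.43, 0.43, 0.42, 0.43, 0.44, 0.46,
])

# Section 4: models are built on work pieces 1-25 and then asked to
# predict work pieces 26-28
train, test = dev[:25], dev[25:]
```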

The membership of the exponential (index) distribution in the time domain is given by the following formula:

    µ_i = α (1 − α)^(N − i),    (15)

where α is the index coefficient, 0 < α < 1, and i = 1, 2, ..., N.

The membership µ_i of the linear distribution in the time domain is:

    µ_i = α + i (1 − α) / N.    (16)

The parameter α of the time-domain membership distribution is very important for solving the multivariate linear Eq.(14), as it impacts the accuracy of the results. Under normal circumstances, if the time-sequence data are stable, the linear distribution and a smaller value of α can be selected; if the data sequence fluctuates more seriously, the index distribution and a smaller value of α can be selected. The value of α can also be found through optimized calculation by computer, choosing the α that gives the least fitting error.

4 Experiments of process quality prediction

A semi-automatic turning process produced twenty-eight pieces of bearing outer ring; obviously, it is a typical small-batch process.

The real deviation sequence of the outer-ring size is shown in Table 1. The tolerance of the outer ring is 90 (+0.30/+0.45) mm, and ∆Ds max is the largest measured deviation of the outer ring's diameter. In the turning process, both system factors (such as tool wear and thermal deformation) and random factors (such as system vibration and rough material) impact the turning quality.

Prediction models based on three-order polynomial regression, four-order polynomial regression, the BP neural network, the LS-SVM, and the fuzzy LS-SVM were set up, respectively, with the first twenty-five data in Table 1, and then used to predict the values of the 26th, 27th, and 28th work pieces; the prediction values were compared with the real values.

In the experiment, the polynomial models and the BP neural network used the polyfit function and the artificial neural network toolbox in Matlab, respectively. The time-window width of the BP neural network, LS-SVM, and FLS-SVM models was set to five. In the BP neural network, the learning rate was 0.1, training ran for 10,000 steps, and six and ten hidden-layer neurons were tried, respectively. The LS-SVM and FLS-SVM algorithms were realized by matrix computation: the tuning constant was set to 500, the high-precision radial basis function (RBF) was chosen as the kernel function, and the RBF parameter was set to 0.1. The index-distribution membership function in the time domain was selected to fuzzify the samples of the FLS-SVM, with the distribution parameter valued 0.3.

The mean square error, MSE = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)², was defined as the testing indicator to compare the regression precision of the models, where ŷ_i, y_i, and n are the model output, the output of the training sample set, and the number of training samples, respectively. The computer used in the experiment had a Pentium M-436M CPU and 128M memory. The time consumption and regression precision of each model are shown in Table 2.

From Table 2, it is obvious that the MSE of the LS-SVM model is the smallest, followed by that of the FLS-SVM model; they are one to two orders of magnitude lower than those of the BP neural network models and the polynomial regression models. The time consumption of the polynomial regression model is the smallest, followed by the LS-SVM and FLS-SVM models; the time consumption of the BP neural network models is the largest.

The predicted deviations of the polynomial regression models, BP neural network models, LS-SVM model, and FLS-SVM model are shown in Figure 2.

The regression capacity of the BP neural network models, the LS-SVM, and the FLS-SVM model was tested with the inputs of the training sample sets. These tests are actually sample tests, not predictions: that is to say, the 5th to 25th work pieces are regression tests on the samples, while the 26th to 28th work pieces are prediction values.

Conclusions from Figure 2:
(1) The polynomial models can predict the quality deviation caused by system factors but cannot predict that caused by random factors in the turning process.
(2) The regression and prediction capability for random error of the BP neural networks is better than that of the polynomial models.
(3) The LS-SVM model has high regression precision on the training samples; the output sequence of the LS-SVM model almost coincides with the real deviation sequence. For the three future data (26th to 28th work pieces), its prediction precision is higher than that of the BP neural network.
(4) The FLS-SVM model shows larger regression errors on the first half of the sample sequence but much higher precision on the second half, which proves the rule "nearer data weigh heavier and farther data weigh lighter". The FLS-SVM model predicts the three future data best.

In addition, the parameters γ and δ were not optimized; they were only set to test the FLS-SVM prediction model. In applications, the parameters should be adjusted according to the conditions and to experience in order to improve precision.

Table 2. Time consumption and regression precision of each model
Method   3-order polynomial   BP(6,1)   BP(10,1)   LS-SVM   FLS-SVM
CPU      0.1 s                60.3 s    64.3 s     0.6 s    0.6 s
MSE      3.5e-4               2.7e-4    1.3e-4     1.0e-5   4.3e-5
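The two membership schemes of section 3.2, Eqs.(15) and (16), and the MSE indicator used above are straightforward to express; a short sketch, with α = 0.3 as in the experiment:

```python
import numpy as np

def mu_exponential(N, alpha=0.3):
    # Eq.(15): mu_i = alpha * (1 - alpha)^(N - i); memberships grow
    # toward the newest sample (i = N), which gets the value alpha
    i = np.arange(1, N + 1)
    return alpha * (1.0 - alpha) ** (N - i)

def mu_linear(N, alpha=0.3):
    # Eq.(16): mu_i = alpha + i * (1 - alpha) / N; grows linearly,
    # reaching 1 at the newest sample (i = N)
    i = np.arange(1, N + 1)
    return alpha + i * (1.0 - alpha) / N

def mse(y_hat, y):
    # Testing indicator of section 4: (1/n) * sum (y_hat_i - y_i)^2
    y_hat, y = np.asarray(y_hat, float), np.asarray(y, float)
    return float(np.mean((y_hat - y) ** 2))
```

Both schemes produce monotonically increasing memberships, so when they are plugged into the diagonal of Eq.(14), older samples receive larger ridge entries and thus lighter weight in the fit.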

Figure 2. Prediction result of each model: (a) three-order polynomial model; (b) four-order polynomial model; (c) BP(6,1) model; (d) BP(10,1) model; (e) LS-SVM model; (f) FLS-SVM model. [plots not reproduced]

5 Conclusions

A fuzzy least square support vector machine is proposed in this article and taken as the intelligent kernel of a quality prediction model. On one hand, it solves small-sample learning well, so it can be applied to quality prediction for the small-batch producing process. On the other hand, it fuzzifies the samples with a membership function, choosing optimal samples and making history data follow the rule "the nearer, the heavier the weight". In addition, the quadratic programming problem of the standard SVM is transformed into a system of linear equations, which makes modeling faster and offers the possibility of online intelligent quality prediction.

Little research has been done on quality prediction based on LS-SVM models, especially on fuzzy LS-SVM time-sequence models, and many problems remain to be solved. Combining the time sequence of workpiece sizes with the changes of process parameters to set up a prediction model based on the fuzzy LS-SVM may be one way forward, and it deserves further study.

References
[1] Shewhart M. Interpreting statistical process control (SPC) charts using machine learning and expert system techniques. Proceedings of the IEEE 1992 National Aerospace and Electronics Conference, 1992.
[2] Zhang L N. Research on machining error and predicting model. Acta Metrologica Sinica, 1998, 19(3): 183–188.
[3] Li J, Liu H X. Systematic error modeling in part machining processes based on genetic algorithm. China Machine Engineering, 2003, 14(13): 1130–1132.
[4] Jiang L, Liu J, Pan S X. Support vector machines-based method for machining error prediction modeling. Modular Machine Tool & Automatic Manufacturing Technique, 2005, 8: 13–15.
[5] Metaxiotis K, Kagiannas A, Askounis A, et al. Artificial intelligence in short term electric load predicting: a state-of-the-art survey for the researcher. Energy Conversion and Management, 2003, 44: 1525–1534.
[6] Vapnik V N. The nature of statistical learning theory. New York: Springer-Verlag, 1999.
[7] Vapnik V N. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 1999, 10(5): 988–999.
[8] Suykens J A K, Vandewalle J. Least squares support vector machine classifiers. Neural Processing Letters, 1999, 9(3): 293–300.
[9] Suykens J A K, Vandewalle J. Sparse least squares support vector machine classifiers. European Symposium on Artificial Neural Networks, Bruges, Belgium, 2000: 37–42.
[10] Lin C F, Wang S D. Fuzzy support vector machines. IEEE Transactions on Neural Networks, 2002, 13(3): 466–471.
