A Neural Network With Minimal Structure For Maglev System Modeling and Control
The feedforward perceptron with one hidden layer and a linear activation function at the output is considered here. Its form is given, for a single output, by:

$\hat{y} = \sum_{h=1}^{n_h} w_h^2 \, g\!\left( \sum_{i=1}^{n_i} w_{ih}^1 x_i + b_h^1 \right) + b^2 \qquad (1)$

where $x_i$, $i = 1, \dots, n_i$, are the inputs of the network, $w_{ih}^1$ and $b_h^1$, $i = 1, \dots, n_i$, $h = 1, \dots, n_h$, are the weights and biases of the hidden layer, the activation function $g$ is the hyperbolic tangent, and $w_h^2$, $h = 1, \dots, n_h$, and $b^2$ are the weights and bias of the output neuron. All the weights and biases of the network can be grouped in the parameter vector $\theta$, and the inputs $x_i$ in the regression vector $\varphi(x) = [x_1 \; x_2 \; \cdots \; x_{n_i}]^T$. So, for sample $k$, the output predicted by the network can be written:

$\hat{y}(k) = NN(\varphi(x(k)), \theta)$.
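As a concrete illustration of the predictor defined by (1), a minimal NumPy sketch follows; the variable names (W1, b1, w2, b2) and the example sizes are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def nn_predict(phi, W1, b1, w2, b2):
    """One-hidden-layer perceptron of eq. (1): tanh hidden layer,
    linear output. phi: regression vector (n_i,); W1: (n_h, n_i);
    b1: (n_h,); w2: (n_h,); b2: scalar."""
    hidden = np.tanh(W1 @ phi + b1)   # hidden activations g(.)
    return float(w2 @ hidden + b2)    # scalar prediction y_hat(k)

# Illustrative use with n_i = 3 inputs and n_h = 5 hidden neurons:
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
w2, b2 = rng.normal(size=5), 0.0
y_hat = nn_predict(rng.normal(size=3), W1, b1, w2, b2)
```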
To estimate the parameter vector $\theta$ from data, the prediction error:

$\varepsilon(k, \theta) = y(k) - \hat{y}(k, \theta) \qquad (2)$

with $y(k)$ the desired output, is formed and incorporated in a criterion to be minimized. A general form for this criterion, which leads to an M-estimator, is given, for $n$ samples, by:

$V(\theta) = \frac{1}{n} \sum_{k=1}^{n} L(\varepsilon(k, \theta)) \qquad (3)$

where $L(\cdot)$ is a scalar cost function. The minimization of the criterion (3) can be carried out using the Gauss-Newton algorithm:

$\theta^{(i+1)} = \theta^{(i)} - \left[ H(\theta^{(i)}) \right]^{-1} V'(\theta^{(i)}) \qquad (4)$

In (4), the gradient $V'(\theta)$ of the criterion (3) with respect to $\theta$ is given by:

$V'(\theta) = -\frac{1}{n} \sum_{k=1}^{n} \psi(k, \theta) \, L'(\varepsilon(k, \theta)) \qquad (5)$

where $\psi(k, \theta)$ is the gradient of $\hat{y}(k, \theta)$ with respect to $\theta$ and $L'(\varepsilon(k, \theta))$, the score or influence function, is the first derivative of $L(\varepsilon(k, \theta))$ with respect to $\varepsilon$. The second derivative $H(\theta)$ of the criterion (3) with respect to $\theta$, known as the Hessian matrix, is approximated by:

$H(\theta) \approx \frac{1}{n} \sum_{k=1}^{n} \psi(k, \theta) \, \psi^T(k, \theta) \, L''(\varepsilon(k, \theta)) + \mu I \qquad (6)$

where $L''(\varepsilon(k, \theta))$ is the second derivative of $L(\varepsilon(k, \theta))$ with respect to $\varepsilon$, $I$ is the identity matrix and $\mu$ a small non-negative scalar, adjusted during learning.
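The update (4)-(6) can be written compactly when the per-sample gradients $\psi(k, \theta)$ are stacked into a matrix. The following sketch is a plain NumPy rendering of one iteration under that assumption (the names psi, Lp and Lpp are illustrative choices):

```python
import numpy as np

def gauss_newton_step(theta, psi, Lp, Lpp, mu=1e-3):
    """One iteration of eq. (4). psi: (n, dim) matrix whose row k is
    psi(k, theta); Lp, Lpp: arrays of L'(eps(k)) and L''(eps(k));
    mu: the small regularizing scalar of eq. (6)."""
    n = psi.shape[0]
    grad = -psi.T @ Lp / n                   # gradient, eq. (5)
    H = (psi * Lpp[:, None]).T @ psi / n     # Gauss-Newton Hessian
    H += mu * np.eye(len(theta))             # regularization, eq. (6)
    return theta - np.linalg.solve(H, grad)  # update, eq. (4)
```

For the quadratic cost (7) below, Lp is simply the error sequence and Lpp a vector of ones, as stated in (8).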
The criterion most frequently used for parameter estimation is the classical ordinary least-squares criterion (L2 norm), whose cost function is quadratic:

$L(\varepsilon(k, \theta)) = \tfrac{1}{2} \varepsilon^2(k, \theta) \qquad (7)$

whose derivatives are simply:

$L'(\varepsilon(k, \theta)) = \varepsilon(k, \theta), \qquad L''(\varepsilon(k, \theta)) = 1 \qquad (8)$

Such a criterion receives the largest contributions from the points which have the largest errors, so the solution can be dominated by a very small number of points, which may be gross errors or outliers.

3. An outlier-robust learning rule

The batch learning method presented here is detailed in [4, 17]. This Iteratively Reweighted Least Squares (IRLS) method starts, following Huber, from a distribution of the noise $e$ contaminated by outliers, expressed as a mixture of two probability density functions. The first one corresponds to the basic distribution of the measurement noise, for example Gaussian with variance $\sigma_0^2$; the second one, corresponding to the outliers, is an arbitrary symmetric long-tailed distribution, for example also Gaussian, but with variance $\sigma_1^2$ such that $\sigma_0^2 \ll \sigma_1^2$:

$e(k) \sim (1 - p) N(0, \sigma_0^2) + p N(0, \sigma_1^2)$

where $p$ is the probability of occurrence of a large error. In practice, neither the probability $p$ nor the two variances $\sigma_0^2$ and $\sigma_1^2$ are known, and the preceding model is replaced by:

$\varepsilon(k) \sim (1 - \delta(k)) N(0, \sigma_0^2) + \delta(k) N(0, \sigma_1^2) \qquad (9)$

where $\varepsilon(k)$ is the prediction error given by (2), $\delta(k) = 0$ for $|\varepsilon(k)| \le M$ and $\delta(k) = 1$ for $|\varepsilon(k)| > M$, and $M$ is a bound which can be taken as $3\sigma_0$.

The unknown variances $\sigma_0^2$ and $\sigma_1^2$ are estimated as follows. At each iteration $i$ of the algorithm (4), the prediction error sequence $\varepsilon$ is calculated by (2), and the variances are recursively estimated by:
for $|\varepsilon(k)| \le 3\sigma_0(k-1)$:

$\sigma_0^2(k) = \sigma_0^2(k-1) + \frac{1}{k - \tau(k)} \left( \varepsilon^2(k) - \sigma_0^2(k-1) \right) \qquad (10a)$

otherwise $\sigma_0^2(k) = \sigma_0^2(k-1)$,

and for $|\varepsilon(k)| > 3\sigma_0(k-1)$:

$\sigma_1^2(k) = \sigma_1^2(k-1) + \frac{1}{\tau(k)} \left( \varepsilon^2(k) - \sigma_1^2(k-1) \right) \qquad (10b)$

otherwise $\sigma_1^2(k) = \sigma_1^2(k-1)$,

with $\tau(0) = 0$ at each iteration, and $\tau(k+1) = \tau(k) + 1$ whenever $|\varepsilon(k)| > 3\sigma_0(k-1)$. $\sigma_1^2(0)$ can be chosen equal to $\sigma_0^2(0)$, and $\sigma_0^2(0)$ equal to the classically calculated variance of the prediction errors at the first iteration. Note that $\tau(n)$ is the estimate of the number of outliers.
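A direct transcription of the recursions (10a)-(10b) in NumPy; the function name and the returned tuple are illustrative choices:

```python
import numpy as np

def robust_variances(eps, s0_init):
    """Recursive estimation of sigma_0^2 (noise) and sigma_1^2
    (outliers) from the prediction-error sequence eps, eqs. (10a)-(10b).
    s0_init: classical variance of the errors at the first iteration."""
    s0, s1 = s0_init, s0_init     # sigma_1^2(0) chosen equal to sigma_0^2(0)
    tau = 0                       # outlier counter, tau(0) = 0
    for k, e in enumerate(eps, start=1):
        if abs(e) <= 3.0 * np.sqrt(s0):
            s0 += (e**2 - s0) / (k - tau)   # eq. (10a)
        else:
            tau += 1
            s1 += (e**2 - s1) / tau         # eq. (10b)
    return s0, s1, tau            # tau(n) estimates the number of outliers
```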
The variance $\sigma_\varepsilon^2(k)$ of $\varepsilon(k)$ is finally given by:

$\sigma_\varepsilon^2(k) = (1 - \delta(k)) \, \sigma_0^2(n) + \delta(k) \, \sigma_1^2(n) \qquad (11)$

leading to the weighted robust norm:

$L(\varepsilon(k, \theta)) = \frac{\varepsilon^2(k, \theta)}{2 \, \sigma_\varepsilon^2(k)} \qquad (12)$

in which each squared error is weighted by the inverse of its estimated variance, so that the points classified as outliers contribute only weakly to the criterion.
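The classification (9) and the weighting (11)-(12) then amount to dividing each squared error by its estimated variance; a short sketch under the same naming assumptions as above:

```python
import numpy as np

def robust_costs(eps, s0, s1):
    """Per-sample variances of eq. (11) and contributions of the
    weighted robust norm (12) to the criterion (3)."""
    delta = (np.abs(eps) > 3.0 * np.sqrt(s0)).astype(float)  # delta(k)
    var_eps = (1.0 - delta) * s0 + delta * s1                # eq. (11)
    return eps**2 / (2.0 * var_eps)                          # eq. (12)
```

Points classified as outliers receive the large variance $\sigma_1^2$ and therefore a small weight in the criterion.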
4. Pruning

To determine a minimal structure, the parameters which contribute least to the quality of the model are deleted one by one. The sensitivity $\delta V(\theta)$ of the criterion $V(\theta)$ to the deletion of a weight is approximated by a Taylor expansion around $\theta$ to order two:

$\delta V(\theta) = V'(\theta)^T \delta\theta + \tfrac{1}{2} \, \delta\theta^T H \, \delta\theta + O(\|\delta\theta\|^3) \qquad (15)$

The gradient $V'(\theta)$ being zero after convergence, the first term in (15) vanishes, leading to:

$\delta V(\theta) = \tfrac{1}{2} \, \delta\theta^T H \, \delta\theta \qquad (16)$

which involves only the Hessian $H$. Noting $e_q$ the canonical vector selecting the $q$th component of $\theta$ ($e_q^T = [0 \cdots 0 \; 1 \; 0 \cdots 0]$), the deletion of the weight $\theta_q$ (i.e. $e_q^T(\delta\theta + \theta) = 0$) must lead to a minimal increase of the criterion. The following Lagrangian can thus be written:

$\mathcal{L}(\delta\theta) = \tfrac{1}{2} \, \delta\theta^T H \, \delta\theta + \lambda \, e_q^T(\delta\theta + \theta) \qquad (17)$

and minimized, leading to:

$\delta\theta = -\frac{\theta_q}{[H^{-1}]_{qq}} \, H^{-1} e_q, \qquad \delta V_q = \frac{\theta_q^2}{2 \, [H^{-1}]_{qq}} \qquad (18)$
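Equation (18) translates directly into a pruning step in the style of Optimal Brain Surgeon [8]; the sketch below (the use of a pseudo-inverse for H is a safeguard of this sketch, not from the paper) selects the weight of minimal saliency and updates the remaining parameters:

```python
import numpy as np

def obs_prune_step(theta, H):
    """One pruning step from eq. (18): delete the weight whose
    removal least increases the criterion, adjusting all others."""
    H_inv = np.linalg.pinv(H)            # guards against an ill-conditioned H
    diag = np.diag(H_inv)
    saliency = theta**2 / (2.0 * diag)   # delta_V_q for each candidate q
    q = int(np.argmin(saliency))         # least useful weight
    delta_theta = -(theta[q] / diag[q]) * H_inv[:, q]
    return q, theta + delta_theta        # component q of the result is zero
```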
5. Magnetic levitation example

The outlier-robust learning and the structure determination presented above are applied to model the inverse behavior of a magnetic levitation system (MagLev). A precise description of the MagLev system, which balances a metallic ball without any support by means of an electromagnet, and steers the object to track a desired vertical trajectory, can be found in [11, 12]. The system input is the voltage u applied to the coil, and the output is the voltage Vx representing the vertical position of the ball.
[Figure: control scheme of the MagLev system, in which the neural feedforward controller and a P.I.D. feedback controller (output u_pid) are summed to drive the ML System block. "ML System" is the block which represents the real system (real-time interfaced).]
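A minimal sketch of the control law suggested by the figure, with the neural inverse model providing the feedforward voltage and a discrete P.I.D. correcting the residual tracking error; the gains, the P.I.D. discretization and the function names are illustrative assumptions:

```python
class PID:
    """Simple discrete P.I.D. with backward-difference derivative
    (illustrative gains, not the authors' tuning)."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.01, Ts=0.003):
        self.kp, self.ki, self.kd, self.Ts = kp, ki, kd, Ts
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, err):
        self.integral += err * self.Ts
        deriv = (err - self.prev_err) / self.Ts
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def control_step(reference, y_measured, pid, nn_inverse):
    """Coil voltage: inverse-model feedforward from the desired
    trajectory plus P.I.D. feedback on the tracking error."""
    u_ff = nn_inverse(reference)          # neural feedforward term
    u_pid = pid.step(reference - y_measured)
    return u_ff + u_pid                   # summed as in the block diagram
```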
[…] network. At each parameter deletion, retraining for 50 iterations at most is performed. According to the FPE (Final Prediction Error), the optimal network architecture was the one comprising 9 hidden neurons and 38 parameters.
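For reference, the FPE criterion commonly used for this comparison (e.g. in [15]) inflates the residual criterion by the number of remaining parameters; assuming the classical Akaike form:

```python
def fpe(V, n, p):
    """Final Prediction Error: criterion value V on n training samples
    for a model with p parameters (classical Akaike form, assumed)."""
    return V * (n + p) / (n - p)
```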
The reference trajectory and the corresponding output Vx are plotted in Figure 7. The proposed strategy has been carried out in real time using a Matlab-Simulink and RTW (Real Time Workshop) software environment, with a sampling period of 3 ms.

Figure 7: Reference trajectory and the output of the MagLev system

6. Conclusion

The use of artificial neural networks for control is motivated by their universal approximation capabilities. The feedforward one-hidden-layer perceptron with linear activation at the output gives a simple but sufficiently general structure for nonlinear modeling. In this paper, the usefulness of deleting spurious parameters and of an outlier-robust criterion for pruning is shown. The model size reduction is applied to the identification of the inverse model of a MagLev system for feedforward control. Obtaining a model with very few parameters and good approximation capabilities makes it possible to envisage adaptive parameter estimation algorithms for systems requiring short sampling periods; without minimal-structure neural models, such algorithms cannot be implemented in real time in a standard environment.

7. References

[1] M. Agarwal, "A systematic classification of neural network based control", IEEE Control Systems Mag., Vol. 17, 1997, pp. 78-84.
[2] G. Bloch, P. Thomas, M. Ouladsine and M. Lairi, "On several outlier-robust training rules of neural networks for identification of nonlinear systems", 8th Int. Conf. on Neural Networks and their Applications NEURAP'96, Marseille, France, March 1996, pp. 13-19.
[3] G. Bloch, P. Thomas, and D. Theilliol, "Accommodation to outliers in identification of non linear SISO systems with neural networks", Neurocomputing, Vol. 14, no 1, 1997, pp. 85-99.
[4] G. Bloch, F. Sirou, V. Eustache and P. Fatrez, "Neural intelligent control of a steel plant", IEEE Trans. on Neural Networks, Vol. 8, no 4, July 1997, pp. 910-918.
[5] S. Chen, S.A. Billings, and P.M. Grant, "Non-linear system identification using neural networks", Int. J. Control, Vol. 51, no 6, 1990, pp. 1191-1214.
[6] D.S. Chen and R.C. Jain, "A robust back propagation learning algorithm for function approximation", Third Int. Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, FL, January 1991, pp. 218-239.
[7] S.J. Hanson and D.J. Burr, "Minkowski-r back-propagation: learning in connectionist models with non-Euclidean error signals", in Neural Information Processing Systems, D.Z. Anderson (Ed.), American Institute of Physics, New York, 1988, pp. 348-357.
[8] B. Hassibi and D.G. Stork, "Second order derivatives for network pruning: optimal brain surgeon", in Advances in Neural Information Processing Systems, S.J. Hanson, J.D. Cowan and C.L. Giles (Eds.), Morgan Kaufmann, San Mateo, CA, Vol. 5, 1993, pp. 164-171.
[9] K. Hunt, D. Sbarbaro, R. Żbikowski, and P.J. Gawthrop, "Neural networks for control systems - a survey", Automatica, Vol. 28, 1992, pp. 1083-1112.
[10] C. Jutten and O. Fambon, "Pruning methods: A review", Proc. European Symp. on Artificial Neural Networks ESANN'95, Brussels, April 1995, pp. 129-140.
[11] M. Lairi, "Identification et commande neuronales de systèmes non linéaires : application à un système de sustentation magnétique", Thèse de Doctorat de l'Université Henri Poincaré - Nancy 1, spécialité Automatique, 1998.
[12] M. Lairi, G. Bloch, and G. Millerioux, "Real time feedforward neural control of a MagLev system", Neural Processing Letters, Kluwer Academic Publishers, 1999, submitted.
[13] K. Liano, "Robust error measure for supervised neural network learning with outliers", IEEE Trans. on Neural Networks, Vol. 7, no 1, 1996, pp. 246-250.
[14] K.S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks", IEEE Trans. on Neural Networks, Vol. 1, 1990, pp. 4-27.
[15] M. Nørgaard, "Neural Network Based System Identification Toolbox", Tech. Report 95-E-773, Institute of Automation, Technical University of Denmark, 1995.
[16] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.Y. Glorennec, H. Hjalmarsson, and A. Juditsky, "Nonlinear black-box modeling in system identification: a unified overview", Automatica, Vol. 31, no 12, 1995, pp. 1691-1724.
[17] P. Thomas and G. Bloch, "From batch to recursive outlier-robust identification of non linear dynamic systems with neural networks", Proc. IEEE Int. Conf. on Neural Networks ICNN'96, Vol. 1, Washington, DC, June 1996, pp. 178-183.
[18] P. Thomas and G. Bloch, "Robust pruning for multilayer perceptrons", Proc. IMACS/IEEE Multiconference on Computational Engineering in Systems Applications CESA'98, P. Borne, M. Ksouri and A. El Kamel (Eds.), Vol. 4, Nabeul-Hammamet, Tunisia, April 1998, pp. 17-22.