Abstract—This paper reports an empirical study of the behavior of the test and training errors in different systems. Frequently the test error of Artificial Neural Networks is presented with a monotonically decreasing behavior as a function of the iteration number, while the training error also continuously decreases. The present paper shows examples where such behavior does not hold, with data collected from systems corrupted by either noise or actuation delay. This shows that selecting the best model is not a simple question and points to automatic procedures for model selection as the best solution to optimize their capacity, either with the Regularization or Early Stopping techniques.

Keywords—Early Stopping, Feedforward Neural Networks, Regularization, Test Error, Training Error, Weight decay.

I. INTRODUCTION

II. THE TEST SYSTEMS

The data presented here were collected from two different systems.

A. Cruise Control Distributed System

The first one is a first order system corresponding to a cruise control system, as shown in equation (1):

H(s) = 1 / (s + 1)    (1)

The cruise control system is distributed through three processing nodes over a fieldbus according to the architecture presented in figure 1.
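The first-order plant of equation (1) can be simulated directly from its differential-equation form, dy/dt = u(t) − y(t). The sketch below is a minimal forward-Euler discretization; the step size and the step input are illustrative assumptions, not the sampling actually used in the paper.

```python
def simulate_first_order(u, dt=0.01, y0=0.0):
    """Forward-Euler simulation of H(s) = 1/(s+1), i.e. dy/dt = u(t) - y(t)."""
    y = y0
    out = []
    for uk in u:
        y += dt * (uk - y)  # Euler step: y[k+1] = y[k] + dt*(u[k] - y[k])
        out.append(y)
    return out

# Step response over 10 s at dt = 0.01: approaches the steady-state gain of 1.
ys = simulate_first_order([1.0] * 1000)
print(round(ys[-1], 3))  # → 1.0
```

The step response follows 1 − e^(−t): unit DC gain with a time constant of one second, which is what H(s) = 1/(s+1) predicts.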
Fig. 2. Histogram of the number of samples versus the delay introduced.

Fig. 4. Block diagram of the system (heating element, thermocouple, power module, Data Logger, PC).
B. Reduced Scale Prototype Kiln

The second system is a reduced scale prototype kiln affected by measurement noise. Additional details about this system can be found in [6] and [8].

The system is composed of a kiln, electronics for signal conditioning, a power electronics module, a cooling system and a Hewlett Packard HP34970A Data Logger to interface with a Personal Computer (PC).

The Data Logger acts as an interface to the PC, where the controller is implemented using MATLAB. Bi-directional information is passed through the Data Logger: the control signal supplied in real time by the controller, and the temperature data for the controller. The temperature data is obtained using a thermocouple.

The power module receives a voltage signal from the controller implemented in the PC, which ranges from 0 to 4.095 V, and converts it into a power signal ranging from 0 to 220 V.
… between 750ºC and 1000ºC. A picture of the kiln and electronics can be seen in figure 7.

Both systems were chosen because the effect of the … simulated system.

[Block diagram of the power electronics: 220V AC, zero-crossing comparator, 8 bit binary, D/A, power stage, optic decoupling, heating element.]
Fig. 9. The test and training error as a function of the iteration for the inverse model of the first system with 4 neurons.

III. TRAINING VERSUS TEST ERROR

For both systems, training was performed for 10000 iterations using the Levenberg-Marquardt algorithm [3], [4]. After each training epoch, the test sequence was evaluated to allow permanent monitoring of the training and test errors.
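The per-epoch monitoring described above can be sketched as follows. This is a simplified stand-in, not the paper's setup: a quadratic model fitted by plain gradient descent replaces the ANN trained with Levenberg-Marquardt, and the data are synthetic; only the monitoring structure — evaluating the test sequence after every training epoch — mirrors the text.

```python
import random

random.seed(0)

def target(x):                      # assumed underlying function (synthetic)
    return 0.5 * x + 0.2 * x * x

# Noisy training sequence and a clean test sequence (both assumptions).
train = [(x / 10.0, target(x / 10.0) + random.gauss(0.0, 0.05)) for x in range(-10, 11)]
test = [(x / 10.0 + 0.05, target(x / 10.0 + 0.05)) for x in range(-10, 10)]

def mse(w, data):
    return sum((w[0] + w[1] * x + w[2] * x * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):                  # gradient of the mean squared error
    g = [0.0, 0.0, 0.0]
    for x, y in data:
        e = w[0] + w[1] * x + w[2] * x * x - y
        g[0] += 2 * e
        g[1] += 2 * e * x
        g[2] += 2 * e * x * x
    return [gi / len(data) for gi in g]

w = [0.0, 0.0, 0.0]
history = []                        # (training error, test error) after each epoch
for epoch in range(200):
    w = [wi - 0.1 * gi for wi, gi in zip(w, grad(w, train))]
    history.append((mse(w, train), mse(w, test)))  # test evaluated every epoch

print(history[-1][0] < history[0][0])  # → True: training error decreased
```

Recording both errors at every epoch is what makes the hills and valleys of the test error, discussed later, visible at all.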
Usually a train and a test sequence are used. The first one is used to update the weights based on the error obtained at the output, and the second is used to test whether the ANN is learning the general behavior of the system to be modeled, instead of learning the training sequence.

Some authors consider using a third sequence to ensure that the ANN is able to generalize the behavior that is desired in different situations, or to compare the quality of different networks [5].

IV. DISCUSSION

As stated in the introduction, the literature frequently presents both the training and test errors with a monotonic behavior. It can be seen from the two examples shown here that such behavior is not always found in real systems.

The examples show, in many situations, that both for direct and inverse models, while the training error is always monotonic, the test error frequently presents hills and valleys.
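The three sequences mentioned above can be produced by a plain contiguous split of the measured data. The fractions below are illustrative assumptions, not values from the paper.

```python
def split_sequence(data, train_frac=0.6, test_frac=0.2):
    """Split one measured sequence into train / test / validation parts.
    The remaining fraction (here 0.2) becomes the validation sequence."""
    n_train = int(len(data) * train_frac)
    n_test = int(len(data) * test_frac)
    return (data[:n_train],
            data[n_train:n_train + n_test],
            data[n_train + n_test:])

samples = list(range(100))          # stand-in for a measured signal
train_seq, test_seq, val_seq = split_sequence(samples)
print(len(train_seq), len(test_seq), len(val_seq))  # → 60 20 20
```

A contiguous (unshuffled) split is used here because the signals of a dynamic system are time series, where shuffling would destroy the temporal structure.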
Fig. 12. The test and training error as a function of the iteration for the direct model of the first system with 8 neurons.

Fig. 15. The test and training error as a function of the iteration for the inverse model of the first system with 10 neurons.
Figures 8 to 11 of the first model, and all the figures of the direct model of the second system, show an initial phase of learning, followed by a very flat zone where learning is very slow for both the test and training errors.

In the remaining cases it is possible to find a very different behavior for the train and test errors. While the training error continues to decrease with iterations, the test error frequently shows hills and valleys.

The different tests shown in the figures correspond to the two systems referred to above, modeled with different numbers of neurons. While a larger set of weights usually allows a lower training error to be obtained, the existence of more degrees of freedom enables the test error to exhibit many more ups and downs than is the case with fewer parameters.

V. REGULARIZATION AND EARLY STOPPING

The oscillations found in the test error of the systems chosen as examples lead to the necessity of carefully choosing the length of the training stage, or of applying another solution to obtain the best quality for the models.

Two possible solutions that can be used to cope with these problems are the Early Stopping and Regularization techniques.

A. Regularization

For training algorithms that are based on derivatives, the first parameters to be updated are the ones with larger influence on the criterion to be minimized, while in a second phase other, less important, parameters are updated.
Fig. 16. The test and training error as a function of the iteration for the direct model of the second system with 4 neurons.

Fig. 17. The test and training error as a function of the iteration for the inverse model of the second system with 4 neurons.

Fig. 18. The test and training error as a function of the iteration for the direct model of the second system with 6 neurons.

Fig. 19. The test and training error as a function of the iteration for the inverse model of the second system with 6 neurons.
For the second system more figures are shown because, with a larger number of parameters, it presents curves which are less common.
These last parameters to be updated are the ones responsible for the overtraining problem, by learning characteristics of the training signal and the noise. The overtraining (i.e. excessive training) situation results in a network that overfits the training sequence but is not capable of the same performance with a test sequence, because the ANN has learned details of the training sequence instead of the general behavior.

One way to avoid this second phase in training is called regularization, and it consists of changing the criterion to be minimized according to:

W(θ) = V(θ) + δ‖θ‖²    (2)

where δ, the weight decay, is a small value …
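Equation (2) can be sketched numerically as follows. The δ value and the toy weights are illustrative assumptions; the point is that the gradient of the penalty term δ‖θ‖² is 2δθ, so every update shrinks each weight toward zero, which is what holds back the less important parameters.

```python
def regularized_criterion(v_theta, theta, delta=0.1):
    """W(θ) = V(θ) + δ·‖θ‖² — equation (2); δ here is an illustrative value."""
    return v_theta + delta * sum(t * t for t in theta)

def decay_step(theta, grad_v, lr=0.1, delta=0.1):
    # ∇W(θ) = ∇V(θ) + 2δθ, so each step multiplies every weight by
    # (1 - 2·lr·δ) on top of the ordinary gradient update.
    return [t - lr * (g + 2 * delta * t) for t, g in zip(theta, grad_v)]

theta = [5.0, -3.0]
for _ in range(100):
    theta = decay_step(theta, [0.0, 0.0])   # zero data gradient: pure decay
print(theta)  # both weights have shrunk toward zero
```

With a zero data gradient the update reduces to pure decay, which makes the shrinking effect of the penalty term visible in isolation.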
Fig. 24. The test and training error as a function of the iteration for the direct model of the second system with 15 neurons.
… early stopping. One example of a comparison of both techniques in an automated procedure can be found in [7].

Fig. 25. The test and training error as a function of the iteration for the inverse model of the second system with 15 neurons.
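The early-stopping side of the comparison can be sketched generically: train one epoch at a time, evaluate the test sequence, keep the best parameters seen so far, and stop once the test error has not improved for a number of epochs. The `step`/`evaluate` callbacks, the patience value, and the toy error curve below are all illustrative assumptions, not the paper's procedure.

```python
def train_with_early_stopping(step, evaluate, max_epochs=10000, patience=50):
    """Keep the parameters with the lowest test error; stop once the test
    error has not improved for `patience` consecutive epochs."""
    best_err, best_epoch, best_params = float("inf"), 0, None
    for epoch in range(max_epochs):
        params = step()                 # one training epoch -> new parameters
        err = evaluate(params)          # test-sequence error after this epoch
        if err < best_err:
            best_err, best_epoch, best_params = err, epoch, params
        elif epoch - best_epoch >= patience:
            break                       # test error stopped improving
    return best_params, best_err

# Toy test-error curve: falls until epoch 30, then rises (a 'hill').
errs = [abs(30 - e) / 30.0 + 0.1 for e in range(200)]
epochs = iter(range(200))
params, best = train_with_early_stopping(lambda: next(epochs), lambda k: errs[k])
print(params, round(best, 3))  # → 30 0.1
```

Because the loop keeps the best parameters rather than the last ones, the hills and valleys of the test error observed in the figures do not degrade the returned model.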