You are on page 1of 21

Chemometrics and

intelligent
laboratory systems
Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

Adaptive resonance theory based artificial neural networks for


treatment of open-category problems in chemical pattern
recognition - application to UV-Vis and IR spectroscopy
Dietrich Wienke *, Gerrit Kateman
Catholic University of Nijmegen, Laboratory of Analytical Chemistry, Toernooiveld 1, 6525 ED Nijmegen, Netherlands

(Received 11 January 1993; accepted 26 September 1993)

Abstract

A supervised classification problem is defined as open-categorical if the training classes are only partly known
beforehand according to their number, shape and size. This is the case, for example, in applications of spectroscopic
pattern recognition for automated materials sorting for recycling. Continuously modified or newly developed
materials require a continuous self-adapting pattern recognition technique (plasticity). A second, but contrary
demand is the robustness of the classifier against outlier pattern vectors (stability). This stability-plasticity dilemma
can partly be solved using Grossberg’s adaptive resonance theory based artificial neural networks (ARTS). The main
difference between ARTS like artificial neural networks to other types of artificial neural networks is that besides
the numerical size of the weights as fitting parameters, additionally the network structure itself (number of units,
dimensions of the weights matrices) is also not fixed forming a final result of the training process. Basic properties of
the ART-l technique for its potential application to chemical pattern recognition are elucidated by a classification of
simulated data, UV-Vis spectra of phenanthroline complexes and infrared reflectance spectra of optical glasses.

1. Introduction another example realize some principles of natu-


ral neural networks such as parallel information
Many chemometrically oriented papers were processing and distributed knowledge storage and
published during the last years dealing with retrieval. After a first cross reference of Jurs et
mathematical techniques of natural computing al. in 1969 [3] in connection with the classification
for applications in analytical chemistry and spec- of mass spectra, in 1975 Stonham et al. 141used
troscopy. The genetic algorithm as one example an ‘adaptive digital learning network’ to classify
for natural computing methods exploits the prin- mass spectra. Influenced by the introduction of
ciple of natural evolution for chemical-analytical the backpropagation learning rule in the begin-
applications [ 1,2]. Artificial neural networks as ning of the eighties analytical chemists and spec-
troscopists started to deal more intensively with
potential applications of neural networks. Thom-
son and Meyer 151and Anker and Jurs 161classi-
* Corresponding author. fied NMR spectra by means of multilayer feed-

Elsevier Science B.V.


SSDI 0169-7439(93)E0063-A
310 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

forward neural networks based on the backpropa- IR spectra by a parallel implementation of the
gation learning rule (MLF). Bos et al. 171 classi- network.
fied with the same network type data from ion- These four types of artificial neural networks
selective electrode arrays. Nakamato et al. [8] and used in analytical chemistry (MLF backpropaga-
Chay et al. [9] analysed data from multi-sensor tion network, BAM, Hopfield’s neural network
arrays for the differentiation of drinks and for and Kohonen self-organizing feature map) work
detection of odorants in general. Bos and Weber with a predefined rigid network design. This
[lo], Glick and Hieftje [ll] and Otto et al. [12] means that the number of layers, the number of
reported applications in different atomic spectro- units and the number of weights are fixed from
scopic methods such as X-ray fluorescence, glow the start of the training process up to the conver-
discharge and inductively coupled plasma (ICP) gence of the network. The numerical size of the
spectroscopy. Schierle et al. [13] used in place of weights is the only flexible parameter which can
a MLF network the bidirectional associative be used for storage and retrieval of information.
memory (BAM) for the qualitative interpretation Or, in other words, the structure itself of the
of ICP spectra. The interpretation of infrared neural network is not a free parameter during the
spectra by MLF neural networks was reported by learning process. A practical way out is the repe-
Robb and Munk [14], Wythoff et al. [15], Smits et tition of the training with the same data but
al. [16], Meyer and Weigelt [17] and Tanabe et al. different network designs or the rejection of cer-
[US].Nonlinear multivariate calibration with MLF tain less important weights as reported by Har-
networks in the field of near-infrared spec- rington [29] and by design of optimal minimal
troscopy was discussed by Long and co-workers neural networks as discussed by Borggaard and
[19,20]. Lang et al. [21] applied MLF networks to Thodberg [30].
analytical chromatography. An application of the However, a rigid network structure can be a
MLF neural networks to the interpretation of deficiency under certain circumstances. A
mass spectra was published by Lohninger [22]. In changed classification environment with a strong
a review article Zupan and Gasteiger [23] pre- deviating classification task, for example, can in-
sented additional chemical applications of neural fluence the learned and stored knowledge in an
networks outside analytical chemistry, for exam- undesirable direction. In such a situation the
ple, in chemical product soft modeling and chem- network provides uninterpretable results. Gross-
ical process control. In an overview article Jans- berg [31-341 called this problem the stability-
son [24] focussed on potential application of a plasticity dilemma.
third type of artificial neural networks, the Hop-
field type neural networks, in analytical chem- 1.2. Stability-plasticity dilemma of artificial neural
istry. A fourth type of artificial neural networks, networks
the unsupervised working Kohonen self-organiz-
ing features map, was used by Gross and Seibert Grossberg [31, p. 851 stated how a system’s
[25] to evaluate satellite images of remote spec- adaptive mechanism can be stable enough to re-
trochemical measurements of environmental data. sist environmental fluctuations, but plastic enough
They mapped the images by Kohonen’s method to change rapidly in response to environmental
and used the detected qualitative clusters to form demands. Further he asked how stability without
representative training classes for backpropaga- rigidity and adaptability without chaos could be
tion neural networks and supervised pattern achieved. Also in ref. [31, p. 1893 the aspects of
recognition with the vector quantizer method. the stability-plasticity dilemma are described,
Arrigo et al. [26] used the method to evaluate how internal representations can maintain them-
nucleic acid sequences. An application to a quan- selves in a stable fashion against the erosive ef-
titative structure-activity relationship (QSAR) fects of behaviorally irrelevant environmental
study has been reported recently by Ross et al. fluctuations yet can nonetheless adapt rapidly in
[27]. Melssen et al. [28] clustered a data base of response to environmental fluctuations that are
D. henke, G. Kateman /Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 311

crucial to survival. Grossberg searched for a solu- (features) in the input vector. The number of
tion that a network knows the difference between units in layer 2 is unlimited, in principle. How-
behaviorally irrelevant and relevant events even ever, in practice the actual number of units in
though its individual cells, or nodes, do not pos- layer 2 is determined by the actually detectable
sess this knowledge. For the practical realization number, k, of clusters in the input data. The
of such a new type of learning it is further impor- input data vector is transferred from the outside
tant how a network is able to transmute this world via the input layer and a transfer function
knowledge into the difference between slow and fl into layer 1. After processing in the network a
fast rates of adaptation, respectively. Based on corresponding output vector in layer 2 is ob-
biological, neurological and mathematical studies tained. This vector will be sent via another trans-
he developed an artificial neural network to over- fer function f2 to an output layer.
come the stability-plasticity dilemma. The new General desired properties for all kinds of
network differs significantly from other known artificial neural networks are robustness against
network types [35]. It is based on adaptive reso- events from outside, stability of recognition re-
nance theory (ART) which will be described in sults in a new classification environment and the
more detail in the theoretical part. The use of self-design of the network layout. Grossberg [371
ART based neural networks has been reported was able to realize simultaneously these three
outside analytical chemistry for classification of properties within his ART based neural network
simulated data by Carpenter and Grossberg [36- by introducing an unique operation called ‘feed-
421, for clustering of large data sets by Burke 1431 back by weights’. The output of layer 2 obtained
and for on-line recognition of hand-written let- from an input vector is sent back via the weight
ters by Gan [44]. Further citations referring to matrix W2 (top-down weights) and transfer func-
applications of modified ART algorithms in im- tion fl to the layer 1. This feedback operation
age processing, speech recognition and sonar sig- occurs before an output vector is sent via the
nal interpretation can be found in Carpenter and output layer to the outside world. This feedback
Grossberg [42]. causes every new input coming from outside to
The present work deals with properties and propagate two times through the neural net: one
behavior of the ART-l algorithm in a computer time through the ‘bottom-up weights’ Wl and
simulation study and in two chemical-analytical one time via feedback through W2. The network
application cases for the classification of UV-Vis tries to adjust the two weight matrices Wl and
spectra of coloured phenanthroline complexes W2 in such a way that the original input vector
and for the classification of Fourier transform and the, by means of Wl and W2, ‘filtered’ input
infrared reflectance spectra of glasses with differ- vector match each other. To reach the feedback
ent compositions. The aim of this first study is to behavior Grossberg linked two neural network
introduce basic ideas of ART for chemists. elements called ‘instar’ and ‘outstar’. The instar
network is a vector detector and operates with a
list of neurons which has an ‘unlimited’ length, in
principle. If the first input vector is trained, a
2. Adaptive resonance theory (ART) first neuron from this list is set equal to one. If a
new training vector is offered which differs signif-
A general design of an artificial neural net- icantly from the first one a new second neuron is
work based on adaptive resonance theory (ART) activated in the list and in this way the dimension
shows at least one input layer, two working layers of the weight matrix is enlarged. The first neuron
1 and 2 and one output layer. The layers 1 and 2 and all other potential neurons are set to zero.
are connected with each other in a forward mode Thus, the instar generates a vector of limited
by a weight matrix Wl and in a feedback mode by length, k, of neurons actually in use where one
a weight matrix W2. The number of units in layer neuron is set active (‘1’) and the others are set
1 corresponds to the number, m, of variables non-active (‘0’). The instar uses for stepwise
312 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

adaptation of the j xi weight matrix Wl the the name ‘adaptive resonance theory based artifi-
learning rule cial neural networks’ (ART).
A test vector xj not belonging to any previous
(1) trained class will not stimulate resonance be-
with 77= learning rate, xi = input vector of di- tween network and test input. On the other hand,
mension 1 x m with j = l... m, a(i) = output such a strong deviating vector will not destroy the
vector of dimension 1 x k with i = 1.. . k, m = previously stored knowledge (network stability).
number of input features, k = number of output The significantly deviating test vector at first
neurons actually in use corresponding to the stimulates the enlargement of the layer 2 by link-
number of classes. ing one additional new output neuron to it and
An outstar artificial neural network does the enlarging in this way the dimension k of the
opposite compared with an instar network. A list weight matrices Wl and W2 by one.
of k neurons is presented to the outstar network
were the ith neuron is firing (‘1’) and the others 2.1. Competitive learning and the novelty detector
are set to non-active (‘0’). Then from the outstar
an m-dimensional output vector is obtained, for The self formation of classes requires a contin-
example, a continuous spectrum corresponding to uous update of the dimension k of the weight
the class i assigned by the firing outstar neuron. matrices Wl and W2. The number k of i = 1.. . k
For updating the i X j weight matrix W2 the output units in layer 2 increases with the number
outstar uses the following learning rule of detected clusters. The decision if an additional
dWZi,j=?) * [t%(i)- ’ (k + 0th output neuron is needed or not is based
w2i,j] * xj
(2) on a decision coming from a so-called novelty
with xj = output vector of dimension 1 x m with detector. It works as follows. Every new input
j= l... m, a(i) = input vector of dimension 1 x k vector xj stimulates via its dot product with Wl
with i= l...k. simultaneous activities of all k output neurons in
By a link of instar and outstar neural networks layer 2. Every excited ith output neuron stimu-
Grossberg created a feedback cycle as a basis for lates via W2 the formation of a vector x:(i) in
the self-stabilizing properties of such network layer 1. Based on vector similarity between the
types. ith xj* and the xr a winning neuron in layer 2 can
If a test vector xj, which does not differ from be selected. A high vector similari~ is mathemat-
the training vectors, is presented to such a com- ically expressed by a small vector angle B
pletely trained neural network it will stimulate a
neuron in the second layer to fire. This firing ej(i)[xj, xi*(i)]

neuron in the outstar stimulates via W2 the for- =arcoss[x~xI*(i)]/[~~~~/~*~(x~(i)~~] (3)


mation of a vector x,? in layer 1 which will be
identical to the original test vector xj. The trained At the end, one remaining ith output neuron
neural network comes into perfect resonance with in layer 2 will dominate the competition within
the test vector presented to the input layer. the feedback cycle. Its corresponding vector xj(i)
If a slightly differing test vector is presented to will be compared with the input vector X~ using a
such a trained network that is not significantly vigilance parameter p with
different from the training vectors then the net-
work at first slightly adapts its actual weights Pj(i) =XjTXj*(i)/llXjll (4)
using Eqs. (1) and (2). The network shows a which looks similar to the vector angle in Eq. (3).
certain plasticity. Secondly, resonance occurs A marginal vigilance parameter prnax determines
again between the slightly deviating test vector how much variance within every class box may be
and the trained and new adapted neural network. tolerated, in principle. If pj(i) is smaller than
That is why Grossberg has introduced for his new P max then xi is ‘in resonance’ with x,?(i). In the
type of neural networks with feedback weights following calculation step the weights in Wl and
D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 313

W2 corresponding to the ith output neuron will and in layer 2 the linear increasing transfer func-
be adapted using Eqs. (1) and (2). This adapta- tion
tion reinforces the activity of the ith neuron in
layer 2 compared with its activity after a first
initialization. In this way its relative contrast to
other active neurons is enhanced. Sirnultane- with 6, bias constant and b, amplitude constant.
ously, the activity of these other neurons is low-
ered in a future competition if a new input pat- 2.4. Other ART algorithms
tern is presented that resonates again with the
ith neuron. The completed process is called com- Carpenter and Grossberg [36,37,39,41,42] de-
petitive feedback learning. veloped a series of different ART algorithms.
If pi(i) exceeds prnaxthe network is not able to The main difference between ART-l 1371 and
classify the input vector in any previously formed ART-2 1361and ART-2a [41] is that both ART-2
class box. The network itself concludes that it has algorithms are able to process not only binary
detected a ‘novelty’. Thus, the network decides to coded but continuous data types. The ART-3
create an additional output neuron k + 1 in layer algorithm [39] uses in place of layer 1 and layer 2
2. It increases the dimension of Wl and W2 by additional hidden layers for the treatment of more
one and adapts the additional weights immedi- complicate cluster structures. FUZZY-ART [42]
ately to the deviating input vector using Eqs. (1) can learn binary as well as continuous coded
and (2). feature vectors. ART-MAP [45] has been devel-
oped recently to project two feature spaces i and
2.2. Gain control j onto each other. In this paper we restrict our-
selves to the study of the properties of the ART-l
If, for example, the outstar neural network in strategy for its potential use in the pattern recog-
layer 2 is stimulated independently, it generates a nition of optical spectra. The study of other ART
vector x,? in layer 1. This is again a new input algorithms will be part of future work.
stimulus for layer 2. This can be an undesired
source for cyclic vibrations between layers 1 and 2
without any reliable input. To overcome such 3. Experimental
potential ‘ghost activities’, a bias is used as gain
control. The gain control only opens the gate for 3.1. Experimental 1. W-l/is spectroscopy of chem-
cyclic information flow between layer 1 and layer ical complexes
2 after a new vector has been presented to the
input layer. The classical phenanthroline method [461 is a
technique for the photometric determination of
Fe in the ppm and in the sub-ppm concentration
2.3. Transfer functions range. Fe forms an intensively yellow coloured
Fe-phenanthroline complex (Fe-Phen) having a
The central role of the type of the transfer spectrum with a maximum absorbance between
functions for the effect of a single neuron was 480 and 550 nm. Wienke [471 has shown recently
considered in numerous papers and books. In this that Co forms with phenanthroline an intensively
paper we focus first on the ART-l artificial neu- orange coloured complex (Co-Phen). Its maxi-
ral networks with binary scaled input data. Such mum UV-Vis absorbance near 220 nm continu-
an ART-l based neural network uses in layer 1 a ously decreases to the noise level near 650 nm in
hard limiter transfer function f1 with contrast to the Fe-Phen spectrum. Examples of
the complete absorbance spectra of Fe-Phen and
(5) Co-Phen between 380 and 670 nm were given in
ref. [47].
314 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

Twelve parallel samples of binary aqueous tween 0.3 and 0.8 in the chosen optical compro-
mixtures containing 30 kg Co and 3 c1.gFe were mise range. After that the 12 pure spectra of
prepared according to ref. [46]. The 12 ab- Co-Phen and of Fe-Phen were ranked within
sorbance spectra were recorded in the optical their group in a random order. Then two corre-
range between 380 and 670 nm and numerically sponding spectra from every set were added at
sampled at 13 equidistant wavenumbers in steps every wavenumber (Table 1, LINC014-24 from
of 500 cm-’ (computer driven spectrophotometer ‘linear combination’).
M42, Carl Zeiss, Germany) (Table 1, MIXl-12). Thus, for the pattern recognition study with
Twelve aqueous solutions containing only 3.0 Fg the ART-l neural network two classes (MIX,
Fe were prepared independently using the same LINGO) individually containing 12 experimental
phenanthroline method [46]. In the same way 12 UV-Vis spectra sampled at 13 wavenumbers were
aqueous solutions containing only 30 Fg Co were available (Table 1).
independently prepared. From the 24 pure solu-
tions absorbance spectra were recorded in the 3.2. Experimental 2. Infrared reflectance spec-
same optical range sampled at the same 13 troscopy of ternary Al,O,-CaO-SO, glasses
wavenumbers. By use of a quartz cuvette with 2.0
cm thickness the spectra of the Co-Phen and In the study, 66 differently composed glasses
Fe-Phen solutions reached absorbance values be- were included. The simultaneously changed con-

Table 1
In the pattern recognition study used 24 experimentally measured UV-Vis absorbance spectra of mixtures of Fe-phenanthroline
and Co-phenanthroline in equally concentrated aqueous solutions containing 30 ug Fe and 3 ug Co. MIXl-12 are spectra of 12
chemically prepared binary mixtures of Fe and Co. Spectra LINC012-24 are pairwise linear combinations of 12 separately
prepared pure solutions of Co-phenanthroline and Fe-phenanthroline (mathematically prepared mixture spectra, see text)
Sample name Wavenumber/cm-’
and category
19000 19500 20000 20500 21000 21500 22000 22500 23000 23500 24000 24500 25000
MIX1 1.070 1.166 1.13 1.13 1.116 1.07 1.021 0.98 0.94 0.898 0.842 0.818 0.790
MIX2 1.075 1.176 1.17 1.145 1.121 1.09 1.028 0.998 0.945 0.901 0.854 0.828 0.802
MIX3 1.054 1.17 1.11 1.115 1.10 1.06 0.99 0.962 0.919 0.878 0.820 0.796 0.764
MIX4 1.075 1.178 1.152 1.13 1.136 1.07 1.001 0.975 0.936 0.886 0.835 0.813 0.770
MIX5 1.075 1.21 1.17 1.142 1.119 1.09 1.008 0.98 0.925 0.881 0.832 0.799 0.772
MIX6 1.075 1.22 1.18 1.15 1.14 1.096 1.021 0.99 0.935 0.887 0.838 0.802 0.784
MIX7 1.04 1.188 1.172 1.16 1.145 1.09 1.021 0.97 0.935 0.890 0.840 0.792 0.778
MIX8 1.055 1.20 1.204 1.172 1.155 1.112 1.055 0.982 0.948 0.905 0.857 0.825 0.784
MIX9 1.06 1.193 1.158 1.152 1.137 1.082 1.024 0.986 0.948 0.888 0.855 0.814 0.788
MIX10 1.065 1.202 1.19 1.156 1.155 1.11 1.04 1.012 0.957 0.908 0.860 0.816 0.794
MIX11 1.04 1.17 1.143 1.128 1.105 1.08 1.Ol 0.982 0.945 0.906 0.865 0.822 0.803
MIX12 1.045 1.185 1.16 1.135 1.11 1.09 1.018 0.988 0.95 0.910 0.867 0.827 0.811

LINC013 1.038 1.181 1.147 1.15 1.132 1.11 1.022 0.968 0.916 0.872 0.806 0.756 0.695
LINC014 1.052 1.188 1.173 1.18 1.166 1.121 1.007 0.992 0.931 0.883 0.812 0.768 0.706
LINCOlS 1.048 1.194 1.174 1.155 1.139 1.086 1.022 0.98 0.933 0.873 0.826 0.744 0.736
LINC016 1.062 1.218 1.192 1.175 1.159 1.105 1.035 0.998 0.942 0.890 0.835 0.755 0.747
LINC017 1.051 1.221 1.194 1.162 1.16 1.107 1.02 0.995 0.938 0.884 0.814 0.768 0.718
LINC018 1.073 1.235 1.214 1.178 1.156 1.113 1.032 1.005 0.961 0.898 0.826 0.778 0.727
LINC019 1.052 1.186 1.154 1.16 1.147 1.09 1.017 0.987 0.936 0.860 0.805 0.759 0.714
LINC020 1.075 1.227 1.186 1.188 1.169 1.122 1.025 0.996 0.994 0.887 0.816 0.765 0.728
LINC021 1.068 1.205 1.157 1.15 1.148 1.105 1.02 0.989 0.931 0.880 0.816 0.771 0.718
LINC022 1.072 1.225 1.188 1.161 1.16 1.093 1.037 0.997 0.938 0.886 0.828 0.773 0.740
LINC023 1.053 1.186 1.177 1.138 1.13 1.092 1.005 0.967 0.938 0.854 0.81 0.781 0.697
LINC024 1.063 1.211 1.193 1.149 1.142 1.12 1.088 0.986 0.938 0.873 0.825 0.764 0.718
D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 315

tents of the three oxides ranged between Al,O, artone1.m routine from the MATLAB Neural
= 0.00-36.67 m%, CaO = 10.00-55.0 m% and Networks Toolbox from Demuth and Beale [52].
SiO, = 26.89-80.00 m%. After careful polishing The program concerns visualization of the raw
of the sample surface every glass was character- experimental spectra, graphical two-dimensional
ized by its Fourier transform infrared reflectance principal components mapping, feature autoscal-
(FT-IRR) spectrum in the range 400-1600 cm-’ ing and feature range scaling, feature binary cod-
with a resolution of 1.9285 cm-’ under a con- ing, unsupervised ART-l learning, supervised
stant angle of excitation of 57”. Every sample ART-l classification and graphical visualization
spectrum was obtained from an accumulation of of the weight matrices.
50 repeated scans over the chosen optical range.
Further details of the used standard procedure of
ET-IRR measurements at glass surfaces can be 4. Results and discussion
found in ref. [48]. The 66 raw spectra were re-
duced to the optical range 400-1200 cm-‘. In 4.1. A computer simulation study
this wavenumber region the most important vi-
brations of the ternary Al,O,-CaO-SiO, glasses Eight simulated training spectra and three
can be observed 149,501. The raw high resolution simulated test spectra (Table 2) were used. A
spectra were equidistantly sampled at only 69 visual inspection of the 11 simulated spectra (Ta-
wavenumbers to reduce the amount of data and ble 2, Fig. 1) shows that the training spectra
because of the low number of bands and their LEARNl-4 (Fig. la) form one group and that
broad distributions over the considered spectral the training spectra LEARNS-8 (Fig. la) form a
range. second group with a certain variation within every
Thus, a matrix of 66 spectral pattern vectors single group. The test spectrum TEST1 (Fig. lb)
described by 69 features formed the input data seems to belong to the group LEARNl-4, the
set for the second experimental ART-l study. test spectrum TEST2 seems to belong to
LEARNS-8 and the test spectrum TEST3 seems
3.3. Computational to form a third cluster well separated from the
other spectra. An artificial neural network was
The computer program was written by the designed with 13 input units corresponding to the
authors in MATLAB [511 with partial use of the m = 13 input variables (wavelength intervals) of

Table 2
Computer simulated absorbance spectra sampled at 13 equidistant wavelengths. Two groups LEARNl-4 and LEARNS-8 of in
total eight training spectra and three test spectra TESTl-3 were generated numerically
Name of spectrum Simulated absorbance at wavelenath/nm
- I

220 240 260 280 300 320 340 360 380 400 420 460 480

LEARN1 0.7 1.0 0.8 0.7 0.6 0.6 0.4 0.4 0.4 0.3 0.1 0.1 0.0
LEARN2 0.9 1.0 0.9 0.8 0.6 0.6 0.6 0.5 0.4 0.2 0.0 0.0 0.0
LEARN3 0.6 0.7 1.0 0.9 0.7 0.7 0.7 0.5 0.4 0.4 0.4 0.2 0.0
LEARN4 0.7 0.9 1.0 0.8 0.6 0.5 0.4 0.4 0.1 0.1 0.0 0.0 0.2

LEARN5 0.2 0.1 0.4 0.6 0.7 0.7 0.6 0.6 0.8 0.9 1.0 0.9 0.9
LEARN6 0.1 0.3 0.4 0.4 0.6 0.5 0.6 0.7 0.7 0.9 1.0 0.8 0.7
LEARN7 0.0 0.1 0.1 0.4 0.6 0.6 0.6 0.7 0.8 0.9 0.9 0.9 1.0
LEARN8 0.1 0.2 0.4 0.5 0.8 0.7 0.6 0.7 0.8 1.0 0.8 0.7 0.6

TEST1 0.7 0.9 0.9 0.8 0.6 0.6 0.5 0.5 0.3 0.2 0.1 0.1 0.0
TEST2 0.1 0.2 0.4 0.4 0.5 0.6 0.6 0.7 0.7 0.9 0.9 0.8 0.8
TEST3 0.1 0.2 0.3 0.5 0.7 0.9 1.0 1.0 0.9 0.7 0.2 0.1 0.1
316 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

the it = 11 pattern vectors (spectra). A pool of 20 visual impression obtained from the spectra plots
output units in layer 2 was chosen. The ab- in Fig. 1. Looking further to the results in Table 2
sorbances of the original spectra from Table 1 it is obvious that a certain increase in the vigi-
were binary coded between 0 and 1. One possibil- lance parameter from 0.2 up to 0.9 is followed by
ity for binary coding of continuous data will be the formation of more subclusters within both
given in detail in the next chapter. groups of spectra. Anyway, after formation of the
Five computer experiments were carried out class boxes the network is able to classify the test
by using these 11 binary coded spectra. In every spectra TEST1 and TEST2 into the correct sub-
single computer experiment the ART-1 neural groups. The third test spectrum TEST3 stimu-
network was at first trained with the eight train- lates in every case the formation of a distinguish-
ing spectra LEARNl-8. Then the trained neural ing new class box.
network was used for a reclassification of the
training spectra LEARNl-8 (recognition step)
4.2. Experimental 1. W-Vii spectra of phenan-
and for classification of the test spectra TESTl-3
throline complexes
(prediction step). The only difference between all
five experiments was an increasing size of the
vigilance parameter p chosen in the steps of 4.2.1. Raw spectra
p = 0.2, 0.7, 0.78, 0.80, 0.90. A score plot of the 24 spectra obtained by
From the first classification results (Table 3, principal component analysis (PCA) of the au-
columns 1 and 2) can be seen that the ART toscaled 24 x 13 absorbance matrix from Table 1
network decides to form two output units corre- shows a complete separation of the 12 spectra of
sponding to two detected clusters in the training the MIX group from the 12 spectra of the LINCO
step for p = 0.2. In the test step for p = 0.2 the group in the direction of the second principal
network classifies the spectra TEST1 and TEST2 component (Fig. 2). Wienke [47] described as
into these two formed and trained class boxes. reason for the clear two-group separation of the
However, the spectrum TEST3 belongs neither to 24 spectra a disturbed additivity in the wavenum-
the class 1 nor to the class 2. The network de- ber range between 23 500 and 25 000 cm-
cides to create an additional third output unit.
Thus, the network has detected two clusters in
the training and in the test spectra and one third
cluster in the test set. This corresponds to the
D. Wienke, G. Ka!eman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 317

4.2.2. Binary coding of the spectra 13 absorbance matrix of the 24 experimental


For binary coding of the 24 spectra their 13 UV-Vis spectra (Table 4). To prevent that any
features were autoscaled with the total mean and spectrum gets only zeros as features, the zeros
the total standard deviation over all 24 spectra and ones for wavenumber 25 000 cm- ’
for every single wavenumber. Additionally the 13
features were range scaled between 0 and + 1.
Because of outliers and deviations from normal
distribution the median as robust mean value [56]
was used as a decision threshold for a binary
coding of the between 0 and 1 scaled ab-
sorbances. This resulted in the binary coded 24 X

ABSORBANCE (simulated)

18
1.2

200 240 280 320 360 400 440 480


WAVELENGTH / nm

ABSORBANCE (simulated)

200 240 280 320 360 400 440 480


WAVELENGTH / nm
Fig. 1. (a) The linear connection of the 13 sampling points (‘wavelengths’) for the simulated training spectra from Table 1
LEARNl-4 (a) and LEARNS-8 (+I shows two groups of spectra with a certain within-class variation. (b) A visual comparison
with Fig. 2a illustrates the graphical class membership of the test spectrum TEST1 ( - ) to group LEARNl-4 and of TEST2 (+ ) to
group LEARN%% TEST3 (0) seems not to belong to any of the hvo groups of the training spectra.
318 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

weak separation. For the other remaining optical complete set of 24 spectra was presented 30 times
regions no univariate separation can be observed in random order to the network. A maximum of
in Table 4. 16 epochs was found enough to reach full training
convergence for the slowest learning rate n = 0.1.
4.2.3. Vigilance parameter p and learning rate q This observed fast training convergence of the
As already shown in the simulation study (Ta- neural networks weight matrices is in accordance
ble 3) an increasing value of pmax provides a finer with the general known fact that ART based
resolution of the data set into more subclusters. neural networks can be trained much faster than,
The learning rate 77 as linear parameter in the for example, backpropagation rule based MLF
learning rules (1) and (2) is expected to have an neural networks. A more impressive example for
influence on the category formation, too. To an- fast convergence of ART neural networks is dis-
swer this question more quantitatively 77has been cussed by Carpenter and Grossberg in ref. [42]
systematically varied for the classification prob- where a speed ratio of 5
lem of the 24 binary coded UV-Vis spectra from
Table 4. To do this the neural network was de-
signed with 13 input units corresponding to the
13 wavenumbers and a large enough chosen pool
of 100 output units. However, this pool presents
only potential output units and the network will
decide by itself whether it needs output units
from this pool or not. For every setting combina-
tion of n and p the network was trained. The

0.3
l 20
l 14
l 17
T 0.2 - * * * 18
.m
c 13 * 19 24 * 16
2 * 21 * 22
Eli 0.1 - * 15
x l23
d

ii O
z 07 06
g -0.1 - OS 08
g 0 10

2 -a2 - o3 04 09

si
g -0.3 - 01 02

0 12
0 11
-0.4 I
0.05 0.1 0.15 0.2 0.25 0.3 0.35
PRINCIPAL COMPONENT 1 (47.2 % variance)

Fig. 2. Principal components score plot of the 24 experimental UV-Vis spectra from Table 3 in the space of the first hvo principal
components. The numbers correspond to the mixture spectra MIXl-12 (0) and to the sample numbers of the mixture spectra
LINC013-24 (* ).
D. Wienke, G. Kateman /Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 319

to ‘neglect’ a large part of its previously learned


number of formed clusters
26 knowledge. For n > 1.25, it tends to forget the
1 20
most recently trained knowledge, too. This large
20- n resulted for the 24 UV-Vis spectra, for exam-
ple, in a generation of continuously new class
16 - boxes. After one epoch for n = 1.25 24 output
neurons were active. After a second epoch 48
10 - output neurons, etc. From the finally selected
classification results (Table 5) it can be seen how
larger clusters tend to split into more smaller
subclusters if n is forced. In Table 5 (column
n = 0.5) the MIX group splits from five clusters
6.1 0.3 0.6 0.7 0.9
(2,4,5,6,8) into seven clusters (2 >5 3677 78710,13) for
learning rate v = 0.7. One can further see that the MIX and
z the LINCO groups do not share any common
Fig. 3. Effect of increasing learning rate n on the number of
cluster. In other words, the formation of new
detected clusters in the binary coded experimental UV-Vis
spectra from Table 4 using all 13 wavenumbers as input subgroups occurs separately for each class. This
features to the ART-l artificial neural network. means that no ‘class box overlap’ between the

Table 4
In the pattern recognition study used binary coded experimental UV-Vis spectra from Table 3 (For details of the coding procedure
see text)
Sample Wavenumber/cm-’
19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0
MIX1 1 0 0 0 0 0 1 0 1 1 1 1 0
MIX2 1 0 0 0 0 0 1 1 1 1 1 1 0
MIX3 0 0 0 0 0 0 0 0 0 0 0 1 0
MIX4 1 0 0 0 0 0 0 0 0 0 1 1 0
MIX5 1 1 0 0 0 0 0 0 0 0 1 1 0
MIX6 1 1 1 0 0 1 1 1 0 1 1 1 0
MIX7 0 0 0 1 1 0 1 0 0 1 1 1 0
MIX8 0 1 1 1 1 1 1 0 1 1 1 1 0
MIX9 0 0 0 1 0 0 1 0 1 1 1 1 0
MIX10 1 1 1 1 1 1 1 1 1 1 1 1 0
MIX1 1 0 0 0 0 0 0 0 0 1 1 1 1 0
MIX12 0 0 0 0 0 0 0 1 1 1 1 1 0
LINC013 0 0 0 0 0 1 1 0 0 0 0 0 1
LINC014 0 0 1 1 1 ’ 1 0 1 0 0 0 0 1
LINC015 0 1 1 1 0 0 1 0 0 0 0 0 1
LINC016 1 1 1 1 1 1 1 1 1 1 1 0 1
LINC017 0 1 1 1 1 1 0 1 1 0 0 0 1
LINCOl8 1 1 1 1 1 1 1 1 1 1 0 0 1
LINC019 0 0 0 1 1 0 0 1 0 0 0 0 1
LINC020 1 1 1 1 1 1 1 1 1 1 0 0 1
LINC021 1 1 0 0 1 1 0 1 0 0 0 0 1
LINC022 1 1 1 1 1 1 1 1 1 0 0 0 1
LINC023 0 0 1 0 0 0 0 0 1 0 0 0 1
LINC024 1 1 1 0 1 1 1 0 1 0 0 0 1
320 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

MIX and the LINCO groups exists. This result 4.2.5. Effect of the selection of input features on
agrees with the graphical impression from the the ART-l neural network’s self-organizing behau-
PCA scores plot (Fig. 21. ior
The classical univariate Fisher weights [53] as
4.2.4. Interpretation of the weight matrices given in Fig. 5 as one kind of expression for the
The similarity between all spectra that were separability of the MIX group from the LINCO
classified in the same ith class box with its corre- show the optical range > 23 500 cm-’ to be the
sponding ith column in Wl and with its ith row most differentiating one. This corresponds to the
in W2 becomes obvious by a look into Hinton graphical impression of univariate sorting of ze-
diagrams (Fig. 4). The largest weights announce ros and ones in the binary coded data set (Table
the most frequently appearing joint features for 4) at the position of these wavenumbers. The
one class i. Small weights indicate unimportant optical range from 22000 cm-’ up to 23 500
features for that class. From Fig. 4 it is obvious cm-’ (Fig. 5) shows in contrast to that low Fisher
that the columns 5-24 within Wl and the rows weights as expression for a kind of random order
5-24 within W2 have the same constant size. of the MIXl-12 and LINC013-24 spectra.
That simply means that only the k = 4 output The following results of a wavenumber selec-
neurons are linked to the network. All other tion experiment (Table 6) with the 24 UV-Vis
neurons (k > 4) belong to the pool of potential spectra illustrate how the rejection of subsets of
but actually not required output neurons. When a wavenumbers influences the self-organizing be-
special pattern recognition task has to be solved havior of an ART based neural network. A step-
in the future, for example, that requires k > 4 wise elimination of wavenumbers with a low in-
output neurons then the network will be able to formation content about the separability of the
link additional output units from that pool to its MIX and the LINCO classes from column 1
network structure. (Table 6) to column 5 (Table 6) provides fewer

.7 qBe3mEEiEEiBm~
t-4q S q 2gs q ~BBEBB
m q BmBmEEIEaBBBBB'
a q mmmmmmmmm~ q
q mmmmmmmmmmmm
q mmmmmmmmmmmm
q mmmmmmmmmmmm
q mmmmmmmmmmmm
.- rlq mmmmmmmmmmmm
q mmmmmmmmmmmm
q mmmmmmmmmmmm

L__
m q mmmmmmmmmmmm
u-l q mmmmmmmmmmmm
6 q mmmmmmmmmmmm
q mmm03mmmmmmmm
q mmmmmmmmmmmm
d q mmm~mmmmmmmm
q mmmammmmmmmm
q mmmr3mmmmmmmm
q mmmmmmmmmmmm
q mmmmmmmmmmmm
q mmmmmmmmmmmm
q BmBBBBBBmBBm
q mmmmmmmmmmmm
19000 -22Mx) 2X00
CLASS i @ WAVENUMBER j/cm’
Fig. 4. Graphical presentation of the weight matrices Wl and WZ of a trained ART-l neural network by Hinton diagrams. Every
single ‘cross box’ corresponds to one weight respective to one matrixelement in Wl (a) or WZ (b). The surface of every single cross
box corresponds to a number between 0 (no box) and 1 (large box). The network was trained with the 24 binary coded experimental
UV-Vis spectra from Table 4 using 13 wavenumbers as input features (vigilance parameter p = 0.5, learning rate q = 0.1).
D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 321

Table 5 4.3. Experimental 2. Infrared reflectance spec-


Effect of increasing learning rate 11 on the individual class troscopy of ternary A120,-CaO-SO, glasses
membership of all 24 experimental UV-Vis spectra (Tables 3
and 4) obtained by the ART-l neural network for constant
vigilance parameter p = 0.5 and by using all 13 wavenumbers 4.3.1. Results of a multivariate statistical study
as input features (see also Fig. 4). The numbers represent the A recent study of the relations between the 66
name of a cluster given by the network and in this way the glass spectra performed by Wienke 1471 showed
individual class memberships for a sample in this cluster that the 69-dimensional wavenumber space can
Sample 17 be mapped down to a two- or three-dimensional
0.1 0.3 0.5 0.7 principal component space (Fig. 6a). By means of
MIX1 2 5 5 6 a hierarchical cluster analysis using Ward’s
MIX2 2 5 5 6 method [57-591 two well separated large clusters
MIX3 2 2 2 2 including 39 and 27 glasses were detected in the
MIX4 2 2 2 2
direction of the second principal component (Fig.
MIX5 2 2 2 2
MIX6 3 4 4 5
6b). A correlation analysis of all spectral principal
MIX7 2 7 6 8 components to the gross composition of the
MIX8 3 7 8 10 glasses yielded the result that the second princi-
MIX9 2 5 5 8 pal component is highly correlated (R = 0.944)
MIX10 3 8 8 13
with the Al,O, concentration. Moreover, the cor-
MIX11 2 5 5 7
MIX12 2 5 5 7 responding VARIMAX rotated eigenvector
(‘varivector’ 1531) of principal component 2
LINCOl 1 3 7 4
showed - except in the region between 590-610
LINC02 1 1 1 3
LINC03 1 3 3 1 cm-’ - a spectral band which exactly corre-
LINC04 4 6 10 15 sponds to the maximum position of the main
LINC05 1 1 1 12 band in the reflectance spectrum of pure (Y-
LINC06 4 6 9 14 Al,O,. These results obtained from the classical
LINC07 1 1 1 3
multivariate statistical analysis of the 66 FT-IRR
LINCOS 4 6 9 14
LINC09 1 1 7 9 spectra were in agreement with the hypothesis of
LINCOlO 4 6 9 14 Yamane and Okuyoma [54] about an abrupt in-
LINCOll 1 3 3 1
LINC012 1 3 7 11
Fisher weight
31

2.5
but larger classes. This gives a purer separation 2.27
2.18
into fewer class boxes. A stepwise elimination of 2
wavenumbers with high separation power (from
column 6 to 9, Table 6) provides in contrast to
that an increasing number of subclusters up to a
sharing of common clusters between both groups
of spectra. The pattern MIX2, MIXlO, LINC04,
LINC06 and LINC08 were now wrongly classi-
fied, for example, in one common class box 7
19 19.5 20 20.5 21 21.5 22 22.5 23 23.5 24 24.5 25
(Table 6, column 8). The results show how an
ART-l neural network can fail in its self-organiz-
ing behavior if the set of input variables (here: a wavenumber / lOOO*cm
-1

suitable set of selected wavenumbers) has been


Fig. 5. Univariate Fisher weights (height of the bars) as
chosen without any strategy. Not all of the wrong statistical measure for the separation between the MIX and
or noisy information can be compensated for by the LINCO spectra calculated for all 13 wavenumbers (data
simple adaptation of some corresponding weights. taken from Table 3).
322 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

crease of the coordination of A14+ to A16+ by served as smah simuItaneous changes in the ab-
increase of the relative Al,O, concentration in sorbances over the entire FT-IRR spectrum pro-
the glasses to values higher than lo-15 m%. A viding a clustering of the 66 spectra (Fig. 6a and
higher coordination number of A16+ causes a b).
changed environment for the Al atom. This can
be observed as discontinuities in the X-ray Al K, 4.3.2. Results of an ART-l study
fluorescence shifts f-541.The new molecular bonds The 66 FT-IRR spectra were binary coded as
of the Al species to its neighbors cause tensions. described above for the UV-Vis spectra. Then
This causes finally a change of the glass network they were presented to an ART-l neural network
structure, too. This abrupt change can be ob- in random order having input units corresponding

Table 6
Effect of directed selection of subsets of wavenumbers as input features on the individual class membership of all single 24
experimental W-Vis spectra (Tables 3 and 4) obtained with the ART-l artificial neural network (constant p = 0.25, constant
1) = 0.3)
Name of the Used subset of wavenumbers/cm-’
spectrum 22000 22000 22000 22000 22000
22500 22500 22500 22500 22500 22500
23000 23000 23000 23000 23000 23000 23000
23500 23500 23500 23500 23500 23500 23500 23500
24000 24000 24000 24000 24000 24000
24500 24500 24500 24500 24500 24500 24500
25000 25000 25000 25000 25000 25000 25000
MIX1 2 2 2 2 2 2 2 2 5
MIX2 4 2 2 2 2 4 4 7 9
MIX3 2 2 2 2 2 2 2
MIX4 2 2 2 2 2 2 _ 2
MIX5 2 2 2 2 2 2 _ 2
MIX6 4 2 2 2 2 4 4 6 4
MIX7 2 2 2 2 2 2 2 2 2
MIX8 4 2 2 2 2 4 2 2 5
MIX9 4 2 2 2 2 4 2 2 5
MIX10 4 2 2 2 2 4 4 7 9
MIX11 2 2 2 2 2 2 2 2 5
MIX12 4 2 2 2 2 4 4 5 6

LINCOl 1 1 1 1 1 1 1 1 1
LINC02 1 1 1 1 1 1 3 3 3
LINC03 1 1 1 1 1 1 1 1 1
LINC04 3 4 4 3 3 5 6 7 8
LINCOS 3 3 3 1 1 3 3 4 3
LINC06 3 3 3 3 1 3 6 7 7
LINC07 1 1 1 1 1 1 3 3 3
LINCO8 3 3 3 3 1 3 6 7 7
LINC09 1 1 1 1 1 1 3 3 3
LINCOlO 3 3 3 1 1 3 5 4 7
LINCOll 1 1 1 1 1 1 1
LINCOl2 1 1 1 1 1 1 1 1 1
LINC012 1 1 1 1 1 1 1 1 1
The block on the left-hand side (columns l-5) illustrates the influence of the stepwise rejection of wavenumbers with low
separation power (see Fig. 7). The block on the right-hand side (columns 6-9) shows the effect of the elimination of wavenumbers
with high separation power. (-: corresponding spectrum cannot be classified because the value of all used input features was equal
to zero, see Table 4, text and Eqs. (3) and (41.1
D. Wienke, G. Kateman /Chemometrics and intelligent Laboratory Systems 23 (1994) 309-329 323

to the m = 69 chosen wavenumbers. By using a the weights and a formation of five clusters was
learning rate q = 0.1 and a vigilance parameter obtained (Fig. 64. To be sure about complete
of p = 0.5 after 18 epochs a stable convergence of convergence an upper limit of at least 30 training

@lo3 34

46
-$2 0.2 ;0
‘C
9
3Y l9
22 4 *g
11 38 41
; 0.1 31 ’ 17
P)
77
263 534510 3
272g 1631058
2 15
g O
33 00 Qj

B
42 32~21351p
61
8
a -0.1

44 60 362&J7 4o 24 25
a

2 49 56 35 sfg 2150 237


g -0.2 62
37

-0.3
0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 ;7
PRINCIPAL COMPONENT 1[ 54.14% variance)
a

g3 0

7 0.2 0 0

‘L 8’ o
$
0 o 0
0 0 cl
0.1 0 OO 0
0
0 00 0
08 00
0 oo”
0
ig 0 e 0 0

& 00 ,”
X
8 x OO
U
2 -0.1 X X X
X X

xI% xx

X XX XXX
2 X x

z -0.2 X

I
-0.3
0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17
PRINCIPAL COMPONENT 1(54.14% variance]
Fig. 6. Map of the 66 experimentat FT-IR reflectance spectra onto the space of the first two principal components extracted from
the 69-dimensional space of autoscaled features. (aI Numbers correspond to the sampie numbers; (b) index o and x corresponds to
two large sample groups found by Ward’s hierarchical clustering method; (c) index l-5 corresponds to the class membership as
found by the ART-l artificial neural network.
324 D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

-0.3
0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17
PRINCIPAL COMPONENT 1 ( 54.14 % variance)
Fig. 6. (continued).

Table 7
Most important (in the IR region of 400-1200 cm-’ observable) vibrations of a ternary AlaO,-CaO-SiO, glass system taken from
refs. [49,50]
Element Wavenumber range/cm-’ Observed infrared vibration
Si 400-600 Si-0-Si, bond and rocking motions
Si-O-, nonbridged and at the end position
500 SiO,, 6 motion
790 Si-0-Si bond motion
800-1200 Si-0-Si bond motion
Si-O- bond motion
800,1100 Non-symmetrical, symmetrical bond motion
900-930 Isolated SiO, tetraeders
1050 Si-0-Si

1030 Si-O-
1050 Si-O...H
1090-1120 Si-0-Si
10.52-1111 Si-0-Si
909-970.. 1000 Si-O-

Al 1050 Si-0-Si
800 Characteristic broadening of the non-symmetrical Si-0-Si bond motion
700-740 Formation of an additional band (difficult to identity)
830 Small band if SiO, is present
1000 Intensive band if SiO, is present
620-1250 Characteristic broadening of already existing bands
980 Al-0

Ca 1050 Si-0-Si
800 Coupling of SiO- with non-symmetrical motion (enhanced total intensity)
D. Wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 325

I?
3
I

OR 0.8

._
‘2 Is
; “.6 r 0.6

0.4

0.2

0
1200 1130 1061 991 922 852 763 713 643 514 504 435
e- WA”EN”MBER /cm-’

1.2

0.8 0.8

.-

gi6

cl4 0.4

0.2

O-
1200 ,130 1061 991 922 a52 la3 713 643 574 504 435 1200 1130 1061 991 922 852 783 713 643 574 504 435
a WAVENUMBERlcrd c- WAVENUMBER/cd

0.8

2
; 0.0

0.4

0.2

0
12iN 1130 1061 991 922 852 703 713 643 574 504 435
- WAVENUMBER /cm-’

Fig. 7. Bargraph presentation of the weight vectors Wlj,,, corresponding to the i = 1 . .5 output neurons and j = 1.. .69 input
neurons. The weights were obtained after the training with the 66 experimental fl-IRR spectra and unsupervised formation of five
classes by an ART-l artificial neural network.
326 D. Wienke, G. Kateman /Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329

epochs was chosen. The training run was re- dated for its potential use in chemical pattern
peated 10 times and in every case the same 5 recognition.
clusters with the same individual class member- In contrast to all other types of artificial neural
ship of all pattern vectors were obtained. Com- networks an ART neural network has a less pre-
paring Fig. 6c (index 1, 2) with Fig. 6b (index o, x> defined and less rigid structure. However, during
two similar large and located clusters along the the training the network adapts not only its
second principal component axis are observed. weights but also its entire structure to the offered
This corresponds well to the results of the classi- data. The output of the training are classes of
cal multivariate analysis. Only in region between training pattern and the corresponding class
the two clusters the ART-l neural network gen- memberships for every training pattern. This is
erates three subgroups (index 3,4,5) according to similar to a result that would be obtained from
the chosen size of 77 and p. This hints towards any classical unsupervised working pattern recog-
probably ‘mixed’ substructures in the glasses with nition technique [57-591. However, as shown in
an A14” as well as an A16+ coordination. A look the present study, the direct interpretable weight
at the weights provides more quantitative infor- vectors of a trained ART network carry sup-
mation about the main spectral properties of the pressed information about characteristic proper-
five single clusters (Fig. 7a-d). First, it is obvious ties of the classes. Based on these weights an
that the main important weights belonging to ART network can be used like a supervised work-
output neuron 1 (Fig. 7a, and Fig. 6c class index ing classifier, too. During the prediction step the
1) can be related to the optical range between network continuously keeps learning (plasticity
580 and 1050 cm-‘. This corresponds to the and adaptability) but it is able to form on-line
position of the main optical band of a-Al,O, additional class boxes if required keeping saved
(Table 7) and to the results as found in the the previously trained information (stability and
multivariate study. Class 2 (output neuron 2) in robustness). A simulation study and a processing
Fig. 6c corresponding to low bruto contents of of experimental UV-Vis and FT-IR spectra by
AI,O, shows the most important weights for the ART demonstrated that a suitable selection of
complementary optical regions of 435-580 cm-’ variables for the training vectors forces the for-
and 1080-1200 cm-* (Fig. 7b). In these regions mation of less clusters with a purer separation.
the main vibrations of Si within the glass are The same effect has been achieved by a choice of
found (Table 7). It is well known, too, that pure low values for vigilance parameter p and learning
SO, shows its two main bands in these two rate q.
optical regions and a very small band between Compared to other pattern recognition meth-
600 and 800 cm- ‘. A look at the weights of ods the use of ART based neural networks
output neuron 3 (Fig. 7c, class 3) shows four main promises the following advantages: (i) The unsu-
optical regions: 435-570 cm-‘, 600-700 cm-‘, pervised nature of the training behavior saves
800-1080 cm-’ and 1100-1200 cm-‘. This corre- development time to find a suitable structure for
sponds with the four bands as they can be found a desired supervised classifier. (ii) The process of
in the IT-IRR spectrum of pure CaO as recently competitive learning with the step of compari~n
discussed by Wienke 1471.The other two classes 4 between xi and x,? looks like the kth nearest
and 5 (output neurons 4 and 5) show ‘mixed’ neighbor clustering method (KNN) 1551. How-
structures and will not be discussed here. ever, the (n * k)/2 comparisons of the y1 input
patterns with k weight vectors of the classes with
k c n in place of [(n * n) - nl/2 pair-wise com-
parisons of all input patterns provides a signifi-
5. Conclusions cant gain in computational speed for ART com-
pared to KNN. The observed high training speed
Grossberg’s adaptive resonance theory based of less than 20 epochs for the used data sets
artificial neural network (ART-11 has been eluci- increases the gain in time and additionally
D. wienke, G. Kateman / Chemometrics and Intelligent Laboratory Systems 23 (1994) 309-329 327

promises a potential on-line applicabili~ in run- [3] PC. Jurs, B.R. Kowalski and T.L. Isenhour, Computer-
ized learning machine applied to chemical problems-
ning chemical processes. (iii) One training vector
Molecular formula determination from low resolution
only is enough to initialize the formation of a mass spectrometry, Analytical Chemistry, 4lf1969) 21-26.
class box. (iv) The concept of adaptive resonance [4] T.J. Stonham, I. Aleksander, M. Camp, W.T. Pike and
makes an overtraining theoretically impossible. M.A. Shaw, Classification of mass spectra using adaptive
This is in agreement with the observations in the digital learning networks, Analytical Chemistry, 47 (1975)
1817-1822.
present work, too. (v) Significantly deviating
[5] J.U. Thomson and B. Meyer, Pattern recognition of the
training patterns (outliers) cause the formation of 1H NMR spectra of sugar alditols using a neural net-
additional class boxes. In this way the generation work, Journal of Magnetic Resonance, 84 (1989) 212-217.
of tensions or empty regions in the trained class [6] L.S. Anker and P.C. Jurs, Prediction of Cl3 NMR chemi-
boxes are avoided. cal shifts by artificial neural networks, Analytica! Chem-
&try, 64 (1992) 1157-1164.
On the other hand, the wavelength selection
[7] M. Bos, A. Bos and W.E. v.d.Linden, Processing of
experiment demonstrated some limitations of ion-selective electrode array signals by a neural network,
ART: (i) The cluster structure of the input data Analytical Chimica Acta, 233 (1990) 31-39.
forces the structure of the network. This means [8] T. Nakamoto, K. Fukunishi and T. Moriizumi, Identifica-
that noise can only partly be compensated for by tion capability of odor sensor using quartz resonator
array and neural network pattern recognition, Sensors
a simple adaptation of weights. Very noisy data
and Actuators, Bl (1990) 473-476.
or a lot of outliers cause the formation of numer- [9] SM. Chay, Y. Iwasaki, M. Suzuki, E. Tamiya, I. Karube
ous subclusters. (ii) The development of a con- and H. Maramatsu, Detection of odorants using an array
cept of individual vigilance parameters p for the of piezoelectric crystals and neural network pattern
description of the individual sizes of different recognition, Analytical Chimicu Acta, 249 (1991) 323-329.
class boxes could be useful. At this moment, for [lo] M. Bos and H.T. Weber, Comparison of the training of
neural networks for quantitative X-ray fluorescence spec-
example, ART is modeling curved hyperboxes by trometry by a genetic algorithm and backward error
a series of regular shaped hyperboxes. propagation, Analytical Chimica Acta, 247 (1991) 97-105.
However, this has been a first study about the [II] M. Glick and G.M. Hieftje, Classification of alloys with
principles of ART for chemical pattern recogni- an artificial neural network and multivariate calibration
of glow discharge emission spectra, Applied Spectroscopy,
tion demonstrated by processing UV-Vis and
45 (1991) 1706-1716.
FT-IR spectra. In the upcoming research the [12] M. Otto, T. George, C. Schierle and W. Wegscheider,
ART-2 based artificial neural network will be Fuzzy logic and neural networks - Applications to ana-
studied 1601. lytical chemistry, Pure and Applied Chem~t~, 64 (1992)
497-502.
[13] C. Schierle, M. Otto and W. Wegscheider, A neural
network approach to qualitative analysis in inductively
Acknowledgement
coupled plasma atomic emission spectroscopy (ICP-AES),
Fresenius’ Journal of Analytical Chemistry, 343 (1992)
The authors are grateful to G. Pflug and H. 561-565.
Dunken (University of Jena, Germany) for offer- [14] E.W. Robb and E.M. Munk, A neural network approach
ing the FT-IR reflectance spectra to the authors, to infrared spectrum interpretation, Microchimica Acta
The authors would like to thank both referees Wien), I(19901 131-139.
[15] B.J. Wythoff, S.P. Levine and S.A. Tomellini, Spectra
for their valuable hints and comments.
peak verification and recognition using a multilayered
neural network, Ana~ticai Chern~t~, 62 (1990) 2702-
2709.
References

[1] G. Kateman, Evolutions in Chemometrics, Analyst, 115 (1990) 487-493.
[2] D. Wienke, C. Lucasius, M. Ehrlich and G. Kateman, Multicriteria optimization of analytical procedures using a genetic algorithm, Analytica Chimica Acta, 271 (1993) 253-268.
[5] J.U. Thomson and B. Meyer, Pattern recognition of the 1H NMR spectra of sugar alditols using a neural network, Journal of Magnetic Resonance, 84 (1989) 212-217.
[6] L.S. Anker and P.C. Jurs, Prediction of C13 NMR chemical shifts by artificial neural networks, Analytical Chemistry, 64 (1992) 1157-1164.
[7] M. Bos, A. Bos and W.E. van der Linden, Processing of ion-selective electrode array signals by a neural network, Analytica Chimica Acta, 233 (1990) 31-39.
[8] T. Nakamoto, K. Fukunishi and T. Moriizumi, Identification capability of odor sensor using quartz resonator array and neural network pattern recognition, Sensors and Actuators, B1 (1990) 473-476.
[9] S.M. Chay, Y. Iwasaki, M. Suzuki, E. Tamiya, I. Karube and H. Muramatsu, Detection of odorants using an array of piezoelectric crystals and neural network pattern recognition, Analytica Chimica Acta, 249 (1991) 323-329.
[10] M. Bos and H.T. Weber, Comparison of the training of neural networks for quantitative X-ray fluorescence spectrometry by a genetic algorithm and backward error propagation, Analytica Chimica Acta, 247 (1991) 97-105.
[11] M. Glick and G.M. Hieftje, Classification of alloys with an artificial neural network and multivariate calibration of glow discharge emission spectra, Applied Spectroscopy, 45 (1991) 1706-1716.
[12] M. Otto, T. George, C. Schierle and W. Wegscheider, Fuzzy logic and neural networks - applications to analytical chemistry, Pure and Applied Chemistry, 64 (1992) 497-502.
[13] C. Schierle, M. Otto and W. Wegscheider, A neural network approach to qualitative analysis in inductively coupled plasma atomic emission spectroscopy (ICP-AES), Fresenius' Journal of Analytical Chemistry, 343 (1992) 561-565.
[14] E.W. Robb and M.E. Munk, A neural network approach to infrared spectrum interpretation, Microchimica Acta (Wien), I (1990) 131-139.
[15] B.J. Wythoff, S.P. Levine and S.A. Tomellini, Spectral peak verification and recognition using a multilayered neural network, Analytical Chemistry, 62 (1990) 2702-2709.
[16] J.R.M. Smits, P. Schoenmakers, A. Stehmann, F. Systermans and G. Kateman, Interpretation of infrared spectra with modular neural network systems, Chemometrics and Intelligent Laboratory Systems, 18 (1993) 27-39.
[17] M. Meyer and T. Weigelt, Interpretation of infrared spectra by artificial neural networks, Analytica Chimica Acta, 265 (1992) 183-190.
[18] K. Tanabe, T. Tamura and H. Uesaka, Neural network system for identification of infrared spectra, Applied Spectroscopy, 46 (1992) 807-810.
[19] J.R. Long, V.G. Gregoriou and P.J. Gemperline, Spectroscopic calibration and quantitation using artificial neural networks, Analytical Chemistry, 62 (1990) 1791-1797.
[20] P.J. Gemperline, J.R. Long and V.G. Gregoriou, Nonlinear multivariate calibration using principal components regression and artificial neural networks, Analytical Chemistry, 63 (1991) 2313-2323.
[21] J.R. Long, H.T. Mayfield, M.V. Henley and P.R. Kromann, Pattern recognition of jet fuel chromatographic data by artificial neural network with backpropagation of errors, Analytical Chemistry, 63 (1991) 1256-1261.
[22] H. Lohninger, Classification of mass spectral data using neural networks, in J. Gmehling (Editor), Software Development in Chemistry, Vol. 5, Springer, Berlin, 1991.
[23] J. Zupan and J. Gasteiger, Neural networks: A new method for solving chemical problems or just a passing phase?, Analytica Chimica Acta, 248 (1991) 1-30.
[24] P.A. Jansson, Neural networks: An overview, Analytical Chemistry, 63 (1991) 357A-362A.
[25] M. Gross and F. Seibert, Neural network for image analysis of environmental protection, in R. Denzer (Editor), Visualisierung von Umweltdaten, Springer, Berlin, 1991.
[26] P. Arrigo, F. Giuliano, F. Scalia, A. Rapallo and G. Damiani, Identification of a new motif on nucleic acid sequence data using Kohonen's self-organizing map, Computer Applications in the Biosciences, 7 (1991) 353-357.
[27] V.S. Rose, I.F. Croall and H.J.H. MacFie, An application of unsupervised neural network methodology (Kohonen topology preserving mapping) to QSAR analysis, Quantitative Structure-Activity Relationships, 10 (1991) 6-15.
[28] W.J. Melssen, J.R.M. Smits, G.H. Rolf and G. Kateman, Two-dimensional mapping of IR spectra using a parallel implemented Kohonen network, Chemometrics and Intelligent Laboratory Systems, 18 (1993) 195-204.
[29] P. de B. Harrington, Minimal neural networks: Differentiation of classification entropy, Chemometrics and Intelligent Laboratory Systems, 19 (1993) 143-154.
[30] C. Borggaard and H.H. Thodberg, Optimal minimal neural interpretation of spectra, Analytical Chemistry, 64 (1992) 545-551.
[31] S. Grossberg, Studies of Mind and Brain, Reidel, Dordrecht, 1982.
[32] S. Grossberg, The Adaptive Brain I - Cognition, Learning, Reinforcement and Rhythm (Advances in Psychology Series, Vol. 42), North Holland, Amsterdam, 1987.
[33] S. Grossberg, The Adaptive Brain II - Cognition, Learning, Reinforcement and Rhythm (Advances in Psychology Series, Vol. 43), North Holland, Amsterdam, 1987.
[34] S. Grossberg and M. Kuperstein, Neural dynamics of adaptive sensor-motor control, in Neural Networks Research and Applications, Pergamon Press, New York, 1989.
[35] S. Grossberg, Adaptive pattern classification and universal recoding, I. Parallel development and coding of neural feature detectors, Biological Cybernetics, 23 (1976) 121-134.
[36] G.A. Carpenter and S. Grossberg, ART-2: self-organization of stable category codes for analog input patterns, Applied Optics, 26 (1987) 4919-4930.
[37] G.A. Carpenter and S. Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics and Image Processing, 37 (1987) 54-115.
[38] G.A. Carpenter and S. Grossberg, The ART of adaptive pattern recognition by a self-organizing neural network, IEEE Computer, 21(3) (1988) 77-88.
[39] G.A. Carpenter and S. Grossberg, ART-3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures, Neural Networks, 3(2) (1990) 129-152.
[40] S. Grossberg, Attention and recognition learning by adaptive resonance, Behavioral and Brain Sciences, 13(2) (1990) 241.
[41] G.A. Carpenter, S. Grossberg and D.B. Rosen, ART-2A: an adaptive resonance algorithm for rapid category learning and recognition, Neural Networks, 4 (1991) 493-504.
[42] G.A. Carpenter, S. Grossberg and D.B. Rosen, Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks, 4 (1991) 759-771.
[43] L.I. Burke, Clustering characterization of adaptive resonance, Neural Networks, 4 (1991) 485-491.
[44] K.W. Gan and K.T. Lua, Chinese character classification using an adaptive resonance network, Pattern Recognition, 25 (1992) 877-882.
[45] G.A. Carpenter, S. Grossberg and J.H. Reynolds, ARTMAP: Supervised real-time learning and classification of non-stationary data by a self-organizing neural network, Neural Networks, 4 (1991) 565-588.
[46] H.M. Köster, Die chemische Silikatanalyse - Spektralphotometrische, komplexometrische und flammenfotometrische Analysen-Methoden, Springer, Berlin, 1979.
[47] D. Wienke, Application and Development of Multivariate Statistical Methods for Trace Analysis, Structure Analysis and Glass Technology (Ph.D. Thesis), University of Jena, Jena, 1990.
[48] R. Stephanowitz, Theoretische und experimentelle Untersuchungen zur Anwendung der IR-Reflexionsspektroskopie zur objektiven Charakterisierung von Glasoberflächen (Ph.D. Thesis), University of Jena, Jena, 1987.
[49] H. Dunken, Physikalische Chemie der Glasoberfläche, Deutscher Verlag für Grundstoffindustrie, Leipzig, 1981.
[50] D.M. Sanders, W.B. Person and L.L. Hench, Quantitative analysis of glass structure with the use of infrared reflectance spectroscopy, Applied Spectroscopy, 28 (1974) 247-259.
[51] C. Moler, S. Bangert, S. Kleiman and J. Little, MATLAB User Guide, The MathWorks Inc., Natick, MA, 1990.
[52] H. Demuth and M. Beale, Neural Network Toolbox for Use with MATLAB, User Guide, The MathWorks Inc., Natick, MA, 1992.
[53] M.A. Sharaf, D.L. Illman and B.R. Kowalski, Chemometrics, Wiley, New York, 1986.
[54] M. Yamane and O. Okuyama, Coordination number of aluminium ions in alkali-free aluminosilicate glasses, Journal of Non-Crystalline Solids, 52 (1982) 217-223.
[55] K. Varmuza, Pattern Recognition in Chemistry (Lecture Notes in Chemistry), Springer, Berlin, 1978.
[56] P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection, Wiley, New York, 1987.
[57] D. Steinhausen and K. Langer, Clusteranalyse - Einführung in Methoden und Verfahren der automatischen Klassifikation, De Gruyter, Berlin, 1977.
[58] H. Späth, Clusteranalyse-Algorithmen zur Objektklassifizierung und Datenreduktion, Oldenbourg, München, 1977.
[59] M. Goldstein and W.R. Dillon, Multivariate Analysis: Methods and Applications, Wiley, New York, 1984.
[60] D. Wienke, Y. Xie and P.K. Hopke, An adaptive resonance theory based artificial neural network (ART-2A) for rapid classification of airborne particles by their SEM images, Chemometrics and Intelligent Laboratory Systems, submitted for publication.