T. BLAFFERT
Philips GmbH Forschungslaboratorium Hamburg, Vogt-Kölln-Straße 30, D-2000
Hamburg 54 (West Germany)
(Received 20th December 1983)
SUMMARY
A new approach to the interpretation of spectra with “fuzzy sets” is described. A com-
puter program CIF (Compound Identification with Fuzzy sets) is applied. This program is
capable of finding components in a mixture by comparing the sample spectrum with
reference spectra in a library. The applications discussed involve the interpretation of
infrared spectra. The problems of spectral library search are discussed, an elementary
introduction to fuzzy set theory is given, and applications to spectral library search are
demonstrated.
[Figure: processing chain. Spectral input data -> feature extraction -> feature vector or set -> classification -> decision.]
Fig. 3. Fuzzy set connections. Comparison of measured set of lines (A) and reference set
of lines (B) using conventional intersection. There are no lines in the intersect set. Mem-
bership of the set is indicated by MEMB.
Fig. 4. Fuzzy set connections. Comparison of a fuzzed set of measured lines (A) and a
reference set of lines (B) using the fuzzy intersection. The intersect set (C) reflects all
lines, but some of them have a degree of membership smaller than 1. The peaks in A
represent position distribution, not intensity distribution.
(2)
as shown in Fig. 5(C). The power of a fuzzy set A, for a finite population X,
is
$$N_A = \sum_{x \in X} f_A(x) \qquad (3)$$
(see Fig. 5D). The last four definitions are consistent with conventional set
theory if f is limited to the dual set {0, 1}.
An important relationship exists between the containment and the inter-
section: if a (reference) set A is contained in a (measured unknown) set B,
then the equation $A \cap B = A$ holds, and the fuzzy set power of the intersection
set is equal to the power of A. Thus the fuzzy set power of an
intersection is a measure of containment of a (reference) set A within the
(measured) set B. The containment measure can be normalized by
$C_{AB} = N_{A \cap B} / N_A$.
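The relationship between power, intersection and containment can be sketched in a few lines of Python. This is a minimal illustration, not the CIF program: it assumes the standard pointwise-minimum operator for the fuzzy intersection, a finite support stored as a dictionary from line position to membership grade, and the hypothetical helper names `power`, `intersect` and `containment`.

```python
def power(f):
    """Fuzzy set power (Eqn. 3): sum of membership grades over the finite support."""
    return sum(f.values())


def intersect(fa, fb):
    """Fuzzy intersection, assumed here to be the pointwise minimum of the grades."""
    return {x: min(g, fb[x]) for x, g in fa.items() if x in fb}


def containment(fa, fb):
    """Normalized containment: power of the intersection divided by the power of A."""
    na = power(fa)
    return power(intersect(fa, fb)) / na if na else 0.0


# Hypothetical data: positions are arbitrary integers, grades lie in [0, 1].
reference = {1000: 1.0, 1200: 1.0, 1450: 1.0}
measured = {1000: 1.0, 1200: 0.5, 1600: 1.0}

# Intersection grades are 1.0 and 0.5 (1450 is absent), so 1.5 / 3 = 0.5.
c_ab = containment(reference, measured)
```

A reference that is fully contained in the measured set gives a value of 1, and a reference sharing no lines with it gives 0.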
$$S = \bigcup_{j=1}^{n} S^{j} \qquad (4)$$
with $f_{S^j}(x) = 1$ if $x$ is the location of the jth peak, but $f_{S^j}(x) = 0$ otherwise.
So far, S is a conventional set (Fig. 3A). Of course, a measured spectrum
is not exact because random errors introduce uncertainty about which
features are really related to the measured values; not only a spectral line
with the exact measured value, but also lines in its neighbourhood may in
fact be the same. One solution is to define a “hard window”; this is an
interval of m points around a measured line. All lines in this interval may
then belong to the spectrum. This can be expressed by $S = \bigcup_{j=1}^{n} S^{j}$, with
$f_{S^j}(x) = 1$ if $x \in [x_j - m, x_j + m]$ and $x_j$ is the location of the jth peak, but
$f_{S^j}(x) = 0$ otherwise. Figure 6 demonstrates the hard window approach.
Various search programs use this hard window. The disadvantage in this
concept is that a small variation around $x_j - m$ or $x_j + m$ can make a large
difference to belonging or not belonging to a set. This disadvantage can be
avoided by using a continuous grade of membership between 0 and 1. A
reference line with exactly the same x as the measured line has a membership
value of 1. The membership value diminishes with increasing distance
from the measured line, and if the distance is very large, the membership
value becomes zero. The shape of the membership function might be Gaus-
sian or Lorentzian, but it is not necessary to choose statistical functions.
The degree of membership expresses at least a subjective estimate of uncer-
tainty, which is established by the experience of the operator. Nevertheless,
[Figure 6 plot, panel C: membership (ticks at 0.5 and 1.0) versus LINE POSITION (800 to 1800).]
Fig. 6. Fuzzy set connections: comparison of spectra using the hard window approach.
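with $f_{\tilde{S}^j}(x) = \exp[-(x_j - x)^2 / 2\sigma^2]$ if $x \in [x_j - \alpha\sigma, x_j + \alpha\sigma]$, $x_j$ being the location
of the jth peak; otherwise $f_{\tilde{S}^j}(x) = 0$. This is equivalent to
$$f_{\tilde{S}}(x) = \max_{j} [f_{\tilde{S}^j}(x)] \qquad (6)$$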
resulting from contributions of more than one initial spectral line. In con-
trast, the “max” union operator implies a decision on which spectral line
contributes to a membership value.
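The difference between the hard window and a continuous grade of membership can be illustrated with a short sketch. It is not taken from the CIF program: the function names, the line positions and the values of the width `sigma`, the cutoff factor `alpha` and the window `m` are all hypothetical, and a truncated Gaussian is only one of the possible membership shapes mentioned above.

```python
import math


def hard_window(x, peak, m):
    """Crisp membership: 1 inside [peak - m, peak + m], 0 outside."""
    return 1.0 if abs(x - peak) <= m else 0.0


def gaussian_membership(x, peak, sigma, alpha):
    """Gaussian grade exp(-(peak - x)^2 / (2 sigma^2)), zero beyond alpha * sigma."""
    if abs(x - peak) > alpha * sigma:
        return 0.0
    return math.exp(-((peak - x) ** 2) / (2.0 * sigma ** 2))


def fuzzed_spectrum(x, peaks, sigma, alpha):
    """'max' union (Eqn. 6): the best-fitting peak alone decides the grade."""
    return max((gaussian_membership(x, p, sigma, alpha) for p in peaks), default=0.0)


peaks = [1000.0, 1020.0]            # hypothetical measured line positions
sigma, alpha, m = 5.0, 3.0, 10.0    # hypothetical width, cutoff factor and window

# A one-unit shift across the window edge flips the crisp membership ...
edge_inside = hard_window(1010.0, 1000.0, m)
edge_outside = hard_window(1011.0, 1000.0, m)

# ... whereas the fuzzy grade falls off gradually with distance from the peak.
g_exact = fuzzed_spectrum(1000.0, peaks, sigma, alpha)
g_near = fuzzed_spectrum(1005.0, peaks, sigma, alpha)
g_far = fuzzed_spectrum(1100.0, peaks, sigma, alpha)
```

At position 1005 both peaks lie within the cutoff, but only the closer one (at 1000) determines the grade, which is the decision the "max" operator implies.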
Another method of determining the similarity between sample and refer-
ence spectra is to calculate the variance
$$s^2 = \sum_{j} (x_j - x_{jR})^2 / (n - 1) \qquad (8)$$
Pairs of related lines $x_j$, $x_{jR}$ are needed, which have to be selected by additional
rules. The problem is that the jth reference line does not necessarily
correspond to the jth line in the spectrum because missing lines in the spec-
trum, extra lines from additional components, unresolved doublets, etc.,
distort the order of lines in a list. Sets are compared rather than vectors.
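The pairing problem can be made concrete with a short sketch. The pairing rule used below (each reference line is matched to the nearest sample line) is a hypothetical choice for illustration only, since the additional rules are not specified here; the divisor is assumed to be the number of paired lines minus one, and all line positions are invented.

```python
def pair_nearest(sample, reference):
    """Hypothetical pairing rule: match each reference line to the nearest sample line."""
    return [(min(sample, key=lambda x: abs(x - r)), r) for r in reference]


def variance(pairs):
    """Sum of squared position differences over n pairs, divided by n - 1."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired lines")
    return sum((x - r) ** 2 for x, r in pairs) / (n - 1)


reference = [1000.0, 1200.0, 1450.0]
# Hypothetical sample: one extra line (1100) from another component and
# one missing line (1450); the remaining lines are shifted by 2 units.
sample = [1002.0, 1100.0, 1202.0]

# Pairing by list index compares 1100 with 1200 and 1202 with 1450.
s2_by_index = variance(list(zip(sample, reference)))

# Nearest-line pairing still forces a partner (1202) onto the missing 1450 line.
s2_nearest = variance(pair_nearest(sample, reference))
```

In both cases the result is dominated by mismatched pairs rather than by the 2-unit shift of the lines that do correspond, which suggests why a comparison of sets is preferred to a comparison of ordered lists.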
with $f_{R_k^j}(x) = 1$ if $x$ is the location of the jth line of reference $k$, but is otherwise
equal to zero. Here, $k$ is a reference number in the reference library, and
$n_k$ is the number of lines in reference $k$. The application of the intersection
operator leads to a result set
given by the fuzzy set power (Eqn. 3). It must be assumed that the support
(spectrum range) consists only of finite elements, but this is always true in
computerized spectral analyses. In the special case of intersection (Eqn. 10),
this power is
Fig. 8. Comparison of reference spectra with a two-component unknown sample: (A) the
measured spectrum; (B) and (C) two reference sets after a fuzzy intersection.
Fig. 9. Line comparison with fuzzy sets including line intensities. A fuzzed set (A) inter-
sected by a reference (B) yields set (C).
COMPONENT IDENTIFICATION WITH FUZZY SETS 145
Fig. 10. Comparison of the measured set (D) with the fuzzed combination (C) of reference
spectra (A, B) by a fuzzy intersection (E).
TABLE 1
Score table of the fuzzy match and the component identification [CIF Version 2.1 (IR)]
for the sample mixture of Fig. 1. All 4 components are correctly identified in the best
combination (a). In the remaining combinations (b, c, d), the hexane is replaced by another
similar compound.
the number of fitting lines, and the 1% value gives an estimate of the relative
concentrations. The compounds belonging to the four best possible combinations
are marked with A, B, C, D; all compounds with an “A” belong to the
best combination, all compounds with a “B” to the second best, etc. A list
with the combinations printed together is presented in Table 2.
CONCLUSION
TABLE 2
List of best combinations [CIF Version 2.1 (IR)] for the sample mixture of Fig. 1; compounds
belonging to a combination are printed together.
The author thanks W. J. Dallas for helpful discussions and P. Smit for
his advice on crystallographic problems and the preparation of samples for
the program tests. The author also appreciates the work and helpful com-
ments of R. Jenkins and W. N. Schreiner, which provided a basis for the CIF
REFERENCES