Original Russian Text © S.S. Adjemov, N.V. Klenov, M.V. Tereshonok, D.S. Chirov, 2016, published in Programmirovanie, 2016, Vol. 42, No. 3.
Abstract—Methods of classification of signal sources in cognitive radio systems based on artificial neural networks are discussed. A novel method for improving the noise immunity of RBF networks is suggested. It is based on introducing an additional self-organizing layer of neurons, which ensures automatic selection of the variances of the basis functions and a significant reduction of the network dimension. It is shown that the use of auto-associative networks in the problem of classification of signal sources makes it possible to minimize the feature space without significant deterioration of its separation properties.
DOI: 10.1134/S0361768816030026
A multilayer perceptron (MP) is most frequently used in the identification problem. However, the use of this type of neural networks in problems of cognitive …

… where n_j is the number of objects of class j in the training set and the parameter σ specifies the width of the basis functions and their effect.

[Fig. 1: structure of the self-organizing RBF network — Kohonen layer, sample layer, summing layer, and output layer; weights W_j, σ_j.]
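As a hedged illustration (a sketch, not the authors' code), the class-density estimate discussed here — a sum of Gaussian-like basis functions over the training instances of one class, cf. expression (2) in the text — can be written in a few lines of Python. The feature values are taken from the two-class worked example in the text, and the squared width is set to 0.02, as in that example:

```python
import math

# Training values of the feature x for the two classes,
# taken from the paper's worked example.
train = {
    "A": [0.25, 0.3],
    "B": [0.769, 0.766, 0.768, 0.762, 0.764, 0.76, 0.761],
}

def class_density(x, samples, sigma_sq):
    # Expression (2): sum of Gaussian-like basis functions centered
    # at the training instances of one class.
    return sum(math.exp(-(x - xi) ** 2 / sigma_sq) for xi in samples)

def classify(x, sigma_sq=0.02):
    # Assign x to the class with the largest density estimate.
    densities = {c: class_density(x, s, sigma_sq) for c, s in train.items()}
    return max(densities, key=densities.get), densities

label, g = classify(0.5)
print(label, round(g["A"], 6), round(g["B"], 6))  # B 0.179272 0.213672
```

The point x = 0.5 is assigned to class B despite being closer (in distance) to the class A instances, which is exactly the effect analyzed in the worked example in the text.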
Thus, an RBF network does not need learning like that for networks with back propagation of errors, since all parameters of the RBF network (the number of elements and the weights) are determined directly by the learning data. The only external parameter is the width of the basis functions σ. It can be seen from formula (1) that the value of σ greatly affects the estimate of the distribution density over the classes and, accordingly, the result of the classification. This is most critical in the cases of low signal-to-noise ratios, which lead to considerable fluctuations of the observed parameters of the signals. This gives rise to the necessity of a method for estimating the optimal width of the basis functions σ.

One of the basic problems associated with the selection of σ is that different classes of objects have, as a rule, different values of the intraclass variance. For example, the values of the intraclass variance of the signal parameters of different classes of one mobile communication system measured on one and the same receiver may differ by more than an order of magnitude. One of the ways to take into account the fact that the statistical characteristics of different classes can differ considerably is to specify different values of σ_j for each class of objects. Thus, expression (1) takes the form

    g_j(x) = Σ_{i=1}^{n_j} exp(−||x − x_i||² / σ_j²),   (2)

where the parameter σ_j specifies the function width for class j. The value of σ_j for each class should be selected by way of analysis of the training set. This goal can be achieved by means of a self-organizing Kohonen map (SOM). As shown in [1–3], the SOM ensures efficient clusterization of the training set of radio signals together with an estimate of the intraclass variance for each informative parameter. The values of the intraclass variance obtained by means of the SOM can be used as estimates of the basis function widths σ_j. Another advantage of the SOM is calculation of the center positions for each class.

Like in an RBF network, each neuron of the SOM is an n-dimensional column vector of weight coefficients

    W = [w_1, w_2, …, w_n]^T,   (3)

where n is determined by the dimension of the original space. The SOM training is based on competition of neurons [3, 4]. As the result of training of a one-layer SOM, N clusters are formed, each of which contains signals of one class. A cluster may consist of one or of several neurons. Each neuron is a set of weight coefficients W_j, which are the averaged values of the parameters of the training-set instances grouped around the given neuron with some σ_j. The process of SOM training consists in successive correction of the vectors representing the neurons and proceeds as follows:

1. One vector x from the training set is randomly selected.
2. A winner neuron w_c is determined as the vector that is most similar to the input vector; the similarity is meant to be the Euclidean distance between the vectors.
3. The SOM weights are corrected according to the formula
    w_i(t + 1) = w_i(t) + h_ci(t)·(x(t) − w_i(t)),   (4)

where t is the epoch number. The function h(t) is called the neuron neighboring function. It can be divided into a distance function and a training-rate function and is defined as follows:

    h(t) = h(||r_c − r_i||, t)·α(t),   (5)

where r determines the neuron position in the grid, h(d, t) = exp(−d²/(2δ²(t))) is the distance function, δ(t) is the correction radius, d = ||r_c − r_i|| is the distance between the winner neuron and the ith neuron, α(t) = A/(t + B) is the rate function, and A and B are rate constants. At the first stage, quite large values of the rate parameters A and B and of the radius δ(t) are selected, which allows us to position the weight vectors in accordance with the distribution of signals in the sample being analyzed. Then, fine weight adjustment is carried out. As a result, we obtain rate parameters that are much smaller than the initial ones.

If the values W_j and σ_j obtained after the SOM training are used in the construction of an RBF network, then the neural network obtained possesses the following advantages compared to the standard one [3, 4]:
• lesser dimension;
• higher noise immunity of the classification owing to the individual width of the basis functions for each neuron.

The structure of a self-organizing neural network on radial basis functions is presented in Fig. 1.

Thus, the following classification method is suggested for classification–recognition of signals with optimal (in terms of noise immunity) selection of the width σ_j of the potential functions and minimization of the RBF network size:

1. To carry out clusterization of the training set by means of a one-layer self-organizing Kohonen map.
2. Using the W_j and σ_j obtained, to form an RBF network.
3. To classify new data using the RBF network synthesized in this way.

The proposed method makes it possible to considerably improve the correctness and noise immunity of signal classification and to reduce the size of the RBF network.

To illustrate the efficiency of the proposed method, we consider an example. Suppose that we have two classes of signals, A and B. To classify the signals, we will use values of a feature x. Signals from class B are measured more accurately than signals from class A. For the training set, we have the measured values of the feature x of signals from classes A and B, which are presented in the table.

Table. Results of calculation of probability distribution density

    Class label   Feature value   Distribution density   Distribution density
                                  by samples             by classes
    A             0.25            0.043936934            0.179272
    A             0.3             0.135335283
    B             0.769           0.026834954            0.213672
    B             0.766           0.029077227
    B             0.768           0.027565232
    B             0.762           0.032315768
    B             0.764           0.03065989
    B             0.76            0.034047455
    B             0.761           0.033171971

We will train the RBF network on the sample presented in the table (σ = 0.02) and consider the network operation when classifying a signal with the feature value x = 0.5. The third and fourth columns of the table show the results of calculation of the probability distribution density for each instance and for classes A and B at x = 0.5.

From the data presented in the table, it can be seen that the signal corresponding to the value x = 0.5 will be classified into class B, since class B has the greater probability density in this region, although, in terms of distance, the point x = 0.5 is closer to class A (see Fig. 1). This is due to the nonuniformity of the network training set, which has many more instances of class B than of class A. Moreover, due to the higher accuracy of the measurements, the instances of class B are located near the class centroid more densely than those of class A.

As a result of clusterization of the training set presented in the table, three clusters with the centroids 0.25 (A), 0.3 (A), and 0.764 (B) and the corresponding variances were obtained by using a SOM. Training of the RBF network with these values and classification of the point x = 0.5 yielded the following values of the probability density at this point (see Fig. 1): 0.179272 for class A and 0.030429 for class B. Thus, the probabilistic and metric estimates yielded identical results.
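The SOM training steps described above — winner selection by Euclidean distance and the weight correction (4) with neighborhood function (5) — can be sketched in one dimension as follows. This is an illustrative sketch: the constants A and B, the radius schedule δ(t), and the cyclic (rather than random) presentation of the training vectors are assumptions for reproducibility, not the authors' settings:

```python
import math

def train_som(data, n_neurons=3, epochs=200, A=1.0, B=2.0, delta0=1.0):
    # Neurons sit on a 1-D grid; weights are initialised by spreading
    # them over the range of the data.
    lo, hi = min(data), max(data)
    w = [lo + (hi - lo) * i / (n_neurons - 1) for i in range(n_neurons)]
    for t in range(epochs):
        alpha = A / (t + B)                  # rate function alpha(t) = A/(t + B)
        delta = max(delta0 / (t + 1), 1e-3)  # shrinking correction radius delta(t)
        for x in data:                       # cyclic presentation (illustrative)
            # Step 2: the winner is the neuron closest to the input vector.
            c = min(range(n_neurons), key=lambda i: abs(x - w[i]))
            for i in range(n_neurons):
                d = abs(c - i)               # grid distance between winner and neuron i
                h = math.exp(-d * d / (2 * delta * delta)) * alpha  # formula (5)
                w[i] += h * (x - w[i])       # formula (4)
    return w

# Feature values from the two-class example in the text.
data = [0.25, 0.3, 0.769, 0.766, 0.768, 0.762, 0.764, 0.76, 0.761]
weights = sorted(train_som(data))
print([round(v, 3) for v in weights])
```

On these nine values, the largest weight settles near the centroid of the class B cluster (about 0.764), in line with the clusterization result reported in the example.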
3. REDUCTION OF THE FEATURE SPACE DIMENSION IN THE SIGNAL SOURCE IDENTIFICATION PROBLEMS WITH THE USE OF ARTIFICIAL NEURAL NETWORKS

An important specific feature of the RBF networks is that their size (and computational complexity) linearly depends on the size of the input data (the features of the radio signal sources being recognized) and on the number of samples in the training set. Reduction of the number of samples reduces the recognition accuracy; therefore, the only way to improve the training is to reduce the dimension of the feature space.
Let us turn to the definition and optimization of the feature space (the so-called feature vocabulary) for the identification of signal sources. Such a vocabulary should include features for which the a priori information required for the description of the classes in the language of these features can be obtained. Note that some of these features should not be included in the vocabulary in view of their low informativeness, whereas other features, as a rule the most informative ones, cannot be determined in a sufficiently fast, cheap, and qualitative way at the modern technological level of the development of information acquisition and processing systems.

There are many methods for selecting a feature set under constraints on the cost of their implementation, as well as methods for comparative feature estimation: the complete search method; the method of successive addition and removal of features (or their combinations); the branch-and-bound algorithm; genetic methods; random (with or without adaptation) search; and cluster analysis (hierarchical or SOM-based) methods. When using any method from the above list, some original features are completely ignored.

This problem can be solved by means of methods for synthesizing a feature set at the expense of reduction of the feature space dimension without significant loss in the quality of the solution of the classification and identification problem. In the framework of this approach, a set of m features is constructed with the use of all n initial features (clearly, m < n).

In this work, we will use the principal component analysis method for searching for the minimal number of new features from which the initial features can be restored with insignificant errors by means of a linear transformation [5, 6].

We assume that the above-mentioned set of n features of a signal x_i provides a full description of the signal: x_i = {f_1(x_i), f_2(x_i), …, f_n(x_i)}, where i = 1, …, l and l is the number of signals. Then, the entire set of signals can be represented as a matrix F of the form

    F_{l×n} = ( f_1(x_1) … f_n(x_1)
                   …     …     …
                f_1(x_l) … f_n(x_l) ).   (6)

Note that the recovered value

    f̂_j(x) = Σ_{s=1}^{m} z_s(x) u_sj,   j = 1, …, n,   (13)

should be as close to the initial one as possible for the selected value m:

    ||ZU − F||² → min over Z, U.   (8)

In what follows, we consider a vector version of the algorithm:

F̂ is the centered column vector F̂ = F − MF of the features F = (f_1, f_2, …, f_k), which is governed by the linear model F̂ = AZ;

Z is a column vector of the uncorrelated principal components z_j, where j = 1, …, k;

A is the matrix of loads of the features f_i on the components z_j (i = 1, …, k; j = 1, …, k) of the form

    A = (a_ij) ∈ R^{k×k}.   (9)

Let us denote the covariance matrix of the column vector of features F as Σ = M(F̂ F̂^T). Since this matrix is symmetric and nonnegative definite, it has k real nonnegative eigenvalues λ, and the matrix of the eigenvalues can be written as

    Λ = diag(λ_1, λ_2, …, λ_k),   λ_1 > λ_2 > … > λ_k.   (10)

Note that the eigenvectors v_j = (ν_1j, ν_2j, …, ν_kj)^T of the matrix Σ corresponding to the eigenvalues λ_j compose the matrix of eigenvectors V = (v_1, v_2, …, v_k).

Now, let us turn to searching for the optimal transformation. Let Ẑ = V^T F̂; then MẐ = M(V^T F̂) = V^T MF̂ = O, where O = (0, 0, …, 0)^T. It follows that Ẑ is a centered vector with the covariance matrix

    M(Ẑ Ẑ^T) = M(V^T F̂ F̂^T V) = V^T M(F̂ F̂^T) V = V^T Σ V = Λ,   (11)

from which it follows that

    F̂ = V Ẑ = V Λ^{1/2} Z.   (14)

Thus, the matrix A can be defined as A = V Λ^{1/2}, and the quality of the approximation by the first k′ components is characterized by the quantity

    σ = Σ_{i=k′+1}^{k} λ_i.

[Fig. 3: a bottleneck network with inputs x_1, …, x_d, hidden outputs y_i, and recovered outputs x̃_1, …, x̃_d.]
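As a numerical illustration of transformations (9)–(11) and (14) (not the authors' code; the toy feature matrix is an assumption), the following NumPy sketch centers a set of correlated features, diagonalizes the covariance matrix, and checks two facts used in the text: the covariance of the principal components is the diagonal matrix Λ, and the error of discarding trailing components equals the sum of the discarded eigenvalues:

```python
import numpy as np

# Toy feature matrix F: l = 200 signals, k = 3 strongly correlated features.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
F = np.hstack([2.0 * z, -1.0 * z, 0.5 * z]) + 0.05 * rng.normal(size=(200, 3))

F_hat = F - F.mean(axis=0)                  # centering: F_hat = F - MF
Sigma = (F_hat.T @ F_hat) / len(F_hat)      # covariance matrix of the features
lam, V = np.linalg.eigh(Sigma)              # eigenvalues/eigenvectors of Sigma
order = np.argsort(lam)[::-1]               # sort so that lambda_1 > lambda_2 > ...
lam, V = lam[order], V[:, order]

Z_hat = F_hat @ V                           # rows are (V^T f_hat)^T, cf. (11)
cov_Z = (Z_hat.T @ Z_hat) / len(Z_hat)
print(np.allclose(cov_Z, np.diag(lam), atol=1e-10))  # True: covariance is Lambda

# Keeping only the first k' components, the mean squared residual equals
# the sum of the discarded eigenvalues, cf. the sigma expression above.
k_prime = 1
F_approx = Z_hat[:, :k_prime] @ V[:, :k_prime].T
residual = np.mean(np.sum((F_hat - F_approx) ** 2, axis=1))
print(np.isclose(residual, lam[k_prime:].sum()))  # True
```

Here one dominant eigenvalue carries almost all of the variance, so a single synthesized feature restores the three original features with an insignificant error, which is exactly the situation the principal component method exploits.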
[Fig. 4 here: layers X → Y → X̃; inset labels: Input, J1, J2, J3, IB, Vout.]

Fig. 4. Structure of an associative neural network. The inset shows an example of an energy-efficient implementation of a neuron based on superconductive Josephson-junction (elements J1, J2, J3) technology.
… stored and used for solving the dimension reduction problem discussed.

Consider the application of the approach described to training m linear neurons:

    y_i = Σ_{j=1}^{d} u_ij x_j,   (16)

where the amplitudes of the output neurons play the role of a set of independent indicators that give as much information about the multidimensional input of the network as possible. To obtain several meaningful features at the output of the network, the original training rule should be modified by including interaction between the neurons. Suppose that a neuron tries to recover its inputs given the amplitude of its output. According to the Hebb rule, which is widely used in training, the change in the neuron weights upon supplying an example is proportional to its inputs and output (in the index and vector forms, respectively):

    Δw_j^T = η y^T x_j^T,   Δw^T = η y^T x^T.   (17)

If the training is formulated as an optimization problem, one can see that a neuron trained by the Hebb rule tries to increase the amplitude of its output:

    Δw = −η ∂E/∂w,   (18)

    E(w, x^a) = −(1/2)⟨(w·x)²⟩ = −(1/2)⟨y²⟩,   (19)

where the averaging is carried out over the training set {x^a}. It has been shown in [3] that, upon adding a term of the form

    Δw_j^T = η y^T (x_j^T − y^T w_j),   (20)

training by the Hebb rule does not result in an unlimited increase in the weight amplitudes. The training modified in this way, which is called the Oja training rule for one neuron, teaches the neuron to recover the values of its inputs from a given output. Such training thus tries to improve the sensitivity of the single output indicator to multidimensional information as much as possible and provides us with an example of optimal information compression. In a similar way, one can easily obtain the Oja rule for a one-layer network, equivalent to the bottlenecked network of hidden linear neurons shown in Fig. 3 that is trained to output the values of its inputs:

    Δw_ij^T = η y_i^T (x_j^T − x̃_j^T) = η y_i^T (x_j^T − Σ_k y_k^T w_kj).   (21)

The second part of the considered auto-associative network, the decoder, relies only on the encoded information in the bottleneck of the network, as can be seen from Fig. 4. The quality of data reproduction from a given encoded representation is usually evaluated by means of the conditional entropy H(x|y), whose minimization is equivalent to maximization of the encoding entropy:

    min H(x|y) = min{H(x, y) − H(y)} = max H(y).   (22)

Indeed, encoding does not introduce additional uncertainty, so that the total entropy of the inputs and their encoded representation is equal to the entropy of the inputs themselves, H(x, y) = H(x) + H(y|x) = H(x), and, hence, does not depend on the network parameters.
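The boundedness of the weights under the Oja update (20) can be checked numerically. In this sketch (illustrative data, initialization, and learning rate — not the authors' experiment), a single linear neuron is trained on two-dimensional inputs spread mostly along one direction; the weight vector is expected to stay near unit length and align with that dominant direction:

```python
import math

# Inputs spread mostly along (3, 1)/sqrt(10), with a small spread
# along the orthogonal direction (illustrative, deterministic data).
s10 = math.sqrt(10.0)
data = []
for a in range(-5, 6):
    for b in (-1, 1):
        data.append(((3 * a - 0.1 * b) / s10, (a + 0.3 * b) / s10))

w = [1.0, 0.0]
eta = 0.01
for epoch in range(500):
    for x in data:
        y = w[0] * x[0] + w[1] * x[1]       # neuron output y = w . x
        # Oja update (20): dw = eta * y * (x - y * w)
        w[0] += eta * y * (x[0] - y * w[0])
        w[1] += eta * y * (x[1] - y * w[1])

norm = math.hypot(w[0], w[1])
cos = abs((w[0] * 3 + w[1] * 1) / (s10 * norm))  # alignment with (3, 1)
print(round(norm, 3), round(cos, 3))
```

The decay term −η y² w keeps the weight norm bounded near one, while the Hebbian term rotates the weight vector toward the first principal direction of the inputs — the one-neuron case of the optimal compression described above.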
To evaluate the promise of using the MPC (the principal component method) for optimization of the feature space in cognitive radio systems, we considered a sample of 64 signals from 10 model sources with the following set of features: preamble duration, preamble type, duration of squelch termination, type of squelch termination, and signal modulation mode. Figure 5 shows the results of training by these features of an auto-associative neural network containing in its layers 5, 10, 3, 10, and 5 neurons with the sigmoid activation function. The network was trained by the Levenberg–Marquardt method, and the learning criterion was the mean square error (MSE). As can be seen from the plot presented, after 500 cycles of training, the MSE of the data at the network output diminished to 10^−4.

This result allows us to conclude that, in the case under consideration, it was possible to reduce the dimension of the feature space from 5 to 3. The separating properties of the synthesized features were successfully verified (different classes of signals were not mixed) by means of self-organizing Kohonen maps [3, 8].

4. CONCLUSIONS

The proposed method of classification of sources of radio signals with the use of a self-organizing neural network on radial basis functions improves the noise immunity and correctness of classification. The improvement of the noise immunity is achieved owing to preliminary estimation of the basis function widths of the RBF network and construction of reference feature vectors of sources of various classes of radio signals with the use of a self-organizing Kohonen map. The efficiency of the method is substantiated by model and real data.

Analysis of the possibilities of artificial neural networks showed that auto-associative neural networks, as a nonlinear implementation of the principal component analysis method, allow the feature space to be minimized without significant deterioration of its separating properties. Owing to the reduction of the feature space at the stage of preliminary processing of the data arriving to cognitive radio systems, it becomes possible to reduce the computational capacities at the stages of …

ACKNOWLEDGMENTS

This work was supported by the Ministry of Education and Science of the Russian Federation, agreement no. 14.604.21.0011 (RFMEFI60414X0011).

REFERENCES

1. Adzhemov, S.S., Vinogradov, A.N., Lebedev, A.N., Tereshonok, M.V., Makarenkov, S.A., and Chirov, D.S., Metody intellektual'nogo analiza slabostrukturirovannykh dannykh i upravleniya kompleksami monitoringa (Methods of Intelligent Analysis of Weakly Structured Data and Control of Monitoring Complexes), Moscow: Insvyaz'izdat, 2009.
2. Tereshonok, M.V., Classification and recognition of radio communication system signals by means of self-organizing Kohonen maps with various topologies of the output layer and training algorithms, Elektrosvyaz, 2008, no. 6, pp. 28–36.
3. Khaikin, S., Neironnye seti: polnyi kurs (Neural Networks: Complete Discussion), Moscow: Vil'yams, 2006.
4. Callan, R., The Essence of Neural Networks, Prentice Hall, 1998.
5. Kalinina, V.N. and Solov'ev, V.I., Vvedenie v mnogomernyi statisticheskii analiz. Uchebnoe posobie (Introduction to Multidimensional Statistical Analysis: Textbook), Moscow: GUU, 2003.
6. Aivazyan, S.A., Bukhshtaber, V.M., Enyukov, I.S., and Meshalkin, L.D., Prikladnaya statistika: klassifikatsiya i snizhenie razmernosti (Applied Statistics: Classification and Dimension Reduction), Moscow: Finansy i statistika, 1989.
7. Lebedev, A.N., Tereshonok, M.V., and Chirov, D.S., Methods for evaluating the informativeness of radio signal parameters for classification of radiating objects, in Trudy MTUSI (Proceedings of MTUSI), Moscow, 2008, pp. 49–54.
8. Adzhemov, S.S., Vinogradov, A.N., Lebedev, A.N., Tereshonok, M.V., and Chirov, D.S., Intelligent data analysis, Certificate of official registration of a software program no. 2007612101, 2007.
9. Chiarello, F., Carelli, P., Castellano, M.G., and Torrioli, G., Artificial neural network based on SQUIDs: Demonstration of network training and operation, Supercond. Sci. Technol., 2013, vol. 26.
10. Onomi, T. and Nakajima, K., An improved superconducting neural circuit and its application for a neural network solving a combinatorial optimization problem, J. Phys.: Conf. Series, vol. 507, part 4.
11. Gupta, D., Filippov, T.V., Kirichenko, A.F., Kirichenko, D.E., Vernik, I.V., Sahu, A., Sarwana, S., Shevchenko, P., Talalaevskii, A., and Mukhanov, O.A., Digital channelizing radio frequency receiver, IEEE Trans. Appl. Supercond., 2007, vol. 17, no. 2, pp. 430–437.
12. Sarwana, S., Kirichenko, D.E., Dotsenko, V.V., Kirichenko, A.F., Kaplan, S.B., and Gupta, D., Multi-band digital-RF receiver, IEEE Trans. Appl. Supercond., 2011, vol. 21, no. 3, pp. 677–680.

Translated by A. Pesterev