

Received September 11, 2020, accepted September 24, 2020, date of publication September 28, 2020,
date of current version October 13, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3027472

1D Convolutional Neural Networks Versus
Automatic Classifiers for Known LPI Radar
Signals Under White Gaussian Noise

ALPER YILDIRIM 1, (Member, IEEE), AND SERKAN KIRANYAZ 2, (Senior Member, IEEE)
1 Bilgem İltaren, Tübitak, 06800 Ankara, Turkey
2 Department of Electrical Engineering, Qatar University, Doha 2713, Qatar
Corresponding author: Alper Yildirim (alperyildirim74@yahoo.com)
This work was supported by the Qatar National Research Fund (QNRF) through the ongoing project under Grant NPRP11S-0108-180228.

ABSTRACT In this study, we analyze the signal classification performance of various classifiers for
deterministic signals under additive White Gaussian Noise (WGN) over a wide range of signal-to-noise
ratio (SNR) levels (−40 dB to +20 dB). Traditional electronic support measure (ESM) systems require a
high SNR for radar signal classification, whereas the LPI (low probability of intercept) radar signals
received by ESM systems are usually corrupted by noise. We demonstrate through extensive simulations
that it is possible to achieve a high classification performance at low SNR levels, provided that the
underlying radar signals are known in advance. An MF bank classifier, 1D Convolutional Neural Networks
(CNNs), and a minimum distance classifier using spectral-domain features (the skewness, the kurtosis,
and the energy of the dominant frequency) have been derived for radar signal classification, and their
performances have been compared with each other and with the optimal classifier.

INDEX TERMS Classification, convolutional neural networks, radar signal processing, low probability of
intercept radar, electronic support measures, matched filter, spectral moments, white Gaussian noise.

The associate editor coordinating the review of this manuscript and approving it for publication was
Chengpeng Hao.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
https://creativecommons.org/licenses/by/4.0/

180534 VOLUME 8, 2020

I. INTRODUCTION
Receiver sensitivities of electronic support measure (ESM) systems are often insufficient to detect and
classify low-power signals. Although channelized receiver structures ([1]–[5]) increase the signal
processing gain, ESM systems require a clean signal, i.e. a signal with a high SNR. LPI radar signals
received by ESM systems are usually corrupted by noise; therefore, detection, feature extraction, and
classification of such signals are not possible from safe distances [6]. Traditional ESM systems update
the active threat tables by extracting features, i.e. pulse descriptor words (PDWs), from the radar
signal and classify the threats by comparing them with the data available in the emitter mission library
([7] and [8]). If no clean signal is received (i.e. the SNR required for a fair parameter estimation is
more than 18 dB [7]), it is possible neither to extract the PDWs nor to classify the threats. The signal
types used in radars have a simpler structure than those used in telecommunications, and these signals
are generally defined in threat databases. If an ESM system knows what to receive, it may operate at
very low signal-to-noise ratio (SNR) levels.
In [9], the problem of extracting one out of a finite number of possible signals of known form, given
observations in an additive noise model, was studied. Two approaches come forward: 1) the signal with
the shortest distance to the observed data, and 2) the signal having maximal correlation with the
observed data. The authors analyzed the detector error probabilities for various signal dimensions,
numbers of signals, noise distributions, and signal lengths. In [10], telecommunication signals with
9 different (known) modulations are classified with 6 different classifiers, where the signals are
corrupted by additive White Gaussian Noise (WGN) and the classifier performances are compared for
several SNR levels between −20 and 20 dB. That study revealed that telecommunication signals corrupted
by severe WGN, and thus on the border of being indistinguishable, can still be classified with
reasonable accuracy; the classification errors are therefore especially important to know in this case.
Accordingly, in this study we aim to evaluate various classification methods that can be used for
detection and

classification of known radar signals corrupted by severe WGN. The simplest solution to detect a known
deterministic signal in additive WGN is to use the matched filter (MF) ([9] and [11]). Therefore, our
first approach is to use an MF bank structure for signal classification in the ESM receiver. MFs are
tolerant to signal time shifts; however, an MF bank requires high computational power, whose complexity
grows in proportion to the number of signals that need to be classified. In recent years, however,
1D CNNs have achieved state-of-the-art performance levels in many applications such as personalized
biomedical data classification and early diagnosis [12]–[15], structural health monitoring [16]–[20],
and anomaly detection and identification in power electronics and motor-fault detection [21]–[26].
Recently, conventional, 2D, and deep CNNs have been applied to radar signal classification [27]–[32].
However, they are very complex and data-hungry models, which require a parallelized computing
environment and also a 1D-to-2D transformation of the radar signal. Moreover, when the data is scarce,
deep (2D) CNNs usually require techniques such as data augmentation and Dropout to prevent over-fitting
and poor generalization, which not only increase the computational complexity of training but also may
not address these issues completely. Therefore, in this study we propose to use compact 1D CNNs that can
be applied directly to the received data signal without any pre-processing, and they allow real-time
implementation even on low-power devices due to their elegant computational efficiency [33], [34]. Our
third approach is based on spectral-domain features of the radar signals received by the ESM system.
Spectral moments (skewness and kurtosis) based on Discrete Fourier Transform (DFT) spectra have been
used with some success to classify signals in many applications ([35] and [36]). Overall, this study
aims to perform comparative evaluations of major classifiers for known LPI radar signals over a wide
range of SNR levels. Moreover, this is the first time that the proposed compact and adaptive 1D CNNs are
used for LPI radar signal classification and compared against conventional (automatic) classifiers
through a detailed set of experiments. Finally, this is the first study in the literature that evaluates
the classifier performances starting from very low SNR levels, i.e. −40 dB.
The rest of the paper is organized as follows. In Section II, the optimal and the MF bank classifier
definitions are presented. In Section III, an overview of the 1D CNN classifier is presented. In
Section IV, classification using spectral-domain features is detailed. In Section V, the classifier
performances are evaluated by an extensive set of simulations. Finally, Section VI concludes the paper.

II. SIGNAL CLASSIFICATION WITH WGN
The optimal classification of deterministic signals in WGN has been widely studied ([11] and [40]). In
the literature, thermal noise is usually modeled as WGN. If we assume that the LPI radar signal is
corrupted by additive WGN, we can adapt earlier approaches to the signal classification problem in ESM
systems.

A. OPTIMAL SIGNAL CLASSIFICATION IN WGN
Radars of interest to ESM systems use specific signal forms, which can be used to identify not only the
presence of a radar but also its type. Let us define a multiple hypothesis test as given in (1), where
Hi, i = 0, 1, ..., M − 1, denotes the hypothesis that the signal R[n], n = 1, 2, ..., L, is a
noise-corrupted version of the ith deterministic signal si[n], selected from a possible set of M
deterministic signals:

    Hi : R[n] = si[n] + W[n]    (1)

where W[n] denotes independent, zero-mean, Gaussian random variables with variance σ².
In this case, each signal, si[n], and the corresponding hypothesis, Hi, would correspond to the presence
of a particular type of radar. Thus, our task is to decide in favor of one of the hypotheses, given a
set of measurements r[n] of R[n]. As derived in [37] for minimum error probability, the required test
involves a comparison of the quantities in (2):

    H∗ = max_i ( Σ_{n=1}^{L} r[n] si[n] − Ei/2 + σ² ln P(Hi) ),  i = 0, 1, ..., M − 1    (2)

where Ei denotes the energy of the ith signal and H∗ is the hypothesis selected. Since it may not be
possible to know the signal probabilities in advance, assume that the hypotheses are equally likely
a priori, so that ln P(Hi) = 0. Under the further assumption that the signals have equal energies, the
above comparison reduces to deciding in favor of the signal with the highest deterministic correlation,

    Σ_{n=1}^{L} r[n] si[n].    (3)

Note that, in order to use the optimal classifier as in (3), it will be necessary to equalize the
energies of the signals (si[n]) in advance.

B. MF BANK (MFB) CLASSIFIER
A filter whose impulse response is h[n] = C s[n0 − n] is called ''the matched filter'' for the signal
s[n] ([38], [39] and [40]). With h[n] as the impulse response and r[n] as the input, the output y[n] of
a linear time-invariant (LTI) system is the convolution sum

    y[n] = Σ_{k=−∞}^{∞} r[k] h[n − k].    (4)

Setting the coefficient C = 1 and substituting h[n] = s[n0 − n] in (4) gives

    y[n] = Σ_{k=−∞}^{∞} r[k] s[n0 − n + k].    (5)

For r[n] = 0 except for 1 ≤ n ≤ L, the filter output at n = n0 is y[n0] = Σ_{k=1}^{L} r[k] s[k], which
is equal to the optimal classifier given in (3).




The classifier expressed in (3) is easy to implement; however, it is not suitable for real-life
problems. It may not be possible to detect the time of arrival of the received signal (r[n]) accurately.
In other words, one cannot know the appropriate time, i.e. n0, at which to sample the output signal
(y[n]). Since the radar signals are repetitive, one can buffer the shifted versions of the noise-added
si's, i.e. r[n], in the ESM receiver. In this case, the general form of the matched filter followed by
peak detection can resolve this issue. As stated earlier, the MF is time-shift tolerant. The general
form of the MF output is given in (6):

    yi[n] = Σ_{k=1}^{L} r[k] si[k − n].    (6)

We can first determine the maximum values (peaks) of the signals yi[n] as

    αi = max{ yi[n] : n = 1, ..., 2L − 1 },    (7)

and then obtain the maximum of the αi's as expressed below:

    αmax = max{ αi : i = 0, ..., M − 1 },    (8)

where the ith signal, i.e. si, for which αmax is achieved is the output of the classifier.

III. 1D CONVOLUTIONAL NEURAL NETWORKS
Similar to their conventional 2D counterparts, in 1D CNNs the input layer is a passive layer to which
the raw 1D signal is fed, while the output layer is a dense layer with the number of neurons equal to
the number of classes. The dense layers are identical to Multi-Layer Perceptrons and are thus sometimes
called ''MLP layers''. A 1D CNN configuration can be set a priori by the following hyper-parameters:

1) The number of hidden CNN and MLP layers/neurons.
2) The kernel size in each CNN layer.
3) The pooling (down-sampling) factor in each CNN layer.
4) The pooling and activation functions.

FIGURE 1. The 1D CNN configuration with 4 CNN and 2 dense (MLP) layers used in this study. The kernel
size is 21 and the down-sampling factor ss = 6.

As shown in FIGURE 1, a 1D CNN configuration with four CNN and two dense (MLP) layers is used in this
study, where the kernel size is 21 and the down-sampling factor is 6 in each CNN layer. The main
difference between 1D and 2D CNNs is that the 2D matrices are replaced by 1D arrays for both kernels and
feature maps. This is the major advantage of 1D CNNs, which can also result in a low computational
complexity, since the only operation with a significant cost is a sequence of 1D convolutions, which are
simply linear weighted sums of two 1D arrays. Such a linear operation during the forward- and
back-propagation operations can effectively be executed in parallel. In either variant, the
convolutional (CNN) layers process the raw data and ''learn to extract'' the features that are used in
the classification task performed by the dense layers. As a result, both feature extraction and
classification operations are fused into one process that can be optimized to maximize the
classification performance. In the figure, each CNN layer has 16 neurons, which is shown as 16 × 1362,
16 × 233, etc. over each layer. The input segment size is 4 × 2048 = 8,192 samples. When 1D convolution
is performed with a 1D kernel of size 1 × 21, the input-map size becomes 8,192 − 21 + 1 = 8,172. After
the 6× sub-sampling (pooling), the output map size will, therefore, be 8,172 / 6 = 1,362, and so on. All
network parameters were initialized randomly with a Uniform distribution, i.e., ∼U(−0.1, 0.1). Both
1D forward propagation (1D-FP) and Back-Propagation (BP) are detailed in Appendix A. For further
details, the reader is encouraged to refer to the comprehensive surveys in [33] and [34].

IV. CLASSIFICATION USING SPECTRAL DOMAIN FEATURES
The Discrete Fourier Transform (DFT) is a fundamental transformation used to perform spectral analysis
in many practical applications. The skewness, the kurtosis, and the energy of the dominant frequency are
some of the features that can be extracted from the spectrum of a noisy radar signal.

A. THE DISCRETE FOURIER TRANSFORM
The N-point DFT transforms a signal window of N samples {x[n]} := x[0], x[1], ..., x[N − 1] into the
spectral domain of N complex numbers known as DFT coefficients, {X[k]} := X[0], X[1], ..., X[N − 1]. The
DFT approximates the Discrete-time Fourier Transform with a finite frequency resolution and is expressed
as [41]

    X[k] = Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N},    0 ≤ k ≤ N − 1,    (9)

where N is the DFT size.
We use the N-point window of a real radar signal, i.e. r[n], as the input to the DFT and compute the
magnitude spectrum. Since the DFT of a real signal is conjugate symmetric, the kth bin in the DFT of a
real-valued sequence has the same magnitude as the (N − k)th bin. To avoid redundancy, we can use the
first N/2 DFT magnitudes, |Xk|, k = 0, 1, 2, ..., N/2 − 1, for feature extraction in the spectral
domain.
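The MFB decision chain of Section II-B, Eqs. (6)–(8), can be sketched as follows. This is a minimal
illustration, not the authors' code; `np.correlate` evaluates the sliding correlation of Eq. (6) over
all 2L − 1 lags, and the peak magnitude is used as in the discussion of TABLE 2.

```python
import numpy as np

def mfb_classify(r, signals):
    """MF bank classifier: evaluate all time shifts of the correlation in
    Eq. (6) via np.correlate, take the peak magnitude per filter (Eq. (7)),
    and decide on the filter with the largest peak (Eq. (8))."""
    peaks = [np.max(np.abs(np.correlate(r, s, mode='full'))) for s in signals]
    return int(np.argmax(peaks))

# The peak survives an unknown time shift of the signal inside the frame.
rng = np.random.default_rng(1)
L = 256
n = np.arange(L)
signals = [np.cos(2 * np.pi * 16 * n / L), np.cos(2 * np.pi * 48 * n / L)]
r = np.zeros(2 * L)
r[37:37 + L] = signals[0]                   # unknown arrival time n0 = 37
r += 0.3 * rng.standard_normal(2 * L)       # additive WGN
print(mfb_classify(r, signals))
```

Because the peak is taken over all lags, the unknown n0 does not need to be estimated, which is exactly
the time-shift tolerance the text attributes to the MF.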

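The feature-map arithmetic of Section III (8,192 → 8,172 → 1,362, and so on) can be reproduced with a
small helper. This is an illustrative sketch: the integer-division rounding used here for the later,
non-divisible layer sizes is our assumption, since the text does not specify the boundary handling.

```python
def cnn_map_sizes(input_len, kernel=21, pool=6, n_layers=4):
    """Feature-map length after each CNN layer for a 'valid' 1D convolution
    followed by down-sampling: first layer is (8,192 - 21 + 1) / 6 = 1,362,
    matching the Section III arithmetic."""
    sizes, n = [], input_len
    for _ in range(n_layers):
        n = (n - kernel + 1) // pool   # valid convolution, then pooling
        sizes.append(n)
    return sizes

print(cnn_map_sizes(4 * 2048))   # first entry is 1362
```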
180536 VOLUME 8, 2020
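The half-spectrum features named above (skewness, kurtosis, and dominant-frequency energy) can be
computed directly from the first N/2 DFT magnitudes. A minimal sketch using the sample-moment forms of
Eqs. (11), (13) and (14); the function name is ours.

```python
import numpy as np

def spectral_features(r):
    """Skewness, kurtosis, and dominant-frequency energy of the first N/2 DFT
    magnitudes |X_k| of a real frame r (Section IV)."""
    N = len(r)
    mag = np.abs(np.fft.fft(r))[:N // 2]        # |X_k|, k = 0 .. N/2 - 1
    mu = mag.mean()
    m2 = np.mean((mag - mu) ** 2)
    sk = np.mean((mag - mu) ** 3) / m2 ** 1.5   # Eq. (11)
    ku = np.mean((mag - mu) ** 4) / m2 ** 2     # Eq. (13)
    fe = np.max(mag) ** 2                       # Eq. (14)
    return sk, ku, fe

# A pure tone at bin 8 of a 64-point frame has |X_8| = 32, so fe = 32^2 = 1024.
n = np.arange(64)
sk, ku, fe = spectral_features(np.cos(2 * np.pi * 8 * n / 64))
print(round(fe, 3))   # 1024.0
```

A single dominant bin also makes the magnitude distribution strongly right-tailed, so the skewness of a
clean tone is large and positive; heavy noise flattens the spectrum and pulls these moments down.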

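The minimum distance rule that Section IV-C builds on these features (Eq. (15)) can be sketched as
follows. The feature used in the demo (dominant-bin energy) and all names are illustrative assumptions.

```python
import numpy as np

def md_classify(feature, r, refs):
    """Minimum distance decision over one scalar feature, in the spirit of
    Eq. (15): d_i = |F_r - F_{s_i}|, pick the smallest distance."""
    fr = feature(r)
    return int(np.argmin([abs(fr - feature(s)) for s in refs]))

# Dominant-bin energy separates two tones of different amplitude.
fe = lambda x: np.max(np.abs(np.fft.fft(x)[:len(x) // 2])) ** 2
n = np.arange(256)
refs = [np.cos(2 * np.pi * 13 * n / 256), 0.5 * np.cos(2 * np.pi * 13 * n / 256)]
rng = np.random.default_rng(2)
r = refs[1] + 0.05 * rng.standard_normal(256)
print(md_classify(fe, r, refs))
```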


B. FEATURE EXTRACTION
1) SKEWNESS
In probability theory and statistics, skewness (the third standardized moment) is a measure of the
asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness
of a distribution is defined as

    Skew[X] = E[((X − µ)/σ)³] = E[(X − µ)³] / (E[(X − µ)²])^{3/2},    (10)

where µ is the mean of X, σ is the standard deviation of X, and E is the expectation operator.
The skewness, sk, of the signal |Xk| can be approximated as

    sk = [ (1/(N/2)) Σ_{k=0}^{N/2−1} (|Xk| − µX)³ ]
         / [ sqrt( (1/(N/2)) Σ_{k=0}^{N/2−1} (|Xk| − µX)² ) ]³,    (11)

where µX is the sample mean of |Xk|.

2) KURTOSIS
In probability theory and statistics, kurtosis (the fourth standardized moment) is a measure of the
''tailedness'' of the probability distribution of a real-valued random variable. The kurtosis of a
distribution is defined as

    Kurt[X] = E[((X − µ)/σ)⁴] = E[(X − µ)⁴] / (E[(X − µ)²])²,    (12)

where µ is the mean of X, σ is the standard deviation of X, and E is the expectation operator.
The kurtosis, ku, of the signal |Xk| can be approximated as

    ku = [ (1/(N/2)) Σ_{k=0}^{N/2−1} (|Xk| − µX)⁴ ]
         / [ (1/(N/2)) Σ_{k=0}^{N/2−1} (|Xk| − µX)² ]².    (13)

3) ENERGY OF THE DOMINANT FREQUENCY
The energy of the dominant frequency (fe) in the signal |Xk| is calculated as

    fe = (max{ |Xk| : k = 0, ..., N/2 − 1 })².    (14)

C. THE MINIMUM DISTANCE CLASSIFIER
The minimum distance (MD) classifier ([42] and [43]), where the distance is defined as an index of
similarity such that the minimum distance is identical to the maximum similarity, is used to decide
which class the received signal belongs to. Each particular feature given in Section IV-B is extracted
from the received signal, i.e. r[n], and from each of the reference signals, si[n], as expressed in (1).
For each feature, the distances between the value obtained for r[n] and the reference values (the values
obtained for the same feature from the si[n]'s) are computed. The minimum distance is obtained as
expressed below:

    i∗ = arg min_i { di = |Fr − Fsi| : i = 0, ..., M − 1 },    (15)

where the ith signal, i.e. si, for which i∗ is achieved is the decision of our classifier.

V. SIMULATION RESULTS
The radar signals [44] used in the simulations are presented in TABLE 1.
In the simulations, 1024 frames of each signal are generated, each of which is corrupted by additive WGN
according to 61 SNR levels (−40 dB to 20 dB with a step size of 1 dB). Therefore, a total of 1024 × 8
signals (r[n]'s) are classified at each SNR level and the corresponding classification accuracies (Pc's)
are computed. The results are illustrated in FIGURE 2.
For the CNN classifier, 1024 frames are used for both training and testing. For classification, we have
combined 4 frames, each of which has 2048 samples, into a segment of 8192 samples. We used a compact
1D CNN in all experiments with only four hidden CNN layers and one hidden MLP layer, in order to achieve
the utmost computational efficiency for both training and, particularly, for real-time radar signal
classification. The 1D CNN used in all experiments has only 16 neurons in all (hidden) CNN and MLP
layers. The output (MLP) layer size is 8 since this is an 8-class problem. The kernel size of the CNN is
21 and the sub-sampling factor is 6. Accordingly, the sub-sampling factor of the last CNN layer is
adaptively set to 13. We used the Stochastic Gradient Descent optimization method for Back-Propagation
(BP) training with the learning factor ε = 10⁻³. We performed 200 epochs in a BP run and 5 individual
BP runs for each noise level. The 1D CNN configuration that reached the lowest classification error was
selected for test-set evaluation.
For illustration purposes, TABLE 2 presents some noise-corrupted signals and their matched filter
outputs for several SNR levels. The second column in TABLE 2 illustrates noise-added s5, i.e. r[n]; the
third and last columns illustrate the filter outputs (6) where the impulse responses are equal to s5[−n]
and s6[−n], respectively. According to the sample MF outputs given in TABLE 2, at SNR = −30 dB the MFB
will misclassify the signal, since the maximum value of the signal |y6[n]| is higher than the maximum
value of the signal |y5[n]|. At SNR = −10 dB and 10 dB, the maximum value of the signal |y5[n]| is
higher than the maximum value of the signal |y6[n]|. Obviously, the other six MFB outputs are also
computed to accomplish the final classification of the signal.
The computational complexity of the classifiers is presented in TABLE 3, where M = 8 is the number of
signals to classify and N = 2048 is the frame size.
For the computational complexity analysis of the 1D CNNs, the total number of operations at each 1D CNN
layer (ignoring the sub-sampling) is first computed, and these are then accumulated to find the overall
computational complexity. During each FP, at a CNN layer, l, the number of connections to the previous
layer is N^{l−1} N^l, and through each connection an individual linear




TABLE 1. The radar signals.





TABLE 2. Some sample MF outputs at different SNR levels.

convolution is performed, which is a linear weighted sum. Let s^{l−1} and w^{l−1} be the vector sizes of
the previous layer's output, s_k^{l−1}, and of the kernel (weight), w_{ki}^{l−1}, respectively. Ignoring
the boundary conditions, a linear convolution consists of s^{l−1} w^{l−1} multiplications and s^{l−1}
additions for a single connection. Ignoring the bias addition, the total number




of multiplications and additions in the layer l will, therefore, be

    N(mul)^l = N^{l−1} N^l s^{l−1} w^{l−1},    N(add)^l = N^{l−1} N^l s^{l−1}.    (16)

So, during FP the total numbers of multiplications and additions, T(mul) and T(add), over L CNN layers
will be

    TFP(mul) = Σ_{l=1}^{L} N^{l−1} N^l s^{l−1} w^{l−1},
    TFP(add) = Σ_{l=1}^{L} N^{l−1} N^l s^{l−1}.    (17)

Obviously, TFP(add) is insignificant compared to TFP(mul).

FIGURE 2. The classification accuracy plots at each SNR level.

TABLE 3. The computational complexity.

VI. CONCLUSION
In this paper, we evaluated different radar signal classifiers for ESM systems. The classifier
performances were compared with each other, starting from very low SNR levels, i.e. −40 dB. The
experimental results have shown that, when the signals are assumed to be known, it is possible to
achieve a high classification performance even at very low SNR levels with the optimal classifier (OPT).
Among all the unsupervised classifiers, the MFB is robust to signal time shifts. The results have
confirmed that the performance of the MFB is better than that of the minimum distance classifier using
spectral-domain features; however, the computational complexity of the MFB is the highest. The
performances and the computational complexities of the MD-SKW and MD-KUR classifiers are close to each
other. The performance of the MD-FRE classifier is better than that of the MD-SKW and MD-KUR classifiers
at SNR levels from −20 dB to 1 dB, but when SNR > 1 dB its performance gain diminishes. The MD-FRE
classifier requires the least computational power and thus can be a convenient choice for low-power
applications at SNR levels from −20 dB to 1 dB. We can conclude that a proper classifier with the least
computational complexity can be used when the SNR level is assumed to be known or approximated; i.e.,
KUR or SKW can conveniently be used when SNR > 0 dB. On the other hand, at very low SNR levels
(SNR < −15 dB), the MFB is the best choice among all.
Finally, in this study, compact 1D CNNs have been proposed for the first time for radar signal
classification, and the results have demonstrated that they can achieve a performance level comparable
to that of the MFB classifier within a slight margin. Their unique capability is to learn directly from
the raw signal, which avoids the need to design handcrafted pre-processing and feature extraction in
advance. Moreover, they exhibit significant computational and feasibility advantages over their deep
(2D) counterparts. Thus, especially when the data is scarce and the number of radar classes increases,
they can serve as a viable option for this problem even at very low SNR levels (e.g. −16 dB). In the
future, we shall investigate different CNN architectures and models to further improve the performance
so that high classification accuracies can be achieved even below the −20 dB SNR level.

APPENDIX
The adaptive 1D CNN implementation used in this study is illustrated in FIGURE 3. In this illustration,
the 1D filters have a size of 1 × 3 and the pooling (sub-sampling) factor is 2, where the kth neuron in
the hidden CNN layer, l, first performs a sequence of convolutions, the sum of which is passed through
the activation function, f, followed by the sub-sampling operation. In this implementation, we fused the
two distinct layers (convolution and pooling) into a single layer, called the ''CNN layer''. As a next
step, the CNN layers process the raw 1D data and ''learn to extract'' such features as are used in the
classification task performed by the dense layers (or MLP layers). In this way, both feature extraction
and classification operations are fused into one




process that can be optimized to maximize the classification performance. This is the major advantage of
1D CNNs, which can also result in a low computational complexity, since the only operation with a
significant cost is a sequence of 1D convolutions, which are simply linear weighted sums of two 1D
arrays. Such a linear operation during the forward- and back-propagation operations can effectively be
executed in parallel.
This is also an adaptive implementation, since the CNN topology allows variations in the input layer
dimension in such a way that the sub-sampling factor of the output CNN layer is tuned adaptively.
Forward- and back-propagation in the CNN layers are described next.

FIGURE 3. Three consecutive hidden CNN layers of a 1D CNN [22].

A. FORWARD- AND BACK-PROPAGATION IN CNN-LAYERS
In each CNN layer, 1D forward propagation (1D-FP) is expressed as follows:

    x_k^l = b_k^l + Σ_{i=1}^{N_{l−1}} conv1D(w_{ik}^{l−1}, s_i^{l−1}),    (18)

where x_k^l is defined as the input and b_k^l as the bias of the kth neuron at layer l, s_i^{l−1} is the
output of the ith neuron at layer l − 1, and w_{ik}^{l−1} is the kernel from the ith neuron at layer
l − 1 to the kth neuron at layer l. conv1D(., .) is used to perform ''in-valid'' 1D convolution without
zero-padding. Therefore, the dimension of the input array, x_k^l, is less than the dimension of the
output arrays, s_i^{l−1}. The intermediate output, y_k^l, can be expressed by passing the input x_k^l
through the activation function, f(.), as

    y_k^l = f(x_k^l) and s_k^l = y_k^l ↓ ss,    (19)

where s_k^l stands for the output of the kth neuron of the layer, l, and ''↓ ss'' represents the
down-sampling operation with a scalar factor, ss.
The back-propagation (BP) algorithm can be summarized as follows. Back-propagating the error starts from
the output MLP layer. Assume l = 1 for the input layer and l = L for the output layer. Let N_L be the
number of classes in the database; then, for an input vector p, let t^p and [y_1^L, ..., y_{N_L}^L]′ be
its target and output vectors, respectively. With that, in the output layer for the input p, the
mean-squared error (MSE), E_p, can be expressed as follows:

    E_p = MSE(t^p, [y_1^L, ..., y_{N_L}^L]′) = Σ_{i=1}^{N_L} (y_i^L − t_i^p)².    (20)

To find the derivative of E_p with respect to each network parameter, the delta error,
Δ_k^l = ∂E/∂x_k^l, should be computed. Specifically, for updating the bias of that neuron and all the
weights of the neurons in the preceding layer, one can use the chain rule of derivatives as

    ∂E/∂w_{ik}^{l−1} = Δ_k^l y_i^{l−1} and ∂E/∂b_k^l = Δ_k^l.    (21)

So, from the first MLP layer to the last CNN layer, the regular (scalar) BP is simply performed as

    Δs_k^l = ∂E/∂s_k^l = Σ_{i=1}^{N_{l+1}} (∂E/∂x_i^{l+1})(∂x_i^{l+1}/∂s_k^l)
           = Σ_{i=1}^{N_{l+1}} Δ_i^{l+1} w_{ki}^l.    (22)

Once the first BP is performed from the next layer, l + 1, to the current layer, l, one can carry on the
BP to the input delta of the CNN layer l, Δ_k^l. Let the zero-order up-sampled map be
us_k^l = up(s_k^l); then the delta error can be expressed as follows:

    Δ_k^l = (∂E/∂y_k^l)(∂y_k^l/∂x_k^l) = (∂E/∂us_k^l)(∂us_k^l/∂y_k^l) f′(x_k^l)
          = up(Δs_k^l) β f′(x_k^l),    (23)

where β = (ss)^{−1}. Then, the BP of the delta error, Δs_k^l ← Σ Δ_i^{l+1}, can be expressed as

    Δs_k^l = Σ_{i=1}^{N_{l+1}} conv1Dz(Δ_i^{l+1}, rev(w_{ki}^l)),    (24)

where rev(.) is used to reverse the array and conv1Dz(., .) is used to perform full 1D convolution with
zero-padding. The weight and bias sensitivities can be expressed as follows:

    ∂E/∂w_{ik}^l = conv1D(s_k^l, Δ_i^{l+1}) and ∂E/∂b_k^l = Σ_n Δ_k^l(n).    (25)

When the weight and bias sensitivities are computed, they can be used to update the biases and weights
with the learning factor, ε, as

    w_{ik}^{l−1}(t + 1) = w_{ik}^{l−1}(t) − ε ∂E/∂w_{ik}^{l−1} and
    b_k^l(t + 1) = b_k^l(t) − ε ∂E/∂b_k^l.    (26)

The forward- and back-propagation in the hidden CNN layers are illustrated in FIGURE 4. The output
gradient of the kth neuron at the CNN layer l, Δs_k^l, is formed by back-propagating all the delta
errors, Δ_i^{l+1}, of the next layer, l + 1, using Eq. (24), while the forward- and back-propagation
between the last hidden CNN layer and the first hidden MLP layer are summarized in FIGURE 5.
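Eqs. (18)–(19) describe one fused CNN layer. Below is a minimal NumPy sketch of that forward pass, under
the assumption that conv1D corresponds to NumPy's 'valid'-mode convolution (`np.convolve` flips the
kernel; the text does not pin down the orientation convention). All names are illustrative.

```python
import numpy as np

def cnn_layer_forward(prev_out, kernels, biases, ss=2, f=np.tanh):
    """1D-FP of one fused CNN layer, Eqs. (18)-(19): neuron k sums 'valid'
    1D convolutions of every previous-layer output with its kernel, adds the
    bias b_k, applies the activation f, then down-samples by ss."""
    out = []
    for k, b in enumerate(biases):
        x = b + sum(np.convolve(kernels[i][k], s, mode='valid')
                    for i, s in enumerate(prev_out))   # Eq. (18)
        out.append(f(x)[::ss])                         # y = f(x), then '↓ ss'
    return out

# 2 input neurons of length 32, 3 output neurons, kernel size 3, ss = 2:
rng = np.random.default_rng(3)
prev_out = [rng.standard_normal(32) for _ in range(2)]
kernels = [[rng.standard_normal(3) for _ in range(3)] for _ in range(2)]
out = cnn_layer_forward(prev_out, kernels, np.zeros(3))
print(len(out), len(out[0]))   # 3 neurons, (32 - 3 + 1) // 2 = 15 samples
```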

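The scalar update path of Eqs. (21) and (26) for a single weight and bias can be sketched as follows
(an illustrative helper, not the authors' trainer; the numeric values are arbitrary).

```python
def bp_update(w, b, delta, y_prev, eps=1e-3):
    """Weight/bias update from the delta error: dE/dw = delta * y_prev and
    dE/db = delta (Eq. (21)), applied with learning factor eps (Eq. (26));
    the paper trains with eps = 1e-3."""
    return w - eps * delta * y_prev, b - eps * delta

w1, b1 = bp_update(0.5, 0.1, delta=2.0, y_prev=3.0, eps=0.01)
print(round(w1, 4), round(b1, 4))   # 0.44 0.08
```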



REFERENCES
[1] P. E. Pace, Detecting and Classifying Low Probability of Intercept Radar.
Norwood, MA, USA: Artech House, 2009.
[2] R. Ardoino and A. Megna, ‘‘LPI radar detection: SNR performances for
a dual channel cross-correlation based ESM receiver,’’ in Proc. 6th Eur.
Radar Conf., Rome, Italy, Sep./Oct. 2009, pp. 113–116.
[3] O. Ozdil, M. Ispir, I. E. Ortatatli, and A. Yildirim, ‘‘Channelized DRFM
for wideband signals,’’ in Proc. IET Int. Radar Conf., Hangzhou, China,
2015, pp. 14–16.
[4] O. Ozdil, C. Toker, and A. Yildirim, ‘‘Channelized transceiver for real
signals,’’ in Proc. IET Intell. Signal Process. Conf. (ISP), London, U.K.,
2013, pp. 2–3.
[5] C. Yang, Z. Xiong, Y. Guo, and B. Zhang, ‘‘LPI radar signal detection based
on the combination of FFT and segmented autocorrelation plus PAHT,’’
J. Syst. Eng. Electron., vol. 28, no. 5, pp. 890–899, Oct. 2017.
[6] R. G. Wiley, ELINT The Interception and Analysis of Radar Signals.
FIGURE 4. Forward and back-propagation in hidden CNN layers. Norwood, MA, USA: Artech House, 2006.
[7] A. De Martino, Introduction to Modern EW Systems, 2nd ed. Norwood,
MA, USA: Artech House, 2018.
[8] R. A. Poisel, Information Warfare and Electronic Warfare Systems.
Norwood, MA, USA: Artech House, 2013.
[9] O. Hossjer and M. Moncef, ‘‘Robust multiple classification of known
signals in additive noise—An asymptotic weak signal approach,’’ IEEE
Trans. Inf. Theory, vol. 39, no. 2, pp. 594–608, Mar. 1993.
[10] Z. Zhu and A. K. Nandi, Automatic Modulation Classification Prin-
ciples, Algorithms and Applications. Hoboken, NJ, USA: Wiley,
2015.
[11] G. B. Giannakis and M. K. Tsatsanis, ‘‘Signal detection and clas-
sification using matched filtering and higher order statistics,’’ IEEE
Trans. Acoust., Speech, Signal Process., vol. 38, no. 7, pp. 1284–1296,
Jul. 1990.
[12] S. Kiranyaz, T. Ince, R. Hamila, M. Gabbouj, ‘‘Convolutional neural
networks for patient-specific ECG classification,’’ in Proc. Annu. Int. Conf.
IEEE Eng. Med. Biol. Soc. (EMBS), 2015, pp. 2608–2611, doi: 10.1109/
EMBC.2015.7318926.
[13] S. Kiranyaz, T. Ince, M. Gabbouj, ‘‘Real-time patient-specific ECG clas-
FIGURE 5. Forward and back-propagation between the last hidden CNN
layer and the first hidden MLP layer. sification by 1-D convolutional neural networks,’’ IEEE Trans. Biomed.
Eng., vol. 63, no. 3, pp. 664–675, Mar. 2016, doi: 10.1109/TBME.2015.
2468589.
Consequently, the iterative flow of BP for the 1D raw signals in the training set can be stated as follows:

1) Initialize the weights and biases of the network (e.g., randomly, ∼ U(−0.1, 0.1)).
2) For each BP iteration, DO:
   a) For each signal in the dataset, DO:
      i. FP: Forward propagate from the input layer to the output layer to find the outputs of each neuron at each layer, $s_i^l$, ∀i ∈ [1, N_l] and ∀l ∈ [1, L].
      ii. BP: Compute the delta error at the output layer and back-propagate it to the first hidden layer to compute the delta errors, $\Delta_k^l$, ∀k ∈ [1, N_l] and ∀l ∈ [1, L].
      iii. PP: Post-process to compute the weight and bias sensitivities using Eq. (25).
      iv. Update: Update the weights and biases by the (accumulation of) sensitivities scaled with the learning factor, ε, using Eq. (26).
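As an illustration of this flow, the four steps can be sketched in NumPy for a minimal fully connected network trained on a toy two-class set of 1D signals. The network, dataset, and hyperparameters below are hypothetical stand-ins (the paper's hidden layers are convolutional, and its sensitivity formulas are Eqs. (25) and (26)), but the initialize / FP / BP / PP / update structure is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class dataset of 1D signals (a hypothetical stand-in for the
# noisy LPI waveforms in the paper): the label is the sign of the sum.
X = rng.standard_normal((32, 8))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

# 1) Initialize weights and biases randomly from U(-0.1, 0.1).
W1 = rng.uniform(-0.1, 0.1, (8, 4)); b1 = rng.uniform(-0.1, 0.1, (1, 4))
W2 = rng.uniform(-0.1, 0.1, (4, 1)); b2 = rng.uniform(-0.1, 0.1, (1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eps = 0.5  # learning factor, epsilon

# 2) For each BP iteration ...
for _ in range(500):
    # a) ... and for each signal in the dataset:
    for xi, yi in zip(X, y):
        xi, yi = xi.reshape(1, -1), yi.reshape(1, -1)
        # i.  FP: forward propagate to get each layer's outputs s^l
        s1 = sigmoid(xi @ W1 + b1)
        s2 = sigmoid(s1 @ W2 + b2)
        # ii. BP: delta error at the output, back-propagated inward
        d2 = (s2 - yi) * s2 * (1.0 - s2)      # Delta at layer L
        d1 = (d2 @ W2.T) * s1 * (1.0 - s1)    # Delta at layer L-1
        # iii. PP: weight and bias sensitivities (gradients)
        gW2, gb2 = s1.T @ d2, d2
        gW1, gb1 = xi.T @ d1, d1
        # iv. Update: scale the sensitivities by the learning factor
        W2 -= eps * gW2; b2 -= eps * gb2
        W1 -= eps * gW1; b1 -= eps * gb1

# Training accuracy of the converged toy network
acc = float(np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5) == y))
```

In the CNN layers described in the paper, the dense products above are replaced by convolutions over the 1D signal, but the per-signal FP → BP → PP → update cycle is identical.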
ACKNOWLEDGMENT
This work was supported by the Qatar National Research Fund (QNRF) through the ongoing project under Grant NPRP11S-0108-180228.

ALPER YILDIRIM (Member, IEEE) received the B.Sc. degree in electrical engineering from Bilkent University, Turkey, in 1996, the M.Sc. degree in digital and computer systems from the Tampere University of Technology, Finland, in 2001, and the Ph.D. degree in electronics engineering from Ankara University, in 2007. He received the Associate Professor degree from the Inter-University Committee, Turkey, in 2017. He was a Design Engineer with Nokia Mobile Phones, Tampere. He is currently a Chief Research Scientist with the Scientific and Technological Research Council of Turkey (Tübitak), Ankara. His research interests include digital signal processing, electronic warfare, radar systems, and optimization.
SERKAN KIRANYAZ (Senior Member, IEEE) is currently a Professor with Qatar University, Doha, Qatar. He has published two books, five book chapters, more than 80 articles in high-impact journals, and 100 papers in international conferences. He has made contributions to evolutionary optimization, machine learning, bio-signal analysis, and computer vision, with applications to recognition, classification, and signal processing. He has coauthored papers that were nominated for or received the Best Paper Award at ICIP 2013, ICPR 2014, ICIP 2015, and IEEE TSP 2018. He authored the most-popular articles of 2010 and 2016 and the most-cited article of 2018 in the IEEE Transactions on Biomedical Engineering. From 2010 to 2015, he authored the fourth most-cited article of the Neural Networks journal. His theoretical contributions advance the current state of the art in modeling and representation, targeting high long-term impact, while his algorithmic and system-level design and implementation work targets medium- and long-term challenges over the next five to ten years. In particular, he aims at investigating scientific questions and inventing cutting-edge solutions in personalized biomedicine, one of the most dynamic areas where science combines with technology to produce efficient signal and information processing systems. His research team received second and first place in the PhysioNet Grand Challenges 2016 and 2017, among 48 and 75 international teams, respectively. He received the Research Excellence Award and the Merit Award of Qatar University in 2019.