Meng 2018

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVT.2018.2868698, IEEE
Transactions on Vehicular Technology
IEEE TRANSACTION ON VEHICULAR TECHNOLOGY 1
Automatic Modulation Classification: A Deep

Learning Enabled Approach
Fan Meng, Peng Chen, Member, IEEE, Lenan Wu, and Xianbin Wang, Fellow, IEEE
Abstract—Automatic modulation classification (AMC), which The traditional feature-based AMC algorithms [3,4] can be
plays critical roles in both civilian and military applications, realized by three steps: data preprocessing, feature extraction
is investigated in this paper through a deep learning approach. and classificatory decision. Since good statistical features can
Conventional AMCs can be categorized into maximum likelihood
(ML)-based (ML-AMC) and feature-based AMC. However, the provide robust performance with low complexity [5], feature
practical deployment of ML-AMCs is difficult due to its high extraction-based AMC methods have been studied in many
computational complexity, and the manually extracted features existing papers [6,7], such as cyclic feature [8], higher order
require expert knowledge. Therefore, an end-to-end convolution moments and cumulants [9]–[11], wavelet transforms [12,13],
neural network (CNN)-based AMC (CNN-AMC) is proposed, et al. In classificatory decision, the conventional classifiers
which automatically extracts features from the long symbol-rate
observation sequence along with the estimated signal-to-noise includes random forest [14], multi-layer perception [15,16]
ratio (SNR). With CNN-AMC, a unit classifier is adopted to and support vector machine (SVM) [17,18]. These AMCs are
accommodate the varying input dimensions. The direct training combinations of different feature extraction algorithms and
of CNN-AMC is challenging with complicated model and complex classifications, and require both expert knowledge and artificial
tasks, so a novel two-step training is proposed, and the transfer design.
learning is also introduced to improve the efficiency of retraining.
Different digital modulation schemes have been considered in The AMC methods based on maximum likelihood (ML)
distinct scenarios, and the simulation results show that the CNN- are also studied in [19], where the ML-AMCs have been
AMC can outperform the feature-based method, and obtain a proved to achieve the optimal performance in the scenarios
closer approximation to the optimal ML-AMC. Besides, CNN- with mathematical channel models. Different from the feature-
AMCs have the certain robustness to estimation error on carrier based AMCs, ML-AMCs calculate likelihood functions of all
phase offset and SNR. With parallel computation, the deep
learning-based approach is about 40 to 1700 times faster than candidate modulations, and choose the modulation scheme
the ML-AMC regarding inference speed. with maximal likelihood value, working as an end-to-end
module. However, the high computational complexity makes it
Index Terms—automatic modulation classification, convolution
neural network, deep learning, two-step training. extremely difficult for the real-time implementation with low
cost [20]. Besides, the exact likelihoods in ML-AMC are hard
to achieve in the complex environments.
I. I NTRODUCTION In feature-based AMCs, the machine learning methods
In achieving the maximum possible transmission rates, simply work as a mapping function between features and
adaptive modulation schemes are used in the civilian wireless multi-hypothesis. To obviate feature engineering, the deep
communication system for full utilization of time-varying learning method [21] is developing rapidly in recent years
channels. Under these circumstances, information concerning both in algorithm design and hardware implementation [22],
a specific modulation scheme used in a communication process and it can learn significantly more complex functions than a
has to be shared by the transmitter with the receiver through a shallow one. The rapid development of deep learning has led
network protocol, at the cost of protocol overhead. Such over- to some successful applications in communications [23]–[25],
head could be eliminated when the receiver has the capability including modulation classification. In [26], the eye diagram of
of modulation scheme recognition. On the other hand, many the raw signal is used as input for Lenet-5-based classifier [27]
military applications also require automatic discovery of the and subtly relates the problem of AMC with the well-studied
modulation schemes used by signals from adversaries. Such area of image recognition. The convolutional network for
applications encompass signal interception and jamming. As a radio modulation recognition is proposed in [28,29], and the
result, automatic modulation classification (AMC) [1,2] is of simulations with a dataset called RML2016.10b show that the
great importance for both civilian and military applications classification accuracy is higher than those with expert feature
in achieving automatic receiver configuration, interference engineering. Then in [30], a LSTM-based AMC is proved
mitigation, and spectrum management. to outperform the CNN model with oversampled received
signals at small or medium scales. The long symbol-rate signal
Fan Meng and Lenan Wu are with the School of Information Science and is studied in [31], and the stacked auto-encoders achieves
Engineering, Southeast University, Nanjing 210096, China (email: mengxiao-
maomao@outlook.com, wuln@seu.edu.cn). excellent results with increased simulation runs.
Peng Chen is with the State Key Laboratory of Millimeter Waves, South- In our paper, we propose a deep neural network (DNN)
east University, Nanjing 210096, China (email: chenpengseu@seu.edu.cn), enabled AMC which can automatically learn to extract features
Peng Chen is the corresponding author of the paper.
Xianbin Wang is with the Department of Electrical and Computer Engi- from long symbol-rate signals at low SNR, and adopt the
neering, Western University, Canada (e-mail: xianbin.wang@uwo.ca). convolution neural network (CNN) to realize this AMC. The
0018-9545 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVT.2018.2868698, IEEE
CNN-AMC can approximate the ML-AMC with minimal 2) θc , the fixed phase offset introduced by the propagation
performance loss but significant speed improvement. The ML- delay as well as the initial carrier phase.
AMC is used as a benchmark for our proposed CNN-AMC, 3) N , the number of symbols in received sequence, i.e., the
and can also generate the training data in different scenarios. observation sequence space.
The contributions of this work are summarized as follows: 4) xk,i , the i-th constellation point under k-th modulation
• Directly using the long symbol-rate signals at low SNR, scheme where the symbols are equally distributed, i ∈
we propose an end-to-end trainable AMC based on {1, · · · , Mk }, k ∈ {1, · · · , C}. The mark Mk denoted
deep CNN. Different from the featured-based AMCs, the the symbol number using this modulation, and C is the
CNN-AMCs automatically learn to acquire features from number of candidate equiprobable modulation schemes,
the raw signals and simplify the AMC design. Besides, respectively. The modulated symbols are usually nor-
CNN-AMC makes an approximation to the ML-AMC, malized with unit power
and the related processing can be accomplished parallelly Mk
to accelerate the inference speed. 1 X
|xk,i |2 = 1. (4)
• We introduce a two-step training method including pre- Mk i=1
training and fine-tuning for the proposed CNN-AMC to
solve the challenging direct training problem. For varying In addition, symbols in a data block are assumed to be
combinations of different modulation schemes and com- independent and identical distributed (I.I.D).
munication channel scenarios, the transfer learning is also 5) ∆f , the residual carrier frequency after carrier removal.
proposed to improve the retraining efficiency further. 6) Ts , symbol interval.
• By dividing long preprocessed observation sequence into
7) g(t), the composite effect of the residual channel h(t)
signal segments with unit size, the unit classifier is and the pulse-shaping function p(t). It is expressed
introduced with the unit input dimension to accommodate by the equation g(t) = h(t) ∗ p(t), where ∗ is the
the varying input dimensions, and parallelly realize the convolution operator. Usually, p(t) is a root raised cosine
block calculation. pulse function with the duration Ts .
8) ε, the normalized epoch for timing offset between the
The remainder of this paper is organized as follows.
transmitter and the receiver, 0 ≤ ε < 1.
Section II outlines the modulation classification problem in
the framework of classical decision theory. In Section III In this paper, the received signal is expressed as the sampled
the structure of a CNN-AMC featured with multiple inputs complex baseband representation, and SNR of Es /N0 is con-
is provided, along with its corresponding two-step training, sidered. The discrete observation sequence sampled at symbol
T
transfer learning and the unit classifier. Then, in Section IV, interval Ts can be written as y = [y0 , · · · , yN −1 ] , where the
the proposed CNN-AMC, the ML-AMC and feature-based superscript T is the transpose operator. Each sampling instant
method are tested versus SNR in distinct environments, and the of the received signal is the output of a pulse-shaping matched
simulation results are analyzed. Conclusions and discussion filter at the receiver, which can be expressed as
are given in Section V. Z (n+1)Ts
1
yn = y(t)p(t − nTs ) dt. (5)
Ts nTs
II. S YSTEM M ODEL
A. Problem Formulation The pulse shape p(t) of the transmitting signal is unknown for
Without loss of generality, the complex envelope of received the receiver in the blind classification case. Similarly, instant
baseband signal can be expressed as yn is usually normalized to unit power
yn
y(t) = x(t; uk ) + v(t), (1) yn = q
1 H (6)
Ny y
where v(t) is the complex additive white Gaussian noise
(AWGN) with zero mean and double-sided power spectral where the superscript H is the conjugate transpose operator.
density being N0 /2. The noiseless signal x(t; uk ) is modeled In our classifiers, the estimation of symbol SNR is required,
via a comprehensive form as which is equivalent to having knowledge of both a and N0 .
N
X −1 The conditional probability density function (PDF) of yn
x(t; uk ) = aej(2π∆f t+θc ) xk,i
n g(t − nTs − εTs ), (2) under hypothesis Hk can be written as
n=0
Mk
|yn − xk,i 2

where uk is defined as a multi-dimensional parameter set, in
X 1 n (uk )|
p(yn |Hk , uk ) = √ exp
which a bunch of unknown signal and channel variables are i=1
Mk πN0 N0
stochastic or deterministic under the k-th modulation scheme, Mk ∗ k,i
|xk,i 2

and it is given as
1 X n | − 2Re{yn xn (uk )}
∝ exp .
n o Mk i=1 N0
uk = a, θc , ∆f, {xk,i }M
i=1 , h(t), ε .
k
(3) (7)
The symbols used in (2) and (3) are listed as follows: The elements in parameter set uk are deterministic or
1) a, the signal amplitude. random variables with known PDFs. Hence, the marginalized
density function of an observation sequence can be obtained C. Signal Segment

by taking the statistical averaging of (7), and it is given as The normalized likelihood vector is defined as ξ N =
N N T
Γ(yn |Hk ) = Euk {Γ(yn |Hk , uk )} , (8) [ξ1 , · · · , ξC ] , in which the element of corresponding mod-
PC
ulation is ξkN = Pcc (y|Hk ), and apparently k=1 ξkN = 1.
where E{·} is the expectation operator. Because the instants in observation sequence are I.I.D, the
The channel environment is assumed to vary slowly, and whole vector can be segmented into S pieces of successive
the parameters in set uk is static over the whole observation sub-sequence with corresponding length being Ns , and N =
PS
period. Besides, the normalized epoch ε = 0 and the noise s=1 s N . Furthermore, combine (9) and (12), the ξkN can be
is white, and thus the elements of y are I.I.D. The function written as
p(yn |Hk , uk ) represents the PDF of single observation at in- QS
N Γ(yNs |Hk )
stant n, and the joint likelihood function of whole observation ξk = PC s=1 QS Ns |H 0 )
sequence can be described as k0 =1 s=1 Γ(y k
QS Ns
N −1 ξk (16)
Y = PC s=1 QS Ns
Γ(y|Hk ) = p(yn |Hk ). (9) k 0 ξ 0
n=0
=1 s=1 k
= γk ξ , · · · , ξ N S ,
N1
Floating point overflow in the numerical calculation can be
introduced by a large N , and therefore the equivalent loga- where ξ Ns denotes the normalized likelihood vector of seg-
rithmic form of (9) is usually adopted, and it is given by ment s. It is meaningful when N is large, and we divide
it into several segments and make block calculation. More
L(y|Hk ) = ln Γ(y|Hk ). (10)
importantly, (16) indicates that the segment is equivalent to
According to the maximum likelihood criterion, given the the original classification task, and the design of the block
observation sequence y, the most possible hypothesis Hk calculation can be flexible and implementable.
which is correspondent to the k-th modulation scheme, is
finally chosen with the maximal likelihood value III. CNN- BASED C LASSIFIER
We propose a novel end-to-end DNN enabled AMC [32]–
k = arg
b max Γ(y|Hk ), (11) [34], which close approximates the ML-AMC and significantly
k∈{1,··· ,C}
improves the inference speed.
ˆ denotes the estimation.
where the top-mark (·)
A. Reasons for Using CNN in AMC
B. Correct Probability The channel environment is assumed to be time invariant
The conditional correct probability Pcc (y|Hk ) is defined as and frequency non-selective, then the elements in observation
sequence y are I.I.D. Some characteristics of y must be
Γ(y|Hk ) noticed:
Pcc (y|Hk ) = PC . (12)
k0 =1 Γ(y|Hk0 ) 1) The statistical characteristics of segment yNs in arbitrary
Therefore, the overall correct probability Pc is represented as time interval are the same, i.e.,
C E{L(yN Ns
i |Hk , uk )} = E{L(yj |Hk , uk )}, 0 ≤ i, j < N −Ns ,
s
1 X
Pc = Pcc (y|Hk ) . (13)
C where yNi
s
= [yi , · · · , yi+Ns −1 ]T
k=1
2) The signal sequence yN can be divided into S segments
The distribution of observation sequence y is fY (y), then of successive sub-sequence, and the Plikelihood
Ns
function
combine (12) and (13), the expectation of Pc is given by can be expressed as L(yN |Hk ) = s=1 L(yNs |Hk ).
3) Arbitrary exchanges elements in y for arbitrary times,
Z
EY {Pc } = Pc fY (y) dy and the likelihood function of obtained y0 is L(y0 |Hk ) =
y
C Z (14) L(y|Hk ).
1 X Γ(y|Hk ) The property 1 indicates that the signal is identically dis-
= PC Γ(y|Hk ) dy.
C y k0 =1 Γ(y|Hk0 ) tributed in the time domain. The impulse responses of signal
k=1
segments at different instants with a convolution kernel are the
The parameter set uk is assumed to be known, but in fact, it
same, so that the features can be extracted or detected at any
is usually modeled or estimated with some deviations. Then
time slots. The property 2 proves that the likelihood function is
the (14) is given as
( ) compositional and can be approximated by a stacked cascade
C Z
1 X Γ(y|Hk , ûk ) representation model. Therefore, a CNN with local receptive
EY {Pc } = EUk ,Ûk ,Y PC , fields is more applicable than a full-connected NN. Further-
C k0 =1 Γ(y|Hk , ûk )
k=1 y
0 0
more, given a large N , the over-fitting problem during training
(15)
becomes more severe for full-connected models. The property
where ûk is the estimation of uk . It is hard to obtain a closed- 3 indicates that the signal is sampled as a time series, but it
form solution of (14) and (15), but a numerical approximation is not time-related, so the recurrent neural network (RNN) is
can be obtained by Monte Carlo method instead. not recommended for this issue.
B. Network Architecture where xj is j-th pre-activation output. Finally, the representa-

The illustrative structure of our designed CNN-AMC is tion of the CNN-AMC can be described as follows
shown in Fig. 1, and the input consists of two parts: the pre- N
ξ̂ = F (y, SNR; θ) , (21)
processed observation sequence y with dimension N = 1000
and the estimated symbol SNR, which is the same as ML- where θ is the network parameter, and F (·) is the overall
AMC. However, y is a raw signal vector which needs to be function of CNN-based classifier. In this deep model, the
convoluted to generate higher-level information, while SNR feature exaction is automatically realized by the CNN part, and
is just a scalar which cannot be convoluted. Different kinds the higher-layers mainly deal with the mapping relationship
of signals cannot be simply concatenated as the input for between the input and output data. Generally, the CNN-AMC
the CNN-AMC. Instead, a mixed NN with multiple inputs works as a powerful nonlinear approximator F (·) to the target
is employed. function (9). The complete training/testing and simulation
To transform complex vector y into real data, the real & source codes can be downloaded from https://github.com/
imaginary parts are separated and then converted into shape mengxiaomao/CNN AMC.
2 × N . For example, for the first convolution layer l = 1,
the input vectors can be denoted by xl−1 = y ∈ RNl−1 ×Vl−1 , C. Two-step Training
where the vector number and dimension are Nl−1 = 2 and
Vl−1 = 1000, respectively. The sizes of output vectors are In a general training procedure, the CNN-AMC is trained
listed on the left of corresponding layer boxes in Fig. 1. For a epoch by epoch, and in an epoch, all the training samples are
one-dimensional convolution layer l with Nl−1 input and Nl used once to train the CNN-AMC. Within an epoch, the whole
output vectors, there are Nl × Nl−1 convolution kernels (also training set is shuffled and split into batches with size being
called as weight matrices) and Nl biases, and the kernel size Nb , and the model is trained batch by batch. In our paper, the
is denoted by kl . Giving the output vectors of previous layer categorical cross-entropy error is adopted as the loss function,
xl−1 and for a training batch the loss function is represented as
i , i ∈ Nl−1 = {1, · · · , Nl−1 }, and the i-th output vector
of current layer is represented as 1 X
Nb
N N

Nl−1
 L(θ) = − ξN N
i log ξ̂ i + (1 − ξ i ) log(1 − ξ̂ i ), (22)
X Nb i=1
xli = f  xl−1
j ∗ kli,j + bli  , (17)
j=1 where ξ N
i denotes the target output. This loss function is
where the rectified linear unit (ReLU) [35] is adopted as a minimized by the stochastic gradient descent (SGD) algorithm,
fast non-linear activation function: f (x) = max(0, x), and ∗ and the model parameter is updated iteratively as
is the convolution operation. There are Nl × Nl−1 × kl + Nl θt+1 = θt − η∇L(θt ), (23)
trainable parameters in convolution layer l. A pooling layer
is also called as sub-sampling layer, and it produces down- where η denotes the learning rate, and operation ∇ is the
sampled versions of input vectors. Formally we have the i-th gradient operator. As a variant of SGD algorithm, the adaptive
output vector as moment estimation (Adam) [37] algorithm is adopted with
xl+1
i = down(xli ), (18) default settings in our paper. The Adam optimizer dynamically
adapts η to accelerate the convergence speed by using adaptive
where down(·) represents a sub-sampling function, and the
estimates of lower-order momentums, and it is proved to be
Average pooling [36] is used to reduce the dimensionality by
efficient especially for deep models. The parameter set θ and
computing the average value of a windowed input vector. After
validation loss at each training epoch are recorded, and the
stacked convolution layers and pooling layers, the output vec-
best CNN-AMC is defined as the one with minimal validation
tors are squeezed into a vector by a flatten layer. Then a dense
loss. Different lengths of received observation sequence under
layer is used to encode this vector into a 256-dimensional
varying SNRs can be generated in the simulation, and the ML-
vector which contains extracted higher-level information. For
AMC can also produce large amounts of output data.
a dense layer l with input vector xl−1 , the output vector can
However, in our experiments, the direct training of the
be described as
CNN-based classifier is challenging, due to the complicated
xl = f (Wl xl−1 + bl ), (19) model and complex target function. With massive amounts of
training data and sufficient iteration times, we find that the
where Wl ∈ RNl ×Nl−1 and bl ∈ RNl are weight matrix and
loss function L(θ) cannot drop, indicating that this model has
bias, respectively. This is a full-connected layer, and the num-
stuck in the high-dimensional parameter optimization from the
ber of trainable parameters is Nl ×Nl−1 +Nl . Then this vector
beginning.
is concatenated with a vectorized representation of scalar
Although the final task is difficult, it can be much easier for
SNR. After another dense layer, then the normalized likelihood
N the model to learn a more straightforward function. There is
vector namely ξ̂ is produced by a output layer with activation a simple but useful trick to train the CNN-based classifier by
N
function being softmax. The output ξ̂ = [ξˆ1N , · · · , ξˆC
N T
] , and using auxiliary data. As shown in Fig. 2, the whole training
ˆN
the element ξi is defined as: process is divided into two steps.
exi 1) Pre-training: This step is designed to help the CNN-
ξˆiN = PC , (20)
j=1 e
xj AMC converge at the beginning of training, so the scale
TABLE I
S IMULATION PARAMETERS OF T WO - STEP T RAINING
k̂
m
A
a
g
x
r
Training Step Step 1 Step 2
Arg max Epoch Number 20 240
SNR Range/dB [0, 10] [−6, 12]
ξN
Ar
ax
m
g
Amount of Training Data 79, 200 228, 060
Softmax Amount of Validation Data 8, 800 25, 340
m
A
Using Output of ML Classifier No Yes
a
g
x
r
Dense 128 ReLU
266 Merge Concat

of the training data is small as shown in Tab. I. Another
output term is added with a gray circle in this step, and
Dense 256 ReLU Dense 10 ReLU its corresponding category is Gaussian white noise with
1984 Flatten
zero means, not other modulation schemes. The amount
Scaling
of auxiliary training samples of noise is the same as any
32×62 AveragePool 2,1 other modulation scheme, and its I/O data are denoted by
SNR
32×125 Conv 32×3,1 ReLU
xaux and yaux . The output vectors are set to be binary. In
the simulation tests, the loss function L(θ) decreases,
24×125 AveragePool 2,1 the variance of gradients gets smaller, and the CNN-
AMC goes to convergence after some training epochs.
24×250 Conv 24×3,1 ReLU
The parameter θ with lowest validation loss is stored for
24×250 AveragePool 2,1 the model initialization in the next training.
2) Fine-tuning: Firstly, the CNN-AMC loads the stored
24×500 Conv 24×3,1 ReLU
parameter θ in step 1. The auxiliary output term is
12×500 AveragePool 2,1 redundant in the final classification, so the top layer
is replaced with the primary output layer and then
12×1000 Conv 12×3,1 ReLU randomly initialized. In this step, only the samples of
modulation schemes are generated, and the ML-AMC
Output: 12×1000 Conv 12×3,1 ReLU
produces the output of training data in the form of
Scaling normalized likelihood vector ξ N , which is different from
the first step. The elaborately designed training data
y 2×1000 helps the CNN-AMC close approximate the ML-AMC,
and simulation result shows that the Pc performance
Fig. 1. Illustration of a CNN-based modulation classifier with both vectors with data generated by ML-AMC has better approxima-
and a scalar as input. The text ‘Conv 12 × 3,1 ReLU’ represents that in this tion than the one with common binary labels. Similarly,
convolution layer, the kernel number, kernel size and stride are 12, 3 and
1, respectively. Similarly, the text ‘AveragePool 2,1’ represents that in this the CNN-AMC with lowest validation loss is stored,
average pooling layer, window size and stride are 2 and 1, respectively. reloaded and tested in the following simulations, and
the detailed settings are listed in Tab. I.
The summarization of the two-step training method is given
in Tab. II. In Fig. 3, the curves of training/validation
Output Layer loss/accuracy in a complete two-step training process are
plotted. Generally, the loss curves go down, and the accuracy
x  y  curves keep rising in both two steps. In step 2, the training loss
Step 1: xI   1  yI   1 
drops sharply at the beginning and then descends steadily. The
 xaux  y aux 
curve of validation loss has the similar trend as the training
loss one. While as the epochs increasing it goes stable with
Noise Term small jitters, and it is hard to be less than 0.47, which means
the training is in the saturation region. The corresponding θ
with the minimal validation loss is stored, and the minimum
y II   y 2 
value in step 2 is labeled with the red dot.
Step 2: xII   x2 
The likelihood function of the auxiliary Gaussian noise
in the complex plane is quasi-convex, which is a relatively
more straightforward function to approximate. From (7) we
know that the conditional PDF of one observation instant
Fig. 2. Illustration of the two-step training of CNN-AMC. In step 1 all the is a weighted sum of Gaussian distribution with different
primary output elements are represented with orange circles and the additional means and same variance. Therefore, the approximation to
noise term is depicted with gray circle. The output element of Gaussian noise
is removed in step 2. The input and output in step 1 are denoted by xI and more complex PDFs of other modulation schemes becomes
yI , and similarly they are marked by xII and yII for step 2. much more comfortable for a CNN-AMC which has learned
the function of Gaussian noise. After learning the approxima-
1 1.4 1 1.2
0.9 1.26 0.9 1.08
0.8 1.12 0.8 0.96
0.7 0.98 0.7 0.84
Cross-entropy Loss
Cross-entropy Loss
0.6 0.84 0.6 0.72
Accuracy
Accuracy
0.5 0.7 0.5 0.6
0.4 0.56 0.4 minimum(transfer) 0.48

minimum(two-step)
0.3 minimum(step 1) minimum(step 2) 0.42 0.3 0.36
0.2 Train accuracy 0.28 0.2 Two-step/ validation accuracy 0.24

Train cross-entropy Loss Two-step/ validation cross-entropy Loss
0.1 Validation accuracy 0.14 0.1 Transfer/ validation accuracy 0.12
Validation cross-entropy Loss Transfer/ validation cross-entropy Loss
0 0
0 20 50 100 150 200 240 260 0 20 50 100 140
Epoch Times Epoch Times
Fig. 3. Training/validation curves of accuracy and loss. The x-axis denotes Fig. 4. Loss and accuracy curves of validation data versus epoch times. The
the training epoch. The validation and training curves are plotted in dark blue one using transfer learning takes less time to reach the saturation region than
and orange lines, and the accuracy and loss curves are depicted with solid the two-step training one, and the pre-training step is reduced.
and dotted lines.
TABLE II or experiences instead of purely blank. Here, we take two cases

S UMMARIZATION OF T WO - STEP T RAINING
as examples:
Step 1: 1) Diverse modulation schemes. In distinct situations, some
1) Generate training data, and samples of the additional Gaussian noise
is included.
modulation modes are added or removed, depending on
2) Initialize the CNN-AMC parameters randomly. the requirement of the classification task. Except for the
3) Train CNN-AMC, store the model parameters with the minimal new training data, the output layer of the CNN-AMC
validation loss.
Step 2:
needs to be changed to the new corresponding category.
1) Produce training data, and the target output is achieved from ML- Therefore, all the CNN-AMC parameters except for the
AMC. ones in the top layer can be loaded as the initialization
2) Replace the top layer of CNN-AMC and randomly initialize it;
Load the other model parameters from storage.
instead of randomly setting.
3) Train CNN-AMC, store the model parameters with the minimal 2) Different environments. During the preprocessing pe-
validation loss. riod, the carrier phase compensation is included in the
coherent case but not incoherent scenario. The CNN-
AMC can be trained with new incoherent training data
tion of a symbol, then the objective function becomes more using the proposed two-step method. Moreover, the
accessible according to (9). In the first step, the problem of CNN-based classifier learned in the coherent case can be
initial convergence is solved. By pre-training, the CNN-AMC utilized as the initial model for training with these data
is further trained to approximate the optimal ML-AMC as and all the model parameters are loaded from the trained
close as possible in the second step. one in coherent case. We give an example using transfer
learning, and its counterpart is the one using two-step
D. Transfer Learning training. As shown in Fig. 4, descent speed of loss curve
is faster than that using two-step training. Besides, the
The fore-mentioned two-step training deals with the prob-
pre-training step can be left out, which further reduce
lem: Given a CNN-AMC with random parameter initialization,
the time cost of the training period.
and how to train it to classify the modulation scheme with
given training and validation data. In fact, one learned CNN- Although transfer learning helps to improve the training effi-
AMC can be applied to one specific task such as the coherent ciency, it is found by two-step learning.
environment with several candidate modulation modes. Once
the scenario or the modulation set is changed, the CNN-AMC
E. Unit Classifier
needs to be retrained with new training data.
In fact, it is not necessary to restart the new training To achieve high classification accuracy with signals at low
process from the very beginning with the aid of transfer SNR, the sequence space N can be up to several or even tens
learning [38]. As the paper states before, the CNN part of of thousands. More training samples are needed to overcome
the CNN-AMC generates higher-level information from the over-fitting as N increases, and thus it makes the training
preprocessed raw observation sequence. The feature extraction much more difficult. The problem is that with limited memory
can be employed universally in varying tasks, and the training space and computation resources, the classification and its
can be much easier with some already learned prior knowledge corresponding training with a large N can be hard or even
impractical. On the other hand, it is also challenging to design Firstly, the conditional correct probabilities Pcc (y|Hk )
a classifier with changeable input space. curves of 7 modulations are illustrated in Fig. 5 (N = 1000).
According to the previous analysis in Subsection II-C, the The ML-AMC are presented with solid lines, and the CNN-
input y of a large N can be divided into segments with unit 1000 ones are plotted with dashed lines. Clearly 2PSK is
length. The CNN-based classifier with the input signal y of the most easily recognized modulation in our set, and its
unit input dimension is defined as a unit classifier. Theoreti- conditional correct probability is 100% even at the region of
cally, the segment has the same classification performance as low SNR. The 4PSK and 8PSK are also found to be more
the complete observation one in coherent environment. More easily identified compared to the other modulation schemes,
importantly, the scales of the unit classifier regarding training, and their Pcc (y|Hk ) is up to 1.00 when SNR = 2 dB and SNR
cache, and calculation are adjustable. Hence, the design of the = 4 dB, respectively. In high SNR field (SNR = 10 dB), all
AMC can be flexible to suit the hardware, and it is irrelevant the modulation schemes can be recognized almost without any
to the sequence space N . The inference can also be made error for ML-AMC.
parallelly for all unit classifiers. In general, there is a little Pcc (y|Hk ) performance loss when
comparing the CNN-1000 to the ML-AMC. At some SNR
IV. S IMULATION R ESULTS regions, for some modulations the Pcc (y|Hk ) are higher than
the ML-AMC when using CNN-1000. For example, in the case
The common sequence space N for AMC varies from where SNR ranging from −6 to −2 dB and the modulation
hundreds to thousands through our investigation, and consider- scheme being 16QAM, the CNN-1000 seems better than ML-
ing the hardware condition, the CNN-AMC with observation AMC. This is because the corresponding conditional probabil-
space N = 500 (named CNN-500) and N = 1000 (named ity is overestimated, at the cost of a lower correct classification
CNN-1000) are tested in this section. Besides, the ML-AMC probability of some other modulation scheme such as 64QAM.
and a feature-based AMC using 9 high-order cumulants [39] The Pc of CNN-1000 can only infinitely approximate the ML-
(named Cu-AMC) are served as the comparisons. The second AMC, and it cannot exceed the optimal one.
order, fourth order and sixth order cumulants are adopted as The curves of the overall correct probabilities for ML-
features, and a two-hidden-layer NN is used as a classifier. AMC, CNN-500, CNN-1000 and Cu-AMC are presented in
Couples of modulation schemes are considered. The overall Fig. 6. There is about 1% to 2% Pc performance degradation
and conditional correct probabilities versus SNR are the two between the CNN-AMC and the ML-AMC when N = 500
main issues needed to be studied. Besides, many other aspects and N = 1000. Moreover, the CNN-500 and CNN-1000
must be considered such as robustness against estimation error, are used as unit classifier proposed in III-E when sequence
inference time cost, and system storage requirement. space N is integer multiples of the unit input length, and
the approximation is almost very near the curves of ML-
A. Coherent AMC when N = 2000 and N = 8000. The simulation
results show that under the precise knowledge of the parameter
The simulations begin with the coherent environments, set, the CNN-based classifiers can closely approximate the
and seven modulation modes namely 2PSK, 4PSK, 8PSK, ML-AMC. On the other hand, its calculating speed is about
16QAM, 16APSK, 32APSK and 64QAM with equal prior 150× to 170× faster than ML enabled approach with given 7
probabilities are considered. The receiver is assumed to have modulation schemes in coherent scenario. Besides, the CNN-
precise knowledge of parameter set uk including the constant based classifiers outperform the Cu-AMC, and the feature-
phase offset θc and SNR, but the modulation scheme needs to based method works worse than CNN-AMCs when Pc > 90%.
be inferred. Apart from the power normalization by (6), the More importantly, the result indicates that the deep learning
observation sequence is compensated with the conjugate phase enabled approach has better performance than the manual
offset in the preprocessing period feature extraction-based method in this case.
y = ye−jθc . (24) 2) Estimation Error: The previous simulations are given
under the assumption of perfect knowledge of two parameters:
Combine (7), (8) and (9), the likelihood function under hy- fixed phase offset and SNR. However, the estimation error
pothesis Hk is explicitly given by is usually inevitable. In this part, simulations are conducted
N −1 Mk with varying degrees of estimation error, and the variable-
|xn | − 2Re{yn∗ xk,i
k,i 2
1 YX n } controlling approach is adopted.
Γ(y|Hk ) ∝ exp .
Mk n=0 i=1 N0 In the first simulation the deviation in carrier phase is inves-
(25) tigated. Because of the priori knowledge that common constel-
Hence, the normalized likelihood vector ξ N can be obtained, lation diagrams are symmetric, the influence of positive and
and tied with observation sequence y, plenty of training and negative deviation on carrier phase is equivalent. We use |∆θc |
validation data can be generated for the CNN-AMC. denote phase error and the |∆θc | ∈ {3◦ , 6◦ , 9◦ , 12◦ , 15◦ }.
1) Correct Probability: The deep learning enabled AMC is The result is shown in Fig. 7. Generally speaking, the Pc
greatly faster than the ML-AMC. While as an approximation to decreases as the |∆θc | rises. When |∆θc | < 6◦ the classifica-
the optimal AMC, it can suffer some performance degradation tion performance degradation is very small. The recognition
in terms of Pcc (y|Hk ) and Pc , which are studied in this has obviously deteriorated when |∆θc | ∈ [6◦ , 9◦ ], and the ML-
simulation. AMC gets even worse when SNR increases at the region of
1 1
0.9 0.9
0.8 0.8
0.7 0.7
2 PSK/ML
2 PSK/CNN c
=3°ML
0.6 0.6
Pcc (y|H k )
4 PSK/ML =3°CNN-1000
c
4 PSK/CNN
Pc
0.5 0.5 c
=6°ML
8 PSK/ML
8 PSK/CNN c
=6°CNN-1000
0.4 16 QAM/ML 0.4
c
=9°ML
16 QAM/CNN
0.3 0.3 c
=9°CNN-1000
16 APSK/ML
16 APSK/CNN c
=12°ML
0.2 32 APSK/ML 0.2 =12°CNN-1000
c
32 APSK/CNN
=15°ML
0.1 64 QAM/ML 0.1 c
64 QAM/CNN c
=15°CNN-1000
0 0
-6 -4 -2 0 2 4 6 8 10 -6 -4 -2 0 2 4 6 8 10
SNR (dB) SNR (dB)
Fig. 5. Pcc (y|Hk ) curves of 7 modulation modes with SNR range being Fig. 7. Pc curves of different estimation error on phase offset |∆θc | in set
[−6, 10] dB. The ML and CNN-based classifiers are depicted with solid and {3◦ , 6◦ , 9◦ , 12◦ , 15◦ }. The ML and the CNN-1000 classifiers are depicted
dotted lines, respectively. with solid and dashed lines, respectively.
1
be seen that the classifiers are sensitive to the SNR error,
0.9 even 0.5 dB estimation bias can bring about the inference
degradation. Besides, an overestimated SNR is more harmful
0.8
N=500 ML than an underestimated SNR for the classifier. We also notice
N=500 CNN-500
0.7 N=500 Cumulant+NN
that curves of CNN-AMC is smoother than the ML-AMC,
N=1000 ML and in many cases, the difference in Pc between the ML-
N=1000 CNN-500
Pc
0.6
N=1000 CNN-1000 AMC and CNN-based classifier can be larger than 10%. At
N=1000 Cumulant+NN the region of SNR = 12±1.5 dB, the Pc of ML-AMC remains
0.5 N=2000 ML
N=2000 CNN-500 1.0, which indicates that the performance saturation can be
0.4
N=2000 CNN-1000 against SNR estimation error to some extent. Meanwhile, the
N=2000 Cumulant+NN
N=8000 ML corresponding CNN-based classifier has the worst performance
N=8000 CNN-500
0.3
N=8000 CNN-1000
in all 5 comparisons. We suppose that this degradation can be
N=8000 Cumulant+NN attributed to inadequate training data. In Tab. I the training
0.2
-6 -4 -2 0 2 4 6 8 10 data is sampled in the range [−6, 12] dB, which means the
SNR (dB) CNN-AMC is not trained including this range.
Fig. 6. Pc curves of different sequence space N versus SNR. The ML-AMC, One feasible approach to make the classifier robust to the
CNN-500 and CNN-1000 are depicted with solid, dotted and dashed lines, SNR error is to train the CNN-AMC with adding noise on
respectively. SNR. This method can make the classifier less sensitive to the
deviation on SNR, but at the expense of overall classification
performance.
[8, 10] dB (|∆θc | = 9◦ ). The Pc remains about 0.86 when SNR
> 6 dB with |∆θc | = 12◦ . The Pc cannot even exceed 0.7
when |∆θc | = 15◦ . As comparison the CNN-based classifier
is about the same as the ML-AMC, and in some cases it can
perform better. In the extreme case with |∆θc | being up to
15◦ , the Pc of CNN-based classifier is higher than ML-AMC, B. Incoherent
which means the NN enabled approach is more robust.
In our second simulation, the influence of SNR estimation The coherent environment is usually not accessible in non-
error is studied. The mistake can lead to an erroneous assump- cooperative case, and the blind phase estimation algorithm is
tion of concentrations of data around the constellation points, required at the receiver. In the previous simulation classifi-
and the centers of these alphabets are also normalized with a cation performance of the CNN-AMC is greatly influenced
coefficient with deviation. when phase estimation error |∆θc | is larger than 6◦ , so the
As shown in Fig. 8, the x-axis is the SNR error ranging final classification accuracy highly depends on the employed
from −5 to 5 dB, and it is defined as ∆SNR = SNR d − SNR, estimation algorithms. In this subsection, the θc is assumed to
and the y-axis is the Pc . Curves of real SNR ∈ {0, 3, 6, 9, 12} be a random variable which is uniformly distributed in range
dB are labeled with different colors and markers. It can [0, 2π). Similarly, the likelihood function can be obtained as
1 1
0.9
0.9
0.8
0.7
0.8
0.6
Pc
Pc
0.5 0.7
SNR=0dB ML ML coherent
SNR=0dB CNN-1000 CNN coherent
0.4
SNR=3dB ML Cumulant+NN coherent
SNR=3dB CNN-1000 0.6
0.3 ML incoherent
SNR=6dB ML CNN incoherent
SNR=6dB CNN-1000 Cumulant+NN incoherent
0.2
SNR=9dB ML 0.5
SNR=9dB CNN-1000
0.1 SNR=12dB ML
SNR=12dB CNN-1000
0 0.4
-5 -4 -3 -2 -1 0 1 2 3 4 5 -6 -4 -2 0 2 4 6
SNR (dB) SNR (dB)
Fig. 8. Pc curves of different SNR in set {0, 3, 6, 9, 12} with estimation Fig. 9. Pc curves of ML-AMC and CNN-500 versus SNR in range [−6, 6]
error ∆SNR range being [−5, 5] dB. The ML-AMC and the CNN-1000 are dB. Classifiers in coherent case are plotted in light blue lines, and depicted
depicted with solid and dashed lines, respectively. with navy blue lines in incoherent scenario.
follows than that of the Cu-AMC in coherent case, but its advantage is
−1 X
Mk
( N

|xk,i 2
enlarged in the incoherent scenario. To achieve the Pc = 90%,
1 n |
Y
Γ(y|Hk ) ∝ Eθc exp − the SNR required for the incoherent classifier is up to 1.6 dB,
Mk n=0 i=1 N0
) which is about 1.0 dB more than the coherent counterpart.
2Re{yn∗ xk,i jθc

n e } Generally, the gap between the performance of the coherent
× exp and incoherent classifiers is gradually narrowed as the SNR
N0
N X
Z 2π Y Mk
" (26) increases. The correct decision probabilities of both classifiers
|xk,i |2

1 are all at the point of 100% when SNR = 4 dB.
= exp −
Mk 0 n=1 i=1 N0 The incoherent CNN-500 is also tested with observation
# space N ∈ {500, 1000, 2000}, and the ML-AMC is treated
2Re{yn∗ xk,i ejθc }

× exp dθc . as the optimal contrast in Fig. 10 with dotted lines. The text
N0 ‘slicing’ indicates that the unit classifier is used to calculate
In fact, the first Bessel function of zero order can be utilized the likelihood function, and the ML-AMC using segment is
to simplify (26), but it is not suggested in numerical calculation depicted with solid lines. Theoretically, the block calculation
due to its complex nature. In replace, the expectation over θc of observation signal is not optimal in the incoherent environ-
can be adequately approximated with the help of composite ment. From the illustration it can be seen there is about 0.2
Simpson’s rule [20] by splitting the integration interval [0, 2π) dB and 0.4 dB Pc loss with the segment number Ns being 2
into subintervals of same lengths. The range can be shorted to and 4. On the other hand, the approximation of unit CNN-500
[0, π), taking the use of constellation points symmetry in the is also very near the ones of ML-AMC using segment with
complex plane. The additional integration over θc makes the different input space. When N ∈ {1000, 2000}, the overall
total amount of calculations is several times of that in coherent correct probabilities of classifiers are up to 100% at the region
environment, and the computation time cost is proportional to of SNR = 2 dB. Although the unit classifier suffers some
the number of subintervals. performance loss, it is feasible as a sub-optimal approach
To reduce the total amount of calculations, only the 2PSK, in incoherent scenario. In addition, new CNN-AMCs with
4PSK, 8PSK and 16QAM are included in the modulation set. specific input space can be trained as an approximation to
Besides, only the CNN-500 is retrained with incoherent data the optimal method.
of observation space N = 500. In the preprocessing period, 2) Conditional Correct Probability: Through initial anal-
the phase compensation step in (24) is not needed. ysis on results of four conditional correct probabilities using
1) Coherent VS Incoherent: Both the overall probabilities ML-AMC and CNN-AMC, we find that the approximations of
of correct classification with ML-AMC, CNN-AMC and Cu- the CNN-based classifiers are quite close to ML-AMC in high
AMC are plotted in Fig. 9. It can be seen that under the SNR region, but they do not match well when SNR < 2dB.
incoherent scenario, the approximation of CNN-AMC is very To further study the conditional correct probability perfor-
close to the optimal one, and it also outperforms the Cu- mance, the confusion matrices of two AMCs are visualized in
AMC. As a comparison, the overall correct probabilities of Fig. 11 at the region of SNR = −6 dB. Each row represents
three classifiers in coherent scenario are illustrated with light the instances in a predicted class of AMC, and each column
blue lines. The Pc performance of CNN-AMC is a little better denotes the instances in a target modulation class. Comparing
Besides, the PDF of amplitude a is assumed to follow the

1
Rayleigh distribution which is given by
2
2a a
0.9 p(a) = 2 exp − 2 , a≥0 (27)
σa σa
0.8 where σa2 = E[a2 ]. Except for the preprocessed observation
sequence y, the knowledge of symbol SNR is still required
ML 500
for the CNN-AMC. The likelihood function is represented as
Pc
0.7 CNN 500

ML 1000 slicing
N Mk
(
ML 1000
|axk,i |2

CNN 1000 slicing
1 YX
0.6
ML 2000 slicing
Γ(y|Hk , a) ∝ Eθc exp −
ML 2000
Mk n=1 i=1 N0
CNN 2000 slicing
)
2Re{yn∗ axk,i ejθc }

0.5
× exp
N0
0.4 Mk
Z πY N X
"
|axk,i |2

-6 -4 -2 0 2 4 6 1
SNR (dB) = exp −
Mk 0 n=1 i=1 N0
#
Fig. 10. Pc curves of ML-AMC and CNN-500 versus SNR with varying 2Re{yn∗ axk,i ejθc }

observation sequence space N . The ML-AMCs are plotted with dotted lines, × exp dθc .
and illustrated with solid lines when signal segment is used. The CNN-AMCs N0
are depicted with dashed lines, and the unit classifiers are employed.
(28)
The single input and multiple output (SIMO) case is con-
2PSK
2PSK
9797 48 170 187 9633 244 232 235

94.9% 93.1%
24.5% 0.1% 0.4% 0.5% 24.1% 0.6% 0.6% 0.6% sidered in the simulation, where there is one transmitter and
one receiver with Na antennas. The channels are assumed to
4PSK
4PSK
175 2324 2108 1929 183 3049 2964 2672

Predict Class
36.3% 34.4%
Predict Class
0.4% 5.8% 5.3% 4.8% 0.5% 7.6% 7.4% 6.7%

be independent to each other. The likelihood function with
multiple observation is given as
8PSK
8PSK
70 3385 3565 3094 29 2208 2258 1890

35.3% 35.4%
0.2% 8.5% 8.9% 7.7% 0.1% 5.5% 5.6% 4.7%
Na
Precision 16QAM
Precision 16QAM
85 4116 4157 4790 155 4499 4546 5203 Y

0.2% 10.3% 10.4% 12.0%
36.4%
0.4% 11.2% 11.4% 13.0%
36.1% Γ {y1 , · · · , yNa }|Hk , {a1 , · · · , aNa } = Γ(yu |Hk , au ),
u=1
98.0% 23.2% 35.7% 47.9% 51.2% 96.3% 30.5% 22.6% 52.0% 50.4% (29)
2PSK 4PSK 8PSK 16QAM Recall 2 PSK 4 PSK 8 PSK 16 QAM Recall which ignores the fact that the transmitted symbol sequence
Target Class Target Class is the same to all antennas. Formally, the process of multiple
ML-AMC CNN-AMC
observation of one source is similar to the signal segment in
(16) for CNN-AMC.
Fig. 11. The visualized confusion matrices of ML-AMC on the left side and The receiver using 1, 2, 4 antennas is investigated in our
CNN-AMC on the right side at the region of SNR = −6 dB. Each target simulation, and thus the overall classification performance is
class is tested with 10000 samples, and the exact number and its proportion
are given in blue and yellow blocks. In the gray column block, the gray row tested versus average symbol SNR, which is denoted by SNR.
block and the white block, the values of recall, precision and accuracy are As shown in Fig. 12, the curve of CNN-500 is illustrated and
given, respectively. the ML-AMC is treated as its counterpart. It can be seen that
in flat Raleigh fading channel, the receiver with 1 antenna
the recall and precision of two AMCs, the main difference reaches the overall correct probability Pc = 90% when SNR
is on 4PSK and 8PSK. Both the 4PSK and 8PSK are phase = 4 dB. In contrast, the SNR required for the receiver with
shift keying, and the distance between adjacent points is 2 and 4 antennas are about −1 dB and −3 dB, respectively.
small in the complex plane. Therefore, these two kinds of When SNR = 2 dB, there is almost no classification error in
modulation schemes are easily mistakenly classified for each the case with 4 antennas. Combining with the performance in
other, especially in low SNR field. The loss function is set Fig. 6 and Fig. 9, the performance of CNN-AMC can generally
to maximize the overall correct probability, and even though achieve a little closer approximation to the optimal one, in
the Pc is close approximated in Fig. 9, the approximation of comparison with Cu-AMC.
conditional correct probabilities cannot be guaranteed in this
case. D. Feature-based methods
Except for the aforementioned NN-based method, some
C. Flat Fading Channel other classifiers are considered in this subsection, including
The non-frequency selective slow-fading channel is studied decision tree classifier (DTC), SVM with polynomial and
in this part. Therefore, the amplitude a and carrier phase θc radial basis function (RBF) kernels, i.e., SVM-POLY and
are assumed be constants over the whole observation period. SVM-RBF, respectively. The dataset (N = 500) in IV-A is
The PDF of carrier phase is the same as the one in IV-B1. served as the baseline test in the coherent environment. As
1 1
0.9
0.9
0.8
0.7
0.8
0.6
ML 1 antenna
Pc
Pc
0.5 0.7
CNN 1 antenna
Cumulant+NN 1 antenna
0.4
ML 2 antennas
CNN 2 antennas 0.6
0.3 ML
Cumulant+NN 2 antennas
ML 4 antennas Cumulant+SVM-RBF
0.2 CNN 4 antennas Cumulant+SVM-POLY
0.5
Cumulant+NN 4 antennas Cumulant+DTC
0.1 Cumulant+NN
Proposed CNN
0 0.4
-18 -14 -10 -6 -2 2 6 10 14 18 -6 -4 -2 0 2 4 6
SNR (dB) SNR (dB)
Fig. 12. Plots of Pc versus SNR in range being [−18, 18] dB. The observation Fig. 14. Baseline test for various AMCs in the incoherent environment (N =
space N = 500 for each antenna, and the curves of 1, 2, 4 antennas are 500). Plots of Pc versus SNR.
illustrated with light blue, navy blue and green lines, respectively.
TABLE III
AVERAGE I NFERENCE T IME C OST OF C LASSIFIERS IN C OHERENT
E NVIRONMENT.
1 N 500 1000 2000 8000
ML-AMC/ms 10.8 21.3 42.6 164.5
0.9 CNN-500/ms 0.0727 0.146 0.288 1.15
CNN-1000/ms - 0.127 0.246 1.02
0.8
0.7 E. Complexity & Implement

In this subsection, we give a discussion on the complexity
Pc
0.6
analysis and implement of ML-AMC and CNN-AMC. The
0.5
ML ML-AMC requires no memory, and the scales of the proposed
Cumulant+SVM-RBF
Cumulant+SVM-POLY
CNN-500 and CNN-1000 are small, and they occupy only 3.4
0.4 Cumulant+DTC MB and 6.3 MB capacity with N = 500 and N = 1000,
Cumulant+NN
Proposed CNN
respectively. The simulation platform is presented as: cpu
0.3 intel i7-6700k and gpu nvidia GTX-960. The ML-AMC is
-6 -4 -2 0 2 4 6 8 10
implemented with Matlab and mainly the cpu is utilized. While
SNR (dB) the CNN-AMC runs on the Python platform with the gpu.
1) Coherent: Tab. III lists the time cost per inference with
Fig. 13. Baseline test for various AMCs in the coherent environment (N = different input space of three classifiers in IV-A. The unit
500). Plots of Pc versus SNR.
time cost is defined as time cost per inference divided by
N , and they are 20.8µs, 0.144µs and 0.127µs for ML-AMC,
CNN-500 and CNN-1000, respectively. The calculating speed
is accelerated by more than two orders of magnitude with
shown in Fig. 13, the overall Pc of CNN-AMC and SVM-
NN-based approach in this specific case. More concretely, the
RBF is very similar, and the NN-based AMC suffers a little
CNN-500 and CNN-1000 are about 150 and 170 times faster
Pc performance loss compared to these two. the DTC classifier
than ML-AMC, respectively.
has the lowest classification correct rate.
2) Incoherent: In incoherent cases, the additional integra-
Then these AMCs are tested with the incoherent baseline tion on carrier phase is required for ML-AMC. Theoretically,
dataset (N = 500) in IV-B. Combining with Fig. 13, the the unit time cost is about Nθc times of that in coherent
correct classification performance of DTC classifier is still the scenario, where Nθc is the number of subintervals. After
worst, and the SVM-based methods are not stable in Fig. 14. simulation tests in IV-B1, the unit time costs listed in Tab. IV
Among the feature-based methods, only the NN-based AMC are 6.01µs and 268µs in coherent and incoherent scenarios,
can achieve the closest Pc performance to the CNN-AMC in respectively. The latter one is about 44.7 times of the coherent
both coherent and incoherent environments, and that is why one, which is approximately equal to the adopted variable
it is served as the Cu-AMC in previous simulations. Besides, value Nθc = 45.
the Pc performance of CNN-AMC is a little better than that On the other hand, the unit time cost of CNN enabled
of NN-based AMC. approach is only related to the architecture of the model.
TABLE IV CNN-AMC is proposed as an approximation to the ML-

AVERAGE T IME C OST OF C LASSIFICATION OF F OUR M ODULATION AMC. The CNN-AMC automatically extracts features from
S CHEMES .
the long symbol-rate sequence, and the SNR is also required as
N 500 1000 2000 input. The corresponding two-step training process and trans-
Coherent ML-AMC/ms 3.08 6.07 11.8 fer learning are introduced to solve the challenging training
CNN-500/ms 0.0775 0.155 0.309
N 500 1000 2000 task and improve retraining efficiency, respectively. Besides,
Incoherent ML-AMC/ms 138 271 528 a unit classifier is also proposed to suit varying observation
CNN-500/ms 0.0775 0.155 0.309 sequence dimensions flexibly. The simulation results show that
the CNN-based classifier can outperform the feature-based
TABLE V methods, and obtain the closest approximation to the optimal
C OMPLEXITY OF ML-AMC AND CNN-AMC U NDER T HREE S CENARIOS . ML-AMC. Besides, CNN-AMCs have the certain robustness
ML-AMC CNN-AMC to estimation error on carrier phase offset and SNR. By taking
kml O(N C advantage of parallel computation, the deep learning-based
P
Coherent k= Mk ) kcnn O(N )
kml O(N Nθc C approach is about 40 to 1700 times faster than the ML-
P
Incoherent k= Mk ) kcnn O(N )
kml O(N Nθc Na C
P
Flat fading channel k= Mk ) kcnn O(N Na ) AMC regarding average classification speed, depending on
the given scenarios. In our future work, we will focus on
the robustness of CNN-based classifiers to varying deviations,
Except for the top layer, the structure of the CNN-500 for and applications in complicated non-cooperative scenarios will
seven and four modulation schemes are the same, so the also be investigated.
average time cost of CNN-500 in Tab. IV is very similar to the
corresponding ones in Tab. III. Besides, the average time cost
of CNN-based approach is irrelevant to the communication VI. ACKNOWLEDGMENTS
environment, and its unit time cost is 0.155µs. Therefore, in This work was supported in part by the National Natural
this simulation case the CNN-500 is about 40 and 1700 times Science Foundation of China (Grant No. 61801112, 61271204,
faster than ML-AMC in coherent and incoherent environment, 61471117 and 61601281), the Natural Science Foundation of
respectively. Jiangsu Province (Grant No.SBK2018042259), and the open
Both the amount of calculations grows linearly as observa- program of the State Key Laboratory of Millimeter Waves in
tion space N increases for ML-AMC and CNN-AMC. The Southeast University (Grant No. Z201804).
computational complexity is mainly determined by network
architecture for our proposed CNN-AMC, and the modulation
schemes and channel environments has negligible influence R EFERENCES
on the total computation. In contrast, the ML-AMC is greatly [1] E. E. Azzouz and A. K. Nandi, “Automatic modulation recognition of
affected by these issues. The computational complexities of communication signals,” IEEE Transactions on Communications, vol.
two AMCs in our three simulated environments are listed in 334, no. 4, pp. 431–436, 1998.
[2] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su, “Survey of auto-
Tab. V. The notations kml and kcnn are the coefficients of matic modulation classification techniques: classical approaches and new
corresponding AMCs. According to Tab. V, intuitively we can trends,” Communications Iet, vol. 1, no. 2, pp. 137–156, 2007.
see that the computational complexity of CNN-AMC is less [3] C. M. Bishop, Pattern Recognition and Machine Learning. Springer-
than ML-AMC. In fact, the coefficient of CNN-AMC kcnn is Verlag New York, Inc., 2006.
[4] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-
much larger than that of ML-AMC, whose total amount of cal- learning techniques in cognitive radios,” IEEE Communications Surveys
culations can be smaller in the coherent scenario. However, the & Tutorials, vol. 15, no. 3, pp. 1136–1159, 2013.
time cost per inference of CNN-AMC can be sharply reduced [5] A. Hazza, M. Shoaib, S. A. Alshebeili, and A. Fahad, “An overview
of feature-based methods for digital modulation classification,” in Inter-
in implement, attributing to the parallel processing. The output national Conference on Communications, Signal Processing, and Their
of different kernels or neurons of a layer can be achieved Applications, 2013, pp. 1–6.
parallelly, and also the classifications of different received [6] S. U. Pawar and J. F. Doherty, “Modulation recognition in continuous
phase modulation using approximate entropy,” IEEE Transactions on
sequences can be processed simultaneously. On the other hand, Information Forensics & Security, vol. 6, no. 3, pp. 843–852, 2011.
the calculation complexity of ML-AMC can be easily affected [7] Y. C. Deng and Z. G. Wang, “Modulation recognition of mapsk signals
by the channel environments and also the modulation set. In using template matching,” Electronics Letters, vol. 50, no. 25, pp. 1986–
1988, 2014.
contrast, the time cost per inference of proposed CNN-AMC is [8] L. Xie and Q. Wan, “Cyclic feature based modulation recognition using
unchanged, working as a deep learning model with powerful compressive sensing,” IEEE Wireless Communications Letters, vol. PP,
and redundant approximation ability. This indicates that the no. 99, pp. 1–1, 2017.
[9] A. Swami and B. M. Sadler, “Hierarchical digital modulation classifica-
CNN-AMC can outperform the ML-AMC especially in more tion using cumulants,” IEEE Transactions on Communications, vol. 48,
complex environments. no. 3, pp. 416–429, 2000.
[10] H. C. Wu, M. Saquib, and Z. Yun, “Novel automatic modulation
classification using cumulant features for communications via multipath
V. C ONCLUSIONS channels,” IEEE Transactions on Wireless Communications, vol. 7, no. 8,
pp. 3098–3105, 2008.
Automatic modulation classification enabled by deep learn- [11] S. Aaron, E. Michael, and D. Joseph, “Modulation classification of satel-
lite communication signals using cumulants and neural networks,” in
ing approach has been proposed in this paper. As a parallel 2017 Cognitive Communications for Aerospace Applications Workshop
computing model, a deep learning based classifier named (CCAA), 2017, pp. 1–8.
[12] C. S. Park, J. H. Choi, S. P. Nah, W. Jang, and D. Y. Kim, “Auto- [37] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
matic modulation recognition of digital signals using wavelet features Computer Science, 2014.
and svm,” in International Conference on Advanced Communication [38] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transac-
Technology, 2008, pp. 387–390. tions on Knowledge & Data Engineering, vol. 22, no. 10, pp. 1345–1359,
[13] D. Boutte and B. Santhanam, “A hybrid ica-svm approach to continuous 2010.
phase modulation recognition,” IEEE Signal Processing Letters, vol. 16, [39] M. W. Aslam, Z. Zhu, and A. K. Nandi, “Automatic modulation
no. 5, pp. 402–405, 2009. classification using combination of genetic programming and knn,” IEEE
[14] Z. Zhang, Y. Li, X. Zhu, and Y. Lin, “A method for modulation Transactions on Wireless Communications, vol. 11, no. 8, pp. 2742–
recognition based on entropy features and random forest,” in IEEE 2750, 2012.
International Conference on Software Quality, Reliability and Security
Companion, 2017.
[15] M. L. D. Wong and A. K. Nandi, “Automatic digital modulation
recognition using artificial neural network and genetic algorithm,” Signal
Processing, vol. 84, no. 2, pp. 351–365, 2004.
[16] A. K. Nandi and E. E. Azzouz, “Modulation recognition using artificial
neural networks,” Signal Processing, vol. 56, no. 2, pp. 165–175, 1996.
[17] D. Wang, M. Zhang, Z. Cai, Y. Cui, Z. Li, H. Han, M. Fu, and B. Luo,
“Combatting nonlinear phase noise in coherent optical systems with Fan Meng was born in Jiangsu, China in 1992. He
an optimized decision processor based on machine learning,” Optics received the B.S. degree in 2015, from the school of
Communications, vol. 369, pp. 199–208, 2016. electronic engineering in the University of Electronic
[18] X. Zhang, J. Chen, and Z. Sun, “Modulation recognition of communi- Science and Technology of China, UESTC, and
cation signals based on schks-ssvm,” Journal of Systems Engineering & is currently working toward the Ph.D. degree in
Electronics, vol. 28, no. 4, pp. 627–633, 2017. the school of Information Science and Engineer-
[19] J. L. Xu, “Likelihood-ratio approaches to automatic modulation clas- ing, Southeast University, China. His main research
sification,” IEEE Transactions on Systems Man & Cybernetics Part C, topic is applying machine learning techniques in the
vol. 41, no. 4, pp. 455–469, 2011. wireless communication systems. Further research
[20] J. L. Xu, W. Su, and M. C. Zhou, “Software-defined radio equipped interests include machine learning in general, joint
with rapid modulation recognition,” IEEE Transactions on Vehicular demodulation and equalization, channel coding, re-
Technology, vol. 59, no. 4, pp. 1659–1667, 2010. source allocation, end-to-end communication system design with machine
learning tools.
[21] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,
no. 7553, pp. 436–444, 2015.
[22] Y. H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-
efficient reconfigurable accelerator for deep convolutional neural net-
works,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–
138, 2017.
[23] Y. He, F. R. Yu, N. Zhao, V. C. M. Leung, and H. Yin, “Software-defined
networks with mobile edge computing and caching for smart cities: A
big data deep reinforcement learning approach,” IEEE Communications
Magazine, vol. 55, no. 12, pp. 31–37, DECEMBER 2017. Peng Chen (S’15-M’17) was born in Jiangsu, China
[24] G. Gui, H. Huang, Y. Song, and H. Sari, “Deep learning for an in 1989. He received the B.E. degree in 2011 and
effective non-orthogonal multiple access scheme,” IEEE Transactions the Ph.D. degree in 2017, both from the School
on Vehicular Technology, pp. 1–1, 2018. of Information Science and Engineering, Southeast
[25] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Machine University, China. From Mar. 2015 to Apr. 2016, he
learning for wireless networks with artificial intelligence: A tutorial was a Visiting Scholar in the Electrical Engineering
on neural networks,” CoRR, vol. abs/1710.02913, 2017. [Online]. Department, Columbia University, New York, NY,
Available: http://arxiv.org/abs/1710.02913 USA.
[26] D. Wang, M. Zhang, Z. Li, J. Li, M. Fu, Y. Cui, and X. Chen, He is now an associate professor at the State
“Modulation format recognition and osnr estimation using cnn-based Key Laboratory of Millimeter Waves, Southeast Uni-
deep learning,” IEEE Photonics Technology Letters, vol. PP, no. 99, pp. versity. His research interests include radar signal
1–1, 2017. processing and millimeter wave communication.
[27] Y. Lécun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning
applied to document recognition,” Proceedings of the IEEE, vol. 86,
no. 11, pp. 2278–2324, 1998.
[28] T. J. O’Shea, J. Corgan, and T. C. Clancy, “Convolutional radio
modulation recognition networks,” pp. 213–226, 2016.
[29] T. J. O’Shea and J. Hoydis, “An introduction to deep learning for the
physical layer,” pp. 1–1, 2017.
[30] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin, “Deep
learning models for wireless signal classification with distributed low-
Lenan Wu received his M.S. degree in the Electron-
cost spectrum sensors,” IEEE Transactions on Cognitive Communica-
ics Communication System from Nanjing University
tions and Networking, pp. 1–1, 2018.
of Aeronautics and Astronautics, China, in 1987,
[31] A. Ali, Y. Fan, and L. Shu, “Automatic modulation classification of
and the Ph.D. degree in Signal and Information
digital modulation signals with stacked autoencoders,” Digital Signal
Processing from Southeast University, China, in
Processing, vol. 71, 2017.
1997.
[32] P. E. Keller, Artificial Neural Networks: An Introduction, 2005.
Since 1997, he has been with Southeast Univer-
[33] V. Koltchinskii, “Neural network learning: Theoretical foundations,” sity, where he is a Professor, and the Director of
Annales De L Institut Henri Poincaré Probabilités Et Statistiques, Multimedia Technical Research Institute. He is the
vol. 39, no. 6, pp. 943–978, 2003. author or coauthor of over 400 technical papers
[34] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, and 11 textbooks, and is the holder of 20 Chinese
2016, http://www.deeplearningbook.org. patents and 1 International patent. His research interests include multimedia
[35] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural information systems and communication signal Processing.
networks,” in International Conference on Artificial Intelligence and
Statistics, 2011, pp. 315–323.
[36] P. Koniusz, F. Yan, and K. Mikolajczyk, Comparison of mid-level feature
coding approaches and pooling strategies in visual concept detection.
Elsevier Science Inc., 2013.
Xianbin Wang (S’98-M’99-SM’06-F’17) is a Pro-

fessor and Tier-I Canada Research Chair at Western
University, Canada. He received his Ph.D. degree in
electrical and computer engineering from National
University of Singapore in 2001.
Prior to joining Western, he was with Commu-
nications Research Centre Canada (CRC) as a Re-
search Scientist/Senior Research Scientist between
July 2002 and Dec. 2007. From Jan. 2001 to July
2002, he was a system designer at STMicroelectron-
ics, where he was responsible for the system design
of DSL and Gigabit Ethernet chipsets. His current research interests include
5G technologies, Internet-of-Things, communications security, machine learn-
ing and locationing technologies. Dr. Wang has over 300 peer-reviewed journal
and conference papers, in addition to 26 granted and pending patents and
several standard contributions.
Dr. Wang is a Fellow of Canadian Academy of Engineering, a Fellow of
IEEE and an IEEE Distinguished Lecturer. He has received many awards and
recognitions, including Canada Research Chair, CRC President’s Excellence
Award, Canadian Federal Government Public Service Award, Ontario Early
Researcher Award and five IEEE Best Paper Awards. He currently serves
as an Editor/Associate Editor for IEEE Transactions on Communications,
IEEE Transactions on Broadcasting, and IEEE Transactions on Vehicular
Technology and He was also an Associate Editor for IEEE Transactions
on Wireless Communications between 2007 and 2011, and IEEE Wireless
Communications Letters between 2011 and 2016. Dr. Wang was involved in
many IEEE conferences including GLOBECOM, ICC, VTC, PIMRC, WCNC
and CWIT, in different roles such as symposium chair, tutorial instructor, track
chair, session chair and TPC co-chair.

Meng 2018

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Meng 2018

Uploaded by

Copyright:

Available Formats

This article has been accepted for publication in a future issue of this journal, but has not been

Automatic Modulation Classification: A Deep

density function of an observation sequence can be obtained C. Signal Segment

B. Network Architecture where xj is j-th pre-activation output. Finally, the representa-

266 Merge Concat

0.9 1.26 0.9 1.08

0.8 1.12 0.8 0.96

0.7 0.98 0.7 0.84

0.4 0.56 0.4 minimum(transfer) 0.48

0.2 Train accuracy 0.28 0.2 Two-step/ validation accuracy 0.24

TABLE II or experiences instead of purely blank. Here, we take two cases

Besides, the PDF of amplitude a is assumed to follow the

0.7 CNN 500

9797 48 170 187 9633 244 232 235

175 2324 2108 1929 183 3049 2964 2672

0.4% 5.8% 5.3% 4.8% 0.5% 7.6% 7.4% 6.7%

70 3385 3565 3094 29 2208 2258 1890

85 4116 4157 4790 155 4499 4546 5203 Y

0.7 E. Complexity & Implement

TABLE IV CNN-AMC is proposed as an approximation to the ML-

Xianbin Wang (S’98-M’99-SM’06-F’17) is a Pro-

You might also like