
Digital Signal Processing 117 (2021) 103188


WLAN interference signal recognition using an improved quadruple generative adversarial network

Xiaodong Xu*, Ting Jiang, Jialiang Gong, Haifeng Xu, Xiaowei Qin

Key Laboratory of Wireless-Optical Communications, Chinese Academy of Sciences, University of Science and Technology of China, Hefei, Anhui, 230026, China

* Corresponding author. E-mail addresses: xdxu@ustc.edu.cn (X. Xu), tingj@mail.ustc.edu.cn (T. Jiang), jl2012@mail.ustc.edu.cn (J. Gong), haifengx@mail.ustc.edu.cn (H. Xu), qinxw@ustc.edu.cn (X. Qin).

This work was supported in part by the National Key Research and Development Program of China (2018YFA0701603) and the Natural Science Foundation of Anhui Province (2008085MF213).

Article history: Available online 30 July 2021

Keywords: Wireless interference signal recognition; Quadruple generative adversarial network; Semi-supervised learning; Knowledge distillation

Abstract: Cross-technology interference sources pose a great challenge to improving the throughput of wireless local area networks (WLAN), and wireless interference signal recognition (WISR) is a precondition for mitigating this problem. The quadruple generative adversarial network (QGAN) has shown strong performance for specific emitter identification (SEI). In this paper, an enhanced collaborative learning mechanism is proposed to improve QGAN's performance for WISR. An ACGAN replaces the original GAN in the Improved-QGAN architecture, and the loss functions of the generative, representation and classification sub-networks are further optimized. Besides, a lightweight model based on knowledge distillation (KD) is presented to reduce memory consumption and computational complexity at the inference phase. Numerical results indicate that the proposed Improved-QGAN outperforms the baseline algorithms on both an experimental dataset and a benchmark dataset.

© 2021 Elsevier Inc. All rights reserved.

1. Introduction

Wireless communication networks are developing at a high speed nowadays. The number and diversity of wireless devices are increasing quickly, as is spectrum demand. Moreover, 6G networks tend to be extremely heterogeneous, densely deployed, and dynamic [1]. In this context, the unlicensed bands are over-utilized, and cross-technology interference between different wireless systems is serious [2]. Taking wireless local area networks (WLAN) as an example, ZigBee, Bluetooth and microwave ovens, which operate in the same frequency band as Wi-Fi, can produce non-negligible interference to Wi-Fi devices. Unsurprisingly, so many heterogeneous interference sources reduce the wireless access ability and data transmission efficiency of Wi-Fi networks. To improve the performance of Wi-Fi networks, wireless interference signal recognition (WISR) is an important process which can identify the other technologies accessing the occupied spectrum while it is being monitored. After that, specific strategies such as interference mitigation can be applied so that spectral resources are used effectively and Wi-Fi networks run efficiently.

In general, WISR can be formulated as a classification task. The traditional WISR procedure includes two stages: extracting features from a given signal and applying classifiers to realize individual recognition [3]. Conventional approaches usually extract certain features manually, such as higher-order statistics and cyclo-stationary moments [4,5], to obtain satisfactory performance. However, handcrafted features rely heavily on expert knowledge, and may perform well in some specialized cases but generalize poorly [3].

To solve the aforementioned problem, deep learning (DL) based approaches have been introduced to WISR and have demonstrated excellent accuracy without handcrafted features. A data-driven approach based on successive layers with nonlinearities can learn the underlying characteristics of communication signals in a supervised manner. Compared with traditional approaches based on handcrafted features and conventional classifiers, CNNs trained on time-domain IQ data achieve significant improvements in classification accuracy. In [2], a CNN classifier trained on IQ vectors and amplitude/phase vectors recognizes ZigBee, Wi-Fi, and Bluetooth signals successfully.

However, labeled data are rarely sufficient in realistic environments, and semi-supervised methods are preferred in these cases. How to extract meaningful information from both labeled and unlabeled data is an important and valuable research theme in semi-supervised classification. In our previous work, motivated by Triple-GAN [6], we proposed a quadruple generative adversarial network (QGAN) [7] to cope with the specific emitter identification (SEI) problem in semi-supervised scenarios.


We embedded the domain knowledge and added a representation sub-network as a latent feature extractor into Triple-GAN. The numerical results showed QGAN's ability to effectively identify specific emitters.

In this work, we continue to investigate the collaborative learning mechanism of QGAN, and propose an Improved-QGAN along with its application to WISR. By substituting an ACGAN for the original GAN, we enhance the information interaction between sub-networks and further optimize the collaborative learning mechanism of the involved ones. Specifically, the overall loss functions are modified for the representation model (R), classification model (C) and generative model (G), respectively. Besides, to reduce storage and computation overhead at the inference stage, we build a lightweight model of Improved-QGAN by jointly compressing sub-networks based on knowledge distillation, with acceptable performance loss. In order to verify the generalization ability of Improved-QGAN, we perform experiments not only on our WISR dataset, but also on a benchmark dataset for automatic modulation classification (AMC). Numerical experiments verify the effectiveness and superiority of the proposed architecture.

The rest of this paper is organized as follows. Section 2 presents the background of WISR and related work. Section 3 provides the modified loss functions and implementation details of the proposed Improved-QGAN. In Section 4, the proposed model is applied to two tasks and experimental results are discussed. Section 5 concludes the paper.

2. Problem statement and related works

2.1. Background of WISR

Without loss of generality, a wireless communication model can be summarized as follows at the system level. A binary bit stream is transformed into an analog continuous-time signal by coding, modulation and a digital-to-analog converter module. Let s(t) denote the equivalent baseband signal at the transmitter side.

The wireless channel strength may vary over frequency and time, and can affect the signal over the air. Besides, hardware imperfections of the transmitter and receiver can also affect the signal. Thus, the received signal is not exactly the same as the transmitted signal. Let r(t) be the equivalent baseband signal at the receiver side; it can be formulated as

$r(t) = s(t) * h(t) + n(t)$,  (1)

where h(t) stands for the equivalent channel impulse response between the transmitter and the receiver, n(t) is additive white Gaussian noise with zero mean and variance $\sigma^2$, and $*$ denotes the convolution operation.

The received signal r(t) is further sampled at a predefined rate to generate the discrete version r(n), which consists of two components, i.e., the in-phase component r_I and the quadrature component r_Q:

$r(n) = r_I(n) + j\, r_Q(n)$.  (2)

WISR using DL-based approaches can be considered a DL classification problem. Assuming that only one of K wireless technologies is active at a time, WISR turns into a K-class classification problem. To perform WISR, r(n) must be translated into a feature vector suitable for training. Each data sample consists of two real-valued vectors, i.e., $x_I^i = [r_I^i(0), r_I^i(1), \cdots, r_I^i(L-1)]^T$ and $x_Q^i = [r_Q^i(0), r_Q^i(1), \cdots, r_Q^i(L-1)]^T$, where i denotes the i-th sample. The IQ vector used as the input of the neural networks can then be defined as

$x^i = \begin{bmatrix} x_I^{i\,T} \\ x_Q^{i\,T} \end{bmatrix} \in \mathbb{R}^{2 \times L}$.  (3)

In the semi-supervised learning regime, M labeled data $X_l = \{x_l^1, x_l^2, \cdots, x_l^M\}$ with their corresponding wireless technology type labels $Y_l = [y_l^1, y_l^2, \cdots, y_l^M]$ and N unlabeled data $X_u = \{x_u^1, x_u^2, \cdots, x_u^N\}$ are collected. The semi-supervised learning framework for WISR aims at making full use of the valuable information in both the labeled data pair $(X_l, Y_l)$ and the unlabeled data $X_u$ to correctly identify the class of every sample.
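As an illustration of how the IQ input of Eq. (3) can be assembled in practice, the following sketch builds the 2 x L real-valued tensor from complex baseband samples. The helper name and the use of NumPy are our own assumptions for illustration, not code from the paper.

```python
import numpy as np

def to_iq_tensor(r: np.ndarray) -> np.ndarray:
    """Stack a complex baseband vector r(n) of length L into the
    2 x L real-valued IQ input of Eq. (3)."""
    x_i = np.real(r)          # in-phase component r_I(n)
    x_q = np.imag(r)          # quadrature component r_Q(n)
    return np.stack([x_i, x_q], axis=0).astype(np.float32)

# Example: L = 1024 samples of a toy complex baseband signal.
L = 1024
rng = np.random.default_rng(0)
r = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x = to_iq_tensor(r)
print(x.shape)  # (2, 1024), matching the 2 x L input used in the paper
```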
2.2. Related works

Deep generative models have recently been applied to semi-supervised learning, where GANs offer promising performance for various tasks, such as computer vision [8], natural language processing [9], medicine [10] and so on. The GAN was first proposed by Goodfellow et al. [11] and has quickly become a research hotspot. A GAN mainly includes two parts, a generator (G) and a discriminator (D), which form a minimax two-player game. The two parts are trained adversarially: D tries to distinguish whether its input is real data or fake data generated by G, while G generates a data distribution $p_G(x)$ approaching the real data distribution $p_{data}(x)$ as closely as possible. Ideally, G and D converge and reach a Nash equilibrium, and the generated data distribution exactly matches the real data distribution. The loss function of a GAN can be formalized as

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$,  (4)

where z is a latent input vector following a certain random distribution $p_z(z)$.
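A minimal sketch of the two terms in Eq. (4), written with TensorFlow/Keras under the assumption that `generator` and `discriminator` are ordinary Keras models whose discriminator outputs a probability in (0, 1); it illustrates the standard minimax objective rather than the paper's training code, and the generator term uses the common non-saturating variant instead of the literal log(1 - D(G(z))) term.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def gan_losses(generator, discriminator, real_x, z):
    """Return (discriminator loss, generator loss) for one batch,
    following the minimax objective of Eq. (4)."""
    fake_x = generator(z, training=True)
    d_real = discriminator(real_x, training=True)
    d_fake = discriminator(fake_x, training=True)
    # D maximizes log D(x) + log(1 - D(G(z))), i.e. minimizes the BCE below.
    d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    # Non-saturating form: G minimizes -log D(G(z)).
    g_loss = bce(tf.ones_like(d_fake), d_fake)
    return d_loss, g_loss
```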
Despite the great success of GANs, obstacles to training stability remain. Owing to the deficiencies of the original GAN, many variants have emerged, including architecture-optimization-based GANs and objective-function-optimization-based GANs [12]. DCGAN [13] introduces deconvolution layers into the generator and achieves strong performance. CGAN [14] can generate images for given labels by introducing a conditional variable c into both the generator and the discriminator. WGAN [15] modifies the objective function by replacing the Jensen-Shannon (JS) divergence with the Earth-Mover (EM) distance to stabilize the training process.

In recent years, GANs have also been applied to many communication problems. O'Shea [16] learned a channel response approximation and an information encoding by an adversarial approach, which is appropriate for a wide range of channel environments. A GAN was designed and implemented in [17] to identify rogue RF transmitters and classify trusted ones. Zhu [18] proposed a novel tensor-GAN for real-time indoor localization and demonstrated improvements in localization accuracy, response time and implementation complexity.

In our previous work [7], inspired by Triple-GAN, we embedded communication domain knowledge and added a representation sub-network to construct a quadruple-structured framework (QGAN) for SEI. Different from the original QGAN, in this paper we verify the effectiveness of QGAN with an enhanced collaborative learning mechanism for WISR and modify the overall loss function of the framework. In addition, we design a lightweight model of Improved-QGAN for the inference stage based on knowledge distillation.

3. Proposed method

Triple-GAN [6] is a successful framework for the semi-supervised setting, which achieves good accuracy in image classification by incorporating a classifier as an additional player into the original minimax two-player game.


Inspired by this method, QGAN was proposed for SEI; it combines Triple-GAN with domain knowledge of wireless communications and can learn the distribution of features related to the wireless propagation information of real data. WISR aims to identify different wireless technology types and can be regarded as a simplified application of SEI. In order to reduce algorithmic complexity, domain knowledge is not embedded in either the original QGAN or the Improved-QGAN in our proposed method, so that their recognition performance can be compared under the same conditions.

In this section, we propose an Improved-QGAN model to improve the semi-supervised classification performance of WISR. In addition, we derive a lightweight model of Improved-QGAN based on knowledge distillation to reduce the computing and storage overhead at the inference stage.

3.1. Improved QGAN

Similar to the original QGAN, our proposed Improved-QGAN is shown in Fig. 1. The training stage includes four sub-networks, namely the representation model (R), generative model (G), classification model (C) and discriminative model (D), while the inference stage consists of only two parts, R and C. The overall architecture can be described as follows. The samples x_l and x_u are sampled from the labeled part and the unlabeled part of the real dataset, respectively. The given samples go through R, and we obtain the features f_l and f_u for the corresponding samples. Then C gives a pseudo-label ỹ_u for the unlabeled feature f_u and generates a pseudo data-label pair (f_u, ỹ_u). A real label y_g ∈ Y_l and a latent vector z_g ∼ p_z(z) are sampled, and y_g and z_g are concatenated as the input of G. The generator G produces another pseudo data-label pair (f_g, y_g). Finally, the data-label pairs are fed into D, which should discriminate the real data-label pair as real and the other two pseudo data-label pairs as fake, while R, C and G aim to make the two pseudo data-label pairs match the distribution of the real data-label pair in order to fool D.

Fig. 1. The framework for training lightweight Improved-QGAN.

In Fig. 1, the solid lines represent the data flow between sub-networks, while the dash-dot lines depict the information interaction for collaborative learning. Collaborative learning is realized by enhancing the information interaction between the sub-networks of Improved-QGAN. For the representation model (R), available knowledge in the classification model (C) and discriminative model (D) can be transferred to R by the added cross-entropy loss and adversarial training term, respectively. For the generative model (G), the modified loss function enhances its relation to C. In addition, C joins the minimax game between G and D through an adversarial loss via D. The four sub-networks interact with each other and can achieve better results synchronously by enhancing the information interaction between them. We present a detailed description of each model in what follows.

3.1.1. Representation model R

The dimension of the raw data is (2 × N), where N may be 1024 or even larger. Such high-dimensional data makes it difficult for the generator G to fit the real data distribution. This inspires the idea of reducing the dimension of the raw data so that the dimension of the data which G needs to match is reduced. The search space of G is thus greatly reduced and G can approximate the low-dimensional distribution more easily. The auto-encoder is a common model for reducing dimensionality and extracting meaningful features from the original data. Besides, it is an unsupervised method which can use both labeled and unlabeled data. Thus, we use an auto-encoder as the representation model to extract the valuable features of the raw data. For given training data x_l and x_u, the decoder outputs of R are denoted as x̃_l and x̃_u, respectively. The reconstruction loss of R is then defined by

$\ell_{recon}(\theta_R) = \mathbb{E}_{x_l \sim p_l}\,\mathrm{MSE}(x_l, \tilde{x}_l) + \mathbb{E}_{x_u \sim p_u}\,\mathrm{MSE}(x_u, \tilde{x}_u)$.  (5)

At the same time, we obtain the encoder outputs of R for the labeled samples x_l and the unlabeled samples x_u, i.e., the extracted features f_l and f_u. The features are then fed into C, which predicts their labels as ỹ_l and ỹ_u, respectively. In order to make the extracted features beneficial to C's subsequent correct classification, a cross-entropy loss is added to R. For the labeled data pair (x_l, y_l), the loss function is

$\ell_{CE\text{-}real}(\theta_R) = \mathbb{E}_{p_l}\left[-\log p_c(y_l \mid x_l)\right]$.  (6)

The unlabeled feature f_u and the predicted label ỹ_u of the unlabeled data together make up a pseudo data-label pair. According to the adversarial training via the discriminator, the adversarial training term is

$\min_R \; \mathbb{E}_{x_u \sim p_u}\left[\log\left(1 - D(f_u, \tilde{y}_u)\right)\right]$.  (7)

To sum up, the overall loss function of R can be formulated as follows:

$\min_R \; \ell_{recon}(\theta_R) + \mu\, \ell_{CE\text{-}real}(\theta_R) + \eta\, \mathbb{E}_{x_u, \tilde{y}_u \sim p_u}\left[\log\left(1 - D(f_u, \tilde{y}_u)\right)\right]$,  (8)

where the weighting factors μ and η control the relative importance of the corresponding terms.
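The sketch below illustrates how the three terms of Eq. (8) could be combined for one batch. It assumes R is a Keras auto-encoder exposing `encode`/`decode` methods, that `classifier` and `discriminator` stand for the C and D sub-networks, and that D accepts a (feature, label-probability) pair; these names and interfaces are assumptions made for illustration only.

```python
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()
cce = tf.keras.losses.SparseCategoricalCrossentropy()

def representation_loss(R, classifier, discriminator,
                        x_l, y_l, x_u, mu=1.0, eta=0.01):
    """One-batch version of Eq. (8): reconstruction + cross-entropy
    + adversarial term on the unlabeled pseudo pair (f_u, y~_u)."""
    f_l, f_u = R.encode(x_l), R.encode(x_u)
    recon = mse(x_l, R.decode(f_l)) + mse(x_u, R.decode(f_u))   # Eq. (5)
    ce_real = cce(y_l, classifier(f_l))                         # Eq. (6)
    y_u_pseudo = classifier(f_u)                                # pseudo-labels from C
    d_out = discriminator([f_u, y_u_pseudo])                    # Eq. (7) term via D
    adv = tf.reduce_mean(tf.math.log(1.0 - d_out + 1e-8))
    return recon + mu * ce_real + eta * adv
```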
3.1.2. Classification model C

The goal of C is to predict the labels of its inputs, which include the extracted features of real labeled and unlabeled data as well as features generated by G. For real data with known labels, the cross-entropy loss function is

$\ell_{CE\text{-}real}(\theta_C) = \mathbb{E}_{p_l}\left[-\log p_c(y_l \mid f_l)\right]$.  (9)


In addition, the generated data can serve as supplementary labeled samples to optimize C; the loss function on these data is

$\ell_{CE\text{-}G}(\theta_C) = \mathbb{E}_{p_g}\left[-\log p_c(y_g \mid f_g)\right]$.  (10)

As with Triple-GAN, we add an adversarial training term via D to C, so the optimization of the classifier can be formulated as follows:

$\min_C \; \ell_{CE\text{-}real}(\theta_C) + \nu\, \ell_{CE\text{-}G}(\theta_C) + \eta\, \mathbb{E}_{x_u, \tilde{y}_u \sim p_u}\left[\log\left(1 - D(f_u, \tilde{y}_u)\right)\right]$,  (11)

where ν is another weighting factor.
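For illustration, a one-batch sketch of Eq. (11), again assuming Keras-style `classifier` and `discriminator` sub-networks and feature batches produced by R; all names are hypothetical.

```python
import tensorflow as tf

cce = tf.keras.losses.SparseCategoricalCrossentropy()

def classifier_loss(classifier, discriminator,
                    f_l, y_l, f_u, f_g, y_g, nu=0.1, eta=0.01):
    """Eq. (11): cross-entropy on real features, cross-entropy on
    generated features, and the adversarial term through D."""
    ce_real = cce(y_l, classifier(f_l))        # Eq. (9)
    ce_gen = cce(y_g, classifier(f_g))         # Eq. (10)
    y_u_pseudo = classifier(f_u)
    d_out = discriminator([f_u, y_u_pseudo])
    adv = tf.reduce_mean(tf.math.log(1.0 - d_out + 1e-8))
    return ce_real + nu * ce_gen + eta * adv
```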
3.1.3. Generative model G

The generator G attempts to generate data $G_g = G(z_g, y_g) \sim p_g(f_g \mid z_g, y_g)$ similar to the true data distribution $p_l(f_l, y_l)$. Ideally, the generated data fits the real data distribution exactly and can further improve the performance of C by providing meaningful labeled data beyond the real data.

As is well known, forcing a model to perform additional tasks can improve its performance on the original task [19,20]. ACGAN [21] is a variant of the GAN architecture which introduces an auxiliary classifier and is superior at generating more discriminable samples. In the original ACGAN, the generator G uses both a class label y and a latent vector z to generate images $X_{fake} = G(y, z)$. The discriminator D then outputs a probability over the source (real or fake) and a probability over the class label, $p(S \mid X), p(Y \mid X) = D(X)$. The loss function includes two parts, ℓ_S for the correct source and ℓ_C for the correct class; D is trained to maximize ℓ_S + ℓ_C, while G is optimized to maximize ℓ_C − ℓ_S:

$\ell_S = \mathbb{E}\left[\log p(S = real \mid X_{real})\right] + \mathbb{E}\left[\log p(S = fake \mid X_{fake})\right]$,
$\ell_C = \mathbb{E}\left[\log p(Y = y \mid X_{real})\right] + \mathbb{E}\left[\log p(Y = y \mid X_{fake})\right]$.  (12)

Motivated by ACGAN, we modify the loss function of G in Improved-QGAN. The quality of the data generated by G at the beginning of training may be poor; however, D would still try to classify them correctly, which may have a negative effect on D. We therefore divide ℓ_C into two parts from the perspectives of D and G. For D, we only maximize $\ell_{C\text{-}real} = \mathbb{E}[\log p(Y = y \mid X_{real})]$ on real data. For G, we maximize $\ell_{C\text{-}fake} = \mathbb{E}[\log p(Y = y \mid X_{fake})]$ on fake data via D. This loss aims at making the generated data fool D into assigning correct labels. We apply this variation to Improved-QGAN through the already existing classifier C. As mentioned above, C only maximizes ℓ_{C-real} on real data, which is done by the cross-entropy terms in Eq. (11). As for G in Improved-QGAN, G takes z_g and y_g as input and outputs the generated data f_g. Then f_g is sent to C, and C gives the predicted label ỹ_g. Naturally, the loss function of G is denoted as

$\ell_{C\text{-}fake}(\theta_g) = -\mathbb{E}\left[\log p(\tilde{y}_g = y_g \mid f_g)\right]$.  (13)

We add the adversarial training term in the same way, so the loss function of G can be defined as follows:

$\min_G \; \ell_{C\text{-}fake}(\theta_g) + \lambda\, \mathbb{E}_{x_g, y_g \sim p_g}\left[\log\left(1 - D(f_g, y_g)\right)\right]$,  (14)

where the hyper-parameter λ controls the trade-off between the two losses.

3.1.4. Discriminative model D

The discriminator D aims to determine the source of its input correctly, i.e., whether it comes from the real or a pseudo data distribution. The data-label pair in which both the data and the label are true is considered to follow the real data distribution, and the discriminator should output "1". The other two pseudo data-label pairs, in which either the data is generated by G and the label is true, or the data is true and the label is predicted by C, are regarded as fake data distributions, and the discriminator should output "0". The corresponding optimization problem is

$\max_D \; \mathbb{E}_{x_l, y_l \sim p_l}\left[\log D(f_l, y_l)\right] + \alpha\, \mathbb{E}_{x_u, \tilde{y}_u \sim p_u}\left[\log\left(1 - D(f_u, \tilde{y}_u)\right)\right] + (1 - \alpha)\, \mathbb{E}_{x_g, y_g \sim p_g}\left[\log\left(1 - D(f_g, y_g)\right)\right]$.  (15)

The details of the optimization process for Improved-QGAN are summarized in Algorithm 1.

Algorithm 1 The proposed semi-supervised Improved-QGAN model.
Require: Labeled data X_l with corresponding labels y_l and unlabeled data X_u; the number of training iterations iter; learning rates ς_R, ς_G, ς_C, ς_D for networks R, G, C, D; batch sizes for real labeled, unlabeled and generated data b_l, b_u, b_g; network parameters (Θ_R, Θ_G, Θ_C, Θ_D); hyper-parameters α, β, μ, λ, η, ν.
1: for i = 1 to iter do
2:   Sample a batch of labeled data (x_l, y_l) of size b_l from (X_l, y_l), a batch of unlabeled data x_u of size b_u from X_u, and a batch of latent vectors z and real labels y_g of size b_g from p_z(z) and p(y_l), respectively;
3:   Get the representations of the labeled data f_l ← R_{θ_R}(x_l) and the unlabeled data f_u ← R_{θ_R}(x_u); get the generated data f_g ← G_{θ_G}(z, y_g); get the predicted labels ỹ_l ← C_{θ_C}(f_l), ỹ_u ← C_{θ_C}(f_u), ỹ_g ← C_{θ_C}(f_g);
4:   Form the real data-label pair (f_l, y_l) and the two pseudo data-label pairs (f_g, y_g) and (f_u, ỹ_u);
5:   Update the discriminative model D: θ_D ← Adam(∇_{θ_D}(E_{p_l}[log D(f_l, y_l)] + α E_{p_u}[log(1 − D(f_u, ỹ_u))] + (1 − α) E_{p_g}[log(1 − D(f_g, y_g))]), θ_D, ς_D);
6:   Update the representation model R: θ_R ← Adam(∇_{θ_R}(ℓ_recon(θ_R) + μ ℓ_{CE-real}(θ_R) + η E_{p_u}[log(1 − D(f_u, ỹ_u))]), θ_R, ς_R);
7:   Update the classification model C: θ_C ← Adam(∇_{θ_C}(ℓ_{CE-real}(θ_C) + ν ℓ_{CE-G}(θ_C) + η E_{p_u}[log(1 − D(f_u, ỹ_u))]), θ_C, ς_C);
8:   Update the generative model G: θ_G ← Adam(∇_{θ_G}(ℓ_{C-fake}(θ_g) + λ E_{p_g}[log(1 − D(f_g, y_g))]), θ_G, ς_G);
9: end for
10: return the learned parameters (Θ_R, Θ_G, Θ_C, Θ_D).
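To make the update order of Algorithm 1 concrete, here is a compact sketch of one training iteration in TensorFlow. It re-uses the `representation_loss` helper sketched in Section 3.1.1 and assumes Keras sub-models named R, G, C and D with the interfaces used there; all names, the optimizer settings and the hyper-parameter values are illustrative assumptions rather than the authors' released code.

```python
import tensorflow as tf

cce = tf.keras.losses.SparseCategoricalCrossentropy()
opt = {name: tf.keras.optimizers.Adam(1e-3) for name in ("R", "G", "C", "D")}

def train_step(R, G, C, D, x_l, y_l, x_u, z, y_g, num_classes,
               alpha=0.5, nu=0.1, eta=0.01, lam=0.1):
    """One iteration of Algorithm 1: update D, then R, C and G in turn."""
    nets = {"R": R, "G": G, "C": C, "D": D}
    log = lambda t: tf.math.log(t + 1e-8)
    for name in ("D", "R", "C", "G"):                    # steps 5-8 of Algorithm 1
        with tf.GradientTape() as tape:
            f_l, f_u = R.encode(x_l), R.encode(x_u)      # step 3: features from R
            f_g = G([z, tf.one_hot(y_g, num_classes)])   # generated features
            y_u = C(f_u)                                 # soft pseudo-labels from C
            d_real = D([f_l, tf.one_hot(y_l, num_classes)])
            d_pseudo = D([f_u, y_u])
            d_fake = D([f_g, tf.one_hot(y_g, num_classes)])
            if name == "D":                              # Eq. (15), maximized
                loss = -(tf.reduce_mean(log(d_real))
                         + alpha * tf.reduce_mean(log(1.0 - d_pseudo))
                         + (1.0 - alpha) * tf.reduce_mean(log(1.0 - d_fake)))
            elif name == "R":                            # Eq. (8), via the earlier sketch
                loss = representation_loss(R, C, D, x_l, y_l, x_u)
            elif name == "C":                            # Eq. (11)
                loss = (cce(y_l, C(f_l)) + nu * cce(y_g, C(f_g))
                        + eta * tf.reduce_mean(log(1.0 - d_pseudo)))
            else:                                        # Eq. (14)
                loss = (cce(y_g, C(f_g))
                        + lam * tf.reduce_mean(log(1.0 - d_fake)))
        grads = tape.gradient(loss, nets[name].trainable_variables)
        opt[name].apply_gradients(zip(grads, nets[name].trainable_variables))
```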

3.2. Theoretical analysis

We assume that the feature f output by R is suitable for subsequent classification. For fixed R, the adversarial losses of G and C can be rewritten as

$\alpha\, \mathbb{E}_{p_u}\left[\log\left(1 - D(f_u, \tilde{y}_u)\right)\right] + (1-\alpha)\, \mathbb{E}_{p_g}\left[\log\left(1 - D(f_g, y_g)\right)\right]$
$= \alpha \int p_u(f, y) \log\left(1 - D(f, y)\right) \mathrm{d}y\, \mathrm{d}f + (1-\alpha) \int p_g(f, y) \log\left(1 - D(f, y)\right) \mathrm{d}y\, \mathrm{d}f$
$= \int p_\alpha(f, y) \log\left(1 - D(f, y)\right) \mathrm{d}y\, \mathrm{d}f$
$= \mathbb{E}_{p_\alpha}\left[\log\left(1 - D(f, y)\right)\right]$,  (16)

where $p_\alpha(f, y) = \alpha\, p_u(f, y) + (1 - \alpha)\, p_g(f, y)$.

Then the loss function of the minimax game between G, C and D is equivalent to the following expression:

$\min \max_D \; \mathbb{E}_{p_l}\left[\log D(f, y)\right] + \mathbb{E}_{p_\alpha}\left[\log\left(1 - D(f, y)\right)\right] + \ell_{CE}$
$= \min \; 2\, \mathrm{JS}\!\left(p_l(f, y) \,\|\, p_\alpha(f, y)\right) - \log 4 + \rho\, \ell_{CE\text{-}real}(\theta_C) + \tau\, \ell_{CE\text{-}G}(\theta_C) + \kappa\, \ell_{C\text{-}fake}(\theta_G)$.  (17)

In the above formula, the JS term reaches its minimum if and only if $p_\alpha(f, y) = p_l(f, y)$; ℓ_{CE-real} reaches its minimum if and only if $p_u(f, y) = p_l(f, y)$; and ℓ_{CE-G} (ℓ_{C-fake}) reaches its minimum if and only if $p_u(f, y) = p_g(f, y)$. Therefore, the global equilibrium is reached if and only if $p_u(f, y) = p_g(f, y) = p_l(f, y)$.

To update R, the parameters of G, C and D are frozen. ℓ_{CE-real}(θ_R) pushes R to output features that are effective for classification, and ℓ_{recon}(θ_R) takes advantage of the information in both labeled and unlabeled data. The adversarial term V(R, D) gives R feedback to produce data that can trick D. Thus, R is trained under the formulation $\min_R \ell_{CE\text{-}real} + \ell_{recon} + V(R, D_{opt})$.

To sum up, each sub-network can converge by fixing the other sub-networks while updating the current one. As the iterations proceed, the sub-networks finally reach an equilibrium. Our model has higher computational efficiency than Triple-GAN, as shown in Section 4, and does not degenerate into Triple-GAN.
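The step from Eq. (16) to the Jensen-Shannon form in Eq. (17) is not spelled out in the text; it follows the standard GAN argument, sketched below under the assumption that D is unconstrained and can be optimized point-wise.

```latex
% For fixed p_l and p_alpha, the inner maximization over D is point-wise:
%   max_D  p_l(f,y) log D(f,y) + p_alpha(f,y) log(1 - D(f,y)),
% which is attained at
%   D^*(f,y) = p_l(f,y) / ( p_l(f,y) + p_alpha(f,y) ).
% Substituting D^* back into the objective gives the first term of Eq. (17):
\[
  D^*(f,y)=\frac{p_l(f,y)}{p_l(f,y)+p_\alpha(f,y)},\qquad
  V(D^*)=2\,\mathrm{JS}\!\left(p_l \,\|\, p_\alpha\right)-\log 4 .
\]
```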
3.3. Lightweight Improved-QGAN

As shown in Fig. 1, obtaining the predicted label of unlabeled data requires passing through R and C successively, so R and C are the main contributors to the storage and computing overhead at the inference phase. In this part, we provide a lightweight model of the Improved-QGAN to improve processing efficiency. Many compression techniques for neural networks have emerged and made great progress recently [22]. Among these methods, knowledge distillation (KD) refers to compressing the knowledge of a large network (teacher) into a small network (student); it was first proposed by Bucilua [23] and popularized by Hinton [24]. Generally speaking, the probabilities produced by the teacher carry richer information about the internal similarity structure across classes than one-hot labels. The idea of KD is that the student is trained not only with the information provided by the true labels but also with soft targets produced by the teacher. In our previous work [25], we proposed a joint compression scheme tailored to the special structure of QGAN to obtain a lightweight version. In this work, we inherit the idea of joint compression because of the similar structures of QGAN and Improved-QGAN, but we experiment on two tasks, i.e., WISR on our collected dataset and AMC on the benchmark dataset, whereas the previous work only addressed WISR. Besides, we design lightweight networks of different sizes for both tasks and report the performance of each network.

To compress R, the mean squared error (MSE) loss used in [26] is an optional loss function. Let f_t be the features of the teacher R, denoted R_t, and f_s the features of the student R, denoted R_s. Then R_s can be trained by solving the following optimization problem:

$\ell_{MSE} = \left\| f_t - f_s \right\|_2^2$.  (18)

We compress the model jointly by considering R_s and C_s, namely the student C, as a whole. Let a_t and a_s be the logits of the teacher and the student, respectively. KD introduces a hyper-parameter T, known as the temperature, to control the relative magnitude of the class probabilities. The soft probabilities are defined as $y_t = \mathrm{softmax}(a_t / T)$ and $y_s = \mathrm{softmax}(a_s / T)$. In addition to the original cross-entropy loss $\ell_s = \sum_i -y_l \log(\mathrm{softmax}(a_s))$, KD also tries to acquire information from the teacher by matching the soft target probabilities:

$\ell_{soft} = T^2 \sum_i y_t \log\left(y_t / y_s\right)$.  (19)

Thus the student is trained under the following loss function:

$\ell_{KD} = (1 - \beta)\, \ell_s + \beta\, \ell_{soft}$.  (20)

KDGAN [27] combines KD with a GAN to help the classifier model the true data distribution and improve classification accuracy. Similarly, we add an additional discriminator D', which is trained to distinguish whether logits come from the teacher or from the student. Through D', we make the logits distribution generated by the student approximate that of the teacher as closely as possible. The overall joint loss function used to train a compact network for the inference stage can be expressed as follows:

$\ell = \ell_{KD} + \ell_{MSE} + \eta\, \ell_{D'}$, with $\ell_{D'} = \mathbb{E}_{p_t}\left[\log D'(a_t, y_l)\right] + \mathbb{E}_{p_s}\left[\log\left(1 - D'(a_s, y_l)\right)\right]$,  (21)

where p_t and p_s stand for the logits distributions of the teacher and student, respectively. The framework for training the lightweight Improved-QGAN is shown in Fig. 2.

Fig. 2. The training framework for lightweight Improved-QGAN.
classes than one-hot labels. The idea of KD is that a student, is Improved-QGAN is shown in Fig. 2.
trained not only by information provided by true labels but also by
soft targets related to the teacher. In our previous work [25], we 4. Experiments
proposed a well-designed joint compression scheme according to
the special structure of QGAN to obtain lightweight version. In this We present experimental results in this section. To evaluate
work, we inherited the idea of joint compression because of the the performance of the proposed framework in semi-supervised
similar structure between QGAN and Improved-QGAN. But we ex- learning, we train Improved-QGAN for two scenarios: a) wireless
perimented on two tasks, i.e., WISR based on our collected dataset interference signal recognition on dataset collected by ourselves,
and AMC on benchmark dataset, while previous work only aims and b) automatic modulation classification on benchmark dataset.
at WISR. Besides, we designed lightweight networks with differ-
ent size for both tasks, and demonstrated the performance of each 4.1. Dataset description
network.
To compress R, the mean squared error (MSE) loss used in [26] 4.1.1. Wireless interference signal recognition on ISM band
is an optional loss function. Let f t be the features of the teacher R, The dataset we consider for WISR mainly operate on 2.4 GHz
denoted as R t , and f s be the features of the student R, denoted as unlicensed frequency band, which total consists of 8 classes, in-
R s . Then R s can be trained by solving the following optimization cluding Wi-Fi, ZigBee, Bluetooth, microwave oven (MWO), analog
problem: video monitor (AVM), narrowband digital signal (NBD), wideband
2 digital signal (WBD) band and wideband background noise (BGN).
 M S E = f t − f s 2 (18) The IQ orthogonal data of every class, except for narrowband dig-
ital signal and wideband digital signal, are collected based on
We compress the model jointly by considering R s and C s , LimeSDR at line-of-sight and non-line-of-sight indoor conditions.
namely the student C, as a whole. Let at and a s be the logits of For simplicity, narrowband digital and wideband digital signal are
teacher and student respectively. KD introduced a hyper-parameter both generated by MATLAB R2019a, where non-ideal power am-
T , known as temperature, to control the relative magnitude of dif- plifiers are considered. The nonlinearity of the power amplifier
ferent classes’ probabilities. The soft probabilities are defined as is modeled by memoryless polynomial. Additive White Gaussian


Additive white Gaussian noise is a common assumption in the communication domain. To simplify the problem, additive white Gaussian noise is added to the signal to obtain datasets with different SNRs. Table 1 shows the details of the dataset. The amount of data for each category is 5000, and the length of each sample is 1024. 80% of the signal samples, selected at random, are used for training and the rest for testing and validation.

Table 1
Collection settings for the WISR dataset.

Signal types   Center frequency
Bluetooth      2.442 GHz
Wi-Fi          2.437 GHz
MWO            2.450 GHz
ZigBee         2.440 GHz
AVM            2.432 GHz
NBD            2.437 GHz
WBD            2.437 GHz
BGN            2.437 GHz

Collection scenarios: line-of-sight at distances of 1, 3, 5 and 7 meters; non-line-of-sight at distances of 1 and 3 meters.
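As a small illustration of the SNR sweep described above, the following sketch adds white Gaussian noise to an IQ sample at a target SNR; the helper name and the way the SNR is referenced to the measured signal power are our own assumptions.

```python
import numpy as np

def add_awgn(x: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise to a real-valued 2 x L IQ sample so that
    the resulting signal-to-noise ratio is approximately snr_db."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

# Example: build versions of one sample at -4, 0, 4 and 12 dB SNR.
x = np.random.default_rng(1).standard_normal((2, 1024)).astype(np.float32)
noisy = {snr: add_awgn(x, snr) for snr in (-4, 0, 4, 12)}
```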
4.1.2. Automatic modulation classification

The dataset we use for AMC is the benchmark dataset generated in [28]. Specifically, it is made up of 11 categories commonly seen in impaired environments, covering the following modulation formats: OOK, 4ASK, BPSK, QPSK, 8PSK, 16QAM, AM-SSB-SC, AM-DSB-SC, FM, GMSK and OQPSK. Each sample is corrupted by random noise, time offset, phase offset, and wireless channel distortions. The amount of data for each category is 4096, and the length of each sample is 1024. 85% of the examples, selected at random, are used for training and the rest for testing and validation. The SNR of the dataset used in our experiments varies between -4 dB and 12 dB.
4.2. Implementation details

In our experiments, the neural networks are built with TensorFlow on Python 3.6 and trained on one NVIDIA GeForce GTX 1080 Ti GPU. Configuration details of the model training are listed in Table 2. The architecture of R is a CNN with an attention mechanism. The specific structures of R and C used for the two tasks are different, while D and G are the same. The details are given in Table 3.

Table 2
Network training configurations for Improved-QGAN (values given as WISR / AMC where they differ).

Number of classes        8 / 11
Training set size        4.0e4 / 4.5e4
Learning rate            1e-3
Batch size               100 / 200
Iterations               400
Optimizer                Adam
Length of data           1024
[μ, η, ν, λ, α, β]       [1, 0.01, 0.1, 0.1, 0.5, 0.4]

Table 3
Detailed network structure of Improved-QGAN for the two tasks.

Representation model R (WISR):
(1 × 7) Conv, 128, lReLU, Maxpooling 4, BatchNorm
(1 × 7) Conv, 64, lReLU, Maxpooling 4, BatchNorm
(1 × 7) Conv, 64, lReLU, Maxpooling 4, BatchNorm
Flatten
MLP 256, lReLU; MLP 128, ReLU

Representation model R (AMC):
(1 × 3) Conv, 128, lReLU, Maxpooling 2
(1 × 3) Conv, 64, lReLU, Maxpooling 2
(1 × 3) Conv, 32, lReLU, Maxpooling 2
(1 × 3) Conv, 16, lReLU, Maxpooling 2
(1 × 3) Conv, 8, lReLU, Maxpooling 2

Classification model C (WISR):
(1 × 7) Conv, 64, lReLU, Maxpooling 2, BatchNorm
(1 × 7) Conv, 32, lReLU, Maxpooling 2, BatchNorm
(1 × 7) Conv, 32, lReLU, BatchNorm
Flatten; MLP 128, lReLU; MLP 8, SoftMax

Classification model C (AMC):
(1 × 3) Conv, 64, lReLU, Maxpooling 2
(1 × 3) Conv, 64, lReLU, Maxpooling 2
Flatten; MLP 128, SeLU
MLP 128, SeLU; MLP 11, SoftMax

Generative model G (both tasks):
MLP 500, Softplus, BatchNorm; MLP 500, Softplus, BatchNorm
MLP 128, lReLU

Discriminative model D (both tasks):
(1 × 19) Conv, 128, lReLU, Maxpooling 2, BatchNorm
(1 × 15) Conv, 32, lReLU, Maxpooling 2, BatchNorm
(1 × 11) Conv, 32, lReLU, BatchNorm
Flatten; MLP 128, lReLU; MLP 1, Sigmoid
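For readers who prefer code to the tabular listing, here is a Keras sketch of the WISR representation model R following the layer sequence in Table 3; the padding, pooling strides and the omission of the attention block mentioned in the text are our simplifying assumptions, so this is an approximation of, not a substitute for, the authors' model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_representation_wisr(length=1024):
    """Encoder of R for WISR, following Table 3 (a sketch, not the exact model)."""
    inp = layers.Input(shape=(2, length, 1))            # 2 x L IQ input
    x = inp
    for filters in (128, 64, 64):
        x = layers.Conv2D(filters, (1, 7), padding="same")(x)
        x = layers.LeakyReLU()(x)
        x = layers.MaxPooling2D(pool_size=(1, 4))(x)
        x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256)(x)
    x = layers.LeakyReLU()(x)
    feat = layers.Dense(128, activation="relu")(x)      # 128-dimensional feature f
    return tf.keras.Model(inp, feat, name="R_wisr_encoder")

model = build_representation_wisr()
model.summary()
```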
4.3. Performance metrics

In order to fully demonstrate and compare the classification performance of the proposed model and the baseline models, we use multiple performance metrics to report the experimental results.

Classification accuracy: the ratio of the number of correctly predicted samples to the total number of samples. Let the predicted and true labels be denoted as ỹ and y; then the classification accuracy over N samples is defined as

$Acc = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\left(\tilde{y}_i = y_i\right)$.  (22)

F1 score: let TP, FP and FN denote the number of positive samples correctly predicted as positive, the number of negative samples incorrectly predicted as positive, and the number of positive samples incorrectly predicted as negative, respectively. Then the precision P_c, recall R_c and F1 score are given by

$P_c = \frac{TP}{TP + FP}, \qquad R_c = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 P_c R_c}{P_c + R_c}$.

Confusion matrix: in deep-learning classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm. A confusion matrix is a tabular summary of the numbers of correct and incorrect predictions made by a classifier and clearly shows the identification performance.
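The metrics above can be computed directly from arrays of predicted and true labels; the short NumPy sketch below does so per class and is only an illustration of the definitions, not the evaluation script used in the paper.

```python
import numpy as np

def per_class_metrics(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int):
    """Return accuracy, per-class precision/recall/F1 and the confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp                # class c samples predicted as something else
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    acc = tp.sum() / cm.sum()
    return acc, precision, recall, f1, cm
```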


4.4. Numerical results

In this subsection, we present various simulation results for our proposed model. We also evaluate the performance of the obtained lightweight model of Improved-QGAN.

4.4.1. Classification performance

Classification performance vs. amount of labeled data. In semi-supervised learning, the amount of labeled data is an important parameter, which influences the adaptability and performance of an algorithm in different situations. In this subsection, the classification accuracy under different amounts of labeled data is tested. For WISR, the amount of labeled data per class ranges uniformly from 80 to 240 and the SNR is about 4 dB. For AMC, the amount of labeled data per class ranges uniformly from 400 to 1200 and the selected SNR is 4 dB. The input data length for both tasks is 1024. Fig. 3 shows how the classification accuracy changes with the amount of labeled data for WISR, while Fig. 4 shows the corresponding results on the AMC dataset. "CNN" denotes a network with the same architecture as the R and C sub-networks of Improved-QGAN.

Fig. 3. Classification accuracy comparison with different amounts of labeled data for the WISR dataset.

Fig. 4. Classification accuracy comparison with different amounts of labeled data for the AMC dataset.

From Fig. 3 and Fig. 4, we can see that the classification performance improves as the amount of labeled data increases, and that the GAN-based models outperform the CNN. Besides, Improved-QGAN outperforms the baseline models (the original Triple-GAN and our previous QGAN) on both tasks.

Classification performance vs. data length. The proposed model is a data-driven DL approach, and the properties of the input data can affect the model's performance. The length of the data is a key factor affecting classification accuracy. In this subsection, the classification accuracy under different data lengths is shown. Lengths of 64, 128, 256, 512, and 1024 are tested for both tasks. The SNRs are set to 4 dB, and the amounts of labeled data are 240 and 800 for the two tasks, respectively. Fig. 5 and Fig. 6 compare the performance under different data lengths for the two tasks. The figures show that better classification accuracy is generally obtained with longer data. In addition, Improved-QGAN provides better performance compared with the other related models.

Fig. 5. Classification accuracy comparison w.r.t. data length for the WISR dataset.

Fig. 6. Classification accuracy comparison w.r.t. data length for the AMC dataset.

Classification performance vs. SNR. SNR is a common property of communication data and affects classification accuracy. In this subsection, we show the classification accuracy under different SNRs. The amounts of labeled data are 240 and 1200 for the two tasks, and the input data length is 1024 for both. The results in Fig. 7 and Fig. 8 are based on the WISR and AMC datasets, respectively, and illustrate the benefit of our proposed model.

Fig. 7. Classification accuracy comparison with different SNRs for the WISR dataset.

Fig. 8. Classification accuracy comparison with different SNRs for the AMC dataset.

Table 4 provides the quantitative results under different SNRs for the two tasks. The results show that Improved-QGAN is better than the baseline models in terms of precision, recall, F1 score and classification accuracy in most scenarios.


All the results shown above demonstrate the effectiveness and universality of Improved-QGAN. It is worth noting that the F1 score of QGAN is better than that of Improved-QGAN in the low-SNR scenario. In this case, the information exchanged between the sub-networks is itself noisy and degrades, since we did not design a module to resist noise interference, and it becomes difficult for a sub-network to obtain effective information from the other sub-networks during training. Therefore, some performance indicators of QGAN are better than those of Improved-QGAN at low SNR. As future work, we will investigate techniques to counteract the effect of noise interference at low SNRs and further improve the algorithm.

Table 4
Performance comparison at different SNR regimes.

Task  SNR     Model   Pc     Rc     F1-score  Acc
WISR  -2 dB   TGAN    0.415  0.418  0.416     0.426
              QGAN    0.423  0.424  0.419     0.433
              Ours    0.460  0.470  0.459     0.481
      4 dB    TGAN    0.654  0.654  0.653     0.659
              QGAN    0.664  0.662  0.661     0.668
              Ours    0.678  0.668  0.670     0.674
      6 dB    TGAN    0.977  0.977  0.977     0.977
              QGAN    0.971  0.973  0.973     0.975
              Ours    0.982  0.982  0.982     0.982
AMC   -2 dB   TGAN    0.435  0.433  0.423     0.446
              QGAN    0.448  0.451  0.427     0.458
              Ours    0.431  0.464  0.413     0.469
      0 dB    TGAN    0.864  0.865  0.862     0.873
              QGAN    0.890  0.886  0.886     0.892
              Ours    0.894  0.895  0.894     0.900
      4 dB    TGAN    0.982  0.982  0.982     0.983
              QGAN    0.987  0.988  0.987     0.988
              Ours    0.991  0.991  0.991     0.992

Generation performance. In computer vision applications, one can directly inspect the generated data by plotting it. However, the wireless waveforms generated by G are irregular, and it is difficult to judge them by eye. According to [29], the negative log-likelihood between real data and generated data can represent their similarity. Fig. 9 shows how the negative log-likelihood changes over the training epochs. The result demonstrates that the generated data gradually approaches the real data and finally becomes stable.

Fig. 9. Negative log-likelihood between real data and generated data.


Computational efficiency. We compare the computational efficiency of Improved-QGAN and Triple-GAN in terms of learning speed and computation time per training epoch [30]. The amounts of labeled data are 240 and 1200, respectively. The test accuracy over training epochs for WISR is shown in Fig. 10. We can see that, to achieve the same recognition performance, Improved-QGAN is faster than Triple-GAN. Moreover, the final recognition accuracy of our model is higher than that of Triple-GAN. Table 5 clearly shows that Improved-QGAN requires far fewer epochs to achieve pre-specified test accuracies. Table 6 lists the total number of training parameters, the multiply-accumulate operations, and the training time per epoch. We can conclude that Improved-QGAN takes less time and achieves satisfactory performance more efficiently.

Fig. 10. Test accuracy comparison over training epochs for the WISR dataset.

Table 5
A comparison of the epochs needed to achieve target accuracies.

Target accuracy (WISR)   0.90  0.91  0.92  0.93
Epochs, Improved-QGAN    56    59    69    75
Epochs, Triple-GAN       71    238   246   260

Target accuracy (AMC)    0.94  0.95  0.96  0.97
Epochs, Improved-QGAN    107   121   130   140
Epochs, Triple-GAN       200   211   277   370

Table 6
Computation comparison between Triple-GAN and Improved-QGAN.

Task  Model           Parameters  MACCs   Time
WISR  Triple-GAN      4669.38K    87.37M  8.70 s
      Improved-QGAN   1508.87K    42.65M  8.52 s
AMC   Triple-GAN      1804.61K    91.18M  24.06 s
      Improved-QGAN   710.84K     40.76M  20.71 s

Finally, to understand the classification results in detail, we present the confusion matrices of Improved-QGAN in Fig. 11 and Fig. 12. For WISR, almost all 8 classes of signals can be separated with high accuracy. For AMC, the model performs well on most modulation formats, but has difficulty discriminating QPSK and 8PSK, which may be due to their similarity after the various distortions.

Fig. 11. Confusion matrices at SNR = 4 dB for the WISR dataset.

Fig. 12. Confusion matrices at SNR = 0 dB for the AMC dataset.
4.4.2. Performance of lightweight Improved-QGAN

In this subsection, we summarize the quantitative results of the lightweight networks in Table 7 to Table 10. Table 7 and Table 8 show the details of the model parameters, compression ratio (CR), occupied storage space, single-sample inference time (IT) and other indicators for the two tasks. The results indicate that a trade-off between classification accuracy and model size can be obtained.

Parameters denotes the total number of trainable parameters of the representation and classification sub-networks. Storage is the storage space occupied by the representation and classification sub-networks. Compression Ratio is the ratio of the number of trainable parameters of the teacher model to that of the student model. MACCs is the number of multiply-accumulate operations. Inference Time is the time from the moment the model accepts a single sample until it outputs the predicted label. Speed Up is the ratio of the teacher model's single-sample inference time to that of the student model. Table 9 and Table 10 show the classification performance of the lightweight QGAN and the lightweight Improved-QGAN, obtained by taking QGAN and Improved-QGAN as the teacher networks, respectively. The results demonstrate that, for lightweight networks of the same capacity, the lightweight Improved-QGAN obtains better classification accuracy than the lightweight QGAN, which benefits from the performance advantage of Improved-QGAN over QGAN.

Table 7
Student compression ratio and classification accuracy for WISR.

Model      Parameters  Compression ratio  Storage  MACCs   Inference time  Speed up
Teacher    922.44K     1                  2123 KB  19.98M  1211.01 us      1
Student 1  145.30K     6.35               585 KB   5.81M   612.43 us       1.98
Student 2  89.02K      10.36              365 KB   1.94M   360.15 us       3.36
Student 3  55.02K      16.77              232 KB   1.27M   335.34 us       3.61
Student 4  47.49K      19.42              203 KB   0.90M   260.85 us       4.64
Student 5  36.10K      25.55              158 KB   0.69M   233.05 us       5.20

Table 8
Student compression ratio and classification accuracy for AMC.

Model      Parameters  Compression ratio  Storage  MACCs   Inference time  Speed up
Teacher    164.24K     1                  521 KB   15.49M  1406.30 us      1
Student 1  77.90K      2.11               314 KB   8.92M   1116.68 us      1.26
Student 2  40.24K      4.08               167 KB   3.15M   563.20 us       2.50
Student 3  26.12K      6.29               111 KB   2.35M   492.52 us       2.86
Student 4  20.12K      8.16               88 KB    1.88M   486.82 us       2.89
Student 5  16.63K      9.88               74 KB    1.67M   436.99 us       3.22

Table 9
Classification accuracy comparison of the lightweight networks for WISR.

Student model  Teacher model   P_avg  R_avg  F1_avg  Acc
Teacher        QGAN            0.937  0.930  0.931   0.933
               Improved-QGAN   0.957  0.956  0.956   0.957
Student 1      QGAN            0.930  0.929  0.928   0.929
               Improved-QGAN   0.953  0.953  0.952   0.953
Student 2      QGAN            0.928  0.927  0.926   0.928
               Improved-QGAN   0.949  0.949  0.949   0.950
Student 3      QGAN            0.924  0.925  0.924   0.926
               Improved-QGAN   0.947  0.945  0.945   0.946
Student 4      QGAN            0.921  0.920  0.920   0.922
               Improved-QGAN   0.942  0.941  0.941   0.943
Student 5      QGAN            0.918  0.914  0.914   0.916
               Improved-QGAN   0.937  0.936  0.936   0.937

Table 10
Classification accuracy comparison of the lightweight networks for AMC.

Student model  Teacher model   P_avg  R_avg  F1_avg  Acc
Teacher        QGAN            0.987  0.988  0.987   0.988
               Improved-QGAN   0.991  0.991  0.991   0.992
Student 1      QGAN            0.975  0.973  0.973   0.975
               Improved-QGAN   0.982  0.982  0.982   0.983
Student 2      QGAN            0.969  0.969  0.969   0.971
               Improved-QGAN   0.977  0.978  0.977   0.978
Student 3      QGAN            0.964  0.964  0.964   0.966
               Improved-QGAN   0.967  0.967  0.967   0.969
Student 4      QGAN            0.960  0.961  0.960   0.962
               Improved-QGAN   0.968  0.969  0.968   0.970
Student 5      QGAN            0.959  0.955  0.955   0.958
               Improved-QGAN   0.966  0.966  0.966   0.968

5. Conclusion

In this paper, we proposed an improved version of QGAN that enhances the cooperative training among the four sub-networks of the original QGAN. We modified the overall loss functions of the original QGAN, including an adversarial training term for R and C and a modified loss function motivated by ACGAN for G. We compared the performance of the improved model with other models on two different datasets and verified the benefits and performance improvements of the proposed model. Furthermore, we provided a compact network of Improved-QGAN based on KD which takes up less memory and runs faster. Experimental results show that the proposed model outperforms the baseline models in most scenarios and that the lightweight model achieves a satisfactory trade-off between processing efficiency and classification accuracy.

CRediT authorship contribution statement

Xiaodong Xu: Conceptualization, Methodology, Investigation, Analysis, and Supervision. Ting Jiang: Writing - Original draft preparation, Implementation, and Validation. Jialiang Gong: Implementation - Benchmarks and Editing. Haifeng Xu: Data collection, Preprocessing, and Visualization. Xiaowei Qin: Writing - Reviewing and Discussion.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] J. Du, C. Jiang, J. Wang, Y. Ren, M. Debbah, Machine learning for 6G wireless networks: carrying forward enhanced bandwidth, massive access, and ultrareliable/low-latency service, IEEE Veh. Technol. Mag. 15 (4) (Dec. 2020) 122-134.
[2] M. Kulin, T. Kazaz, I. Moerman, E. De Poorter, End-to-end learning from spectrum data: a deep learning approach for wireless signal identification in spectrum monitoring applications, IEEE Access 6 (2018) 18484-18501.
[3] X. Li, F. Dong, S. Zhang, W. Guo, A survey on deep learning techniques in wireless signal recognition, Wirel. Commun. Mob. Comput. 2019 (2019) 1-12.
[4] K. Kim, I.A. Akbar, K.K. Bae, J.-S. Um, C.M. Spooner, J.H. Reed, Cyclostationary approaches to signal detection and classification in cognitive radio, in: 2nd IEEE International Symposium on, IEEE, Apr. 2007.
[5] C.M. Spooner, A.N. Mody, J. Chuang, J. Petersen, Modulation recognition using second- and higher-order cyclostationarity, in: 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Mar. 2017.
[6] Z. Gan, L. Chen, W. Wang, Triangle generative adversarial networks, in: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), Dec. 2017.
[7] J. Gong, X. Xu, Y. Qin, W. Dong, A generative adversarial network based framework for specific emitter characterization and identification, in: 11th International Conference on Wireless Communications and Signal Processing (WCSP), Oct. 2019.
[8] C. Ledig, L. Theis, F. Huszar, J. Caballero, et al., Photo-realistic single image super-resolution using a generative adversarial network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017.
[9] L. Yu, W. Zhang, J. Wang, Y. Yu, SeqGAN: sequence generative adversarial nets with policy gradient, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, Feb. 2017.
[10] T. Schlegl, P. Seeböck, S.M. Waldstein, U. Schmidt-Erfurth, G. Langs, Unsupervised anomaly detection with generative adversarial networks to guide marker discovery, Inf. Process. Med. Imag. (May 2017).
[11] I. Goodfellow, et al., Generative adversarial nets, in: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), Dec. 2014.
[12] Z. Pan, W. Yu, X. Yi, A. Khan, F. Yuan, Y. Zheng, Recent progress on generative adversarial networks (GAN): a survey, IEEE Access 7 (2019) 36322-36333.
[13] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, Comput. Sci. (2015).
[14] M. Mirza, S. Osindero, Conditional generative adversarial nets, Comput. Sci. (2014) 2672-2680.
[15] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, in: Proceedings of the 34th International Conference on Machine Learning, Aug. 2017.
[16] T.J. O'Shea, T. Roy, N. West, B.C. Hilburn, Physical layer communications system design over-the-air using adversarial networks, in: 26th European Signal Processing Conference (EUSIPCO), Sep. 2018.
[17] D. Roy, T. Mukherjee, M. Chatterjee, E. Pasiliao, Detection of rogue RF transmitters using generative adversarial nets, in: IEEE Wireless Communications and Networking Conference (WCNC), Apr. 2019.
[18] C. Zhu, L. Xu, X.-Y. Liu, F. Qian, Tensor-generative adversarial network with two-dimensional sparse coding: application to real-time indoor localization, in: IEEE International Conference on Communications (ICC), May 2018.
[19] C. Szegedy, W. Liu, Y. Jia, et al., Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015.

[20] B. Ramsundar, S. Kearnes, P. Riley, et al., Massively multitask networks for drug discovery, Comput. Sci. (2015).
[21] A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier GANs, in: Proceedings of the 34th International Conference on Machine Learning, Aug. 2017.
[22] Y. Cheng, D. Wang, P. Zhou, T. Zhang, Model compression and acceleration for deep neural networks: the principles, progress, and challenges, IEEE Signal Process. 35 (1) (Jan. 2018) 126-136.
[23] C. Bucilua, R. Caruana, A. Niculescu-Mizil, Model compression, in: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2006.
[24] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, Comput. Sci. (2015).
[25] T. Jiang, X. Qin, X. Xu, J. Chen, W. Dong, Lightweight quadruple-GAN for interference source recognition in Wi-Fi networks, in: IEEE 6th International Conference on Computer and Communications (ICCC), Dec. 2020.
[26] A. Aguinaldo, P. Chiang, A. Gain, A. Patil, K. Pearson, S. Feizi, Compressing GANs using knowledge distillation, arXiv preprint, arXiv:1902.00159, 2019.
[27] X. Wang, R. Zhang, Y. Sun, J. Qi, KDGAN: knowledge distillation with generative adversarial networks, in: Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS), Dec. 2018.
[28] T.J. O'Shea, T. Roy, T.C. Clancy, Over the air deep learning based radio signal classification, IEEE J. Sel. Top. Signal Process. 12 (2018) 168-179.
[29] M. Ben-Yosef, D. Weinshall, Gaussian mixture generative adversarial networks for diverse datasets, and the unsupervised clustering of images, 2018.
[30] D. Kim, Y. Choi, J. Han, C. Choi, Y. Kim, Fast adversarial training for semi-supervised learning, in: International Conference on Learning Representations (ICLR), 2019.

Xiaodong Xu received his B.Eng. degree and Ph.D. degree in Electronic and Information Engineering from the University of Science and Technology of China (USTC) in 2000 and 2007, respectively. From 2000 to 2001, he served as a Research Assistant at the R&D center, Konka Telecommunications Technology. Since 2007, he has been a faculty member with the Department of Electronic Engineering and Information Science, USTC. He is currently working with the CAS Key Laboratory of Wireless-Optical Communications, USTC. His research interests include wireless communications, signal processing, wireless artificial intelligence and information-theoretic security.

Ting Jiang received her B.Eng. degree from the School of Information and Communication Engineering, Dalian University of Technology (DUT), Dalian, China, in 2018. She is currently pursuing her M.Eng. in the Department of Electronic Engineering and Information Science at the University of Science and Technology of China (USTC). Her main research topic is deep learning based wireless communication and signal processing.

Jialiang Gong received the B.Eng. degree in Electronic and Information Engineering from the University of Science and Technology of China (USTC), Hefei, China, in 2016. He is currently pursuing his Ph.D. degree in the School of Cyberspace Science and Technology at USTC. His research interest is deep learning for specific emitter identification and cyberspace security.

Haifeng Xu received his B.Eng. degree from the School of Computer Science and Information Engineering, Hefei University of Technology (HUT), Hefei, China, in 2019. He is currently pursuing his M.Eng. in the Department of Electronic Engineering and Information Science at the University of Science and Technology of China (USTC). His research topic is deep learning based wireless communication and mobile computing.

Xiaowei Qin received the B.S. and Ph.D. degrees from the Department of Electrical Engineering and Information Science, University of Science and Technology of China (USTC), Hefei, China, in 2000 and 2008, respectively. Since 2014, he has been a member of staff in the Key Laboratory of Wireless-Optical Communications of the Chinese Academy of Sciences at USTC. His research interests include optimization theory, service modeling in future heterogeneous networks, and big data in mobile communication networks.
