Convolutional Neural Network for Behavioral Modeling and Predistortion of Wideband Power Amplifiers

Xin Hu, Senior Member, IEEE, Zhijun Liu, Xiaofei Yu, Yulong Zhao, Wenhua Chen, Senior Member, IEEE, Biao Hu, Xuekun Du, Member, IEEE, Xiang Li, Member, IEEE, Mohamed Helaoui, Senior Member, IEEE, Weidong Wang, Member, IEEE, and Fadhel M. Ghannouchi, Fellow, IEEE

Abstract— Power amplifier (PA) models, such as the neural network (NN) models and the multilayer NN models, have problems with high complexity. In this article, we first propose a novel behavior model for wideband PAs, using a real-valued time-delay convolutional NN (RVTDCNN). The input data of the model are sorted and arranged as a graph composed of the in-phase and quadrature (I/Q) components and envelope-dependent terms of current and past signals. Then, we created a predesigned filter using the convolutional layer to extract the basis functions required for the PA forward or reverse modeling. Finally, the generated rich basis functions are input into a simple, fully connected layer to build the model. Due to the weight sharing characteristics of the convolutional model's structure, a strong memory effect does not lead to a significant increase in the complexity of the model. Meanwhile, the extraction effect of the predesigned filter also reduces the training complexity of the model. The experimental results show that the performance of the RVTDCNN model is almost the same as the NN models and the multilayer NN models. Meanwhile, compared with the abovementioned models, the coefficient number and computational complexity of the RVTDCNN model are significantly reduced. This advantage is noticeable when the memory effects of the PA are increased by using wider signal bandwidths.

Index Terms— Digital predistortion (DPD), in-phase and quadrature (I/Q) components, neural network (NN), power amplifiers (PAs), real-valued time-delay convolutional NN (RVTDCNN).

Manuscript received December 30, 2019; revised June 19, 2020 and October 5, 2020; accepted January 23, 2021. This work was supported by the National Natural Science Foundation of China under Grant 61701033. (Corresponding author: Zhijun Liu.)

Xin Hu, Zhijun Liu, Xiaofei Yu, and Weidong Wang are with the School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China, and also with the Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: huxin2016@bupt.edu.cn; lzj2017110489@bupt.edu.cn).

Yulong Zhao, Xuekun Du, Xiang Li, Mohamed Helaoui, and Fadhel M. Ghannouchi are with the Intelligent RF Radio Laboratory, Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada (e-mail: fadhel.ghannouchi@ucalgary.ca).

Wenhua Chen is with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

Biao Hu is with the School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China.

Digital Object Identifier 10.1109/TNNLS.2021.3054867

I. INTRODUCTION

AS an indispensable component in the wireless communication system, power amplifiers (PAs) provide enough power for transmitting the signal through the channel to ensure that the receiver can collect the signal with a relatively good signal-to-noise ratio [1]–[4]. However, a PA's nonlinearity and memory effect can lead to spectral expansion and a decrease in the adjacent channel power ratio (ACPR) performance, thereby degrading the quality of the communication [5]–[9]. Behavior modeling provides an effective method for nonlinear analysis and modeling of PAs. Behavior modeling often constructs a mathematical nonlinear modeling function by capturing the input and output responses of the system when driven with significant time-varying signals, in order to track and observe the static nonlinear behavior of the system, as well as the dynamics of the system that are often designated as memory effects [10], [11]. With the incoming 5G standard calling for a sharp increase in data transmission rates up to multiple Gbps, the signal bandwidth must also be increased significantly, by up to several hundred MHz. Accordingly, ultrabroadband PA behavior modeling and digital predistortion (DPD) have become a popular choice in current research.

Traditional behavioral models with memory effects, including the original Volterra model and several compact Volterra models, have been widely used in the modeling of wideband PAs [12], [13]. However, the high correlation between polynomial bases in these models makes it difficult to improve their modeling performance [14]. Recently, the outstanding achievements of artificial neural networks (ANNs) in the field of communications have attracted the attention of researchers in the field of wireless PA modeling. Due to the ANN's excellent performance for the approximation of nonlinear functions, many works published in the open literature have studied its application in the area of PA modeling and predistortion [15]–[19]. When a PA exhibits complicated nonlinear characteristics and memory effects, it is difficult to achieve good modeling performance with low-complexity ANN models. This issue motivated the following work in addressing the problem of how to derive broadband low-complexity NN-based models that can provide accurate modeling performance for the forward and inverse (predistorter) models.
To address the above issues and inspired by the emergence of artificial intelligence (AI) in the broadband communications area, advanced NN-based models have been investigated [20]–[24]. Deep learning [25]–[28] in the AI field has shown excellent performance in discovering complex nonlinear relationships using labeled data. In particular, convolutional NNs (CNNs) [29], [30] and recurrent NNs (RNNs) [31], [32] in deep learning have been proven to be effective in many fields, including wireless communication [33]–[38]. However, according to the results of our research, the work on the use of deep learning to build behavior models and linearize PAs [14], [39]–[42] is limited. One important reason is that regression algorithms based on RNNs are often utilized for natural speech and time series processing tasks. If they are used for the modeling and linearization of PAs, although they have fewer parameters compared with feedforward NNs due to their weight sharing, the complex training algorithm makes the method complicated [43]. In addition, a CNN is usually used as a classifier, and its output layer makes a discrete decision rather than outputting a continuous signal. However, the work presented here will demonstrate for the first time that CNN can be adapted and used in the fields of behavior modeling and DPD synthesis of PAs. The NN model's complexity reduction will mainly result from the weight sharing characteristic of CNN structures [29], [30]. The possibility of increasing the input dimension without changing the network structure has also attracted our attention.

We first apply CNN to PA modeling and propose a real-valued time-delay CNN (RVTDCNN) behavior model for wideband wireless PAs. As a CNN cannot be directly used to build the PA model, since the input signal is in a 1-D format that changes with time, the input data are sorted and arranged as a graph composed of the in-phase and quadrature (I/Q) components and envelope-dependent terms of current and past signals. Next, this model constructs a predesigned filter using the convolutional layer to extract the basis functions required for PA forward or reverse modeling. Finally, the extracted basis functions are inputted into a simple, fully connected (FC) layer to build the model. The model complexity of RVTDCNN is significantly reduced due to the weight sharing characteristics of the convolution structure. Meanwhile, the extraction effect of the predesigned filter also reduces the training complexity of the model. In order to evaluate the performance of the RVTDCNN model, we compared it with other existing models (including NN and multilayer NN models) by experiment and simulation. The results show that, compared with the existing state-of-the-art models, the complexity of the RVTDCNN model, in terms of the number of model coefficients, is reduced while the performance is preserved.

The contributions of this article are as follows.

1) As the signal bandwidth increases, the PA exhibits complicated nonlinear characteristics and memory effects. It is difficult to achieve good modeling performance with low-complexity traditional behavioral models [14]. To address the problem of how to derive a broadband low-complexity model, a CNN-based architecture for extracting the PA behavioral model is proposed to improve nonlinear modeling performance.

2) CNN-based architectures are commonly used in image processing, where the input data are in a 2-D format. However, in PA modeling and DPD, the input data have a 1-D format (varying only versus time). One of the contributions of this article is in proposing a way to map the 1-D data to 2-D data to make the use of a CNN-based architecture possible.

3) If an RNN or CNN is used in the modeling and linearization of PAs, it has high computational complexity for parameter training [43]. To reduce the computational complexity, a training methodology for PA modeling based on transfer learning [44] is proposed to accelerate the training of the PA model.

4) As a result of all the above contributions, this work achieved the best PA modeling and linearization performance (normalized mean square error (NMSE) and ACPR) for nonlinear PAs driven with broadband signals with bandwidths higher than 100 MHz. In addition, the proposed modeling technique achieved the lowest complexity and the fastest convergence time among all the other models used for the same application (PA modeling and DPD).

The remainder of this article is organized as follows. In Section II, the existing NN models for PA modeling, including shallow NN models and deep NN (DNN) models, are briefly reviewed. Section III proposes the structure of the RVTDCNN model and describes it in a detailed manner. Section IV discusses the training process of the RVTDCNN model and analyzes its model complexity. Section V extends the RVTDCNN model to DPD design. Section VI describes the platform for experimental validation. Section VII reports the measurement and validation results and compares the proposed model with other models. Finally, Section VIII gives the conclusions.

II. ANNs FOR PA MODELING

A. Shallow Neural Networks for PA Modeling

Shallow NNs, with few hidden layers, are used to express the output characteristics of the PA due to their relatively simple network structure and training process, as shown in Fig. 1(a) and (b) [15], [19].

A commonly used shallow NN structure includes an input layer, a hidden layer structure with one or two layers, and an output layer. The model in Fig. 1(a) considers injecting the I/Q components of the input signal and embeds their corresponding time-delayed values into the spatial structure of the input layer of the network to reflect the corresponding memory effects, such as the real-valued time-delay NN (RVTDNN) model in [15]. However, hidden and related information, such as envelope-dependent terms, requires further network computation capability, which leads to a complex network structure and additional hidden layers.

To this end, the structure in Fig. 1(b) is proposed to simplify the network structure, which attempts to inject the I/Q components and important envelope-dependent terms. The corresponding models include the augmented radial basis function NN (ARBFNN) in [18] and the augmented real-valued time-delay NN (ARVTDNN) in [19]. However, with the increase in the signal bandwidth, the memory depth will also increase, and the input dimension of the model will grow significantly, resulting in a complex network structure. Overall, to provide sufficient network capacity, the shallow NN structure makes the calculation relatively complicated.

Fig. 1. Conventional ANN topologies for PA modeling. (a) Shallow NN topology with the input of the I/Q components. (b) Shallow NN topology with the input of the I/Q components and the envelope-dependent terms. (c) DNN topology.

B. Deep Neural Networks for PA Modeling

An NN with multiple hidden layers is proposed to improve the performance of PA modeling. Instead of a single hidden layer, the DNN architecture includes over three hidden layers to mimic and approximate the nonlinearity and memory effects of the PA, as shown in Fig. 1(c). The corresponding models include the DNN model in [14]. As the number of hidden layers increases, the fitting and generalization capabilities of the NN model also increase [14], so it is fair to assume that the accuracy of modeling will increase with the number of hidden layers. Different from the shallow NN, the DNN can build more complex models with relatively low complexity. From the experiment conducted in [14], the DNN can achieve the same accuracy with relatively low complexity. However, when a PA exhibits complicated nonlinear characteristics and deep memory effects, it is difficult to achieve low-complexity modeling performance with a DNN. In addition, the DNN's implementation demands excessive signal processing resources as the signal bandwidth gets wider. To further reduce the complexity of the DNN, the weight sharing structure of the CNN has a remarkable effect in reducing the complexity of the model.

III. REAL-VALUED TIME-DELAY CONVOLUTIONAL NEURAL NETWORK

The proposed RVTDCNN model is shown in Fig. 2. The RVTDCNN model includes four layers, namely, one input layer, one predesigned filter layer, one FC layer, and one output layer. The predesigned filter layer is constructed using a convolutional layer and is used to capture, in an effective manner, the important features and characteristics of the input data. Due to the weight sharing and the data dimensionality reduction of the convolution kernel in the predesigned filter structure, the input information can be extracted at a small network scale. The dimensions of each convolution kernel can be designed to yield low computation complexity while maintaining a good prediction performance of the model. After the predesigned filter layer, an FC layer is used to integrate the features. The final output layer consists of two neurons with a linear activation function, corresponding to the I/Q components of the samples.

The input data include the I/Q components and the envelope-dependent terms of current and past signals. The input data have a 1-D format (varying only versus time). We mapped the 1-D data to 2-D data, and the input matrix is expressed as follows:

$$\begin{aligned}
X_n = [\,&I_{\text{in}}(n), I_{\text{in}}(n-1), \ldots, I_{\text{in}}(n-M);\\
&Q_{\text{in}}(n), Q_{\text{in}}(n-1), \ldots, Q_{\text{in}}(n-M);\\
&|x(n)|, |x(n-1)|, \ldots, |x(n-M)|;\\
&|x(n)|^2, |x(n-1)|^2, \ldots, |x(n-M)|^2;\\
&|x(n)|^3, |x(n-1)|^3, \ldots, |x(n-M)|^3\,]
\end{aligned} \tag{1}$$

where I_in(n) and Q_in(n) represent the I/Q components of the complex envelope x(n) of the PA input signal, respectively; |x(n)| denotes the amplitude of the current signal; I_in(n − i), Q_in(n − i), and |x(n − i)|, (i = 1, 2, ..., M) denote the corresponding terms of past samples, respectively; and M represents the memory depth.

The reason why the input data are arranged from a 1-D vector into a 2-D graph is to put them in a format suitable for convolutional processing. The input data items corresponding to adjacent delayed signals are arranged adjacently to ensure that the 2-D convolution kernel extracts the cross-terms of the differently delayed signals. As shown in Fig. 3, the input graph X_n is transformed into a volume of feature maps by the predesigned filter layer. This is accomplished by convolving the input data with multiple local convolution kernels and adding bias parameters to generate the corresponding local features, as shown in Fig. 4. The convolution operation is expressed as follows:

$$h_l = X_n \otimes \omega_l^c = \begin{bmatrix}
I_{\text{in}}(n) & I_{\text{in}}(n-1) & \cdots & I_{\text{in}}(n-M)\\
Q_{\text{in}}(n) & Q_{\text{in}}(n-1) & \cdots & Q_{\text{in}}(n-M)\\
|x(n)| & |x(n-1)| & \cdots & |x(n-M)|\\
|x(n)|^2 & |x(n-1)|^2 & \cdots & |x(n-M)|^2\\
|x(n)|^3 & |x(n-1)|^3 & \cdots & |x(n-M)|^3
\end{bmatrix} \otimes \omega_l^c \tag{2}$$

where h_l, (l = 1, 2, ..., L) represents the convolution output of the lth convolution kernel with the input volume data X_n arranged in a 2-D matrix, as illustrated in Fig. 3; L represents the number of convolution kernels; ω_l^c represents the coefficients of the lth convolution kernel; and ⊗ denotes the operation of convolution.
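To make the arrangement in (1) concrete, the following is a minimal Python sketch of how one might build the 5 × (M + 1) input graph from a complex baseband record. This is an illustrative assumption, not code from the original work; the function name and the use of NumPy are hypothetical.

```python
import numpy as np

def build_input_graph(x, n, M):
    """Hypothetical helper: arrange the complex baseband record x into
    the 5 x (M+1) input graph X_n of (1) at sample index n >= M."""
    window = x[n - M:n + 1][::-1]      # [x(n), x(n-1), ..., x(n-M)]
    mag = np.abs(window)
    return np.stack([window.real,      # I_in(n), ..., I_in(n-M)
                     window.imag,      # Q_in(n), ..., Q_in(n-M)
                     mag,              # |x(.)|   row
                     mag ** 2,         # |x(.)|^2 row
                     mag ** 3])        # |x(.)|^3 row

# Example: memory depth M = 3 yields a 5 x 4 graph.
x = (np.random.randn(100) + 1j * np.random.randn(100)) / np.sqrt(2)
print(build_input_graph(x, n=10, M=3).shape)  # (5, 4)
```

Stacking the rows so that adjacent delays sit in adjacent columns is what lets a small 2-D kernel span several delayed envelope terms at once, which is the point made above.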

Fig. 2. Block diagram of the proposed RVTDCNN model.

Fig. 3. 2-D convolution diagram.

Fig. 4. Convolutional kernel extraction range example.

The output of the convolution process is then passed through a nonlinear activation function to obtain the nonlinear fitting ability. The outputs of the activation function of the convolution kernels are written as

$$u_l = f^c\left(h_l + b_l^c\right) \tag{3}$$

where u_l, (l = 1, 2, ..., L) is the output feature map of the lth convolution kernel; f^c(·) is the activation function of the convolution kernels; and b_l^c represents the bias of the lth convolution kernel. Through the predesigned filter, the rich basis function features required for PA modeling are extracted, which is proven in the Appendix.

Then, the basis function features, extracted by the predesigned filter, are arranged into a feature vector to be injected into the FC layer. The feature vector is written as

$$m = [m_1, m_2, \ldots, m_{L\times B\times C}]^T = [u_{111}, u_{112}, \ldots, u_{211}, \ldots, u_{LBC}]^T \tag{4}$$

where m is a vector of length L × B × C, and the dimension of the feature maps is B × C.

The output of the FC layer is obtained as follows:

$$a_t = f^f\left(\sum_{i=1}^{L\times B\times C} \omega_{ti}^{f} m_i + b_t^{f}\right) \tag{5}$$

where a_t, (t = 1, 2, ..., T) denotes the tth neuron output; T represents the number of neurons in the FC layer; ω_ti^f and b_t^f represent the weights and biases, respectively; and f^f(·) is the activation function in the FC layer.

Finally, the output layer weights and sums the output characteristics of the FC layer to acquire the network output. To ensure continuous values for the output data, we set the activation function, f^o, of the output layer to be the linear function y = x:

$$\begin{cases}
I_{\text{out}}(n) = \displaystyle\sum_{t=1}^{T} \omega_{1t}^{o} a_t + b_1^{o}\\[2mm]
Q_{\text{out}}(n) = \displaystyle\sum_{t=1}^{T} \omega_{2t}^{o} a_t + b_2^{o}
\end{cases} \tag{6}$$

where I_out(n) and Q_out(n) represent the neuron outputs in the output layer, which correspond to the prediction of the I/Q components of the output sample, and {ω_1t^o, ω_2t^o, b_1^o, b_2^o} represents the weights and biases of the output layer.

The label data contain the I/Q components of the PA output samples. The output data vector is represented as

$$Y_n = [I_{\text{out}}(n), Q_{\text{out}}(n)]^T \tag{7}$$

where I_out(n) and Q_out(n) represent the I/Q components of the PA output signal y(n), respectively.
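Read end to end, (2)–(6) amount to a small feedforward computation. The sketch below is a nondefinitive NumPy rendering of that forward path with arbitrary dimensions; Tanh is assumed for both f^c and f^f (the choice the article settles on later), and all names are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(X, k):
    """'Valid' 2-D correlation of the input graph X with one kernel k, per (2)."""
    r, s = k.shape
    B, C = X.shape[0] - r + 1, X.shape[1] - s + 1
    out = np.empty((B, C))
    for i in range(B):
        for j in range(C):
            out[i, j] = np.sum(X[i:i + r, j:j + s] * k)
    return out

def rvtdcnn_forward(X, kernels, b_c, W_f, b_f, W_o, b_o):
    """Forward path of (2)-(6): predesigned filter -> FC layer -> linear output."""
    feats = [np.tanh(conv2d_valid(X, k) + b) for k, b in zip(kernels, b_c)]  # (2)-(3)
    m = np.concatenate([f.ravel() for f in feats])   # feature vector, (4)
    a = np.tanh(W_f.dot(m) + b_f)                    # FC layer, (5)
    return W_o.dot(a) + b_o                          # linear output, (6): [I_out(n), Q_out(n)]

# Example: 5 x 4 input graph (M = 3), three 3 x 3 kernels, T = 6 FC neurons.
np.random.seed(0)
X = np.random.randn(5, 4)
kernels = [np.random.randn(3, 3) for _ in range(3)]
b_c = np.random.randn(3)
m_len = 3 * 3 * 2                                    # L x B x C = 3 x 3 x 2
W_f, b_f = np.random.randn(6, m_len), np.random.randn(6)
W_o, b_o = np.random.randn(2, 6), np.random.randn(2)
print(rvtdcnn_forward(X, kernels, b_c, W_f, b_f, W_o, b_o))
```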

IV. ANALYSIS OF THE RVTDCNN BEHAVIOR MODEL

TABLE I: COST FUNCTION AND NMSE AT DIFFERENT LEARNING RATES

A. Training, Validation, and Testing of RVTDCNN

The input and output signals of the PA are sampled and saved in random access memory (RAM), and then the RVTDCNN model is trained offline. We first train all parameters θ_k = {ω_k^c, b_k^c, ω_k^f, b_k^f, ω_k^o, b_k^o} of the RVTDCNN model with the Adam optimization algorithm [45]. The goal of network training is to minimize the error between the label (measured output) data and the RVTDCNN model output, determined in the forward path, by updating the parameters at iteration k until the convergence of the network. In the forward path, we define the mean square error (MSE) as the cost function, which can be expressed as

$$E_{\text{mse}}(\theta) = \frac{1}{2N}\sum_{n=1}^{N}\left[\left(I_{\text{out}}(n)-\hat{I}_{\text{out}}(n)\right)^2 + \left(Q_{\text{out}}(n)-\hat{Q}_{\text{out}}(n)\right)^2\right] \tag{8}$$

where Î_out(n) and Q̂_out(n) represent the outputs of the RVTDCNN model, respectively; I_out(n) and Q_out(n) represent the I/Q components of the PA output samples, respectively; and N is the length of the training data.

In this article, 7000 sets of modeling data are used for the modeling of RVTDCNN. Each set of modeling data contains input data and label data (measured output). The input data are a 2-D graph with a dimension of 5 × (M + 1), where M is the memory depth, as shown in (1). The label data are a vector with a dimension of 2 × 1 and are composed of the I/Q components of the PA output, as shown in (7). To achieve the modeling of RVTDCNN, we divided the modeling data into a training set, a test set, and a verification set according to the ratio 3:1:1. Therefore, the training set contains 4200 sets of modeling data, the verification set contains 1400 sets of modeling data, and the test set contains 1400 sets of modeling data. The training set is used to train the model, and the unseen test set is used to test the final model to verify the generalization ability of the model. The modeling performance is described by the NMSE

$$\text{NMSE} = 10 \times \lg \frac{\frac{1}{N}\sum_{n=1}^{N}\left[\left(I_{\text{out}}(n)-\hat{I}_{\text{out}}(n)\right)^2+\left(Q_{\text{out}}(n)-\hat{Q}_{\text{out}}(n)\right)^2\right]}{\frac{1}{N}\sum_{n=1}^{N}\left[\left(I_{\text{out}}(n)\right)^2+\left(Q_{\text{out}}(n)\right)^2\right]}. \tag{9}$$
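The cost function (8) and the NMSE metric (9) translate directly into a few lines of code. The following is a minimal sketch (NumPy assumed, function names hypothetical), together with the 3:1:1 split described above.

```python
import numpy as np

def mse_cost(I_hat, Q_hat, I_meas, Q_meas):
    """E_mse of (8): half the mean squared I/Q error over the N samples."""
    return np.mean((I_meas - I_hat) ** 2 + (Q_meas - Q_hat) ** 2) / 2.0

def nmse_db(I_hat, Q_hat, I_meas, Q_meas):
    """NMSE of (9) in dB: error power normalized by the measured output power."""
    err = np.sum((I_meas - I_hat) ** 2 + (Q_meas - Q_hat) ** 2)
    ref = np.sum(I_meas ** 2 + Q_meas ** 2)
    return 10.0 * np.log10(err / ref)

# 3:1:1 split of the 7000 modeling pairs described above.
idx = np.arange(7000)
train_idx, test_idx, val_idx = idx[:4200], idx[4200:5600], idx[5600:]
```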
In the Adam optimization algorithm, the initialization parameters β1 and β2 are used to control the exponential decay rates of the moving averages of the gradient and the squared gradient, and they are required to be close to 1. Through experimental verification, this article sets the parameters to the default values β1 = 0.9 and β2 = 0.999. The initialized first moment vector μ0 and second moment vector v0 are set to μ0 = 0 and v0 = 0. The constant ε is used to prevent the second moment vector from being 0; this article sets ε to the default value of 10^−8. We analyzed the cost function values and the corresponding NMSE performance at different learning rates, as shown in Table I. It can be found from Table I that, when the learning rate is 1 × 10^−3, the NMSE performance is almost optimal, and the corresponding MSE is 1.24 × 10^−7. This learning rate also gives the fastest training speed at the best performance. Therefore, the learning rate is set to 1 × 10^−3, and the threshold for the cost function is set to 1.2 × 10^−7.

The trained convolutional layer is used as a predesigned filter to extract the features of the input data. Due to the extraction effect of the predesigned filter on the basis functions, only a simple FC layer is needed to track the behavioral characteristics of the PA. Therefore, once the parameters of the predesigned filter are trained, only the parameters θ_k = {ω_k^f, b_k^f, ω_k^o, b_k^o} of the FC layer and the output layer need to be adjusted, which is done with the Levenberg–Marquardt (LM) algorithm [46]. The training process of the RVTDCNN model is shown in Algorithm 1.

To get the desired modeling performance, we needed to decide the specific parameters of RVTDCNN. The 100-MHz orthogonal frequency-division multiplexing (OFDM) input signal is taken as an example for the description. The peak-to-average power ratio (PAPR) of the OFDM signal is 10.4 dB. The test PA is a Doherty PA. The small-signal gain of the PA is 28 dB, and the saturation power is 43 dBm. The choice of input data affects modeling performance and model complexity. An inappropriate dimension of the input data will increase the model coefficients. According to [19], the combination of the components I, Q, |x(n)|, |x(n)|^2, and |x(n)|^3 is the best choice of input signal to the NN, yielding low model complexity and good performance. Based on the determined input data, the appropriate size of the convolution kernel becomes a factor affecting the modeling performance. The modeling performance and the ACPR performance in DPD, under different sizes and numbers of convolution kernels, were verified, and the results are shown in Table II. To decouple the effects of the FC layer and the predesigned filter settings, the number of neurons in the FC layer is set to 20, which can provide sufficient network capacity at different convolution kernel sizes. At this point, the modeling performance of the RVTDCNN model is only affected by the number and size of the convolution kernels. It was found that if the convolution kernel number is equal to or less than 2, the model's NMSE performance increases with the increase in the convolution kernel number, regardless of the size of the convolution kernel. If the convolution kernel number exceeds 3, the NMSE performance does not increase significantly with the increase in the convolution kernel number. This can be explained by the fact that too few convolution kernels cannot fully extract the features that reside in the input data. Meanwhile, when the convolution kernel number is kept constant, the size of the convolution kernel significantly affects the model coefficient number.

TABLE II: MODELING PERFORMANCE UNDER DIFFERENT CONVOLUTION KERNEL SCALES

When the size of the convolution kernel is 3 × 3 × 1, the number of model coefficients is relatively small, and the NMSE performance is also quite good. The ACPR performance shows the same trend. Therefore, considering the modeling performance and the model complexity, the convolution layer contains three convolution kernels of size 3 × 3 × 1. The results in Table II correspond to the PA used in this article. For different PAs, the optimal size and number of convolution kernels can be obtained through the scheme proposed in this article.

Algorithm 1 Training of the RVTDCNN Model

Definition:
1. Determine the structure of the RVTDCNN model;
2. Get 4200 sets of training data, including input data and label data;
3. Define the cost function E_mse(θ) of the model;
4. Define the convergence threshold E_0 = 1.2 × 10^−7 of the cost function.

Extraction of the Predesigned Filter:
1. Initialization:
   1) Set the learning rate α = 10^−3 and the exponential decay rates β1 = 0.9, β2 = 0.999;
   2) Initialize the first moment vector μ0 = 0 and the second moment vector v0 = 0;
   3) Set the constant ε = 10^−8.
2. Training the RVTDCNN model:
   Loop: k = 1, 2, ..., 200 000
   1) Calculate the network output from (6) and the cost function from (8);
   2) Judgment: if the performance requirements are met, exit the loop;
   3) Calculate the partial derivative of the objective function with respect to the coefficients, g_k = ∂E_mse(θ_{k−1})/∂θ_{k−1};
   4) Update the biased first and second moment estimates, μ_k = β1 · μ_{k−1} + (1 − β1) · g_k and v_k = β2 · v_{k−1} + (1 − β2) · g_k^2;
   5) Get the bias-corrected first and second moment estimates, μ̂_k = μ_k/(1 − β1^k) and v̂_k = v_k/(1 − β2^k);
   6) Update the coefficients, θ_k = θ_{k−1} − α · μ̂_k/(√(v̂_k) + ε).
3. Save the convolutional layer coefficients θ_c and define them as the predesigned filter coefficients.

PA Modeling:
Training the RVTDCNN model:
   Loop: l = 1, 2, ..., 200
   1) Calculate the predesigned filter output from (4) using the coefficients θ_c;
   2) Calculate the model output and the cost function;
   3) Judgment: if the performance requirements are met, exit the loop;
   4) Update the network coefficients using the LM algorithm.
End
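The Adam recursion in the first phase of Algorithm 1 can be written compactly. The following is a minimal sketch of that phase (the full training that yields the predesigned filter); `grad_fn` and `cost_fn` are hypothetical placeholders standing in for backpropagation and the cost evaluation of (8).

```python
import numpy as np

def adam_step(theta, grad, mu, v, k, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, mirroring steps 3)-6) of Algorithm 1."""
    mu = beta1 * mu + (1 - beta1) * grad          # biased 1st moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # biased 2nd moment estimate
    mu_hat = mu / (1 - beta1 ** k)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** k)
    return theta - alpha * mu_hat / (np.sqrt(v_hat) + eps), mu, v

def train_predesigned_filter(theta, grad_fn, cost_fn, e0=1.2e-7, max_iter=200000):
    """Phase 1 of Algorithm 1: train all coefficients until E_mse < E_0."""
    mu, v = np.zeros_like(theta), np.zeros_like(theta)
    for k in range(1, max_iter + 1):
        if cost_fn(theta) < e0:                   # convergence test, step 2)
            break
        theta, mu, v = adam_step(theta, grad_fn(theta), mu, v, k)
    return theta  # convolutional slice of theta is then frozen as the filter
```

The second phase freezes the converged convolutional coefficients and refits only the FC and output layers; the article uses the LM algorithm for this step, for which any damped least-squares solver could stand in.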
tion, the exponential linear unit (elu), the leaky ReLU, and
size of the convolution kernel is 3*3*1, the number of model the hyperbolic tangent sigmoid (Tanh) function, which are
coefficients is relatively small, and the NMSE performance is defined, respectively, in (10). To get the best modeling per-
also quite good. ACPR performance shows the same trend. formance, the abovementioned functions were used to train

TABLE III: NMSE PERFORMANCE OF DIFFERENT ACTIVATION FUNCTIONS

To get the best modeling performance, the abovementioned functions were used to train the RVTDCNN model, and the results are shown in Table III. They can be summarized as follows: the Tanh function performs better than the other commonly used activation functions, which are given by

$$\begin{aligned}
\text{Sigmoid}(x) &= \frac{1}{1+e^{-x}}\\
\text{ReLU}(x) &= \max(0, x)\\
\text{ELU}(x) &= \begin{cases} x, & \text{if } x \ge 0\\ \alpha\left(e^{x} - 1\right), & \text{if } x < 0 \end{cases}\\
\text{Leaky-ReLU}(x) &= \max(\lambda x, x)\\
\tanh(x) &= \frac{\exp(2x)-1}{\exp(2x)+1}.
\end{aligned} \tag{10}$$

B. Complexity Analysis of RVTDCNN

The complexity analysis aims to evaluate the capability of different models and to assess whether the training procedure of RVTDCNN is simpler than the training of other typical models. The model complexity refers to both the number of coefficients and the number of floating-point operations (FLOPs). The comparison of complexity, including the total number of coefficients of the network structure and the FLOPs, is shown in Table IV. Based on the theory and experimental data, the proposed RVTDCNN has superior performance compared with the traditional models because of its convolution calculation. The following is the specific calculation process for the complexity of RVTDCNN.

Based on RVTDCNN, it can be stated that the convolution structure decreases the model size, which makes the extraction of the model's features more efficient. Compared with a standard feedforward network, the number of coefficients the convolutional structure needs to generate the same features is much smaller due to the weight sharing, which reduces the coefficient complexity of the RVTDCNN model. The total number of coefficients is equal to the sum of the weight number and the bias number between layers. The number of coefficients of the predesigned filter layer can be calculated as follows:

$$P_{\text{conv}} = W_{\text{conv}} + B_{\text{conv}} = r \times s \times z \times L + L \tag{11}$$

where the kernel size is r × s × z, and the number of kernels is L.

The coefficient number of the FC layer can be calculated as follows:

$$P_{\text{fc}} = W_{\text{fc}} + B_{\text{fc}} = B \times C \times L \times T + T \tag{12}$$

where B × C × L denotes the size of the output tensor of the predesigned filter, and T is the number of neurons of the FC layer.

The number of coefficients of the output layer can be obtained as follows:

$$P_{\text{out}} = W_{\text{out}} + B_{\text{out}} = T \times T_{\text{out}} + T_{\text{out}} \tag{13}$$

where T_out represents the neuron number of the output layer.

In summary, the number of coefficients of RVTDCNN can be calculated as follows:

$$P_{\text{RVTDCNN}} = P_{\text{conv}} + P_{\text{fc}} + P_{\text{out}}. \tag{14}$$

For a generalized memory polynomial (GMP) model [47], the complex coefficient number of the model represents the number of basis function terms considered. The number of real coefficients of the GMP model can be expressed as

$$P_{\text{GMP}} = 2K_a L_a + 2K_b L_b M_b + 2K_c L_c M_c \tag{15}$$

where K_a and L_a are the indices for the aligned signal and envelope; K_b, L_b, and M_b are the indices for the signal and lagging envelope; and K_c, L_c, and M_c are the indices for the signal and leading envelope.

The ARVTDNN in [19], the RVTDNN in [15], and the DNN model in [14] are all FC networks. The coefficient number of FC networks can be obtained as follows:

$$P_{\text{MNNs}} = \sum_{i_1=2}^{I_1} \left(N_{i_1-1} + 1\right) \times N_{i_1} \tag{16}$$

where N_{i_1} is the number of neurons of the i_1th layer, and I_1 is the number of layers (I_1 ≥ 3, including the input and output layers).

For the long short-term memory (LSTM) model in [41], the number of model coefficients of the LSTM layer is

$$P_{\text{LSTM}} = 4I\left(N_{\text{in}} + I + 1\right) \tag{17}$$

where N_in is the input number of the LSTM layer at each moment, and I is the number of neurons in the LSTM layer.

Besides the total number of coefficients, the FLOPs are also introduced to assess the network complexity. For the convolutional process, considering the complexity of the activation function, the formula for calculating the number of FLOPs can be derived as follows:

$$\text{FLOPs}_{\text{conv}} = 2rsz \times BCL + 13BCL. \tag{18}$$

For the FC layer in the RVTDCNN model, the FLOPs can be calculated as follows:

$$\text{FLOPs}_{\text{fc}} = 2(B \times C \times L \times T) + 13T. \tag{19}$$

For the output layer in the RVTDCNN model, the FLOPs can be calculated as follows:

$$\text{FLOPs}_{\text{out}} = 2T T_{\text{out}}. \tag{20}$$

The FLOPs of the FC networks can be obtained as follows:

$$\text{FLOPs}_{\text{MNNs}} = \sum_{i_1=2}^{I_1} \left(2N_{i_1-1} N_{i_1} + K_{i_1} N_{i_1}\right) \tag{21}$$

where K_{i_1} is the number of FLOPs needed to calculate the activation function of the i_1th layer.
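Equations (11)–(14) and (18)–(20) are simple counting formulas, so a small sketch such as the following (hypothetical helper names, illustrative dimensions) can reproduce the corresponding entries of Table IV for any configuration.

```python
def rvtdcnn_coeffs(r, s, z, L, B, C, T, T_out=2):
    """Coefficient count per (11)-(14)."""
    p_conv = r * s * z * L + L        # predesigned filter layer, (11)
    p_fc = B * C * L * T + T          # FC layer, (12)
    p_out = T * T_out + T_out         # output layer, (13)
    return p_conv + p_fc + p_out      # total, (14)

def rvtdcnn_flops(r, s, z, L, B, C, T, T_out=2):
    """FLOPs per (18)-(20); 13 FLOPs are charged per activation evaluation."""
    return ((2 * r * s * z + 13) * B * C * L      # (18)
            + 2 * B * C * L * T + 13 * T          # (19)
            + 2 * T * T_out)                      # (20)

# Illustrative configuration: three 3x3x1 kernels on a 5 x (M+1) graph with
# M = 3, giving 3 x 2 feature maps, and T = 6 FC neurons.
print(rvtdcnn_coeffs(3, 3, 1, 3, 3, 2, 6), rvtdcnn_flops(3, 3, 1, 3, 3, 2, 6))
```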

TABLE IV: COMPLEXITY CALCULATIONS FOR DIFFERENT MODELS

For the LSTM model in [41], considering the complexity of the activation function, the FLOPs of the LSTM layer are

$$\text{FLOPs}_{\text{LSTM}} = I\left(8N_{\text{in}} + 8I + 71\right). \tag{22}$$

According to the above formulas, the calculation formulas for the complexity of the RVTDCNN model and the other models are listed in Table IV.

V. EXTENSION TO DPD

DPD is one of the most effective ways to alleviate the nonlinearity and memory effects of the PA [2]. The DPD model is the inverse function of the PA's nonlinear characteristics. This article uses the indirect learning structure [19] to realize the DPD design, due to its simple implementation, as shown in Fig. 6. First, the DPD model is designed as an RVTDCNN model and is trained using the PA output and input data. Then, the trained DPD model is used to update the DPD on the main path for the linearization of the PA. The input data of the DPD model are a 2-D matrix, as shown in (1), including the I/Q components and envelope-dependent terms of the current and past signals of the PA output. The input matrix of the DPD model is expressed as follows:

$$\begin{aligned}
Y_n^{\text{DPD}} = [\,&I_{\text{out}}(n), I_{\text{out}}(n-1), \ldots, I_{\text{out}}(n-M);\\
&Q_{\text{out}}(n), Q_{\text{out}}(n-1), \ldots, Q_{\text{out}}(n-M);\\
&|y(n)|, |y(n-1)|, \ldots, |y(n-M)|;\\
&|y(n)|^2, |y(n-1)|^2, \ldots, |y(n-M)|^2;\\
&|y(n)|^3, |y(n-1)|^3, \ldots, |y(n-M)|^3\,]
\end{aligned} \tag{23}$$

where I_out(n) and Q_out(n) represent the I/Q components of the complex envelope y(n) of the PA output, respectively.

The label data of the DPD model are the I/Q components of the PA input signal, as shown in (7). The label data vector of the DPD model is represented as

$$X_n^{\text{DPD}} = [I_{\text{in}}(n), Q_{\text{in}}(n)]^T \tag{24}$$

where I_in(n) and Q_in(n) represent the I/Q components of the complex envelope x(n) of the PA input signal, respectively.

When implementing DPD, the parameters and training methodology of the DPD based on the RVTDCNN model are the same as those for PA modeling.

Fig. 6. Diagram of the proposed DPD architecture.
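In code, the indirect-learning arrangement of (23) and (24) simply swaps the roles of the measured signals: the postinverse is fed the PA output and labeled with the PA input. The sketch below is a minimal illustration with hypothetical names; the gain normalization of the feedback signal is a common practice in indirect learning and is an assumption here, not a detail stated in the text.

```python
import numpy as np

def dpd_training_pairs(x, y, M, gain):
    """Hypothetical helper: build (input graph, label) pairs per (23)-(24)."""
    y = y / gain                       # assumed gain normalization of the PA output
    graphs, labels = [], []
    for n in range(M, len(y)):
        w = y[n - M:n + 1][::-1]       # [y(n), y(n-1), ..., y(n-M)]
        mag = np.abs(w)
        graphs.append(np.stack([w.real, w.imag, mag, mag ** 2, mag ** 3]))  # (23)
        labels.append([x[n].real, x[n].imag])                               # (24)
    return np.array(graphs), np.array(labels)
```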
Fig. 7. Experimental setup.

VI. EXPERIMENTAL SETUP

The experimental setup in Fig. 7 was used to evaluate the performance of the proposed model. The test signal is a 100-MHz OFDM signal with a PAPR of 10.4 dB, which is generated by MATLAB on a personal computer (PC).

The OFDM signal is composed of multiple OFDM symbols, generated from 16-QAM symbols, modulated onto 64 subcarriers, and then filtered by a raised-cosine filter with a roll-off factor of 0.1. The test signal was first downloaded into the arbitrary waveform generator (AWG) 81180A. Then, the AWG transmits the generated baseband signal through a cable to the performance signal generator (PSG) E8267D, which implements the digital-to-analog conversion (DAC) and frequency upconversion. The modulation frequency in the PSG is 2.14 GHz. Then, the RF signal generated by the PSG is fed into the PA. The PA output signal is fed into a coupler, whose output is connected to a high-power load. In the feedback loop, the output of the coupler is captured through the oscilloscope (MSO) 9404A. Then, the Keysight 89600 Vector Signal Analyzer (VSA) software, running on the MSO, analyzes the captured RF signal, including frequency downconversion and analog-to-digital conversion (ADC). The sampling rate is set to 625 MHz. Then, the output baseband signal is captured by the VSA and downloaded to the PC. The acquired input and output signals are processed with Python software on the PC to construct the behavior model of the PA. Python version 3.5.2 was installed in the Windows environment, using version 2017.3.4 of PyCharm as the integrated development environment.

TABLE V: NETWORK STRUCTURE OF DIFFERENT METHODS (M = 3)

VII. MEASUREMENT RESULTS

A. Modeling Performance

The RVTDCNN, as shown in Fig. 2, is used to illustrate the performance of this behavioral model. The proposed modeling method and other modeling methods are evaluated herein using the NMSE performance in (9) and the ACPR performance in DPD. The optimal network structure of the different methods for 100 MHz is shown in Table V.

Fig. 8 shows the convergence curve of the RVTDCNN model training process. As shown in Algorithm 1, the threshold value of the cost function is set to 1.2 × 10^−7. When the cost function value of the network is less than the threshold, the network converges. In the case of the results presented in Fig. 8, it takes 83 iterations for the model to converge. After convergence, the model coefficients are used with a different test data set independent of the training data set. The NMSE values of the model using the training set and the test set are quasi-equal, at approximately −36.4 dB. Therefore, one can conclude that RVTDCNN shows good generalization for PA modeling.

Fig. 8. Convergence curve of the training process.

Table VI compares the performance of the traditional training method and the proposed transfer learning method. In the traditional training method, both the weights and biases of the convolution layer and the FC layer are unknowns at the starting point of the optimization algorithm, and it required 200 000 iterations and 563 s of training time to train the convolution layer and the FC layer. In contrast, the proposed transfer learning method uses predesigned filters of the convolutional layer based on prior knowledge of the same amplifier, modeled with different signal excitations, and hence, only the FC layer's weights and biases are unknowns at the starting point of the optimization algorithm. The network is trained as a simple feedforward network. For each iteration, the FLOPs per sample are reduced to 36% of those of the traditional method. Therefore, the proposed training method only needs 83 iterations to make the model converge, which confirms the superiority of the proposed transfer learning method.

TABLE VI: COMPARISON OF THE PERFORMANCE BETWEEN THE TRADITIONAL TRAINING METHOD AND THE TRANSFER LEARNING METHOD

Fig. 9. Spectral comparison of modeling errors between RVTDCNN and other ANN models at a 100-MHz LTE signal.

Fig. 9 compares the spectra of the modeling errors between the RVTDCNN model and other typical models at a 100-MHz long-term evolution (LTE) signal. It can be seen from the figure that the error spectrum of the RVTDCNN model is lower than −40 dB, both out-of-band and in-band, which shows the effectiveness of the proposed method. Meanwhile, the error spectrum of the RVTDCNN model is lower than those of the GMP and RVTDNN models, both out-of-band and in-band. Compared with the DNN model, the ARVTDNN, and the LSTM model, the modeling performance of the proposed RVTDCNN model is also not reduced, which verifies the superiority of the proposed method in modeling performance.

Fig. 10 shows the output spectrum after the linearization of the PA, using the proposed DPD model at a 100-MHz LTE signal. The same dimension was used to derive the inverse model, and it was found that the RVTDCNN inverse model (DPD model) has a significant effect in reducing the PA distortion when cascaded with the nonlinear PA. Using the proposed DPD to linearize the PA, the ACPR performance is improved from −31 to −46 dBc.

Fig. 10. Linearization performance of the PA using RVTDCNN at a 100-MHz LTE signal.

B. Comparison of Modeling Performance With Various Methods

To prove the superiority of the RVTDCNN model, Table VII compares the NMSE performance and the ACPR performance of DPD for the proposed modeling method and other methods, where the experimental results are based on the optimal dimension of the model structure. The structure and model dimensions of the other models have been set to yield the best performance, as shown in Table V. The best NMSE value that the traditional GMP model can achieve is −33.19 dB, and the corresponding number of real model coefficients is 214. The results show that RVTDCNN can improve the NMSE performance by about 3 dB compared with the traditional GMP model, with about one-third fewer model coefficients when implemented in digital processors. Compared with the RVTDNN model, the RVTDCNN model can improve the NMSE by about 1.3 dB, with fewer model coefficients and FLOPs. Meanwhile, the RVTDCNN model can lead to almost the same NMSE performance as the ARVTDNN model, the DNN model, and the LSTM model, but with less than half of the required number of coefficients, as shown in Table VII.

TABLE VII: MODELING PERFORMANCE AND COMPLEXITY OF VARIOUS METHODS

Fig. 11 shows the complexity comparison between the RVTDCNN model and the other models when similar modeling performance is obtained. In this comparison, the ARVTDNN model, the DNN model, the LSTM model, and the proposed RVTDCNN model have similar modeling performance, with an NMSE of about −36.30 dB. The GMP and RVTDNN models are shown at the best modeling performance they can achieve, with NMSEs of −33.19 and −35.09 dB, respectively. It can be found that the proposed method yields a significant decrease in the number of model coefficients and the number of FLOPs due to the weight sharing feature, while the modeling performance does not decrease.

To further verify the performance of the RVTDCNN model under different transmitter hardware impairment conditions, we evaluated three cases of the transmitter, as shown in Table VIII.

Fig. 11. Comparison of the complexity of the proposed RVTDCNN model and other models under similar modeling performance.

TABLE VIII: DIFFERENT CASES OF SIGNAL DISTORTION AT TRANSMITTER

TABLE IX: NMSE AND ACPR PERFORMANCES OF THE RVTDCNN MODEL UNDER DIFFERENT CASES

Case 1 indicates that only the nonlinear distortion of the PA is considered in the transmission chain. Case 2 indicates the existence of both the PA nonlinear distortion and I/Q imbalance. In Case 3, the PA nonlinear distortion, I/Q imbalance, and dc offset are all included in the transmission chain. The specific distortion levels are shown in Table VIII.

Table IX shows the NMSE and ACPR of the RVTDCNN model under the different cases of the transmitter. It was found that RVTDCNN shows good modeling performance in Case 1, where the system contains only the nonlinear distortion of the PA. Meanwhile, RVTDCNN also shows superior NMSE and ACPR performance for Case 2 and Case 3. The reason is that RVTDCNN can also eliminate the imperfections of the transmitter, such as dc offset and I/Q imbalance, in addition to the PA's nonlinearity.

Fig. 12. Gain characteristics and phase characteristics of the PA output under different bandwidths. (a) Gain characteristics. (b) Phase characteristics.

Fig. 12 compares the gain and phase characteristics of the PA output between the 100-MHz input signal and the 200-MHz input signal. It can be found that the PA output with the 200-MHz input signal exhibits much stronger nonlinearity and memory effects than that with the 100-MHz input signal.

Fig. 13. Linearization performance of the PA using RVTDCNN at the 200-MHz LTE signal.

Table X shows the model complexity and modeling performance of RVTDCNN under different signal bandwidths. It can be deduced that, as the signal bandwidth increases, the memory depth required for modeling increases accordingly. The same network structure is used to model the PA with different signal bandwidths, resulting in good modeling performance. If the signal bandwidth is further increased, we can achieve good modeling performance by increasing the number of convolution kernels and neurons in the FC layer. It can be found that, for the traditional ANN models, such as the ARVTDNN, the strong memory effect leads to a rapid increase in the model complexity.

TABLE X: MODELING PERFORMANCE AND COMPLEXITY OF THE RVTDCNN MODEL UNDER DIFFERENT INPUT SIGNAL BANDWIDTHS

As the memory depth increased from 2 to 5, the number of model coefficients of the ARVTDNN increased to 563. For the proposed model, when the memory depth increases from 2 to 5, the number of model coefficients is 266, which can be considered reasonable and is half the number of model coefficients of the ANN model. The coefficient number of the LSTM model does not increase with the signal bandwidth, but it is still about twice the coefficient number of the RVTDCNN model.

Fig. 13 shows the output spectrum after the linearization of the PA using the RVTDCNN model at the 200-MHz LTE signal. The results show that, under a wide signal bandwidth, the RVTDCNN model still has a significant linearization effect on the PA.

VIII. CONCLUSION

In this article, the RVTDCNN is proposed for modeling the nonlinear and memory effects of wideband PAs. RVTDCNN extracts the effective features of the 2-D input graph data with a convolutional structure. A Doherty PA, with LTE signals from 40 to 200 MHz, is tested to verify the effectiveness of the RVTDCNN model. For the PA with a 100-MHz input signal under different cases, the NMSE can reach about −36 dB, with an ACPR around −46 dBc with DPD. The results show that the RVTDCNN still has a good modeling effect when I/Q imbalance and dc offset are present, which verifies that the proposed model has strong adaptability. Compared with the existing shallow NNs and DNNs, in terms of the number of model coefficients and FLOPs, the proposed RVTDCNN is verified to reduce the number of model coefficients by more than 50% under different signal bandwidths.

APPENDIX
EXTRACTION EFFECT OF THE PREDESIGNED FILTER ON THE BASIS FUNCTIONS

Because the predesigned filter can completely capture the basis function features required for modeling, the number of neurons required in the FC layer is significantly reduced, thereby reducing the complexity of the model. To verify that the predesigned filter can generate rich basis functions, we introduce a baseband PA model [2], which is expressed as follows:

$$y(n) = \sum_{k=0}^{K-1} a_k x(n)|x(n)|^k + \sum_{k=1}^{K-1}\sum_{q=1}^{Q-1} c_{kq}\, x(n)|x(n-q)|^k \tag{25}$$

where K and Q are the nonlinearity order and the lagging cross-term index, respectively, and a_k and c_kq are the coefficients of the model. For simplicity, the memory terms in (25) have been omitted.

To simplify the derivation process, the coefficients of the model, including the a_k's and c_kq's, have also been omitted in (25). Thus, (25) can be expanded as follows:

$$\begin{aligned}
y(n) ={}& (I_{\text{in}}(n) + jQ_{\text{in}}(n)) + (I_{\text{in}}(n) + jQ_{\text{in}}(n))|x(n)| + (I_{\text{in}}(n) + jQ_{\text{in}}(n))|x(n)|^2 + \cdots\\
&+ (I_{\text{in}}(n) + jQ_{\text{in}}(n))|x(n-1)| + (I_{\text{in}}(n) + jQ_{\text{in}}(n))|x(n-1)|^2 + \cdots\\
&+ (I_{\text{in}}(n) + jQ_{\text{in}}(n))|x(n-2)| + (I_{\text{in}}(n) + jQ_{\text{in}}(n))|x(n-2)|^2 + \cdots\\
={}& I_{\text{in}}(n) + I_{\text{in}}(n)|x(n)| + I_{\text{in}}(n)|x(n)|^2 + I_{\text{in}}(n)|x(n-1)| + I_{\text{in}}(n)|x(n-1)|^2\\
&+ I_{\text{in}}(n)|x(n-2)| + I_{\text{in}}(n)|x(n-2)|^2\\
&+ j\bigl(Q_{\text{in}}(n) + Q_{\text{in}}(n)|x(n)| + Q_{\text{in}}(n)|x(n)|^2 + Q_{\text{in}}(n)|x(n-1)| + Q_{\text{in}}(n)|x(n-1)|^2\\
&\qquad + Q_{\text{in}}(n)|x(n-2)| + Q_{\text{in}}(n)|x(n-2)|^2\bigr) + \text{other terms.}
\end{aligned} \tag{26}$$

If these terms in (26) can be fitted by the predesigned filter, the predesigned filter can generate rich basis functions, and the modeling performance of the proposed model can be achieved. In the predesigned filter, 3 × 3 convolution kernels are used to extract features from the input signal. In the convolution process, the stride is set to 1 until the input tensors are all convolved. The convolution process is shown in Fig. 4. The convolution output at different convolution steps represents the capture of different local features of the input. We take the local input convolved in the first step:

$$h_{11} = I_{\text{in}}(n) + Q_{\text{in}}(n) + |x(n)| + I_{\text{in}}(n-1) + Q_{\text{in}}(n-1) + |x(n-1)| + I_{\text{in}}(n-2) + Q_{\text{in}}(n-2) + |x(n-2)| + b \tag{27}$$

where b is the bias. For the convenience of calculation, the corresponding coefficients are ignored in the formula.

This article deduces the terms in (26) from the above local input; the other input terms with memory effects can be deduced through the same method. The convolution output described above is input into a nonlinear activation function to obtain the output of the predesigned filter. The output u_11 can be expressed as follows:

$$u_{11} = \tanh(h_{11}) = \tanh\bigl(I_{\text{in}}(n) + Q_{\text{in}}(n) + |x(n)| + I_{\text{in}}(n-1) + Q_{\text{in}}(n-1) + |x(n-1)| + I_{\text{in}}(n-2) + Q_{\text{in}}(n-2) + |x(n-2)| + b\bigr) \tag{28}$$

where tanh(·) is the activation function.
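The Taylor-series argument developed next can also be checked symbolically. The following is a small sketch (sympy assumed, and the receptive field of (27) reduced to a single delay column for brevity) showing that the cubic term of tanh(h_11) does generate cross-terms such as I_in(n)|x(n)|^2.

```python
import sympy as sp

# One delay column of the receptive field in (27), kernel weights set to 1.
I0, Q0, m0, b = sp.symbols('I0 Q0 m0 b')   # I_in(n), Q_in(n), |x(n)|, bias
h11 = I0 + Q0 + m0 + b

# Third-order Taylor truncation of tanh(h11), as used in (29)-(31).
approx = sp.expand(h11 - h11 ** 3 / 3)

# The basis term I_in(n)|x(n)|^2 appears with a nonzero coefficient.
poly = sp.Poly(approx, I0, Q0, m0, b)
print(poly.coeff_monomial(I0 * m0 ** 2))   # -1, so the cross-term is generated
```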

The hyperbolic tangent sigmoid function can be expanded [2] Y. Liu, W. Pan, S. Shao, and Y. Tang, “A general digital predistor-
by the Taylor series, and the approximate output is described tion architecture using constrained feedback bandwidth for wideband
power amplifiers,” IEEE Trans. Microw. Theory Techn., vol. 63, no. 5,
as follows: pp. 1544–1555, May 2015.
1 2  π [3] A. Katz, J. Wood, and D. Chokola, “The evolution of PA linearization:
tanh(x) = x − x 3 + x 5 + − · · · |x| < . (29) From classic feedforward and feedback through analog and digital
3 15 2 predistortion,” IEEE Microw. Mag., vol. 17, no. 2, pp. 32–40, Feb. 2016.
Bring (27) into x 3 , while considering the critical correlation [4] Z. Popovic, “Amping up the PA for 5G: Efficient GaN power ampli-
items related to the justification process, ignored the irrelevant, fiers with dynamic supplies,” IEEE Microw. Mag., vol. 18, no. 3,
pp. 137–149, May 2017.
redundant terms. The result will be shown in the following: [5] X. Hu, T. Liu, Z. Liu, W. Wang, and F. M. Ghannouchi, “A novel single
feedback architecture with time-interleaved sampling for multi-band
h 311 = (Iin (n) + Q in (n) + |x(n)| + Iin (n − 1) + Q in (n − 1) DPD,” IEEE Commun. Lett., vol. 23, no. 6, pp. 1033–1036, Jun. 2019.
+|x(n −1)|)+ Iin (n −2)+ Q in (n −2)+|x(n −2)|+b)3 [6] J. Reina-Tosina, M. Allegue-Martinez, C. Crespo-Cadenas, C. Yu, and
S. Cruces, “Behavioral modeling and predistortion of power amplifiers
= 6b Iin (n)|x(n)| + 3Iin (n)|x(n)|2 + 6b Iin (n)|x(n − 1)| under sparsity hypothesis,” IEEE Trans. Microw. Theory Techn., vol. 63,
+ 3Iin (n)|x(n − 1)|2 + 6b Iin (n)|x(n − 2)| no. 2, pp. 745–753, Feb. 2015.
[7] Q. Zhang, W. Chen, and Z. Feng, “Reduced cost digital predistortion
+ 3Iin (n)|x(n − 2)|2 +6b Q in (n)|x(n)|+3Q in (n)|x(n)|2 only with in-phase feedback signal,” IEEE Microw. Wireless Compon.
Lett., vol. 28, no. 3, pp. 257–259, Mar. 2018.
+ 6b Q in (n)|x(n − 1)| + 3Q in (n)|x(n − 1)|2
[8] J. Joung, C. K. Ho, K. Adachi, and S. Sun, “A survey on power-
+ 6b Q in (n)|x(n − 2)| + 3Q in (n)|x(n − 2)|2 amplifier-centric techniques for spectrum- and energy-efficient wire-
less communications,” IEEE Commun. Surveys Tuts., vol. 17, no. 1,
+ other terms. (30) pp. 315–333, 1st Quart., 2015.
[9] A. Cheaito, M. Crussière, J.-F. Hélard, and Y. Louët, “Quantifying
In summary, we combine (28)–(30) while omitting both the irrelevant, redundant terms and the multiplication factors. The output of the predesigned filter can then be expressed approximately as follows:

$$
\begin{aligned}
u_{11} = \tanh(h_{11}) = {}& h_{11} + h_{11}^{3} + \text{other terms} \\
= {}& \big(I_{\text{in}}(n) + Q_{\text{in}}(n) + |x(n)| + I_{\text{in}}(n-1) + Q_{\text{in}}(n-1) + |x(n-1)| \\
& + I_{\text{in}}(n-2) + Q_{\text{in}}(n-2) + |x(n-2)| + b\big) \\
& + I_{\text{in}}(n)|x(n)| + I_{\text{in}}(n)|x(n)|^{2} + I_{\text{in}}(n)|x(n-1)| + I_{\text{in}}(n)|x(n-1)|^{2} \\
& + I_{\text{in}}(n)|x(n-2)| + I_{\text{in}}(n)|x(n-2)|^{2} + Q_{\text{in}}(n)|x(n)| + Q_{\text{in}}(n)|x(n)|^{2} \\
& + Q_{\text{in}}(n)|x(n-1)| + Q_{\text{in}}(n)|x(n-1)|^{2} + Q_{\text{in}}(n)|x(n-2)| + Q_{\text{in}}(n)|x(n-2)|^{2} \\
& + \text{other terms}.
\end{aligned}
\tag{31}
$$
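It is also easy to confirm numerically that the truncation behind (31) holds at representative signal levels. The sketch below (illustrative; the signal scale, bias, and memory depth are assumptions chosen so that $|h_{11}|$ stays well inside the convergence interval of (29)) evaluates one predesigned-filter neuron on a random baseband signal:

```python
import numpy as np

# Illustrative check of the truncation behind (31): for a small-amplitude
# random baseband signal, tanh(h11) is dominated by the h11 and h11^3 terms.
# The signal scale, bias b, and memory depth 2 are assumptions, not the paper's.
rng = np.random.default_rng(0)
n = 2000
x = 0.03 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
I, Q, a = x.real, x.imag, np.abs(x)

def lag(v, m):
    """Delay v by m samples, zero-padding the start."""
    return np.concatenate([np.zeros(m), v[:len(v) - m]])

b = 0.05
h11 = b + sum(lag(v, m) for v in (I, Q, a) for m in (0, 1, 2))
print(np.max(np.abs(h11)))                                # well below pi/2
print(np.max(np.abs(np.tanh(h11) - (h11 - h11**3 / 3))))  # small residual: the omitted quintic term
```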
By rearranging (31), it can be rewritten as follows:

$$
\begin{aligned}
u_{11} = {}& I_{\text{in}}(n) + I_{\text{in}}(n)|x(n)| + I_{\text{in}}(n)|x(n)|^{2} + I_{\text{in}}(n)|x(n-1)| + I_{\text{in}}(n)|x(n-1)|^{2} \\
& + I_{\text{in}}(n)|x(n-2)| + I_{\text{in}}(n)|x(n-2)|^{2} \\
& + Q_{\text{in}}(n) + Q_{\text{in}}(n)|x(n)| + Q_{\text{in}}(n)|x(n)|^{2} + Q_{\text{in}}(n)|x(n-1)| + Q_{\text{in}}(n)|x(n-1)|^{2} \\
& + Q_{\text{in}}(n)|x(n-2)| + Q_{\text{in}}(n)|x(n-2)|^{2} + \text{other terms}.
\end{aligned}
\tag{32}
$$
Comparing (32) with (26), we find that the linear term, the nonlinear terms, and the lagging cross-terms of (26) all appear in (32). Therefore, the terms produced by the predesigned filter correspond to the terms of the polynomial model, and additional terms can be provided as well. In other words, the predesigned filter can generate a sufficiently rich basis set to achieve excellent performance.
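For concreteness, the following Python sketch (illustrative, not from the paper; the function name, memory depth M, and envelope order P are assumptions chosen to match (32)) assembles the basis matrix that one predesigned-filter branch can generate:

```python
import numpy as np

# Illustrative sketch: build the (32)-style basis functions
# I(n)|x(n-m)|^p and Q(n)|x(n-m)|^p for memory depth M and envelope
# order P, plus the linear I(n) and Q(n) terms. Names are assumptions.
def filter_basis(x, M=2, P=2):
    I, Q, a = x.real, x.imag, np.abs(x)
    n = len(x)
    cols = [I, Q]
    for m in range(M + 1):
        a_m = np.concatenate([np.zeros(m), a[:n - m]])  # lagged envelope |x(n-m)|
        for p in range(1, P + 1):
            cols.append(I * a_m**p)
            cols.append(Q * a_m**p)
    return np.column_stack(cols)

# A toy complex baseband signal; 2 linear + 12 cross-term columns result.
x = np.linspace(0.1, 1.0, 64) * np.exp(1j * np.linspace(0, 2 * np.pi, 64))
print(filter_basis(x).shape)  # (64, 14)
```

Each column corresponds to one of the polynomial basis terms discussed above.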
Xin Hu (Senior Member, IEEE) received the B.Sc. degree in electrical information engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2007, and the Ph.D. degree in electrical information engineering from the Institute of Electrics, Chinese Academy of Sciences, Beijing, China, in 2012.

From 2012 to 2016, he was a Senior Engineer with the Aerospace Science and Technology Corporation. In 2016, he joined the Electronics Technology Laboratory, Beijing University of Posts and Telecommunications, Beijing, where he is currently an Associate Professor. From 2019 to 2020, he was a Visiting Scholar with the iRadio Laboratory, University of Calgary, Calgary, AB, Canada, where he has also been an Adjunct Researcher since 2019. His current research interests include digital predistortion of nonlinear power amplifiers, the application of signal processing techniques to RF and microwave problems, and dynamic wireless resource management.

Dr. Hu received funding from the National Natural Science Foundation of China in 2018. He was involved in several projects funded by the National High Technology Research and Development Program of China and the National Natural Science Foundation of China.
Zhijun Liu received the B.Sc. degree in electronics science and technology from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2017. He is currently pursuing the Ph.D. degree in electronics science and technology with the Beijing University of Posts and Telecommunications, Beijing, China.

His research interests include digital predistortion of nonlinear power amplifiers and the application of deep learning techniques to RF and microwave problems.
Xiaofei Yu received the B.Sc. degree in electronics science and technology from the Beijing University of Posts and Telecommunications, Beijing, China, in 2018, where she is currently pursuing the M.Sc. degree in electronics science and technology.

Her research interests include digital predistortion of nonlinear power amplifiers and the application of deep learning techniques to RF and crowdsensing.
Yulong Zhao was born in Henan, China, in 1983. He received the B.S. degree in applied physics and the M.S. degree in electronic engineering from Xidian University, Xi'an, China, in 2006 and 2009, respectively. He is currently pursuing the Ph.D. degree with the University of Calgary, Calgary, AB, Canada.

From 2009 to 2015, he was a Microwave Engineer and Project Leader with the ZTE Corporation. He was with the Radio Remote Unit Development Department, where he designed high-power amplifiers for 3G and 4G wireless communications. He is currently with the Intelligent RF Radio Technology Laboratory, University of Calgary. His current research interests include high-efficiency and wideband RF PAs, MMIC PAs, microwave passive components, and digital predistortion.

Xiang Li (Member, IEEE) received the B.Sc. and M.Sc. degrees in electronic engineering from Tsinghua University, Beijing, China, in 2009 and 2012, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Calgary, Calgary, AB, Canada, in 2018.

He is currently working as a Post-Doctoral Fellow with the University of Calgary. His research interests include multiband/wideband power amplifier design, outphasing transmitter design, and MMIC power amplifiers for wireless and satellite communication.

Dr. Li was the recipient of the Student Paper Award of the 2010 Asia-Pacific Microwave Conference (APMC).
Wenhua Chen (Senior Member, IEEE) received the B.S. degree in microwave engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2001, and the Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, in 2006.

From 2010 to 2011, he was a Post-Doctoral Fellow with the Intelligent RF Radio Laboratory (iRadio Lab), University of Calgary, Calgary, AB, Canada. He is currently an Associate Professor with the Department of Electronic Engineering, Tsinghua University. He has authored or coauthored more than 200 journal and conference papers. His current research interests include energy-efficient power amplifier (PA) design and linearization, and millimeter-wave and terahertz integrated circuits and systems.

Dr. Chen was a recipient of the 2015 Outstanding Youth Science Foundation of NSFC, the 2014 URSI Young Scientist Award, and the Student Paper Awards of several international conferences. He is an Associate Editor of the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES.

Mohamed Helaoui (Senior Member, IEEE) received the M.Sc. degree in communications and information technology from École Supérieure des Communications de Tunis, Tunisia, in 2003, and the Ph.D. degree in electrical engineering from the University of Calgary, Calgary, AB, Canada, in 2008.

He is currently an Associate Professor with the Department of Electrical and Computer Engineering, University of Calgary. His research activities have led to over 60 publications. He holds seven patents (pending). His current research interests include digital signal processing, power efficiency enhancement for wireless transmitters, switching-mode power amplifiers, and advanced transceiver design for software-defined radio and millimeter-wave applications.

Dr. Helaoui is a member of the COMMTTAP Chapter in the IEEE Southern Alberta Section.
Biao Hu received the B.S. degree in applied physics and the M.S. and Ph.D. degrees in physical electronics from the University of Electronic Science and Technology of China, Chengdu, China, in 2005, 2008, and 2014, respectively.

Since 2016, he has been an Associate Professor with the School of Electronic Science and Engineering, University of Electronic Science and Technology of China. Since 2019, he has been a Visiting Scholar with the iRadio Laboratory, University of Calgary, Calgary, AB, Canada. His current research interests include microwave wireless power transmission, microwave/millimeter-wave integrated circuits and antennas, and high-power microwave techniques and their applications.

Weidong Wang (Member, IEEE) received the Ph.D. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2002.

He is currently a Professor with the School of Electronic Engineering, Beijing University of Posts and Telecommunications. He serves as an expert for the National Natural Science Foundation and is a member of the China Association of Communication. His research interests include communication systems, radio resource management, the Internet of Things, and signal processing.
Xuekun Du (Member, IEEE) received the M.Sc. degree in communication and information systems from Liaoning Technical University, Fuxin, China, in 2014, and the Ph.D. degree in communication and information systems from the University of Electronic Science and Technology of China, Chengdu, China, in 2019.

He is currently a Post-Doctoral Fellow with the Intelligent RF Radio Laboratory, Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada. From 2016 to 2018, he was a Visiting Ph.D. Student with the Intelligent RF Radio Laboratory, Department of Electrical and Computer Engineering, University of Calgary. His current research interests include passive circuit design, high-efficiency wideband power amplifier design, MMIC PA design, active device modeling, artificial neural network modeling, and measurement and characterization techniques.

Fadhel M. Ghannouchi (Fellow, IEEE) is currently a Professor, the Alberta Innovates/Canada Research Chair, and the Director of the iRadio Laboratory, Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada, and a part-time Thousand Talent Professor with the Department of Electronics Engineering, Tsinghua University, Beijing, China.

He has authored or coauthored more than 800 refereed articles and six books. He holds 25 patents (3 pending). His research interests are in the areas of RF and wireless communications, nonlinear modeling of microwave devices and communication systems, the design of power- and spectrum-efficient microwave amplification systems, and the design of SDR systems for wireless, optical, and satellite communications applications.

Dr. Ghannouchi is a Fellow of the Academy of Science of the Royal Society of Canada, the Canadian Academy of Engineering, the Engineering Institute of Canada, the Institution of Engineering and Technology (IET), and the Institute of Electrical and Electronics Engineers (IEEE). He is the Co-Founder of three university spin-off companies.