IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 72, NO. 12, DECEMBER 2023 16031

Deep Learning Based Pilot-Free Transmission: Error Correction Coding for Low-Resolution Reception Under Time-Varying Channels

Rui Zeng, Zhilin Lu, Xudong Zhang, and Jintao Wang, Senior Member, IEEE

Manuscript received 22 February 2023; revised 29 May 2023; accepted 8 July 2023. Date of publication 12 July 2023; date of current version 19 December 2023. This work was supported by Tsinghua University-China Mobile Research Institute Joint Innovation Center. The review of this article was coordinated by Dr. Maged Elkashlan. (Corresponding author: Jintao Wang.)

The authors are with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China, and also with the Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: zeng_r17@[Link]; luzl18@[Link]; zhang_xd18@[Link]; wangjintao@[Link]).

The key results can be reproduced with the following GitHub link: https://[Link]/DTMB-DL/TransECCNet

Digital Object Identifier 10.1109/TVT.2023.3294672

0018-9545 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See [Link] for more information.

Authorized licensed use limited to: Tsinghua University. Downloaded on February 26,2024 at [Link] UTC from IEEE Xplore. Restrictions apply.

Abstract—Recently, deep learning aided methods have been developed for error correction coding under quantization constraints. However, previous studies focus only on additive white Gaussian noise (AWGN) channels, which are insufficient to represent practical communication environments. In this article, we propose a novel autoencoder aided error correction coding scheme for low-resolution reception under time-varying channels. Based on the symbol extension of the proposed autoencoder and faster-than-Nyquist (FTN) technology, pilot-free transmission can be realized without additional bandwidth. The transformer block is introduced to lighten and improve the decoder. Additionally, two kinds of preamplification techniques are applied for further performance boosting. Simulations show that, without quantization, the proposed method achieves better performance than traditional methods at high signal-to-noise ratio (SNR) under different time-varying channels. Moreover, it outperforms the previous state-of-the-art ECCNet and achieves remarkable transmission performance even in time-varying low-resolution reception scenarios.

Index Terms—Error correction coding, faster-than-Nyquist, low-resolution reception, pilot-free transmission, preamplification.

I. INTRODUCTION

Large-scale antenna arrays improve spectral and energy efficiency but bring intolerable power consumption and hardware implementation overhead [1]. Low-resolution reception is an effective way to reduce the power consumption of analog-to-digital converters (ADCs), which increases exponentially with the number of quantization bits [2]. However, the nonlinear distortion caused by quantization seriously degrades transmission performance, even with popular coding schemes such as turbo codes, low-density parity-check (LDPC) codes, and polar codes.

Deep learning has made extraordinary achievements in computer vision (CV) and natural language processing (NLP), and it has been employed to design more powerful communication systems since it was first introduced in [3]. Impressive improvements have been achieved in channel estimation [4], [5], channel state information (CSI) feedback [6], [7], channel decoding [8], [9], and end-to-end communication systems [10], [11], [12]. Additionally, [13], [14], [15] focus on the combination of deep learning and non-orthogonal multiple access (NOMA), considering massive machine-type communication (mMTC) scenarios that require low cost and low power consumption. In low-bit scenarios, especially one-bit quantization, which is an important means of reducing power consumption, learning-based methods have been well studied for MIMO detection [16], [17], [18] and pilot design [19]. However, error correction coding for low-resolution reception remains challenging.

Recently, a deep learning based error correction coding scheme was proposed for one-bit quantization reception [20], in which an autoencoder is integrated with a turbo code to compensate for the distortion caused by quantization. ECCNet [21] replaces the heavy fully-connected (FC) layers with convolutional layers for a lighter design and higher performance. However, previous work has considered only AWGN channels, which is not practical enough for real time-varying environments. Additionally, practical deployment is still difficult because of the large number of network parameters and floating point operations (FLOPs).

As a network structure that can extract both temporal and spatial features, the transformer block has demonstrated remarkable performance in CV and NLP [22], [23]. Compared with the convolutional layer, which focuses only on local features, it can capture global and temporal features. With self-attention, the transformer captures long-range dependencies and achieves encouraging performance improvements. Inspired by this, we design a lighter and more powerful network named TransECCNet. In TransECCNet, the autoencoder extends the symbols after constellation mapping to cope with the imperfect environment of time-varying channels and quantization, and FTN transmission is used to offset the transmission rate decrease caused by the symbol extension while keeping the bandwidth unchanged. Based on the symbol extension and FTN, pilot-free transmission can be realized, which improves transmission efficiency since additional pilot overhead is no longer required.

Moreover, we propose two kinds of preamplification techniques to make TransECCNet robust to multi-path fading
channels. One is bilinear production, which can effectively help the end-to-end system capture and counter the time-varying characteristics of the channel [24]. The other is based on the squeeze and excitation (SE) block, which can extract the useful part from noisy signals and model the temporal features.

To demonstrate the superiority of the proposed method, its performance under a flat fading channel and two multi-path fading channels is evaluated. Specifically, a multi-tap channel with strong echoes is considered to show the robustness of the proposed scheme. Compared with the orthogonal frequency division multiplexing (OFDM) system, TransECCNet achieves better bit error rate (BER) performance without quantization. Moreover, when low-resolution reception is considered, effective transmission can be achieved with an acceptable signal-to-noise ratio (SNR) loss.

Our contributions can be summarized as follows:
• A novel error correction coding scheme based on TransECCNet is developed for time-varying channels. Experiments show its superiority over traditional OFDM systems at high SNR without quantization. Moreover, pilot-free transmission is realized by means of symbol extension and FTN.
• The transformer block is introduced to lighten and improve the decoder. Compared with ECCNet, TransECCNet reduces parameters and FLOPs while improving BER performance. Therefore, the proposed scheme can achieve effective transmission even with low-resolution reception.
• Two kinds of preamplification techniques are applied for further performance boosting, which make the system more robust.

The rest of this article is organized as follows. Section II introduces the system model. Section III explains the whole pipeline of the proposed error correction coding scheme and the detailed design of TransECCNet. Two kinds of preamplification techniques are then described in Section IV. Numerical results and analysis are presented in Section V, and conclusions are drawn in Section VI.

II. SYSTEM MODEL

A. System Architecture

To compensate for the distortion caused by low-resolution quantization, we adopt the system shown in Fig. 1. The turbo code and the autoencoder are cascaded as the outer code and the inner code, respectively. Prior information provided by the outer code helps the autoencoder learn the best representation and thus resist the influence of channel time variation and quantization. Note that the turbo code can be replaced with other popular coding schemes such as LDPC codes and polar codes.

Fig. 1. The adopted system architecture for low-resolution reception. The turbo code is integrated with the autoencoder. Mod denotes the modulation module, while Demod denotes the demodulation module.

Specifically, the original bits $\mathbf{s}$ are fed into the turbo encoder to obtain a codeword of length L bits, which is then modulated by M-ary modulation to form the complex symbols $\mathbf{d}$. Since block fading is considered, the symbols are divided into blocks of size N, which can be written as

$$\mathbf{d} = \left[\mathbf{d}_1^T, \mathbf{d}_2^T, \ldots, \mathbf{d}_S^T\right]^T, \tag{1}$$

where $\mathbf{d}_n \in \mathbb{C}^{N\times 1}$, $S = \frac{L}{N\log_2 M}$, and $N \ll \frac{L}{\log_2 M}$. Each symbol block $\mathbf{d}_n$ is treated as the input of the autoencoder and extended G-fold as

$$\mathbf{x}_n = f_E(\Theta_E, \mathbf{d}_n) = \left[x_{n1}, x_{n2}, \ldots, x_{nGN}\right]^T, \tag{2}$$

where $\mathbf{x}_n \in \mathbb{C}^{GN\times 1}$ and $\Theta_E$ denotes the trainable parameters of the encoder $f_E$. The autoencoder adds redundancy by learning an optimal representation of the modulated symbols that resists the non-ideal conditions caused by quantization and channel time variation, so additional explicit pilots are no longer required. As explained in [20], G is less than $2(1+\alpha)$, where $\alpha$ is the roll-off coefficient of the root raised cosine (RRC) filter.

These extended symbols are then transmitted with the FTN technique of [25] as

$$x(t) = \sqrt{\rho}\sum_{k=1}^{S}\sum_{i=1}^{GN} x_{ki}\, g\!\left(t - (k-1)NT_t - i\frac{T_t}{G}\right), \tag{3}$$

where $\rho$, $T_t$, and $g(t)$ denote the transmit power, the symbol period for Nyquist transmission, and the pulse shaping function, respectively. With FTN transmission, the proposed scheme keeps the same bandwidth as the traditional turbo code without the autoencoder. The non-orthogonal transmission introduces inter-symbol interference (ISI) and colored noise; this is not an issue, since the corresponding effects can be learned and implicitly equalized by the autoencoder.

After passing through the time-varying channel, the transmitted signal is matched filtered, which can be represented as

$$y(t) = \left(h(t) * x(t) + z(t)\right) * g(-t) = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} x(u)\,h(\tau-u)\,du + z(\tau)\right) g(\tau-t)\,d\tau, \tag{4}$$

where $*$, $h(t)$, and $z(t)$ denote linear convolution, the channel impulse response, and zero-mean Gaussian noise with variance $\sigma^2$, respectively. Oversampling (4) at $t = (n-1)NT_t + m\frac{T_t}{G}$ yields the received symbols

$$y_{nm} = \sqrt{\rho}\sum_{k=1}^{S}\sum_{i=1}^{GN} x_{ki}\, q_{nk}[m-i] + z_{nm}, \tag{5}$$

where

$$q_{nk}[m-i] = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} g\!\left(u-(k-1)NT_t - i\frac{T_t}{G}\right) h(\tau-u)\,du\right) g\!\left(\tau-(n-1)NT_t - m\frac{T_t}{G}\right) d\tau \tag{6}$$

and

$$z_{nm} = \int_{-\infty}^{+\infty} z(\tau)\, g\!\left(\tau-(n-1)NT_t - m\frac{T_t}{G}\right) d\tau. \tag{7}$$

Here $q_{nk}[m-i]$ can be treated as the equivalent channel impulse response, and $z_{nm}$ in (7) is colored Gaussian noise, as in the Ungerboeck model [26]. Note that $q_{nk}[m-i] \neq 0$ for $m \neq i$, which generates the ISI.

For low-resolution reception, $y_{nm}$ is quantized as

$$r_{nm} = Q(y_{nm}), \tag{8}$$

where the real and imaginary parts are quantized separately. Every GN quantized samples are gathered to form a block

$$\mathbf{r}_n = \left[r_{n1}, r_{n2}, \ldots, r_{nGN}\right]^T. \tag{9}$$

The decoder compresses the dimension of $\mathbf{r}_n$ G-fold to reconstruct

$$\hat{\mathbf{d}}_n = f_D(\Theta_D, \mathbf{r}_n) = \left[\hat{d}_{n1}, \hat{d}_{n2}, \ldots, \hat{d}_{nN}\right]^T, \tag{10}$$

where $\Theta_D$ denotes the trainable parameters of the decoder $f_D$. All S blocks make up the reconstructed codeword

$$\hat{\mathbf{d}} = \left[\hat{\mathbf{d}}_1^T, \hat{\mathbf{d}}_2^T, \ldots, \hat{\mathbf{d}}_S^T\right]^T, \tag{11}$$

and finally $\hat{\mathbf{d}}$ is demodulated and decoded to obtain the recovered information bits $\hat{\mathbf{s}}$.

B. Channel Models

Generally, sampling the continuous channel yields the discrete channel model, which is modeled as a time-varying linear system with additive noise [27]. Block fading is considered, which means that the channel is constant within a transmission frame but changes from frame to frame.

Fig. 2. The schematic diagram of a multi-path fading channel.

As shown in Fig. 2, the transmitter and receiver can be regarded as the two foci of an ellipse, and all paths reflected on the same ellipse have the same relative delay. At a specific time delay, all signals are combined to form one tap of the channel impulse response. Therefore, the multi-path fading channel is usually modeled as multi-tap filtering, where each tap is characterized by the power delay profile (PDP) $\mathbf{p} = \{p_i\}$. The output of the channel can be expressed as $\mathbf{y} = \mathbf{h} * \mathbf{x} + \mathbf{n}$, where $*$ denotes linear convolution and $\mathbf{n}$ denotes the noise. Considering that the channel impulse response is a vector $\mathbf{h} = [h_1, h_2, \ldots, h_l]$ sampled from distribution $\mathcal{H}$, we have $\mathbb{E}[|h_i|^2] = p_i$. The complex Gaussian distribution is widely used as a statistical model for $\mathcal{H}$ if the environment is assumed to be rich in scatterers.

OFDM can resist multi-path fading, since its subcarriers are orthogonal and each experiences flat fading. For OFDM systems under unknown time-varying channels, pilot data and a cyclic prefix (CP) are necessary, which bring additional overhead. We consider two channel estimation methods: least squares (LS) and linear minimum mean square error (LMMSE). The latter performs better because it exploits the second-order statistics $\mathbf{R}_{HH} = \mathbb{E}[\mathbf{H}\mathbf{H}^H]$. The relationship between the LS and LMMSE estimates can be written as

$$\hat{\mathbf{H}}_{\mathrm{LMMSE}} = \mathbf{R}_{HH}\left(\mathbf{R}_{HH} + \frac{\beta}{\mathrm{SNR}}\mathbf{I}\right)^{-1}\hat{\mathbf{H}}_{\mathrm{LS}}, \tag{12}$$

where $\beta$ is a constant related to the modulation type. Specifically, $\beta = 1$ for QPSK and $\beta = 17/9$ for 16-QAM.

C. Pilot-Free Transmission and FTN

Generally, traditional transmission methods rely on pilots for channel estimation. The receiver estimates the channel matrix from the known pilot data and interpolation methods, and then performs channel equalization based on matrix inversion, which has high computational complexity. However, the additional pilot overhead reduces the effective transmission rate and spectral efficiency.

In our proposed scheme, the transmitted symbols are extended G-fold, which means the extended symbols can carry more redundancy against channel effects and exploit prior information learned from end-to-end optimization. Fig. 3 gives a toy example of the transmitter constellation under a Rayleigh fading channel with QPSK modulation, where the channel effects are learned by the encoder. Because the original symbols are encoded into embedding vectors that are robust to time-varying channels, the CSI can be implicitly inferred from the received symbols owing to the asymmetric constellation, and thus explicit pilots are no longer necessary, similar to [24]. The symbol extension performed by the autoencoder decreases the transmission rate, but this issue can be addressed by FTN.

FTN introduces a new factor $\rho \in (0, 1]$, which accelerates the transmission rate to $1/\rho$ times the original Nyquist rate. As shown in Fig. 4 with $\rho = 1/2$, the green extended symbol can be loaded between the two original blue symbols, so the
total transmission time does not increase. Note that ISI is introduced at t/T = 0 and t/T = 1 because of the new green symbol. However, through end-to-end optimization with sufficient training data, this ISI can be learned and implicitly equalized by the transformer block in the decoder.

Fig. 3. A toy example of the transmitter constellation generated by the encoder under a Rayleigh fading channel with QPSK modulation, where the symbol extension coefficient G = 2.

Fig. 4. Sketch of the FTN transmission with $\rho = 1/2$.

III. TRANSECCNET DESIGN FOR ERROR CORRECTION CODING

A. Network Pipeline Design

The whole pipeline of the proposed TransECCNet-based scheme is illustrated in Fig. 5. The turbo code is treated as the outer code, and the autoencoder is treated as the inner code. The original bits are fed into the turbo encoder, modulated by M-ary modulation, and divided into blocks of size N. The real and imaginary parts are stacked in parallel as two feature channels and then become the input of the encoder. The channel module consists of pulse shaping, real channel transmission, matched filtering, and oversampling. Note that the FTN technique is used to save bandwidth, since the transmitted symbols are extended G-fold. To cope with the non-differentiability caused by quantization, soft quantization is introduced. After preamplification, the received symbols are reconstructed by the decoder and gathered for further demodulation and decoding to obtain the recovered bits.

Algorithm 1: Training Strategy of the Whole TransECCNet.
Require: learning rate $\eta$, network parameters $\Theta = [\Theta_E, \Theta_D]$
1: for each epoch in total epochs do
2: Update temperature factor $T$ for the current epoch
3: for each batch in the current epoch do
4: Sample a minibatch $\mathbf{d}_n$ with M-ary modulation
5: Sample a channel vector $\mathbf{h}$ from $\mathcal{H}$
6: Encoder: $\mathbf{x}_n \leftarrow f_E(\Theta_E, \mathbf{d}_n)$
7: Channel: $\mathbf{y}_n \leftarrow f_h(\mathbf{h}, \mathbf{x}_n)$
8: Quantization: $\mathbf{r}_n \leftarrow Q(\mathbf{y}_n)$
9: Preamplification: $\mathbf{R}_n \leftarrow P(\mathbf{r}_n)$
10: Decoder: $\hat{\mathbf{d}}_n \leftarrow f_D(\Theta_D, \mathbf{R}_n)$
11: Calculate MSE loss $\mathcal{L}(\mathbf{d}_n, \hat{\mathbf{d}}_n)$
12: Back propagation: $g \leftarrow \nabla_\Theta \mathcal{L}$
13: Update network $\Theta \leftarrow \Theta - \eta g$
14: end for
15: end for

Consistent with ECCNet [21], the encoder is composed of a residual block and a time distributed fully-connected (TDFC) layer. The TDFC layer applies a linear layer to every temporal slice rather than to all elements, which combines the correlation extracted by the convolutional layer while reducing the number of parameters. Specifically, N temporal slices of size 32 are mapped into vectors of size 2G by a parameter-shared linear layer. The encoder learns the best representation with redundancy and extends the modulated symbols G-fold to adapt to the harsh environment of time-varying channels and the nonlinear distortion caused by quantization.

We redesign the decoder by replacing the ResSEBlock of ECCNet with the transformer encoder block of ViT [23]. The new decoder consists of L transformer blocks and one TDFC layer. The transformer block has long-term memory and can capture the temporal characteristics of the channel. Through multi-head self-attention (MSA) and a feedforward module, it reconstructs the symbols from the redundant, damaged data. Finally, the processed data are reorganized by the TDFC layer to obtain the recovered symbols. In this article, we fix L = 4.

Inferring the current channel state is still necessary even though the transmission is pilot-free. Since no explicit pilots are transmitted, the CSI can only be obtained from the quantized symbols $\mathbf{r}_n$ in (9). To make full use of $\mathbf{r}_n$ for implicit CSI estimation, a preamplification technique is applied to enhance and extract global channel features. The preamplification module is designed as a residual structure to reduce information loss for subsequent decoding while extracting the channel state. Two kinds of preamplification techniques are offered in this article; they are detailed in Section IV.

Quantization makes the back propagation of the gradient infeasible, since the ideal quantization function is
non-differentiable. Methods such as low-pass filtering [28] have been adopted to solve this problem at the cost of gradient mismatch and performance deterioration. To overcome these shortcomings, we adopt the soft quantization function of ECCNet [21], which can be expressed as

$$Q(x) = \sum_{i=1}^{n}\left(\alpha\,\sigma(\beta x - b_i) - o\right), \tag{13}$$

where n is the quantization order, $\alpha$ and $\beta$ are scale factors, $b_i$ is the i-th offset, and o is the global offset. Here $\sigma(x)$ is

$$\sigma(x) = \frac{1}{1 + \exp(-Tx)}, \tag{14}$$

where T denotes the temperature factor. By gradually increasing the temperature factor, the gap between the soft quantization function and the ideal quantization function can be narrowed. Based on the soft quantization function, the proposed TransECCNet can be trained end-to-end. The training strategy is summarized in Algorithm 1.

Fig. 5. The whole pipeline of the error correction coding scheme. The autoencoder is integrated with the turbo code, and the input is the modulated symbols after blocking. (c × L) denotes the input feature shape of the corresponding block. The batch normalization layers, the ReLU activation functions, and the reshape blocks are omitted for simplicity.

B. Transformer Based Decoder Design

The details of the transformer block are given in Fig. 6. Each transformer block consists of an MSA module and a feedforward module, each preceded by a layer normalization block. In addition, residual connections are adopted to preserve spatial information.

Fig. 6. The detailed design of the transformer block. (c × L) denotes the input feature shape of the corresponding block.

The core of the transformer block is the MSA module. Self-attention is illustrated in Fig. 7. The input X of dimension $D_x \times N$ is mapped into the matrices Q, K, and V through FC layers, each of size $D_k \times N$. Q and $K^T$ are multiplied to compute the correlation, which is then used to weight V. The softmax function and the scale factor $\sqrt{D_k}$ are used for normalization. The process can be expressed as

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q \cdot K^T}{\sqrt{D_k}}\right)V. \tag{15}$$

For multi-head self-attention, X is mapped into multiple groups of Q, K, and V to obtain similarities and weights with different focuses.

Fig. 7. The diagram of self-attention.

The feedforward module is an inverted bottleneck structure: the feature dimension is first increased and then reduced to avoid information loss. The activation function is GeLU, and a dropout layer is adopted.

The decoder is composed of L identical transformer blocks and one TDFC layer. More specifically, the quantized symbols after preamplification, $R_0$, pass alternately through the MSA module and the feedforward module in L iterations and are finally sent into the TDFC layer to yield the recovered symbols. The specific steps are shown in Algorithm 2.
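To make (13) and (14) concrete, the following NumPy sketch evaluates the soft quantizer for increasing temperature T. The parameter values (three thresholds $b_i \in \{-1, 0, 1\}$, $\alpha = 2$, $\beta = 1$, $o = 1$, which give a uniform 2-bit staircase with levels ±1 and ±3) are our own illustrative assumptions, not the values used in ECCNet [21] or TransECCNet.

```python
import numpy as np

def sigmoid_T(x, T):
    """Temperature-scaled logistic function of (14): 1 / (1 + exp(-T x))."""
    # Clipping the exponent avoids overflow warnings for large T.
    z = np.clip(T * np.asarray(x, dtype=float), -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(-z))

def soft_quantize(x, T, alpha, beta, offsets, o):
    """Soft quantizer in the form of (13): sum_i (alpha * sigma(beta*x - b_i) - o)."""
    x = np.asarray(x, dtype=float)
    return sum(alpha * sigmoid_T(beta * x - b, T) - o for b in offsets)

# Assumed 2-bit setting: thresholds {-1, 0, 1}, alpha = 2, beta = 1, o = 1.
# As T grows, Q(x) approaches the staircase with output levels {-3, -1, 1, 3}.
x = np.linspace(-2.0, 2.0, 9)
for T in (1.0, 10.0, 100.0):  # e.g. T raised step by step during training
    q = soft_quantize(x, T, 2.0, 1.0, [-1.0, 0.0, 1.0], 1.0)
    print(f"T={T:>5}: {np.round(q, 2)}")
```

With a single threshold $b_1 = 0$, the same function reduces to $2\sigma(x) - 1 = \tanh(Tx/2)$, a smooth surrogate for the one-bit quantizer $\mathrm{sign}(x)$; because its derivative is nonzero everywhere, gradients can flow through it during training.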


Algorithm 2: Detailed Algorithm of the Transformer Aided Decoder.
Input: quantized symbols after preamplification $R_0$
Output: the recovered symbols $y$
1: for $l = 1, 2, \ldots, L$ do
2: $R_l \leftarrow \mathrm{MSA}(\mathrm{LN}(R_{l-1})) + R_{l-1}$
3: $R_l \leftarrow \mathrm{MLP}(\mathrm{LN}(R_l)) + R_l$
4: end for
5: $y \leftarrow \mathrm{TDFC}(R_L)$
6: return $y$
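Algorithm 2 can be sketched in NumPy as a forward pass with random, untrained weights. This is a minimal illustration under stated assumptions, not the authors' implementation: the input $R_0$ is assumed to be already arranged as N = 16 tokens of dimension D = 32, two attention heads are assumed, learned LayerNorm gains, biases, and dropout are omitted, and the TDFC layer is modeled as one linear map shared across tokens that outputs the real and imaginary parts of each symbol. Tokens are stored as rows here, so the N × N attention map of (15) is computed as $QK^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # LN over the feature axis of each token (learned gain/bias omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def msa(x, p, heads):
    """Multi-head self-attention; each head applies (15) to its slice of Q, K, V."""
    d = x.shape[1]
    dk = d // heads
    q, k, v = x @ p["w_q"], x @ p["w_k"], x @ p["w_v"]
    outs = []
    for h in range(heads):
        s = slice(h * dk, (h + 1) * dk)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(dk))  # (N, N) attention weights
        outs.append(attn @ v[:, s])
    return np.concatenate(outs, axis=-1) @ p["w_o"]

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def mlp(x, p):
    # Inverted bottleneck feedforward: D -> 4D -> D (dropout omitted).
    return gelu(x @ p["w1"]) @ p["w2"]

def make_block(d, expand=4):
    # Hypothetical random initialization; a trained model would load learned weights.
    s = 1.0 / np.sqrt(d)
    shapes = {"w_q": (d, d), "w_k": (d, d), "w_v": (d, d), "w_o": (d, d), "w1": (d, expand * d)}
    p = {name: rng.normal(0.0, s, shape) for name, shape in shapes.items()}
    p["w2"] = rng.normal(0.0, 1.0 / np.sqrt(expand * d), (expand * d, d))
    return p

def decoder(r0, blocks, w_tdfc, heads=2):
    r = r0
    for p in blocks:                              # step 1: l = 1, ..., L
        r = msa(layer_norm(r), p, heads) + r      # step 2: pre-LN MSA + residual
        r = mlp(layer_norm(r), p) + r             # step 3: pre-LN MLP + residual
    return r @ w_tdfc                             # step 5: TDFC, one linear map shared by all tokens

N, D, L = 16, 32, 4
r0 = rng.normal(size=(N, D))                      # stand-in for the preamplified R_0
blocks = [make_block(D) for _ in range(L)]
w_tdfc = rng.normal(0.0, 1.0 / np.sqrt(D), (D, 2))  # 2 outputs: real and imaginary parts
y = decoder(r0, blocks, w_tdfc)
print(y.shape)  # (16, 2)
```

The residual additions in steps 2 and 3 keep the shape fixed at N × D through all L = 4 blocks, so only the final TDFC layer changes the feature dimension.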

IV. PREAMPLIFICATION TECHNIQUE AGAINST TIME-VARYING CHANNEL

To be more robust in time-varying channels, we use a preamplification technique to augment the quantized symbols. Like data augmentation, preamplification generates more representations from the original data to reasonably expand the training feature domain, thereby greatly improving the robustness and versatility of the model. Two kinds of preamplification techniques are applied in this article, as described below.

A. Bilinear Preamplifier Design

Bilinear production is widely used for feature combination in CV [29], and it has proved effective in fully leveraging the channel information in the received symbols [24]. To obtain robust performance in multi-path fading channels, we embed it into the proposed TransECCNet before the decoder. The structure of the bilinear preamplifier is shown in Fig. 8.

Fig. 8. Sketch of the bilinear preamplifier design.

In the bilinear preamplifier, the quantized symbols $\mathbf{r} \in \mathbb{R}^{2\times GN}$ flow through two branches. In one branch, $\mathbf{r}$ is sent to a feature extractor composed of convolutional layers to obtain the channel feature $\mathbf{z} \in \mathbb{R}^{N_k\times 1}$, which is then sent to the bilinear combiner. In the other branch, $\mathbf{r}$ is sent directly to the combiner. With bilinear production, $\mathbf{r}$ is augmented by each element of $\mathbf{z}$, which can be written as

$$\mathbf{R} = \mathbf{r} \otimes \mathbf{z} \in \mathbb{R}^{2N_k\times GN}, \tag{16}$$

where $\otimes$ denotes the Kronecker product. The network parameters of the extractor are consistent with those in [24].

By bilinear production, the received signal is combined with the global channel characteristics generated by the extractor to help the decoder achieve more accurate demodulation. However, the extractor has a large number of parameters, which increases the burden of network training. To address this issue, another lightweight preamplifier is designed.

B. Squeeze and Excitation Preamplifier Design

The SE block, first presented in [30], explicitly models the interdependencies between feature channels, enhancing effective information and suppressing irrelevant parts. In ECCNet [21], the residual SE block is adopted as the backbone network of the decoder. Differently, we use 1 × 1 convolutions in the inception branch and the identity branch, and treat the resulting structure as the SE preamplifier shown in Fig. 9.

Fig. 9. The detailed design of the SE preamplification.

In the inception branch, the quantized symbols first pass through two 1 × 1 convolutional operations that increase the number of feature channels from 2 to K. They are then global-average-pooled to one dimension, which is called "squeeze". Next, two FC layers predict the importance of each channel to obtain a group of weights, which is called "excitation". The channel-wise weights are used to scale the convolved features. Meanwhile, the quantized symbols are sent into an identity branch for the final summation.

The SE block enables the preamplifier to extract the most effective part of the channel information from the damaged redundant symbols, thereby improving the generalization ability and robustness of the decoder. The additional overhead of the SE preamplifier is negligible compared with that of the bilinear preamplifier, since the large convolutional extractor subnetwork is no longer required.

V. SIMULATION PERFORMANCES AND RESULTS ANALYSIS

A. Experiment Setting and Benchmarks

In our experiments, we set G = 4, N = 16, and K = 40. Accordingly, the raised cosine roll-off coefficient $\alpha$ is 1, as explained in Section II-A. The input of the rate-1/3 turbo code used in the LTE standard [31] is 6144 bits, and the output codeword is 18444 bits. The turbo decoder uses the Max-Log-MAP algorithm with 5 iterations. The channel varies once every $64T_t$ and remains constant within one frame. For both QPSK and 16-QAM modulation, TransECCNet is trained at a fixed $E_b/N_0$ of 35 dB and tested over a wide range of $E_b/N_0$. With quantization, the learning rates of the Adam optimizers are $1\times10^{-4}$ for flat fading channels and weak multi-path fading channels, and $5\times10^{-4}$ for strong multi-path fading channels. For the unquantized network, the learning rate is set to $5\times10^{-4}$. During training, the temperature is increased by $\Delta T = 1$ every 10 epochs, and 30 batches are randomly sampled in each epoch. The total number of epochs is 500, and the batch size is fixed to 100. Each batch is trained with the mean square error loss

$$(\Theta_E^*, \Theta_D^*) = \arg\min_{\Theta_E,\Theta_D}\frac{1}{B}\sum_{i=1}^{B}\left\|\mathbf{d}_i - \hat{\mathbf{d}}_i\right\|_2^2, \tag{17}$$

where B, $\mathbf{d}_i$, and $\hat{\mathbf{d}}_i$ denote the batch size, the symbols after blocking, and the reconstructed symbols, respectively. The whole network is implemented in PyTorch and simulated on an Intel(R) Core(TM) i9-10900K @ 3.70 GHz CPU and an NVIDIA RTX 2080 8 GB GPU.
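With the settings above (G = 4, N = 16, K = 40), the tensor shapes flowing through the two preamplifiers of Section IV can be checked with the NumPy sketch below. The weights are random, and the channel-feature size $N_k = 8$, the SE reduction ratio of 4, the ReLU between the two 1 × 1 convolutions, and the single linear layer standing in for the convolutional extractor of [24] are all hypothetical choices made for illustration; batch normalization and biases are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
G, N, K = 4, 16, 40                       # settings from Section V-A
GN = G * N                                # 64 received samples per block
r = np.sign(rng.normal(size=(2, GN)))     # one-bit quantized block: rows = real, imaginary

# --- Bilinear preamplifier, (16) ---
N_k = 8                                   # hypothetical size of the channel feature z

def toy_extractor(r, w):
    # Placeholder for the convolutional extractor of [24]: one linear map + tanh.
    return np.tanh(w @ r.reshape(-1)).reshape(-1, 1)   # z with shape (N_k, 1)

w_ext = rng.normal(0.0, 1.0 / np.sqrt(2 * GN), (N_k, 2 * GN))
z = toy_extractor(r, w_ext)
R_bi = np.kron(r, z)                      # (2*N_k, GN): row i*N_k + j equals r[i] * z[j]

# --- SE preamplifier ---
def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_preamp(r, p):
    h = p["c2"] @ relu(p["c1"] @ r)          # two 1x1 convs: 2 -> K -> K channels
    s = h.mean(axis=1)                       # squeeze: global average pool over time, (K,)
    w = sigmoid(p["f2"] @ relu(p["f1"] @ s)) # excitation: per-channel weights in (0, 1)
    return h * w[:, None] + p["idt"] @ r     # rescale, then add the 1x1-conv identity branch

red = 4                                      # hypothetical SE reduction ratio
p = {
    "c1": rng.normal(0.0, 1.0 / np.sqrt(2), (K, 2)),
    "c2": rng.normal(0.0, 1.0 / np.sqrt(K), (K, K)),
    "f1": rng.normal(0.0, 1.0 / np.sqrt(K), (K // red, K)),
    "f2": rng.normal(0.0, 1.0 / np.sqrt(K // red), (K, K // red)),
    "idt": rng.normal(0.0, 1.0 / np.sqrt(2), (K, 2)),
}
R_se = se_preamp(r, p)
print(R_bi.shape, R_se.shape)  # (16, 64) (40, 64)
```

The bilinear output stacks one scaled copy of $\mathbf{r}$ per element of $\mathbf{z}$, exactly as (16) states, whereas the SE path needs only a few small matrices, in line with the point above that it avoids a large extractor subnetwork.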
In our benchmarks, LS and LMMSE are used to estimate the CSI for the OFDM systems. The second-order statistics of the channel, which serve as prior information, are exploited to improve the performance of LMMSE. The number of subcarriers is 64, 25% of which are pilots. The CP occupies 16 samples in total. The OFDM systems are also integrated with the rate-1/3 turbo code and M-ary modulation, but compared with TransECCNet they require additional overhead for the pilots and the CP. No quantization is considered in the benchmarks, since low-resolution reception is too hard for these systems to handle.
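The LS/LMMSE relationship in (12), used by the OFDM benchmarks, can be illustrated with a small Monte Carlo sketch in NumPy. Assumptions: a 4-tap Rayleigh channel with a hypothetical exponential PDP, 16 comb-type pilots out of 64 subcarriers (matching the 25% pilot ratio above), unit-power QPSK pilots so that $\beta = 1$, and estimation only on the pilot tones (interpolation to the data subcarriers is omitted).

```python
import numpy as np

rng = np.random.default_rng(2)
n_sc, n_pilot, taps = 64, 16, 4                    # 64 subcarriers, 25% pilots
pilot_idx = np.arange(0, n_sc, n_sc // n_pilot)    # assumed comb-type pilot placement
pdp = np.exp(-np.arange(taps))
pdp /= pdp.sum()                                   # hypothetical exponential PDP, sum = 1

# Frequency response on pilot tones: H = F_p h, with F_p the DFT rows at pilot positions.
F_p = np.exp(-2j * np.pi * np.outer(pilot_idx, np.arange(taps)) / n_sc)
R_hh = F_p @ np.diag(pdp) @ F_p.conj().T           # R_HH = E[H H^H] for Rayleigh taps

def estimate(snr_db, beta=1.0, trials=2000):
    """Return the mean squared error of the LS and LMMSE estimates at one SNR."""
    snr = 10 ** (snr_db / 10)
    W = R_hh @ np.linalg.inv(R_hh + (beta / snr) * np.eye(n_pilot))  # LMMSE filter of (12)
    err_ls = err_mmse = 0.0
    for _ in range(trials):
        h = np.sqrt(pdp / 2) * (rng.normal(size=taps) + 1j * rng.normal(size=taps))
        H = F_p @ h
        x = np.exp(1j * np.pi / 4 * rng.choice([1, 3, 5, 7], n_pilot))  # unit-power QPSK pilots
        noise = np.sqrt(1 / (2 * snr)) * (rng.normal(size=n_pilot) + 1j * rng.normal(size=n_pilot))
        y = H * x + noise
        H_ls = y / x                               # LS: divide out the known pilot
        H_mmse = W @ H_ls                          # LMMSE: smooth LS with channel statistics
        err_ls += np.mean(np.abs(H_ls - H) ** 2)
        err_mmse += np.mean(np.abs(H_mmse - H) ** 2)
    return err_ls / trials, err_mmse / trials

for snr_db in (0, 10, 20):
    print(snr_db, estimate(snr_db))
```

Because $\mathbf{R}_{HH}$ here has rank equal to the number of taps, LMMSE suppresses most of the noise that LS passes straight through; the LS error stays close to 1/SNR.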
For further clarification, we abbreviate the compared schemes as follows:
1) OFDM + LS: the OFDM system with LS channel estimation
2) OFDM + MMSE: the OFDM system with LMMSE channel estimation
3) TransECC + unQAT: TransECCNet without quantization
4) TransECC + SE + unQAT: TransECCNet with the SE preamplifier and without quantization
5) TransECC + q bit: TransECCNet with q-bit quantization
6) TransECC + SE + q bit: TransECCNet with the SE preamplifier and q-bit quantization
7) TransECC + Bi + q bit: TransECCNet with the bilinear preamplifier and q-bit quantization
8) ECC + q bit: ECCNet with q-bit quantization
9) ECC + SE + q bit: ECCNet with the SE preamplifier and q-bit quantization
10) ECC + Bi + q bit: ECCNet with the bilinear preamplifier and q-bit quantization

B. Flat Fading

We first show the performance of the proposed TransECCNet under the flat fading channel with QPSK and 16-QAM modulation. For flat fading, we consider one-tap filtering with average power $\mathbb{E}[|h|^2] = 1$. Experimental results are shown in Fig. 10.

Fig. 10. BER curves under the flat fading channel. (a) QPSK. (b) 16QAM.

From the curves in Fig. 10(a), the proposed TransECCNet achieves a rapid reduction of the BER compared with the OFDM systems, while saving additional overhead since it transmits without any pilots. After one-bit quantization, TransECCNet and ECCNet each show an almost fixed SNR loss, while the downward trend of the waterfall is kept. With G-fold symbol extension, the autoencoder aided scheme can learn the best representation with redundancy against the harsh environment of time-varying channels and quantization. The flat fading channel is simple enough that vanilla ECCNet still outperforms the benchmarks at high SNR despite one-bit quantization. The proposed TransECCNet slightly outperforms ECCNet when quantized, since the transformer block improves the decoder's ability to track the channel. Moreover, the parameters and FLOPs of TransECCNet are greatly reduced compared with ECCNet, which further shows the advantage of the proposed network.

Similar results can be observed in Fig. 10(b). The gap between the networks with and without one-bit quantization becomes larger. It must be pointed out that one-bit quantization with 16-QAM modulation is a formidable task, since traditional
Authorized licensed use limited to: Tsinghua University. Downloaded on February 26,2024 at [Link] UTC from IEEE Xplore. Restrictions apply.
16038 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 72, NO. 12, DECEMBER 2023

turbo codes fail to recover any effective information from the quantized symbols, as shown in [21]. However, the proposed scheme makes recovery possible by introducing redundant information, similar to inner coding. At high SNR, TransECCNet achieves performance comparable to the benchmarks even with one-bit quantization, which greatly reduces the power consumption of the receiver.

C. Weak Multi-Path
To show the effectiveness of the two kinds of preamplification techniques, we consider a weak multi-path channel, called channel A, with the PDP $\mathbf{p} = [0, -8, -17, -21, -25]$ dB and corresponding relative delays $[0, 3, 5, 6, 8]\,T_t$.

Fig. 11. BER curves under channel A. (a) QPSK. (b) 16-QAM.

Similar to the situation above, the proposed scheme performs better than the benchmarks at high SNR, as illustrated in Fig. 11(a). With one-bit quantization, the network-based schemes suffer performance degradation, but their performance remains comparable with that of the OFDM systems at high SNR. All TransECCNet-aided schemes outperform those based on ECCNet. The gains from the transformer and the SE preamplifier are given at the top right of the figure. Specifically, TransECCNet alone achieves a gain of nearly 3 dB over ECCNet alone at a BER of $10^{-3}$, which again demonstrates the superiority of our proposed network. With either preamplifier, TransECCNet obtains a further improvement of about 2 dB at the same BER, since the channel information is fully extracted. The curve of TransECCNet with the SE preamplifier is close to that with the bilinear preamplifier, while the parameters and FLOPs are greatly reduced. Note that the performance of ECCNet with the bilinear preamplifier degrades unexpectedly, because the decoder and the extractor are both based on large convolutional layers, which makes the network training harder to converge.

Similar conclusions can also be drawn for 16-QAM modulation, as shown in Fig. 11(b). TransECCNet with the SE preamplifier still obtains a lower BER than the benchmarks at high SNR despite one-bit quantization, while TransECCNet with the bilinear preamplifier appears slightly inferior. In addition, TransECCNet without any preamplifier fails to give satisfactory BER performance, which in turn proves the effectiveness of the preamplification technique. None of the ECCNet-based schemes achieves effective transmission when quantized, since their network capacity is not sufficient under time-varying channels.

Fig. 12 shows the binary cross-entropy (BCE) loss versus the number of epochs when training TransECCNet with QPSK modulation and one-bit quantization. Compared with TransECCNet alone, TransECCNet with the SE preamplifier converges faster and achieves better performance. The training and validation curves are very close, indicating that the network does not suffer from serious overfitting and is easy to train.

Fig. 12. The BCE loss curves with QPSK modulation under channel A while training TransECCNet.

D. Strong Multi-Path
To further illustrate the superiority of the proposed network and preamplification techniques, we consider channel B with strong echoes, where every tap has the same average power. Specifically, the PDP is $\mathbf{p} = [0, 0, 0, 0, 0]$ dB with relative delays $[0, 1, 2, 3, 4]\,T_t$.
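Channels A and B are specified above only through their PDP and relative tap delays. The following is a minimal NumPy sketch of how such a time-varying tapped-delay-line channel might be simulated; it is not the authors' simulator. The helper names `pdp_power` and `time_varying_tdl`, the normalization of the PDP to unit total power, and the first-order Gauss-Markov tap evolution with correlation `rho` are all assumptions made for illustration, since the exact time-variation model is not restated in this section.

```python
import numpy as np


def pdp_power(pdp_db):
    """Convert a PDP given in dB to linear tap powers that sum to one (assumed normalization)."""
    p = 10.0 ** (np.asarray(pdp_db, dtype=float) / 10.0)
    return p / p.sum()


def time_varying_tdl(x, pdp_db, delays, rho=0.999, rng=None):
    """Pass symbols x through a time-varying tapped-delay-line channel (hypothetical helper).

    Assumptions (illustrative, not taken from the paper):
      * tap l is zero-mean complex Gaussian with variance pdp_power(pdp_db)[l];
      * each tap evolves per symbol as a first-order Gauss-Markov process
        h[n] = rho * h[n-1] + sqrt(1 - rho**2) * w[n];
      * delays are integer multiples of the symbol period T_t and smaller than len(x).
    Returns the noiseless received symbols y and the per-symbol taps h.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=complex)
    power = pdp_power(pdp_db)
    n_sym, n_tap = len(x), len(power)
    if len(delays) != n_tap:
        raise ValueError("pdp_db and delays must have the same length")

    def cgauss(size):
        # unit-variance circularly symmetric complex Gaussian samples
        return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

    h = np.empty((n_sym, n_tap), dtype=complex)
    h[0] = cgauss(n_tap)
    for n in range(1, n_sym):
        h[n] = rho * h[n - 1] + np.sqrt(1.0 - rho**2) * cgauss(n_tap)
    h *= np.sqrt(power)  # scale every tap to its PDP power

    y = np.zeros(n_sym, dtype=complex)
    for l, d in enumerate(delays):
        # tap l multiplies the input delayed by d symbols (zero before the block starts)
        y[d:] += h[d:, l] * x[: n_sym - d]
    return y, h


# Channel A from the text, driven by random QPSK symbols
pdp_a, delays_a = [0, -8, -17, -21, -25], [0, 3, 5, 6, 8]
x = np.exp(1j * np.pi / 4 * (2 * np.random.default_rng(1).integers(0, 4, 64) + 1))
y, h = time_varying_tdl(x, pdp_a, delays_a, rho=0.999, rng=np.random.default_rng(2))
```

With `rho=1.0` the taps stay frozen over the block and the model reduces to a static multipath channel, while values slightly below one emulate slow time variation. Channel B would use `[0, 0, 0, 0, 0]` dB with delays `[0, 1, 2, 3, 4]`.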

Fig. 13. The PDFs of the LLR before the turbo decoder with QPSK modulation
under the channel B when SNR=10 dB.

First, to illustrate the impact of the network structure on the final turbo decoding performance, we compare the probability density functions (PDFs) of the log-likelihood ratio (LLR) produced by different networks with QPSK modulation, as shown in Fig. 13. Without quantization, the LLR PDF of TransECCNet tends to have two distinct peaks, whereas it has a single peak when one-bit quantization is considered. Compared with TransECCNet alone, TransECCNet with the SE preamplifier has a wider and flatter LLR PDF, which provides more decoding possibilities and thus achieves a lower BER, as can be verified in Fig. 14(a).
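Fig. 13 compares LLR distributions at the turbo decoder input. As a hedged sketch, assuming the decoder outputs per-bit probabilities $P(b=1)$ through a sigmoid (as is usual when training with the BCE loss) and that the turbo decoder expects $\mathrm{LLR} = \log(P(b=0)/P(b=1))$, the LLRs and their empirical PDF could be obtained as follows. The helper names `prob_to_llr` and `llr_pdf` and the sign convention are hypothetical.

```python
import numpy as np


def prob_to_llr(p1, eps=1e-7):
    """Map sigmoid outputs P(b=1) to LLRs log(P(b=0) / P(b=1)).

    Probabilities are clipped to [eps, 1 - eps] so that saturated network
    outputs give large but finite LLRs. The sign convention (positive LLR
    favours b = 0) is an assumption; flip the sign if the decoder expects
    log(P(b=1) / P(b=0)).
    """
    p1 = np.clip(np.asarray(p1, dtype=float), eps, 1.0 - eps)
    return np.log1p(-p1) - np.log(p1)


def llr_pdf(llr, bins=100):
    """Estimate the PDF of LLR values with a density-normalized histogram."""
    density, edges = np.histogram(llr, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


# Illustrative use with synthetic probabilities (not data from the paper)
rng = np.random.default_rng(0)
p1 = rng.uniform(0.0, 1.0, size=10_000)
centers, density = llr_pdf(prob_to_llr(p1), bins=50)
```

Plotting `density` against `centers` for each network gives curves of the kind compared in Fig. 13.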
As illustrated in Fig. 14(a), TransECCNet with the SE preamplifier still outperforms the OFDM system at high SNR, even under one-bit quantization. With the bilinear preamplifier, TransECCNet requires a higher SNR for the same BER performance. The gap between TransECCNet with and without quantization widens even further, since the ISI caused by the strong echoes seriously impairs the learning ability of the network when the signal is quantized. As shown in Fig. 14(c), ECCNet fails to reduce the BER even without quantization, while TransECCNet succeeds, which further shows the reliability of the transformer-based decoder. The SE preamplifier brings a gain of only about 0.5 dB for TransECCNet at a BER of $10^{-3}$, since unquantized QPSK is a simple scenario for the network and TransECCNet is already powerful enough. The remaining conclusions are consistent with the foregoing.
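The one-bit and q-bit quantization considered in this section can be illustrated with a uniform mid-rise quantizer applied separately to the in-phase and quadrature components. This is a hypothetical sketch: the full-scale range `clip`, the placement of the output levels at bin centres, and the helper names `quantize_real` and `quantize` are assumptions rather than the paper's specification.

```python
import numpy as np


def quantize_real(x, q, clip=1.0):
    """Uniform mid-rise q-bit quantizer on [-clip, clip] (assumed design).

    The range is split into 2**q equal bins; each input is mapped to the
    centre of its bin, and inputs beyond the range saturate at the outermost level.
    """
    step = 2.0 * clip / 2**q
    k = np.floor(np.asarray(x, dtype=float) / step)
    k = np.clip(k, -(2 ** (q - 1)), 2 ** (q - 1) - 1)  # saturate out-of-range inputs
    return (k + 0.5) * step


def quantize(y, q, clip=1.0):
    """Quantize complex received samples: I and Q are quantized independently."""
    y = np.asarray(y)
    return quantize_real(y.real, q, clip) + 1j * quantize_real(y.imag, q, clip)


# One-bit quantization keeps only the signs of I and Q (scaled by clip / 2)
print(quantize(np.array([0.3 - 2j, -0.01 + 0.7j]), q=1))
```

For `q=1` the quantizer reduces to a scaled sign detector, and for `q=2` it has four output levels per component, as in the two-bit case. In practice `clip` would have to be matched to the received signal power (for example by automatic gain control), which is likewise an assumption here.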
For the strong multi-path channel with 16-QAM modulation, one-bit quantization is so difficult that no useful features can be extracted, even by TransECCNet with the SE preamplifier, which performed well in the previous cases. However, two-bit quantization is feasible, as shown in Fig. 14(b). TransECCNet with the SE preamplifier achieves reliable transmission with an acceptable SNR loss compared with the unquantized scheme, and outperforms the benchmarks if the SNR is high enough. Without any preamplifier, TransECCNet suffers a loss of nearly 8 dB even without quantization, as shown in Fig. 14(c). Similarly, all ECCNet-based schemes fail whether quantized or not, since they are not robust enough to overcome both strong multi-path interference and channel time variation.

Fig. 14. BER curves under channel B. (a) QPSK. (b) 16-QAM. (c) Gains under QPSK and 16-QAM.

E. Complexity Analysis
We compare the theoretical complexity of the involved schemes in Table I. $M$ denotes the number of subcarriers, with $M = GN$ for a fair comparison. $D_E$ and $D_D$ denote the numbers of convolutional layers, $N_{E,k}^{(l)}$ and $N_{D,k}^{(l)}$ denote


the kernel size of the $l$-th convolutional layer, $C_{E,in}^{(l)}$ and $C_{D,in}^{(l)}$ denote the numbers of input channels, and $C_{E,out}^{(l)}$ and $C_{D,out}^{(l)}$ denote the numbers of output channels, for the encoder and the decoder, respectively. $h$ denotes the number of heads in the multi-head attention, and the number of feature channels $C_D$ is the same for all layers in the decoder of TransECCNet. In this article, the number of feature channels is larger than $M$, so the network-based methods exhibit higher complexity than the conventional methods. However, if $M$ is large enough, the complexity of ECCNet and TransECCNet is lower than that of OFDM+MMSE. Additionally, ECCNet consistently has a higher complexity than TransECCNet because it uses noticeably more feature channels.

TABLE I
COMPLEXITIES OF INVOLVED SCHEMES

Furthermore, to show the major improvements in storage and computation, we compare the number of parameters, the FLOPs, and the average running times of the involved schemes without quantization for different batch sizes. For ease of comparison, the running time of turbo coding and modulation is not included. Under channel B, the transmission time of each QPSK-modulated block is measured on an NVIDIA RTX 2080 8 GB GPU. The results are given in Table II.

TABLE II
PARAMETERS, FLOPS, AND AVERAGE RUNNING TIMES OF INVOLVED SCHEMES ON DIFFERENT BATCH SIZES

Compared with ECCNet, TransECCNet is much lighter and faster, since it dispenses with large convolution kernels and numerous channel filters. More specifically, the number of parameters of TransECCNet is reduced by a factor of nearly 2.7, while the FLOPs are reduced by two orders of magnitude. The bilinear preamplifier requires an extractor based on a convolutional subnetwork, which brings nontrivial storage and computing overhead, whereas the overhead caused by the SE preamplifier is almost negligible.

Compared with the conventional methods, the TransECCNet-based scheme is slightly slower when the batch size is 1, since the symbols are represented as an embedded vector with redundancy. When the batch size becomes larger, however, multiple data blocks can be processed in parallel on the GPU, which speeds up the computation; this is an advantage of networks over conventional methods. Moreover, the proposed scheme achieves better performance at high SNR and pilot-free transmission, while the low-resolution reception greatly reduces the power consumption. However, the network-based methods require sufficient data and time for offline training, which the conventional methods do not.

VI. CONCLUSION
In this article, we proposed a novel error correction coding scheme with a transformer-based autoencoder named TransECCNet. Through symbol extension and FTN technology, pilot-free transmission was achieved without any additional bandwidth cost. Two kinds of preamplification techniques were applied for further robustness and performance gains. Compared with the previous ECCNet, the parameters and FLOPs were greatly reduced while the BER performance was improved. TransECCNet outperformed the conventional OFDM systems without quantization under time-varying channels, and had better spectral efficiency since the overhead of pilots and CP was not required. Moreover, it worked with a tolerable SNR loss in the challenging low-resolution reception scenario, where the traditional benchmarks failed completely.

REFERENCES
[1] T. S. Rappaport et al., “Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond,” IEEE Access, vol. 7, pp. 78729–78757, 2019.
[2] R. Walden, “Analog-to-digital converter survey and analysis,” IEEE J. Sel. Areas Commun., vol. 17, no. 4, pp. 539–550, Apr. 1999.
[3] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[4] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018.
[5] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, “Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system,” IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8549–8560, Sep. 2018.
[6] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018.
[7] Z. Lu, J. Wang, and J. Song, “Multi-resolution CSI feedback with deep learning in massive MIMO system,” in Proc. IEEE Int. Conf. Commun., 2020, pp. 1–6.
[8] H. Zhu, Z. Cao, Y. Zhao, and D. Li, “Learning to denoise and decode: A novel residual neural network decoder for polar codes,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8725–8738, Aug. 2020.
[9] H. Kim, Y. Jiang, R. Rana, S. Kannan, S. Oh, and P. Viswanath, “Communication algorithms via deep learning,” in Proc. 6th Int. Conf. Learn. Representations, 2018. [Online]. Available: [Link]ryazCMbR-
[10] B. Zhu, J. Wang, L. He, and J. Song, “Joint transceiver optimization for wireless communication PHY using neural network,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1364–1373, Jun. 2019.
[11] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020.

[12] H. Jiang, L. Dai, M. Hao, and R. MacKenzie, “End-to-end learning for RIS-aided communication systems,” IEEE Trans. Veh. Technol., vol. 71, no. 6, pp. 6778–6783, Jun. 2022.
[13] J. Pan et al., “AI-driven blind signature classification for IoT connectivity: A deep learning approach,” IEEE Trans. Wireless Commun., vol. 21, no. 8, pp. 6033–6047, Aug. 2022.
[14] Y. Ahn, W. Kim, and B. Shim, “Active user detection and channel estimation for massive machine-type communication: Deep learning approach,” IEEE Internet Things J., vol. 9, no. 14, pp. 11904–11917, Jul. 2022.
[15] N. Ye, X. Li, H. Yu, L. Zhao, W. Liu, and X. Hou, “DeepNOMA: A unified framework for NOMA using deep multi-task learning,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2208–2225, Apr. 2020.
[16] S. Khobahi, N. Shlezinger, M. Soltanalian, and Y. C. Eldar, “LoRD-Net: Unfolded deep detection network with low-resolution receivers,” IEEE Trans. Signal Process., vol. 69, pp. 5651–5664, 2021.
[17] A. Klautau, N. González-Prelcic, A. Mezghani, and R. W. Heath, “Detection and channel equalization with deep learning for low resolution MIMO systems,” in Proc. 52nd Asilomar Conf. Signals, Syst., Comput., 2018, pp. 1836–1840.
[18] L. V. Nguyen, A. L. Swindlehurst, and D. H. N. Nguyen, “Linear and deep neural network-based receivers for massive MIMO systems with one-bit ADCs,” IEEE Trans. Wireless Commun., vol. 20, no. 11, pp. 7333–7345, Nov. 2021.
[19] Y. Zhang, M. Alrabeiah, and A. Alkhateeb, “Deep learning for massive MIMO with 1-bit ADCs: When more antennas need fewer pilots,” IEEE Wireless Commun. Lett., vol. 9, no. 8, pp. 1273–1277, Aug. 2020.
[20] E. Balevi and J. G. Andrews, “Deep learning-based encoder for one-bit quantization,” in Proc. IEEE Glob. Commun. Conf., 2019, pp. 1–6.
[21] R. Zeng, Z. Lu, J. Wang, and J. Song, “Error correction coding for one-bit quantization with CNN-based autoencoder,” IEEE Commun. Lett., vol. 26, no. 8, pp. 1814–1818, Aug. 2022.
[22] A. Vaswani et al., “Attention is all you need,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 6000–6010.
[23] A. Dosovitskiy et al., “An image is worth 16 × 16 words: Transformers for image recognition at scale,” in Proc. 9th Int. Conf. Learn. Representations, 2021. [Online]. Available: [Link]
[24] H. Ye, G. Y. Li, and B. Juang, “Deep learning based end-to-end wireless communication systems without pilots,” IEEE Trans. Cogn. Commun. Netw., vol. 7, no. 3, pp. 702–714, Sep. 2021.
[25] A. Liveris and C. Georghiades, “Exploiting faster-than-Nyquist signaling,” IEEE Trans. Commun., vol. 51, no. 9, pp. 1502–1511, Sep. 2003.
[26] G. Ungerboeck, “Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems,” IEEE Trans. Commun., vol. 22, no. 5, pp. 624–636, May 1974.
[27] T. S. Rappaport, Wireless Communications: Principles and Practice. Englewood Cliffs, NJ, USA: Prentice-Hall, 1996.
[28] Z. Lu, J. Wang, and J. Song, “Binary neural network aided CSI feedback in massive MIMO system,” IEEE Wireless Commun. Lett., vol. 10, no. 6, pp. 1305–1308, Jun. 2021.
[29] T. Lin, A. RoyChowdhury, and S. Maji, “Bilinear CNN models for fine-grained visual recognition,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1449–1457.
[30] J. Hu, L. Shen, G. Sun, S. Albanie, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[31] LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and Channel Coding, document TS 36.212, Version 15.9.0, Release 15, 3GPP, European Telecommunication Standard Institute Standard, 2020.

Rui Zeng received the B.S. degree in 2021 from the Department of Electronic Engineering, Tsinghua University, Beijing, China, where he is currently working toward the Ph.D. degree. His research interests include deep learning for channel equalization, data detection, and modulation recognition.

Zhilin Lu received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, where he is currently working toward the Ph.D. degree. His research interests include deep learning for wireless communication and neural network architecture design.

Xudong Zhang received the B.S. degree in 2022 from the Department of Electronic Engineering, Tsinghua University, Beijing, China, where he is currently working toward the Ph.D. degree. His research interests include deep learning for wireless communication, massive MIMO, and neural network architecture design.

Jintao Wang (Senior Member, IEEE) received the [Link]. and Ph.D. degrees in electrical engineering from Tsinghua University, Beijing, China, in 2001 and 2006, respectively. From 2006 to 2009, he was an Assistant Professor with the Department of Electronic Engineering, Tsinghua University. Since 2009, he has been an Associate Professor and Ph.D. Supervisor. He has authored or coauthored more than 100 journal and conference papers and holds more than 40 national invention patents. His research interests include space-time coding, MIMO, and OFDM systems. He is a Standard Committee Member of the Chinese national digital terrestrial television broadcasting standard.
