No Downlink Pilots Are Needed in TDD Massive MIMO: Hien Quoc Ngo, Member, IEEE, and Erik G. Larsson, Fellow, IEEE

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. XX, NO. X, XXX 2016
arXiv:1606.02348v3 [cs.IT] 23 Dec 2016

Abstract—We consider the Massive Multiple-Input Multiple-Output downlink with maximum-ratio and zero-forcing processing and time-division duplex operation. To decode, the users must know their instantaneous effective channel gain. Conventionally, it is assumed that by virtue of channel hardening, this instantaneous gain is close to its average and hence that users can rely on knowledge of that average (also known as statistical channel information). However, in some propagation environments, such as keyhole channels, channel hardening does not hold.

We propose a blind algorithm to estimate the effective channel gain at each user that does not require any downlink pilots. We derive a capacity lower bound for each user under our proposed scheme, applicable to any propagation channel. Compared to the case of no downlink pilots (relying on channel hardening), and compared to training-based estimation using downlink pilots, our blind algorithm performs significantly better. The difference is especially pronounced in environments that do not offer channel hardening.

Index Terms—Blind channel estimation, downlink, keyhole channels, Massive MIMO, maximum-ratio processing, time-division duplexing, zero-forcing processing.

Manuscript received May 09, 2016; revised September 07, 2016 and October 28, 2016; accepted November 08, 2016. The associate editor coordinating the review of this paper and approving it for publication was Dr. Jun Zhang. This work was supported in part by the Swedish Research Council (VR), the Swedish Foundation for Strategic Research (SSF), and ELLIIT. Part of this work was presented at the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) [1].
H. Q. Ngo and E. G. Larsson are with the Department of Electrical Engineering (ISY), Linköping University, 581 83 Linköping, Sweden (Email: hien.ngo@liu.se; egl@isy.liu.se). H. Q. Ngo is also with the School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT3 9DT, U.K.
Digital Object Identifier xxx/xxx

I. INTRODUCTION

In Massive Multiple-Input Multiple-Output (MIMO), the base station (BS) is equipped with a large antenna array (with hundreds of antennas) that simultaneously serves many (tens or more of) users. It is a key, scalable technology for next generations of wireless networks, due to its promised huge energy efficiency and spectral efficiency [2]–[7]. In Massive MIMO, time-division duplex (TDD) operation is preferable, because the amount of pilot resources required does not depend on the number of BS antennas. With TDD, the BS obtains the channel state information (CSI) through uplink training. This CSI is used to detect the signals transmitted from users in the uplink. On the downlink, owing to the reciprocity of propagation, CSI acquired at the BS is used for precoding. Each user receives an effective (scalar) channel gain multiplied by the desired symbol, plus interference and noise. To coherently detect the desired symbol, each user should know its effective channel gain.

Conventionally, each user is assumed to approximate its instantaneous channel gain by its mean [8]–[10]. This is known to work well in Rayleigh fading. Since Rayleigh fading channels harden when the number of BS antennas is large (the effective channel gains become nearly deterministic), the effective channel gain is close to its mean. Thus, using the mean of this gain for signal detection works very well. This way, downlink pilots are avoided and users only need to know the channel statistics. However, for small or moderate numbers of antennas, the gain may still deviate significantly from its mean. Also, in propagation environments where the channel does not harden, using the mean of the effective gain as a substitute for its true value may result in poor performance even with large numbers of antennas.

The users may estimate their effective channel gain by using downlink pilots, see [2] for single-cell systems and [11] for multi-cell systems. Effectively, these downlink pilots are orthogonal between the users and beamformed along with the downlink data. The users may use, for example, linear minimum mean-square error (MMSE) techniques for the estimation of this gain. The downlink rates of multi-cell systems for maximum-ratio (MR) and zero-forcing (ZF) precoders with and without downlink pilots were analyzed in [12]. The effect of using outdated gain estimates at the users was investigated in [13]. Compared with the case when the users rely on statistical channel knowledge, the downlink-pilot based schemes improve the system performance in low-mobility environments (where the coherence interval is long). However, in high-mobility environments, they do not work well, owing to the large requirement of downlink training resources; this required overhead is proportional to the number of multiplexed users. A better way of estimating the effective channel gain, which requires fewer resources than the transmission of downlink pilots does, would be desirable.

Inspired by the above discussion, in this paper, we consider the Massive MIMO downlink with TDD operation. The BS acquires channel state information through the reception of uplink pilot signals transmitted by the users, in the conventional manner, and when transmitting data to the users, it applies MR or ZF processing with slow time-scale power control. For this system, we propose a simple blind method for the estimation of the effective gain, that each user should independently perform, and which does not require any downlink pilots. Our proposed method exploits the asymptotic properties of the received data in each coherence interval. Our specific contributions are:

• We give a formal definition of channel hardening, and an associated criterion that can be used to test if channel hardening holds. Then we examine two important propagation scenarios: independent Rayleigh fading, and keyhole channels. We show that Rayleigh fading channels harden, but keyhole channels do not.
• We propose a blind channel estimation scheme, that each user applies in the downlink. This scheme exploits the asymptotic properties of the sample average power of the received signal per coherence interval. We presented a preliminary version of this algorithm in [1].
• We derive a rigorous capacity lower bound for Massive MIMO with estimated downlink channel gains. This bound can be applied to any type of channel and can be used to analyze the performance of any downlink channel estimation method.
• Via numerical results we show that, in hardening propagation environments, the performance of our proposed blind scheme is comparable to the use of only statistical channel information (approximating the gain by its mean). In contrast, in non-hardening propagation environments, our proposed scheme performs much better than the use of statistical channel information only. The results also show that our blind method uniformly outperforms schemes based on downlink pilots [2], [11].

Notation: We use boldface upper- and lower-case letters to denote matrices and column vectors, respectively. Specific notation and symbols used in this paper are listed as follows:

$(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$: Conjugate, transpose, and conjugate transpose
$\det(\cdot)$ and $\mathrm{Tr}(\cdot)$: Determinant and trace of a matrix
$\mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma})$: Circularly symmetric complex Gaussian vector with zero mean and covariance matrix $\boldsymbol{\Sigma}$
$|\cdot|$, $\|\cdot\|$: Absolute value, Euclidean norm
$\mathrm{E}\{\cdot\}$, $\mathrm{Var}\{\cdot\}$: Expectation, variance operators
$\xrightarrow{P}$: Convergence in probability
$\mathbf{I}_n$: $n \times n$ identity matrix
$[\mathbf{A}]_k$, $\mathbf{a}_k$: The $k$th column of $\mathbf{A}$

II. SYSTEM MODEL

We consider a single-cell Massive MIMO system with an $M$-antenna BS and $K$ single-antenna users, where $M > K$. The channel between the BS and the $k$th user is an $M \times 1$ channel vector, denoted by $\mathbf{g}_k$, and is modelled as:

$$ \mathbf{g}_k = \sqrt{\beta_k}\, \mathbf{h}_k, \tag{1} $$

where $\beta_k$ represents large-scale fading which is constant over many coherence intervals, and $\mathbf{h}_k$ is an $M \times 1$ small-scale fading channel vector. We assume that the elements of $\mathbf{h}_k$ are uncorrelated, zero-mean and unit-variance random variables (RVs) which are not necessarily Gaussian distributed. Furthermore, $\mathbf{h}_k$ and $\mathbf{h}_{k'}$ are assumed to be independent, for $k \neq k'$. The $m$th elements of $\mathbf{g}_k$ and $\mathbf{h}_k$ are denoted by $g_k^m$ and $h_k^m$, respectively.

Here, we focus on the downlink data transmission with TDD operation. The BS uses the channel estimates obtained in the uplink training phase, and applies MR or ZF processing to transmit data to all users in the same time-frequency resource.

A. Uplink Training

Let $\tau_c$ be the length of the coherence interval (in symbols). For each coherence interval, let $\tau_{u,p}$ be the length of the uplink training duration (in symbols). All users simultaneously send pilot sequences of length $\tau_{u,p}$ symbols each to the BS. We assume that these pilot sequences are pairwise orthogonal, so it is required that $\tau_{u,p} \geq K$. The linear MMSE estimate of $\mathbf{g}_k$ is given by [14]

$$ \hat{\mathbf{g}}_k = \frac{\tau_{u,p} \rho_u \beta_k}{\tau_{u,p} \rho_u \beta_k + 1}\, \mathbf{g}_k + \frac{\sqrt{\tau_{u,p} \rho_u}\, \beta_k}{\tau_{u,p} \rho_u \beta_k + 1}\, \mathbf{w}_{p,k}, \tag{2} $$

where $\mathbf{w}_{p,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ is independent of $\mathbf{g}_k$, and $\rho_u$ is the transmit signal-to-noise ratio (SNR) of each pilot symbol.

The variance of the $m$th element of $\hat{\mathbf{g}}_k$ is given by

$$ \mathrm{Var}\{\hat{g}_{km}\} = \mathrm{E}\{|\hat{g}_{km}|^2\} = \frac{\tau_{u,p} \rho_u \beta_k^2}{\tau_{u,p} \rho_u \beta_k + 1} \triangleq \gamma_k. \tag{3} $$

Let $\tilde{\mathbf{g}}_k = \mathbf{g}_k - \hat{\mathbf{g}}_k$ be the channel estimation error, and $\tilde{g}_{km}$ be the $m$th element of $\tilde{\mathbf{g}}_k$. Then from the properties of linear MMSE estimation, $\tilde{g}_{km}$ and $\hat{g}_{km}$ are uncorrelated, and

$$ \mathrm{Var}\{\tilde{g}_{km}\} = \mathrm{E}\{|\tilde{g}_{km}|^2\} = \beta_k - \gamma_k. \tag{4} $$

In the special case where $\mathbf{g}_k$ is Gaussian distributed (corresponding to Rayleigh fading channels), the linear MMSE estimator becomes the MMSE estimator and $\tilde{g}_{km}$ is independent of $\hat{g}_{km}$.

B. Downlink Data Transmission

Let $s_k(n)$ be the $n$th symbol intended for the $k$th user. We assume that $\mathrm{E}\{\mathbf{s}(n)\mathbf{s}(n)^H\} = \mathbf{I}_K$, where $\mathbf{s}(n) \triangleq [s_1(n), \ldots, s_K(n)]^T$. With linear processing, the $M \times 1$ precoded signal vector is

$$ \mathbf{x}(n) = \sqrt{\rho_d} \sum_{k=1}^{K} \sqrt{\eta_k}\, \mathbf{a}_k s_k(n), \tag{5} $$

where $\{\mathbf{a}_k\}$, $k = 1, \ldots, K$, are the precoding vectors which are functions of the channel estimate $\hat{\mathbf{G}} \triangleq [\hat{\mathbf{g}}_1, \ldots, \hat{\mathbf{g}}_K]$, $\rho_d$ is the (normalized) average transmit power, $\{\eta_k\}$ are the power coefficients, and $\mathbf{D}_\eta$ is a diagonal matrix with $\{\eta_k\}$ on its diagonal. For a given $\{\mathbf{a}_k\}$, the power control coefficients $\{\eta_k\}$ are chosen to satisfy an average power constraint at the BS:

$$ \mathrm{E}\{\|\mathbf{x}(n)\|^2\} \leq \rho_d. \tag{6} $$

The signal received at the $k$th user is [footnote 1]

$$ y_k(n) = \mathbf{g}_k^H \mathbf{x}(n) + w_k(n) = \sqrt{\rho_d \eta_k}\, \alpha_{kk} s_k(n) + \sum_{k' \neq k}^{K} \sqrt{\rho_d \eta_{k'}}\, \alpha_{kk'} s_{k'}(n) + w_k(n), \tag{7} $$

where $w_k(n) \sim \mathcal{CN}(0, 1)$ is additive Gaussian noise, and

$$ \alpha_{kk'} \triangleq \mathbf{g}_k^H \mathbf{a}_{k'}. $$

Then, the desired signal $s_k$ is decoded.

Footnote 1: Here we restrict our consideration to one coherence interval so that the channels remain constant.

We consider two linear precoders: MR and ZF processing.
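Before turning to the two precoders, the uplink estimate (2) and the variances (3) and (4) can be sanity-checked numerically. The following is a minimal sketch, not the authors' code: it assumes Rayleigh fading, uses one long vector purely as a source of independent samples, and the function names (`gamma_k`, `lmmse_estimate`, `check_variances`) and parameter values are hypothetical.

```python
import numpy as np


def gamma_k(beta, tau_up, rho_u):
    """Variance of each element of the linear MMSE estimate, as in (3)."""
    return tau_up * rho_u * beta**2 / (tau_up * rho_u * beta + 1.0)


def lmmse_estimate(g, beta, tau_up, rho_u, rng):
    """Draw the linear MMSE channel estimate (2) for a given channel vector g."""
    m = g.shape[0]
    # w_p ~ CN(0, I_M): unit-variance circularly symmetric noise
    w_p = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2.0)
    snr = tau_up * rho_u * beta
    return (snr / (snr + 1.0)) * g + (np.sqrt(tau_up * rho_u) * beta / (snr + 1.0)) * w_p


def check_variances(m=200_000, beta=0.7, tau_up=10, rho_u=0.5, seed=0):
    """Return empirical Var{g_hat}, empirical Var{g - g_hat}, and gamma_k."""
    rng = np.random.default_rng(seed)
    h = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2.0)
    g = np.sqrt(beta) * h  # model (1), Rayleigh special case (assumption)
    g_hat = lmmse_estimate(g, beta, tau_up, rho_u, rng)
    var_hat = np.mean(np.abs(g_hat) ** 2)      # should approach gamma_k, (3)
    var_err = np.mean(np.abs(g - g_hat) ** 2)  # should approach beta - gamma_k, (4)
    return var_hat, var_err, gamma_k(beta, tau_up, rho_u)
```

Calling `check_variances()` returns the two empirical variances alongside $\gamma_k$, so they can be compared directly with (3) and (4).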
• MR processing: here the precoding vectors $\{\mathbf{a}_k\}$ are

$$ \mathbf{a}_k = \frac{\hat{\mathbf{g}}_k}{\|\hat{\mathbf{g}}_k\|}, \quad k = 1, \ldots, K. \tag{8} $$

• ZF processing: here the precoding vectors are

$$ \mathbf{a}_k = \frac{1}{\left\| \left[ \hat{\mathbf{G}} \left( \hat{\mathbf{G}}^H \hat{\mathbf{G}} \right)^{-1} \right]_k \right\|} \left[ \hat{\mathbf{G}} \left( \hat{\mathbf{G}}^H \hat{\mathbf{G}} \right)^{-1} \right]_k, \tag{9} $$

for $k = 1, \ldots, K$.

With the precoding vectors given in (8) and (9), the power constraint (6) becomes

$$ \sum_{k=1}^{K} \eta_k \leq 1. \tag{10} $$

III. PRELIMINARIES OF CHANNEL HARDENING

One motivation of this work is that Massive MIMO channels may not always harden. In this section we discuss the channel hardening phenomenon. We specifically study channel hardening for independent Rayleigh fading and for keyhole channels.

Channel hardening is a phenomenon where the norms of the channel vectors $\{\mathbf{g}_k\}$, $k = 1, \ldots, K$, fluctuate only a little. We say that the propagation offers channel hardening if

$$ \frac{\|\mathbf{g}_k\|^2}{\mathrm{E}\{\|\mathbf{g}_k\|^2\}} \xrightarrow{P} 1, \quad \text{as } M \to \infty, \quad k = 1, \ldots, K. \tag{11} $$

A. Advantages of Channel Hardening

If the BS and the users know the channel $\mathbf{G}$ perfectly, the channel is deterministic and its sum-capacity is given by [15]

$$ C = \max_{\eta_k \geq 0,\ \sum_{k=1}^{K} \eta_k \leq 1} \log_2 \det\left( \mathbf{I}_M + \rho_d \mathbf{G} \mathbf{D}_\eta \mathbf{G}^H \right), \tag{12} $$

where $\mathbf{D}_\eta$ is the diagonal matrix whose $k$th diagonal element is the power control coefficient $\eta_k$.

In Massive MIMO, for most propagation environments, we have asymptotically favorable propagation [16], i.e. $\frac{\mathbf{g}_k^H \mathbf{g}_{k'}}{M} \to 0$, as $M \to \infty$, for $k \neq k'$. In addition, if the channel hardens, i.e., $\frac{\|\mathbf{g}_k\|^2}{M} \to \frac{1}{M}\mathrm{E}\{\|\mathbf{g}_k\|^2\} = \beta_k$, as $M \to \infty$ [footnote 2], then we have, for fixed $K$,

$$
\begin{aligned}
& C - \max_{\eta_k \geq 0,\ \sum_{k=1}^{K} \eta_k \leq 1} \sum_{k=1}^{K} \log_2\left(1 + \rho_d \eta_k \beta_k M\right) \\
&= C - \max_{\eta_k \geq 0,\ \sum_{k=1}^{K} \eta_k \leq 1} \log_2 \det\left( \mathbf{I}_K + \rho_d \mathbf{D}_\eta M \begin{bmatrix} \beta_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \beta_K \end{bmatrix} \right) \\
&= \max_{\eta_k \geq 0,\ \sum_{k=1}^{K} \eta_k \leq 1} \log_2 \det \begin{bmatrix} \dfrac{1 + \rho_d \eta_1 \|\mathbf{g}_1\|^2}{1 + \rho_d \eta_1 \beta_1 M} & \cdots & \dfrac{\rho_d \eta_1 \mathbf{g}_1^H \mathbf{g}_K}{1 + \rho_d \eta_K \beta_K M} \\ \vdots & \ddots & \vdots \\ \dfrac{\rho_d \eta_K \mathbf{g}_K^H \mathbf{g}_1}{1 + \rho_d \eta_1 \beta_1 M} & \cdots & \dfrac{1 + \rho_d \eta_K \|\mathbf{g}_K\|^2}{1 + \rho_d \eta_K \beta_K M} \end{bmatrix} \\
&\to 0, \quad \text{as } M \to \infty.
\end{aligned} \tag{13}
$$

Footnote 2: Note that favorable propagation and channel hardening are two different properties of the channels. Favorable propagation, $\frac{1}{M} \mathbf{g}_k^H \mathbf{g}_{k'} \to 0$ as $M \to \infty$, does not imply hardening, $\frac{1}{M} \|\mathbf{g}_k\|^2 \to \beta_k$. One example of the contrary is the keyhole channel in Section III-C2.

In (13) we have used the facts that

$$ \frac{1 + \rho_d \eta_k \|\mathbf{g}_k\|^2}{1 + \rho_d \eta_k \beta_k M} = \frac{\frac{1}{M} + \rho_d \eta_k \frac{\|\mathbf{g}_k\|^2}{M}}{\frac{1}{M} + \rho_d \eta_k \beta_k} \to 1, \quad \text{as } M \to \infty, $$

and for $k \neq k'$,

$$ \frac{\rho_d \eta_k \mathbf{g}_k^H \mathbf{g}_{k'}}{1 + \rho_d \eta_{k'} \beta_{k'} M} = \frac{\rho_d \eta_k \mathbf{g}_k^H \mathbf{g}_{k'} / M}{1/M + \rho_d \eta_{k'} \beta_{k'}} \to 0, \quad \text{as } M \to \infty. $$

The limit in (13) implies that if the channel hardens, the sum-capacity (12) can be approximated for $M \gg K$ as:

$$ C \approx \max_{\eta_k \geq 0,\ \sum_{k=1}^{K} \eta_k \leq 1} \sum_{k=1}^{K} \log_2\left(1 + \rho_d \eta_k \beta_k M\right), \tag{14} $$

which does not depend on the small-scale fading. As a consequence, the system scheduling, power allocation, and interference management can be done over the large-scale fading time scale instead of the small-scale fading time scale. Therefore, the overhead for these system designs is significantly reduced.

Another important advantage is: if the channel hardens, then we do not need instantaneous CSI at the receiver to detect the transmitted signals. What the receiver needs is only the statistical knowledge of the channel gains. This reduces the resources (power and training duration) required for channel estimation. More precisely, consider the signal received at the $k$th user given in (7). The $k$th user wants to detect $s_k$ from $y_k$. For this purpose, it needs to know the effective channel gain $\alpha_{kk}$. If the channel hardens, then $\alpha_{kk} \approx \mathrm{E}\{\alpha_{kk}\}$. Therefore, we can use the statistical properties of the channel, i.e., $\mathrm{E}\{\alpha_{kk}\}$ is a good estimate of $\alpha_{kk}$ when detecting $s_k$. This assumption is widely made in the Massive MIMO literature [8]–[10] and circumvents the need for downlink channel estimation.

B. Measure of Channel Hardening

We next state a simple criterion, based on the Chebyshev inequality, to check whether the channel hardens or not. A similar method was discussed in [17]. From Chebyshev's inequality, we have

$$
\begin{aligned}
\Pr\left\{ \left| \frac{\|\mathbf{g}_k\|^2}{\mathrm{E}\{\|\mathbf{g}_k\|^2\}} - 1 \right|^2 \leq \epsilon \right\}
&= 1 - \Pr\left\{ \left| \frac{\|\mathbf{g}_k\|^2}{\mathrm{E}\{\|\mathbf{g}_k\|^2\}} - 1 \right|^2 \geq \epsilon \right\} \\
&\geq 1 - \frac{1}{\epsilon} \cdot \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2}, \quad \text{for any } \epsilon > 0.
\end{aligned} \tag{15}
$$

Clearly, if

$$ \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2} \to 0, \quad \text{as } M \to \infty, \tag{16} $$
we have channel hardening. In contrast, (11) implies

$$ \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2} \to 0, \quad \text{as } M \to \infty, $$

so if (16) does not hold, then the channel does not harden. Therefore, we can use $\frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{(\mathrm{E}\{\|\mathbf{g}_k\|^2\})^2}$ to determine if channel hardening holds for a particular propagation environment.

Fig. 1. Examples of keyhole channels: (1) keyhole effects occur when the distance between transmitter and receiver is large. The transmitter and the receiver have their own local scatterers which yield locally uncorrelated fading. However, the scatterer rings are much smaller than the distance between them, the channel becomes low rank, and hence keyhole effects occur [20]; (2) the receiver is located inside a building, and the only way for the radio wave to propagate from the transmitter to the receiver is to go through several narrow holes which can be considered as keyholes; and (3) the transmitter and the receiver are separated by a tunnel.

C. Independent Rayleigh Fading and Keyhole Channels

In this section, we study the channel hardening property of two particular channel models: Rayleigh fading and keyhole channels.

1) Independent Rayleigh Fading Channels: Consider the channel model (1) where $\{h_k^m\}$ (the elements of $\mathbf{h}_k$) are i.i.d. $\mathcal{CN}(0, 1)$ RVs. Independent Rayleigh fading channels occur in a dense, isotropic scattering environment [18]. By using the identity $\mathrm{E}\{\|\mathbf{g}_k\|^4\} = \beta_k^2 (M + 1) M$ [19], we obtain

$$ \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2} = \frac{1}{\beta_k^2 M^2} \mathrm{E}\{\|\mathbf{g}_k\|^4\} - 1 = \frac{1}{M} \to 0, \quad M \to \infty. \tag{17} $$

Therefore, we have channel hardening.

2) Keyhole Channels: A keyhole channel (or double scattering channel) appears in scenarios with rich scattering around the transmitter and receiver, and where there is a low-rank connection between the two scattering environments. The keyhole effect can occur when the radio wave goes through tunnels, corridors, or when the distance between the transmitter and receiver is large. Figure 1 shows some examples where the keyhole effect occurs in practice. This channel model has been validated both in theory and by practical experiments [21]–[24]. Under keyhole effects, the channel vector $\mathbf{g}_k$ in (1) is modelled as [22]:

$$ \mathbf{g}_k = \sqrt{\beta_k} \sum_{j=1}^{n_k} c_j^{(k)} a_j^{(k)} \mathbf{b}_j^{(k)}, \tag{18} $$

where $n_k$ is the number of effective keyholes, $a_j^{(k)}$ is the random channel gain from the $k$th user to the $j$th keyhole, $\mathbf{b}_j^{(k)} \in \mathbb{C}^{M \times 1}$ is the random channel vector between the $j$th keyhole associated with the $k$th user and the BS, and $c_j^{(k)}$ represents the deterministic complex gain of the $j$th keyhole associated with the $k$th user. The elements of $\mathbf{b}_j^{(k)}$ and $a_j^{(k)}$ are i.i.d. $\mathcal{CN}(0, 1)$ RVs. Furthermore, the gains $\{c_j^{(k)}\}$ are normalized such that $\mathrm{E}\{|g_k^m|^2\} = \beta_k$. Therefore,

$$ \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^2 = 1. \tag{19} $$

When $n_k = 1$, we have a degenerate keyhole (single-keyhole) channel. Conversely, when $n_k \to \infty$, under the additional assumptions that $c_i^{(k)} \neq 0$ for finite $n_k$ and $c_i^{(k)} \to 0$ as $n_k \to \infty$, we obtain an i.i.d. Rayleigh fading channel.

We assume that different users have different sets of keyholes. This assumption is reasonable if the users are located at random in a large area, as illustrated in Figure 1. Then from the derivations in Appendix A, we obtain

$$ \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2} = \left( 1 + \frac{1}{M} \right) \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^4 + \frac{1}{M} \to \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^4 \neq 0, \quad M \to \infty. \tag{20} $$

Consequently, the keyhole channels do not harden. In addition, since $|c_i^{(k)}|^2 \leq 1$, we have

$$ \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2} \leq \left( 1 + \frac{1}{M} \right) \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^2 + \frac{1}{M}. \tag{21} $$

Using (19), (21) becomes

$$ \frac{\mathrm{Var}\{\|\mathbf{g}_k\|^2\}}{\left( \mathrm{E}\{\|\mathbf{g}_k\|^2\} \right)^2} \leq 1 + \frac{2}{M}, \tag{22} $$

where the right hand side corresponds to the case of single-keyhole channels ($n_k = 1$). This implies that a single-keyhole channel represents the worst case in the sense that then the channel gain fluctuates the most.

IV. PROPOSED DOWNLINK BLIND CHANNEL ESTIMATION TECHNIQUE

The $k$th user should know the effective channel gain $\alpha_{kk}$ to coherently detect the transmitted signal $s_k$ from $y_k$ in (7). Most previous works on Massive MIMO assume that $\mathrm{E}\{\alpha_{kk}\}$ is used in lieu of the true $\alpha_{kk}$ when detecting $s_k$. The reason behind this is that if the channel is subject to independent Rayleigh fading (the scenario considered in most previous Massive MIMO works), it hardens when the number of BS
antennas is large, and hence $\alpha_{kk} \approx \mathrm{E}\{\alpha_{kk}\}$; $\mathrm{E}\{\alpha_{kk}\}$ is then a good estimate of $\alpha_{kk}$. However, as seen in Section III, under other propagation models the channel may not always harden when $M \to \infty$, and then using $\mathrm{E}\{\alpha_{kk}\}$ in place of the true effective channel $\alpha_{kk}$ to detect $s_k$ may result in poor performance.

For the reasons explained, it is desirable that the users estimate their effective channels. One way to do this is to have the BS transmit beamformed downlink pilots [2]. Then at least $K$ downlink pilot symbols are required. This can significantly reduce the spectral efficiency. For example, suppose $M = 200$ antennas serve $K = 50$ users, in a coherence interval of length 200 symbols. If half of the coherence interval is used for the downlink, then with the downlink beamforming training of [2], we need to spend at least 50 symbols for sending pilots. As a result, at most 50 of the 100 downlink symbols are used for payload in each coherence interval, and the insertion of the downlink pilots reduces the overall (uplink + downlink) spectral efficiency by a factor of 1/4.

In what follows, we propose a blind channel estimation method which does not require any downlink pilots.

A. Downlink Blind Channel Estimation Algorithm

We next describe our downlink blind channel estimation algorithm, a refined version of the scheme in [1]. Consider the sample average power of the received signal at the $k$th user per coherence interval:

$$ \xi_k \triangleq \frac{|y_k(1)|^2 + |y_k(2)|^2 + \ldots + |y_k(\tau_d)|^2}{\tau_d}, \tag{23} $$

where $y_k(n)$ is the $n$th sample received at the $k$th user and $\tau_d$ is the number of symbols per coherence interval spent on downlink transmission. From (7), and by using the law of large numbers, we have, as $\tau_d \to \infty$,

$$ \xi_k - \left( \rho_d \eta_k |\alpha_{kk}|^2 + \sum_{k' \neq k}^{K} \rho_d \eta_{k'} |\alpha_{kk'}|^2 + 1 \right) \xrightarrow{P} 0. \tag{24} $$

Since $\sum_{k' \neq k}^{K} \rho_d \eta_{k'} |\alpha_{kk'}|^2$ is a sum of many terms, it can be approximated by its mean (this follows from the law of large numbers). As a consequence, when $K$ and $\tau_d$ are large, $\xi_k$ in (23) can be approximated as follows:

$$ \xi_k \approx \rho_d \eta_k |\alpha_{kk}|^2 + \rho_d \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\} + 1. \tag{25} $$

Furthermore, the approximation (25) is still good even if $K$ is small. The reason is that when $K$ is small, with high probability the term $\sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2$ is much smaller than $\eta_k |\alpha_{kk}|^2$, since with high probability $|\alpha_{kk'}|^2 \ll |\alpha_{kk}|^2$. As a result, $\sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2$ can be approximated by its mean even for small $K$. (In fact, in the special case of $K = 1$, this sum is zero.)

Equation (25) enables us to estimate the amplitude of the effective channel gain $\alpha_{kk}$ using the received samples via $\xi_k$ as follows:

$$ \widehat{|\alpha_{kk}|} = \sqrt{ \frac{ \xi_k - 1 - \rho_d \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\} }{ \rho_d \eta_k } }. \tag{26} $$

In case the argument of the square root is non-positive, we set the estimate $\widehat{|\alpha_{kk}|}$ equal to $\mathrm{E}\{|\alpha_{kk}|\}$.

For completeness, the $k$th user also needs to estimate the phase of $\alpha_{kk}$. When $M$ is large, with high probability, the real part of $\alpha_{kk}$ is much larger than the imaginary part of $\alpha_{kk}$. Thus, the phase of $\alpha_{kk}$ is very small and can be set to zero. Based on that observation, we propose to treat the estimate of $|\alpha_{kk}|$ as the estimate of the true $\alpha_{kk}$: $\hat{\alpha}_{kk} = \widehat{|\alpha_{kk}|}$.

The algorithm for estimating the downlink effective channel gain $\alpha_{kk}$ is summarized as follows:

Algorithm 1: (Blind downlink channel estimation method)
1. For each coherence interval, using a data block of $\tau_d$ samples $y_k(n)$, compute $\xi_k$ according to (23).
2. The $k$th user acquires $\eta_k$ and $\mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\}$. See Remark 1 for a detailed discussion on how to acquire these values.
3. The estimate of the effective channel gain $\alpha_{kk}$ is

$$ \hat{\alpha}_{kk} = \begin{cases} \sqrt{ \dfrac{ \xi_k - 1 - \rho_d \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\} }{ \rho_d \eta_k } }, & \text{if } \xi_k > 1 + \rho_d \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\}, \\ \mathrm{E}\{|\alpha_{kk}|\}, & \text{otherwise.} \end{cases} \tag{27} $$

Remark 1: To implement Algorithm 1, the $k$th user has to know $\eta_k$ and $\mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\}$. We assume that the $k$th user knows these values. This assumption is reasonable since these values depend only on the large-scale fading coefficients, which stay constant over many coherence intervals. The BS can compute these values and inform the $k$th user about them. In addition, $\mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\}$ can be expressed in closed form (except in the case of ZF processing with keyhole channels) as follows:

$$ \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\} = \begin{cases} \sum_{k' \neq k}^{K} \eta_{k'} \beta_k, & \text{for MR (Rayleigh/keyhole channels)}, \\ \sum_{k' \neq k}^{K} \eta_{k'} (\beta_k - \gamma_k), & \text{for ZF (Rayleigh channels)}. \end{cases} \tag{28} $$

Detailed derivations of (28) are presented in Appendix B.

B. Asymptotic Performance Analysis

In this section, we analyze the accuracy of our proposed downlink blind channel estimation scheme when $\tau_c$ and $M$ go to infinity for two specific propagation channels: Rayleigh fading and keyhole channels. We use the model (18) for keyhole channels. When $\tau_c \to \infty$, $\xi_k$ in (23) is equal to its asymptotic value:

$$ \xi_k - \left( \rho_d \eta_k |\alpha_{kk}|^2 + \sum_{k' \neq k}^{K} \rho_d \eta_{k'} |\alpha_{kk'}|^2 + 1 \right) \to 0, \tag{29} $$
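and hence, the channel estimate $\hat{\alpha}_{kk}$ in (27) becomes

$$ \hat{\alpha}_{kk} = \begin{cases} \sqrt{ |\alpha_{kk}|^2 + \sum_{k' \neq k}^{K} \dfrac{\eta_{k'}}{\eta_k} \left( |\alpha_{kk'}|^2 - \mathrm{E}\{|\alpha_{kk'}|^2\} \right) }, & \text{if } \xi_k > 1 + \rho_d \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\}, \\ \mathrm{E}\{|\alpha_{kk}|\}, & \text{otherwise.} \end{cases} \tag{30} $$

Since $\tau_c \to \infty$, it is reasonable to assume that the BS can perfectly estimate the channels in the uplink training phase, i.e., we have $\hat{\mathbf{G}} = \mathbf{G}$. (This can be achieved by using very long uplink training duration.) With this assumption, $\alpha_{kk}$ is a positive real value. Thus, (30) can be rewritten as

$$ \frac{\hat{\alpha}_{kk}}{\alpha_{kk}} = \begin{cases} \sqrt{ 1 + \sum_{k' \neq k}^{K} \dfrac{\eta_{k'}}{\eta_k} \dfrac{ |\alpha_{kk'}|^2 - \mathrm{E}\{|\alpha_{kk'}|^2\} }{ \alpha_{kk}^2 } }, & \text{if } \xi_k > 1 + \rho_d \mathrm{E}\left\{ \sum_{k' \neq k}^{K} \eta_{k'} |\alpha_{kk'}|^2 \right\}, \\ \dfrac{\mathrm{E}\{\alpha_{kk}\}}{\alpha_{kk}}, & \text{otherwise.} \end{cases} \tag{31} $$

1) Maximum-Ratio Processing: With MR processing, from (28) and (31), we have

$$ \frac{\hat{\alpha}_{kk}}{\alpha_{kk}} = \begin{cases} \sqrt{ 1 + \sum_{k' \neq k}^{K} \dfrac{\eta_{k'}}{\eta_k} \dfrac{ \left| \frac{\mathbf{g}_k^H \mathbf{g}_{k'}}{\|\mathbf{g}_{k'}\|} \right|^2 - \beta_k }{ \|\mathbf{g}_k\|^2 } }, & \text{if } \xi_k > 1 + \rho_d \sum_{k' \neq k}^{K} \eta_{k'} \beta_k, \\ \dfrac{\mathrm{E}\{\|\mathbf{g}_k\|\}}{\|\mathbf{g}_k\|}, & \text{otherwise.} \end{cases} \tag{32} $$

- Rayleigh fading channels: Under Rayleigh fading channels, $\alpha_{kk} = \|\mathbf{g}_k\|$, and hence,

$$
\begin{aligned}
\Pr\left\{ \xi_k > 1 + \rho_d \sum_{k' \neq k}^{K} \eta_{k'} \beta_k \right\}
&= \Pr\left\{ 1 + \sum_{k'=1}^{K} \rho_d \eta_{k'} |\alpha_{kk'}|^2 > 1 + \rho_d \sum_{k' \neq k}^{K} \eta_{k'} \beta_k \right\} \\
&\geq \Pr\left\{ \rho_d \eta_k |\alpha_{kk}|^2 > \rho_d \sum_{k' \neq k}^{K} \eta_{k'} \beta_k \right\} \\
&= \Pr\left\{ \frac{1}{M} \|\mathbf{g}_k\|^2 > \frac{1}{M} \sum_{k' \neq k}^{K} \frac{\eta_{k'}}{\eta_k} \beta_k \right\} \\
&\to 1, \quad \text{as } M \to \infty,
\end{aligned} \tag{33}
$$

where the convergence follows from the fact that $\frac{1}{M} \|\mathbf{g}_k\|^2 \to \beta_k$ and $\frac{1}{M} \sum_{k' \neq k}^{K} \frac{\eta_{k'}}{\eta_k} \beta_k \to 0$, as $M \to \infty$.

In addition, by the law of large numbers,

$$ \frac{ \left| \frac{\mathbf{g}_k^H \mathbf{g}_{k'}}{\|\mathbf{g}_{k'}\|} \right|^2 - \beta_k }{ \|\mathbf{g}_k\|^2 } = \left( \left| \frac{\mathbf{g}_k^H \mathbf{g}_{k'}}{M} \right|^2 \frac{M}{\|\mathbf{g}_{k'}\|^2} - \frac{\beta_k}{M} \right) \frac{M}{\|\mathbf{g}_k\|^2} \to 0, \quad \text{as } M \to \infty. \tag{34} $$

From (32), (33), and (34), we obtain

$$ \frac{\hat{\alpha}_{kk}}{\alpha_{kk}} \to 1, \quad \text{as } M \to \infty. \tag{35} $$

Our proposed scheme is expected to work very well at large $\tau_c$ and $M$.

- Keyhole channels: Following a similar methodology used in the case of Rayleigh fading, and using the identity

$$ \frac{\mathbf{g}_k^H \mathbf{g}_{k'}}{\|\mathbf{g}_{k'}\|} = \sqrt{\beta_k} \sum_{j=1}^{n_k} c_j^{(k)} a_j^{(k)} \nu_j^{(k)}, \tag{36} $$

where $\nu_j^{(k)} \triangleq \frac{ \left( \mathbf{b}_j^{(k)} \right)^H \mathbf{g}_{k'} }{ \|\mathbf{g}_{k'}\| }$ is $\mathcal{CN}(0, 1)$ distributed, we can arrive at the same result as (35). The random variable $\nu_j^{(k)}$ is Gaussian due to the fact that conditioned on $\mathbf{g}_{k'}$, $\nu_j^{(k)}$ is a Gaussian RV with zero mean and unit variance which is independent of $\mathbf{g}_{k'}$.

2) Zero-forcing Processing: With ZF processing, when $\tau_c \to \infty$,

$$ \frac{\hat{\alpha}_{kk}}{\alpha_{kk}} \to 1, \quad \text{as } M \to \infty. \tag{37} $$

This follows from (29) and the fact that $\alpha_{kk'} \to 0$, for $k \neq k'$.

V. CAPACITY LOWER BOUND

Next, we give a new capacity lower bound for Massive MIMO with downlink channel gain estimation. It can be applied, in particular, to our proposed blind channel estimation scheme [footnote 3]. Denote by $\mathbf{y}_k \triangleq [y_k(1) \ldots y_k(\tau_d)]^T$, $\mathbf{s}_k \triangleq [s_k(1) \ldots s_k(\tau_d)]^T$, and $\mathbf{w}_k \triangleq [w_k(1) \ldots w_k(\tau_d)]^T$. Then from (7), we have

$$ \mathbf{y}_k = \sqrt{\rho_d \eta_k}\, \alpha_{kk} \mathbf{s}_k + \sum_{k' \neq k}^{K} \sqrt{\rho_d \eta_{k'}}\, \alpha_{kk'} \mathbf{s}_{k'} + \mathbf{w}_k. \tag{38} $$

Footnote 3: In Massive MIMO, the bounding technique in [8], [10] is commonly used due to its simplicity. This bound is, however, tight only when the effective channel gain $\alpha_{kk}$ hardens. As we show in Section III, channel hardening does not always hold (for example, not in keyhole channels). A detailed comparison between our new bound and the bound in [8], [10] is given in Section VII-C.

The capacity of (38) is lower bounded by the mutual information between the unknown transmitted signal $\mathbf{s}_k$ and the observed/known values $\mathbf{y}_k$, $\hat{\alpha}_{kk}$. More precisely, for any distribution of $\mathbf{s}_k$, we obtain the following capacity bound for the $k$th user:

$$
\begin{aligned}
C_k &\geq \frac{1}{\tau_d} I(\mathbf{y}_k, \hat{\alpha}_{kk}; \mathbf{s}_k) \\
&= \frac{1}{\tau_d} \left[ h(\mathbf{s}_k) - h(\mathbf{s}_k | \mathbf{y}_k, \hat{\alpha}_{kk}) \right] \\
&\overset{(a)}{=} \frac{1}{\tau_d} h(\mathbf{s}_k) - \frac{1}{\tau_d} \Big[ h(s_k(1) | \mathbf{y}_k, \hat{\alpha}_{kk}) + h(s_k(2) | s_k(1), \mathbf{y}_k, \hat{\alpha}_{kk}) \\
&\qquad + \ldots + h(s_k(\tau_d) | s_k(1), \ldots, s_k(\tau_d - 1), \mathbf{y}_k, \hat{\alpha}_{kk}) \Big] \\
&\overset{(b)}{\geq} \frac{1}{\tau_d} h(\mathbf{s}_k) - \frac{1}{\tau_d} \Big[ h(s_k(1) | \mathbf{y}_k, \hat{\alpha}_{kk}) + h(s_k(2) | \mathbf{y}_k, \hat{\alpha}_{kk}) \\
&\qquad + \ldots + h(s_k(\tau_d) | \mathbf{y}_k, \hat{\alpha}_{kk}) \Big],
\end{aligned} \tag{39}
$$

where in (a) we have used the chain rule [25], and in (b) we have used the fact that conditioning reduces entropy.

It is difficult to compute $h(s_k(n) | \mathbf{y}_k, \hat{\alpha}_{kk})$ in (39) since $\hat{\alpha}_{kk}$ and $s_k(n)$ are correlated. To render the problem more tractable, we introduce new variables $\{\hat{\hat{\alpha}}_{kk}(n)\}$, $n = 1, \ldots, \tau_d$,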
which can be considered as the channel estimates of αkk using where △xi , xi − xi−1 . Precise steps to compute (44) are:
Algorithm 1, but ξk is now computed as 1. Generate N random realizations of the channel G. Then
2
|yk (1)|2 + . . . + |yk (n − 1)|2 + |yk (n + 1)|2 . . . + |yk (τd )|2 the corresponding N × 1 random vectors of αkk , |αkk′ | ,
. ˆ kk (1) are obtained.
and α̂
τd − 1
ˆ kk (n) is very close to α̂kk . More importantly, 2. From sample vectors obtained in step 1, numeri-
Clearly, α̂ cally build the density function {pα̂ˆ kk (1) (xi )} and
ˆ
α̂kk (n) is independent of sk′ (n), k ′ = 1, ..., K. This fact will the joint density functions {pαkk ,α̂ˆ kk (1) (yj , xi )} and
be used for subsequent derivation of the capacity lower bound.
ˆ kk (n) is a deterministic function of yk , p|α ′ |2 ,α̂ˆ kk (1) (zn , xi ). These density functions can be
Since α̂ kk
numerically computed using built-in functions in MAT-
h (sk (n)|yk , α̂kk ) = h sk (n)|yk , α̂kk , α̂ ˆ kk (n) , and hence,
LAB such as “kde” and “kde2d”.
(39) becomes 3. Using (45), we compute the achievable rate (44) as (46),
1 1h ˆkk (1)

shown at the top of the next page, where
Ck ≥ h(sk ) − h sk (1)|yk , α̂kk , α̂
τd τd i X pαkk ,α̂ˆ kk (1) (yj , xi )
ˆ kk (τd )
+ . . . + h sk (τd )|yk , α̂kk , α̂ E { αkk | xi } = yj △yj , (47)
j
pα̂ˆ kk (1) (xi )
1 1h ˆ kk (1)

≥ h(sk ) − h sk (1)|yk (1), α̂ n o X p|α ′ |2 ,α̂ˆ kk (1) (zn , xi )
E |αkk′ |2 xi =
τd τd
zn △zn kk .
i pα̂ˆ kk (1) (xi )
ˆkk (τd ) , (40)
+ . . . + h sk (τd )|yk (τd ), α̂ n
(48)
where in the last inequality, we have used again the fact that Remark 3: The bound (44) relies on a worst-case Gaussian
conditioning reduces entropy. The bound (40) holds irrespective of the distribution of $s_k$. By taking $s_k(1), \ldots, s_k(\tau_d)$ to be i.i.d. $\mathcal{CN}(0, 1)$, we obtain

$$C_k \ge \log_2(\pi e) - h\big(s_k(1) \,\big|\, y_k(1), \hat{\hat{\alpha}}_{kk}(1)\big). \tag{41}$$

The right hand side of (41) is the mutual information between $y_k(1)$ and $s_k(1)$ given the side information $\hat{\hat{\alpha}}_{kk}(1)$. Since $\hat{\hat{\alpha}}_{kk}(1)$ and $s_{k'}(1)$, $k' = 1, \ldots, K$, are independent, we have

$$\begin{aligned}
\mathrm{E}\big\{\bar{w}_k(1) \,\big|\, \hat{\hat{\alpha}}_{kk}(1)\big\} &= 0, \\
\mathrm{E}\big\{s_k^*(1)\,\bar{w}_k(1) \,\big|\, \hat{\hat{\alpha}}_{kk}(1)\big\} &= 0, \\
\mathrm{E}\big\{\alpha_{kk}^*\, s_k^*(1)\,\bar{w}_k(1) \,\big|\, \hat{\hat{\alpha}}_{kk}(1)\big\} &= 0,
\end{aligned} \tag{42}$$

where $\bar{w}_k(1) \triangleq \sum_{k' \neq k}^{K} \sqrt{\rho_d \eta_{k'}}\, \alpha_{kk'} s_{k'}(1) + w_k(1)$. Hence we can apply the result in [26] to further bound the capacity for the $k$th user as (43), shown below (see footnote 4).

Inserting (7) into (43), we obtain a capacity lower bound (achievable rate) for the $k$th user given by (44), shown below.

Remark 2: The computation of the capacity lower bound (44) involves the expectations $\mathrm{E}\{|\alpha_{kk'}|^2 \mid \hat{\hat{\alpha}}_{kk}(1)\}$ and $\mathrm{E}\{\alpha_{kk} \mid \hat{\hat{\alpha}}_{kk}(1)\}$ which cannot be directly computed. However, we can compute $\mathrm{E}\{|\alpha_{kk'}|^2 \mid \hat{\hat{\alpha}}_{kk}(1)\}$ and $\mathrm{E}\{\alpha_{kk} \mid \hat{\hat{\alpha}}_{kk}(1)\}$ numerically by first using Bayes's rule and then discretizing it using the Riemann sum:

$$\mathrm{E}\{X \mid y\} = \int_x x\, p_{X|Y}(x|y)\, dx = \int_x x\, \frac{p_{X,Y}(x, y)}{p_Y(y)}\, dx \approx \sum_i x_i\, \frac{p_{X,Y}(x_i, y)}{p_Y(y)}\, \triangle x_i, \tag{45}$$

Footnote 4: The core argument behind the bound (43) is the maximum-entropy property of Gaussian noise [26]. Prompted by a comment from the reviewers, we stress that to obtain (43), it is not sufficient that the effective noise and the desired signal are uncorrelated. It is also required that the effective noise and the desired signal are uncorrelated, conditioned on the side information.

noise argument [26]. Since the effective noise is the sum of many random terms, its distribution is, by the central limit theorem, close to Gaussian. Hence, our bounds are expected to be rather tight and they are likely to closely represent what state-of-the-art coding would deliver in reality. (This is generally true for the capacity lower bounds used in much of the Massive MIMO literature; see for example, quantitative examples in [27, Myth 4].)

VI. NUMERICAL RESULTS AND DISCUSSIONS

In this section, we provide numerical results to evaluate our proposed channel estimation scheme. We consider the per-user normalized MSE and net throughput as performance metrics. We define

$$\mathrm{SNR}_d = \rho_d \times \mathrm{median}[\text{cell-edge large-scale fading}],$$

and

$$\mathrm{SNR}_u = \rho_u \times \mathrm{median}[\text{cell-edge large-scale fading}],$$

where the cell-edge large-scale fading is the large-scale fading between the BS and a user located at the cell-edge. This gives $\mathrm{SNR}_d$ and $\mathrm{SNR}_u$ the interpretation of the median downlink and the uplink cell-edge SNRs. For keyhole channels, we assume $n_k = n_{\mathrm{KH}}$ and $c_j^{(k)} = 1/\sqrt{n_{\mathrm{KH}}}$, for all $k = 1, \ldots, K$ and $j = 1, \ldots, n_{\mathrm{KH}}$.

In all examples, we compare the performances of three cases: i) "use $\mathrm{E}\{\alpha_{kk}\}$", representing the case when the $k$th user relies on the statistical properties of the channels, i.e., it uses $\mathrm{E}\{\alpha_{kk}\}$ as estimate of $\alpha_{kk}$; ii) "DL pilots [2]", representing the use of beamforming training [2] with linear MMSE channel estimation; and iii) "proposed scheme", representing our proposed downlink blind channel estimation scheme (using Algorithm 1). In our proposed scheme, the curves with $\tau_d = \infty$ correspond to the case that the $k$th user perfectly knows the asymptotic value of $\xi_k$. Furthermore, we choose $\tau_{u,p} = K$. For the beamforming training scheme, the duration of the downlink training is chosen as $\tau_{d,p} = K$.
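To make the keyhole setup above concrete, the following Python sketch draws channel vectors with $n_{\mathrm{KH}}$ keyholes and equal weights $c_j = 1/\sqrt{n_{\mathrm{KH}}}$, and estimates how strongly $\|g_k\|^2$ fluctuates around its mean. It assumes the multi-keyhole form $g_k = \sqrt{\beta_k} \sum_j c_j a_j b_j$ with $a_j \sim \mathcal{CN}(0,1)$ and $b_j \sim \mathcal{CN}(0, I_M)$, which is the structure that the derivation of (20) in the Appendix suggests. The function names, the seed, and the sample sizes are illustrative choices and are not taken from the paper.

```python
import math
import random


def complex_normal(rng):
    """Draw one CN(0, 1) sample (real and imaginary parts have variance 1/2)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_keyhole_gain(M, n_kh, beta, rng):
    """Return ||g||^2 for g = sqrt(beta) * sum_j c_j a_j b_j, c_j = 1/sqrt(n_kh).

    The channel model itself is an assumption of this sketch (see lead-in).
    """
    c = 1.0 / math.sqrt(n_kh)
    g = [0j] * M
    for _ in range(n_kh):
        a = complex_normal(rng)                    # scalar keyhole coefficient a_j
        for m in range(M):
            g[m] += c * a * complex_normal(rng)    # m-th entry of b_j
    return beta * sum(abs(x) ** 2 for x in g)


def hardening_ratio(M, n_kh, trials=2000, seed=1):
    """Monte Carlo estimate of Var{||g||^2} / (E{||g||^2})^2."""
    rng = random.Random(seed)
    gains = [sample_keyhole_gain(M, n_kh, 1.0, rng) for _ in range(trials)]
    mean = sum(gains) / trials
    var = sum((x - mean) ** 2 for x in gains) / trials
    return var / mean ** 2


if __name__ == "__main__":
    for n_kh in (1, 4, 16):
        print(n_kh, round(hardening_ratio(M=16, n_kh=n_kh), 3))
```

A ratio close to zero indicates hardening. In this sketch the ratio should stay large for a single keyhole and shrink as the number of keyholes grows, which is the qualitative behaviour that motivates estimating the effective gain instead of relying on its mean.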
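$$C_k \ge R_k^{\mathrm{blind}} \triangleq \mathrm{E}\left\{ \log_2\left( 1 + \frac{\left| \mathrm{E}\big\{ y_k^*(1)\, s_k(1) \,\big|\, \hat{\hat{\alpha}}_{kk}(1) \big\} \right|^2}{\mathrm{E}\big\{ |y_k(1)|^2 \,\big|\, \hat{\hat{\alpha}}_{kk}(1) \big\} - \left| \mathrm{E}\big\{ y_k^*(1)\, s_k(1) \,\big|\, \hat{\hat{\alpha}}_{kk}(1) \big\} \right|^2} \right) \right\}, \tag{43}$$

$$R_k^{\mathrm{blind}} = \mathrm{E}\left\{ \log_2\left( 1 + \frac{\rho_d \eta_k \left| \mathrm{E}\big\{ \alpha_{kk} \,\big|\, \hat{\hat{\alpha}}_{kk}(1) \big\} \right|^2}{1 + \rho_d \sum_{k'=1}^{K} \eta_{k'}\, \mathrm{E}\big\{ |\alpha_{kk'}|^2 \,\big|\, \hat{\hat{\alpha}}_{kk}(1) \big\} - \rho_d \eta_k \left| \mathrm{E}\big\{ \alpha_{kk} \,\big|\, \hat{\hat{\alpha}}_{kk}(1) \big\} \right|^2} \right) \right\}, \tag{44}$$

$$R_k^{\mathrm{blind}} = \sum_i p_{\hat{\hat{\alpha}}_{kk}(1)}(x_i)\, \triangle x_i\, \log_2\left( 1 + \frac{\rho_d \eta_k \left| \mathrm{E}\{ \alpha_{kk} \mid x_i \} \right|^2}{1 + \rho_d \sum_{k'=1}^{K} \eta_{k'}\, \mathrm{E}\big\{ |\alpha_{kk'}|^2 \,\big|\, x_i \big\} - \rho_d \eta_k \left| \mathrm{E}\{ \alpha_{kk} \mid x_i \} \right|^2} \right), \tag{46}$$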
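The Riemann-sum step of Remark 2, equation (45), can be sketched in a few lines of Python. The example below is only an illustration under simplifying assumptions: it uses real scalars and a toy model $Y = X + N$ with Gaussian $X$ and $N$ (not the paper's joint density of $\alpha_{kk}$ and its estimate), because that toy model has a known closed-form conditional mean to compare against. The helper names `conditional_mean` and `toy_joint_pdf` are hypothetical.

```python
import math


def gaussian_pdf(v, var):
    """Density of a real zero-mean Gaussian with variance var."""
    return math.exp(-v * v / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def conditional_mean(joint_pdf, y, x_grid):
    """Approximate E{X | Y = y} by the Riemann sum in (45) on a uniform grid.

    p_Y(y) is obtained by summing the joint density over the same grid; this is
    an assumption of the sketch, (45) only requires that p_Y(y) is available.
    """
    dx = x_grid[1] - x_grid[0]
    weights = [joint_pdf(x, y) * dx for x in x_grid]   # p_{X,Y}(x_i, y) * dx_i
    p_y = sum(weights)
    return sum(x * w for x, w in zip(x_grid, weights)) / p_y


# Toy model (hypothetical, not from the paper): Y = X + N with real Gaussians.
VAR_X, VAR_N = 1.0, 0.5


def toy_joint_pdf(x, y):
    """Joint density p_{X,Y}(x, y) = p_X(x) * p_N(y - x)."""
    return gaussian_pdf(x, VAR_X) * gaussian_pdf(y - x, VAR_N)


x_grid = [-8.0 + 0.01 * i for i in range(1601)]
estimate = conditional_mean(toy_joint_pdf, 0.9, x_grid)
exact = 0.9 * VAR_X / (VAR_X + VAR_N)   # closed-form conditional mean for the toy model
print(estimate, exact)
```

The same routine, applied once per grid point $x_i$ of the side information, would supply the conditional expectations that enter (46); for complex-valued quantities the grid would have to cover the real and imaginary parts.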
A. Normalized Mean-Square Error

We consider the normalized MSE at user $k$, defined as:

$$\mathrm{MSE}_k \triangleq \frac{\mathrm{E}\big\{ |\hat{\alpha}_{kk} - \alpha_{kk}|^2 \big\}}{|\mathrm{E}\{\alpha_{kk}\}|^2}. \tag{49}$$

In this part, we choose $\beta_k = 1$, and equal power allocation to all users, i.e., $\eta_k = 1/K$, $\forall k$. Figures 2 and 3 show the normalized MSE versus $\mathrm{SNR}_d$ for MR and ZF processing, respectively, under Rayleigh fading and single-keyhole channels. Here, we choose $M = 100$, $K = 10$, and $\mathrm{SNR}_u = 0$ dB.

Fig. 2. Normalized MSE versus $\mathrm{SNR}_d$ for different channel estimation schemes, for MR processing. Here, $M = 100$, $K = 10$, and $\mathrm{SNR}_u = 0$ dB. Panels: (a) Rayleigh fading channels; (b) single-keyhole channels. Curves: use $\mathrm{E}\{\alpha\}$, DL pilots [2], and proposed scheme with $\tau_d = 50$, $100$, $\infty$.

We can see that, in Rayleigh fading channels, for both MR and ZF processing, the MSEs of the three schemes (use $\mathrm{E}\{\alpha_{kk}\}$, DL pilots, and proposed scheme) are comparable. Using $\mathrm{E}\{\alpha_{kk}\}$ in lieu of the true $\alpha_{kk}$ for signal detection works rather well. However, in keyhole channels, since the channels do not harden, the MSE when using $\mathrm{E}\{\alpha_{kk}\}$ as the estimate of $\alpha_{kk}$ is very large. In both propagation environments, our proposed scheme works very well and improves when $\tau_d$ increases (since the approximation in (25) becomes tighter). Our scheme outperforms the beamforming training scheme for a wide range of SNRs, even for short coherence intervals. The training-based method uses the received pilot signals only during a short time to estimate the effective channel gain. In contrast, our proposed scheme uses the received data during a whole coherence block. This is the basic reason why our proposed scheme can perform better than the training-based scheme. (Note also that the training-based method is based on linear MMSE estimation, which is suboptimal, but that is a second-order effect.)

Next we study the effects of the number of BS antennas and the number of keyholes on the performance of our proposed scheme. We choose $K = 10$, $\tau_d = 100$, $\mathrm{SNR}_u = 0$ dB, and $\mathrm{SNR}_d = 5$ dB. Figure 4 shows the normalized MSE versus $M$ for different numbers of keyholes $n_{\mathrm{KH}}$ with MR and ZF processing. When $n_{\mathrm{KH}} = \infty$, we have Rayleigh fading. As expected, the MSE reduces when $M$ increases. More importantly, our proposed scheme works well even when $M$ is not large. Furthermore, we can see that the MSE does not change much when the number of keyholes varies. This implies the robustness of our proposed scheme against the
different propagation environments.

Fig. 3. Same as Figure 2, but for ZF processing.

Fig. 4. Normalized MSE versus $M$ for different number of keyholes $n_k = n_{\mathrm{KH}}$, using Algorithm 1. Here, $\mathrm{SNR}_u = 0$ dB, $\mathrm{SNR}_d = 5$ dB, and $K = 10$. Panels: maximum-ratio; zero-forcing. Curves: $n_{\mathrm{KH}} = 1, 2, 4, \infty$. Horizontal axis: number of base station antennas ($M$).

Note that, with the beamforming training scheme in [2], we additionally have to spend at least $K$ symbols on training pilots (this is not accounted for here, since we only evaluate MSE). By contrast, our proposed scheme does not require any resources for downlink training. To account for the loss due to training, we will examine the net throughput in the next part.

B. Downlink Net Throughput

The downlink net throughputs of three cases (use $\mathrm{E}\{\alpha_{kk}\}$, DL pilots, and proposed scheme) are defined as:

$$S_k^{\mathrm{noCSI}} = B\, \frac{\tau_d}{\tau_c}\, R_k^{\mathrm{noCSI}}, \tag{50}$$

$$S_k^{\mathrm{pilot}} = B\, \frac{\tau_d - \tau_{d,p}}{\tau_c}\, R_k^{\mathrm{pilot}}, \tag{51}$$

$$S_k^{\mathrm{blind}} = B\, \frac{\tau_d}{\tau_c}\, R_k^{\mathrm{blind}}, \tag{52}$$

where $B$ is the spectral bandwidth, $\tau_c$ is again the coherence interval in symbols, and $\tau_d$ is the number of symbols per coherence interval allocated for downlink transmission. Note that $R_k^{\mathrm{noCSI}}$, $R_k^{\mathrm{pilot}}$, and $R_k^{\mathrm{blind}}$ are the corresponding achievable rates of these cases. $R_k^{\mathrm{blind}}$ is given by (44), while $R_k^{\mathrm{pilot}}$ and $R_k^{\mathrm{noCSI}}$ can be computed by using (44) with $\hat{\hat{\alpha}}_{kk}(1)$ replaced by the channel estimate of $\alpha_{kk}$ obtained using the scheme in [2] and by $\mathrm{E}\{\alpha_{kk}\}$, respectively. The term $\tau_d / \tau_c$ in (50) and (52) comes from the fact that, for each coherence interval of $\tau_c$ samples, with our proposed scheme and the case of no channel estimation, we spend $\tau_d$ samples for downlink payload data transmission. The term $(\tau_d - \tau_{d,p}) / \tau_c$ in (51) comes from the fact that we spend $\tau_{d,p}$ symbols on downlink pilots to estimate the effective channel gains [2]. In all examples, we choose $B = 20$ MHz and $\tau_d = \tau_c / 2$ (half of the coherence interval is used for downlink transmission).

We consider a more realistic scenario which incorporates the large-scale fading and max-min power control:

• To generate the large-scale fading, we consider an annulus-shaped cell with a radius of $R_{\max}$ meters, and the BS is located at the cell center. $K + 1$ users are placed uniformly at random in the cell with a minimum distance of $R_{\min}$ meters from the BS. The user with the smallest large-scale fading $\beta_k$ is dropped, such that $K$ users remain. The large-scale fading is modeled by path loss, shadowing (with log-normal distribution), and random user locations:

$$\beta_k = \mathrm{PL}_0 \left( \frac{d_k}{R_{\min}} \right)^{\upsilon} \times 10^{\frac{\sigma_{\mathrm{sh}} \cdot \mathcal{N}(0,1)}{10}}, \tag{53}$$

where $\upsilon$ is the path loss exponent and $\sigma_{\mathrm{sh}}$ is the standard deviation of the shadow fading. The factor $\mathrm{PL}_0$ in (53) is a reference path loss constant which is chosen to satisfy a given downlink cell-edge SNR, $\mathrm{SNR}_d$. In the simulation, we choose $R_{\min} = 100$, $R_{\max} = 1000$, $\upsilon = 3.8$, and
$\sigma_{\mathrm{sh}} = 8$ dB. We generate 1000 random realizations of user locations and shadow fading profiles.

• The power control coefficients $\{\eta_k\}$ are chosen from the max-min power control algorithm [28]:

$$\eta_k = \begin{cases}
\dfrac{1 + \rho_d \beta_k}{\rho_d \gamma_k \left( \dfrac{1}{\rho_d} \sum_{k'=1}^{K} \dfrac{1}{\gamma_{k'}} + \sum_{k'=1}^{K} \dfrac{\beta_{k'}}{\gamma_{k'}} \right)}, & \text{for MR}, \\[3ex]
\dfrac{1 + \rho_d (\beta_k - \gamma_k)}{\rho_d \gamma_k \left( \dfrac{1}{\rho_d} \sum_{k'=1}^{K} \dfrac{1}{\gamma_{k'}} + \sum_{k'=1}^{K} \dfrac{\beta_{k'} - \gamma_{k'}}{\gamma_{k'}} \right)}, & \text{for ZF}.
\end{cases} \tag{54}$$

This max-min power control offers uniformly good service for all users for the case where the $k$th user uses $\mathrm{E}\{\alpha_{kk}\}$ as estimate of $\alpha_{kk}$.

Fig. 5. The cumulative distribution of the per-user downlink net throughput for MR processing. Here, $M = 100$, $K = 10$, $\tau_c = 200$ ($\tau_d = 100$), $\mathrm{SNR}_d = 10\,\mathrm{SNR}_u$, and $B = 20$ MHz. Panels: (a) Rayleigh fading channels; (b) single-keyhole channels. Curves: use $\mathrm{E}\{\alpha\}$, DL pilots [2], proposed scheme, and perfect CSI, for $\mathrm{SNR}_d = -3$ dB and $5$ dB. Horizontal axis: per-user net throughput (Mbits/s).

Fig. 6. Same as Figure 5, but for ZF processing.

Figures 5 and 6 show the cumulative distributions of the per-user downlink net throughput for MR and ZF processing, respectively, under Rayleigh fading and single-keyhole channels. Here we choose $M = 100$, $K = 10$, $\tau_c = 200$, and $\mathrm{SNR}_d = 10\,\mathrm{SNR}_u$. As a baseline for comparisons, we additionally add the curves labelled "perfect CSI". These curves represent the presence of a genie receiver at the $k$th user, which knows the channel gain perfectly. For both propagation environments, our proposed scheme is the best and performs very close to the genie receiver. For Rayleigh fading channels, due to the hardening property of the channels, our proposed scheme and the scheme using the statistical properties of the channels are comparable. These schemes perform better than the beamforming training scheme in [2]. The reason is that, with the beamforming training scheme, we have to spend $\tau_{d,p}$ pilot samples for the downlink training. For single-keyhole channels, the channels do not harden, and hence, it is necessary to estimate the effective channel gains. Our proposed scheme improves the system performance significantly. At $\mathrm{SNR}_d = 5$ dB, with MR processing, our proposed scheme can improve the 95%-likely net throughput by about 20% and 60%, compared with the downlink beamforming training scheme and the case without channel estimation, respectively. With ZF processing, our proposed scheme can improve the
95%-likely net throughput by 15% and 66%, respectively. The MSE of "use $\mathrm{E}\{\alpha_{kk}\}$" does not depend on $\mathrm{SNR}_d$ (see Figures 2 and 3), but it depends on $\mathrm{SNR}_u$. In Figures 5 and 6, when $\mathrm{SNR}_d$ increases, $\mathrm{SNR}_u$ also increases, and hence, the per-user throughput gaps between the "use $\mathrm{E}\{\alpha_{kk}\}$" curves and the "perfect CSI" curves vary as $\mathrm{SNR}_d$ increases.

Finally, we investigate the effect of the coherence interval $\tau_c$ and the number of users $K$ on the performance of our proposed scheme. Figure 7 shows the average downlink net throughput versus $\tau_c$ with MR processing for different $K$ in both Rayleigh fading and keyhole channels. The average is taken over the large-scale fading. Our proposed scheme overcomes the disadvantage of the beamforming training scheme in high mobility environments (short coherence interval), and the disadvantage of the statistical property-based scheme in non-hardening propagation environments, and hence, performs very well in many cases, even when $\tau_c$ and $K$ are small.

Fig. 7. The average per-user downlink net throughput for MR processing. Here, $M = 100$, $\mathrm{SNR}_d = 10\,\mathrm{SNR}_u = 5$ dB, and $B = 20$ MHz. Panels: (a) Rayleigh fading channels; (b) single-keyhole channels. Curves: proposed scheme, DL pilots, and use $\mathrm{E}\{\alpha_{kk}\}$, for $K = 5$ and $K = 10$. Horizontal axis: coherence interval $\tau_c$ (symbols).

Fig. 8. Same as Figure 5, but with long-term average power constraint (55).

VII. COMMENTS

A. Short-Term vs. Long-Term Average Power Constraint

The precoding vectors $a_k$ in (8) and (9) are chosen to satisfy a short-term average power constraint where the expectation of (6) is taken over only $s(n)$. This short-term average power constraint is not the only possibility. Alternatively, one could consider a long-term average power constraint where the expectation in (6) is taken over $s(n)$ and over the small-scale fading. With MR combining, the long-term-average-power-based precoding vectors $\{a_k\}$ are

$$a_k = \frac{\hat{g}_k}{\sqrt{\mathrm{E}\{\|\hat{g}_k\|^2\}}} = \frac{\hat{g}_k}{\sqrt{M \gamma_k}}, \quad k = 1, \ldots, K. \tag{55}$$

However, with ZF, the long-term-average-power-based precoder is not always valid. For example, for single-keyhole channels, perfect uplink estimation, and $K = 1$, we have

$$\mathrm{E}\left\{ \left\| \left[ G \left( G^H G \right)^{-1} \right]_k \right\|^2 \right\}, \tag{56}$$
which is infinite.

We emphasize here that compared to the short-term average power case, the long-term average power case does not make a difference in the sense that the resulting effective channel gain does not always harden, and hence, it needs to be estimated. (The hardening property of the channels is discussed in detail in Section III.) To see this more quantitatively, we compare the performance of three cases: "use $\mathrm{E}\{\alpha_{kk}\}$", "DL pilots [2]", and "proposed scheme" for MR with long-term average power constraint (55). As seen in Figure 8, under keyhole channels, our proposed scheme improves the net throughput significantly, compared to the "use $\mathrm{E}\{\alpha_{kk}\}$" case.

B. Flaw of the Bound in [2], [11]

In the above numerical results, the curves with downlink pilots are obtained by first replacing $\hat{\hat{\alpha}}_{kk}(1)$ in (44) with the channel estimate obtained using the algorithm in [2], and then using the numerical technique discussed in Remark 2 to compute the capacity bound.

Closed-form expressions for achievable rates with downlink training were given in [2, Eq. (12)] and [11]. However, those formulas were not rigorously correct, since $\{\alpha_{kk'}\}$ are non-Gaussian in general (even in Rayleigh fading) and hence the linear MMSE estimate is not equal to the MMSE estimate; the expressions for the capacity bounds in [2], [11] are valid only when the MMSE estimate is inserted. However, the expressions in [2], [11] are likely to be extremely accurate approximations. A similar approximation was stated in [12].

C. Using the Capacity Bounding Technique of [8], [10]

It may be tempting to use the bounding technique in [8], [10] to derive a simpler capacity bound as follows (the index $n$ is omitted for simplicity of notation):

i) Divide the received signal (7) by the channel estimate $\hat{\hat{\alpha}}_{kk}$,

$$y_k' = \frac{y_k}{\sqrt{\rho_d \eta_k}\, \hat{\hat{\alpha}}_{kk}} = \frac{\alpha_{kk}}{\hat{\hat{\alpha}}_{kk}}\, s_k + \sum_{k' \neq k}^{K} \sqrt{\frac{\eta_{k'}}{\eta_k}}\, \frac{\alpha_{kk'}}{\hat{\hat{\alpha}}_{kk}}\, s_{k'} + \frac{w_k}{\sqrt{\rho_d \eta_k}\, \hat{\hat{\alpha}}_{kk}}. \tag{57}$$

ii) Rewrite (57) as the sum of the desired signal multiplied with a deterministic gain, $\mathrm{E}\big\{ \alpha_{kk} / \hat{\hat{\alpha}}_{kk} \big\}\, s_k$, and remaining terms which constitute uncorrelated effective noise,

$$y_k' = \mathrm{E}\left\{ \frac{\alpha_{kk}}{\hat{\hat{\alpha}}_{kk}} \right\} s_k + \left( \frac{\alpha_{kk}}{\hat{\hat{\alpha}}_{kk}} - \mathrm{E}\left\{ \frac{\alpha_{kk}}{\hat{\hat{\alpha}}_{kk}} \right\} \right) s_k + \sum_{k' \neq k}^{K} \sqrt{\frac{\eta_{k'}}{\eta_k}}\, \frac{\alpha_{kk'}}{\hat{\hat{\alpha}}_{kk}}\, s_{k'} + \frac{w_k}{\sqrt{\rho_d \eta_k}\, \hat{\hat{\alpha}}_{kk}}. \tag{58}$$

The worst-case Gaussian noise property [26] then yields the capacity bound

$$R_k^{\mathrm{UnF}} = \log_2\left( 1 + \frac{\left| \mathrm{E}\left\{ \dfrac{\alpha_{kk}}{\hat{\hat{\alpha}}_{kk}} \right\} \right|^2}{\mathrm{Var}\left\{ \dfrac{\alpha_{kk}}{\hat{\hat{\alpha}}_{kk}} \right\} + \sum_{k' \neq k}^{K} \dfrac{\eta_{k'}}{\eta_k}\, \mathrm{E}\left\{ \left| \dfrac{\alpha_{kk'}}{\hat{\hat{\alpha}}_{kk}} \right|^2 \right\} + \dfrac{1}{\rho_d \eta_k}\, \mathrm{E}\left\{ \dfrac{1}{|\hat{\hat{\alpha}}_{kk}|^2} \right\}} \right). \tag{59}$$

This bound does not require the complicated numerical computation given in Section V. However, this bound is tight only when the effective channel gain $\alpha_{kk}$ hardens, which is generally not the case under the models that we consider herein.

More quantitatively, Figure 9 shows a comparison between our new bound (44) and the bound (59). The figure shows the cumulative distributions of the per-user downlink net throughput for MR and ZF processing, for the same setup as in Section VI-B. In Rayleigh fading, the throughputs for the three cases "use $\mathrm{E}\{\alpha_{kk}\}$", "bound (59)", and "proposed bound (44)" are very close, and hence, relying on statistical channel knowledge ($\mathrm{E}\{\alpha_{kk}\}$) for signal detection is good enough, obviating the need for the bound in (59). In keyhole channels, the bound (59) is significantly inferior to our proposed bound. Therefore, the bound (59) is of no use in either Rayleigh fading or keyhole channels.

Fig. 9. The cumulative distribution of the per-user downlink net throughput for MR and ZF processing. Here, $M = 100$, $K = 10$, $\tau_c = 200$ ($\tau_d = 100$), $\mathrm{SNR}_d = 10\,\mathrm{SNR}_u = 5$ dB, and $B = 20$ MHz. Panels: (a) Rayleigh fading channels; (b) single-keyhole channels. Curves: use $\mathrm{E}\{\alpha_{kk}\}$, bound (59), and proposed bound (44), for maximum-ratio and zero-forcing. Horizontal axis: per-user net throughput (Mbits/s).

VIII. CONCLUSION

In the Massive MIMO downlink, in propagation environments where the channel hardens, using the mean of the effective channel gain for signal detection is good enough.
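However, the channels may not always harden. Then, to reliably decode the transmitted signals, each user should estimate its effective channel gain rather than approximate it by its mean. We proposed a new blind channel estimation scheme at the users which does not require any downlink pilots. With this scheme, the users can blindly estimate their effective channel gains directly from the data received during a coherence interval. Our proposed channel estimation scheme is computationally easy, and performs very well. Numerical results show that in non-hardening propagation environments and for large numbers of BS antennas, our proposed scheme significantly outperforms both the downlink beamforming training scheme in [2] and the conventional approach that approximates the effective channel gains by their means.

APPENDIX

A. Derivation of (20)

We have,

$$\begin{aligned}
\frac{\mathrm{Var}\{\|g_k\|^2\}}{\left(\mathrm{E}\{\|g_k\|^2\}\right)^2}
&= \frac{1}{\beta_k^2 M^2}\, \mathrm{E}\{\|g_k\|^4\} - \frac{1}{\beta_k^2 M^2} \left(\mathrm{E}\{\|g_k\|^2\}\right)^2 \\
&= \frac{1}{\beta_k^2 M^2}\, \mathrm{E}\{\|g_k\|^4\} - 1 \\
&= \frac{1}{M^2}\, \mathrm{E}\left\{ \left| \sum_{i=1}^{n_k} \sum_{n=1}^{n_k} \left( c_i^{(k)} a_i^{(k)} b_i^{(k)} \right)^H c_n^{(k)} a_n^{(k)} b_n^{(k)} \right|^2 \right\} - 1 \\
&= \frac{1}{M^2}\, \mathrm{E}\left\{ \left| \sum_{i=1}^{n_k} \left\| \tilde{b}_i^{(k)} \right\|^2 + \sum_{i=1}^{n_k} \sum_{n \neq i}^{n_k} \left( \tilde{b}_i^{(k)} \right)^H \tilde{b}_n^{(k)} \right|^2 \right\} - 1,
\end{aligned} \tag{60}$$

where $\tilde{b}_i^{(k)} \triangleq c_i^{(k)} a_i^{(k)} b_i^{(k)}$. We can see that the terms in the double sum have zero mean. We now consider the covariance between two arbitrary terms:

$$\mathrm{E}\left\{ \left( \tilde{b}_i^{(k)} \right)^H \tilde{b}_n^{(k)} \left( \left( \tilde{b}_{i'}^{(k)} \right)^H \tilde{b}_{n'}^{(k)} \right)^* \right\},$$

where $i \neq n$, $i' \neq n'$, and $(i, n) \neq (i', n')$. Clearly, if $(i, n) \neq (n', i')$, then

$$\mathrm{E}\left\{ \left( \tilde{b}_i^{(k)} \right)^H \tilde{b}_n^{(k)} \left( \left( \tilde{b}_{i'}^{(k)} \right)^H \tilde{b}_{n'}^{(k)} \right)^* \right\} = 0.$$

If $(i, n) = (n', i')$, then we have

$$\mathrm{E}\left\{ \left( \tilde{b}_i^{(k)} \right)^H \tilde{b}_n^{(k)} \left( \left( \tilde{b}_{i'}^{(k)} \right)^H \tilde{b}_{n'}^{(k)} \right)^* \right\} = \mathrm{E}\left\{ \left( \tilde{b}_i^{(k)} \right)^H \tilde{b}_n^{(k)} \left( \tilde{b}_n^{(k)} \right)^T \left( \tilde{b}_i^{(k)} \right)^* \right\} = 0, \tag{61}$$

where we used the fact that if $z$ is a circularly symmetric complex Gaussian random variable with zero mean, then $\mathrm{E}\{z^2\} = 0$. The above result implies that the terms $\big( \tilde{b}_i^{(k)} \big)^H \tilde{b}_n^{(k)}$ [inside the double sum of (60)] are zero-mean mutually uncorrelated random variables. Furthermore, they are uncorrelated with $\sum_{i=1}^{n_k} \big\| \tilde{b}_i^{(k)} \big\|^2$, so (60) can be rewritten as:

$$\frac{\mathrm{Var}\{\|g_k\|^2\}}{\left(\mathrm{E}\{\|g_k\|^2\}\right)^2} = \frac{1}{M^2} \underbrace{\mathrm{E}\left\{ \left| \sum_{i=1}^{n_k} \left\| \tilde{b}_i^{(k)} \right\|^2 \right|^2 \right\}}_{\triangleq\, \mathrm{Term1}} + \frac{1}{M^2} \sum_{i=1}^{n_k} \sum_{n \neq i}^{n_k} \underbrace{\mathrm{E}\left\{ \left| \left( \tilde{b}_i^{(k)} \right)^H \tilde{b}_n^{(k)} \right|^2 \right\}}_{\triangleq\, \mathrm{Term2}} - 1. \tag{62}$$

We have,

$$\begin{aligned}
\mathrm{Term1}
&= \sum_{i=1}^{n_k} \mathrm{E}\left\{ \left\| \tilde{b}_i^{(k)} \right\|^4 \right\} + \sum_{i=1}^{n_k} \sum_{n \neq i}^{n_k} \mathrm{E}\left\{ \left\| \tilde{b}_i^{(k)} \right\|^2 \left\| \tilde{b}_n^{(k)} \right\|^2 \right\} \\
&= \sum_{i=1}^{n_k} \mathrm{E}\left\{ \left| c_i^{(k)} \right|^4 \left| a_i^{(k)} \right|^4 \left\| b_i^{(k)} \right\|^4 \right\} + \sum_{i=1}^{n_k} \sum_{n \neq i}^{n_k} \mathrm{E}\left\{ \left| c_i^{(k)} \right|^2 \left| a_i^{(k)} \right|^2 \left\| b_i^{(k)} \right\|^2 \right\} \mathrm{E}\left\{ \left| c_n^{(k)} \right|^2 \left| a_n^{(k)} \right|^2 \left\| b_n^{(k)} \right\|^2 \right\} \\
&= 2M(M + 1) \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^4 + M^2 \sum_{i=1}^{n_k} \sum_{n \neq i}^{n_k} \left| c_i^{(k)} \right|^2 \left| c_n^{(k)} \right|^2 \\
&= M(M + 2) \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^4 + M^2,
\end{aligned} \tag{63}$$

where we have used the identity that if $z \sim \mathcal{CN}(0, I_n)$, then $\mathrm{E}\{\|z\|^4\} = n(n + 1)$.

Furthermore, we have

$$\begin{aligned}
\mathrm{Term2}
&= \mathrm{E}\left\{ \left| \left( c_i^{(k)} a_i^{(k)} b_i^{(k)} \right)^H c_n^{(k)} a_n^{(k)} b_n^{(k)} \right|^2 \right\} \\
&= \left| c_i^{(k)} \right|^2 \left| c_n^{(k)} \right|^2 \mathrm{E}\left\{ \left| a_i^{(k)} \right|^2 \right\} \mathrm{E}\left\{ \left| a_n^{(k)} \right|^2 \right\} \mathrm{E}\left\{ \left| \left( b_i^{(k)} \right)^H b_n^{(k)} \right|^2 \right\} \\
&= M \left| c_i^{(k)} \right|^2 \left| c_n^{(k)} \right|^2.
\end{aligned} \tag{64}$$

Substituting (63) and (64) into (62), we obtain

$$\frac{\mathrm{Var}\{\|g_k\|^2\}}{\left(\mathrm{E}\{\|g_k\|^2\}\right)^2} = \left( 1 + \frac{1}{M} \right) \sum_{i=1}^{n_k} \left| c_i^{(k)} \right|^4 + \frac{1}{M}. \tag{65}$$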
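As a quick arithmetic check of the last step, the Python sketch below assembles the right-hand side of (62) from the expressions for Term1 in (63) and Term2 in (64) and compares it with the closed form (65). It assumes that the keyhole weights are normalized so that $\sum_i |c_i^{(k)}|^2 = 1$, which the final line of (63) appears to rely on; the function names are hypothetical and the inputs are the squared magnitudes $|c_i^{(k)}|^2$.

```python
def term1(c_sq, M):
    """Third line of (63): 2M(M+1) sum_i |c_i|^4 + M^2 sum_{i != n} |c_i|^2 |c_n|^2."""
    k = len(c_sq)
    fourth = sum(v * v for v in c_sq)
    cross = sum(c_sq[i] * c_sq[n] for i in range(k) for n in range(k) if n != i)
    return 2 * M * (M + 1) * fourth + M * M * cross


def term2_sum(c_sq, M):
    """Sum over all pairs i != n of Term2 in (64): M |c_i|^2 |c_n|^2."""
    k = len(c_sq)
    return sum(M * c_sq[i] * c_sq[n] for i in range(k) for n in range(k) if n != i)


def ratio_from_62(c_sq, M):
    """Assemble Var{||g||^2} / (E{||g||^2})^2 as in (62)."""
    return term1(c_sq, M) / M**2 + term2_sum(c_sq, M) / M**2 - 1.0


def ratio_from_65(c_sq, M):
    """Closed form (65): (1 + 1/M) sum_i |c_i|^4 + 1/M."""
    return (1.0 + 1.0 / M) * sum(v * v for v in c_sq) + 1.0 / M


for n_kh in (1, 2, 4):
    c_sq = [1.0 / n_kh] * n_kh      # equal weights, so sum_i |c_i|^2 = 1 (assumed)
    print(n_kh, ratio_from_62(c_sq, 100), ratio_from_65(c_sq, 100))
```

With equal weights, $\sum_i |c_i^{(k)}|^4 = 1/n_k$, so the ratio decreases toward $1/M$ as the number of keyholes grows, while for a single keyhole it stays above 1 for any $M$.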
B. Derivation of (28)

Here, we provide the proof of (28).

• With MR, for both Rayleigh and keyhole channels, $g_k$ and $a_{k'}$ are independent, for $k \neq k'$. Thus, we have

$$\mathrm{E}\{|\alpha_{kk'}|^2\} = \mathrm{E}\left\{ a_{k'}^H g_k g_k^H a_{k'} \right\} = \beta_k\, \mathrm{E}\{\|a_{k'}\|^2\} = \beta_k. \tag{66}$$

• With ZF, for Rayleigh channels, the channel estimate $\hat{g}_k$ is independent of the channel estimation error $\tilde{g}_k$. So $\tilde{g}_k$ and $a_{k'}$ are independent. In addition, from (9), we have

$$\hat{g}_k^H a_{k'} = 0, \quad k \neq k',$$

and therefore,

$$\begin{aligned}
\mathrm{E}\{|\alpha_{kk'}|^2\} &= \mathrm{E}\{|g_k^H a_{k'}|^2\} \\
&= \mathrm{E}\{|\tilde{g}_k^H a_{k'}|^2\} \\
&= \mathrm{E}\left\{ a_{k'}^H \tilde{g}_k \tilde{g}_k^H a_{k'} \right\} \\
&= (\beta_k - \gamma_k)\, \mathrm{E}\{\|a_{k'}\|^2\} \\
&= \beta_k - \gamma_k.
\end{aligned} \tag{67}$$

REFERENCES

[1] H. Q. Ngo and E. G. Larsson, “Blind estimation of effective downlink channel gains in massive MIMO,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015.
[2] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Massive MU-MIMO downlink TDD systems with linear precoding and downlink pilots,” in Proc. Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, Illinois, Oct. 2013.
[3] E. G. Larsson, F. Tufvesson, O. Edfors, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[4] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Select. Topics Signal Process., vol. 8, no. 5, pp. 742–758, Oct. 2014.
[5] T. Bogale and L. B. Le, “Massive MIMO and mmWave for 5G wireless HetNet: Potentials and challenges,” IEEE Veh. Technol. Mag., vol. 11, no. 1, pp. 64–75, Feb. 2016.
[6] X. Gao, O. Edfors, F. Rusek, and F. Tufvesson, “Massive MIMO performance evaluation based on measured propagation data,” IEEE Trans. Wireless Commun., vol. 14, no. 7, pp. 3899–3911, Jul. 2015.
[7] Q. Zhang, S. Jin, K.-K. Wong, H. Zhu, and M. Matthaiou, “Power scaling of uplink massive MIMO systems with arbitrary-rank channel means,” IEEE J. Select. Topics Signal Process., vol. 8, no. 5, pp. 966–981, Oct. 2014.
[8] J. Jose, A. Ashikhmin, T. L. Marzetta, and S. Vishwanath, “Pilot contamination and precoding in multi-cell TDD systems,” IEEE Trans. Wireless Commun., vol. 10, no. 8, pp. 2640–2651, Aug. 2011.
[9] H. Yang and T. L. Marzetta, “Performance of conjugate and zero-forcing beamforming in large-scale antenna systems,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 172–179, Feb. 2013.
[10] J. Hoydis, S. ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160–171, Feb. 2013.
[11] J. Zuo, J. Zhang, C. Yuen, W. Jiang, and W. Luo, “Multi-cell multi-user massive MIMO transmission with downlink training and pilot contamination precoding,” IEEE Trans. Veh. Technol., vol. 65, no. 8, pp. 6301–6314, Aug. 2016.
[12] A. Khansefid and H. Minn, “Achievable downlink rates of MRC and ZF precoders in massive MIMO with uplink and downlink pilot contamination,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4849–4864, Dec. 2015.
[13] T. Kim, K. Min, and S. Choi, “Study on effect of training for downlink massive MIMO systems with outdated channel,” in Proc. IEEE International Conference on Communications (ICC), London, UK, Jun. 2015.
[14] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, Apr. 2013.
[15] P. Viswanath and D. N. C. Tse, “Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality,” IEEE Trans. Inf. Theory, vol. 49, no. 8, pp. 1912–1921, Aug. 2003.
[16] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Aspects of favorable propagation in massive MIMO,” in Proc. European Signal Processing Conf. (EUSIPCO), Lisbon, Portugal, Sep. 2014.
[17] Y.-G. Lim, C.-B. Chae, and G. Caire, “Performance analysis of massive MIMO for cell-boundary users,” IEEE Trans. Wireless Commun., vol. 14, no. 12, pp. 6827–6842, Dec. 2015.
[18] A. L. Moustakas, H. U. Baranger, L. Balents, A. M. Sengupta, and S. H. Simon, “Communication through a diffusive medium: Coherence and capacity,” Science, vol. 287, no. 5451, pp. 287–290, 2000.
[19] A. M. Tulino and S. Verdú, “Random matrix theory and wireless communications,” Foundations and Trends in Communications and Information Theory, vol. 1, no. 1, pp. 1–182, Jun. 2004.
[20] D. Gesbert, H. Bölcskei, D. A. Gore, and A. J. Paulraj, “Outdoor MIMO wireless channels: Models and performance prediction,” IEEE Trans. Commun., vol. 50, no. 12, pp. 1926–1934, Dec. 2002.
[21] P. Almers, F. Tufvesson, and A. F. Molisch, “Keyhole effect in MIMO wireless channels: Measurements and theory,” IEEE Trans. Wireless Commun., vol. 5, no. 12, pp. 3596–3604, Dec. 2006.
[22] X. Li, S. Jin, X. Gao, and M. R. McKay, “Capacity bounds and low complexity transceiver design for double-scattering MIMO multiple access channels,” IEEE Trans. Signal Process., vol. 58, no. 5, pp. 2809–2822, May 2010.
[23] C. Zhong, S. Jin, K.-K. Wong, and M. R. McKay, “Ergodic mutual information analysis for multi-keyhole MIMO channels,” IEEE Trans. Wireless Commun., vol. 10, no. 6, pp. 1754–1763, Jun. 2011.
[24] G. Levin and S. Loyka, “From multi-keyholes to measure of correlation and power imbalance in MIMO channels: Outage capacity analysis,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3515–3529, Jun. 2011.
[25] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[26] M. Médard, “The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel,” IEEE Trans. Inf. Theory, vol. 46, no. 3, pp. 933–946, May 2000.
[27] E. Björnson, E. G. Larsson, and T. L. Marzetta, “Massive MIMO: 10 myths and one critical question,” IEEE Commun. Mag., vol. 54, no. 2, pp. 114–123, Feb. 2016.
[28] H. Yang and T. L. Marzetta, “A macro cellular wireless network with uniformly high user throughputs,” in Proc. IEEE Veh. Technol. Conf. (VTC), Sep. 2014.

Hien Quoc Ngo received the B.S. degree in electrical engineering from Ho Chi Minh City University of Technology, Vietnam, in 2007. He then received the M.S. degree in Electronics and Radio Engineering from Kyung Hee University, Korea, in 2010, and the Ph.D. degree in communication systems from Linköping University (LiU), Sweden, in 2015. From May to December 2014, he visited Bell Laboratories, Murray Hill, New Jersey, USA.

Hien Quoc Ngo is currently a postdoctoral researcher of the Division for Communication Systems in the Department of Electrical Engineering (ISY) at Linköping University, Sweden. He is also a Visiting Research Fellow at the School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, U.K. His current research interests include massive (large-scale) MIMO systems and cooperative communications.

Dr. Hien Quoc Ngo received the IEEE ComSoc Stephen O. Rice Prize in Communications Theory in 2015. He also received the IEEE Sweden VT-COM-IT Joint Chapter Best Student Journal Paper Award in 2015. He was an IEEE Communications Letters exemplary reviewer for 2014, and an IEEE Transactions on Communications exemplary reviewer for 2015. He has been a member of Technical Program Committees for several IEEE conferences such as ICC, Globecom, WCNC, VTC, WCSP, ISWCS, ATC, ComManTel.

Erik G. Larsson is Professor of Communication Systems at Linköping University (LiU) in Linköping, Sweden. He previously worked for the Royal Institute of Technology (KTH) in Stockholm, Sweden, the University of Florida, USA, the George Washington University, USA, and Ericsson Research, Sweden. In 2015 he was a Visiting Fellow at Princeton University, USA, for four months. He received his Ph.D. degree from Uppsala University, Sweden, in 2002.
His main professional interests are within the
areas of wireless communications and signal processing. He has co-authored
some 130 journal papers on these topics, he is co-author of the two Cambridge
University Press textbooks Space-Time Block Coding for Wireless Communi-
cations (2003) and Fundamentals of Massive MIMO (2016). He is co-inventor
on 16 issued and many pending patents on wireless technology.
He served as Associate Editor for, among others, the IEEE Transactions on
Communications (2010-2014) and IEEE Transactions on Signal Processing
(2006-2010). He serves as chair of the IEEE Signal Processing Society
SPCOM technical committee in 2015–2016 and he served as chair of the
steering committee for the IEEE Wireless Communications Letters in 2014–
2015. He was the General Chair of the Asilomar Conference on Signals,
Systems and Computers in 2015, and Technical Chair in 2012.
He received the IEEE Signal Processing Magazine Best Column Award
twice, in 2012 and 2014, and the IEEE ComSoc Stephen O. Rice Prize in
Communications Theory in 2015. He is a Fellow of the IEEE.
