PARAMETRIC CHANNEL ESTIMATION FOR MASSIVE MIMO

Luc Le Magoarou, Stéphane Paquelet

b<>com, Rennes, France

arXiv:1710.08214v3 [cs.IT] 5 Apr 2018

ABSTRACT
Channel state information is crucial to achieving the capacity of multi-antenna (MIMO) wireless communication systems. It requires estimating the channel matrix. This estimation task is studied, considering a sparse physical channel model, as well as a general measurement model taking into account hybrid architectures. The contribution is twofold. First, the Cramér-Rao bound in this context is derived. Second, interpretation of the Fisher Information Matrix structure makes it possible to assess the role of system parameters, as well as to propose asymptotically optimal and computationally efficient estimation algorithms.
Index Terms: Cramér-Rao bound, Channel estimation, MIMO.

1. INTRODUCTION
Multiple-Input Multiple-Output (MIMO) wireless communication systems allow for a dramatic increase in channel capacity, by adding the spatial dimension to the classical time and frequency ones [1, 2]. This is done by sampling space with several antenna elements, forming antenna arrays both at the transmitter (with $n_t$ antennas) and receiver (with $n_r$ antennas). Capacity gains over single antenna systems are at most proportional to $\min(n_r, n_t)$.

Millimeter wavelengths have recently appeared as a viable solution for the fifth generation (5G) wireless communication systems [3, 4]. Indeed, smaller wavelengths make it possible to pack half-wavelength separated antennas more densely, resulting in higher angular resolution and capacity for a given array size. This observation has given rise to the massive MIMO field, i.e. the study of systems with up to hundreds or even thousands of antennas.

Massive MIMO systems are very promising in terms of capacity. However, they pose several challenges to the research community [5, 6], in particular for channel estimation. Indeed, maximal capacity gains are obtained in the case of perfect knowledge of the channel state by both the transmitter and the receiver. The estimation task amounts to determining a complex gain between each transmit/receive antenna pair, the narrowband (single carrier) MIMO channel as a whole being usually represented as a complex matrix $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ of such complex gains. Without a parametric model, the number of real parameters to estimate is thus $2 n_r n_t$, which is very large for massive MIMO systems.

Contributions and organization. In this work, massive MIMO channel estimation is studied, and its performance limits are sought, as well as their dependency on key system parameters. To this end, the framework of parametric estimation [7] is used. A physical channel model is first presented, with the general considered observation model, and the objective is precisely stated. The Cramér-Rao bound is then derived, which bounds the variance of any unbiased estimator. Then, the interpretation of the bound makes it possible to precisely assess the role of system design on estimation performance, as well as to propose new computationally efficient channel estimation algorithms showing asymptotic performance equivalent to classical ones based on sparse recovery.

This work has been performed in the framework of the Horizon 2020 project ONE5G (ICT-760809) receiving funds from the European Union. The authors would like to acknowledge the contributions of their colleagues in the project, although the views expressed in this contribution are those of the authors and do not necessarily represent the project.

2. PROBLEM FORMULATION
Notations. Matrices and vectors are denoted by bold upper-case and lower-case letters: $\mathbf{A}$ and $\mathbf{a}$ (except 3D “spatial” vectors that are denoted $\vec{a}$); the $i$th column of a matrix $\mathbf{A}$ by $\mathbf{a}_i$; its entry at the $i$th row and $j$th column by $a_{ij}$ or $A_{ij}$. The transpose, conjugate and conjugate transpose of a matrix are denoted by $\mathbf{A}^T$, $\mathbf{A}^*$ and $\mathbf{A}^H$ respectively. The image, rank and trace of a linear transformation represented by $\mathbf{A}$ are denoted $\mathrm{im}(\mathbf{A})$, $\mathrm{rank}(\mathbf{A})$ and $\mathrm{Tr}(\mathbf{A})$ respectively. For matrices $\mathbf{A}$ and $\mathbf{B}$, $\mathbf{A} \geq \mathbf{B}$ means that $\mathbf{A} - \mathbf{B}$ is positive semidefinite. The linear span of a set of vectors $\mathcal{A}$ is denoted $\mathrm{span}(\mathcal{A})$. The Kronecker product, standard vectorization and diagonalization operators are denoted by $\otimes$, $\mathrm{vec}(\cdot)$ and $\mathrm{diag}(\cdot)$ respectively. The identity matrix, the $m \times n$ matrix of zeros and ones are denoted by $\mathbf{Id}$, $\mathbf{0}_{m \times n}$ and $\mathbf{1}_{m \times n}$ respectively. $\mathcal{CN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes the standard complex Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. $\mathbb{E}(\cdot)$ denotes expectation and $\mathrm{cov}(\cdot)$ the covariance of its argument.

2.1. Parametric physical channel model
Consider a narrowband block fading channel between a transmitter and a receiver with respectively $n_t$ and $n_r$ antennas. It is represented by the matrix $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$, in which $h_{ij}$ corresponds to the channel between the $j$th transmit and $i$th receive antennas.

Classically, for MIMO systems with few antennas, i.e. when the quantity $n_r n_t$ is small (up to a few dozen), estimators such as the Least Squares (LS) or the Linear Minimum Mean Squared Error (LMMSE) are used [8].

However, for massive MIMO systems, the quantity $2 n_r n_t$ is large (typically several hundreds), and resorting to classical estimators may become computationally intractable. In that case, a parametric model may be used. Establishing it consists in defining a set of $n_p$ parameters $\boldsymbol{\theta} \triangleq (\theta_1, \dots, \theta_{n_p})^T$ that describe the channel as $\mathbf{H} \approx f(\boldsymbol{\theta})$ for a given function $f$, where the approximation is inherent to the model structure and neglected in the sequel (considering $\mathbf{H} = f(\boldsymbol{\theta})$). Channel estimation then amounts to estimating the parameters $\boldsymbol{\theta}$ instead of the channel matrix $\mathbf{H}$ directly. The parametrization is particularly useful if $n_p \ll 2 n_r n_t$, without harming accuracy of the channel description. Inspired by the physics of wave propagation under the plane waves assumption, it has been proposed to express the channel matrix as a sum of rank-1 matrices, each corresponding to a single physical path between transmitter and receiver [9]. Adopting this kind of modeling and generalizing it to take into account any three-dimensional antenna array geometry, channel matrices take the form

$$\mathbf{H} = \sum_{p=1}^{P} c_p\, \mathbf{e}_r(\vec{u}_{r,p})\, \mathbf{e}_t(\vec{u}_{t,p})^H, \tag{1}$$
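where $P$ is the total number of considered paths (no more than a few dozen), $c_p \triangleq \rho_p e^{j\phi_p}$ is the complex gain of the $p$th path, $\vec{u}_{t,p}$ is the unit vector corresponding to its Direction of Departure (DoD) and $\vec{u}_{r,p}$ the unit vector corresponding to its Direction of Arrival (DoA). Any unit vector $\vec{u}$ is described in spherical coordinates by an azimuth angle $\eta$ and an elevation angle $\psi$. The complex response and steering vectors $\mathbf{e}_r(\vec{u}) \in \mathbb{C}^{n_r}$ and $\mathbf{e}_t(\vec{u}) \in \mathbb{C}^{n_t}$ are defined as

$$(\mathbf{e}_x(\vec{u}))_i = \frac{1}{\sqrt{n_x}}\, e^{-j \frac{2\pi}{\lambda} \vec{a}_{x,i} \cdot \vec{u}}$$

for $x \in \{r, t\}$, with $\lambda$ the wavelength. The set $\{\vec{a}_{x,1}, \dots, \vec{a}_{x,n_x}\}$ gathers the positions of the antennas with respect to the centroid of the considered array (transmit if $x = t$, receive if $x = r$). In order to lighten notations, the matrix $\mathbf{A}_x \triangleq \frac{2\pi}{\lambda} (\vec{a}_{x,1}, \dots, \vec{a}_{x,n_x}) \in \mathbb{R}^{3 \times n_x}$ is introduced. It simplifies the steering/response vector expression to $\mathbf{e}_x(\vec{u}) = \frac{1}{\sqrt{n_x}} e^{-j \mathbf{A}_x^T \vec{u}}$, where the exponential function is applied component-wise. In order to further lighten notations, the $p$th atomic channel is defined as $\mathbf{H}_p \triangleq c_p\, \mathbf{e}_r(\vec{u}_{r,p})\, \mathbf{e}_t(\vec{u}_{t,p})^H$, and its vectorized version $\mathbf{h}_p \triangleq \mathrm{vec}(\mathbf{H}_p) \in \mathbb{C}^{n_r n_t}$. Therefore, defining the vectorized channel $\mathbf{h} \triangleq \mathrm{vec}(\mathbf{H})$ yields $\mathbf{h} = \sum_{p=1}^{P} \mathbf{h}_p$. Note that the channel description used here is very general, as it handles any three-dimensional antenna array geometry, and not only Uniform Linear Arrays (ULA) or Uniform Planar Arrays (UPA) as is sometimes proposed.

In short, the physical channel model can be seen as a parametric model with $\boldsymbol{\theta} = \{\boldsymbol{\theta}^{(p)} \triangleq (\rho_p, \phi_p, \eta_{r,p}, \psi_{r,p}, \eta_{t,p}, \psi_{t,p}),\ p = 1, \dots, P\}$. There are thus $6P$ real parameters in this model (the complex gain, DoD and DoA of every path are described with two parameters each). Of course, the model is most useful for estimation in the case where $6P \ll 2 n_r n_t$, since the number of parameters is thus greatly reduced.

Note that most classical massive MIMO channel estimation methods assume a similar physical model, but discretize a priori the DoDs and DoAs, so that the problem fits the framework of sparse recovery [10, 11, 12]. The approach used here is different, in the sense that no discretization is assumed for the analysis.

2.2. Observation model
In order to carry out channel estimation, $n_s$ known pilot symbols are sent through the channel by each transmit antenna. The corresponding training matrix is denoted $\mathbf{X} \in \mathbb{C}^{n_t \times n_s}$. The signal at the receive antennas is thus expressed as $\mathbf{H}\mathbf{X} + \mathbf{N}$, where $\mathbf{N}$ is a noise matrix with $\mathrm{vec}(\mathbf{N}) \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{Id})$. Due to the high cost and power consumption of millimeter wave Radio Frequency (RF) chains, it has been proposed to have fewer RF chains than antennas in both the transmitter and receiver [13, 14, 15, 16]. Such systems are often referred to as hybrid architectures. Mathematically speaking, this translates into specific constraints on the training matrix $\mathbf{X}$ (which has to “sense” the channel through analog precoders $\mathbf{v}_i \in \mathbb{C}^{n_t}$, $i = 1, \dots, n_{RF}$, $n_{RF}$ being the number of RF chains on the transmit side), as well as observing the signal at the receiver through analog combiners. Denoting $\mathbf{w}_j \in \mathbb{C}^{n_r}$, $j = 1, \dots, n_c$ the analog combiners used, the observed data is expressed in all generality as

$$\mathbf{Y} = \mathbf{W}^H \mathbf{H} \mathbf{X} + \mathbf{W}^H \mathbf{N}, \tag{2}$$

where $\mathbf{W} \triangleq (\mathbf{w}_1, \dots, \mathbf{w}_{n_c})$ and the training matrix is constrained to be of the form $\mathbf{X} = \mathbf{V}\mathbf{Z}$, where $\mathbf{Z} \in \mathbb{C}^{n_{RF} \times n_s}$ is the digital training matrix.

2.3. Objective: bounding the variance of unbiased estimators
In order to assess the fundamental performance limits of channel estimation, the considered performance measure is the relative Mean Squared Error (rMSE). Denoting indifferently $\mathbf{H}(\boldsymbol{\theta}) \triangleq f(\boldsymbol{\theta})$ or $\mathbf{H}$ the true channel ($\mathbf{h}(\boldsymbol{\theta})$ or $\mathbf{h}$ in vectorized form) and $\mathbf{H}(\hat{\boldsymbol{\theta}}) \triangleq f(\hat{\boldsymbol{\theta}})$ or $\hat{\mathbf{H}}$ its estimate ($\mathbf{h}(\hat{\boldsymbol{\theta}})$ or $\hat{\mathbf{h}}$ in vectorized form) in order to lighten notations, rMSE is expressed

$$\mathrm{rMSE} = \mathbb{E}\big[\|\mathbf{H} - \hat{\mathbf{H}}\|_F^2\big] \cdot \|\mathbf{H}\|_F^{-2} = \Big(\underbrace{\mathrm{Tr}\big[\mathrm{cov}(\hat{\mathbf{h}})\big]}_{\text{Variance}} + \underbrace{\|\mathbb{E}(\hat{\mathbf{H}}) - \mathbf{H}\|_F^2}_{\text{Bias}}\Big) \cdot \|\mathbf{H}\|_F^{-2}, \tag{3}$$

where the bias/variance decomposition can be done independently of the considered model [7]. The goal here is to lower-bound the variance term, considering the physical model introduced in the previous subsection. The bias term is not studied in detail here, but its role is evoked in section 3.3.

3. CRAMÉR-RAO LOWER BOUND
In this section, the variance term of eq. (3) is bounded using the Cramér-Rao Bound (CRB) [17, 18], which is valid for any unbiased estimator $\hat{\boldsymbol{\theta}}$ of the true parameter $\boldsymbol{\theta}$. The complex CRB [19] states,

$$\mathrm{cov}\big(g(\hat{\boldsymbol{\theta}})\big) \geq \frac{\partial g(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\, \mathbf{I}(\boldsymbol{\theta})^{-1}\, \frac{\partial g(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}^H,$$

with $\mathbf{I}(\boldsymbol{\theta}) \triangleq \mathbb{E}\Big[\frac{\partial \log L}{\partial \boldsymbol{\theta}} \frac{\partial \log L}{\partial \boldsymbol{\theta}}^H\Big]$ the Fisher Information Matrix (FIM), where $L$ denotes the model likelihood, and $g$ is any complex differentiable vector function. In particular, regarding the variance term of eq. (3),

$$\mathrm{Tr}\big[\mathrm{cov}\big(\mathbf{h}(\hat{\boldsymbol{\theta}})\big)\big] \geq \mathrm{Tr}\Big[\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\, \mathbf{I}(\boldsymbol{\theta})^{-1}\, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}^H\Big], \tag{4}$$

with $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \triangleq \Big(\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \theta_1}, \dots, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \theta_{n_p}}\Big)$. A model independent expression for the FIM is provided in section 3.1, and particularized in section 3.2 to the model of section 2.1. Finally, the bound is derived from eq. (4) in section 3.3.

3.1. General derivation
First, notice that vectorizing eq. (2), the observation matrix $\mathbf{Y}$ follows a complex Gaussian distribution,

$$\mathrm{vec}(\mathbf{Y}) \sim \mathcal{CN}\Big(\underbrace{(\mathbf{X}^T \otimes \mathbf{W}^H)\, \mathbf{h}(\boldsymbol{\theta})}_{\boldsymbol{\mu}(\boldsymbol{\theta})},\ \underbrace{\sigma^2 (\mathbf{Id}_{n_s} \otimes \mathbf{W}^H \mathbf{W})}_{\boldsymbol{\Sigma}}\Big).$$

In that particular case, the Slepian-Bangs formula [20, 21] yields:

$$\mathbf{I}(\boldsymbol{\theta}) = 2\,\mathrm{Re}\Big\{\frac{\partial \boldsymbol{\mu}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}^H \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\mu}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\Big\} = \frac{2\alpha^2}{\sigma^2}\,\mathrm{Re}\Big\{\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}^H \mathbf{P}\, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\Big\}, \tag{5}$$

with $\mathbf{P} \triangleq \frac{\sigma^2}{\alpha^2} (\mathbf{X}^* \otimes \mathbf{W})\, \boldsymbol{\Sigma}^{-1} (\mathbf{X}^T \otimes \mathbf{W}^H)$ where $\alpha^2 \triangleq \frac{1}{n_s} \mathrm{Tr}(\mathbf{X}^H \mathbf{X})$ is the average transmit power per time step. Note that the expression can be simplified to $\mathbf{P} = \frac{1}{\alpha^2} (\mathbf{X}^* \mathbf{X}^T) \otimes (\mathbf{W}(\mathbf{W}^H \mathbf{W})^{-1} \mathbf{W}^H)$ using elementary properties of the Kronecker product. The matrix $\mathbf{W}(\mathbf{W}^H \mathbf{W})^{-1} \mathbf{W}^H$ is a projection matrix onto the range of $\mathbf{W}$. In order to ease further interpretation, assume that $\mathbf{X}^H \mathbf{X} = \alpha^2 \mathbf{Id}_{n_s}$. This assumption means that the transmit power is constant during training time ($\|\mathbf{x}_i\|_2^2 = \alpha^2$, $\forall i$) and that pilots sent at different time instants are mutually orthogonal ($\mathbf{x}_i^H \mathbf{x}_j = 0$, $\forall i \neq j$). This way, $\frac{1}{\alpha^2} \mathbf{X}^* \mathbf{X}^T$ is a projection matrix onto the range of $\mathbf{X}^*$, and $\mathbf{P}$ can itself be interpreted as a projection, being the Kronecker product of two projection matrices [22, p.112] (it is an orthogonal projection since $\mathbf{P}^H = \mathbf{P}$).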
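The properties of $\mathbf{P}$ stated above can be checked numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: the function names, the small dimensions and the random draws are hypothetical choices made for illustration. It builds pilots satisfying $\mathbf{X}^H \mathbf{X} = \alpha^2 \mathbf{Id}_{n_s}$, forms $\mathbf{P}$ both from its definition and from the simplified Kronecker expression, and checks that the two agree and that $\mathbf{P}$ is idempotent and Hermitian.

```python
import numpy as np


def projector_from_definition(X, W, sigma2=1.0):
    """P = (sigma^2/alpha^2) (X^* kron W) Sigma^{-1} (X^T kron W^H)."""
    n_s = X.shape[1]
    alpha2 = np.trace(X.conj().T @ X).real / n_s   # average transmit power
    A = np.kron(X.T, W.conj().T)                   # mu(theta) = A h(theta)
    Sigma = sigma2 * np.kron(np.eye(n_s), W.conj().T @ W)
    return (sigma2 / alpha2) * A.conj().T @ np.linalg.solve(Sigma, A)


def projector_simplified(X, W):
    """P = (1/alpha^2) (X^* X^T) kron (W (W^H W)^{-1} W^H)."""
    n_s = X.shape[1]
    alpha2 = np.trace(X.conj().T @ X).real / n_s
    # Orthogonal projector onto the range of W (W needs full column rank)
    P_W = W @ np.linalg.solve(W.conj().T @ W, W.conj().T)
    return np.kron(X.conj() @ X.T, P_W) / alpha2


def random_setup(n_t=4, n_s=3, n_r=3, n_c=2, alpha=2.0, seed=0):
    """Hypothetical toy setup: orthogonal pilots and random analog combiners."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_t, n_s)) + 1j * rng.standard_normal((n_t, n_s))
    Q, _ = np.linalg.qr(G)      # orthonormal columns, requires n_s <= n_t
    X = alpha * Q               # so that X^H X = alpha^2 Id
    W = rng.standard_normal((n_r, n_c)) + 1j * rng.standard_normal((n_r, n_c))
    return X, W


X, W = random_setup()
P_def = projector_from_definition(X, W)
P = projector_simplified(X, W)
print("definition matches simplified form:", np.allclose(P_def, P))
print("idempotent (P P = P):", np.allclose(P @ P, P))
print("Hermitian (P^H = P):", np.allclose(P.conj().T, P))
print("Tr(P):", np.trace(P).real, "n_s * n_c:", X.shape[1] * W.shape[1])
```

In this toy setup the trace of $\mathbf{P}$ equals $n_s n_c$, the dimension of the subspace actually observed through the pilots and combiners, which is far smaller than $n_r n_t$ when few RF chains or pilots are available.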
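3.2. Fisher information matrix for a sparse channel model
Consider now the parametric channel model of section 2.1, where $\mathbf{h} = \sum_{p=1}^{P} \mathbf{h}_p$, with $\mathbf{h}_p = c_p\, \mathbf{e}_t(\vec{u}_{t,p})^* \otimes \mathbf{e}_r(\vec{u}_{r,p})$.

Intra-path couplings. The derivatives of $\mathbf{h}$ with respect to parameters of the $p$th path $\boldsymbol{\theta}^{(p)}$ can be determined using matrix differentiation rules [23]:
• Regarding the complex gain $c_p = \rho_p e^{j\phi_p}$, the model yields the expressions $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \rho_p} = \frac{1}{\rho_p} \mathbf{h}_p$ and $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \phi_p} = j \mathbf{h}_p$.
• Regarding the DoA, $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{r,p}} = \big(\mathbf{Id}_{n_t} \otimes \mathrm{diag}(-j \mathbf{A}_r^T \vec{v}_{\eta_{r,p}})\big) \mathbf{h}_p$ and $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \psi_{r,p}} = \big(\mathbf{Id}_{n_t} \otimes \mathrm{diag}(-j \mathbf{A}_r^T \vec{v}_{\psi_{r,p}})\big) \mathbf{h}_p$, where $\vec{v}_{\eta_{r,p}}$ and $\vec{v}_{\psi_{r,p}}$ are the unit vectors in the azimuth and elevation directions at $\vec{u}_{r,p}$, respectively.
• Regarding the DoD, $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{t,p}} = \big(\mathrm{diag}(j \mathbf{A}_t^T \vec{v}_{\eta_{t,p}}) \otimes \mathbf{Id}_{n_r}\big) \mathbf{h}_p$ and $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \psi_{t,p}} = \big(\mathrm{diag}(j \mathbf{A}_t^T \vec{v}_{\psi_{t,p}}) \otimes \mathbf{Id}_{n_r}\big) \mathbf{h}_p$, where $\vec{v}_{\eta_{t,p}}$ and $\vec{v}_{\psi_{t,p}}$ are the unit vectors in the azimuth and elevation directions at $\vec{u}_{t,p}$, respectively.

Denoting $\frac{\partial \mathbf{h}}{\partial \boldsymbol{\theta}^{(p)}} \triangleq \Big(\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \rho_p}, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \phi_p}, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{r,p}}, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \psi_{r,p}}, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{t,p}}, \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \psi_{t,p}}\Big)$, the part of the FIM corresponding to couplings between the parameters $\boldsymbol{\theta}^{(p)}$ (intra-path couplings) is expressed as

$$\mathbf{I}^{(p,p)} \triangleq \frac{2\alpha^2}{\sigma^2}\,\mathrm{Re}\Big\{\frac{\partial \mathbf{h}}{\partial \boldsymbol{\theta}^{(p)}}^H \mathbf{P}\, \frac{\partial \mathbf{h}}{\partial \boldsymbol{\theta}^{(p)}}\Big\}. \tag{6}$$

Let us now particularize this expression. First of all, in order to ease interpretations carried out in section 4, consider the case of optimal observation conditions (when the range of $\mathbf{P}$ contains the range of $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}$). This indeed makes it possible to interpret separately the role of the observation matrices and the antenna array geometries. Second, consider for example the entry corresponding to the coupling between the departure azimuth angle $\eta_{t,p}$ and the arrival azimuth angle $\eta_{r,p}$ of the $p$th path. It is expressed under the optimal observation assumption as $\frac{2\alpha^2}{\sigma^2}\,\mathrm{Re}\Big\{\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{r,p}}^H \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{t,p}}\Big\}$. Moreover,

$$\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{r,p}}^H \frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \eta_{t,p}} = \mathbf{h}_p^H \big(\mathrm{diag}(j \mathbf{A}_t^T \vec{v}_{\eta_{t,p}}) \otimes \mathrm{diag}(j \mathbf{A}_r^T \vec{v}_{\eta_{r,p}})\big) \mathbf{h}_p = \frac{-\rho_p^2}{n_r n_t} \big(\mathbf{1}_{n_t}^T \mathbf{A}_t^T \vec{v}_{\eta_{t,p}}\big) \big(\mathbf{1}_{n_r}^T \mathbf{A}_r^T \vec{v}_{\eta_{r,p}}\big) = 0,$$

since $\mathbf{A}_r \mathbf{1}_{n_r} = \mathbf{0}$ and $\mathbf{A}_t \mathbf{1}_{n_t} = \mathbf{0}$ by construction (because the antenna positions are taken with respect to the array centroid). This means that the parameters $\eta_{r,p}$ and $\eta_{t,p}$ are statistically uncoupled, i.e. orthogonal parameters [24]. Computing all couplings for $\boldsymbol{\theta}^{(p)}$ yields

$$\mathbf{I}^{(p,p)} = \frac{2\rho_p^2 \alpha^2}{\sigma^2} \begin{pmatrix} \frac{1}{\rho_p^2} & 0 & \mathbf{0}_{1 \times 2} & \mathbf{0}_{1 \times 2} \\ 0 & 1 & \mathbf{0}_{1 \times 2} & \mathbf{0}_{1 \times 2} \\ \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} & \mathbf{B}_r & \mathbf{0}_{2 \times 2} \\ \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 1} & \mathbf{0}_{2 \times 2} & \mathbf{B}_t \end{pmatrix}, \tag{7}$$

where

$$\mathbf{B}_x = \frac{1}{n_x} \begin{pmatrix} \|\mathbf{A}_x^T \vec{v}_{\eta_{x,p}}\|_2^2 & \vec{v}_{\eta_{x,p}}^T \mathbf{A}_x \mathbf{A}_x^T \vec{v}_{\psi_{x,p}} \\ \vec{v}_{\psi_{x,p}}^T \mathbf{A}_x \mathbf{A}_x^T \vec{v}_{\eta_{x,p}} & \|\mathbf{A}_x^T \vec{v}_{\psi_{x,p}}\|_2^2 \end{pmatrix}, \tag{8}$$

with $x \in \{r, t\}$. These expressions are thoroughly interpreted in section 4.

Global FIM. Taking into account couplings between all paths, the global FIM is easily deduced from the previous calculations and is block structured,

$$\mathbf{I}(\boldsymbol{\theta}) = \begin{pmatrix} \mathbf{I}^{(1,1)} & \mathbf{I}^{(1,2)} & \dots & \mathbf{I}^{(1,P)} \\ \mathbf{I}^{(2,1)} & \mathbf{I}^{(2,2)} & & \vdots \\ \vdots & & \ddots & \\ \mathbf{I}^{(P,1)} & \dots & & \mathbf{I}^{(P,P)} \end{pmatrix},$$

where $\mathbf{I}^{(p,q)} \in \mathbb{R}^{6 \times 6}$ contains the couplings between parameters of the $p$th and $q$th paths and is expressed $\mathbf{I}^{(p,q)} \triangleq \frac{2\alpha^2}{\sigma^2}\,\mathrm{Re}\Big\{\frac{\partial \mathbf{h}}{\partial \boldsymbol{\theta}^{(p)}}^H \mathbf{P}\, \frac{\partial \mathbf{h}}{\partial \boldsymbol{\theta}^{(q)}}\Big\}$. The off-diagonal blocks $\mathbf{I}^{(p,q)}$ of $\mathbf{I}(\boldsymbol{\theta})$, corresponding to couplings between parameters of distinct paths, or inter-path couplings, can be expressed explicitly (as in eq. (7) for intra-path couplings). However, the obtained expressions lend themselves less readily to interpretation, and inter-path couplings have been observed to be negligible in most cases. They are thus not displayed in the present paper, for the sake of brevity. Note that a similar FIM computation was recently carried out in the particular case of linear arrays [25]. However, the form of the FIM (in particular parameter orthogonality) was not exploited in [25], as is done here in sections 4 and 5.

3.3. Bound on the variance
The variance of channel estimators remains to be bounded, using eq. (4). From eq. (5), the FIM can be expressed more conveniently only with real matrices as $\mathbf{I}(\boldsymbol{\theta}) = \frac{2\alpha^2}{\sigma^2} \bar{\mathbf{D}}^T \bar{\mathbf{P}} \bar{\mathbf{D}}$, with

$$\bar{\mathbf{D}} \triangleq \begin{pmatrix} \mathrm{Re}\{\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\} \\ \mathrm{Im}\{\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\} \end{pmatrix}, \qquad \bar{\mathbf{P}} \triangleq \begin{pmatrix} \mathrm{Re}\{\mathbf{P}\} & -\mathrm{Im}\{\mathbf{P}\} \\ \mathrm{Im}\{\mathbf{P}\} & \mathrm{Re}\{\mathbf{P}\} \end{pmatrix},$$

where $\bar{\mathbf{P}}$ is also a projection matrix. Finally, injecting eq. (5) into eq. (4), assuming the FIM is invertible, gives for the relative variance

$$\begin{aligned} \mathrm{Tr}\big[\mathrm{cov}\big(\mathbf{h}(\hat{\boldsymbol{\theta}})\big)\big] \cdot \|\mathbf{h}\|_2^{-2} &\geq \frac{\sigma^2}{2\alpha^2}\, \mathrm{Tr}\big[\bar{\mathbf{D}} (\bar{\mathbf{D}}^T \bar{\mathbf{P}} \bar{\mathbf{D}})^{-1} \bar{\mathbf{D}}^T\big] \cdot \|\mathbf{h}\|_2^{-2} \\ &\geq \frac{\sigma^2}{2\alpha^2}\, \mathrm{Tr}\big[\bar{\mathbf{D}} (\bar{\mathbf{D}}^T \bar{\mathbf{D}})^{-1} \bar{\mathbf{D}}^T\big] \cdot \|\mathbf{h}\|_2^{-2} \\ &= \frac{\sigma^2}{2\alpha^2 \|\mathbf{h}\|_2^2}\, n_p = \frac{3P}{\mathrm{SNR}}, \end{aligned} \tag{9}$$

where the second inequality comes from the fact that $\bar{\mathbf{P}}$ being an orthogonal projection matrix, $\bar{\mathbf{P}} \leq \mathbf{Id} \Rightarrow \bar{\mathbf{D}}^T \bar{\mathbf{P}} \bar{\mathbf{D}} \leq \bar{\mathbf{D}}^T \bar{\mathbf{D}} \Rightarrow (\bar{\mathbf{D}}^T \bar{\mathbf{P}} \bar{\mathbf{D}})^{-1} \geq (\bar{\mathbf{D}}^T \bar{\mathbf{D}})^{-1} \Rightarrow \bar{\mathbf{D}} (\bar{\mathbf{D}}^T \bar{\mathbf{P}} \bar{\mathbf{D}})^{-1} \bar{\mathbf{D}}^T \geq \bar{\mathbf{D}} (\bar{\mathbf{D}}^T \bar{\mathbf{D}})^{-1} \bar{\mathbf{D}}^T$ (using elementary properties of the ordering of positive semidefinite matrices, in particular [26, Theorem 4.3]). The first equality comes from the fact that $\mathrm{Tr}\big[\bar{\mathbf{D}} (\bar{\mathbf{D}}^T \bar{\mathbf{D}})^{-1} \bar{\mathbf{D}}^T\big] = \mathrm{Tr}(\mathbf{Id}_{n_p}) = n_p$. Finally, the second equality is justified by $n_p = 6P$ considering the sparse channel model, and by taking $\mathrm{SNR} \triangleq \frac{\alpha^2 \|\mathbf{h}\|_2^2}{\sigma^2}$ (this is actually an optimal SNR, only attained with perfect precoding and combining).

Optimal bound. The first inequality in eq. (9) becomes an equality if an efficient estimator is used [7]. Moreover, the second inequality is an equality if the condition $\mathrm{im}\big(\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\big) \subset \mathrm{im}(\mathbf{P})$ is fulfilled (this corresponds to optimal observations, further discussed in section 4). Remarkably, under optimal observations, the lower bound on the relative variance is directly proportional to the considered number of paths $P$ and inversely proportional to the SNR, and does not depend on the specific model structure, since the influence of the derivative matrix $\bar{\mathbf{D}}$ cancels out in the derivation.

Sparse recovery CRB. It is interesting to notice that the bound obtained here is similar to the CRB for sparse recovery [27] (corresponding to an intrinsically discrete model), which is proportional to the sparsity of the estimated vector, analogous here to the number of paths.

4. INTERPRETATIONS
The main results of sections 3.2 and 3.3 are interpreted in this section, ultimately guiding the design of efficient estimation algorithms.

Parameterization choice. The particular expression of the FIM makes it possible to assess precisely the chosen parameterization. First of all, $\mathbf{I}(\boldsymbol{\theta})$ has to be invertible and well-conditioned, for the model to be theoretically and practically identifiable [28, 29], respectively. As a counterexample, if two paths indexed by $p$ and $q$ share the same DoD and DoA, then proportional columns appear in $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}$, which implies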
non-invertibility of the FIM. However, it is possible to summarize the effect of these two paths with a single virtual path of complex gain $c_p + c_q$ without any accuracy loss in channel description, yielding an invertible FIM. Similarly, two paths with very close DoD and DoA yield an ill-conditioned FIM (since the corresponding steering vectors are close to collinear), but can be merged into a single virtual path with a limited accuracy loss, improving the conditioning. Interestingly, in most channel models, paths are assumed to be grouped into clusters, in which all DoDs and DoAs are close to a principal direction [30, 31, 32]. Considering the MSE, merging close paths indeed decreases the variance term (lowering the total number of parameters), without significantly increasing the bias term (because their effects on the channel matrix are very correlated). These considerations suggest dissociating the number of paths considered in the model $P$ from the number of physical paths, denoted $P_\phi$, taking $P < P_\phi$ by merging paths. This is one motivation behind the famous virtual channel representation [9], where the resolution at which paths are merged is fixed and given by the number of antennas. The theoretical framework of this paper suggests setting $P$ (and thus the merging resolution) so as to minimize the MSE. A theoretical study of the bias term of the MSE (which should decrease when $P$ increases) could thus make it possible to calibrate models, choosing an optimal number of paths $P^*$ for estimation. Such a quest for $P^*$ is carried out empirically in section 5.

Optimal observations. The matrices $\mathbf{X}$ and $\mathbf{W}$ (pilot symbols and analog combiners) determine the quality of channel observation. Indeed, it was shown in section 3.3 that the lowest CRB is obtained when $\mathrm{im}\big(\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\big) \subset \mathrm{im}(\mathbf{P})$, with $\mathbf{P} = \frac{1}{\alpha^2} (\mathbf{X}^* \mathbf{X}^T) \otimes (\mathbf{W}(\mathbf{W}^H \mathbf{W})^{-1} \mathbf{W}^H)$. In the case of the sparse channel model, using the expressions for $\frac{\partial \mathbf{h}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}$ derived above, this is equivalent to two distinct conditions, one for the training matrix:

$$\mathrm{span}\Bigg(\bigcup_{p=1}^{P} \Big\{\mathbf{e}_t(\vec{u}_{t,p}),\ \frac{\partial \mathbf{e}_t(\vec{u}_{t,p})}{\partial \eta_{t,p}},\ \frac{\partial \mathbf{e}_t(\vec{u}_{t,p})}{\partial \psi_{t,p}}\Big\}\Bigg) \subset \mathrm{im}(\mathbf{X}),$$

and one for the analog combiners:

$$\mathrm{span}\Bigg(\bigcup_{p=1}^{P} \Big\{\mathbf{e}_r(\vec{u}_{r,p}),\ \frac{\partial \mathbf{e}_r(\vec{u}_{r,p})}{\partial \eta_{r,p}},\ \frac{\partial \mathbf{e}_r(\vec{u}_{r,p})}{\partial \psi_{r,p}}\Big\}\Bigg) \subset \mathrm{im}(\mathbf{W}),$$

where $\frac{\partial \mathbf{e}_x(\vec{u}_{x,p})}{\partial \xi_{x,p}} = \mathrm{diag}(-j \mathbf{A}_x^T \vec{v}_{\xi_{x,p}})\, \mathbf{e}_x(\vec{u}_{x,p})$ with $x \in \{r, t\}$ and $\xi \in \{\eta, \psi\}$. These conditions are fairly intuitive: to accurately estimate parameters corresponding to a given DoD (respectively DoA), the sent pilot sequence (respectively analog combiners) should span the corresponding steering vector and its derivatives (to “sense” small changes). To accurately estimate all the channel parameters, these conditions should be met for each atomic channel.

Array geometry. Under optimal observation conditions, performance limits on DoD/DoA estimation are given by eq. (8). The lower the diagonal entries of $\mathbf{B}_x^{-1}$, the better the bound. This implies the bound is better if the diagonal entries of $\mathbf{B}_x$ are large and the off-diagonal entries are small (in absolute value). Since the unit vectors $\vec{v}_{\eta_{x,p}}$ and $\vec{v}_{\psi_{x,p}}$ are by definition orthogonal, having $\mathbf{A}_x \mathbf{A}_x^T = \beta^2 \mathbf{Id}$ with maximal $\beta^2$ is optimal, and yields uniform performance limits for any DoD/DoA. Moreover, in this situation, $\beta^2$ is proportional to $\frac{1}{n_x} \sum_{i=1}^{n_x} \|\vec{a}_{x,i}\|_2^2$, the mean squared norm of antenna positions with respect to the array centroid. Having a larger antenna array is thus beneficial (as expected), because the further the antennas are from the array centroid, the larger $\beta^2$ is.

Orthogonality of DoA and DoD. Section 3.2 shows that the matrix corresponding to intra-path couplings (eq. (7)) is block diagonal, meaning that for a given path, parameters corresponding to gain, phase, DoD and DoA are mutually orthogonal. Maximum Likelihood (ML) estimators of orthogonal parameters are asymptotically independent [24] (when the number of observations, or equivalently the SNR, goes to infinity). Classically, channel estimation in massive MIMO systems is done using greedy sparse recovery algorithms [10, 11, 12]. Such algorithms can be cast into ML estimation with discretized directions, in which the DoD and DoA (coefficient support) are estimated jointly first (which is costly), and then the gain and phase are deduced (coefficient value), iteratively for each path. Orthogonality between the DoD and DoA parameters is thus not exploited by classical channel estimation methods. We propose here to exploit it via a sequential decoupled DoD/DoA estimation, which can be inserted in any sparse recovery algorithm in place of the support estimation step, without loss of optimality in the ML sense. In the proposed method, one direction (DoD or DoA) is estimated first using an ML criterion considering the other direction as a nuisance parameter, and the other one is deduced using the joint ML criterion. Such a strategy is presented in algorithm 1. It can be verified that lines 3 and 6 of the algorithm actually correspond to ML estimation of the DoA and joint ML estimation, respectively. The overall complexity of the sequential directions estimation is thus $O(m+n)$, compared to $O(mn)$ for the joint estimation with the same test directions. Note that a similar approach, in which DoAs for all paths are estimated at once first, was recently proposed [33] (without theoretical justification).

Algorithm 1 Sequential direction estimation (DoA first)
1: Choose $m$ DoAs to test: $\{\vec{u}_1, \dots, \vec{u}_m\}$
2: Build the matrix $\mathbf{K}_r = \Big(\frac{\mathbf{W}^H \mathbf{e}_r(\vec{u}_1)}{\|\mathbf{W}^H \mathbf{e}_r(\vec{u}_1)\|_2} \,\Big|\, \dots \,\Big|\, \frac{\mathbf{W}^H \mathbf{e}_r(\vec{u}_m)}{\|\mathbf{W}^H \mathbf{e}_r(\vec{u}_m)\|_2}\Big)$
3: Find the index $\hat{i}$ of the maximal entry of $\mathrm{diag}(\mathbf{K}_r^H \mathbf{Y} \mathbf{Y}^H \mathbf{K}_r)$, set $\hat{\vec{u}}_r \leftarrow \vec{u}_{\hat{i}}$ ($O(m)$ complexity)
4: Choose $n$ DoDs to test: $\{\vec{v}_1, \dots, \vec{v}_n\}$
5: Build the matrix $\mathbf{K}_t = \Big(\frac{\mathbf{X}^H \mathbf{e}_t(\vec{v}_1)}{\|\mathbf{X}^H \mathbf{e}_t(\vec{v}_1)\|_2} \,\Big|\, \dots \,\Big|\, \frac{\mathbf{X}^H \mathbf{e}_t(\vec{v}_n)}{\|\mathbf{X}^H \mathbf{e}_t(\vec{v}_n)\|_2}\Big)$
6: Find the index $\hat{j}$ of the maximal entry of $\mathbf{e}_r(\vec{u}_{\hat{i}})^H \mathbf{Y} \mathbf{K}_t$, set $\hat{\vec{u}}_t \leftarrow \vec{v}_{\hat{j}}$ ($O(n)$ complexity)

5. PRELIMINARY EXPERIMENT
Let us compare the proposed sequential direction estimation to the classical joint estimation. This experiment must be seen as an example illustrating the potential of the approach, and not as an extensive experimental validation.

Experimental settings. Consider synthetic channels generated using the NYUSIM channel simulator [34] (setting $f = 28$ GHz, the distance between transmitter and receiver to $d = 30$ m) to obtain the DoDs, DoAs, gains and phases of each path. The channel matrix is then obtained from eq. (1), considering square Uniform Planar Arrays (UPAs) with half-wavelength separated antennas, with $n_t = 64$ and $n_r = 16$. Optimal observations are considered, taking both $\mathbf{W}$ and $\mathbf{X}$ as the identity. Moreover, the noise variance $\sigma^2$ is set so as to get an SNR of 10 dB. Finally, the two aforementioned direction estimation strategies are inserted in the Matching Pursuit (MP) algorithm [10], discretizing the directions taking $m = n = 2{,}500$, and varying the total number $P$ of estimated paths.

Results. Table 1 shows the obtained relative MSE and estimation times (Python implementation on a laptop with an Intel(R) Core(TM) i7-3740QM CPU @ 2.70 GHz). First of all, for $P = 5, 10, 20$, the estimation error decreases and the estimation time increases with $P$, exhibiting a trade-off between accuracy and time. However, increasing $P$ beyond a certain point seems useless, since the error re-increases, as shown by the MSE for $P = 40$, echoing the trade-off
evoked in section 3.3, and indicating that $P^*$ is certainly between 20 and 40 for both methods in this setting. Finally, for any value of $P$, while the relative errors of the sequential and joint estimation methods are very similar, the estimation time is much lower (between ten and twenty times) for sequential estimation. This observation validates experimentally the theoretical claims made in the previous section.

            Joint estimation      Sequential estimation
            rMSE      Time        rMSE      Time
P = 5       0.077     1.24        0.092     0.11
P = 10      0.031     2.40        0.039     0.16
P = 20      0.017*    4.66        0.021     0.24
P = 40      0.025     9.50        0.023     0.42

Table 1. Relative MSE and estimation time (in seconds), averaged over 100 channel realizations, the lowest rMSE being marked with an asterisk.

6. CONCLUSIONS AND PERSPECTIVES
In this paper, the performance limits of massive MIMO channel estimation were studied. To this end, training based estimation with a physical channel model and a hybrid architecture was considered. The Fisher Information Matrix and the Cramér-Rao bound were derived, yielding several results. The CRB ended up being proportional to the number of parameters in the model and independent of the precise model structure. The FIM allowed several conclusions to be drawn regarding the observation matrices and the array geometries. Moreover, it suggested computationally efficient algorithms which are asymptotically as accurate as classical ones.

This paper is obviously only a first step toward a deep theoretical understanding of massive MIMO channel estimation. Apart from more extensive experimental evaluations and optimized algorithms, a theoretical study of the bias term of the MSE would be needed to calibrate models, and the interpretations of section 4 could be leveraged to guide system design.

Acknowledgments. The authors wish to thank Matthieu Crussière for the fruitful discussions that greatly helped improve this work.

7. REFERENCES
[1] Emre Telatar, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, 1999.
[2] David Tse and Pramod Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005.
[3] Theodore S Rappaport, Shu Sun, Rimma Mayzus, Hang Zhao, Yaniv Azar, Kevin Wang, George N Wong, Jocelyn K Schulz, Mathew Samimi, and Felix Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!,” IEEE Access, vol. 1, pp. 335–349, 2013.
[4] A Lee Swindlehurst, Ender Ayanoglu, Payam Heydari, and Filippo Capolino, “Millimeter-wave massive MIMO: the next wireless revolution?,” IEEE Communications Magazine, vol. 52, no. 9, pp. 56–62, 2014.
[5] Fredrik Rusek, Daniel Persson, Buon Kiong Lau, Erik G Larsson, Thomas L Marzetta, Ove Edfors, and Fredrik Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, 2013.
[6] Erik G Larsson, Ove Edfors, Fredrik Tufvesson, and Thomas L Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, 2014.
[7] Steven M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.
[8] Mehrzad Biguesh and Alex B Gershman, “Training-based MIMO channel estimation: a study of estimator tradeoffs and optimal training signals,” IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 884–893, 2006.
[9] Akbar M Sayeed, “Deconstructing multiantenna fading channels,” IEEE Transactions on Signal Processing, vol. 50, no. 10, pp. 2563–2579, 2002.
[10] S.G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec 1993.
[11] J.A. Tropp and A.C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, Dec 2007.
[12] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076, June 2010.
[13] Omar El Ayach, Sridhar Rajagopal, Shadi Abu-Surra, Zhouyue Pi, and Robert W Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Transactions on Wireless Communications, vol. 13, no. 3, pp. 1499–1513, 2014.
[14] Ahmed Alkhateeb, Omar El Ayach, Geert Leus, and Robert W Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 831–846, 2014.
[15] Robert W Heath, Nuria Gonzalez-Prelcic, Sundeep Rangan, Wonil Roh, and Akbar M Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, 2016.
[16] Akbar M. Sayeed and John H. Brady, Millimeter-Wave MIMO Transceivers: Theory, Design and Implementation, pp. 231–253, John Wiley & Sons, Ltd, 2016.
[17] Calyampudi Radhakrishna Rao, “Information and the accuracy attainable in the estimation of statistical parameters,” Bulletin of the Calcutta Mathematical Society, vol. 37, pp. 81–89, 1945.
[18] Harald Cramér, Mathematical Methods of Statistics, vol. 9, Princeton University Press, 1946.
[19] Adriaan Van den Bos, “A Cramér-Rao lower bound for complex parameters,” IEEE Transactions on Signal Processing [see also Acoustics, Speech, and Signal Processing, IEEE Transactions on], vol. 42, no. 10, 1994.
[20] David Slepian, “Estimation of signal parameters in the presence of noise,” Transactions of the IRE Professional Group on Information Theory, vol. 3, no. 3, pp. 68–89, 1954.
[21] G. W. Bangs, Array Processing With Generalized Beamformers, Ph.D. thesis, Yale University, CT, USA, 1971.
[22] Willi-Hans Steeb and Yorick Hardy, Matrix Calculus and Kronecker Product: A Practical Approach to Linear and Multilinear Algebra, World Scientific, 2011.
[23] Kaare Brandt Petersen, Michael Syskind Pedersen, et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, pp. 15, 2008.
[24] David Roxbee Cox and Nancy Reid, “Parameter orthogonality and approximate conditional inference,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 1–39, 1987.
[25] Nil Garcia, Henk Wymeersch, and Dirk Slock, “Optimal robust precoders for tracking the AoD and AoA of a mm-wave path,” arXiv preprint arXiv:1703.10978, 2017.
[26] Jerzy K Baksalary, Friedrich Pukelsheim, and George PH Styan, “Some properties of matrix partial orderings,” Linear Algebra and its Applications, vol. 119, pp. 57–85, 1989.
[27] Zvika Ben-Haim and Yonina C Eldar, “The Cramér-Rao bound for estimating a sparse parameter vector,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3384–3389, 2010.
[28] Thomas J Rothenberg, “Identification in parametric models,” Econometrica: Journal of the Econometric Society, pp. 577–591, 1971.
[29] Costas Kravaris, Juergen Hahn, and Yunfei Chu, “Advances and selected recent developments in state and parameter estimation,” Computers & Chemical Engineering, vol. 51, pp. 111–123, 2013.
[30] Adel AM Saleh and Reinaldo Valenzuela, “A statistical model for indoor multipath propagation,” IEEE Journal on Selected Areas in Communications, vol. 5, no. 2, pp. 128–137, 1987.
[31] Jon W Wallace and Michael A Jensen, “Modeling the indoor MIMO wireless channel,” IEEE Transactions on Antennas and Propagation, vol. 50, no. 5, pp. 591–599, 2002.
[32] Michael A Jensen and Jon W Wallace, “A review of antennas and propagation for MIMO wireless communications,” IEEE Transactions on Antennas and Propagation, vol. 52, no. 11, pp. 2810–2824, 2004.
[33] Hadi Noureddine and Qianrui Li, “A two-step compressed sensing based channel estimation solution for millimeter wave MIMO systems,” in Colloque GRETSI, 2017.
[34] Mathew K Samimi and Theodore S Rappaport, “3-D millimeter-wave statistical channel model for 5G wireless system design,” IEEE Transactions on Microwave Theory and Techniques, vol. 64, no. 7, pp. 2207–2225, 2016.
