Authorized licensed use limited to: Izmir Katip Celebi Univ. Downloaded on November 07,2022 at 09:12:37 UTC from IEEE Xplore. Restrictions apply.
FREDJ et al.: DISTRIBUTED BEAMFORMING TECHNIQUES FOR CELL-FREE WIRELESS NETWORKS USING DRL 1187
1188 IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, VOL. 8, NO. 2, JUNE 2022
TABLE I
Summary of Beamforming Schemes in Cell-Free Networks

TABLE II
Definitions of Major System Model Parameters, DRL Model Parameters, and Abbreviations

• For a fully centralized uplink cell-free network, we formulate the optimal beamforming problem in order to maximize the normalized sum-rate of the network. We then use a centralized learning approach to realize the uplink beamforming by using the deep deterministic policy gradient (DDPG) algorithm [33].
• We also propose a novel distributed experience-based beamforming system using the distributed distributional deep deterministic policy gradient (D4PG) algorithm [34], in which the APs act as the distributed agents.
• To reduce the complexity of centralized learning, we
propose a novel DRL-based beamforming scheme with
distributed learning.
• We evaluate the performance of the proposed beamforming designs numerically for different system settings, considering non-orthogonal pilot contamination and shadow fading.
• We compare the performance of the proposed DRL methods with that of the traditional conjugate and MMSE beamforming methods.
The rest of this paper is organized as follows. The system model and the beamforming problem are presented in Section II. Section III briefly reviews the preliminaries of DRL and DRL-based beamforming. We design a centralized DRL solution, a distributed experience-based DRL solution, and another beamforming solution based on distributed learning for the beamforming problem in Section IV. In Section V, we discuss the complexity of all of the DRL-based solutions. In Section VI, we present and discuss the numerical results. Section VII concludes the paper. Definitions of the major system model parameters, DRL model parameters, and the abbreviations used in the paper are given in Table II.
Notations: For a random variable (rv) X, F_X(x) and f_X(x), respectively, represent the cumulative distribution function (CDF) and the probability density function (PDF). Moreover, E[·] denotes the expectation. For a given matrix A ∈ C^{M×N}, A^H represents the Hermitian transpose of A. The PDF of a random variable X following the Nakagami-m distribution is given by

f_X(x) = \frac{2 m^m x^{2m-1}}{\Gamma(m)\,\Omega^m}\, e^{-\frac{m}{\Omega}x^2}.

A random variable X that follows the Gamma distribution is denoted by X ∼ G(α, β), with the PDF being

f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0,

where
β > 0, α ≥ 1, and Γ(z) is the Euler Gamma function. Moreover, the distributions OG(α_i, β_i), i = 1, . . . , N, refer to the i-th ascending-ordered rv from a set of N Gamma rvs with parameters α_i and β_i.

II. SYSTEM MODEL, ASSUMPTIONS, AND PROBLEM FORMULATION

A. Cell-Free Network Model

We consider the uplink of a wireless network with M single-antenna APs and K single-antenna UEs that have fixed locations within a certain coverage area, as shown in Fig. 1. Each AP has a baseband processor to partially process the signal received from all connected UEs. We refer to such an AP as an "enhanced AP" (eAP) to distinguish it from the conventional APs. The eAPs are connected via backhaul links, hence forming a cell-free network [2]. Such a network architecture enables the distributed eAPs to collaboratively serve all the UEs within the network coverage area. The beamforming optimization can be done either in an edge cloud processor (ECP) or in the eAPs in a distributed manner (Fig. 1). The eAPs are connected to the ECP in a star network topology. All of the eAPs use the same available spectrum to serve the same set of UEs. That is, the frequency reuse factor is 1 and all of the eAPs are allocated the same total channel bandwidth.

B. Channel Model

Between the k-th UE and the m-th eAP, the channel gain is a random variable

g_{mk} = F_{mk}^{1/2} h_{mk}. \quad (1)

In (1), h_{mk} is the small-scale channel fading gain that follows a Nakagami-M_{mk} distribution with shape and spreading parameters M_{mk} and Ω_{mk}, respectively. Therefore, |h_{mk}|^2 ∼ G(α_{mk}, β_{mk}), where α_{mk} = M_{mk} is the shape parameter and β_{mk} = M_{mk}/Ω_{mk} is the inverse-scale parameter. Moreover, we have

F_{mk} = L_{mk}^{-2\kappa}\, 10^{\frac{\sigma_{sh} z_{mk}}{10}}, \quad (2)

where L_{mk} = ||d_{mk}|| is the Euclidean distance between the k-th UE and the m-th eAP. Also, κ is the path-loss exponent (κ ≥ 2) and σ_{sh} is the shadow fading variance in dB. Furthermore, z_{mk} = \sqrt{\delta}\, a_{mk} + \sqrt{1-\delta}\, b_{mk}, where 0 ≤ δ ≤ 1 is the transmitter-receiver shadow fading correlation coefficient.

Lemma 1: Let \phi_k = [\phi_{k,1} \ldots \phi_{k,\tau_p}]^H, ||\phi_k||^2 = 1, be the pilot sequence with sample size τ_p that is assigned to the k-th UE. The MMSE estimation constant E_{mk} of g_{mk} is given by

E_{mk} = \frac{\sqrt{\tau_p \rho_k}\, F_{mk}\, \frac{\alpha_{mk}}{\beta_{mk}}}{\tau_p \sum_{l=1}^{K} \rho_l F_{ml} \frac{\alpha_{ml}}{\beta_{ml}}\, |\phi_k^H \phi_l|^2 + 1}. \quad (3)

Proof: See the Appendix.

In the above, we use the fact that E[|h_{mk}|^2] = α_{mk}/β_{mk}. If all of the UEs receive a set of mutually orthogonal pilot sequences (i.e., |\phi_k^H \phi_l| = 0, ∀ k ≠ l), the estimated small-scale channel fading in (A.2) reduces to a scaled version of the exact fading gain plus a relatively small AWGN. However, depending on the application and due to the limitations concerning the length of the training sequence, non-orthogonal pilot signals have to be used among some active UEs. To decrease the computational complexity at the ECP, one approach would be to estimate the CSI at the distributed eAPs, where the m-th eAP estimates the channel gains g_{mk}, ∀ k = 1, . . . , K.

D. Uplink Data Transmission

In a cell-free network, each eAP receives the composite signal from all UEs. For each UE, a weighted sum of the composite signals from all eAPs is constructed to maximize the signal component while minimizing the interference plus noise. This process takes place at the baseband level in the ECP, before forwarding the detected signal of each UE to its final destination. Formally, the overall signal received by the ECP to be used in detecting the k-th UE's data is given by (4), shown at the bottom of the next page, where w_{mk} is the m-th element of the beamforming vector related to the k-th UE such that 0 ≤ w_{mk} ≤ 1.

Moreover, p_k is the uplink transmission power of the k-th UE such that 0 ≤ p_k ≤ P_k, where P_k is the power budget of the k-th UE. Also, x_k is the transmitted symbol of the k-th UE such that E[|x_k|^2] = 1, and \tilde{\eta}_m is the AWGN at the m-th eAP with \tilde{\eta}_m ∼ CN(0, 1/2). The instantaneous signal-to-interference-plus-noise ratio (SINR) for the k-th UE is given by (5), shown at the bottom of the next page [36], where |\tilde{g}_{mi}|^2 ∼ G(\tilde{\alpha}_{mi}, \tilde{\beta}_{mi}) and \tilde{\alpha}_{mi} = M_{mi}, for i = k, l, v, u. In addition,

\tilde{\beta}_{mi} = \frac{M_{mi}\,\dot{\sigma}_{mi}}{\Omega_{mi} F_{mi} \tau_p \rho_i p_i E_{mi}^2}, \quad i = k, l.

Also,

\tilde{\beta}_{mv} = \frac{M_{mk}\,\dot{\sigma}_{mk}}{\Omega_{mk} F_{mv} \tau_p \rho_v p_k E_{mk}^2\, |\phi_k^H \phi_v|^2}.

Similarly,

\tilde{\beta}_{mu} = \frac{M_{mq}\,\dot{\sigma}_{mq}}{\Omega_{mq} F_{mu} \tau_p \rho_u p_q E_{mq}^2\, |\phi_q^H \phi_u|^2}.
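The large-scale and small-scale factors in (1)–(2) can be simulated directly. The sketch below is illustrative only: the drop geometry, the per-eAP/per-UE structure of the shadowing components a_{mk} and b_{mk} (whose exact definition falls outside this excerpt), and all numeric values are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 4, 3                              # number of eAPs and UEs (illustrative)
kappa, sigma_sh, delta = 2.0, 8.0, 0.5   # path-loss exponent, shadowing std (dB), correlation

# Hypothetical drop: eAPs and UEs placed uniformly in a 100 m x 100 m area
ap_xy = rng.uniform(0.0, 100.0, size=(M, 2))
ue_xy = rng.uniform(0.0, 100.0, size=(K, 2))
L = np.linalg.norm(ap_xy[:, None, :] - ue_xy[None, :, :], axis=2)  # L_mk = ||d_mk||

# Correlated shadowing z_mk = sqrt(delta)*a_mk + sqrt(1-delta)*b_mk, as in (2);
# a is assumed per-eAP and b per-UE here (an assumption made for illustration).
a = rng.standard_normal((M, 1))
b = rng.standard_normal((1, K))
z = np.sqrt(delta) * a + np.sqrt(1.0 - delta) * b

# Large-scale gain (2): F_mk = L_mk^(-2*kappa) * 10^(sigma_sh * z_mk / 10)
F = L ** (-2.0 * kappa) * 10.0 ** (sigma_sh * z / 10.0)

# Small-scale Nakagami-m fading: |h_mk|^2 ~ G(m, m/Omega), so draw a Gamma rv
# with shape m and scale Omega/m and take the square root.
m_nak, Omega = 2.0, 1.0
h = np.sqrt(rng.gamma(shape=m_nak, scale=Omega / m_nak, size=(M, K)))

g = np.sqrt(F) * h                       # channel gain (1): g_mk = F_mk^(1/2) h_mk
print(g.shape)                           # (4, 3)
```

Note that drawing |h_{mk}|^2 from G(M_{mk}, M_{mk}/Ω_{mk}) and taking the square root is exactly the Gamma relationship stated below (2).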
y_k = \sum_{m=1}^{M} w_{mk}\Bigg(\sum_{l=1}^{K}\hat{g}_{ml}\sqrt{p_l}\,x_l+\tilde{\eta}_m\Bigg) = \underbrace{\sqrt{\tau_p\rho_k p_k}\,x_k\sum_{m=1}^{M}w_{mk}E_{mk}g_{mk}}_{\text{Desired Signal}}

\quad + \sum_{m=1}^{M}w_{mk}\Bigg[\underbrace{\sum_{l=1,\,l\neq k}^{K}\sqrt{\tau_p\rho_l p_l}\,x_l E_{ml}g_{ml}}_{\text{Inter-UE Interference}} + \underbrace{\sum_{v=1,\,v\neq k}^{K}\sqrt{\tau_p\rho_v p_k}\,x_k E_{mk}|\phi_k^H\phi_v|\,g_{mv} + \sum_{q=1,\,q\neq k}^{K}\ \sum_{u=1,\,u\neq q}^{K}\sqrt{\tau_p\rho_u p_q}\,x_q E_{mq}|\phi_q^H\phi_u|\,g_{mu}}_{\text{non-orthogonal pilot-related estimation error}}\Bigg]

\quad + \sum_{m=1}^{M}w_{mk}\Bigg[\underbrace{\sqrt{p_k}\,E_{mk}x_k|\phi_k^H\eta_m| + \sum_{z=1,\,z\neq k}^{K}\sqrt{p_z}\,E_{mz}x_z|\phi_z^H\eta_m|}_{\text{AWGN-related estimation error}} + \underbrace{\tilde{\eta}_m}_{\text{AWGN}}\Bigg] \quad (4)
\gamma_k = \frac{\sum_{m=1}^{M} w_{mk}^2\, |\tilde{g}_{mk}|^2}{\sum_{m=1}^{M} w_{mk}^2 \Big(\sum_{l=1,\,l\neq k}^{K} |\tilde{g}_{ml}|^2 + \sum_{v=1,\,v\neq k}^{K} |\tilde{g}_{mv}|^2 + \sum_{q=1,\,q\neq k}^{K}\sum_{u=1,\,u\neq q}^{K} |\tilde{g}_{mu}|^2\Big) + 1} \quad (5)

\gamma_k = \frac{\sum_{m=1}^{M} w_{mk}^2\, |\check{g}_{mk}|^2}{\sum_{m=1}^{M} w_{mk}^2 \Big(\sum_{l=k+1}^{K} |\check{g}_{ml}|^2 + \sum_{v=1,\,v\neq k}^{K} |\check{g}_{mv}|^2 + \sum_{q=1,\,q\neq k}^{K}\sum_{u=1,\,u\neq q}^{K} |\check{g}_{mu}|^2\Big) + 1} \quad (7)
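The SINR in (5) can be transcribed directly into code. The sketch below is illustrative: the distinct families of estimated gains (the desired/inter-UE gains and the two pilot-contamination families) are passed as separate arrays, and the unit-variance noise gives the trailing "+1".

```python
import numpy as np

def sinr(w, g_sig, g_int, g_pc1, g_pc2, k):
    """Instantaneous SINR of the k-th UE, transcribing (5).

    w:      (M, K) beamforming weights w_mk.
    g_sig:  (M, K) estimated gains; column k gives the desired-signal terms.
    g_int:  (M, K) gains for the inter-UE interference sum over l != k.
    g_pc1:  (M, K) gains for the first pilot-error sum over v != k.
    g_pc2:  (M, K, K) gains g~_mu for the double sum over q != k, u != q.
    """
    wk2 = w[:, k] ** 2
    num = np.sum(wk2 * np.abs(g_sig[:, k]) ** 2)
    mask = np.arange(w.shape[1]) != k
    inter = np.sum(np.abs(g_int[:, mask]) ** 2, axis=1)   # sum over l != k
    pc1 = np.sum(np.abs(g_pc1[:, mask]) ** 2, axis=1)     # sum over v != k
    K = w.shape[1]
    pc2 = sum(np.abs(g_pc2[:, q, u]) ** 2                 # double sum q != k, u != q
              for q in range(K) if q != k
              for u in range(K) if u != q)
    return float(num / (np.sum(wk2 * (inter + pc1 + pc2)) + 1.0))

# Example: M = K = 2 with all-one weights and unit gains
val = sinr(np.ones((2, 2)), np.ones((2, 2)), np.ones((2, 2)),
           np.ones((2, 2)), np.ones((2, 2, 2)), k=0)
print(val)  # 2/7 ≈ 0.2857
```

The post-SIC SINR (7) differs only in restricting the first interference sum to l = k+1, . . . , K.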
The beamforming problem can then be formulated as

P_1 : \underset{\mathbf{W}\in[0,1]^{M\times K}}{\text{maximize}}\ \sum_{k=1}^{K}\log_2(1+\gamma_k)

\text{s.t.}\ C_1:\ \sum_{m=1}^{M}\Big(w_{m\delta_l}^2-\sum_{i=\delta_l+1}^{l}w_{mi}^2\Big)\bar{\gamma}_{ml}\ \geq\ \bar{P}_s,

C_2:\ w_{mk}\in[0,1],\ \forall\, m, k,

\forall\, k,\ \forall\, \delta_l=1,\ldots,l-1,\ \text{and}\ l=2,\ldots,K, \quad (9)

where \bar{\gamma}_{ml} = p_l |g_{ml}|^2, and W ∈ [0,1]^{M×K} is the overall beamforming matrix in which w_k = [w_{1k}, . . . , w_{Mk}]. Note that in (9), the objective is a function of γ_k, which is a function of w_k. The constraint C_1 corresponds to the \sum_{l=2}^{K}(l-1) = \frac{K(K-1)}{2} conditions of successful SIC operation with a receiver sensitivity of \bar{P}_s. Note that, for maximizing the minimum transmission rate (max-min fairness), the objective function of the optimization problem will be \min_{k=1,\ldots,K}\log_2(1+\gamma_k). The receiver deals only with the measured (estimated) channel values that include the estimation error and the AWGN component. The SINR value after the SIC operation decreases due to pilot contamination.

The globally optimal solution of problem P_1 in (9) is the one that gives the best performance among all possible matrices W. However, P_1 is a non-convex problem due to the non-convexity of the objective function. Accordingly, the globally optimal solution of P_1 in (9) can only be obtained by an exhaustive search. The computational complexity of such an exhaustive search is combinatorial in terms of the number of UEs and eAPs, and therefore, it is not feasible for practical implementation. Iterative optimization methods can be used, for example, at the ECP of the cell-free system, and they can converge to locally optimal solutions after a finite number of iterations. However, this requires all the CSI to be transmitted to the ECP, resulting in increased network overhead as well as high processing complexity. DRL-based solutions based on distributed learning can reduce the network overhead as well as the processing complexity at the ECP. More specifically, centralized learning with distributed agents' experience can improve the quality of the solution, while a completely distributed learning-based scheme significantly reduces the signal processing complexity at the ECP and the network overhead. In the later sections of this paper, we develop three DRL-based solutions for the formulated problem, namely, the fully centralized training and learning-based method, the centralized learning-based method with distributed training, and the fully distributed training and learning-based method. The centralized learning-based method can serve as a benchmark for the other two methods.

III. BACKGROUND ON DRL AND BEAMFORMING OPTIMIZATION

A. DRL Preliminaries

DRL is a solution method for Markov decision processes (MDPs). An MDP is characterized by a tuple (S, A, P, R, ζ), where S and A represent the state space and the agent's action space, respectively. Moreover, P is the transition probability matrix, where P(s, a, s′) ∈ [0, 1] is the probability that state s changes to state s′ by selecting action a. R : S × A → R defines the expected reward of performing action a at state s. Finally, ζ is the reward's discount factor. The goal is to select the best action at each step so as to maximize the accumulated discounted reward. In a DRL model, a neural network learns to map states to values, or state-action pairs to Q-values, using the historical outcomes. DRL algorithms fall into three categories: (i) value-based methods that aim to learn a value function, such as the deep Q-learning (DQN) algorithm [38]; (ii) policy-based methods that learn the optimal policy function; and (iii) actor-critic methods that combine value-based and policy-based methods.

In this paper, we use two actor-critic algorithms, namely, DDPG [33] and D4PG [34]. These methods are suitable for optimizing beamforming, since in both DDPG and D4PG the action space can be continuous. Moreover, for D4PG, the exploration can be distributed among multiple agents. The DDPG algorithm uses a state-action Q-value critic based on deep Q-learning [38] and updates the policy using its critic gradients. D4PG builds on the DDPG approach by making several enhancements, such as the distributional Q-value estimation and the distributed collection of experiences. Note that an experience is the process of exploring a new action by executing it in the environment or simulating it, thereby receiving some reward and observing the new state.

B. DRL Agent for Beamforming Optimization

For a cell-free network with M APs and K UEs (Fig. 1), we develop a DRL model to optimize the beamforming matrix W. This matrix includes the beamforming vectors of all UEs within the network coverage area, given the complete CSI. Note that, since the beamforming vector of the k-th UE is given by w_k = [w_{1k} · · · w_{Mk}], the matrix W has a dimension of M × K.

We cast the beamforming optimization problem as an MDP. We then train a DRL model where an agent learns by interacting with a cell-free network as the environment. Such a DRL model can be implemented either centrally or in a distributed manner. The design of the environment for a cell-free network includes the definition of the state s, the action a, and the immediate reward function r, needed for the DRL algorithm to estimate the policy and the Q-values. To improve the training process, new actions need to be explored by the agent. Therefore, we add a noise generated from a random process to each action taken in the training phase. Although an Ornstein-Uhlenbeck (OU) process was originally proposed for exploration, later results showed that an uncorrelated, zero-mean Gaussian noise gives the same performance [38]. Therefore, due to its simplicity of implementation, the Gaussian noise process is preferred to the OU process for exploration. The state can be any key performance indicator. While the action of this model is the optimization variable W (the beamforming matrix) of the problem P_1, the reward can be any performance metric that jointly quantifies the performance of all active UEs. In Table III, we summarize the design parameters of the DRL model and the corresponding measures in the cell-free network.
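The sum-rate objective of P_1 and the max-min variant mentioned above are one-liners in code; a minimal, self-contained sketch (the γ values are illustrative):

```python
import numpy as np

def sum_rate(gammas):
    """Normalized sum-rate objective of P1: sum_k log2(1 + gamma_k)."""
    return float(np.sum(np.log2(1.0 + np.asarray(gammas))))

def min_rate(gammas):
    """Max-min fairness objective: min_k log2(1 + gamma_k)."""
    return float(np.min(np.log2(1.0 + np.asarray(gammas))))

print(sum_rate([1.0, 3.0]))  # log2(2) + log2(4) = 3.0
print(min_rate([1.0, 3.0]))  # log2(2) = 1.0
```

Both functions serve as candidate reward signals for the DRL agent described next; the paper's reward uses the sum-rate form.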
TABLE III
Design Parameters for the DRL Model

IV. DRL-BASED CENTRALIZED AND DISTRIBUTED BEAMFORMING METHODS

A. The DDPG Algorithm: A Fully Centralized Solution

In this section, we propose a DRL-based centralized solution for the beamforming problem in (9) using the DDPG learning algorithm. This solution serves as a benchmark for the beamforming techniques in the subsequent sections. The actor-critic algorithm in DDPG can handle a continuous state space and a continuous action space. Since the elements w_{ij} of W are continuous in the range [0, 1], we use DDPG to find the optimal beamforming matrix W ∈ [0, 1]^{M×K}. It uses two neural networks as function estimators: (i) the critic, Q(s, a), whose parameters are θ^Q, which calculates the expected return given state s and action a; and (ii) the actor, μ(s), whose parameters are θ^μ, which determines the policy. In DDPG, the actor directly maps states to actions instead of outputting a probability distribution across a discrete action space. The starting point for learning an estimator of Q*(s, a) is the Bellman equation given by

Q^*(s,a) = \mathbb{E}_{(s,a,r,s')\in R}\Big[r(s,a) + \zeta \max_{a'} Q^*(s',a')\Big], \quad (10)

where R is the set of experiences and ζ is the discount factor.

Computing the maximum over actions in the target is quite challenging in continuous action spaces. In particular, the training of the Q-value function involves minimizing the loss function L(θ^Q) in (11). However, the target value y_t that we want our Q-value to be close to depends on the weights of the critic network Q. This dependency leads to an instability in the learning of the critic network Q, which affects the learning of the policy μ:

L(\theta^Q) = \mathbb{E}_{s_t\sim\rho^\mu,\, a_t\sim\beta,\, r_t\sim E}\Big[\big(Q(s_t,a_t|\theta^Q) - y_t\big)^2\Big],

y_t = r(s_t,a_t) + \zeta\, Q\big(s_{t+1}, \mu(s_{t+1})\,\big|\,\theta^Q\big). \quad (11)

To mitigate this instability, DDPG [33] uses two target networks, namely, a critic target network Q′(s, a) and a policy target network μ′(s). These networks are lagged versions of the actual critic and actor networks, with parameters θ^{Q′} and θ^{μ′} updated by Polyak averaging with factor τ, as shown in (17) and (18) of Algorithm 1; updating them at each iteration increases the chance of convergence of the algorithm. Therefore, in Algorithm 1, the critic network minimizes the loss function presented in (14). In policy learning, DDPG learns the θ^μ that maximizes Q. At every round, it maximizes the expected return

J(\theta^\mu) = \mathbb{E}\big[Q(s,a)\ \big|\ s=s_t,\, a=a_t\big], \quad (12)

and updates the weights θ^μ by following the gradient of (12):

\nabla_{\theta^\mu} J \approx \nabla_a Q(s,a)\cdot\nabla_{\theta^\mu}\mu(s|\theta^\mu). \quad (13)

This update rule represents the Deterministic Policy Gradient theorem [39]. The term \nabla_a Q(s,a) is obtained from the backpropagation of the Q-network Q(s, a|θ^Q) w.r.t. the action input μ(s|θ^μ). Algorithm 1 summarizes the DDPG learning process with the settings presented in Table III (i.e., the states s_t, actions a_t, and rewards r_t are defined as the UE SINRs {γ_1, . . . , γ_K}, the beamforming matrix W, and the sum-rate \sum_{k=1}^{K}\log_2(1+\gamma_k), respectively, at the t-th learning step).

Algorithm 1 DDPG Learning Process [33] (Implemented in the ECP)
1: Randomly initialize the actor network μ(s|θ^μ) and the critic network Q(s, a|θ^Q) with weights θ^μ, θ^Q.
2: Initialize the target networks Q′ and μ′ with weights θ^{Q′} ← θ^Q and θ^{μ′} ← θ^μ.
3: Initialize the replay buffer R.
4: for episode = 1 to max-number-episodes do
5:   Initialize a random process N for action exploration.
6:   Observe the initial state s_1.
7:   for t = 1 to max-episode-steps do
8:     Perform action a_t = μ(s_t) + N_t. Observe reward r_t and the next state s_{t+1}.
9:     Store the transition (s_t, a_t, r_t, s_{t+1}) in R.
10:    Let N be the batch size. Sample a random mini-batch of N transitions from R.
11:    Update the critic by minimizing the loss
       L = \frac{1}{N}\sum_j \big(y_j - Q(s_j, a_j|\theta^Q)\big)^2, \quad (14)
       y_j = r_j + \zeta\, Q'\big(s_{j+1}, \mu'(s_{j+1}|\theta^{\mu'})\,\big|\,\theta^{Q'}\big). \quad (15)
12:    Update the actor policy using sampled policy gradient ascent as
       \nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_j \nabla_a Q(s,a|\theta^Q)\big|_{s=s_j,\, a=\mu(s_j)} \times \nabla_{\theta^\mu}\mu(s|\theta^\mu)\big|_{s=s_j}. \quad (16)
13:    Update the target networks as
       \theta^{Q'} \leftarrow \tau\theta^Q + (1-\tau)\theta^{Q'}, \quad (17)
       \theta^{\mu'} \leftarrow \tau\theta^\mu + (1-\tau)\theta^{\mu'}. \quad (18)
14:   end for
15: end for
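Steps 11 and 13 of Algorithm 1 can be illustrated numerically by treating the networks as plain parameter vectors; only the TD target (15) and the Polyak updates (17)–(18) are sketched here, with hypothetical arrays standing in for the actual actor/critic networks.

```python
import numpy as np

tau, zeta = 0.005, 0.99   # Polyak factor and discount (illustrative values)

def polyak(target, online, tau):
    """Target-network update (17)-(18): theta' <- tau*theta + (1 - tau)*theta'."""
    return tau * online + (1.0 - tau) * target

def td_targets(r, q_next, zeta):
    """Critic targets (15): y_j = r_j + zeta * Q'(s_{j+1}, mu'(s_{j+1}))."""
    return r + zeta * q_next

# Hypothetical critic parameters and a zero-initialized target copy
theta_q = np.array([1.0, -2.0])
theta_q_targ = np.zeros(2)
theta_q_targ = polyak(theta_q_targ, theta_q, tau)   # moves a small step toward theta_q
print(theta_q_targ)

# Targets for a mini-batch of two transitions with rewards r and target-critic values
y = td_targets(np.array([1.0, 0.5]), np.array([2.0, 4.0]), zeta)
print(y)
```

Because τ is small, the target parameters track the online parameters slowly, which is exactly what stabilizes the moving target y_j.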
B. D4PG-Based Beamforming With Distributed Agents' Experience

In the previous section, we discussed the fully centralized DDPG-based beamforming scheme as a replacement for conventional centralized optimization at the ECP. In this section, we take a step toward distributed beamforming in cell-free networks. We propose a DRL-based beamforming scheme which exploits the experiences of distributed actors (e.g., the eAPs) that belong to a centrally located agent (e.g., the ECP). This scheme is based on the distributed distributional deep deterministic policy gradient (D4PG) algorithm [34]. Generally, the D4PG algorithm applies a set of improvements to DDPG and makes it run in a distributional fashion. These improvements enable D4PG to outperform DDPG [40]. This is achieved by having multiple actor neural networks gather independent experiences and feed them into a single replay buffer. However, it contains a single critic network that samples the independently gathered experiences from the replay buffer to explore a new Q-value. Traditionally, the independent actors, the replay buffer, and the critic of D4PG are implemented within the same operation area/module. In this paper, we propose an implementation of D4PG that sets the number of independent actors to M (equal to the number of eAPs), places an actor at each eAP, and places the replay buffer and the critic network at the ECP. As illustrated in Fig. 2, the actor networks implemented in the eAPs generate independent experiences that are sent to the ECP to be pushed into the replay buffer, which combines all experiences explored by the actor networks. In Algorithm 3, we describe the exploration technique used in the actor network. This approach improves the quality of the training data since the eAPs simultaneously generate multiple experiences with different exploration processes.

The design parameters of the DRL model for D4PG are represented in Table III; therefore, the states s_t, actions a_t, and rewards r_t are defined as the UE SINRs {γ_1, . . . , γ_K}, the beamforming matrix W, and the sum-rate \sum_{k=1}^{K}\log_2(1+\gamma_k), respectively, at the t-th learning step. D4PG is an actor-critic method that enhances the DDPG algorithm to perform in a distributed manner, thereby improving the estimation of the Q-values. Below, we discuss these enhancements to the D4PG algorithm, which we explicitly describe in Algorithm 2. The main features of the D4PG algorithm are described below.

1) Distributional Critic: In D4PG, the Q-value is a random variable following some distribution Z with parameters θ^Z, thus Q_{θ^Z}(s, a) = E[Z(s, a|θ^Z)]. The objective function for learning the distribution minimizes some measure of the distance between the distribution estimated by the target critic network and that of the critic network, e.g., the binary cross-entropy loss. Formally,

L(\theta^Z) = \mathbb{E}\big[d\big(T_{\mu'} Z(s,a|\theta^{Z'}),\ Z(s,a|\theta^Z)\big)\big]. \quad (19)
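The distance d(·, ·) in (19) can be instantiated, for example, as a cross-entropy between two categorical value distributions defined over a fixed set of Q-value atoms; the atom count and probabilities below are illustrative, not taken from the paper.

```python
import numpy as np

def categorical_cross_entropy(p_target, p_critic, eps=1e-12):
    """One concrete choice for d(.,.) in (19): cross-entropy between the
    target-critic and critic distributions over a shared set of Q-value atoms."""
    p_target = np.asarray(p_target)
    p_critic = np.asarray(p_critic)
    return float(-np.sum(p_target * np.log(p_critic + eps)))

# Illustrative 5-atom value distributions (hypothetical numbers)
p_tgt = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p_crt = np.array([0.2, 0.2, 0.3, 0.2, 0.1])
loss = categorical_cross_entropy(p_tgt, p_crt)
print(loss)
```

By Gibbs' inequality this loss is minimized (down to the entropy of the target) when the critic distribution matches the target distribution, which is the behavior the distributional critic objective relies on.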
In (19), T_{μ′} is the Bellman operator. The deterministic policy gradient update yields

\nabla_{\theta^\mu} J(\theta^\mu) \approx \mathbb{E}_\rho\big[\nabla_{\theta^\mu}\mu(s|\theta^\mu)\,\nabla_a Q(s,a|\theta^Z)\big|_{a=\mu(s|\theta^\mu)}\big] = \mathbb{E}_\rho\big[\nabla_{\theta^\mu}\mu(s|\theta^\mu)\,\mathbb{E}\big[\nabla_a Z(s,a|\theta^Z)\big]\big|_{a=\mu(s|\theta^\mu)}\big]. \quad (20)

2) N-Step Returns: An agent in D4PG computes the N-step temporal difference (TD) target instead. Formally,

(T_\mu^N Q)(s_0,a_0) = r(s_0,a_0) + \mathbb{E}\Big[\sum_{n=1}^{N-1}\zeta^n r(s_n,a_n) + \zeta^N Q\big(s_N, \mu_\theta(s_N)\big)\ \Big|\ s_0, a_0\Big]. \quad (21)

3) Multiple Distributed Parallel Actors: This process takes place in parallel within K actors, each one generating samples independently. The samples are collected in a replay buffer from which a learner samples batches to update the weights of the networks.

4) Prioritized Experience Replay (PER): Finally, D4PG collects R samples from the replay buffer with non-uniform probability p_i. The i-th sample is selected with priority (R p_i)^{-1}, which also indicates the importance of the sample.

Algorithm 2 D4PG-Based Beamforming With Distributed Agents' Experience
1: Input: batch size P, trajectory length N, number of actors K, replay size R, initial learning rates α_0 and β_0, period t_target for updating the target networks, period t_actor for updating the actors' policy.
2: Initialize the actor and critic networks (θ^μ, θ^Z) at random.
3: Initialize the target weights (θ^{μ′}, θ^{Z′}) ← (θ^μ, θ^Z).
4: Launch K actors and replicate their weights (θ^μ, θ^Z).
5: for t = 1, . . . , T do
6:   Sample P transitions (s_{i:i+N}, a_{i:i+N}, r_{i:i+N}) of length N from the replay buffer with priority p_i.
7:   Construct the target distributions Y_i = \sum_{n=0}^{N-1}\zeta^n r_{i+n} + \zeta^N Z'\big(s_{i+N}, \mu'(s_{i+N}|\theta^{\mu'})\,\big|\,\theta^{Z'}\big).
8:   Compute the actor and critic updates
     \delta_{\theta^Z} = \frac{1}{P}\sum_i \nabla_{\theta^Z}(R p_i)^{-1} d\big(Y_i, Z(s_i,a_i|\theta^Z)\big),
     \delta_{\theta^\mu} = \frac{1}{P}\sum_i \nabla_{\theta^\mu}\mu(s_i|\theta^\mu)\,\mathbb{E}\big[\nabla_a Z(s_i,a|\theta^Z)\big]\big|_{a=\mu(s_i|\theta^\mu)}.
9:   Update the network parameters θ^μ ← θ^μ + α_t δ_{θ^μ}, θ^Z ← θ^Z + β_t δ_{θ^Z}.
10:  If t mod t_target = 0, update the target networks (θ^{μ′}, θ^{Z′}) ← (θ^μ, θ^Z).
11:  If t mod t_actor = 0, replicate the network weights to the actors.
12: end for
13: Return the policy parameters θ^μ.

Algorithm 3 Actor
1: repeat
2:   Set t = 0.
3:   Initialize the random process N at each actor.
4:   At step t, select action a_t = μ(s_t|θ^μ) + N_t. Receive reward r_t and observe state s_t′.
5:   Send the experience (s_t, a_t, r_t, s_t′) to the ECP to be stored in the replay buffer.
6: until the learner finishes, i.e., Algorithm 2 stops.

C. DRL-Based Beamforming With Distributed Learning

In the previous section, we proposed a D4PG learning method to train a policy that predicts the beamforming matrix given a cell-free network environment. We enhanced the learning performance by allocating an agent per eAP and then utilizing the distributed experience collected from several agents. The enhanced method converges faster and shows better performance compared to the fully centralized DDPG solution. However, the proposed D4PG method still conducts the learning process at the ECP, which involves a large body of the computational tasks.

In this section, we propose a cell-free beamforming scheme which, in addition to distributing the agents' experiences, splits the learning process of the DRL among all eAPs. In such a model, the eAPs divide the computational tasks equally and the ECP only performs limited control and coordination tasks. In this scheme, every eAP is responsible for finding the optimal beamforming vector for all UEs. To this end, the other vectors of the overall beamforming matrix are considered to be constants, to be simultaneously found by the other eAPs. As an example, for a cell-free network with M eAPs and K UEs, eAP_m is responsible for optimizing ω_m = W(m, [1, . . . , K]). Thus, eAP_m solves the following subproblem:

P_m : \underset{\omega_m\in[0,1]^{1\times K}}{\text{maximize}}\ \sum_{k=1}^{K}\log_2(1+\gamma_k)

\text{s.t.}\ C_1:\ \sum_{m=1}^{M}\Big(w_{m\delta_l}^2-\sum_{i=\delta_l+1}^{l}w_{mi}^2\Big)\bar{\gamma}_{ml}\ \geq\ \bar{P}_s,

C_2:\ w_{mk}\in[0,1],

\forall\, k,\ \forall\, \delta_l=1,\ldots,l-1,\ \text{and}\ l=2,\ldots,K. \quad (22)

In the following, we present a new system design for the cell-free network environment where the eAPs interact with each other to find the optimal beamforming vector. We intend to solve the optimization problem in (9) through multiple eAPs solving the problem in (22). Therefore, as shown in Fig. 3, we distribute the learning process among the eAPs by implementing a DDPG agent in each eAP with its local experience buffer, actor network, and critic network.

The design parameters of the DRL model are slightly changed compared to Table III. The states s_t and rewards r_t are the UE SINRs {γ_1, . . . , γ_K} and the sum-rate \sum_{k=1}^{K}\log_2(1+\gamma_k) (as before), but the actions a_t are changed to ω_m, since the m-th eAP learns to optimize the m-th row of the beamforming matrix W.
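The N-step target in (21) (and the construction of Y_i in step 7 of Algorithm 2, with the bootstrap term replaced here by a scalar Q-value rather than a distribution) can be sketched as:

```python
import numpy as np

def n_step_target(rewards, bootstrap_q, zeta):
    """N-step TD target of (21): sum_{n=0}^{N-1} zeta^n r_n + zeta^N Q(s_N, mu(s_N)).

    rewards:     the N rewards r_0, ..., r_{N-1} along the sampled trajectory.
    bootstrap_q: the (target-network) Q-value at the trajectory's final state.
    """
    N = len(rewards)
    discounts = zeta ** np.arange(N)
    return float(np.dot(discounts, rewards) + zeta ** N * bootstrap_q)

# Example: N = 3 unit rewards, bootstrap value 8, zeta = 0.5
print(n_step_target([1.0, 1.0, 1.0], bootstrap_q=8.0, zeta=0.5))
# 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75
```

Compared with the one-step target (15), the N-step form propagates reward information over longer horizons per update, which is one of the D4PG enhancements listed above.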
Beamforming is then optimized as follows: the first step is the DDPG process, which includes (i) generating experiences and (ii) updating the network parameters using the Bellman equation (10) and the policy gradient in (13). Each agent learns to optimize one row of the beamforming matrix. Here, the ECP, which is connected to all the eAPs through backhaul links, acts as the coordinator to facilitate sharing the optimized rows with the other agents. At each episode, the eAPs discover multiple actions. They select the action with the highest reward in terms of the maximum normalized sum-rate during the episode and send it to the coordinator. The coordinator receives all the rows from the eAPs and then concatenates them to create a new beamforming matrix. Finally, it broadcasts the new matrix to all eAPs. The number of updates of the beamforming matrix, i.e., the horizon, can be selected based on the required accuracy. Algorithm 4 summarizes the learning process in each eAP, and Algorithm 5 summarizes the coordination process among the eAPs by the ECP.

Algorithm 4 Local DDPG (Implemented in Each eAP)
1: Initiate the cell-free network environment with a random beamforming matrix W_local.
2: Initialize the actor network μ(·) and the critic network Q(·) with parameters θ^μ and θ^Q, respectively.
3: Initiate the random process N.
4: Initiate r_max = 0.
5: for each episode do
6:   Observe the initial state s from the cell-free network environment.
7:   for each step do
8:     Apply a = μ(s) + N_t on the cell-free network environment and observe the new state s′ and reward r.
9:     Store (s, a, r, s′) in the replay buffer R.
10:    Update the θ^μ and θ^Q weights with a batch from R using the Bellman equation (10) and the policy gradient (13).
11:    if r ≥ r_max then
12:      a_optimal = a.
13:      r_max = r.
14:    end if
15:   end for
16:   if it is the update time of the beamforming matrix then
17:     Send a_optimal to the coordinator.
18:     Receive W_new from the coordinator.
19:     W_local = W_new.
20:   end if
21: end for

V. COMPLEXITY ANALYSIS AND COMMUNICATION OVERHEAD

As we have mentioned before, the beamforming optimization problem (9) is non-convex. Conventional approaches to solve a non-convex problem include iterative algorithms such as exhaustive search, steepest descent, gradient ascent, and interior-point methods.¹ Considering an exhaustive search, we quantize the beamforming vector of the k-th UE, i.e., ω_k = [w_{1k} . . . w_{Mk}], by a certain step size Δ. For the beamforming optimization problem, this results in a complexity of O((1/Δ)^{M+K}). The combinatorial complexity renders the conventional beamforming techniques impractical, in particular, for dense cell-free networks. This calls for novel methods, such as DRL-based solutions, that retain low complexity while guaranteeing efficiency.

¹Note that the objective function in (9) is twice differentiable.

To evaluate the time complexity of a DNN, the conventional measure is the number of floating-point operations (FLOPs). For any fully connected layer L_i of input size I_i and output size O_i, the number of FLOPs is given by

\text{FLOPs}(L_i) = 2\, I_i\, O_i. \quad (23)
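As a concrete instance of (23), the FLOP count of a small fully connected network is just the sum of 2·I_i·O_i over its layers; the layer widths below are illustrative, not the paper's network sizes.

```python
def layer_flops(I, O):
    """FLOPs of a fully connected layer per (23): one multiply and one add
    per input-output weight pair, i.e., 2 * I_i * O_i."""
    return 2 * I * O

def mlp_flops(sizes):
    """Total FLOPs of an MLP with the given layer widths (illustrative helper)."""
    return sum(layer_flops(i, o) for i, o in zip(sizes[:-1], sizes[1:]))

# Example: a network with widths 10 -> 64 -> 64 -> 5
print(mlp_flops([10, 64, 64, 5]))  # 2*(10*64 + 64*64 + 64*5) = 10112
```

Summing (23) over the actor and critic layers in this way gives the per-inference cost entries of the kind reported in Table IV.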
TABLE IV
Complexity Analysis

TABLE V
Simulation Parameters

Algorithm 5 Coordinator (Implemented in the ECP)
1: repeat
2:   W_new = 0.
3:   repeat
4:     if a_optimal is received from the i-th eAP then
5:       W_new(i, :) = a_optimal.
6:     end if
7:   until a_optimal is received from every eAP.
8:   Broadcast W_new to all agents.
9: until all agents stop training.
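The row-assembly step of Algorithm 5 (collect one optimal 1×K row from each eAP, stack them into W_new, and broadcast the result) can be sketched as follows; the names and sizes are hypothetical.

```python
import numpy as np

def assemble(rows):
    """Coordinator step of Algorithm 5: stack the per-eAP optimal rows
    a_optimal (one 1 x K beamforming vector per eAP) into W_new."""
    return np.vstack(rows)

M, K = 3, 4
# Hypothetical optimal rows reported by the M eAPs (Algorithm 4, step 17)
rows = [np.full(K, 0.1 * (m + 1)) for m in range(M)]
W_new = assemble(rows)       # this matrix is broadcast back to all eAPs (step 8)
print(W_new.shape)           # (3, 4)
```

In a deployment, the `until ... received from every eAP` loop would be a blocking collect over the backhaul links; here it is reduced to a plain list for clarity.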
varying CSI (at each channel use); however, this requires a
TABLE VI
Evaluation of Average of Normalized Transmission Sum-Rate
require less than a second for finding the optimal beamforming matrix.

Finally, we consider the case of multiple coherence blocks (i.e., different CSI realizations) and study the performance of the DRL-based methods. We set up a cell-free network environment in which each episode, composed of 50 iterations, uses one CSI realization. We train the DRL models on 1000 episodes, i.e., 1000 different CSI realizations. For benchmarking, we test the trained policies in a test environment for 3000 episodes, i.e., 3000 different CSI realizations, and compare them with the centralized MMSE method and the conjugate beamforming method. The results are shown in Fig. 9, where we compare the DRL models to the MMSE and conjugate beamforming (CB) methods on 100 CSI realizations. Moreover, in Table VI we compare the DRL models based on the average normalized transmission sum-rate over a larger set of CSI realizations, i.e., 3000 channel realizations. The CSI realizations in the test environment differ from those in the training environment, which reflects real scenarios: since the CSI matrix space is huge, the learned policies need to generalize over unknown CSI realizations.

Fig. 9 and Table VI show that the MMSE method retains its superiority over the DRL-based methods, while the DRL methods outperform the conjugate beamforming method. Since the training can be done offline, the DRL methods are suitable for practical implementation due to their low inference time. This time is almost negligible compared to the convergence time of the MMSE and conjugate beamforming methods.

Fig. 9. Performances of DRL models in a testing environment.

VII. CONCLUSION

We have studied the beamforming optimization problem in cell-free networks. First, we have considered a fully centralized network and designed a DRL-based beamforming method based on the DDPG algorithm with a continuous optimization space. We have also enhanced this method in D4PG by collecting distributed experiences from geographically distributed eAPs. Afterward, we have developed a DRL-based beamforming design with distributed learning, which divides the beamforming optimization tasks among the eAPs. Even though the D4PG beamforming technique demonstrates a promising performance, it still conducts the learning process at the ECP. Our observations and lessons learned are summarized below.
• For the physical layer design of cell-free networks, compared to classical optimization techniques, DRL-based methods can reduce the computational time required to solve the optimization problems because they instantaneously return the learned policy in the inference mode.
• Centralized DRL approaches such as the centralized DDPG provide the upper-bound system performance (e.g., in terms of transmission sum-rate), however, at the cost of the overhead of sharing CSI information with the ECP [42]. Distributed approaches that operate using only local data do not require sharing the CSI information with the ECP. Therefore, a trade-off between performance and communication overhead exists in the design of cell-free networks.

A future research direction could be to develop DRL models for channel estimation and pilot assignment for cell-free networks, and also to investigate the robustness of the DRL solutions in the presence of estimation errors as well as errors in reward signals.

APPENDIX

Let τ_p ≤ τ_c, where τ_c is the coherence time of the channel via which the pilot sequence is sent to all the eAPs with constant
power. The received pilot vector at the m-th eAP is given by

y_{p,m} = Σ_{k=1}^{K} √(τ_p ρ_k) g_{mk} φ_k + η_m, (A.1)

where ρ_k is the normalized transmission power for each symbol of the k-th UE pilot vector. Moreover, η_m ∈ C^{τ_p×1} is the zero-mean complex additive white Gaussian noise (AWGN) vector related to the pilot symbols, with independent and identically distributed (i.i.d.) entries, i.e., η_m ∼ CN(0, 1/2). To find the best estimate of g_{mk} (denoted by ĝ_{mk} = F_{mk}^{1/2} ĥ_{mk}) given the vector of observations y_{p,m}, we first project y_{p,m} over φ_k^H. Therefore,

ẏ_{p,m} = φ_k^H y_{p,m}
        = √(τ_p ρ_k) g_{mk} + Σ_{l=1, l≠k}^{K} √(τ_p ρ_l) g_{ml} φ_k^H φ_l + φ_k^H η_m, (A.2)

where the first term is the desired value and the remaining terms constitute the estimation error. Here, g_{mk} can be estimated from (A.2) by using the maximum a posteriori (MAP) decision rule, which is identical to the minimum mean square error (MMSE) method [43], [44]. Furthermore, given that the pilot signals are partially orthogonal and partially non-orthogonal, φ_k^H y_{p,m} in (A.2) represents a sufficient statistic for the optimal (MMSE) estimation of g_{mk}. Thus, the best estimate of g_{mk} is given by [6]

ĝ_{mk} = (E[ẏ_{p,m} g*_{mk}] / E[|ẏ_{p,m}|²]) ẏ_{p,m} = E_{mk} ẏ_{p,m}. (A.3)

Under the assumption that, for all m and k, the g_{mk} are proper independent but non-identically distributed (i.n.d.) complex Gaussian random variables, and that the η_m are zero-mean i.i.d. random variables, we get E_{mk} as shown in Lemma 1.

REFERENCES

[1] P. Popovski, K. F. Trillingsgaard, O. Simeone, and G. Durisi, "5G wireless network slicing for eMBB, URLLC, and mMTC: A communication-theoretic view," IEEE Access, vol. 6, pp. 55765–55779, 2018.
[2] S. Zhou, M. Zhao, X. Xu, J. Wang, and Y. Yao, "Distributed wireless communication system: A new architecture for future public wireless access," IEEE Commun. Mag., vol. 41, no. 3, pp. 108–113, Mar. 2003.
[3] E. Björnson and L. Sanguinetti, "Making cell-free massive MIMO competitive with MMSE processing and centralized implementation," IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, Jan. 2020.
[4] M. Bashar, K. Cumanan, A. G. Burr, M. Debbah, and H. Q. Ngo, "On the uplink max-min SINR of cell-free massive MIMO systems," IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2021–2036, Apr. 2019.
[5] Y. Al-Eryani and E. Hossain, "The D-OMA method for massive multiple access in 6G: Performance, security, and challenges," IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 92–99, Sep. 2019.
[6] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, "Cell-free massive MIMO versus small cells," IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
[7] G. Interdonato, E. Björnson, H. Q. Ngo, P. Frenger, and E. G. Larsson, "Ubiquitous cell-free massive MIMO communications," EURASIP J. Wireless Commun. Netw., vol. 2019, no. 1, pp. 197–210, Aug. 2019.
[8] M. Attarifar, A. Abbasfar, and A. Lozano, "Modified conjugate beamforming for cell-free massive MIMO," IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 616–619, Apr. 2019.
[9] E. Björnson and L. Sanguinetti, "Scalable cell-free massive MIMO systems," IEEE Trans. Commun., vol. 68, no. 7, pp. 4247–4261, Jul. 2020.
[10] A. Liu and V. K. N. Lau, "Joint BS-user association, power allocation, and user-side interference cancellation in cell-free heterogeneous networks," IEEE Trans. Signal Process., vol. 65, no. 2, pp. 335–345, Jan. 2017.
[11] Y. Jin, J. Zhang, S. Jin, and B. Ai, "Channel estimation for cell-free mmWave massive MIMO through deep learning," IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 10325–10329, Oct. 2019.
[12] S. J. Nawaz, S. K. Sharma, S. Wyne, M. N. Patwary, and M. Asaduzzaman, "Quantum machine learning for 6G communication networks: State-of-the-art and vision for the future," IEEE Access, vol. 7, pp. 46317–46350, 2019.
[13] S. Chakraborty, E. Björnson, and L. Sanguinetti, "Centralized and distributed power allocation for max-min fairness in cell-free massive MIMO," in Proc. 53rd Asilomar Conf. Signals Syst. Comput., 2019, pp. 576–580.
[14] R. Nikbakht, A. Jonsson, and A. Lozano, "Unsupervised-learning power control for cell-free wireless systems," in Proc. IEEE 30th Annu. Int. Symp. Pers. Indoor Mobile Radio Commun. (PIMRC), 2019, pp. 1–5.
[15] O. Simeone, "A very brief introduction to machine learning with applications to communication systems," IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 4, pp. 648–664, Dec. 2018.
[16] Z. Xiong, Y. Zhang, D. Niyato, R. Deng, P. Wang, and L.-C. Wang, "Deep reinforcement learning for mobile 5G and beyond: Fundamentals, applications, and challenges," IEEE Veh. Technol. Mag., vol. 14, no. 2, pp. 44–52, Jun. 2019.
[17] A. Abdallah and M. M. Mansour, "Angle-based multipath estimation and beamforming for FDD cell-free massive MIMO," in Proc. IEEE 20th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), 2019, pp. 1–5.
[18] G. Femenias and F. Riera-Palou, "Cell-free millimeter-wave massive MIMO systems with limited fronthaul capacity," IEEE Access, vol. 7, pp. 44596–44612, 2019.
[19] Y. Al-Eryani, E. Hossain, and D. I. Kim, "Generalized coordinated multipoint (GCoMP)-enabled NOMA: Outage, capacity, and power allocation," IEEE Trans. Commun., vol. 67, no. 11, pp. 7923–7936, Nov. 2019.
[20] Y. Al-Eryani, M. Akrout, and E. Hossain, "Multiple access in cell-free networks: Outage performance, dynamic clustering, and deep reinforcement learning-based design," IEEE J. Sel. Areas Commun., vol. 39, no. 4, pp. 1028–1042, Apr. 2021.
[21] E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, "Precoding and power optimization in cell-free massive MIMO systems," IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, Jul. 2017.
[22] A. Zhou, J. Wu, E. G. Larsson, and P. Fan, "Max-min optimal beamforming for cell-free massive MIMO," IEEE Commun. Lett., vol. 24, no. 10, pp. 2344–2348, Oct. 2020.
[23] A. Abdallah and M. M. Mansour, "Efficient angle-domain processing for FDD-based cell-free massive MIMO systems," IEEE Trans. Commun., vol. 68, no. 4, pp. 2188–2203, Apr. 2020.
[24] H. Yang and T. L. Marzetta, "Energy efficiency of massive MIMO: Cell-free vs. cellular," in Proc. IEEE 87th Veh. Technol. Conf. (VTC Spring), 2018, pp. 1–5.
[25] Y. Zhang, H. Cao, M. Zhou, and L. Yang, "Cell-free massive MIMO: Zero forcing and conjugate beamforming receivers," J. Commun. Netw., vol. 21, no. 6, pp. 529–538, Dec. 2019.
[26] Z. Chen, E. Björnson, and E. G. Larsson, "Dynamic resource allocation in co-located and cell-free massive MIMO," IEEE Trans. Green Commun. Netw., vol. 4, no. 1, pp. 209–220, Mar. 2020.
[27] M. Alonzo, S. Buzzi, A. Zappone, and C. D'Elia, "Energy-efficient power control in cell-free and user-centric massive MIMO at millimeter wave," IEEE Trans. Green Commun. Netw., vol. 3, no. 3, pp. 651–663, Sep. 2019.
[28] G. Interdonato, M. Karlsson, E. Björnson, and E. G. Larsson, "Local partial zero-forcing precoding for cell-free massive MIMO," IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4758–4774, Jul. 2020.
[29] D. Wang, M. Wang, P. Zhu, J. Li, J. Wang, and X. You, "Performance of network-assisted full-duplex for cell-free massive MIMO," IEEE Trans. Commun., vol. 68, no. 3, pp. 1464–1478, Mar. 2020.
[30] G. Interdonato, P. Frenger, and E. G. Larsson, "Scalability aspects of cell-free massive MIMO," in Proc. IEEE Int. Conf. Commun. (ICC), 2019, pp. 1–6.
[31] C. D'Andrea, A. Zappone, S. Buzzi, and M. Debbah, "Uplink power control in cell-free massive MIMO via deep learning," in Proc. IEEE 8th Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process. (CAMSAP), 2019, pp. 554–558.
[32] M. Bashar et al., "Exploiting deep learning in limited-fronthaul cell-free massive MIMO uplink," IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1678–1697, Aug. 2020.
[33] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," 2015, arXiv:1509.02971.
[34] G. Barth-Maron et al., "Distributed distributional deterministic policy gradients," 2018, arXiv:1804.08617.
[35] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, "Cell-free massive MIMO: Uniformly great service for everyone," in Proc. IEEE 16th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), 2015, pp. 201–205.
[36] W. Liao, T. Chang, W. Ma, and C. Chi, "QoS-based transmit beamforming in the presence of eavesdroppers: An optimized artificial-noise-aided approach," IEEE Trans. Signal Process., vol. 59, no. 3, pp. 1202–1216, Mar. 2011.
[37] S. Buzzi, C. D'Andrea, and C. D'Elia, "User-centric cell-free massive MIMO with interference cancellation and local ZF downlink precoding," in Proc. 15th Int. Symp. Wireless Commun. Syst. (ISWCS), 2018, pp. 1–5.
[38] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 2094–2100.
[39] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proc. 31st Int. Conf. Mach. Learn., vol. 32, Jun. 2014, pp. 387–395.
[40] Y. Tassa et al., "DeepMind control suite," 2018, arXiv:1801.00690.
[41] M. G. Bellemare, W. Dabney, and R. Munos, "A distributional perspective on reinforcement learning," 2017, arXiv:1707.06887.
[42] H. A. Ammar, R. Adve, S. Shahbazpanahi, G. Boudreau, and K. V. Srinivas, "Distributed resource allocation optimization for user-centric cell-free MIMO networks," IEEE Trans. Wireless Commun., early access, Oct. 13, 2021, doi: 10.1109/TWC.2021.3118303.
[43] R. M. Gray and L. D. Davisson, An Introduction to Statistical Signal Processing, 1st ed. New York, NY, USA: Cambridge Univ. Press, 2010.
[44] H. Yin, D. Gesbert, M. Filippou, and Y. Liu, "A coordinated approach to channel estimation in large-scale multiple-antenna systems," IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 264–273, Feb. 2013.

Setareh Maghsudi received the Ph.D. degree (summa cum laude) from Technical University of Berlin in 2015. She is an Assistant Professor with the University of Tübingen and a Senior Research Scientist with the Fraunhofer Heinrich Hertz Institute. From 2017 to 2020, she was an Assistant Professor with the Technical University of Berlin. From 2015 to 2017, she held postdoctoral positions with the University of Manitoba, Canada, and Yale University, USA. She has received several competitive fellowships, awards, and research grants, including from the German Research Foundation (DFG), the German Ministry of Education and Research (BMBF), and the Japan Society for the Promotion of Science. Her main research interests lie in the intersection of network analysis and optimization, game theory, and machine learning.

Mohamed Akrout received the B.E. degree in computer engineering from École Polytechnique de Montréal, Montreal, Canada, and the "Diplôme d'Ingénieur" degree from Télécom ParisTech (ENST), Paris, France, in 2016, and the M.S. degree in artificial intelligence from University of Toronto, Toronto, Canada, in 2018. He is currently pursuing the Ph.D. degree with the Electrical and Computer Engineering Department, University of Manitoba, Winnipeg, Canada. He has published in topics related to statistical inference, wireless communication, and computational neuroscience. His current research interests include signal processing for inverse problems and physically-consistent antenna design.