You are on page 1of 6

Federated Learning Games for Reconfigurable

Intelligent Surfaces via Causal Representations


Charbel Bou Chaaya, Sumudu Samarakoon, and Mehdi Bennis
Centre for Wireless Communications, University of Oulu, Finland
Emails: {charbel.bouchaaya, sumudu.samarakoon, mehdi.bennis}@oulu.fi

Abstract—In this paper, we investigate the problem of robust designs (e.g., different RISs, propagation environments, users
Reconfigurable Intelligent Surface (RIS) phase-shifts configura- distribution, etc.). Moreover, these approaches produce a high
tion over heterogeneous communication environments. The prob- inference accuracy within a fixed environment, from which
lem is formulated as a distributed learning problem over different
arXiv:2306.01306v1 [cs.LG] 2 Jun 2023

environments in a Federated Learning (FL) setting. Equivalently, the training and testing data are drawn. They subsequently
this corresponds to a game played between multiple RISs, as fail to have a good Out-of-Distribution (OoD) generalization
learning agents, in heterogeneous environments. Using Invariant in unseen environments.
Risk Minimization (IRM) and its FL equivalent, dubbed FL Although FL provides a learning framework where multiple
Games, we solve the RIS configuration problem by learning agents train a collaborative model while preserving privacy,
invariant causal representations across multiple environments
and then predicting the phases. The solution corresponds to its state-of-the-art approach, such as Federated Averaging
playing according to Best Response Dynamics (BRD) which yields (F EDAVG) [8], is known to perform poorly when the local
the Nash Equilibrium of the FL game. The representation learner data is non-identical across participating agents. This is due
and the phase predictor are modeled by two neural networks, to the fact that F EDAVG (and its variants) solve the distributed
and their performance is validated via simulations against other learning problem via Empirical Risk Minimization (ERM),
benchmarks from the literature. Our results show that causality-
based learning yields a predictor that is 15% more accurate in that minimizes the empirical distribution of the local losses
unseen Out-of-Distribution (OoD) environments. assuming that the data is identically distributed. To mitigate
Index Terms—Reconfigurable Intelligent Surface (RIS), Fed- this issue, the authors in [9] leveraged Distributionally Robust
erated Learning, Causal Inference, Invariant Learning. Optimization (DRO) [10] and proposed a federated DRO for
RIS configurations. Therein, the distributed learning problem
I. I NTRODUCTION is cast as a minimax problem, where the model’s parameters
The advent of Reconfigurable Intelligent Surfaces (RISs) are updated to minimize the maximal weighted combination
will substantially boost the performance of wireless commu- of local losses. This ensures a good performance for the
nication systems. These surfaces are manufactured by layering aggregated model over the worst-case combination of local
stacks of sheets made out of engineered materials, called distributions.
meta-materials, built on a planar structure. The reflection On the other hand, [11] used Invariant Risk Minimization
coefficients of the meta-material elements, called meta-atoms, (IRM) [12] to formulate the problem of learning optimal RIS
vary depending on their physical states. Thus, the direction of phase-shifts. The aim of IRM is to capture causal dependencies
incident electromagnetic waves on RISs can be manipulated in the data, that are invariant across different environments. In
with the aid of simple integrated circuit controllers that modify [11], the authors empirically showed that using relative dis-
meta-atoms’ states. In this view, the RIS technology provides tances and angles between the RIS and the transmitter/receiver
a partial control over the wireless propagation environment as causal representations for the CSI, improves the robustness
rendering improved spectral efficiency with a minimal power of the RIS phase predictor. However, these representations
footprint [1]. RIS is considered a fundamental enabler to were not learned by the configuration predictor, but were
achieve the 6G vision of smart radio environments [2]. predefined and fixed. Also, this solution assumes that multiple
One of the major challenges in the RIS technology is the environments are known to the predictor beforehand, which is
accurate tuning of RIS phases. To this extent, a vast majority an unfeasible assumption.
of the existing literature on RIS-assisted communication relies The main contribution of this paper is a novel distributed
on the use of Channel State Information (CSI) to train Machine IRM-based solution to the RIS configuration problem. We cast
Learning (ML) models that predict the optimal RIS configura- the problem of RIS phase control as a federated learning prob-
tion [3]–[5]. These methods seek either a centralized-controller lem with multiple RISs controllers defined over heterogeneous
driven approach, or a distributed multi-agent optimization environments, using a game-theoretic reformulation of IRM,
technique, such as Federated Learning (FL). Other works such referred to as Federated Learning Games (FL G AMES) [13],
as [6], [7] exploit the users’ locations to employ a location- [14]. The solution of this problem is proven to be the Nash
based passive RIS beamforming. Either way, their main focus Equilibrium of a strategic game where each RIS updates its
is to draw on the statistical correlations of the observed configuration predictor by minimizing its local loss function.
data, while overlooking the impacts of heterogeneous system This game is indexed by a representation learner that is
Representation
Learner
Predictor by denoting φr and ϑr as the azimuth and elevation angles-
RIS 1 RIS 2 of-departure (AoD) from the RIS respectively, the channel is
h modeled as
h √
g = αr aN (φr , ϑr ) , (1)
Transmitter
g g Transmitter

where αr represents the path-loss. Additionally, we define the


Blockage
Blockage steering function:
Receiver
Receiver
Receiver Environment 2  T
Environment 1 2πj 2πj
aN (φ, ϑ) = e λ ∆1 (φ,ϑ) , · · · , e λ ∆N (φ,ϑ) , (2)
Fig. 1: System model illustrating different RIS-assisted com-
munication scenarios. Each RIS is conceived differently from and a set of operators for n = 1, . . . , N :
other RISs, and serves differently distributed receivers.
∆n (φ, ϑ) = iN (n)dx cos(ϑ) sin(φ) + jN (n)dy sin(ϑ), (3)
 
n−1
iN (n) = (n − 1) mod Nx , jN (n) = , (4)
shared among the RISs to extract a causal representation from Nx
the CSI data. The representation learner and predictors are
trained in a distributed and supervised learning manner. The where λ is the wavelength, mod and ⌊·⌋ denote the modulus
numerical validations yield that the proposed design improves and floor operators. On the other hand, the channel h ∈ CN
the accuracy of the predictor tested in OoD settings by 15% between the Tx and the RIS will have both LoS and non line-
compared to state-of-the-art RIS designs. of-sight (NLoS) components. Therefore, we model h using
The remainder of this paper is organized as follows. In Rician fading with spatial correlation, since the RIS elements
Section II, we describe the system model and conventional are closely distanced. Accordingly, we have:
approaches to the RIS configuration problem using F EDAVG. r r !
√ κ 1 e
The FL G AMES solution that involves extracting causal rep- h = αt h+ h , (5)
1+κ 1+κ
resentations from the data is discussed in Section III. Section
IV presents the simulation results that compare the proposed where αt and κ are the path-loss and the Rician coefficient
algorithm with benchmarks. Concluding remarks are drawn in respectively, h is the LoS factor, and h
e represents the small
Section V. scale fading process in the NLoS component. Further, for the
LoS link, the LoS factor is:
II. S YSTEM M ODEL AND P ROBLEM F ORMULATION
We consider a set of environments R where each environ- h = aN (φt , ϑt ) , (6)
ment r ∈ R consists of a RIS-assisted downlink communica- where (φt , ϑt ) are the angles-of-arrival (AoA) to the RIS. We
tion between a transmitter (Tx) and a receiver (Rx) as shown model the NLoS link as h e ∼ CN (0N , R), where R is a
in Fig. 1. Both the Tx and the Rx are equipped with a single covariance matrix that captures the spatial correlation among
antenna each, and we assume that the direct link between them the channels of the RIS elements. In the case of isotropic
is blocked in which, the channel of direct link is dominated by scattering in front of the RIS, a closed-form expression for R
the reflected channel. The RIS in environment r is composed is [15, Proposition 1]:
by N r = Nxr Nyr reflective elements where Nxr and Nyr are  
the number of horizontal and vertical reflective elements, 2 |um − un |
[R]m,n = sinc m, n = 1, . . . , N, (7)
respectively. Additionally, the inter-element distances over λ
horizontal and vertical axes are characterized by drx and dry . T
Each RIS element applies a phase shift on its incident signal where un = [iN (n) dx , jN (n) dy ] represents the locations
and the reflected signals are aggregated at the Rx. Note that the of the nth element with iN and jN being the horizontal and
location of the Tx is fixed while the location of Rx is arbitrary, vertical indices given in (4), and sinc(·) is the normalized
which is sampled by a predefined distribution. The choices of sampling function.
the Rx distribution along with the parameters (Nxr , Nyr , drx , dry ) B. Downlink Rate Maximization
collectively define the environment r. We assume that these
environments are completely separate, i.e. each Rx receives At every transmission slot, in each environment r ∈ R, the
only the signal reflected by the RIS in its corresponding RIS selects its phases in order to enhance the downlink rate
r T
environment. at the Rx. Let θ r = [θ1r , . . . , θN ] denote the phase decisions
at the RIS. Thus, the received signal at the Rx is:
A. Channel Model
yr = hH

r Θr gr sr + zr , (8)
For the notation simplicity, we have omitted the notion of
r r 
environment during the discussion within this subsection. Let where Θr = diag ejθ1 , . . . , ejθN is the RIS reflection
g ∈ CN be the channel between the RIS and the Rx, which matrix, sr is the transmitted
 signal satisfying the
 power budget
is dominated by its line-of-sight (LoS) component. Hence, constraint E |sr |2 = p, and zr ∼ CN 0, σ 2 is the additive
noise with power σ 2 . In this view, the objective of downlink our attention to algorithms that learn causal representations
rate maximization can be cast as follows: that are invariant across different agents, which improves the

|hH 2
 model’s OoD generalization across many environments.
r Θr gr | p
maximize log2 1 + , (9) In this direction, IRM [12] jointly trains an extraction
θr ∈ C σ2
function fϕ and a predictor fw across training environments
where C is the set of feasible RIS configurations. R in such a way that fw ◦ fϕ generalizes well in unseen envi-
In order to solve (9), a perfect knowledge of both channels ronments Rall ⊃ R. The main idea is to build a parameterized
h and g is assumed. Even under perfect CSI, determining feature extractor fϕ that reveals the causal representations in
the optimal set of phase shifts θ r⋆ requires a heuristic search the samples, so as to perform causal inference by optimizing
due to the notion of configuration classes C. Such solutions fw . Thus, given an extraction function, the predictor fw is
cannot be practically adopted since they are not scalable with the one that is simultaneously optimal across all training
the number of RIS elements. As a remedy, we resort to ML environments R. Formally, this boils down to solving the
in order to devise a data-driven solution. following problem:
In this context, consider that the RIS in environment
|R|
r ∈ R (later referred to as agent r) has a dataset Dr = 1 X
xrj , crj j = 1, . . . , Dr containing Dr samples of ob- minimize ℓr (fw ◦ fϕ )
ϕ, w |R| r=1 (12)
served CSI xrj = hrj , gjr that are labeled by crj corresponding
to the optimal RIS phase shifts θ r⋆ solving (9). We then seek subject to w ∈ arg min ℓr (fw′ ◦ fϕ ) ∀ r ∈ R.
w′
to construct a mapping function fw (·) parameterized by w,
that solves: Note that IRM is formulated as a single agent optimization
problem, and assumes that training environments are known
|R| Dr
1 X 1 X to the agent beforehand. Extending IRM to the distributed
ℓ crj , fw (xrj ) ,

minimize (10)
w |R| r=1 Dr j=1 setting can be done using game theory (IRM G AMES [13]).
Therein, different agents, each equipped with its own pre-
where ℓ(· , ·) is the loss function in terms of phase prediction. dictor wr , cooperate to train an ensemble model: wav =
In order to optimize the model parameter w in the ERM P Dr
P
r∈R wr . In contrast to (12) that designs a unique
n∈R Dn
formulation in (10), all agents are required to share their predictor across training environments, the aggregate model
datatsets Dr with a central server. Due to privacy concerns wav satisfies:
and communication constraints, (10) is recast as a FL problem |R|
by minimizing a global loss function as follows: 1 X
minimize ℓr (fwav ◦ fϕ )
|R|
av
ϕ, w |R| r=1
1 X  fw′ + P fwn  (13)
minimize ℓr (fw ) , (11) r
w |R| r=1 subject to wr ∈ arg min ℓr
n∈R\{r}
◦ fϕ
|R|
PDr wr′
where ℓr (fw ) = D1r j=1

ℓ crj , fw (xrj ) is the local loss ∀r ∈ R.
function of agent r. One of the most popular approaches in
FL to solve (11) is the F EDAVG algorithm [8]. The set of constraints in (13) represents the Nash Equilibrium
However, the formulation in (11) assumes that all agents in of a game where the players/environments r ∈ R select actions
R have an equal impact on training the global model w. This wk in order to minimize their cost functions ℓr (fwav ◦ fϕ ).
falls under the strong assumption of uniform and homogeneous Since there are no algorithms that guarantee reaching the Nash
data distribution across agents. Under data heterogeneity, drifts point of the aforementioned continuous non-zero sum game,
in the agents’ local updates with respect to the aggregated Best Response Dynamics (BRD) is used due to its simplicity.
model might occur, since the local optima do not necessarily It is worth mentioning that [13] also showed that the feature
coincide with the global optima. Thus, the obtained model suf- extraction map can be fixed to identity fϕ = I, and the overall
fers from the instability in convergence, and fails to generalize prediction network fw ◦ fϕ will be recovered by the predictor
to OoD samples. To obviate this issue, we resort to Invariant fw solving (13). This version of the algorithm is called F-
Risk Minimization (IRM) [12] and its FL variant dubbed FL IRM G AMES, while the one where the extractor and predictor
G AMES [14]. are learned separately is referred to as V-IRM G AMES.
IRM G AMES is a very suitable fit for FL since it allows
III. FL G AMES FOR P HASE O PTIMIZATION agents distributed in different environments to train a collective
The key deficiency of using F EDAVG on limited datasets IRM-based model. Hence, the authors in [14] proposed minor
distributed across agents is that its trained predictor heavily modifications in order to adapt it to the FL setup. First,
relies on statistical correlations among observations. These parallelized BRD is used to mitigate the sequential dependen-
correlations are specious since they depend on the environment cies and accelerate convergence. Moreover, V-IRM G AMES
from which they were sampled. Thus, overfitting to these requires an extra round of optimization for the extraction
correlations prevents F EDAVG from training a predictor that is function parameter ϕ, which delays its convergence. Thus,
robust in unseen environments. To overcome this issue, we turn the stochastic gradient descent (SGD) over ϕ is replaced by
a Gradient Descent (GD) update that takes larger steps in the Algorithm 1 FL G AMES for RIS
direction of the global optimum. In contrast to [14], which Inputs: Set of RISs: R, Datasets: Dr , Learning rates: ηw , ηϕ ,
considered highly correlated datasets, the data in our setting Number of rounds: rounds, Mini-batch size: m
shows negligible oscillations in parameter updates under BRD, Outputs: fϕ , fwav
in which, we do not adopt buffers in training. These buffers can Server executes:
be used by each agent to store the historically played actions of Initialize ϕ and {wr }r∈R
its opponents. Then, an agent responds to a uniform distribu- Broadcast ϕ and {wr }r∈R to all agents r ∈ R
tion over these past actions. This smoothens the oscillations of Set round ← 1
BRD caused by the local correlations in the datasets. Finally, while round ≤ rounds do
the detailed steps of the V-FL G AMES algorithm that trains for each agent r ∈ R parallel do
both a representation learner and a predictor are presented in Compute ∇ϕr ← ∇ϕ ℓr (fwav ◦ fϕ ; Dr )
Algorithm 1. Communicate ∇ϕr to the server
end for
IV. S IMULATION R ESULTS Update extractor:P
A. Simulation Settings ϕ ← ϕ − ηϕ r∈R P Dr Dn ∇ϕk
n∈R

For our experiments, we consider three different environ- Broadcast ϕ to all agents r ∈ R
ments R. In all three environments, the Tx is located at the for each agent r ∈ R parallel do
coordinate (0, 35, 3) and the RIS comprising N = 10 × 10 Sample mini-batch Br of size m from Dr
reflective elements is at (10, 20, 1), with the coordinates given Update predictor:
in meters within a Cartesian system. The Rx is located in wr ← wr − ηw ∇wr ℓr (fwav ◦ fϕ ; Br )
an annular region around the RIS with inner and outer radii Communicate wr to the server
Rmin = 1 m and Rmax = 5 m. The Rician factor κ is set to 5, end for
Average Predictor: wav ← r∈R P Dr Dn wk
P
N dx dy
and the pathloss coefficients are calculated by αt = 4πd 2 and n∈R
Nd d x y
t
Broadcast {wr }r∈R to all agents r ∈ R
αr = 4πd 2 . To simplify the exhaustive search for optimal
r round ← round + 1
phases, we assume C contains two configurations classes,
T T end while
namely θ1 = [0, 0, . . . , 0] and θ2 = [0, π, 0, π, . . . , 0, π] .
In environment 1, the RIS elements are distanced by dx =
Environment 1 Environment 2
dy = 0.5λ. Therein, the Rx is uniformly placed around the RIS 2 2
Rmax-Rmin 1 Rmax-Rmin 1
with a tendency to be deployed closer to the RIS following 2p 2p
the distributions illustrated under Environment 1 in Fig. 2. In pr pq pr pq
environment 2, the RIS is characterized by dx = dy = 0.25λ. 0 0 0
0 2p
Rmin Rmax 0 2p Rmin Rmax 0
In contrast to environment 1, here, the Rx is higher likely r q r q
to be placed far from the RIS (see the distributions under Environment 3
2
Environment 2 in Fig. 2). The RIS in environment 3 has dx = Rmax-Rmin 1
p
dy = 0.4λ. Therein, the Rx’s distance to the RIS is uniform pr pq
but the angle distribution is concentrated in one direction as
0 0
illustrated in Fig. 2 under Environment 3. Note that the data Rmin r Rmax 0
q
2p
from environments 1 and 2 are used to compose the training
data while environment 3 is used only for testing. Fig. 2: Distributions of the receiver’s position (distance r and
In this work, we leverage V-FL G AMES that exhibits a angle θ from the RIS) in different environments.
superior performance than F-FL G AMES (where fϕ = I). On Representation Learner Predictor
the other hand, the authors in [11] showed by empirical 128
64 64
simulations, due to the fact that RIS channels have strong LoS
h 32 32
components, that one can select fixed causal representations 2
based on the channels in (1) and (5). The causal representa-
RIS
tions in this case are the AoA and AoD at the RIS, (φt , ϑt ) configuration

and (φr , ϑr ), and the relative distances RIS-Tx dt and RIS-Rx softmax

dr . This benchmark variant of FL G AMES where fϕ is fixed g tanh ReLU


tanh ReLU
to (φt , ϑt , φr , ϑr , dt , dr ) is called F-FL G AMES in this paper, tanh
and is not to be confused with F-FL G AMES in [14], where
fϕ = I. Fig. 3: Structure of the neural networks used for the repre-
For training in F EDAVG and V-FL G AMES, we collect sentation learner and the predictor: circles represent activation
Dr = 1500 CSI samples xrj = hrj , gjr from environments functions and trapezoids correspond to the weights and biases.
1 and 2, that are decoupled over real and imaginary parts. The activation function type and the output dimension are
This data is scaled in such a way that the normalized mean shown at the bottom and top of each layer.
100 FedAvg 82.8%
2500 F-FL Games

Communications Rounds to achive the NE


90 V-FL Games
90
2000
80
80

Test Accuracy
1500 83.3%
70 70 79.1%
80.4% 66.7%
60 68.7%
60 FedAvg - Same environment 1000
F-FLGames - Same environment
50 V-FLGames - Same environment
50 FedAvg - Different environment 500
0 250 500 F-FLGames - Different environment
V-FLGames - Different environment
40 0
0 500 1000 1500 2000 2500 3000 4 Agents 8 Agents
Communication Rounds Agents per environment

(a) Test accuracy convergence for all algorithms compared (e) Communication rounds needed for convergence with
in the same or in a different environment. multiple agents per environment. The number on top of
the bar is the corresponding test accuracy.

85
0.55
80
80

Spectral Efficiency [bits/s/Hz]


0.50
75
75
0.45
Test Accuracy

Test Accuracy

70

65 0.40
70
60 0.35
Best V-FLGames
55 F-FL Games 65 F-FL Games F-FLGames Random
0.30
V-FL Games V-FL Games FedAvg
50 FedAvg FedAvg
0.25
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 2 3 4 5 7 10 12 15 100 250 500 750 1000 1250
Weight E Local Steps Samples per Environment

(b) Impact of the weight αe on the (c) Impact of number of local itera- (d) Impact of the number of samples per envi-
testing accuracy. tions on the testing accuracy. ronment on the achievable spectral efficiency.
Fig. 4: Performance comparison of different algorithms in OoD testing datasets.

is zero and the normalized variance is one. We also record implying they do not overfit to the statistical correlations in
the parameters (φt , ϑt , φr , ϑr , dt , dr ) to train F-FL G AMES, the channels. It is also interesting to observe the convergence
which are scaled using a minmax scaler that normalizes rate of the different methods. We notice that V-FL G AMES
the data to the interval [−1, 1] by dividing by the absolute converges faster than F-FL G AMES, requiring around 500
maximum. Unless stated otherwise, we use 1000 samples communication epochs in both cases.
collected from environment 3 for testing. The design of the Fig. 4(b) demonstrates the impact of mixing data from
neural networks of the extractor and the predictor of V-FL different environments on model training. Here, we keep the
G AMES is based on the multi-layer perceptrons architecture, total number samples to be D1 + D2 = 3000 and vary the
and is shown in Fig. 3. Note that F EDAVG and F-FL G AMES amount of data from the two environments, in which, αe = D D2
1

only use the predictor part. The considered loss function is represents the fraction of samples of environment 1 compared
the cross-entropy. The mini-batch size used for training the to environment 2. We first notice that all algorithms reach
predictor is m = 32, and the learning rates are fixed at their peak performance with balanced datasets, i.e. αe = 0.5.
ηϕ = 5 × 10−4 and ηw = 2 × 10−3 . In the figures, lines When the datatsets are biased towards one of the environments,
correspond to the simulation results that are averaged over five F EDAVG loses 15% of its performance. On the other hand,
runs while the their respective standard deviations are shown F-FL G AMES and V-FL G AMES maintain a steady accuracy
as shaded areas. with a slight degradation of about 3–4%.
Inspired by F EDAVG’s flexibility in allowing more local
B. Discussion computations at each agent before sharing their models, we
We first plot the evolution of the testing accuracy of all modify our proposed algorithm to study the effect of the
algorithms in Fig. 4(a). Within the same environment (En- number of local iterations on the model accuracy. Insofar, each
vironments 1 and 2), F EDAVG and V-FL G AMES perform agent performs a few SGD updates locally prior to model
similarly with an accuracy of 90%, while F-FL G AMES gives sharing. Fig. 4(c) shows the impact of the number of local
an accuracy of 87%. When testing in a different environment steps on the testing accuracy. It can be noted that the accuracy
(Environment 3), F EDAVG’s accuracy drops to 68%, high- drops when each RIS performs more local computations
lighting its lack of robustness. In this case, F-FL G AMES and when using FL G AMES due to the fact that the models are
V-FL G AMES yield slightly lower accuracies of 85% and 80%, overfitting to the local datasets. However, the performance of
F-FL G AMES is more consistent with the local steps, losing fashion, and the results are compared with the environment-
only 2% of accuracy with seven local iterations, while V-FL unaware F EDAVG and an IRM-based predictor. The numerical
G AMES loses around 8% of accuracy with 15 local updates. results show that a phase predictor trained with the geometric
The reason for this behavior is that by letting each agent properties of the environments demonstrated a better perfor-
run over more samples from its local dataset, the testing mance than a representation learner followed by a predictor.
accuracy at equilibrium decreases, since the played strategies Moreover, the extractor-predictor network exhibits faster train-
do not account for the opponents’ actions. On the other hand, ing convergence when using more RISs. The extensions for
the accuracy of F EDAVG slightly increases with more local multiple users and multiple antennas at Tx and Rx are left for
iterations, but still performs poorly. future works.
The effect of the dataset size Dr per agent on the achievable
R EFERENCES
spectral efficiency is illustrated in Fig. 4(d). Note that all
agents use an equal amount of samples, i.e., αe = 0.5 is held. [1] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and
C. Yuen, “Reconfigurable intelligent surfaces for energy efficiency in
For the comparison, we additionally present the optimal rate wireless communication,” IEEE transactions on wireless communica-
given by the best configurations (indicated by Best) and the tions, vol. 18, no. 8, pp. 4157–4170, 2019.
rates given by random phase decision making (indicated by [2] M. D. Renzo, M. Debbah, D.-T. Phan-Huy, A. Zappone, M.-S. Alouini,
C. Yuen, V. Sciancalepore, G. C. Alexandropoulos, J. Hoydis et al.,
Random). All algorithms reach their best performance with “Smart radio environments empowered by reconfigurable ai meta-
the highest number of samples Dr = 1250 with about 21%, surfaces: An idea whose time has come,” EURASIP Journal on Wireless
14% and 32% losses compared to Best rates in V-FL G AMES, Communications and Networking, vol. 2019, no. 1, pp. 1–20, 2019.
[3] Ö. Özdoğan and E. Björnson, “Deep learning-based phase reconfigura-
F-FL G AMES, and F EDAVG, respectively. The advantage of tion for intelligent reflecting surfaces,” in 2020 54th Asilomar Confer-
learning invariant causal representations with minimal amount ence on Signals, Systems, and Computers. IEEE, 2020, pp. 707–711.
of data is highlighted when Dr ≤ 750. F EDAVG looses its [4] K. Stylianopoulos, N. Shlezinger, P. Del Hougne, and G. C. Alexan-
dropoulos, “Deep-learning-assisted configuration of reconfigurable in-
performance rapidly. Even with 100 samples per environment, telligent surfaces in dynamic rich-scattering environments,” in ICASSP
FL G AMES algorithms lose only 6% of their performance, 2022-2022 IEEE International Conference on Acoustics, Speech and
while F EDAVG incurs more than 15% of its accuracy. F E - Signal Processing (ICASSP). IEEE, 2022, pp. 8822–8826.
[5] G. C. Alexandropoulos, K. Stylianopoulos, C. Huang, C. Yuen, M. Ben-
DAVG requires around 750 samples per agent to reach its nis, and M. Debbah, “Pervasive machine learning for smart radio en-
best performance, that is more than 10% less than that given vironments enabled by reconfigurable intelligent surfaces,” Proceedings
by V-FL G AMES, underscoring its weakness in OoD settings. of the IEEE, vol. 110, no. 9, pp. 1494–1525, 2022.
[6] J. Park, S. Samarakoon, H. Shiri, M. K. Abdel-Aziz, T. Nishio,
Additionally, with 1250 samples per environment, the error A. Elgabli, and M. Bennis, “Extreme ultra-reliable and low-latency
variance of V-FL G AMES and F-FL G AMES is 73% and 98% communication,” Nature Electronics, vol. 5, no. 3, pp. 133–141, 2022.
less than that of F EDAVG. [7] G. C. Alexandropoulos, S. Samarakoon, M. Bennis, and M. Debbah,
“Phase configuration learning in wireless networks with multiple recon-
Finally, we vary the number of agents per environment figurable intelligent surfaces,” in 2020 IEEE Globecom Workshops (GC
as shown in Fig. 4(e). For this experiment, 1500 samples Wkshps. IEEE, 2020, pp. 1–6.
[8] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,
from each environment are shared among all agents, so more “Communication-efficient learning of deep networks from decentralized
agents having less data are involved. The achieved testing data,” in Artificial intelligence and statistics. PMLR, 2017, pp. 1273–
accuracies of the FL G AMES algorithms are still superior than 1282.
[9] C. B. Issaid, S. Samarakoon, M. Bennis, and H. V. Poor, “Federated
the F EDAVG benchmark. Surprisingly, doubling the number distributionally robust optimization for phase configuration of riss,” in
of collaborating RISs from 8 to 16 induces an 82% increase 2021 IEEE Global Communications Conference (GLOBECOM). IEEE,
in the number of training epochs for convergence in F-FL 2021, pp. 1–6.
[10] Y. Deng, M. M. Kamani, and M. Mahdavi, “Distributionally robust fed-
G AMES. The same does not hold for V-FL G AMES that suffers erated averaging,” Advances in neural information processing systems,
from an 8% increase, while losing 3-4% in accuracy compared vol. 33, pp. 15 111–15 122, 2020.
to F-FL G AMES. This implies that, with more agents owning [11] S. Samarakoon, J. Park, and M. Bennis, “Robust reconfigurable intel-
ligent surfaces via invariant risk and causal representations,” in 2021
fewer data, the training of a causal extractor and a predictor IEEE 22nd International Workshop on Signal Processing Advances in
converges faster than training of only a predictor. Wireless Communications (SPAWC), 2021, pp. 301–305.
[12] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz, “Invariant risk
minimization,” arXiv preprint arXiv:1907.02893, 2019.
V. C ONCLUSIONS [13] K. Ahuja, K. Shanmugam, K. Varshney, and A. Dhurandhar, “Invariant
risk minimization games,” in International Conference on Machine
This paper proposes a distributed phase configuration con- Learning. PMLR, 2020, pp. 145–155.
trol for RIS-assisted communication systems. The rate max- [14] S. Gupta, K. Ahuja, M. Havaei, N. Chatterjee, and Y. Bengio, “Fl games:
A federated learning framework for distribution shifts,” arXiv preprint
imization problem is formulated using federated IRM as arXiv:2205.11101, 2022.
opposed to a heterogeneity-unaware ERM approach. Our novel [15] E. Björnson and L. Sanguinetti, “Rayleigh fading modeling and channel
robust RIS phase-shifts controller leverages the underlying hardening for reconfigurable intelligent surfaces,” IEEE Wireless Com-
munications Letters, vol. 10, no. 4, pp. 830–834, 2020.
causal representations of the data that are invariant over dif-
ferent environments. A neural network based feature extractor
first uncovers the causal structure of the CSI data, then feeds it
to another neural network based configuration predictor. Both
neural networks are trained in a distributed supervised learning

You might also like