You are on page 1of 17

Digital Twin-Enabled Domain Adaptation for

Zero-Touch UAV Networks: Survey and Challenges


Maxwell McManus1 , Yuqing Cui1 , Josh (Zhaoxi) Zhang1 , Jiangqi Hu1 , Sabarish Krishna Moorthy1 ,
Zhangyu Guan1 , Nicholas Mastronarde1 , Elizabeth Serena Bentley2 , Michael Medley2
1 Deptartment of Electrical Engineering, University at Buffalo, Buffalo, NY 14260, USA
2 Air Force Research Laboratory (AFRL), Rome, NY 13440, USA

Email: {memcmanu, yuqingcu, zhaoxizh, sk382, jiangqih, guan, nmastron}@buffalo.edu,


{elizabeth.bentley.3, michael.medley}@us.af.mil
arXiv:2301.03359v1 [cs.NI] 31 Dec 2022

Abstract—In existing wireless networks, the control programs and Beyond network capabilities, UAVs can provide additional
have been designed manually and for certain predefined sce- computational resources for offloading and support in mobile
narios. This process is complicated and error-prone, and the edge computing (MEC) networks [4]. UAV swarms are also
resulting control programs are not resilient to disruptive changes.
Data-driven control based on Artificial Intelligence and Machine expected to enable 5G massive MIMO (MMIMO), serving as
Learning (AI/ML) has been envisioned as a key technique to dynamic relays to enable high-throughput communications be-
automate the modeling, optimization and control of complex tween MMIMO base stations and ground users and minimize
wireless systems. However, existing AI/ML techniques rely on inter-cell interference [5], [6]. Additionally, future network
sufficient well-labeled data and may suffer from slow convergence architectures, i.e., 6G, are projected to support hybrid aerial-
and poor generalizability. In this article, focusing on digital twin-
assisted wireless unmanned aerial vehicle (UAV) systems, we ground communications, in which terrestrial networks, aerial
provide a survey of emerging techniques that can enable fast- UAV networks, and satellite communications are linked hier-
converging data-driven control of wireless systems with enhanced archically to further enhance QoS and network flexibility [7].
generalization capability to new environments. These include However, while UAVs can certainly enable a new range
SLAM-based sensing and network softwarization for digital of applications, the challenges are multi-fold. First, the man-
twin construction, robust reinforcement learning and system
identification for domain adaptation, and testing facility sharing agement of UAV-assisted networks needs to consider the high
and federation. The corresponding research opportunities are mobility of all connected nodes, and this requires new resource
also discussed. orchestration and algorithm designs to anticipate the dynamics
Index Terms—UAV, Digital Twin, Domain Adaptation, Network and requirements of each flying node and hybrid link in
Softwarization, AI/ML. addition to those dynamics inherent to the networking envi-
ronment [8]. The situation will get even worse when jointly
considering the newly emerging sophisticated communication
I. I NTRODUCTION techniques, such as heterogeneous multi-band communications
Unmanned aerial vehicles (UAVs) have been envisioned as [9], device-to-device communication links [10], integrated
a key enabling technology for a wide set of new applications access and backhaul (IAB) 5G networks [11], non-orthogonal
because of their unique characteristics such as fast deployment, multiple access (NOMA) [12], [13] and spectrum coexistence
high mobility, on-board processing capabilities, and reduced [14], [15]. Moreover, in the current practice of wireless engi-
size. This has allowed significant progress in foundational neering, the networking environments are usually assumed to
research towards UAV-assisted communication networks, e.g., be known at design time, and the resulting control programs
swarm UAV networks. Specifically, the high mobility of UAVs may fail when encountering unforeseen conditions. Traditional
can be leveraged to enable dynamic network area coverage manual network management has hindered the adoption of new
and maximize service capacity at mobile ground nodes [1]. techniques and the evolution of wireless networks, motivating
Furthermore, UAV swarms can serve as MIMO-enabled self- a new paradigm that can enable zero-touch management of
organizing flying hotspots for terrestrial ad-hoc networks to UAV-enabled networks, including planning, design and de-
improve spectral efficiency [2]. In IoT networks, UAVs can ployment, service delivery, resource management, and end-
be leveraged as distributed relay nodes to expand coverage to-end optimization [16].
area and improve quality of service (QoS) [3]. To enable 5G Data-driven Approaches. Data-driven modeling and
decision-making based on Artificial Intelligence and Machine
ACKNOWLEDGMENT OF SUPPORT AND DISCLAIMER: (a) Contrac- Learning (AI/ML) are envisioned to be key enablers of zero-
tor acknowledges Government’s support in the publication of this paper. This touch wireless network management. In recent years, data-
material is based upon work funded in part by AFRL under AFRL Contract
No. #FA8750-20-C-1021 and #FA8750-21-F-1012 and in part by the NSF driven approaches based on AI/ML have shown great potential
under Grant SWIFT-2229563. (b) Any opinions, findings and conclusions or for automating the modeling and control of complicated wire-
recommendations expressed in this material are those of the author(s) and do less systems. Examples of recent efforts include deep learning-
not necessarily reflect the views of AFRL.
Distribution A. Approved for public release: Distribution unlimited AFRL- based edge computing for Internet of Things (IoT) [17], multi-
2022-5944 on 16 Dec 2022. label classification for user association in mm-wave networks
DT Overview

Physical Domain Twin Domain


 Data generation and collection  Contains accurate model of physical entity
 Environmental interaction  Can be centralized or deployed on edge servers
 Parameterized control  Accelerates AI using both real and synthetic data

Wireless Network Data Acquisition Modeling and Softwarization


E.g.: Aerial-Ground  Location information
 Hybrid simulation
Mobile Network  Network performance
 System virtualization
 RF interference
 Data fusion
 Ground truth data
End Devices  Synthetic sensing
 Dataset construction

Aerial Backhaul
Network
Virtual Networking Scenario

Base Station
Domain Adaptation
Optimization and Learning
 Reality gap measurement
 Training data generation  AI integration
Centralized
Network Control  Domain-agnostic feature extraction  Parametric analysis
Machine Processing/
 System identification  Autonomous control
Learning Analytics

Fig. 1: Top-level overview of a DT-enabled system.

[18], trajectory and passive beamforming design in UAV-RIS used to train AI/ML models to determine the optimal solutions
wireless networks based on a decaying deep Q-network [19], for complex control problems, while the trained models can
and network slicing for industrial IoT based on deep federated be fine-tuned through real-time feedback from the physical
Q-learning [20], among others. Readers are referred to [21]– entity. However, while the great potential of DTs has been
[24] and the references therein for a good survey of the main demonstrated in smart factory, manufacturing and military
results in this field. applications [27]–[29], its adoption in wireless communication
However, the primary challenges with data-driven ap- networks is still in its early phase.
proaches are their slow convergence rates and the limited
generalization capabilities of the learned policies when faced
with new environments. Specifically, the performance of ML In this article, we aim to provide a survey of the main results
(especially deep learning) algorithms highly relies on the of DT-enabled machine learning in UAV-assisted wireless
availability of a sufficient amount of well-labeled contextual networks, and discuss the research challenges and possible
data for model training, leading to slow convergence rates in solutions. In existing literature, there are already a number of
online applications [21], [25]. Additionally, collecting training surveys and tutorials focusing on DT-enabled wireless systems
data can be too time costly and in some cases pose safety risks [31], [34]–[37]. For example, in [31] Minerva et al. discuss the
for hardware or network operators. Alternatively, the models foundational properties, essential characteristics and business
can be trained in an offline manner using data previously values of DTs focusing on IoT application domains such as
collected or generated by simulators [26]. However, the trained digital patient, digital city and cultural heritage. The authors of
models may suffer from poor robustness, i.e., it is hard for [34] discuss DT-enabled 6G from an architectural perspective,
the models to generalize to new environments with different including the key design requirements in decoupling, scalabil-
transition kernels. ity, security and reliability as well as deployment. The enabling
Digital Twin-enabled Data-driven Control. Digital twins technologies for DTs are discussed in [36] for cognizing
(DT) are envisioned as key enablers of fast-convergent and and controlling the physical world, DT modeling, DT data
robust learning for next-generation intelligent cyber-physical management, DT services as well as connections in DTs. In
systems, such as smart factories and manufacturing [27]–[29], [37], Nguyen et al. identify the potential benefits of DT for
construction, bio-engineering and automotive [30], [31], as rolling out 5G networks, including interactive 5G emulation,
well as wireless communication networks [32], [33]. With 5G radio and channel emulation, and continuous validation
high-fidelity models in the virtual DT environment, the cor- and optimization. The application of AI/ML techniques in
responding physical entity can be reconfigured, simulated and wireless network modeling and control has also attracted
tested at a fraction of the cost and in a fraction of the significant research attention. Readers are referred to [38]–
time of deployment-based configuration testing. By simulating [41] and references therein for a good survey and tutorial for
the behaviors of the physical entity in real-time, possible the main results in this field. Different from the above surveys
trajectories of a physical entity’s life-cycle can be generated and tutorials, in this article we discuss the challenges and
using physics-based simulation in order to predict events and enabling techniques for fast-convergent and robust learning
conduct root-cause diagnosis. Further, the resulting data can be in DT-assisted wireless UAV systems.

2
II. D IGITAL T WINS FOR W IRELESS S YSTEMS : A P RIMER in observable performance and behaviors between physical
domain entities and their virtual counterparts. This discrep-
Digital twins were first introduced in the NASA Apollo pro- ancy is typically caused by generalizations of unpredictable
gram as a “multi-physics, multi-scale probabilistic simulation” real-world phenomena present in the simulation. Experience
of an object, system, or process in the physical world, which collected by agents, especially in dynamic or time-varying
uses physical parameters, historical data, and sensor updates to environments, may only be valid temporarily, requiring con-
provide an accurate virtual “mirror” of the target system [42]. tinuous computation and re-optimization. Virtualization is a
As depicted in Fig. 1, a DT system generally consists of key technique for improving the flexibility and efficiency of
three major components: a physical entity with observable zero-touch control for wireless networking systems [16]. This
behaviors, a logical (or virtual) object that represents the requires dedicated computational resources and infrastructure
physical entity in a simulated environment, and a bidirectional to support synthetic data generation and processing as well
feedback system between the two entities [27], [29], [43], [44]. as bidirectional communication between twin and physical
Considering modern applications, DT systems can monitor and domain systems. We will discuss several methods of target
virtualize dynamically the behaviors of the physical systems at system virtualization and softwarization for wireless networks
run-time and further aid in a zero-touch manner the decision- Section IV.
making in unforeseen situations based on data-driven modeling A detailed example of source domain design for a coordi-
and optimization [45]. nated UAV swarm network is outlined in [47]. This example
In order to provide accurate modeling and control decisions includes a centralized intelligence center collecting periodic
in spite of mathematical generalization, a DT system requires updates of environment state data, and returning control di-
a bidirectional feedback loop capable of translating observed rectives to the deployed hardware for optimal MAC-layer
physical behaviors into a virtual model and vice versa. This configuration. The intelligence center contains a simulation
behavioral translation process is termed domain adaptation. To of the deployed hardware capable of generating synthetic data
broaden the scope of this investigation, we consider a general analogous to physical domain experience, which is in turn
theoretical definition of a DT with three critical elements: the used to train a deep neural network to optimize protocol
physical domain, the twin domain, and domain adaptation. We parameters based on a physical domain scenario. With reliable
discuss each element in the context of wireless networks with communication between the twin (i.e., source) and physical
flying base stations as an example to motivate the application (i.e., the target) domains, offloaded computation can facilitate
of DTs for next-generation wireless network optimization. A accelerated convergence and practical applications, removing
survey of more general DT use cases envisioned to support prohibitive resource constraints on physical domain systems.
6G network capabilities such as high-density deployment con- Domain Adaptation. This is the process by which ex-
figuration and reflective intelligent surface-enabled terahertz perience collected or generated in one domain is translated
communications can be found in [46]. for use in the complementary domain. While the addition of
Physical Domain. The physical domain is also called the resources from the source domain can be incredibly useful,
target domain. This domain encompasses all scenario- and communication between twin and physical domains may not
application-specific aspects of the system, such as basic net- always be reliable. In the case of unreliable communication
work functionalities, mobility controls, physical entities, and between domains, the sim-to-real gap may be increased due to
other features of the deployment environment. In general, data the lack of synchronization between physical systems and their
acquisition is handled in the physical domain and uploaded to virtual counterparts. In such scenarios, learning conducted in
the twin domain in real-time or stored as a dataset for later the twin domain must be robust to the difference between
use, which we will discuss further in Section III. Considering dynamics in the physical domain and generalizations made
the example of UAV-assisted networking, the physical domain in the source domain to enable effective sim-to-real policy
would include UAV hardware, software, and communication transfer. The core focus of domain adaptation is to modify
systems used to realize the aerial base station capabilities, learning algorithms and source domain parameters to over-
all comparable elements of network end-devices, as well as come these challenges, and is envisioned as the key to solv-
environmental and geographical features, such as wind speed, ing open research challenges associated with robustness and
RF interference, and blockages, that constrain the UAVs’ flight performance losses in transfer learning applications inherent
patterns and impact network coverage and performance. to DT-enabled systems [48]. In existing literature, domain
Twin Domain. The twin domain, aka source domain, adaptation for RL applications can be achieved by modifying
encompasses all exogenous elements of the DT system that observations of the source domain [48], simulation parameters
are designed to accelerate optimization of physical domain [49], or the reward function [50] of a well-defined Markov
applications, facilitate machine learning applications without decision process (MDP). In each of these approaches, a twin
impact on real-time system operation, or otherwise improve domain is constructed for rapid and efficient training, with
system performance over what is achievable in a deployment the goal of minimizing interaction with the physical domain
that is solely in the physical domain. In general, this will hence maximizing communications efficiency while maintain-
include virtualization of the target environment, synthetic data ing effective transfer learning performance between domains.
generation for policy convergence, and feedback with a do- We will discuss DT testbed development to experimentally
main adaptation process for effective policy transfer across the evaluate domain adaptation techniques for sim-to-real policy
sim-to-real gap. The sim-to-real gap refers to the discrepancy transfer in wireless networks in Section VI.

3
III. DATA ACQUISITION by emitting light in the form of pulsed laser and calculating
In a DT system, the role of data acquisition is to generate the time-of-flight based on reflections [51]. These points are
and maintain a virtual environment using ground truth data aggregated in the form of a point cloud highlighting key
from the physical domain. In addition to data required to build features in an environment, which is then converted into
the virtual model, timely updates from the physical domain are a representative mesh by a central controller (typically via
necessary to maintain the accuracy of event prediction, trajec- numerous filtering and reconstruction steps).
tory modeling, and control capabilities in the twin domain. Currently, some visual sensor based autonomous vehicles
For a DT-enabled wireless network as outlined in Fig. 1, this use simplified LiDAR sensors to accurately measure the
time-sensitive information can include mobile base station and distance between the vehicle and obstacles, then fuse the
user locations, performance metrics, and changes to protocol LiDAR’s data with other sensors’ data to make a more robust
specification such as modulation or bandwidth. map. Some LiDAR-based autonomous vehicles will use high-
In existing work, especially for physics-based or high- accuracy LiDAR as a primary sensor to generate the ambient
fidelity models, construction of the virtual environment is environment’s map. In addition, most vacuum robots use a
done manually and prior to simulation events based on expert simplified LiDAR system to build the map of the user’s
understanding of the target environment. The majority of house for path planning, collision avoidance, and localization.
works discussed in Sections I and II demonstrate the use of a LiDAR is leveraged in [51] to construct an interactive virtual
virtual environment designed and deployed prior to execution model of an environment in the Unity gaming engine. The
time, and otherwise do not consider an explicit interactive Unity engine was selected to maintain the 3D environment
construction of an environment model. However, especially for model due to its high-quality visualization and integrated
dynamic physical environments such as UAV networks [47], physics capabilities. While gaming engines such as Unity
the deployment environment may not be known ahead of time are typically optimized for rendering and visualization with
and the DT system must be able to generate blockage and user interactivity, they are not generally capable of rendering
boundary rules at execution time. dynamic meshes from data acquired in real-time.
The authors of [44] describe the virtual environment of In general, LiDAR sensors, especially high-fidelity long-
a DT system as a repository of environmental and system range sensors, can be very expensive, ranging from hundreds
signatures. Behavioral or physics-based modeling is of key to tens of thousands of dollars [52]. The sensing range of high-
importance to ensure accurate decision-making based on the accuracy LiDAR systems can reach more than 100 meters,
virtual environment [37], which requires efficient, reliable with an accuracy of 1 mm. However, a high-accuracy LiDAR
collection of high-fidelity environmental data. New methods system is heavy (5 kg+) and expensive ($10000+), while low-
of collecting these signatures automatically are currently being cost, simplified LiDAR systems’ scanning frequencies are too
investigated to accelerate the development and deployment of low. Therefore, LiDAR may not be the best choice for large-
DT systems, especially with the help of robots, UAVs, or other scale aerial mapping, where the weight of sensors must be
technology to enable autonomous mapping and unassisted minimized. The authors of [56] demonstrate the capability of
control. a lightweight (< 1 kg) LiDAR sensor for aerial mapping at
In the following section, we discuss the enabling technolo- an altitude of 10-20 meters.
gies for DT construction and deployment and discuss different Mm-Wave Radar. In addition to optical measurement
methods of environmental data acquisition in this context. methods, the use of millimeter-wave (mm-wave) radar has
attracted attention in recent literature as a method of mapping
a physical environment based on measured backscattering and
A. Enabling Technologies and Techniques time-of-flight of emitted high-frequency RF signals. Specifi-
We identify online environment virtualization using various cally, the use of Y-band (215 GHz) radar for indoor navigation
data acquisition techniques as a key enabling technology for and mapping is demonstrated in [53]. In this work, a portable
real-time and mission-critical applications of DT in unknown mm-wave radar system is assembled using commercial-off-
physical environments. These techniques include LiDAR [51], the-shelf RF components interfaced with a vector network ana-
[52], millimeter-wave radar [53], and SLAM [54], [55], to lyzer to collect range information. The system was mounted on
quickly and efficiently scan an environment and build an a turntable to enable full rotational scanning, and a LabView
interactive virtual model. Once an environment is generated, interface was developed to control the radar position and data
contextual datasets can be generated using ray tracing or other collection. Data was collected by emitting RF signals at 215
simulation methodologies to accelerate optimization tasks as GHz with 1 GHz bandwidth at four different types of polar-
shown in Fig. 1 [25], [47]. However, the automation of data ization - HH, HV, VH, and VV - for performance comparison.
acquisition to enable online, on-the-fly DT construction is of The reflected signal response was measured at 220 GHz over 5
critical importance to enabling DT for zero-touch networking. GHz bandwidth and processed using Hough transform, ghost
Online DT construction in general, to the best of our knowl- image elimination, and false blockage elimination techniques
edge, is a challenge that remains unaddressed in existing DT to extract a 2D model of the local environment, providing up
literature. to 15 cm resolution when scanning in HH polarization. Since
LiDAR. In recent literature, LiDAR has demonstrated excel- this method supports real-time map generation, it is considered
lent promise for generating high-fidelity environmental mod- a viable method for simultaneous localization and mapping of
els. LiDAR systems detect surface points in an environment DT environments.

4
The authors of [57] introduce the M-Cube, an experimental Recent
Place Recognition
ORB Extraction KeyFrame Optimize
software-defined millimeter-wave radio system which is con- Insertion
MapPoints
Culling
Essential
Loop
Fusion
Graph
structed using a low-cost 802.11ad radio and a programmable IMU Integration
Loop Coorection
baseband module. This system can provide full control over Initial Pose Estimation
New Points
Creation
Local
BA Compute Database
Sim3/SE3 Query
MIMO beamforming, providing up to 256 antenna elements Relocalization
Map creation
KeyFrames Local Map

across 8 reconfigurable arrays, and has been experimentally Track


IMU IMU Scale
Map Merging
Initialization Refinement Optimize Essential Graph
validated for both mm-wave (60 GHz) communication at Local Map

Welding BA
up to 325 Mbps as well as mm-wave radar based on AoA New KeyFrame Local KeyFrames Culling
Decision Merge Maps
estimation for object detection at a range of 1 m with 8 cm
Tracking Local Mapping Loop & Map Merging
resolution. This system represents a very interesting enabling
technology that improves accessibility of experimental mm- Map Update Full BA

wave approaches, reducing the overall cost and complexity Full BA

of applications and providing support for new sensing-based


virtualization techniques. Fig. 2: General diagram of ORB-SLAM3 mapping process.
SLAM. SLAM is the method of creating a feature map of
an unknown physical environment while tracking the location
the Microsoft Kinect V2, which leverages time of flight of
of an agent traversing through the environment at the same
emitted light for 3D sensing, similar to LiDAR; and the Asus
time, using monocular, stereo, RGB-D, or other visual sensing
ZenFone AR, which leverages a camera, a motion tracking
methods. SLAM requires sensors to detect the environment’s
camera, and an IR depth sensor to collect environmental
features, track specific features then calculate the shape of
signatures. These three approaches were compared to the ZEB-
the physical environment, the carrier’s movements, and its
REVO handheld LiDAR system. Each sensor was mounted
relative location. While SLAM systems can leverage a variety
on an Intel Aero UAV, which navigated around a facility
of sensor input types, including LiDAR and mm-wave sensors,
controlled by the native autopilot to collect environmental
visual SLAM (V-SLAM) is very popular among different
data. The environmental datasets collected by each sensor were
SLAM techniques because it only requires an ordinary camera
loaded into the Unity game engine for offline visualization,
as the input sensor.
observed in virtual reality using the Oculus Rift headset, and
SLAM is critical to the process of autonomous virtual
evaluated in terms of accuracy to the modeled environment and
environment construction. The authors of [58] propose ORB-
resolution of the selected hardware. It was shown that while
SLAM3, an open-source, extensible visual SLAM framework
the ZEB-REVO LiDAR sensor outperformed the other selected
library that supports monocular, stereo and RGB-D cameras
options, competent modeling performance can still be achieved
for data collection. In general, most SLAM algorithms are
for DT environment virtualization using cheaper visual sensor-
executed following three steps: tracking, local mapping, and
based methods. Additionally, most modern passenger cars use
loop closure. In the tracking phase, points in the environment
visual SLAM for Lane Centering Control (LCC) and Adaptive
are used to generate representative images of the environment
Cruise Control (ACC).
called keyframes. During local mapping, keyframes are in-
serted into a local model of the environment. Finally, the
loop closure process detects and manages redundant points B. Research Opportunities and Challenges
based on existing keyframes, integrates new information with The adoption of these methods for DT construction provides
the global model, and performs bundle adjustment (BA) to the following key research opportunities towards enabling DT
estimate camera trajectory. A more detailed overview of the in the wireless domain.
processes involved in each step of this method is shown in Online DT Construction: The construction of a DT is
Fig. 2. In ORB-SLAM3, the camera’s image input will be broadly defined as the process by which spatial and temporal
fused with acceleration data from an inertial measurement unit data is collected from the physical domain and used to generate
(IMU) as shown in Fig. 2. This can significantly improve the a virtual environment in the twin or source domain. Online DT
mapping accuracy and make the tracking continuous even if construction implies that data collection and virtual environ-
visual tracking is lost. The IMU fusion of V-SLAM, deployed ment construction are parallel complementary processes: as the
in the Tracking and Local Mapping modules in Fig. 2, has environmental data is collected, the virtual model is updated
been shown to surpass other state-of-the-art SLAM methods without delay. While [53] and [55] discuss the potential
on existing datasets collected using stereo and monocular for online environment construction and virtualization, the
cameras. Additionally, this framework library supports data efficiency of simultaneous exploration and virtualization of
collection and processing in real time, which is critical for an environment for use in a DT-enabled wireless simulation
online DT creation and can be leveraged for simultaneous remains an open problem.
virtualization and interaction. Continuous SLAM is a special case of online DT con-
The authors of [52] compare V-SLAM with LiDAR map- struction in which 3-D environmental data is collected con-
ping using low-cost sensors for environmental mapping and tinuously via SLAM to update the virtual model in the twin
mesh generation. The explored sensors include the Intel Re- domain. In general, this process will run in parallel with
alSense ZR300, which leverages stereo IR vision and visual- behavioral or analytical simulations in order to maximize
inertial odometry to perform 3D scanning and localization; spatial virtualization accuracy. An accurate DT simulation

5
LiDAR mm-Wave Monocular Camera Stereo Camera RGBD Camera Monocular Camera-IMU
Range 160m 300m Relative 35m 5m Need Further Research
Accuracy 1.5mm 20mm Relative 20mm 3.7mm Need Further Research
Cost $10000 $600 $40 $75 $200 $40

TABLE I: Mapping metrics for SLAM systems.

relies on the maintenance of the virtual model, and requires large-scale, low-resolution sensing. However, a monocular
constant updates to track or model environmental dynamics camera can only measure the relative distance between objects
in real-time. This is required for intelligence in the source instead of the absolute distance. Additionally, the point cloud
domain to provide timely, adaptive support to agents in the generated by a monocular system is far less dense than other
physical domain. Continuous SLAM poses a unique challenge visual methods. To address these challenges in a UAV-based
within the scope of online DT construction due to the amount monocular-SLAM system, the camera frame can be fused with
of end-device resources, especially computational capacity the UAV’s IMU data to improve monocular mapping accuracy
and link bandwidth, required to simultaneously virtualize an without additional hardware. Furthermore, if the UAVs use
environment and begin behavioral modeling. Furthermore, accurate GPS service, such as RTK differential GPS, the
behavioral modeling, based on mobility models or historical camera frame can also be integrated with absolute coordinates
data, may generalize too much or provide only temporarily to further improve mapping accuracy by integrating collected
valid solutions. Online, continuous generation of an interactive data with geographical information system (GIS) mapping in
model presents a research opportunity which can significantly the DT. However, the application of this approach to enable
improve the state-of-the-art for next-generation wireless net- large-scale environment sensing and virtualization remains an
works in dynamic, non-stationary environments. open research challenge in this area.
Large Scale Sensing: Current approaches to environment
virtualization pose several limitations when considering sensor IV. N ETWORK S OFTWARIZATION AND V IRTUALIZATION
accuracy range, especially above 100 m. The price, range, and
In the context of wireless networking research, the benefits
accuracy of several sensor types are compared in Table I.
of DT have attracted research attention as a key technology
The authors of [52], [59] explore the use of UAVs in ex-
towards enabling highly anticipated intelligent networking
panding sensing range for environment data collection, which
tasks. The use of high-fidelity simulation in DT leveraging
provides clear advantages in terms of observable area and
full environmental modeling for system monitoring and control
flexibility compared to manual measurement or static sensing
is the most widely-discussed implementation observed across
approaches. However, this approach poses several tradeoffs of
several industries [29], [61].
its own: UAVs are limited in battery life, which inherently lim-
its functional range and on-board hardware; LiDAR systems Applications of machine learning, especially reinforcement
capable of collecting data at long ranges without loss of fidelity learning and deep learning applications, require significant
can be prohibitively expensive; and cheaper optical/RGB-D amounts of time and data to generate and employ optimal
cameras suffer at long ranges (typically < 100 m). control parameters for a given scenario. A source domain
containing both simulation and optimization in a centralized
For LiDAR-based SLAM systems, LiDAR tracking is based intelligence center or edge server allows the synthesis of
on time-of-flight measurements, and LiDAR can only measure training data in place of experience which may be otherwise
the distance between the carrier vehicle and landmarks. If challenging or costly to obtain in the physical domain [48].
there is a moving obstacle between the carrier vehicle and This generation of contextual data by a virtual entity, termed
the landmark, LiDAR tracking may be lost. For optical- “synthetic sensing” [44], is considered a key feature of high-
camera-based SLAM systems, camera tracking is based on fidelity DT. Specifically, synthetic sensing has been shown to
angle changes. If the ambient light or viewing angle changes reduce the time cost of dataset generation associated with ML
rapidly, then camera tracking may be lost. In either case, applications to enable intelligent wireless network function-
the relocalization process is time consuming and significantly ality, such as data collection and reliability, computational
increases computational complexity of SLAM systems. capability requirements of end-devices, among others [21].
In order to prevent tracking loss, multi-sensor fusion can The fidelity/accuracy of synthetic data available in a DT is
be leveraged to significantly increases the robustness and directly correlated to the quality and quantity of available
mapping accuracy of the SLAM system. During continuous data for a physical context [62], as well as the capabilities
tracking, the SLAM system could use movement data of the of the DT platform to process this data. Due to the growing
carrier vehicle to correct the motion-caused deviation and prevalence of software-defined networking (SDN) and virtual
improve tracking accuracy. If the tracking is lost, the SLAM network control, virtualization fidelity can vary widely based
system will still be able to keep updating the map with on the requirements of the application and the tools leveraged
movement data acquired from the carrier vehicle GPS data to create a virtual environment [36]. We have identified sev-
or IMU unit. eral state-of-the-art network virtualization and softwarization
We envision one possible solution for large-scale DT con- platforms that have demonstrated promising synthetic sensing
struction is monocular-IMU data fusion. Traditional monocular capabilities to address this challenge, which we will introduce
camera mapping is a low-cost, low-complexity method for later in this section. In addition to accurate network simulation,

6
Platform Fidelity Accessibility Type Physics Interface
NS-3 High Free, Open-source Simulation Single (RF) C++, Python
EMANE Low Free, Open-source Emulation Multi (RF, mobility) C++, Python, XML
InSite High Paid, proprietary Simulation Single (RF) Software GUI
Colosseum High Free, proprietary Emulation Multi (RF, mobility) Linux VM
EXata High Paid, proprietary Emulation Single (RF) Software GUI
NSF Colosseum Architecture
UBSim (DTLow
survey)Free, Open-source Simulation Multi (RF, mobility) Python

TABLE II: Wireless network virtualization platforms.

Traffic Generator (TGEN)


Software Radio
Traffic Network Fabric
Node (SRN)
Software Container
(Guest OS, kernel, etc.) Colosseum
Quadrant … … … … Management
NET Layer
Infrastructure
SRN
SRN SRN
SRN SRN
SRN SRN
SRN • Website
MAC Layer SRN SRN SRN SRN
• Resource
Management
PHY Layer SRN x32 SRN x32 SRN x32 SRN x32 • Network Services
… … … … • Storage

USRP X310 SDR Massive Channel Emulator (MCHEM)

RF Scenario Server FPGA Fabric

Management Network

Fig. 3: Architecture of Colosseum [60]

we identify several frameworks that have been developed for network emulation as well, improving accuracy of modeled
establishing accurate virtual models of physical scenarios. networks.
While supporting experimental literature using these tools for While NS-3 can provide high-accuracy network simulation,
DT development in the wireless domain remains a key open this tool does not support explicit modeling of a physical
challenge in this area, these tools offer support for the future networking environment. Additionally, NS-3 provides very
value of DT for enabling ML applications in the wireless limited native infrastructure for simulation visualization and
domain. data processing and analysis, which necessitates the use of
3rd-party software for these tasks. While its accuracy is still
A. State of the Art limited by generalizations inherent to model-based simulation
We have identified several network simulators that have im- [69], this tool is expected to play a significant role in the
mediate potential for advancing research into DT for the wire- development of DT-enabled systems in the future due to its
less domain, including NS-3 [63], Colosseum [64], EMANE high fidelity, accessibility, and large community support.
[65], and Remcom Wireless InSite [66], among others. Refer Colosseum: Colosseum [64] is the world’s largest network
to Table II for a comparison of several key aspects of these emulator, comprised of 256 software-defined radios (SDR) to
platforms in the context of DT system design. provide a wide variety of emulated RF propagation scenarios.
NS-3: NS-3 [63] is a popular open-access, open-source net- The architecture of Colosseum is shown in Fig. 3 [60]. Each of
work modeling tool in both industry and academia, providing the 256 SDR nodes (SRN) is made up of a software container
high-fidelity wireless network simulation. NS-3 simulation is that specifies physical, link, and network layer protocol, as
built around three major elements: nodes, which serve as basic well as a USRP X310 SDR which transmits data generated in
computing device abstractions on which to run applications the traffic generator (TGEN) based on this protocol stack. The
and install network devices; packets, which provide data flow generated signals are transmitted through an FPGA fabric in
between applications; and channels, which are used to connect the massive channel emulator (MCHEM), which is configured
nodes via installed network devices [63]. All elements in a to apply channel effects by emulating predefined scenarios
simulation are constructed from behavioral models written stored on the RF scenario server. The platform is accessed,
in C++ based on explicit protocol definition at each layer managed, and maintained using a management network con-
of the network stack [67], with simulation control provided nected to all constituent elements.
via C++ and Python APIs. Simulated network traffic can be Similar to NS-3, it is considered an open-access tool, and
monitored and analyzed using standard network observation provides support for a variety of different protocols including
software, such as Wireshark [68]. This tool can be directly 4G/5G and IoT-type protocols with spectrum sharing. While
interfaced with radio hardware such as USRP to perform there are many different predefined physical networking sce-

7
narios available, current support for custom scenarios is very the core capabilities of Twin Builder rely on two key ele-
limited, reducing its flexibility in the context of DT-enabled ments: a multi-domain systems modeler, which can simulate
network deployments. We identify the need for an expansion interactions between synchronous modeled systems based on
of this framework to include user-definable networking sce- model libraries such as mechanical, hydraulic, and electronic
narios with complete control over both network topology and components, logic blocks, and characterized manufacturer’s
communications protocol and agent mobility and behavioral components; and a multi-domain systems solver, which uses
modeling. existing physics libraries for hydraulics, electronics, pneumatic
EMANE: The Extendable Mobile Ad-Hoc Network Emu- systems, and thermodynamics to simulate model behaviors.
lator [65], or EMANE, is an open-source real-time frame- While the platform itself is not immediately optimized for the
work for highly flexible simulation of mobile network sys- wireless domain, its use for simulation of hardware within
tems. Modular network development allows for independent an IoT network, without explicit wireless network dynamics,
physical-layer modeling of each network element, providing is discussed in [36]. Twin Builder supports integration of
accurate virtualization of system performance by considering third-party platforms as well, which implies compatibility with
signal propagation, antenna profile effects and interference other tools capable of explicitly modeling wireless network
sources between each emulated wireless link. In general, each behaviors.
emulator instance is comprised of a physical layer model Spirent 5G DT: The Spirent 5G Digital Twin [73] presents
instance paired with one or more radio waveform models, a very robust platform for emulating a full end-to-end 5G
which are designed in C++ and configured using XML. network. Various 5G network elements, including independent
Similar to the Colosseum MCHEM, emulation instances are channel emulation, virtual EPC and gNB, and full-stack end-
linked to a shared multicast channel which generates over- device emulation, are virtualized with very high fidelity in
the-air network behaviors such as signal propagation, antenna order to generate cost-effective accurate behavioral analysis,
effects, and interference. Additionally, EMANE provides radio enable evaluation of new security protocols, as well as other
waveform model plugins compatible with SDR hardware to “testing on demand” services. The authors of [37] outline
enable shared-code emulation, which is comparable to NS-3 several key functionalities of this platform, including wire-
emulation capabilities. While EMANE is limited to emulation less network automation and optimization, network slicing
of PHY and MAC layers, emulation of NET layer and above via SDR and network functions virtualization (NFV), and
protocols is typically handled in practice through integration accelerated 5G network planning and validation, among others.
with the Common Open Research Emulator (CORE) [70]. Pavatar: In the context of the Internet-of-Things (IoT),
Wireless InSite: Remcom Wireless Insite [66] is a propri- intelligent online monitoring systems envisioned for smart
etary electromagnetic (EM) propagation modeling tool for cities, Industrial IoT (IIoT), and other next-generation IoT
wireless networking, which can be leveraged for MIMO systems are anticipated to play a key role in supporting
dataset generation [25] via ray tracing as discussed in Sec- virtual environments constructed to support a high degree
tion I. The propagation behaviors are designed around several of virtualization [74]. A key example of the capability of
modeling theories such as Shooting Bouncing Ray (SBR), high-fidelity DT supported by distributed heterogeneous sensor
Adjacent Path Generation (APG), and Finite Difference Time networks is the Pavatar system [75]. Pavatar collects data at
Domain (FDTD), considering environmental reflection be- multiple system layers simultaneously in order to conduct
haviors based on the Uniform Theory of Diffraction (UTD) comprehensive sensing of all system components and human
[71]. From these models, this tool can recover significant activities in the operation environment. This heterogeneous
receiver-side information such as received power, path loss, data, which is in excess of 1 TB per day, is used to construct
direction of arrival, delay spread, and interference estimates. a VR representation of every system element for human inter-
Due to its high-fidelity modeling capabilities, this tool provides facing, as well as conduct error prediction, anomaly detection,
significant potential for accurate physics-based network event and root-cause diagnosis [75]. The use of different types of
simulation based on signal behaviors within the propagation data sources (e.g. RF, optical, temporal) to construct a robust
environment. However, this level of fidelity comes at a sig- system virtualization, termed “data fusion”, is necessary to
nificant time cost. While APG with GPU can accelerate some provide accurate simulation of physical system behaviors [31].
scenarios, in general this tool will require several minutes to In [76], emphasis is placed on detailed modeling of end-
calculate network performance for a given deployment, pre- device behavior and dynamic agent-based interactions in the
venting faster-than-real-time applications [71]. Additionally, virtual space as an integral component of DT for distributed
each scenario will need to be fully re-calculated in the case or decentralized networks.
of mobile transmitter, and partially re-calculated in the case Keysight EXata: EXata [77] is a network digital twin
of mobile receiver, further increasing this time cost in mobile development and analysis tool which uses network emulation
networking scenarios. and simulation for network virtualization. The platform is
ANSYS Twin Builder: The ANSYS Twin Builder [72] based on a software virtual network (SVN) to generate each
presents an open platform for DT development, with a set protocol layer, antenna, and device in the twin domain. This
of built-in tools for physical and behavioral modeling of SVN is stated to be interoperable with real radio hardware and
physical objects. This platform supports multiple modeling capable of interacting with real applications. Additionally, the
domains and languages, enabling multi-physics simulation and simulation kernel leverages parallel discrete-event drivers dur-
heterogeneous data fusion for operation [37]. Specifically, ing runtime, which can enable faster-than real-time processing

8
necessary for improved real-time ML algorithm training and optimization [79], and acceleration of machine learning for
deployment. wireless [80], among others. While the simulation fidelity of
UBSim: UBSim is a custom hybrid network simulator UBSim is low, its flexibility is intended to enable integration
designed for use in DT research. It is capable of simulating with high-fidelity platforms such as NS-3 or RF-SITL [81]
microwave, millimeter-wave, and terahertz-band communica- to enable rapid design and evaluation of DT systems through
tions in terrestrial, aerial, or hybrid aerial-ground networking multi-fidelity, multi-physics experimentation.
scenarios deployed in a fully configurable physical networking
area. It is fully open-source and open-access, written in B. Research Opportunities
Python for flexibility, and ease-of-use. It is comprised of three The tools discussed in this section provide an interesting
core elements: the network element module, which provides scope of customizable, potentially interoperable network sim-
behavioral definitions of all available simulation elements; ulation at varying levels of fidelity for network virtualization
the network control module, which provides control over all as introduced in Section I. However, very few tools have
deployed network elements; and the discrete event module, been accepted to be individually suitable for full DT imple-
which schedules simulation events. To facilitate ease of use, mentation following the interdisciplinary feature set shown
three sets of APIs have been designed: the environment in Figure 1. Additionally, due to the lack of open-source
definition API, which coordinates all environmental features and community support, they may not provide the level of
and blockages; the network configuration API, which specifies accessibility required for rapid experimental development in
network topology and communications parameters; and the this area. The contribution of a readily available, community-
custom algorithm API, which provides templates for data- oriented DT platform for the purpose of ML-based wireless
driven algorithm deployment. Each UBSim instance also pro- network experimentation remains a significant open challenge.
vides a feedback tunnel, as indicated in Fig. 4, to enable It is expected that a combination of these tools, combined
socket communications with external software. In order to with data fusion [16] and platform integration [82], can be
enable research into key technologies for comprehensive DT as leveraged for a widely available, high-fidelity DT toolchain
outlined in Section II, UBSim supports parallel learning across optimized for use in the wireless domain.
multiple simulation instances, configurable sim-to-sim policy Multi-fidelity Simulation: While data-driven methods can
transfer1 for rapid evaluation of domain adaptation algorithms provide significant improvements to network performance and
such as robust learning and system identification, and is services, they can be very time-consuming and data-expensive,
currently being modified to enable online DT construction and may require re-training if the target environment changes
UBSim
using Expansion
SLAM. UBSim (DT has survey)
been leveraged for experiments over time. To balance the tradeoff between optimization time
in domain adaptation [78], UAV network virtualization and and algorithm accuracy, we identify multi-fidelity simulation
as an important element of future DT-enabled wireless net-
1 Sim-to-sim policy transfer is similar to sim-to-real policy transfer, but the
working systems. Low-fidelity simulation can be used for
policy is transferred to another simulation environment instead of the real time-sensitive tasks, such as rapid control decisions, by lever-
environment. aging an approximation of wireless network behaviors based

VBEPs Behavior record Source Domain: UBSim


(from NetTopo module)
Feedback Tunnel
(from QoSPara module)
- sysID training loop
- Socket thread
Intent Specification NEM Net. Config.
API
Objective 1 Objective 2 Objective N
Agent and Env. Def.
state NCM API
Target Domain: OSWireless trajectories

WiNAS Mathematical OaaS Algorithm Forwarding Cust. Alg.


Subplane Specification Subplane Specification Specification DEM API

Forwarding Forwarding
Distribution Abstraction Software Algorithm Decisions

Data Plane Virtualization Software Control algorithms


(from AlgoGen Engine)

PPS Subplane Control policy


(Programmable Protocol Stack) weights

Fig. 4: Expansion of UBSim to include OSWireless as a physical domain framework, considering accelerated control algorithm
convergence and practical system identification.

9
on observable performance statistics, while high-fidelity mod- taking possibly several minutes to render a single environment
els can be used to maximize network performance in stable [71]. This trade-off between simulation fidelity and processing
environments, or leverage offline optimization algorithms for time may be problematic for time-critical applications. Domain
event prediction. adaptation seeks to minimize the need for this trade-off by
Standardization: Generating a standardized framework to improving the capability of simulation-accelerated learning
facilitate high degrees of virtualization in the wireless domain frameworks to generalize from simulation in the source do-
presents many challenges, including simulation for highly main to real-world deployment in the physical domain.
complex and volatile networking environments [62], support
for dynamic, heterogeneous, and distributed network architec- A. State of the Art
tures [37], and ready integration with machine learning and Instead of seeking a tradeoff between simulation fidelity
model-based network control [22]. Specifically, we identify the and operation time, domain adaptation seeks to improve the
need for an open, accessible DT framework that supports con- generalization capability of low fidelity simulation. The ideal
figurable network virtualization at multiple levels of fidelity. domain adaptation system seeks to minimize the importance
The authors of [79] demonstrate preliminary work in this area, of simulation fidelity on the accuracy of the resulting control
by designing middleware between a high-fidelity UAV network policy in the physical domain, instead focusing on solving
virtualization platform for environmental definition with a low- the contextual mismatch between domains and overcoming
fidelity network simulator for accelerated convergence of con- inherent simulation generalization to make sure the result-
trol algorithms. The continued development and distribution ing control policy works. We introduce three representative
of such a framework would enable many contributions in this examples of this line of research as system identification
area, providing a stronger definition of the capabilities of DT [87], [88], domain-agnostic feature extraction [89], [90], and
technology for use in next-generation wireless networks [46]. robust learning [78], [91], [92]. In system identification, source
Faster-than-real-time Optimization: As part of the envi- domain simulation parameters are adapted based on feedback
sioned model for 6G network architecture, DT is expected from the physical domain to improve behavioral accuracy.
to play a large role in the real-time or faster-than-real-time In domain-agnostic feature extraction, contextual features are
optimization of network deployments. In order to provide identified from low-level data to generate a shared observation
high-fidelity simulation – hence accurate control directives space across domains. In robust learning, the gap between
– protocols designed for UAV-assisted communications will source and physical domains is estimated through feedback
require support in virtual environments. Generalizable support and considered during training in the source domain. Readers
for custom wireless protocols is possible on SDR hardware, are referred to [93] and references therein for a survey of
and can be enabled based on integration of UBSim with robust reinforcement learning specifically and [94], [95] for
OSWireless [83]. OSWireless is a wireless network operating other general domain adaptation techniques.
system capable of decomposing operator intent to explicit System Identification. System identification is the most es-
network control algorithms in a zero-touch manner. Towards tablished approach to domain adaptation in existing literature
practical sim-to-real experimentation, we envision an expan- [50]. In practice, system identification seeks to iteratively
sion of UBSim to support integration with OSWireless, as improve behavioral parameters in a source domain simulation
detailed in Fig. 4. Specifically, OSWireless can serve as a based on feedback from the physical domain. An outline of
physical domain control system to decompose operator intent the general premise of system identification is depicted in
into custom control algorithms. These control algorithms can Fig. 5. While there are many application-specific variants of
be uploaded to UBSim, along with network state data from system identification, this approach faces some challenges in
the Wireless Network Abstraction Specification (WiNAS) Sub- general. Primarily, a significant amount of feedback is required
plane, for faster-than-real-time policy training based on low- from the target system to validate source domain performance.
fidelity simulation. UBSim will return the optimized control Additionally, some virtualization platforms such as Colosseum
policy to OSWireless for deployment on hardware nodes. To and InSite introduced in Section IV may not be fully open-
address the sim-to-real gap, UBSim will leverage the WiNAS source or parameterized to support system identification. Such
Subplane data to perform system identification, improving simulation platforms that are configurable and offer full control
fidelity by tuning parameters to match simulation performance of behavioral parameters are termed hybrid simulators, due to
to physical domain observations. their analytical and behavioral modeling capabilities.
The authors of [87] explore sim-to-real transfer learning
using robot navigation tasks, in which it was noticed that
V. D OMAIN A DAPTATION T ECHNIQUES
learners in the source domain are capable of exploiting a given
As discussed Section IV, synthetic sensing is a useful simulation to perform tasks beyond capabilities in the real
method to accelerate ML algorithm convergence for practical world. By modifying the simulator based on this feedback,
real-world applications [31], [84]. However, synthetic data is the source-to-target gap was reduced and the similarity be-
unable to fully represent all system behaviors in the physical tween simulated and real robot performance was improved.
domain, and thus introduces some inaccuracy during algorithm Similar to system identification, the use of domain-agnostic
training. Many works seek to leverage high-fidelity models to features can be used to enable domain adaptation. Instead of
improve accuracy of synthetic data [85], [86], but processing converging simulation parameters to maximize behavioral sim-
high-fidelity behavioral models can be quite time-consuming, ilarity, source domain observations can be adapted to extract

10
Source Domain Physical System state transition q, selected from an uncertainty set P, which
Simulation is comprised of all possible transitions in an environment.
Parameter- User-defined Ground truth In order to model contaminated agent trajectories, the R-
generating function performance generation contamination model generates a subset Psa of P for each
metrics
Event prediction Environmental s and a pair according to Psa = (1 − R)pas + Rqsa , qsa ∈ ∆|S| ,
sensing where ∆|S| is the simplex of state space S, and R represents
Behavioral the probability of state transition according to q.
Task execution
modeling
Control Algorithm
It is shown in [78] that the selection of random parameters
Control algorithm directives deployment from the environment can improve policy transfer performance
when the source and physical domains have different transition
Fig. 5: General outline of system identification. kernels due to differences in the environment dynamics. Both
[91] and [78] demonstrate the requirement for careful parame-
high-level features common to both the source and physical
ter selection prior to training, highlighting a key limitation of
domains, minimizing the impact of domain-specific dynamics
robust learning in the context of domain adaptation. While
on transfer learning performance. The authors of [96] and
robust learning can provide very conservative policies, the
[89] demonstrate this practice using pixel-level observation
authors of [97] propose soft-robust learning, which takes an
adaptation for policy transfer in tasks related to computer
average over the uncertainty set instead of selecting worst-case
vision.
scenarios to reduce the conservative nature of the resulting
Domain-Agnostic Feature Extraction. This method seeks policy. This yields a model capable of generalization while
to reduce the effect of the source-to-target gap by finding limiting performance degradation.
commonality between each domain. Specifically, this ap-
proach seeks to align behavioral inferences made in each
domain through extraction of high- or low-level features that B. Research Opportunities
can minimize domain-specific phenomena. For example, the Expertise Incorporated Learning: In order to advance the
approach to semantic image segmentation outlined in [89] use of domain adaptation for wireless networks, we consider
leverages pixel-level segment representation and classification constraint sampling reinforcement learning (CSRL) [98] as a
to improve unsupervised adversarial domain adaptation for promising method to quantify the reality gap using domain
computer vision. The authors of [90] propose transfer com- expertise. In this way, expert knowledge of the networking
ponent analysis, which seeks a set of features, termed transfer environment can be integrated during the training process
components, to minimize the difference in data distributions via sensing [51], [55] to enable effective policy transfer
between domains. By projecting transfer components onto a in unknown environments with minimal human interaction.
shared latent space and applying standard machine learning Sophisticated applications of this approach, especially in the
models for classification or regression tasks. wireless domain, remain an open challenge in this area.
Robustness Mechanisms. Robust learning aims to mitigate Reality Gap: The mathematical generalizations present in all
the effect of environmental perturbances, such as modeling simulations make sim-to-real transfer a persistent challenge
errors, time-varying dynamics, or unreliable data, on the due to the inherent non-linearities of natural phenomena.
resulting control policy. This can be typically accomplished Sim-to-sim experiments can be used to estimate the domain
by applying a random or adversarial noise process to a transfer performance of new algorithms for domain adaptation,
system during policy training. Defined in the scope of the DT but even high-fidelity models of physical domain hardware
framework outlined in Fig. 1, this noise is generated by the are incapable of predicting performance exactly. Additionally,
source domain during policy training to mitigate the effect of while synthetic sensing can help accelerate convergence of
the source-to-target gap during policy transfer [78], [91], [92], data-driven models, synthetic data will introduce inaccuracies
[97]. In many cases, the noise is added in the form of training to the learning model based on generalization [88].
samples manually selected from worst-case scenarios or an To overcome the reality gap between simulation results and
average of potential environmental anomalies. This is intended physical system behaviors, the authors of [93] discuss different
to generate a policy that will provide better generalization than applications of robust learning to provide performance guaran-
non-robust policies when faced with unexpected, unknown, or tees in the presence of environment uncertainties. While some
adversarial physical domain dynamics, at the cost of reduced contributions address challenges associated with the reality
maximum achievable performance. gap [48], [87], sim-to-real adaptation in the wireless domain
An effective approach to applying this policy noise to remains an open research area.
generate a robust policy is the R-contamination model [91]. Real-time Training: In ML applications, especially meth-
Leveraged in [91] and [78] to implement model-free robust ods that leverage neural networks, the training process is
reinforcement learning, the R-contamination model is used to generally time-consuming. Even considering faster-than-real-
probabilistically alter, or “contaminate,” observations made by time training capabilities provided by a DT system, significant
the agent with random or worst-case dynamics to encourage system or environmental changes may require re-training some
conservative policy learning. Instead of an agent following or all of a learned policy. In time-critical applications, this
a deterministic transition kernel pas for a given state s and may require interim behavioral models to be leveraged during
action a, this contamination probabilistically cause an arbitrary policy re-training. The authors of [22] explore the integration

11
of low-complexity inference models with ML algorithms to Advantage Actor-Critic (A3C) [100] can be used. Similar to
reduce the impact of long training times on system perfor- A2C, A3C uses an advantage function to reduce variance and
mance in such scenarios. Additionally, when simultaneously improve training stability. The only difference is that A3C
optimizing multiple agents, as in multi-agent reinforcement allows agents to interact with the environment in parallel. In
learning (MARL), the computation complexity and communi- A3C, virtual agents work individually in multiple instances
cation overhead increase exponentially due to the additional of the same environment to update a global policy asyn-
problem dimensionality. This may further exaggerate the sim- chronously. This parallelism can significantly reduce policy
to-real gap based on the aggregate generalizations made across convergence times.
multiple agent representations in the twin domain. In the To reduce the communication overhead of both centralized
example of a self-coordinating swarm UAV network, each and decentralized MARL scenarios, the Lazily Aggregated
agent may be required to relay significant state information Policy Gradient (LAPG) method [101] can be used to reduce
including location, speed, height, and network status of itself the frequency of communication. Most information related to
and other agents to the twin domain in each training step to collision avoidance and task completion does not need to be
avoid collisions, reduce interference, and take actions without exchanged in every time period, e.g., sensor malfunctions, low
loss of information. Additionally, as the number of agents battery states, and obstacle detection. LAPG sets a trigger
increase, the amount of information required by the twin condition for this kind of information to reduce the exchange
domain to maintain an accurate model of the physical domain frequency, which can reduce the overall network communica-
system increases as well, which further increases network tion overhead and computational complexity. The lower bound
resource consumption. The complexity of algorithm design of the trigger condition for LAPG can be written as:
and deployment can be further increased when considering D
ˆ k ||2 ≥ ξ X k+1−d
a decentralized scenario, in which a twin domain model needs ||δ ∇ m ||θ − θk−d ||2 + 6σm,N,δ/K
2
, (2)
to be maintained at each agent instead of a central controller. α2 M 2
d=1
The Advantage Actor-Critic (A2C) algorithm [99] has been where δ ∇ˆ k represents the importance of updating the infor-
m
demonstrated as an effective tool to minimize policy training mation, which is calculated by the difference between previous
times considering high-dimensional state and action spaces. and current policy parameters.
A2C uses the estimated optimal state-action value to update Another strategy that can be used to reduce the per-update
the policy, of which the gradient can be calculated as follows: communications overhead of a distributed network is by using
∇θ J(θ) = Eπθ [∇θ log πθ (s, a)Aπ (s, a)] (1) a Partially Observable Markov Decision Process (POMDP)
[102]. Instead of requiring the agents to fully observe the
where Aπ (s, a) = Qπθ (s, a)−Vπθ (s) is termed the Advantage environment in each time step, action selection for each agent
function, in which Qπθ (s, a) is the action-value function is based on a probability distribution given by the model
and Vπθ (s) is the state-value function. With this advantage instead of directly observing the underlying state. In this
function, variance of the gradient can be reduced which case, each agent has less information to maintain in the local
improves model training stability. It is very time consuming twin domain model and the required information to be shared
to find hyperparameters that stabilize the learning process, between agents per network update can be reduced.
since A2C relies on the initial estimation of values, therefore
A2C algorithms can be challenging to design or further time- VI. P HYSICAL S CENARIOS D EVELOPMENT
consuming for real-time training scenarios. While several works have proposed DT framework concepts
To further accelerate policy convergence, an Asynchronous to support adoption of DT-enabled technologies at scale [34],

Upper Middle Lower Upper Middle Lower

Server
Rack

Static USRP
N210
Static USRP
N210
Upper 𝑈 𝑈

𝑧 𝑈 𝑈
𝑈
𝑈

𝑈 𝑈 𝑈
𝑈
Static USRP 𝑈 𝑈
Static USRP Middle N210
N210
𝑥 𝑈 𝑈
𝑈
𝑈 𝑈

𝑦 𝑈 𝑈 𝑈 𝑈
Lower Ground robots with
USRP N210
𝑦
Ground
robots
𝑥
Server rack Dynamic nodes:
𝐵 𝐵 𝐵 𝐵 𝐵 𝐵 𝑀 𝑀
Millimeter-wave routers –
USRP B210 – srsRAN network
srsRAN network

(a) (b)

Fig. 6: (a) Snapshot of the UB NeXT testbed; (b) UB NeXT testbed topology.

12
[37], [74], the state of the art in this area generally relies on identify several key research opportunities for expanding the
performance inference based on sim-to-sim experimentation, scope and depth of continued research in this direction.
inflexible virtualization of pre-defined physical scenarios, or Sim-to-real gap Estimation: In general, domain adaptation
small-scale sim-to-real experiments with numerous experimen- methods seek to bridge the gap between physical and DT
tal constraints. domains. However, especially in the case of robust learning,
estimation of the sim-to-real gap may not guarantee optimal
A. State of the Art performance if the gap between physical and DT domain
Scenario development is a key consideration for high- behaviors is large or unknown. Furthermore, since there is no
quality wireless experimentation platforms, which must sup- unifying framework for sim-to-real gap measurement, methods
port user-defined network topologies, protocols, and control that seek to reduce the sim-to-real gap may require manual
problems in order to provide accurate validation for use in a tuning in the case of multi-physics optimization. System
practical DT system. The NSF PAWR platforms, including identification has shown promise in reducing the performance
POWDER, COSMOS, AERPAW and ARA, represent the gap between physical and DT domains for physical scenarios
state-of-the-art for wireless network scenario development and regarding mechanical or robotic systems [88], but there is
experimentation, considering scale, accessibility, and capa- insufficient investigation into how to quantify the sim-to-real
bility. For UAV-enabled wireless networking research, AER- gap between different physical scenarios for other methods of
PAW [103] provides a large-scale experimentation platform domain adaptation. Further research into the measurement or
comprised of static nodes, mobile ground nodes, and UAV estimation of the sim-to-real gap induced by various physical
systems equipped with SDR hardware. The goal of AERPAW networking scenarios is anticipated to accelerate design of
is to provide a general platform to develop and evaluate domain adaptation schemes and hence advance the state-of-
new capabilities for UAV-enabled wireless networks, and is the-art of practical DT-enabled wireless networking.
envisioned to enable research into scalable zero-touch control Portable environments for UBSim: We demonstrate in [78],
systems for hybrid aerial-ground networks. POWDER [104] [107] the need for multiple environmental models to enable
is a city-scale wireless networking research testbed in Salt experimentation in domain adaptation, with specific attention
Lake City, Utah, specializing in topics such as 5G O-RAN, to both sim-to-sim and sim-to-real gaps. Specifically, more
massive MIMO, and spectrum sharing in the sub-6 Ghz band. virtual models will be made available for future work to
COSMOS [105] is a testbed deployed in an ultra-dense area build on the contributions in [78], enabling rapid and repeat-
of New York City, specializing in research for ultra-high- able experimentation for domain adaptation in the wireless
bandwidth, low-latency wireless communications, millimeter- domain through sim-to-sim experimentation. By virtualizing
wave MIMO and beamforming, and advanced edge computing real testbeds, as done in [78], this will provide preliminary
scenarios. Finally, ARA [106] is a wireless living laboratory benchmark results required to motivate continued research for
focused on enabling research into rural broadband wireless sim-to-real transfer.
connectivity by connecting an open-access software-defined Building on the sim-to-sim framework outlined in [78], we
virtual infrastructure with a heterogeneous mesh of terrestrial identify the need for a flexible sim-to-real domain adapta-
radio hardware and LEO satellite communication terminals, tion framework that can accommodate different environmental
capable of providing 600 square miles of contiguous wireless models based on the physical domain specification. This will
coverage. enable the design of new domain adaptation algorithms for the
In addition to AERPAW, the UB NeXT testbed [80] pro- wireless domain as well as adaptation of existing algorithms.
vides a comprehensive framework for network virtualization Using the same simulation platform as [78] and [107], as well
and domain adaptation research in integrated aerial-ground as the indoor autonomy research facility and UB SOAR facility
wireless networking. We have included a picture and topology at University at Buffalo, many network configurations can be
diagram of the UB NeXT testbed in Fig. 6. The NeXT testbed observed, including heterogeneous aerial-ground networks and
platform is part of the UB indoor autonomy research facility, UAV-to-UAV networks. To achieve the short-term goal of sim-
and is comprised of 21 USRP N210 SDRs, 6 USRP B210 to-real experimentation, we plan direct integration with the
SDRs, and two millimeter-wave routers, with mobility support UB NeXT testbed platform [80]. In preliminary sim-to-sim
provided by three ground robots with 22 kg payload capacities experiments, we have virtualized the NeXT testbed [78] and
and a netted UAV enclosure for safe aerial network testing. The will use this DT environment to better understand the sim-
testbed networking environment has been fully virtualized in to-real gap through rigorous sim-to-real experimentation and
UBSim, which has been demonstrated in [78]. This testbed can domain adaptation algorithm design.
enable rapid, small-scale experimentation to address existing Testbed Sharing and Remote Access: Considering the need
challenges in DT research such as evaluation of the sim-to- for open, accessible DT experimentation platforms, we believe
real gap, integrated optimization-learning algorithm design for it is of critical importance to facilitate remote access and
efficient ML, and virtual network self-configuration via system control for a fully realized DT-enabled wireless networking
identification. testbed. There remains a lack of testbeds and networking
environments to enable validation of AI integration and further
B. Research Opportunities research into virtualization for network autonomy [16], espe-
Scenario development is at the core of validating DT- cially to support advancement towards zero-touch networking.
enabled experimental frameworks for the wireless domain. We To address this challenge, we emphasize the contributions

13
UnionLabs
User Plane
Testbed Owner
• Create new testbed
• Testbed management
• User and resource management

UnionLabs Administrator Testbed User


• Registration management • Testbed subscription
• Naming space definition • Resource requests
• Troubleshooting Internet • Start, monitor experiments
• Data/code sharing

User Portal
(AWS Cognito)
Federation Plane
Testbed Directory
(AWS DynamoDB)

Testbed 1
Code 1 Dataset 1 Device 1
VM1 VM2 … Testbed 2 …
Code 2 Dataset 2 Device 2
Code Data Hardware VM Image

Code Repository Dataset Repository VM Pool Data/Control Channel


• Code-testbed association • Dataset-testbed association VM1 VM2 • AWS Lambda
• Code deployment • Dataset deployment • API Gateway
• Consistency check • Consistency check
Code Code … • WebSocket
Dataset Dataset

Internet

Testbed Plane
Testbed 1 Testbed 1
Arena TeraNova
Institutional Gateway Institutional Gateway (USRP sub-6 GHz) (Teraherz)

Edge Cloud Edge Cloud


Comms Agent Event Agent Comms Agent Event Agent
Customized Tool Monitoring Tool Customized Tool Monitoring Tool
VM Instance Local Storage VM Instance Local Storage … M-Cube
(mmWave)
WISCA Net
(USRP, UAV)

Software-defined Front-end Software-defined Front-end

NWSL
SOAR
Sub-6GHz SDRs Robot Vehicle UAV Sub-6GHz SDRs LoRa IoT Underwater IoT (optical, mmWave,
(UAV)
UAV)

Fig. 7: Overview of UnionLabs testbed federation.

made in [82] as discussed in Sec. IV, and propose an expansion VII. C ONCLUSIONS
of the supported framework to include simulation/emulation
capabilities. We envision a new framework referred to as In this work, we reviewed existing literature regarding
UnionLabs for testbed sharing and federation. As illustrated in the use of DTs for ML-enabled wireless networks with an
Fig. 7, the architecture of UnionLabs consists of three planes, emphasis on UAV-enabled networking, and discussed the open
connected by the internet: the User Plane, which handles research challenges in the area. DT for the wireless domain
user/operator interactivity, registration, and management; the is a particularly important open research area, and can serve
Federation Plane, which coordinates testbed access and stores as an enabling technology for practical applications of data-
experimental code, datasets, and virtual machines; and the driven network self-optimization such as UAV network self-
Testbed Plane, which is comprised of all federated testbeds coordination and autonomous network control. Domain adap-
connected through institutional gateways. This initiative will tation, a key element to bridge the gap between simulations
provide a platform to share code, data, and software/hardware and real network deployments, requires further investigation
resources across a federation of cloud-enabled heterogeneous in the wireless domain. Several methods of domain adapta-
testbeds distributed throughout the country, with an emphasis tion, including system identification, domain-agnostic feature
on the advancement of research topics related to NextG extraction, and robust learning have been evaluated for use
wireless networks, zero-touch and network automation, and in a DT system, focusing on limiting interactions between
the wireless Internet of Things. domains to improve data efficiency. However, a comprehensive
exploration of the reality gap present in DTs remains an open

14
challenge in this area, as well as accelerating real-time training [18] R. Liu, M. Lee, G. Yu, and G. Y. Li, “User Association for Millimeter-
using domain adaptation in multi-agent systems such as UAV Wave Networks: A Machine Learning Approach,” IEEE Transactions
on Communications, vol. 68, no. 7, pp. 4162–4174, July 2020.
swarm networks. In order to further identify the reality gap [19] X. Liu, Y. Liu, and Y. Chen, “Machine Learning Empowered Trajectory
across domains, the topic of physical scenario development and Passive Beamforming Design in UAV-RIS Wireless Networks,”
also needs to be further explored especially for wireless UAV IEEE Journal on Selected Areas in Communications, vol. 39, no. 7,
pp. 2042–2055, July 2021.
networks. [20] S. Messaoud, A. Bradai, O. B. Ahmed, P. T. A. Quang, M. Atri, and
M. S. Hossain, “Deep Federated Q-Learning-Based Network Slicing for
Industrial IoT,” IEEE Transactions on Industrial Informatics, vol. 17,
R EFERENCES no. 8, pp. 5572–5582, August 2021.
[21] C. She, R. Dong, Z. Gu, Z. Hou, Y. Li, W. Hardjawana, C. Yang,
[1] Q. Wang, W. Zhang, Y. Liu, and Y. Liu, “Multi-UAV dynamic wireless
L. Song, and B. Vucetic, “Deep Learning for Ultra-Reliable and Low-
networking with deep reinforcement learning,” IEEE Communications
Latency Communications in 6G Networks,” IEEE Network, vol. 34,
Letters, vol. 23, no. 12, pp. 2243–2246, December 2019.
no. 5, pp. 219–225, September/October 2020.
[2] Z. Guan, N. Cen, T. Melodia, and S. Pudlewski, “Self-Organizing
[22] C. She, C. Sun, Z. Gu, Y. Li, C. Yang, H. V. Poor, and B. Vucetic,
Flying Drones with Massive MIMO Networking,” in Proc. of Mediter-
“A Tutorial on Ultra-Reliable and Low-Latency Communications in
ranean Ad Hoc Networking Workshop (Med-Hoc-Net), Capri, Italy,
6G: Integrating Domain Knowledge into Deep Learning,” arxiv.org,
June 2018.
Jan. 2020. [Online]. Available: https://arxiv.org/abs/2009.06010
[3] Q. Zhang, M. Jiang, Z. Feng, W. Li, W. Zhang, and M. pan, “IoT-
[23] S. AbdulRahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi,
enabled UAV: Network architecture and routing algorithm,” IEEE
and M. Guizani, “A Survey on Federated Learning: The Journey
Internet of Things Journal, vol. 6, no. 2, pp. 3727–3742, April 2019.
From Centralized to Distributed On-Site Learning and Beyond,” IEEE
[4] X. Chen, T. Chen, Z. Zhao, H. Zhang, M. Bennis, and Y. JI, “Re-
Internet of Things Journal, vol. 8, no. 7, pp. 5476–5497, April 2021.
source Awareness in Unmanned Aerial Vehicle-Assisted Mobile-Edge
Computing Systems,” in Proc. of IEEE 91st Vehicular Technology [24] A. Feriani and E. Hossain, “Single and Multi-Agent Deep Reinforce-
Conference (VTC2020-Spring), Antwerp, Belgium, May 2020. ment Learning for AI-Enabled Wireless Networks: A Tutorial,” IEEE
Communications Surveys & Tutorials, vol. 23, no. 2, pp. 1226–1252,
[5] A. Garcia-Rodriguez, G. Geraci, D. López-Pérez, L. G. Giordano,
Second quarter 2021.
M. Ding, and E. Björnson, “The Essential Guide to Realizing 5G-
Connected UAVs with Massive MIMO,” IEEE Communications Mag- [25] A. Alkhateeb, “Deepmimo: A generic deep learning dataset for
azine, vol. 57, no. 12, pp. 84–90, December 2019. millimeter wave and massive mimo applications,” arXiv preprint
[6] P. Chandhar, D. Danev, and E. G. Larsson, “Massive MIMO for arXiv:1902.06435, February 2019.
Communications With Drone Swarms,” IEEE Trans. on Wireless Com- [26] S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline Reinforcement
munications, vol. 17, no. 3, pp. 1604–1629, March 2018. Learning: Tutorial, Review, and Perspectives on Open Problems,”
[7] W. Sun, N. Xu, L. Wang, H. Zhang, and Y. Zhang, “Dynamic arXiv.org, Nov. 2020. [Online]. Available: https://arxiv.org/abs/2005.
Digital Twin and Federated Learning with Incentives for Air-Ground 01643
Networks,” IEEE Transactions on Network Science and Engineering, [27] D. Jones, C. Snider, A. Nassehi, J. Yon, and B. Hicks, “Characterising
vol. 9, no. 1, pp. 321–333, January 2020. the Digital Twin: A Systematic Literature Review,” CIRP Journal of
[8] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Manufacturing Science and Technology (Elsevier), vol. 29, pp. 36–52,
Liang, and D. I. Kim, “Applications of deep reinforcement learning May 2020.
in communications and networking: A survey,” IEEE Commun. Surv. [28] K. Xia, C. Sacco, M. Kirkpatrick, C. Saidy, L. Nguyen, A. Kircaliali,
Tutor., vol. 21, no. 4, pp. 3133–3174, Fourth Quarter 2019. and R. Harik, “A digital twin to train deep reinforcement learning
[9] S. K. Moorthy and Z. Guan, “Beam Learning in MmWave/THz- agent for smart manufacturing plants: Environment, interfaces and
band Drone Networks Under In-Flight Mobility Uncertainties,” IEEE intelligence,” Journal of Manufacturing Systems, vol. 58, no. B, pp.
Transactions on Mobile Computing, vol. 21, no. 6, pp. 1945–1957, 210–230, January 2021.
June 2022. [29] Q. Qi and F. Tao, “Digital Twin and Big Data Towards Smart Man-
[10] Z. Su, W. Feng, J. Tang, Z. Chen, Y. Fu, N. Zhao, and K.-K. ufacturing and Industry 4.0: 360 Degree Comparison,” IEEE Access,
Wong, “Energy Efficiency Optimization for D2D Communications vol. 6, pp. 3585–3593, Jan. 2018.
Underlaying UAV-assisted Industrial IoT Networks with SWIPT,” IEEE [30] S. Ivanov, K. Nikolskaya, G. Radchenko, L. Sokolinsky, and M. Zym-
Internet of Things Journal (early access), January 2022. bler, “Digital Twin of City: Concept Overview,” in Proc. of Global
[11] H. Alghafari and M. SayadHaghighi, “Decentralized Joint Resource Smart Industry Conference (GloSIC), Chelyabinsk, Russia, November
Allocation and Path Selection in Multi-hop Integrated Access Backhaul 2020.
5G Networks,” Computer Networks, vol. 207, April 2022. [31] R. Minerva, G. M. Lee, and N. Crespi, “Digital Twin in the IoT
[12] J. Du, W. Liu, G. Lu, J. Jiang, D. Zhai, F. R. Yu, and Z. Ding, “When Context: A Survey on Technical Features, Scenarios, and Architectural
Mobile-Edge Computing (MEC) Meets Nonorthogonal Multiple Ac- Models,” Proceedings of the IEEE, vol. 108, no. 10, pp. 1785–1824,
cess (NOMA) for the Internet of Things (IoT): System Design and Oct. 2020.
Optimization,” IEEE Internet of Things Journal, vol. 8, no. 10, pp. [32] R. Dong, C. She, W. Hardjawana, Y. Li, and B. Vucetic, “Deep
7849–7862, May 2021. Learning for Hybrid 5G Services in Mobile Edge Computing Systems:
[13] Y. Cheng, K. H. Li, Y. Liu, K. C. Teh, and G. K. Karagiannidis, Learn From a Digital Twin,” IEEE Trans. on Wireless Communications,
“Non-Orthogonal Multiple Access (NOMA) with Multiple Intelligent vol. 18, no. 10, pp. 4692–4707, Oct. 2019.
Reflecting Surfaces,” IEEE Transactions on Wireless Communications, [33] W. Sun, H. Zhang, R. Wang, and Y. Zhang, “Reducing Offloading
vol. 20, no. 11, pp. 7184–7195, Nobember 2021. Latency for Digital Twin Edge Networks in 6G,” IEEE Transactions
[14] Z. Guan and T. Melodia, “CU-LTE: Spectrally-Efficient and Fair on Vehicular Technology, vol. 69, no. 10, pp. 12 240–12 251, Oct. 2020.
Coexistence Between LTE and Wi-Fi in Unlicensed Bands,” in Proc. [34] L. U. Khan, W. Saad, D. Niyato, Z. Han, and C. S. Hong,
of IEEE Intl. Conference on Computer Communications (INFOCOM), “Digital-Twin-Enabled 6G: Vision, Architectural Trends, and Future
San Francisco, CA, USA, Apr. 2016. Directions,” arXiv.org, February 2021. [Online]. Available: https:
[15] J. Hu, S. K. Moorthy, A. Harindranath, Z. Guan, N. Mastronarde, E. S. //arxiv.org/abs/2102.12169
Bentley, and S. Pudlewski, “SwarmShare: Mobility-Resilient Spectrum [35] A. Rasheed, O. San, and T. Kvamsdal, “Digital Twin: Values, Chal-
Sharing for Swarm UAV Networking in the 6 GHz Band,”,” in Proc. lenges and Enablers From a Modeling Perspective,” IEEE Access,
of IEEE International Conference on Sensing, Communication and vol. 8, pp. 21 980–22 012, Jan. 2020.
Networking (SECON), Virtual Conference, July 2021. [36] Q. Qi, F. Tao, T. Ho, N. Anwer, A. Liu, Y. Wei, L. Wang, and
[16] E. Coronado, R. Behravesh, T. Subramanya, A. Fernández-Fernández, A. Nee, “Enabling Technologies and Tools for Digital Twin,” Journal
S. Siddiqui, X. Costa-Pérez, and R. Riggio, “Zero Touch Management: of Manufacturing Systems, vol. 58, pp. 3–21, Jan. 2021.
A Survey of Network Automation Solutions for 5G and 6G Networks,” [37] H. X. Nguyen, R. Trestian, D. To, and M. Tatipamula, “Digital Twin
IEEE Surveys & Tutorials (Early Access), October 2022. for 5G and Beyond,” IEEE Communications Magazine, vol. 59, no. 2,
[17] H. Li, K. Ota, and M. Dong, “Learning IoT in Edge: Deep Learning for pp. 10–15, February 2021.
the Internet of Things with Edge Computing,” IEEE Network, vol. 32, [38] L. Lei, Y. Tan, K. Zheng, S. Liu, K. Zhang, and X. Shen, “Deep
no. 1, pp. 96–101, Jan.-Feb. 2018. reinforcement learning for autonomous Internet of Things: model, ap-

15
plications and challenges,” IEEE Communications Surveys & Tutorials, visual-inertial and multi-map SLAM,” IEEE Transactions on Robotics,
vol. 22, no. 3, pp. 1722–1760, 2020. vol. 37, no. 6, pp. 1874–1890, 2021.
[39] N. Luong, D. Hoang, S. Gong, D. Niyato, P. Wang, Y. Liang, and [59] K. Chiang, G. Tsai, Y. Li, and N. El-Sheimy, “Development of LIDAR-
D. Kim, “Applications of Deep Reinforcement Learning in Communi- based UAV system for environment reconstruction,” IEEE Geoscience
cations and Networking: A Survey,” IEEE Communications Surveys & and Remote Sensing Letters, vol. 14, no. 10, pp. 1790–1794, October
Tutorials, vol. 21, no. 4, pp. 3133–3174, 2019. 2017.
[40] A. Feriani and E. Hossain, “Single and multi-agent deep reinforcement [60] L. Bonati et al., “Colosseum: large-scale wireless experimentation
learning for AI-enabled wireless networks: a tutorial,” IEEE Commu- through hardware-in-the-loop network emulation,” in Proc. of IEEE
nications Surveys & Tutorials, vol. 23, no. 2, pp. 1226–1252, March International Symposium on Dynamic Spectrum Access Networks,
2021. Virtual Conference, December 2021.
[41] Y. Qian, J. Wu, R. Wang, W. Zhu, and W. Zhang, “Survey on [61] E. H. Glaessgen and D. S. Stargel, “The Digital Twin Paradigm for
Reinforcement Learning Applications in Communication Networks,” Future NASA and U.S. Air Force Vehicles,” in Proc. of Structures,
Journal of Communication and Information Networks, vol. 4, no. 2, Structural Dynamics and Materials Conference - Special Session on
pp. 30–39, June 2019. the Digital Twin, Honolulu, HI, April 2012.
[42] A. E. Campos-Ferreira, J. de J. Lozoya-Santos, A. Vargas-Martı́nez, [62] T. Yang, J. Chen, and N. Zhang, “AI-Empowered Maritime Internet of
R. R. Mendoza, and R. Morales-Menéndez, “Digital Twin Applications: Things: A Parallel-Network-Driven Approach,” IEEE Network, vol. 34,
A review,” Memorias del Congreso Nacional de Control Automático, no. 5, pp. 54–59, September/October 2020.
pp. 606–611, Oct. 2019. [63] NSNAM. NS-3 Network Simulator. (2011-2021). [Online]. Available:
[43] R. Saracco, “Digital Twins: Bridging Physical Space and Cyberspace,” https://www.nsnam.org/
IEEE Computer, vol. 52, no. 12, pp. 58–64, Dec. 2019. [64] Colosseum. NSF Colosseum: The World0 s Most Powerful Wireless
[44] R. Minerva, F. M. Awan, and N. Crespi, “Exploiting Digital Twin as Network Emulator. [Online]. Available: https://www.northeastern.edu/
Enablers for Synthetic Sensing,” IEEE Internet Computing, vol. 26, colosseum/
no. 5, pp. 61–67, January 2021. [65] AdjacentLink. EMANE: Extendable Mobile Ad-Hoc Networking
[45] J. M. Rozanec and L. Jinzhi, “Towards Actionable Cognitive Digital Emulator. [Online]. Available: https://adjacentlink.com/documentation/
Twins for Manufacturing,” in Proc. of the International Workshop on emane/v1.2.1/
Semantic Digital Twins, Heraklion, Greece, July 2020. [66] Remcom. Wireless InSite 3D Wireless Prediction Soft-
[46] N. Kuruvatti, M. Habibi, S. Partani, B. Han, A. Fellan, and H. Schotten, ware. (2021). [Online]. Available: https://www.remcom.com/
“Empowering 6G Communication Systems With Digital Twin Technol- wireless-insite-em-propagation-software
ogy: A Comprehensive Survey,” IEEE Access, vol. 10, pp. 112 158– [67] R. Chaudhary, S. Sethi, R. Kechari, and S. Goel, “A study of compari-
112 186, October 2022. son of Network Simulator -3 and Network Simulator -2,” International
Journal of COmputer Science and Information Technologies, vol. 3,
[47] L. Lei, G. Shen, L. Zhang, and Z. Li, “Toward Intelligent Cooperation
no. 1, pp. 3085–3092, 2012.
of UAV Swarms: When Machine Learning Meets Digital Twin,” IEEE
[68] G. C. et al. Wireshark. [Online]. Available: https://www.wireshark.org/
Network, vol. 35, no. 1, pp. 386–392, January/February 2021.
[69] P. Fuxjaeger, “Validation of the NS-3 interference model for
[48] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakr-
IEEE802.11 networks,” in Proc. of IFIP Wireless and Mobile Network-
ishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige, S. Levine, and
ing Conference (WMNC), Munich, Germany, October 2015.
V. Vanhoucke, “Using Simulation and Domain Adaptation to Improve
[70] J. Ahrenholz, T. Goff, and B. Adamson, “Integration of the CORE
Efficiency of Deep Robotic Grasping,” in Proc. of IEEE International
and EMANE network emulators,” in Proc. of Military Communications
Conference on Robotics and Automation, Brisbane, Australia, Septem-
Conference (MILCOM), Baltimore, MD, USA, November 2011.
ber 2018.
[71] M. Schmiedekamp, A. Kuhlman, R. Ohs, and S. Buscemi, “High
[49] Y. Chebotar, V. Makoviychuk, M. Macklin, J. Isaac, N. Ratliff, and
fidelity modeling of spatio-temporally dense multi-radio scenarios,” in
D. Fox, “Closing the sim-to-real loop: Adapting simulation random-
Proc. of Annual conference of the Applied Computational Electromag-
ization with real world experience,” in Proc. of IEEE International
netics Society (ACES), Denver, CO, USA, October 2013.
Conference on Robotics and Automation, Montreal, Canada, May 2019.
[72] ANSYS. 5G Network Digital Twin Whitepa-
[50] B. Eysenbach, S. Chaudhari, S. Asawa, and S. Levine, “Off-Dynamics per. [Online]. Available: https://www.spirent.com/assets/wp
Reinforcement Learning: Training for Transfer with Domain Classi- simplifying-5g-with-the-network-digital-twin
fiers,” in Proc. of International Conference on Learning Representa- [73] Spirent. ANSYS Twin Builder. [Online]. Available: https://www.ansys.
tions, Vienna, Austria, May 2021. com/products/digital-twin/ansys-twin-builder
[51] E. Brock, C. Huang, D. Wu, and Y. Liang, “LiDAR-Based Real- [74] K. M. Alam and A. E. Saddik, “C2PS: A Digital Twin Architecture
Time Mapping for Digital Twin Development,” in Proc. of IEEE Reference Model for the Cloud-Based Cyber-Physical Systems,” IEEE
International Conference on Multimedia and Expo, Shenzhen, China, Access, vol. 5, pp. 2050–2062, Jan. 2017.
July 2021. [75] Y. He, J. Guo, and X. Zheng, “From Surveillance to Digital Twin:
[52] M. Minos-Stensrud, O. H. Haakstad, O. Sakseid, B. Westby, , and Challenges and Recent Advances of Signal Processing for the Industrial
A. Alcocer, “Towards Automated 3D Reconstruction in SME Factories Internet of Things,” IEEE Signal Processing Magazine, vol. 35, no. 5,
and Digital Twin Model Generation,” in Proc. of International Confer- pp. 120–129, Sept. 2018.
ence on Control, Automation and Systems, PyeongChang, GangWon, [76] T. Jung, N. Jazdi, and M. Weyrich, “A Survey on Dynamic Simulation
Korea, October 2018. of Automation Systems and Components in the Internet of Things,”
[53] M. Moallem and K. Sarabandi, “Polarimetric Study of MMW Imaging in Proc. of IEEE International Conference on Emerging Technologies
Radars for Indoor Navigation and Mapping,” IEEE Transactions on and Factory Automation (ETFA), Limassol, Cyprus, Sept. 2017.
Antennas and Propagation, vol. 62, no. 1, pp. 500–504, January 2014. [77] Keysight. EXata Network Modeling. [Online].
[54] R. Mur-Artal and J. D. Tardos, “ORB-SLAM2: An Open-Source Available: https://www.keysight.com/us/en/product/SN100EXBA/
SLAM System for Monocular, Stereo, and RGB-D Cameras,” IEEE exata-network-modeling.html
Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, October 2017. [78] M. McManus, Z. Guan, N. Mastronarde, and S. Zou, “On the Source-
[55] A. Ali, Z. S. Hashemifar, and K. Dantu, “Edge-SLAM: Edge-Assisted to-Target Gap of Robust Double Deep Q Learning in Digital Twin
Visual Simultaneous Localization and Mapping,” in Proc. of Inter- Enabled Wireless Networks,” in Proc. of SPIE Big Data IV: Learning,
national Conference on Mobile Systems, Applications, and Services, Analytics, and Applications, Orlando, Florida, United States, April
Toronto, Ontario, Canada, June 2020. 2022.
[56] Y. Lin, Y. Cheng, T. Zhou, R. Ravi, S. M. Hasheminasab, J. E. Flatt, [79] S. K. Moorhty, A. Harindranath, M. McManus, Z. Guan, N. Mas-
C. Troy, and A. Habib, “Evaluation of UAV LiDAR for Mapping tronarde, E. S. Bentley, and M. Medley, “A Middleware for Digital
Coastal Environments,” Remote Sensing, vol. 11, no. 24, pp. 2893– Twin-Enabled Flying Network Simulations Using UBSim and UB-
2925, December 2019. ANC,” in Proc. of International Conference on Distributed Computing
[57] R. Zhao, T. Woodford, T. Wei, K. Qian, and X. Zhang, “M-Cube: A in Sensor Systems (DCOSS), Los Angeles, CA, USA, September 2022.
Millimeter-Wave Massive MIMO Software Radio,” in Proc. of Inter- [80] J. Hu, M. McManus, S. K. Moorthy, Y. Cui, Z. Guan, N. Mastronarde,
national Conference on Mobile Computing and Networking, London, E. S. Bentley, and M. Medley, “NeXT: A Software-Defined Testbed
United Kingdom, September 2020. with Integrated Optimization, Simulation and Experiment,” in Proc. of
[58] C. Campos, R. Elvira, J. J. Gomez, J. M. M. Montiel, and J. D. 2022 IEEE Future Networks World Forum (FNWF), Montreal, Canada,
Tardos, “ORB-SLAM3: An accurate open-source library for visual, October 2022.

16
[81] N. Mastronarde, D. Russell, Z. Guan, G. Sklivanitis, D. Pados, E. S. puting: An asynchronous advantage actorcritic learning approach,”
Bentley, and M. Medley, “RF-SITL: a software-in-the-loop channel em- IEEE Internet of Things Journal, vol. 8, no. 4, pp. 2342–2353,
ulator for UAV swarm networks,” in Proc. of International Symposium December 2021.
on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), [101] T. Chen, K. Zhang, G. B. Giannakis, and T. Başar, “Communication-
Belfast, United Kingdom, June 2022. efficient policy gradient methods for distributed reinforcement learn-
[82] S. K. Moorthy, C. Lu, Z. Guan, N. Mastronarde, G. Sklivanitis, ing,” IEEE Transactions on Control of Network Systems, vol. 9, no. 2,
D. Pados, E. S. Bentley, and M. Medley, “CloudRAFT: A Cloud-based pp. 917–929, May 2022.
Framework for Remote Experimentation for Mobile Networks,” in [102] M. Lauri, D. Hsu, and J. Pajarinen, “Partially observable markov
Proc. of CCNC 2022 WKSHPS: 2nd International Workshop on Com- decision processes in robotics: A survey,” IEEE Transactions on
munication and Networking for Swarms Robotics (RoboCom 2022), Robotics (early access), pp. 1–20, September 2022.
Virtual Conference, January 2022. [103] V. Marojevic, I. Guvenc, R. Dutta, M. L. Sichitiu, and B. Floyd, “Ad-
[83] S. K. Moorthy, Z. Guan, N. M. E. S. Bentley, and M. Medley, vanced Wireless for Unmanned Aerial Systems: 5G Standardization,
“OSWireless: Enhancing Automation for Optimizing Intent-Driven Research Challenges, and AERPAW Architecture,” IEEE Vehicular
Software-Defined Wireless Networks,” in Proc. of IEEE International Technology Magazine, vol. 15, no. 2, pp. 22–30, June 2020.
Conference on Mobile Ad-Hoc and Smart Systems (MASS), Denver, [104] J. Breen et al., “Powder: Platform for Open Wireless Data-driven
CO, USA, October 2022. Experimental Research,” in Proc. of International Workshop on Wire-
[84] H. Viswanathan and P. E. Mogensen, “Communications in the 6G Era,” less Network Testbeds, Experimental evaluation and Characterization
IEEE Access, vol. 8, 2020. (WiNTECH, London, United Kingdom, September 2020.
[85] M. Tehrani-Moayyed, L. Bonati, P. Johari, T. Melodia, and S. Basagni, [105] D. Raychaudhari et al., “Challenge: COSMOS: A city-scale pro-
“Creating RF Scenarios for Large-scale, Real-time Wireless Channel grammable testbed for experimentation with advanced wireless,” in
Emulators,” in Proc. of Mediterranean Communication and Computer Proc. of International Conference on Mobile Computing and Network-
Networking Conference (MedComNet), Ibiza, Spain, June 2021. ing (MobiCom), London, United Kingdom, April 2020.
[86] B. Sheen, J. Yang, X. Feng, , and M. M. U. Chowdhury, [106] H. Zhang et al., “ARA: A wireless living lab vision for smart and
“A Digital Twin for Reconfigurable Intelligent Surface Assisted connected rural communities,” in Proc. of 15th ACM Workshop on
Wireless Communication,” arxiv.org, Sept. 2020. [Online]. Available: Wireless Network Testbeds, Experimental Evaluation and CHaracteri-
https://arxiv.org/abs/2009.00454 zation (WiNTECH), New Orleans, Louisiana, USA, January 2022.
[87] A. Kadian, J. Truong, A. Gokaslan, A. Clegg, E. Wijmans, S. Lee, [107] S. K. Moorthy, M. McManus, and Z. Guan, “ESN Reinforcement
M. Savva, S. Chernova, and D. Batra, “Sim2Real predictivity: Learning for Spectrum and Flight Control in THz-Enabled Drone
does evaluation in simulation predict real-world performance?” IEEE Networks,” IEEE/ACM Transactions on Networking, vol. 30, no. 2,
Robotics and Automation Letters, vol. 5, no. 4, pp. 6670–6677, October pp. 782–795, April 2022.
2020.
[88] Y. Jiang, T. Zhang, D. Ho, Y. Bai, C. K. Liu, S. Levine, and J. Tan,
“SimGAN: Hybrid simulator identification for domain adaptation via
adversarial reinforcement learning,” in Proc. of IEEE International
Conference on Robotics and Automation, Xi’an, China, May 2021.
[89] R. Romijnders, P. Meletis, and G. Dubbelman, “A domain agnostic
normalization layer for unsupervised adversarial domain adaptation,” in
Proc. of IEEE Winter Conference on Applications of Computer Vision,
Hawaii, HI, United States, January 2019.
[90] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via
transfer component analysis,” IEEE Transactions on Neural Networks,
vol. 22, no. 2, pp. 199–210, February 2011.
[91] Y. Wang and S. Zou, “Online robust reinforcement learning with model
uncertainty,” in Proc. of Advances in Neural Information Processing
Systems, Virtual Conference, December 2021.
[92] M. Khodabandeh, A. Vahdat, M. Ranjbar, and W. Macready, “A robust
learning approach to domain adaptive object detection,” in Proc. of
International Conference on Computer Vision, Seoul, Korea, October-
November 2019.
[93] S. Chen and Y. Li, “An overview of robust reinforcement learning,” in
Proc. of IEEE International Conference on Networking, Sensing, and
Control, Nanjing, China, Oct-Nov 2020.
[94] W. M. Kouw and M. Loog, “A review of domain adaptation without
target labels,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 43, no. 3, pp. 766–785, October 2021.
[95] G. Wilson and D. J. Cook, “A survey of unsupervised deep do-
main adaptation,” ACM Transactions on Intelligent System Technology,
vol. 11, no. 5, pp. 51:1–46, July 2020.
[96] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakr-
ishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige, S. Levine, and
V. Vanhoucke, “Using simulation and domain adaptation to improve
efficiency of deep robotic grasping,” in Proc. of IEEE International
Conference on Robotics and Automation, Brisbane, Australia, May
2018.
[97] E. Derman, D. J. Mankowitz, T. A. Mann, and S. Mannor,
“Soft-robust actor-critic policy-gradient,” arxiv.org, 2018. [Online].
Available: https://arxiv.org/pdf/1803.04848.pdf
[98] T. Mu, G. Theocharous, D. Arbour, and E. Brunskill, “Constraint
Sampling Reinforcement Learning: Incorporating Expertise for Faster
Learning,” in Proc. of AAAI Conference on Artificial Intelligence,
Virtual Conference, February 2022.
[99] X. Han, J. Wang, Q. Zhang, X. Qin, and M. Sun, “Multi-uav automatic
dynamic obstacle avoidance with experience-shared a2c,” in Proc. of
2019 International Conference on Wireless and Mobile Computing,
Networking and Communications (WiMob), 2019, pp. 330–335.
[100] L. Liu, J. Feng, Q. Pei, C. Chen, Y. Ming, B. Shang, and M. Dong,
“Blockchain-enabled secure data sharing scheme in mobile-edge com-

17

You might also like