
MÄLARDALEN UNIVERSITY

SCHOOL OF INNOVATION, DESIGN, AND ENGINEERING


VÄSTERÅS, SWEDEN

Thesis for the Degree of Master of Science in Computer Science - Intelligent Embedded Systems, DVA428, 15.0 credits

Systematic Gap Analysis of Robot Operating System (ROS 2) in Real-time Systems

Sahar Mobaiyen
smn21012@student.mdh.se

Examiner: Saad Mubeen


Mälardalen University, Västerås, Sweden

Supervisor: Mohammad Ashjaei


Mälardalen University, Västerås, Sweden

May 18, 2022


Systematic Gap Analysis of Robot Operating System (ROS2) in Real-time Systems

Abstract
Nowadays, most high-tech industrial robots and autonomous vehicles operate under real-time
constraints. The Robot Operating System (ROS) was developed as a framework of open-source
libraries and tools that facilitate the building of robot applications and components. Since the
early versions of ROS could not meet real-time constraints, a new version, called ROS 2, was
developed. ROS 2 benefits from capabilities such as the Data Distribution Service (DDS) for
real-time communication. However, ROS 2 still has shortcomings in real-time performance and
therefore remains under development.
This thesis aims to conduct a systematic literature review of previous work on the real-time
performance of ROS 2. It also aims to classify the problems and solutions reported in that
research and to identify gaps in the studies. The review follows a methodology that defines the
research questions, the inclusion and exclusion criteria for selecting related research, and the
techniques for extracting and synthesizing data from previous research. The evaluation of the
selected papers shows that most of the studies are conference papers published between 2019
and 2022. Furthermore, the majority of them are analytical studies focusing on the latency and
schedulability of ROS-based systems. The gap analysis reveals a lack of research on several
topics related to real-time ROS-based systems, such as worst-case execution time and jitter.
Moreover, it shows that research on the real-time performance of ROS 2 should include
experimental and use-case analyses to better demonstrate the identified issues and their
solutions.


Table of Contents
1. Introduction
1.1. Motivation and contributions
1.2. Problem formulation
1.3. Outline

2. Background
2.1. Real-Time systems
2.1.1. Scheduling
2.1.2. Response time
2.1.3. Real-Time Embedded Systems
2.1.4. Distributed embedded systems
2.2. ROS 2
2.2.1. ROS 2 Features
2.2.2. ROS 2 architecture
2.2.3. ROS 2 abstractions

3. Related work

4. Research methodology
4.1. Identification of Research
4.2. Selection of primary studies
4.3. Study quality assessment
4.4. Snowballing
4.5. Data extraction and monitoring
4.6. Data synthesis

5. Results
5.1. Status of the studies
5.1.1. Publication years
5.1.2. Publication venues
5.1.3. Research type
5.1.4. Contribution of the studies
5.2. Discussion about the outcomes of the studies
5.2.1. Latency
5.2.2. Response time
5.2.3. Deadline
5.2.4. Communication cost
5.2.5. Jitter
5.2.6. Worst-case execution time (WCET)
5.2.7. Data size

6. Gap analysis and conclusion


7. Threats to validity
7.1. Construct validity
7.2. Internal validity
7.3. External validity
7.4. Conclusion validity

8. Future Work

9. References


List of Figures
Figure 1: Real-Time systems classifications
Figure 2: Real-Time scheduling taxonomy
Figure 3: Response time demonstration
Figure 4: Distributed real-time embedded system
Figure 5: ROS 2 Architecture
Figure 6: Message transmission between nodes
Figure 7: The stages of conducting the systematic review
Figure 8: The percentage of papers from each database
Figure 9: Snowballing procedure
Figure 10: The number of papers in each step
Figure 11: Number of publications in terms of assessment of ROS 2 in real-time systems
Figure 12: Publication venues
Figure 13: Research Type
Figure 14: Number of studies in each category


List of Tables
Table 1: Linux variants
Table 2: Number of papers from each database
Table 3: Number of publications in each event
Table 4: Classification of studies


1. Introduction
In recent years, industrial systems such as autonomous vehicles, process control systems, modern
industrial robots, and air traffic control systems have become more complex and diverse. As the demand
for high-tech devices grows, new operating systems for controlling these devices and the communication
among them become essential. A common characteristic of most of these devices is a set of real-time
requirements such as reliability, time predictability, and schedulability. Although some of them do not
have strict timing constraints, for most of them a missed timing constraint may cause damage or even
result in a catastrophe. For example, a mobile robot must detect an obstacle, make a decision, and react
within a certain amount of time; otherwise, undesirable results may follow. Such situations are common
in robotics and usually require real-time capabilities. These systems often need to perform tasks or
transfer data over the robot's internal network, much like distributed systems, and the data transmission
must meet predefined timing constraints.
Robot Operating System (ROS) was developed as a framework of open-source software libraries and
tools that help roboticists build robot applications and construct robot components. Since the first
version of ROS did not support priorities or synchronization for tasks and offered no efficient approach
to real-time support, it was not suitable for real-time robot applications [1]. However, as ROS had
become a de facto standard middleware in the robotics domain, there was a rising demand for real-time
capabilities in ROS.
In response to the shortcomings of ROS, especially in the real-time field, it was redesigned into a new
version known as ROS 2. ROS itself originated at Willow Garage in 2007 and is now maintained by the
Open Source Robotics Foundation (OSRF). Because ROS 2 uses the Data Distribution Service (DDS) as
its communication layer, it is better suited to real-time embedded systems and provides configurable
quality-of-service policies (e.g., deadline and reliability).

1.1. Motivation and contributions


Despite its various capabilities, ROS 2 still has unsolved problems, especially in the field of real-time
systems, which make further development of ROS 2 necessary.
Some of these problems are the following [2]:
1) The underlying hardware is not designed for real-time operation and mostly relies on mechanisms
with unpredictable timing, such as caching.
2) ROS 2 runs on top of a host OS and therefore has no control over the host OS scheduler.
3) Dynamic memory allocation makes the execution time unpredictable.
4) The resource requirements are high, sometimes reaching 100 MB of RAM, which can be
challenging for small embedded systems that usually have less than 1 MB of RAM.
5) The execution model of ROS 2 nodes is not predictable, and there is no formal model to compute
its worst-case execution time (WCET)¹.
Furthermore, new use cases have emerged in robotics, such as multi-robot systems, embedded
platforms, and applications with real-time requirements. Since ROS 2 is relatively new, there is not
enough research on its real-time capabilities, problems, and solutions. Thus, this thesis conducts a
systematic literature review [3] and classifies the results of previous work to find the research gaps
and identify future research challenges.
The main contributions of this thesis are as follows:
• A review of research on the real-time challenges and solutions of ROS 2.
• A classification of the problems of ROS 2 in real-time system applications.
• Identification of the research gaps that need to be addressed in future research.

¹ Worst-case execution time (WCET): the maximum execution time of a task on a specific hardware
platform.


1.2. Problem formulation


The goal of this research is to systematically review the state of the art related to the utilization of ROS
2 in real-time systems and to identify the research gaps. The systematic review will answer the following
research questions:

RQ1: What is the current research status on ROS 2 development in real-time systems?
RQ2: What are the main shortcomings that are identified in the current literature for ROS 2 in real-time
systems?
RQ3: What are the proposed solutions to overcome the identified shortcomings in ROS 2 in real-time
systems?
RQ4: What is the research gap in the development of ROS 2 in real-time systems?

Although ROS 2 supports real-time systems, small embedded platforms, multiple platforms (e.g., Linux,
Windows, Mac, and real-time operating systems (RTOS)), and non-ideal networks, it has deficiencies
that are not yet fully understood. Crucial aspects of real-time operation, such as predictable end-to-end
chain latency in ROS-based systems, remain a challenge.

The software's modularity, composability, and especially its open-source community play a major role
in its development. Regular annual releases of ROS 2 integrate new features and findings and make
them available royalty-free to all users, and, reciprocally, ROS 2 users share their experiences and
practical projects with others. Moreover, both industrial use and academic evaluations of ROS 2 are
making its shortcomings explicit, and solutions to them are being explored. This systematic literature
review has been conducted to gather the valuable information about this widely used robotic operating
system and to find the subjects that have rarely been studied.

This thesis aims to shed light on ROS 2 functionality from a real-time perspective and thereby to
facilitate future studies in this area.

1.3. Outline
The structure of this thesis is as follows. Section 2 presents background on real-time systems and
ROS 2. Section 3 reviews related work. Section 4 describes the methodology used to conduct the
systematic review and gather information for the thesis. Section 5 presents and discusses the results,
followed by Section 6, which provides the gap analysis and conclusion of the study. Section 7 discusses
threats to validity, and Section 8 covers prospects for future research.


2. Background
In the robotics domain, the characteristic that distinguishes real-time computation from other types of
computation is time. The word time means that the correctness of the system depends not only on the
logical result of the computation but also on the time at which the result is produced. The word real
indicates that the reaction of the system to external events must occur during their evolution [4].
Real-time processing is vital for robotics, healthcare, manufacturing, flight control systems, chemical
and nuclear plant control, and other industries with hard timing requirements. These domains depend
heavily on real-time data to guarantee safety, efficiency, and reliability.

2.1. Real-Time systems


Real-time systems must have several important properties to support critical applications.
According to Buttazzo [4], the main characteristics of real-time systems are as follows:

• Timeliness. The correctness of results depends not only on their values but also on the task
finishing its execution within a specific time. To meet the timing constraints, the operating system
must provide kernel mechanisms that handle the tasks within the determined time.

• Predictability. To operate at the desired level, a real-time system must be able to predict the
consequences of any scheduling decision. Guaranteeing all timing requirements is vital in
safety-critical applications. In other words, a system is considered timing predictable if it is
possible to demonstrate or prove that, for a given system model or set of assumptions, all timing
requirements will be satisfied during execution [5]. Thus, an offline analysis must be performed
before the system is put into operation. If some tasks cannot satisfy their timing requirements, this
must be known before real operation so that alternative actions can be planned.

• Efficiency. Since most real-time systems are embedded systems with limitations regarding size,
weight, energy consumption, memory, and computational power, managing these resources
efficiently to achieve the desired level of performance is crucial.

• Robustness. A real-time system must not collapse under peak-load conditions. Therefore, it must
be designed to manage all anticipated computational loads. In real-time systems with various
resources and variable workloads, adaptation to different conditions is essential.

• Fault tolerance. All critical components of a real-time system must be designed to handle single
software or hardware failures without system disruption; that is, they must be fault-tolerant.

• Maintainability. The architecture of a real-time system must be designed with modularity in mind
so that system modifications are easy to apply [4].
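The Predictability requirement above calls for an offline analysis before the system is put into operation. One classic example of such an analysis is the Liu and Layland utilization bound for rate-monotonic (RM) scheduling. It is standard real-time theory rather than something taken from the cited sources, and the sketch below is only a minimal illustration, assuming independent periodic tasks on one processor with deadlines equal to their periods. The test is sufficient but not necessary: a task set that fails it may still be schedulable.

```python
def rm_utilization_test(tasks):
    """Liu & Layland sufficient schedulability test for rate-monotonic scheduling.

    tasks: list of (C, T) pairs, where C is the worst-case execution time and
    T the period, assuming independent periodic tasks with deadlines equal to
    periods on a single processor.
    Returns True if the set is guaranteed schedulable; False means the test is
    inconclusive (the set may still be schedulable).
    """
    n = len(tasks)
    if n == 0:
        return True
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)  # tends to ln 2 (about 0.693) as n grows
    return utilization <= bound


# U = 0.25 + 0.20 = 0.45, below the two-task bound of about 0.828
print(rm_utilization_test([(1, 4), (1, 5)]))  # True
# U = 0.50 + 0.40 = 0.90, above the bound, so the test is inconclusive
print(rm_utilization_test([(2, 4), (2, 5)]))  # False
```

An exact check for the inconclusive case is response-time analysis, which computes each task's worst-case response time instead of relying on a utilization bound.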

One of the main differences between real-time and non-real-time tasks is the notion of a deadline,
which is the latest time by which a real-time task must complete its execution. Based on the
consequences of missing a deadline, real-time systems are classified into three categories [4]:

- Soft real-time: Missing a deadline does not cause any critical result, and the system can
continue its execution, though the quality of its output decreases. Many real-time systems, such
as web browsing and gaming, belong to this group.


- Firm real-time: Information received after the deadline is considered invalid. As in a soft
real-time system, missing a deadline does not cause a system failure, but the quality of service
(QoS) is reduced. Robotic assembly lines and financial forecasting systems are examples of this
type of system.

- Hard real-time: Missing a deadline results in catastrophic consequences, and the system's
functionality is terminated. Critical systems, such as autopilot systems, belong to this group.

Classification of real-time systems according to deadline is demonstrated in Figure 1.

Figure 1: Real-Time systems classifications
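To make the three classes concrete, the sketch below assigns an illustrative value to a result depending on when it completes relative to its deadline. The linear decay for soft deadlines, the decay rate, and the choice to raise an exception for a hard miss are assumptions made for illustration only, not part of the cited classification [4].

```python
def result_value(kind, finish_time, deadline, full_value=1.0, decay=0.25):
    """Illustrative value of a result that completes at finish_time.

    soft: value degrades after the deadline (assumed linear decay)
    firm: a late result is worthless, but the system keeps running
    hard: a late result is a failure, modelled here as an exception
    """
    if kind not in ("soft", "firm", "hard"):
        raise ValueError(f"unknown real-time class: {kind}")
    if finish_time <= deadline:
        return full_value
    lateness = finish_time - deadline
    if kind == "soft":
        return max(0.0, full_value - decay * lateness)
    if kind == "firm":
        return 0.0
    raise RuntimeError(f"hard deadline missed by {lateness} time units")


print(result_value("soft", 12, 10))  # 0.5: late, but still partly useful
print(result_value("firm", 12, 10))  # 0.0: late result is discarded
```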

Many real-time systems are also low-latency systems that must execute with minimal latency, such as
automated piloting systems, which must react to sudden events in the environment. However, the
defining factor of real-time systems is predictability: the system must be able to finish a certain task by
a certain time. Therefore, the latency must be measurable, and the maximum possible latency of a task
must be definable.
A real-time computer system requires both a real-time operating system and user code that delivers
deterministic execution. Examples of real-time Linux environments include the following (Table 1):

• The RT_PREEMPT Linux kernel patch, which makes the Linux kernel preemptible.
• Xenomai, a co-kernel (or hypervisor) approach in which the Linux kernel runs as an idle task with
the lowest priority alongside the real-time scheduler.

OS            Real-time   Max latency (μs)
Linux         no          10^…
RT_PREEMPT    soft        10^… to 10^…
Xenomai       hard        10^…

Table 1: Linux variants [6]

The term latency (as used in Table 1) can be defined as the maximum time that elapses between the
instant at which a task with the highest priority among the ready tasks becomes ready and the instant
at which it is allowed to execute [7].
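Latency figures such as those in Table 1 are usually obtained with a loop that asks to wake up at a known time and records how late the thread actually runs; on Linux, cyclictest from the rt-tests suite is the standard tool for this. The sketch below mimics that idea in Python using only the standard library. It is a rough approximation under stated assumptions: a normal, non-real-time Python process is measured, so interpreter overhead and the default OS scheduling dominate the numbers, which are therefore not comparable to Table 1.

```python
import time


def measure_wakeup_latency(period_s=0.001, iterations=200):
    """Rough cyclictest-style probe of wake-up latency.

    The loop sleeps until an absolute wake-up time every period_s seconds and
    records how late it actually resumes. Returns (max, mean) in microseconds.
    """
    latencies_us = []
    next_wakeup = time.perf_counter() + period_s
    for _ in range(iterations):
        delay = next_wakeup - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        # Lateness relative to the intended wake-up instant (clamped at zero).
        lateness = time.perf_counter() - next_wakeup
        latencies_us.append(max(lateness, 0.0) * 1e6)
        next_wakeup += period_s  # absolute schedule, so the period does not drift
    return max(latencies_us), sum(latencies_us) / len(latencies_us)


worst, mean = measure_wakeup_latency()
print(f"max {worst:.1f} us, mean {mean:.1f} us")
```

For real-time analysis the maximum is the relevant figure, since a guarantee must hold for the worst observed (and ideally the worst possible) case, not the average.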
2.1.1. Scheduling

Real-time systems must respond to events within a defined time. In other words, these systems must
meet their deadlines under all conditions. For this reason, task scheduling is a crucial part of designing
any real-time system.
A task (also called a process or thread) is a computation that the CPU executes sequentially [4].
When several tasks have to be executed on a single processor, their executions may overlap in time,
so the CPU must be assigned to them according to a predefined criterion. The set of rules that
determines the order of task execution is known as the scheduling algorithm [4].

2.1.1.1. Classification of scheduling algorithms

Scheduling algorithms are classified according to different properties. The main classes in real-time
scheduling are as follows [4]:

• Preemptive vs. non-preemptive:
- In preemptive algorithms, a running task can be interrupted by another ready task, according to
the scheduling policy.
- In non-preemptive algorithms, a running task executes until completion before other tasks are
allowed to start.

• Static vs. dynamic:
- In static algorithms, scheduling decisions are based on fixed parameters assigned to tasks before
their activation.
- In dynamic algorithms, scheduling decisions are based on dynamic parameters that may change
during the execution of the system.

• Off-line vs. online:
- A scheduling algorithm is off-line if the scheduling decisions are taken before the system starts
running. The decisions are stored in a table that is used at run time.
- A scheduling algorithm is online if the scheduling decisions are taken at run time, whenever a new
task enters the system or a running task finishes.
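The preemptive/non-preemptive distinction can be seen in a small discrete-time simulation of one processor running fixed-priority jobs. The job set below and the convention that a lower number means a higher priority are assumptions chosen for illustration.

```python
def simulate(jobs, preemptive=True):
    """Simulate fixed-priority scheduling of jobs on one CPU in unit time steps.

    jobs: list of (name, release_time, exec_time, priority) with integer times
    and exec_time > 0; a lower priority number means a higher priority.
    Returns a dict mapping each job name to its completion time.
    """
    release = {name: r for name, r, _, _ in jobs}
    prio = {name: p for name, _, _, p in jobs}
    remaining = {name: c for name, _, c, _ in jobs}
    assert all(c > 0 for c in remaining.values()), "exec_time must be positive"
    finish, running, t = {}, None, 0
    while len(finish) < len(jobs):
        ready = [n for n in remaining if release[n] <= t and remaining[n] > 0]
        if ready:
            # Preemptive: re-pick the highest-priority ready job at every step.
            # Non-preemptive: only pick a new job once the current one is done.
            if preemptive or running is None or remaining[running] == 0:
                running = min(ready, key=lambda n: prio[n])
            remaining[running] -= 1
            if remaining[running] == 0:
                finish[running] = t + 1
        t += 1
    return finish


jobs = [("L", 0, 4, 2), ("H", 1, 2, 1)]   # low-priority L starts first
print(simulate(jobs, preemptive=True))    # {'H': 3, 'L': 6}
print(simulate(jobs, preemptive=False))   # {'L': 4, 'H': 6}
```

With preemption, the high-priority job H interrupts L as soon as it is released and finishes at time 3. Without preemption, H is blocked by the running low-priority job and only finishes at time 6, which is the kind of priority-insensitive delay that matters when deadlines are tight.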

Figure 2 demonstrates the various classifications of scheduling algorithms.


Figure 2: Real-Time scheduling taxonomy

2.1.2. Response time

A task's response time is the total time from the task's activation (when it is triggered for execution)
until its execution completes, including the interference from all other tasks in the system. A task is
schedulable if its worst-case response time is less than or equal to its deadline.
In Figure 3, R is the response time of task T2, which is preempted by the higher-priority task T1.

Figure 3: Response time demonstration
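For fixed-priority preemptive scheduling of independent periodic tasks, the worst-case response time can be computed with the classic response-time analysis recurrence R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ · C_j, where C is the WCET, T the period, and hp(i) the set of tasks with higher priority than task i. In terms of Figure 3, R for T2 is T2's own execution plus the preemptions by T1. This is standard real-time theory rather than a result from the reviewed studies; the minimal sketch below assumes integer time units and a task list ordered from highest to lowest priority.

```python
def response_time(tasks, i):
    """Worst-case response time of task i under fixed-priority preemption.

    tasks: list of (C, T, D) tuples (WCET, period, deadline) ordered from
    highest to lowest priority, with integer values.
    Returns R_i, or None if R_i would exceed the deadline D_i.
    """
    c_i, _, d_i = tasks[i]
    r = c_i
    while True:
        # Interference: each higher-priority task j can be released
        # ceil(r / T_j) times within a window of length r (integer ceiling).
        interference = sum(((r + t_j - 1) // t_j) * c_j for c_j, t_j, _ in tasks[:i])
        r_next = c_i + interference
        if r_next > d_i:
            return None  # unschedulable: response time exceeds the deadline
        if r_next == r:
            return r     # fixed point reached
        r = r_next


tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
print([response_time(tasks, i) for i in range(len(tasks))])  # [1, 3, 10]
```

The iteration always terminates because r never decreases and is capped by the deadline. All three example tasks are schedulable, since each response time is within its deadline.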


2.1.3. Real-Time Embedded Systems

Embedded systems are designed to perform a dedicated function rather than to be general-purpose
computers performing several functions. They are typically controlled by microcontrollers, digital signal
processors (DSPs), field-programmable gate arrays (FPGAs), or application-specific integrated circuits
(ASICs). Embedded systems range from small portable devices (e.g., cellular phones, cameras, smart
toys) to larger systems (e.g., aircraft, cars, robots). Usually, the real-time computer is embedded into a
larger system. A real-time embedded system is a specific type of embedded system that runs a
real-time operating system (RTOS) and operates based on real-time scheduling.
A real-time embedded system usually has to meet requirements regarding:

- Device size
- Limited memory
- Power consumption
- Environmental conditions of device operation

2.1.4. Distributed embedded systems

A distributed embedded system is composed of various computing devices, such as embedded
microcontrollers, networking devices, and embedded PCs, which interact through a network. These
devices divide the work, so the job can be done more efficiently and reliably than when the devices
work separately.

Although a distributed embedded system can be designed in various ways, it basically consists of
units called nodes, each dedicated to a specific function or class of functions. Each node has a
communication controller, CPU, ROM, RAM, and an I/O interface to sensors and actuators. The
complete system is composed of several interconnected networks [8] (see Figure 4).

Figure 4: Distributed real-time embedded system [8]

Designing distributed embedded systems is typically subject to limitations: the nodes are small and
highly resource-constrained, bandwidth is limited, and the nodes are often wirelessly connected.
Furthermore, heterogeneity is high, especially in real-time embedded systems, since such a system is
made up of a large number of interacting elements that perform parallel tasks with different
requirements and often make decisions dynamically [9]. This heterogeneity allows system integrators
to combine nodes that implement different functionalities and use them in their systems to reach the
desired implementation.


To transmit data between nodes, several communication protocols have been designed for real-time
networks, such as the Controller Area Network (CAN) and the Local Interconnect Network (LIN) [8].
Due to the advantages of real-time distributed systems, several robots are designed on a distributed
architecture to provide modularity and reliability and to perform massive, complex computations while
executing real-time tasks.

2.2. ROS 2
As a widely adopted framework, the Robot Operating System (ROS) has been used in the development
of many systems, such as Autoware. Although ROS provides several facilities in the robotics domain,
such as hardware abstraction, libraries, visualizers, device drivers, message passing, and package
management, and can run on various operating systems, it does not meet real-time requirements.
Thus, it is not recommended for real-time systems. To address the shortcomings of ROS, its second
version, ROS 2, was initiated by the Open Source Robotics Foundation. ROS 2 was first released in
2015, and the first version with Long-Term Support (LTS) was published in 2019.

With its distributed system architecture, ROS is a collection of hardware drivers, software tools, and
open-source algorithms that helps roboticists design robotic systems.

2.2.1. ROS 2 Features

Several features have made ROS 2 popular in both industrial and academic environments. Its
distributed processing framework enables the design of scalable application systems, from
single-processor to multi-processor architectures. ROS 2's extensive open-source libraries speed up
the development of robotic projects. In addition to libraries, many tools, for example for simulation,
visualization, and route planning, are available to ROS 2 users. Other features of ROS 2 include the
following:

• Discovery, transport, and serialization over the Data Distribution Service (DDS)
• Publish/subscribe over topics
• Quality of service settings for handling non-ideal networks
• DDS-Security support
• Launch system for coordinating multiple nodes
• Preliminary support for real-time code [10]
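Among the quality-of-service settings listed above is a deadline policy, which in ROS 2 expresses the expected maximum time between subsequent messages on a topic. The toy class below models only that expectation in plain Python. DeadlineMonitor is a hypothetical name, not part of rclpy or any DDS API, and it simplifies by counting at most one miss per gap between messages; real DDS implementations report deadline misses through their own status and event mechanisms.

```python
class DeadlineMonitor:
    """Toy model of a subscriber-side deadline check (hypothetical class,
    not an rclpy or DDS API). A miss is counted whenever the gap between two
    consecutive messages exceeds the configured deadline period."""

    def __init__(self, deadline_s):
        self.deadline_s = deadline_s
        self.last_arrival_s = None
        self.missed = 0

    def on_message(self, arrival_s):
        if self.last_arrival_s is not None:
            if arrival_s - self.last_arrival_s > self.deadline_s:
                self.missed += 1  # simplification: one miss per long gap
        self.last_arrival_s = arrival_s


monitor = DeadlineMonitor(deadline_s=0.1)
for t in [0.00, 0.05, 0.12, 0.40, 0.45]:  # the 0.12 -> 0.40 gap is too long
    monitor.on_message(t)
print(monitor.missed)  # 1
```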

2.2.2. ROS 2 architecture

ROS 2 is composed of multiple layers. The application layer uses language-specific client libraries,
such as those for C++ and Python. The ROS client library (rcl) provides a common API underneath these
language-specific libraries, which keeps the behavior of programs written in different languages
consistent. The ROS middleware library (rmw) handles communication between rcl and the Data
Distribution Service (DDS). DDS is an industry standard for real-time communication and was adopted
in ROS 2 to help satisfy real-time constraints when transferring data with the publish/subscribe method
[11].
The simplified architecture of ROS 2 is depicted in Figure 5.


Figure 5: ROS 2 Architecture (adapted from [11])
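The layering in Figure 5 means that each layer only talks to the one directly below it, which is what allows the DDS implementation to be exchanged underneath rmw without changing application code. The sketch below mimics that call chain with hypothetical Python classes; it is not the real rclcpp, rclpy, rcl, or rmw API, and the byte encoding is a placeholder for real DDS serialization.

```python
class FakeDds:
    """Stand-in for a DDS implementation: records what goes 'on the wire'."""

    def __init__(self, vendor):
        self.vendor = vendor
        self.wire = []

    def write(self, topic, payload):
        self.wire.append((topic, payload))


class Rmw:
    """Middleware layer: hides which DDS implementation is used."""

    def __init__(self, dds):
        self.dds = dds

    def publish(self, topic, message):
        self.dds.write(topic, repr(message).encode())  # placeholder serialization


class Rcl:
    """Language-independent core shared by all client libraries."""

    def __init__(self, rmw):
        self.rmw = rmw

    def publish(self, topic, message):
        self.rmw.publish(topic, message)


class ClientLibrary:
    """Language-specific API used by application code."""

    def __init__(self, rcl):
        self.rcl = rcl

    def publish(self, topic, message):
        self.rcl.publish(topic, message)


# Swapping the DDS vendor only changes the object passed in at the bottom.
dds = FakeDds(vendor="vendor_a")
app = ClientLibrary(Rcl(Rmw(dds)))
app.publish("/chatter", {"data": "hello"})
print(dds.wire)  # [('/chatter', b"{'data': 'hello'}")]
```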

2.2.3. ROS 2 abstractions

ROS 2 is a middleware built around nodes, topics, and services, which allows messages to be exchanged between different ROS processes via the publish/subscribe mechanism (Figure 6). The core of any ROS 2 system is the ROS graph, which represents the network of nodes and the connections between them.

Nodes in ROS 2
Each node in ROS 2 is responsible for a single task (e.g., one node controls the motors of the robot’s wheels and another controls the robot’s camera). Nodes exchange data in various ways, such as topics, services, actions, or parameters.

Topics in ROS 2
A topic in ROS 2 acts as a bus over which nodes exchange messages. A node may publish messages to any number of topics and, at the same time, subscribe to any number of topics.
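The many-to-many nature of topics can be sketched with a toy in-process bus in Python. `TopicBus` is a hypothetical class, not the ROS 2 API: real ROS 2 delivers messages through DDS and runs subscription callbacks in an executor, whereas this sketch calls them synchronously inside `publish` to keep the example self-contained.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicBus:
    """Toy bus: any number of publishers and subscribers per topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[object], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: object) -> int:
        # Deliver the message to every subscriber of the topic; return how many received it.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


bus = TopicBus()
log = []
# A "controller" node subscribes to two topics; a "logger" node subscribes to one of them.
bus.subscribe("/wheel_speed", lambda m: log.append(("controller", m)))
bus.subscribe("/camera", lambda m: log.append(("controller", m)))
bus.subscribe("/wheel_speed", lambda m: log.append(("logger", m)))
bus.publish("/wheel_speed", 1.5)   # returns 2
bus.publish("/camera", "frame-0")  # returns 1
print(log)
# [('controller', 1.5), ('logger', 1.5), ('controller', 'frame-0')]
```

The publisher never needs to know who is listening; the topic name alone decouples the nodes.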

Services in ROS 2
Besides publish/subscribe, ROS 2 offers another method of communication between nodes called a service. Unlike topics, which deliver continuous updates, services follow a request/response model and only provide data when they are explicitly called by a client.
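The request/response model can be contrasted with topics using another toy Python sketch. `ServiceRegistry` and its `register`/`call` methods are hypothetical names chosen for illustration; real ROS 2 clients typically send requests asynchronously through the middleware rather than calling the handler directly.

```python
from typing import Callable, Dict


class ServiceRegistry:
    """Toy request/response model: a server registers a handler, a client calls it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        self._handlers[name] = handler

    def call(self, name: str, request: dict) -> dict:
        # Unlike a topic, nothing happens until a client sends a request.
        if name not in self._handlers:
            raise LookupError(f"service {name!r} is not available")
        return self._handlers[name](request)


calls = []


def add_two_ints(request: dict) -> dict:
    calls.append(request)  # record each request the server handles
    return {"sum": request["a"] + request["b"]}


registry = ServiceRegistry()
registry.register("/add_two_ints", add_two_ints)
print(len(calls))  # 0: the server has done no work yet
print(registry.call("/add_two_ints", {"a": 2, "b": 3}))  # {'sum': 5}
```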

Parameters in ROS 2
Parameters are configuration settings of a node. They can be stored in nodes as values of various types, such as integers, floats, booleans, strings, and lists.


Figure 6: Message transmission between nodes [10]

The callback and executor concepts are described below.


Callback: A ROS 2 node is implemented as a set of callback functions [2]. A callback is the minimal schedulable entity and is classified into five types: timer, subscription, service, client, and waitable callbacks. Some callbacks are time-triggered, such as timers, which are released periodically at a fixed rate. Others are event-triggered and are released by an external event, such as the arrival of a message. In ROS 2, messages passed from a publisher to a subscriber are processed in the subscriber’s callback functions [11].

Executor: The executor decides which callback, from which node, is executed by the processor at each point in time. In other words, the OS scheduler schedules the executors, while the scheduling of nodes and their callbacks is managed by the executor [2]. Timer callbacks always have the highest priority, and all callbacks are non-preemptive. The executor updates the set of ready non-timer callbacks only when all of its queues are empty. This delayed update makes priority assignment among non-timer callbacks ineffective and results in processing chains being executed in a round-robin manner [11].
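To make this effect concrete, the following Python sketch simulates a simplified single-threaded executor that follows the rules stated above: timers are checked at every scheduling decision and win; non-timer callbacks are sampled into a ready set only once the previous set has been drained (a polling point); ready callbacks are taken in registration order; and nothing is preempted. It is a model under these assumptions, not the rclcpp implementation, whose details differ between ROS 2 releases. `Callback` and `simulate` are hypothetical names, and time is measured in abstract integer units.

```python
from dataclasses import dataclass, field


@dataclass
class Callback:
    name: str
    kind: str                  # "timer" or "subscription"
    wcet: int                  # execution time in abstract time units
    period: int = 0            # release period, used only by timers (must be > 0)
    arrivals: list = field(default_factory=list)  # message arrival times, subscriptions only


def simulate(callbacks, horizon):
    """Simplified single-threaded executor; returns (start_time, name) per executed instance."""
    timers = [cb for cb in callbacks if cb.kind == "timer"]
    subs = [cb for cb in callbacks if cb.kind == "subscription"]  # kept in registration order
    next_release = {cb.name: 0 for cb in timers}  # every timer is first released at t = 0
    next_msg = {cb.name: 0 for cb in subs}        # index of the next undelivered message
    pending = {cb.name: 0 for cb in subs}         # messages waiting in each subscription queue
    ready_set = []                                # snapshot taken at the last polling point
    trace, t = [], 0
    while t < horizon:
        # Deliver every message that has arrived by time t.
        for cb in subs:
            while next_msg[cb.name] < len(cb.arrivals) and cb.arrivals[next_msg[cb.name]] <= t:
                pending[cb.name] += 1
                next_msg[cb.name] += 1
        # Timers are checked at every scheduling decision and always go first.
        due = [cb for cb in timers if next_release[cb.name] <= t]
        if due:
            cb = due[0]
            next_release[cb.name] += cb.period
        else:
            if not ready_set:
                # Polling point: sample the subscriptions that currently hold messages.
                ready_set = [s for s in subs if pending[s.name] > 0]
            if not ready_set:
                t += 1  # nothing ready: idle for one time unit
                continue
            cb = ready_set.pop(0)  # registration order; later arrivals wait for the next poll
            pending[cb.name] -= 1
        trace.append((t, cb.name))
        t += cb.wcet  # non-preemptive: the callback runs to completion
    return trace


cbs = [
    Callback("tmr", "timer", wcet=1, period=5),
    Callback("high", "subscription", wcet=1, arrivals=[2]),
    Callback("mid", "subscription", wcet=2, arrivals=[0]),
    Callback("low", "subscription", wcet=2, arrivals=[0]),
]
print(simulate(cbs, horizon=12))
# [(0, 'tmr'), (1, 'mid'), (3, 'low'), (5, 'tmr'), (6, 'high'), (10, 'tmr')]
```

In this run, `high` is registered first, so a non-preemptive fixed-priority scheduler would start it at t = 3, as soon as `mid` finishes. Under the polling model its message, which arrives at t = 2, waits until t = 6: `low` was already in the ready set sampled at t = 1, and the timer released at t = 5 takes precedence.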

Communication between ROS 2 nodes follows the publish/subscribe method, and each node can act as both publisher and subscriber. When a message is published, the subscriber must monitor the topic to detect when the message arrives and then store it in a data structure. Once the node is notified of an incoming message, the subscription is handled in three steps:

1) Allocating memory for the arriving message by the allocator.
2) Filling the allocated memory with the received data by rcl.
3) Executing the callback with the message.
Most of the computation time is spent on memory allocation [12].
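The three steps can be sketched in Python as follows. `CountingAllocator` and `take_message` are hypothetical names, and the byte strings stand in for serialized messages. The per-step timers only show where each phase could be instrumented. Timings measured in Python say nothing about the relative cost of these steps in the C/C++ ROS 2 stack reported in [12].

```python
import time
from collections import defaultdict


class CountingAllocator:
    """Hypothetical allocator that counts how many buffers it hands out."""

    def __init__(self) -> None:
        self.allocations = 0

    def allocate(self, size: int) -> bytearray:
        self.allocations += 1
        return bytearray(size)


def take_message(raw: bytes, allocator: CountingAllocator, callback, timings):
    """Process one incoming message in the three steps described above."""
    t0 = time.perf_counter()
    buffer = allocator.allocate(len(raw))  # 1) allocate memory for the arriving message
    t1 = time.perf_counter()
    buffer[:] = raw                        # 2) fill the memory with the received data (rcl's role)
    t2 = time.perf_counter()
    result = callback(buffer)              # 3) execute the callback with the message
    t3 = time.perf_counter()
    timings["allocate"] += t1 - t0
    timings["fill"] += t2 - t1
    timings["callback"] += t3 - t2
    return result


allocator = CountingAllocator()
timings = defaultdict(float)
received = [take_message(msg, allocator, lambda buf: bytes(buf).decode(), timings)
            for msg in (b"pose-1", b"pose-2", b"pose-3")]
print(received, allocator.allocations, sorted(timings))
```

Because every message triggers a fresh allocation, the allocation step scales with the message rate, which is consistent with it being a significant share of the processing cost.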


3. Related work
This section gathers previous gap analyses and systematic literature reviews related to the Robot Operating System (ROS 2) and analyzes their main points, strengths, weaknesses, and findings. An extensive search in various databases and Google Scholar did not reveal any study that closely matches the topic of this thesis. Hence, to the best of our knowledge, this study is the first work that systematically analyzes the gaps in previous studies of the real-time performance of ROS 2. Nevertheless, several studies close to this subject were found, and both their outcomes and their research methodology were helpful for the current work.

Köksal et al. [13] conducted a systematic literature review of the obstacles in Data Distribution Service (DDS) middleware, based on studies published from the introduction of DDS in 2003 until 2017. This in-depth study identified 34 primary studies and discussed, in 11 categories, the challenges of using DDS in domains such as cloud computing, wide area networks, and component-oriented development. The identified obstacles included the complexity of DDS configuration, measurement and optimization, performance prediction, interoperability among DDS vendor implementations, data consistency, reliability, and scalability. Moreover, the solutions proposed in previous studies for these problems were discussed.

Another comprehensive study, by Tsardoulias et al. [14], compares various robotic frameworks, architectures, and middleware, such as ROS, HOP, RoboFrame, and RT middleware, with respect to their operating systems and programming languages. The metrics used in this study are support for distributed execution, being open-source, providing hardware interfaces and drivers, including already developed robotic algorithms and simulators, and providing real-time capabilities. The study concludes that, among all the discussed frameworks and middleware, ROS is the most popular in the robotics community, since it provides modularity and various tools, such as rviz for visualization and tf for handling geometric (coordinate-frame) transforms. In addition, ROS offers a large number of contributed algorithms covering applications such as mapping, navigation, and motor control. Furthermore, the study indicates that only a few robotic frameworks and middleware support real-time constraints.

A literature review by Balador et al. [15] focuses on communication middleware technologies for industrial control systems. In this study, three communication middleware technologies, OPC UA, DDS, and RT-CORBA, were evaluated and compared in terms of architecture, real-time capabilities, security, and quality of service. The results show that DDS, through its publish/subscribe communication service, can address many of the problems and requirements of distributed control systems. Its distinctive capabilities are minimal overhead in the data distribution process, the ability to control quality of service (QoS), and a rich set of QoS policies. However, none of the evaluated middleware technologies supports all requirements and QoS attributes of distributed control systems.

Elkady et al. [16] published a literature survey and attribute-based bibliography of various robotic middleware, such as Orocos, ROS, Pyro, Player, and Miro. In this research, the capabilities of robotic frameworks were evaluated with respect to software, architecture, simulation environment, standards and technologies, distributed environment, security for access control, fault detection and recovery, and real-time performance. Furthermore, a guideline is provided to help developers select suitable middleware for robotics software based on the capabilities, strengths, and weaknesses of the evaluated frameworks.


4. Research methodology
Due to the sharp increase in the number of publications in many fields, structured and systematic research is essential for answering research questions. Such secondary studies follow specific guidelines and methodologies. Although this kind of research was not commonly used in technological domains, the use of systematic and empirical research methods has been increasing in recent years due to a growing focus on evidence-based research.

A Systematic Literature Review (SLR) is a procedure for identifying, reviewing, evaluating, and interpreting all available research relevant to specified research questions in a well-defined manner. The individual studies that contribute to a systematic review are called primary studies, while the systematic review itself is classified as a secondary study [17].
Important reasons for conducting a systematic literature review include [17]:

• Summarizing the empirical evidence and findings about a technology
• Identifying gaps in current studies to clarify the direction of future investigations
• Providing a background for new research activities

Most research starts with some kind of literature review to gather information about previous findings rather than starting from scratch. What distinguishes a systematic literature review is that all relevant previous research must be collected and categorized, so that the reader can be confident about the completeness of the results. Furthermore, if an SLR is conducted carefully with a well-defined methodology, its results are less likely to be biased, although the primary studies themselves may still be biased. The main disadvantage of an SLR is that it requires more effort and time than a traditional literature review.
The research methodology is depicted in Figure 7.

Figure 7: The stages of conducting the systematic review: identification of research, selection of primary studies, study quality assessment, snowballing, data extraction and monitoring, and data synthesis


Tools for systematic literature review

When the number of primary studies is large, the SLR process can become repetitive and laborious. Tools such as StArt, Covidence, and SR Toolbox, among others, offer various functionalities that facilitate the SLR process. One of the most popular SLR tools in the software engineering domain is StArt (State of the Art through Systematic Review), which supports all SLR activities except the automated search for primary studies. Therefore, the researcher must search the databases manually and import the results as a BibTeX file into StArt, which provides facilities such as reference management, attribute customization, and automatic classification of papers [18].

4.1. Identification of Research


The main goal of the SLR is to determine the research area, the types of research, their quantity and results, and the number of publications in that area, as reflected in the research questions [19]. Four research questions are stated in the problem formulation section of this thesis. The research questions must be broken down into individual facets such as population, intervention, comparison, and outcome [17]. Then, the search string is constructed from lists of synonyms, alternatives, and abbreviations of the desired terms. The string must be related to the research questions and derived from the different facets of the search structure. It is combined with Boolean operators to collect relevant studies from various databases. In this research, the search terms cover different forms of the words "ROS 2" and "real-time" in the abstracts of papers, and only research published after 2016-01-01 is collected. Although the first public release of ROS 2 was in 2017, ROS 2 had been introduced earlier, and the earliest relevant research was found in 2016.

In this thesis, four databases are searched, IEEE Xplore, ACM, Scopus, and Web of Science, using the following search string:

("ROS 2" OR ROS2 OR "Robot* Operating System") AND ("Real Time" OR real-time OR "RT" OR time-critical)
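The Boolean logic of the search string can be mirrored in a short Python sketch: each parenthesized group is an OR over its terms, the two groups are joined by AND, quoted items are phrases, and `*` is treated as a truncation wildcard, as most bibliographic databases do. The names `matches_search` and `term_to_regex` are hypothetical. Each database applies its own query syntax and field restrictions, so this only illustrates the logic of the string applied to an abstract, together with the publication-year filter.

```python
import re

# The two OR-groups of the search string; quoted items are phrases.
ROS_TERMS = ['"ROS 2"', "ROS2", '"Robot* Operating System"']
RT_TERMS = ['"Real Time"', "real-time", '"RT"', "time-critical"]


def term_to_regex(term):
    """Compile one search term into a case-insensitive whole-word regex."""
    phrase = term.strip('"')
    # Escape regex metacharacters, then turn the escaped "*" into a truncation wildcard.
    pattern = re.escape(phrase).replace(r"\*", r"\w*")
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)


ROS_PATTERNS = [term_to_regex(t) for t in ROS_TERMS]
RT_PATTERNS = [term_to_regex(t) for t in RT_TERMS]


def matches_search(abstract, year):
    """True if (any ROS term) AND (any real-time term) occurs and the paper is from 2016 on."""
    ros_hit = any(p.search(abstract) for p in ROS_PATTERNS)
    rt_hit = any(p.search(abstract) for p in RT_PATTERNS)
    return year >= 2016 and ros_hit and rt_hit


print(matches_search("We measure the real-time performance of ROS 2 executors.", 2020))  # True
print(matches_search("An overview of the Robot Operating System ecosystem", 2020))       # False
```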

In total, 807 papers were retrieved from these databases, as follows:

Database | URL address | Number of papers
IEEE Xplore | https://ieeexplore-ieee-org | 180
ACM | https://dl-acm-org | 26
Scopus | https://www-scopus-com | 359
Web of Science | https://www-webofscience-com | 242

Table 2: Number of papers from each database


Figure 8: The percentage of papers from each database (IEEE Xplore 22%, ACM 3%, Scopus 45%, Web of Science 30%)
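The percentages in Figure 8 follow directly from the counts in Table 2. The short Python helper below recomputes them to one decimal place. `shares` is a hypothetical helper name; the same function applies to any table of counts, such as the publication venues discussed in Section 5.1.2.

```python
def shares(counts):
    """Return the total and each entry's percentage of it, rounded to one decimal."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total, 1) for name, n in counts.items()}


total, pct = shares({"IEEE Xplore": 180, "ACM": 26, "Scopus": 359, "Web of Science": 242})
print(total, pct)
# 807 {'IEEE Xplore': 22.3, 'ACM': 3.2, 'Scopus': 44.5, 'Web of Science': 30.0}
```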

The next step is to remove duplicate studies from the initial list of papers. After this step, 387 of the 807 papers remained.
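Duplicate removal across database exports can be sketched in Python as follows. This is a minimal sketch that assumes each exported record is a dictionary with hypothetical `title` and `doi` fields. Matching on DOI or on a normalized title is a common heuristic, not necessarily the exact procedure used in this thesis. Title matching can occasionally merge distinct papers that share a title, so flagged duplicates are usually checked by hand.

```python
import re


def normalize_title(title):
    """Lowercase the title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def deduplicate(records):
    """Keep the first record of each paper, matching on DOI or on normalized title."""
    seen = set()
    unique = []
    for record in records:
        keys = {("title", normalize_title(record["title"]))}
        if record.get("doi"):
            keys.add(("doi", record["doi"].strip().lower()))
        if keys & seen:
            continue  # same DOI or same title as a record kept earlier
        seen |= keys
        unique.append(record)
    return unique


# Hypothetical records exported from different databases.
records = [
    {"title": "Example Study One", "doi": "10.1000/ABC", "source": "Scopus"},
    {"title": "Example study one.", "doi": "", "source": "IEEE Xplore"},
    {"title": "A Different Title", "doi": "10.1000/abc ", "source": "ACM"},
    {"title": "Example Study Two", "doi": None, "source": "Web of Science"},
]
print([r["source"] for r in deduplicate(records)])
# ['Scopus', 'Web of Science']
```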

4.2. Selection of primary studies


After the potentially relevant studies have been obtained, they must be assessed for how well they match the research questions. For this purpose, selection criteria must be defined: the initial list of papers is refined according to inclusion and exclusion criteria, which are derived from the research questions, to ensure that the studies are classified accurately [17].
The inclusion and exclusion criteria defined in this thesis are as follows:

• Inclusion criteria:
IC 1. The study discusses at least one challenge related to the real-time performance of ROS 2.
IC 2. The study discusses at least one solution or suggestion related to the real-time performance of ROS 2.
IC 3. The study presents experimental results related to the real-time performance of ROS 2.

• Exclusion criteria:
EC 1. The study relates to the previous version of ROS.
EC 2. The study is a previous version of a later study.
EC 3. The full text of the paper is not accessible.
EC 4. The paper is a survey.
EC 5. The paper is not peer-reviewed (e.g., white papers).

By applying these criteria to the title, abstract, and introduction of each paper, 77 papers were selected.
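The screening rule can be written down as a small Python sketch. The reviewer still makes each individual judgement; the code only combines those judgements. The field names of `ScreeningDecision` are hypothetical. The combination rule, under which a paper is selected if it meets at least one inclusion criterion and no exclusion criterion, is an assumption, since the criteria above do not state how they are combined.

```python
from dataclasses import dataclass


@dataclass
class ScreeningDecision:
    """Reviewer judgements for one paper (hypothetical field names)."""
    discusses_rt_challenge: bool     # IC 1
    discusses_rt_solution: bool      # IC 2
    reports_rt_experiments: bool     # IC 3
    only_about_ros1: bool            # EC 1
    superseded_by_later_study: bool  # EC 2
    full_text_unavailable: bool      # EC 3
    is_survey: bool                  # EC 4
    not_peer_reviewed: bool          # EC 5

    def selected(self) -> bool:
        # Assumption: any inclusion criterion suffices, and any exclusion criterion rejects.
        included = (self.discusses_rt_challenge or self.discusses_rt_solution
                    or self.reports_rt_experiments)
        excluded = (self.only_about_ros1 or self.superseded_by_later_study
                    or self.full_text_unavailable or self.is_survey
                    or self.not_peer_reviewed)
        return included and not excluded


paper = ScreeningDecision(True, False, True, False, False, False, False, False)
survey = ScreeningDecision(True, True, False, False, False, False, True, False)
print(paper.selected(), survey.selected())  # True False
```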


4.3. Study quality assessment


After the inclusion/exclusion step, the quality of the selected studies must be assessed. This stage provides more detailed inclusion/exclusion criteria and a means of weighting the studies when the results are synthesized. Moreover, it helps to guide recommendations for future research [17].

A difficulty in this stage is that there is no agreed definition of study quality; however, it is recommended that quality be assessed in terms of how well a study minimizes bias and maximizes validity [17]. In this step, the full texts of the papers were reviewed to find the studies directly relevant to our research questions. As a result, 15 of the 77 papers were identified as relevant to the analysis of ROS 2 for real-time applications.

4.4. Snowballing
The papers selected after the quality assessment stage form the start set for the snowballing process. According to the snowballing guidelines by Wohlin [20], snowballing is performed iteratively in backward and forward directions (Figure 9). In backward snowballing, the reference lists of the papers selected in the previous step are examined to identify new papers to include in the final list. In this process, papers that do not satisfy the basic criteria, such as language, publication year, and type of publication, or that have already been examined in previous steps or iterations, are removed. To decide on the inclusion of a new study, further assessment is needed, such as examining where and how the paper is referenced, since the context of the citation may provide valuable information. After all available information about the candidate in the citing paper has been examined, the candidate paper itself must be reviewed against the inclusion/exclusion criteria before it is added to the final list of papers.

Forward snowballing refers to finding and evaluating the papers that cite the paper under examination. The citations are retrieved from Google Scholar, and the information provided there is assessed first to decide on inclusion or exclusion. If this information is insufficient for a decision, the citing paper itself must be reviewed. Otherwise, papers are evaluated in the same way as in backward snowballing. Backward and forward snowballing are repeated iteratively until no new papers relevant to the research questions can be found.
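The iterative procedure can be summarized as a loop over a citation graph that stops when an iteration yields no new relevant paper. In this Python sketch, `references` (backward) and `citations` (forward) are hypothetical dictionaries that map a paper to the papers it cites and the papers citing it. `is_relevant` stands for the manual check against the basic and inclusion/exclusion criteria. In practice, these lookups are done by reading reference lists and Google Scholar entries, not by code.

```python
def snowball(start_set, references, citations, is_relevant):
    """Backward and forward snowballing, repeated until an iteration finds nothing new."""
    included = set(start_set)
    examined = set(start_set)
    frontier = set(start_set)
    while frontier:
        candidates = set()
        for paper in frontier:
            candidates |= set(references.get(paper, ()))  # backward: papers this one cites
            candidates |= set(citations.get(paper, ()))   # forward: papers citing this one
        candidates -= examined  # drop papers already examined in earlier steps or iterations
        examined |= candidates
        frontier = {p for p in candidates if is_relevant(p)}
        included |= frontier
    return included


# Hypothetical citation graph and relevance judgements.
references = {"P1": ["P2", "P3"], "P2": ["P4"]}
citations = {"P1": ["P5"], "P4": ["P6"]}
relevant = {"P2", "P4", "P5"}
print(sorted(snowball({"P1"}, references, citations, relevant.__contains__)))
# ['P1', 'P2', 'P4', 'P5']
```

Only newly included papers are expanded in the next iteration, which is what makes the process terminate once no new relevant paper appears.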


Figure 9: Snowballing procedure [20]

By applying the snowballing process to the reference lists of the selected papers, two more papers were found and added to the final list, giving 17 papers in total.


Figure 10: The number of papers in each step

4.5. Data extraction and monitoring


The most important activity in this stage is designing forms to record the information obtained from the studies. The extraction forms must be designed to capture the information required to answer the research questions. The data can be collected in a structured manner in an Excel spreadsheet to make the subsequent steps more convenient.
The data items collected in this thesis are as follows:

• Title: The title of the paper helps to distinguish relevant from non-relevant studies in the initial assessments.
• Authors: The names of the authors of the publication.
• Date of publication: The year in which the paper was published.
• Event: Where the paper was published or presented, i.e., a journal, conference, or symposium.
• DOI: The Digital Object Identifier (DOI), a unique and persistent identifier assigned to online articles, books, and other publications.
• Abstract: The summary of the paper written by its authors.
• Keywords: The significant words chosen by the authors, which help others find the article in databases.
• Source: The database from which the paper was downloaded.
• Notes: The most important points of the paper, written by the reviewer.
• Research type: In this thesis, research type is divided into four categories: experimental, analytical, simulation, and conceptual.


- Experimental research is based on evidence obtained from observation or data collection methods. In this type of research, a scientific investigation is performed to measure the research variables.
- Analytical research requires critical thinking and the assessment of information and facts related to the research area.
- Simulation requires creating a mathematical model of a real phenomenon and reproducing its outcomes with simulation tools in order to observe the execution results.
- Conceptual research is conducted by observing and analyzing the existing information about a given subject. It relates only to an abstract concept or idea and does not involve practical experiments.
• Research contributions: The main activities performed in the study and the outcomes achieved.
• Level of detail of the contribution: In this thesis, the contribution of the research is categorized into abstract, intermediate, and fine-grained levels.
- At the abstract level, the contribution of the paper is conceptual or theoretical. It evaluates the topic critically and reviews past research to create a broad perspective on existing work, but it does not provide any proof of concept.
- At the intermediate level, the contribution contains the concept together with a brief proof of it.
- At the fine-grained level, the contribution presents the concept and its proof in detail.
• Degree of formalization of the contribution: The extent to which the research is standardized and structured, i.e., whether its subject has been proved with mathematical methods. The formalization level is classified as formal, semi-formal, or informal.
• Challenges targeted in the paper: The research challenges discussed in the study.
• Solutions targeted in the paper: The solutions suggested or proved for these challenges in the study.
• Metrics: The metrics used in experiments or tests.
• Strengths and weaknesses of the study: The positive and negative points of the research.
• Maturity classification: Research is classified as immature when only basic ideas without any proof are presented. When there is a proof of concept and the usability of the approach has been demonstrated in a use case, the research is categorized as somewhat mature. Research is mature when the approach has been thoroughly discussed and the concept has been proved; furthermore, the usability of the approach has been demonstrated in use cases and the approach has been used by the research community [21].
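The extraction form maps naturally onto a typed record. The following Python sketch is an illustration only. It covers a subset of the fields above, and the class and function names are hypothetical. `classify_maturity` encodes the maturity definitions above under the assumption that combinations those definitions do not mention (e.g., a proof of concept without a demonstrated use case) count as immature. Writing the records as CSV mirrors the spreadsheet used in this thesis.

```python
import csv
import io
from dataclasses import dataclass, fields
from enum import Enum


class ResearchType(Enum):
    EXPERIMENTAL = "experimental"
    ANALYTICAL = "analytical"
    SIMULATION = "simulation"
    CONCEPTUAL = "conceptual"


class Maturity(Enum):
    IMMATURE = "immature"
    SOMEWHAT_MATURE = "somewhat mature"
    MATURE = "mature"


def classify_maturity(proof_of_concept, shown_in_use_case, thoroughly_discussed, used_by_community):
    """Apply the maturity definitions; combinations they do not cover count as immature."""
    if proof_of_concept and shown_in_use_case:
        if thoroughly_discussed and used_by_community:
            return Maturity.MATURE
        return Maturity.SOMEWHAT_MATURE
    return Maturity.IMMATURE


@dataclass
class ExtractionRecord:
    """One spreadsheet row of the data extraction form (subset of the fields)."""
    title: str
    authors: str
    year: int
    event: str
    doi: str
    source: str
    research_type: ResearchType
    level_of_detail: str  # "abstract", "intermediate", or "fine-grained"
    formalization: str    # "formal", "semi-formal", or "informal"
    maturity: Maturity

    def as_row(self):
        row = []
        for f in fields(self):
            value = getattr(self, f.name)
            # Enums are written by value so that the spreadsheet stays human-readable.
            row.append(value.value if isinstance(value, Enum) else value)
        return row


def to_csv(records):
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([f.name for f in fields(ExtractionRecord)])
    for record in records:
        writer.writerow(record.as_row())
    return buffer.getvalue()


# Hypothetical example record, not one of the reviewed papers.
record = ExtractionRecord(
    title="Example paper", authors="A. Author; B. Author", year=2021,
    event="Example Symposium", doi="10.0000/example", source="Scopus",
    research_type=ResearchType.ANALYTICAL, level_of_detail="intermediate",
    formalization="semi-formal", maturity=classify_maturity(True, True, False, False),
)
print(to_csv([record]))
```

Using enumerations for the categorical fields prevents misspelled categories from slipping into the spreadsheet and makes the later counting of research types straightforward.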

4.6. Data synthesis


This phase of the systematic review involves summarizing the results of the primary studies, which can be presented in descriptive (non-quantitative) or quantitative form. Quantitative data are synthesized with statistical methods, and quantitative results should be presented in a comparable way, reporting the mean and variance of the data for each study. Descriptive (narrative) information, on the other hand, should be structured to highlight the similarities and differences between the outcomes of the studies and aligned with the research questions to make answering them easier. If inconsistencies (heterogeneity) are found in the results, the potential sources of heterogeneity and their impact on the results must be evaluated.
Systematic reviews in IT and software engineering are usually qualitative (i.e., descriptive) in nature. Even when the data are quantitative, the reporting protocols of the studies vary so much that a purely statistical analysis is impossible. In such reviews, tabulating the information is necessary, but an explanation is needed of how the aggregated data help to answer the research questions [22].
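For the quantitative case, reporting the mean and variance per study can be done with Python's standard `statistics` module, as sketched below. The study names and latency samples are hypothetical placeholders, not data from this review. The sample variance requires at least two values per study.

```python
from statistics import mean, variance


def summarize_per_study(samples_by_study):
    """Return {study: (n, mean, sample variance)} so that studies can be tabulated side by side."""
    return {
        study: (len(values), mean(values), variance(values))
        for study, values in samples_by_study.items()
    }


# Hypothetical latency samples (ms) from two studies.
summary = summarize_per_study({"Study A": [1.0, 2.0, 3.0], "Study B": [4.0, 4.0, 5.0, 7.0]})
for study, (n, m, v) in summary.items():
    print(f"{study}: n={n}, mean={m:.2f}, variance={v:.2f}")
```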


5. Results
The final and most important part of a systematic review is analyzing the results and discussing the outcomes to answer the research questions. The results are presented in two categories: i) the current status of the studies, and ii) the findings from the research papers.
The status evaluation discusses the number of studies on ROS 2 in recent years and the events at which they were presented. The findings section focuses on drawing conclusions about ROS 2’s challenges and solutions in real-time systems and on identifying the gaps in these studies.

5.1. Status of the studies


The following analyses address RQ1.

5.1.1. Publication years

The year 2016 was chosen as the starting year for finding papers about ROS 2. Although ROS 2 was announced at ROSCon in 2014, the first distribution and public release of ROS 2, called Ardent Apalone, was on 8 December 2017. The only paper found before 2018 is by Maruyama et al. [23] from 2016. It is one of the earliest and most fundamental studies; it evaluates the performance of ROS 2 and compares it with ROS. Although it does not focus specifically on the real-time performance of ROS 2, it was selected for this review because it assesses ROS 2 comprehensively and is cited by most of the other studies. Figure 11 shows a sharp increase in the number of papers after 2019. Because the search for this thesis was conducted in the early months of 2022, no publications from 2022 were found. This dramatic rise in the number of publications in recent years indicates the increasing popularity of ROS 2 in real-time systems and the importance of further research on its use in such systems. Moreover, as ROS 2 is used more extensively in both academia and industry, the shortcomings of the framework become more evident, and research on overcoming them broadens. However, because becoming familiar with ROS 2, using it at a professional level, recognizing its deficiencies, finding solutions for them, and conducting experiments to evaluate the outcomes are time-consuming, most studies focus on a limited aspect of ROS 2. It is therefore crucial to gather all this information, classify it, and draw conclusions.

Figure 11: Number of publications per year (2016 to 2021) in terms of assessment of ROS 2 in real-time systems


5.1.2. Publication venues

In terms of publication venues, the majority of the papers selected for the review (10 out of 17) are conference papers, and 4 were presented at symposiums. Only 3 papers were published in journals (Figure 12).

Figure 12: Publication venues (conference 59%, symposium 23%, journal 18%)

As depicted in Figure 12, most publications appeared at conferences and symposiums, while only 18 percent were published in journals. Conference papers are generally considered formal and fundamental research sources; however, because the review process for journal papers is longer, journal papers are considered more credible. The conferences mentioned above cover embedded systems and real-time computing, intelligent robotics, and automation science and engineering. Table 3 shows the events and the number of publications in each event.


Event | Number of publications
Proceedings of the 13th International Conference on Embedded Software, 2016 | 1
31st Euromicro Conference on Real-Time Systems, Schloss Dagstuhl, 2019 | 1
2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA) | 1
2018 International Journal of Advanced Robotic Systems | 1
2020 IEEE Real-Time Systems Symposium (RTSS) | 1
IEEE Access 2020 | 1
International Journal of Parallel Programming 2020 | 1
2020 IEEE International Conference on Embedded Software and Systems (ICESS) | 1
2020 IEEE 16th International Conference on Automation Science and Engineering (CASE) | 1
2021 International Conference on Intelligent Robotics and Applications, Springer Nature Switzerland | 1
2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS) | 2
2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, France | 1
2021 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI) | 1
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 27, Prague, Czech Republic | 1
2021 IEEE International Conference on Robotics and Automation (ICRA 2021), Xi'an, China | 1
2021 IEEE Real-Time Systems Symposium (RTSS) | 1

Table 3: Number of publications in each event

5.1.3. Research type

The research type is divided into two parts: the preliminary research type and the secondary research type. The reason for this classification is that all the studies selected for this thesis consist of two parts. Typically, the first part is an analytical or conceptual evaluation of one of ROS 2’s challenges or solutions in real-time systems, discussed extensively to demonstrate or prove the claims. In all the selected papers, this main body is followed by an experimental test case that shows the correctness of the outcomes of the first part. Accordingly, the preliminary research type refers to the type of research in the first part, and the secondary research type refers to the type of research in the second part. Both are classified according to the definitions in section 4.5.


Figure 13 shows that the majority of studies (11 out of 17) start with an analytical evaluation of specific characteristics of ROS 2 and then conduct an experiment to validate the findings of the first part. Three studies present only concepts in the first part but validate them with experiments [24], [25], [23]. One study uses a simulation tool [26], and two of the papers are entirely experimental [27].

Although most studies use an experimental case to show the correctness of their analysis and findings, experimental tests in ROS 2 also serve as tools for finding defects and new cases for future studies. Since it is practically impossible to account for all influencing factors, including environmental effects, in distributed systems and ROS-based applications, experimental tests and real-world case studies can reveal unforeseen situations. Furthermore, long-term experiments are particularly important, because in ROS-based systems many nodes interact with each other, and small faults in any of them can accumulate over long periods and cause real-time problems.

Figure 13: Research type (preliminary research type: analytical 11, conceptual 3, experimental 2, simulation 1; secondary research type: experimental 17)

5.1.4. Contribution of the studies

Since the use of ROS 2 in real-time systems is a novel topic, there is relatively little research on it. Each of the studies selected for this systematic review evaluates a specific property of ROS 2.
Since the main change from ROS to ROS 2 was the use of the Data Distribution Service (DDS) for communication, the first study, by Maruyama et al. [23] in 2016, provides a proof of concept for using DDS in ROS 2 and discusses its performance.
Regarding DDS in ROS 2, Diluffo et al. [28] explore a systematic security model for ROS 2 and assess the possible risks associated with the cognitive layer. To this end, they analyze the DDS Security standard in detail. Furthermore, Morita et al. [29] point out that different messages may require different DDS implementations, which must therefore be used selectively; their study proposes a mechanism for dynamically binding a suitable DDS implementation in ROS 2.


Another highly cited study, by Casini et al. [30], lays the theoretical foundations for automated analysis tools by exploring the timing behavior of ROS 2, presenting and validating a model of ROS 2 applications, and deriving an end-to-end response-time analysis.
Tang et al. [26] conducted one of the first studies of how callback priority assignment affects the response time of processing chains on the ROS 2 executor, and developed a new technique for the response-time analysis of such chains.
Park et al. [27] provide indicators for checking whether a ROS-based system satisfies its real-time constraints.
Since the latency of ROS 2 message passing is constrained by the message conversion and deconversion procedures, Jiang et al. [31] designed an adaptive two-layer serialization algorithm in 2020 that adaptively determines the order of message conversion and serialization. In the same year, Yang and Azumi [32] applied a new real-time executor, the callback-group-level executor, to ROS 2.
Analytical research by Puck et al. [33] discussed the basic setup for distributed and real-time capabilities
on ROS 2 and showed its limitations.
The first extensive research regarding utilizing PREEMPT_RT to add real-time capabilities to the Linux
Kernel and use EtherCAT master with ROS architecture to analyze timing performance was conducted
by Ye et al. [25].
Although scheduling is of significant importance in the real-time domain, the first study on it in this field was conducted only recently, by Choi et al. [11]. The paper focuses on designing a priority-driven chain-aware scheduler (PiCAS) for ROS 2 in a multi-core environment.
Since latency is a suitable metric for evaluating the performance of ROS-based systems, it is considered in several studies. Puck et al. [12] assess the performance and limitations of ROS 2 from a real-time perspective in terms of latency and jitter. Kronauer et al. [24] investigate the end-to-end latency of ROS 2 in distributed systems, profiling the ROS 2 stack and pointing out the latency bottleneck.
Apart from studies of the various characteristics of ROS 2 in real-time systems, some extensions that improve its implementation have been introduced. Dehnavi et al. [2] introduce a hardware-software architecture named CompROS, a Multi-Processor System on Chip (MPSoC) platform for ROS 2-based robotic systems that improves the real-time performance of ROS 2.
In another study, Barut et al. [34] compare the performance of two frameworks, ROS 2 and OROCOS, on the PREEMPT_RT kernel to demonstrate the capabilities of ROS 2 in real-time systems.
In terms of latency, the study by Blass et al. [35] introduces the ROS Live Latency Manager (ROS-Llama) for controlling cause-effect latency.
The last study selected for this review, by Blaß and colleagues [36], introduces a new response-time analysis for ROS 2 processing chains.

5.2. Discussion about the outcomes of the studies


This section provides a discussion to answer RQ2, RQ3, and RQ4.

5.2.1. Latency

Most of the papers use latency to evaluate their solutions, since it is the most important measure of the responsiveness of a ROS-based system, or of any other distributed system. In the publisher/subscriber model, end-to-end latency is generally measured from the publish call on one node to the callback function of the subscribing node.
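To make this measurement point concrete, the following Python sketch stamps each message at the publish call and computes the latency on entry into the subscriber callback. The `Bus` class is a hypothetical in-process stand-in for the ROS 2 middleware, not a ROS 2 API; in a distributed setup the publisher and subscriber clocks would additionally have to be synchronized.

```python
import time
from collections import defaultdict


class Bus:
    """Hypothetical in-process topic bus standing in for the middleware."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Stamp the message at the publish call (start of the measured path).
        msg = {"stamp_ns": time.monotonic_ns(), "payload": payload}
        for callback in self._subscribers[topic]:
            callback(msg)


latencies_ns = []


def on_message(msg):
    # End of the measured path: entry into the subscriber callback.
    latencies_ns.append(time.monotonic_ns() - msg["stamp_ns"])


bus = Bus()
bus.subscribe("chatter", on_message)
for i in range(100):
    bus.publish("chatter", i)
print(f"max publish-to-callback latency: {max(latencies_ns)} ns")
```

Reporting the maximum rather than the mean reflects the real-time view, where the worst observed case matters most.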

An article by Maruyama, Kato, and Azumi [23] compares the ROS 1 and ROS 2 implementations from various aspects. For communication, ROS 1 uses TCPROS/UDPROS, which requires a master node, whereas ROS 2 is built on the DDS standard and needs no master node, which is important for fault tolerance. In ROS 1, the nodelet option provides non-serialized data transport between nodes by passing a pointer. Similarly, ROS 2 offers intra-process communication, which bypasses DDS and solves some of the problems of nodelets, such as the lack of safe memory access. In ROS 2, the deadline period, history depth, and communication reliability are configured through QoS policies. The experiments show that the reliable policy in ROS 2 prevents message loss when subscriber nodes join the communication late, whereas in ROS 1 the published message is lost in this situation. This QoS policy also increases fault tolerance. Furthermore, the results indicate that ROS 2 is not well suited to handling large messages, since the DDS overhead varies and becomes considerable when the data size is large. Nonetheless, DDS vendors offer alternative APIs, such as asynchronous publishers and flow controllers (which are not exposed through the ROS 2 abstraction), to address this problem. In addition, the impact of shared memory on the latency of large data is considerable. If the network is not ideal, the reliable policy yields higher latencies than the best-effort policy, while a larger fragment size reduces latency: with smaller fragments, large data must be divided into more datagrams, which affects the implementation of the QoS policy. For ROS 1, the significant latency differences among multiple subscriber nodes of a publisher with multiple destinations show that ROS 1 is not suitable for real-time systems, since it delivers messages to the subscribers in order. In contrast, ROS 2 shows small differences and is more suitable than ROS 1 for multiple subscriber nodes. The throughput experiment shows that throughput is limited only by the network, not by DDS.
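The effect of the fragment size can be illustrated with simple arithmetic. The sketch below, which ignores per-fragment header overhead (an assumption), counts how many datagrams a message must be split into; every extra datagram is another unit that the transport has to send and, under the reliable policy, potentially repair.

```python
import math


def datagrams_needed(message_bytes, fragment_bytes):
    """Number of datagrams a message is split into for a given fragment size.

    Per-fragment header overhead is ignored in this sketch.
    """
    if fragment_bytes <= 0:
        raise ValueError("fragment size must be positive")
    return max(1, math.ceil(message_bytes / fragment_bytes))


# A 4 MiB message (e.g. a camera frame) with two illustrative fragment sizes.
msg = 4 * 1024 * 1024
print(datagrams_needed(msg, 1024))       # 4096
print(datagrams_needed(msg, 64 * 1024))  # 64
```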

As the use of ROS 2 in robotic systems increases, its security becomes an important challenge. ROS 2 uses DDS for message transmission and the DDS security extension to protect data in motion. Diluoffo et al. [28] conducted a detailed analysis of the DDS security standard and of the effect of applying various security models on the performance, latency, throughput, and real-time properties of ROS-based systems. Although the new DDS security extension enables ROS 2 to protect data in motion, applying it to ROS-based systems poses challenges such as over-protection (resulting in lower performance) or under-protection (resulting in unexpected vulnerabilities). Furthermore, it can raise security (or performance) concerns due to the high message traffic of the publisher/subscriber technique or due to problems related to hardware and software elements. The experimental analysis in this paper shows that adding full security levels to protect data in motion can degrade the system's performance in terms of latency, throughput (average packets per second), and transmission speed. The trade-off between performance and security level must therefore be explored by applying different security policies to different portions of the robotic system. Moreover, the standard DDS security of ROS 2 can reduce the communication vulnerabilities of robotic systems.

The paper by Casini [30] presents a response-time analysis of ROS 2 under reservation-based scheduling. In reservation servers, i) each reservation has a deadline, ii) the reservation algorithm guarantees the workload of each reservation r at least Q units of service in every period of P time units, and iii) the maximum delay before service is provided is bounded. The study focuses on the single-threaded executor, which is a non-preemptive scheduler that runs each callback to completion. The response-time analysis builds on the Compositional Performance Analysis (CPA) approach, in which an upper bound on the response time of each callback is computed in order to analyze the complex ROS graph. The end-to-end response time of a chain can then be computed by summing the response times of the callbacks in the chain. However, CPA by itself cannot compute per-callback response times, since it is unaware of the ROS scheduling mechanism. Therefore, a ROS-specific response-time analysis for callbacks is described. The proposed method first calculates the response time of single callbacks and then extends it to sub-chains allocated to a single reservation. The results of the case study show the benefits of automated response-time analysis. Furthermore, by iteratively searching for a global fixed point at which all jitters and response times are consistent, the CPA method can handle several challenges of ROS 2: 1) the response time depends on predecessor tasks and on release jitter, which creates a cyclic dependency; 2) under the scheduling policy, messages arriving during a processing window are not considered until the next polling point, which causes priority inversion; 3) ready sets must be considered instead of ready lists, since the algorithm does not know how many instances of non-timer callbacks are ready and processes at most one instance of any callback per processing window.
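The global fixed-point idea behind CPA can be sketched in Python. The system, the numbers, and the local busy-window bound below are hypothetical simplifications chosen for illustration, not the ROS-specific analysis of [30]: each callback's response time depends on the release jitter of the callbacks it competes with, and a successor's jitter depends on its predecessor's response time, so the loop iterates until both stop changing and then sums the per-callback bounds along a chain.

```python
import math

# Hypothetical toy system: WCET C, activation period T, and the predecessor
# callback in the processing chain (None = timer-activated source).
CALLBACKS = {
    "timer_cb": {"C": 2, "T": 10, "pred": None},
    "sub_cb": {"C": 3, "T": 10, "pred": "timer_cb"},
    "other_cb": {"C": 1, "T": 5, "pred": None},
}


def local_response_time(name, jitter, limit=1000):
    """Toy local bound: a busy window in which every other callback on the
    same executor interferes, its activations inflated by its release jitter."""
    c = CALLBACKS[name]["C"]
    r = c
    while True:
        interference = sum(
            math.ceil((r + jitter[o]) / cb["T"]) * cb["C"]
            for o, cb in CALLBACKS.items()
            if o != name
        )
        new_r = c + interference
        if new_r == r:
            return r
        if new_r > limit:
            raise RuntimeError(f"{name}: no bounded response time")
        r = new_r


def cpa_fixed_point(max_rounds=100):
    """Iterate response times and release jitters until they are consistent."""
    jitter = {name: 0 for name in CALLBACKS}
    for _ in range(max_rounds):
        resp = {name: local_response_time(name, jitter) for name in CALLBACKS}
        new_jitter = {}
        for name, cb in CALLBACKS.items():
            pred = cb["pred"]
            # Successor jitter = predecessor jitter + (R - C); the best case
            # is approximated by C in this sketch.
            new_jitter[name] = 0 if pred is None else (
                jitter[pred] + resp[pred] - CALLBACKS[pred]["C"])
        if new_jitter == jitter:
            return resp, jitter
        jitter = new_jitter
    raise RuntimeError("no fixed point found")


resp, jitter = cpa_fixed_point()
chain = ["timer_cb", "sub_cb"]
print(resp, "end-to-end bound:", sum(resp[c] for c in chain))
```

With these toy parameters the loop settles after three rounds on response times of 10, 7, and 9 time units, giving an end-to-end bound of 17 for the chain timer_cb followed by sub_cb.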

In another study, Yang and Azumi [32] assessed the real-time executor of ROS 2 in terms of latency. The standard executor of ROS 2 for C++ (rclcpp) had several limitations, such as the precedence of timers and non-preemptive round-robin scheduling of non-timer handles. Moreover, the standard executor checks the wait queues for pending callbacks in order and executes them in the order in which they were registered; consequently, it cannot classify or prioritize incoming callbacks. In addition, it cannot fully exploit the real-time characteristics of the underlying operating system, and the worst-case latency of each callback is determined by this first-in-first-out (FIFO) mechanism. To overcome these defects, a new executor, called the callback-group-level executor, has been introduced, which takes advantage of the callback group concept that already exists in rclcpp. In this executor, each real-time callback can be assigned to a dedicated callback group when it is created, which allows a node to assign callbacks with different real-time requirements to different executor instances within one process. The results show that the priority of the executor threads affects the latency. As the busy loop grows, the latency of low-priority tasks increases. Data size, on the other hand, has no effect on the latency. Furthermore, high-priority nodes are not affected by the busy loop, regardless of their number of topics.
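The idea of mapping callback groups to executor instances can be illustrated with a small Python model. All names are hypothetical, and the model assumes a single core on which the OS always runs the highest-priority executor thread that has work; it does not use the rclcpp API.

```python
from dataclasses import dataclass, field


@dataclass
class CallbackGroup:
    name: str
    callbacks: list = field(default_factory=list)


@dataclass
class ExecutorInstance:
    thread_priority: int  # stand-in for the OS priority of the executor thread
    groups: list = field(default_factory=list)


def dispatch_order(executors, ready):
    """Order ready callbacks as a fixed-priority OS would run the executor
    threads on one core: the highest-priority executor drains its groups first."""
    order = []
    for ex in sorted(executors, key=lambda e: e.thread_priority, reverse=True):
        for group in ex.groups:
            order.extend(cb for cb in group.callbacks if cb in ready)
    return order


rt_group = CallbackGroup("rt", ["control_cb"])
be_group = CallbackGroup("best_effort", ["log_cb", "viz_cb"])
executors = [ExecutorInstance(10, [be_group]), ExecutorInstance(90, [rt_group])]
print(dispatch_order(executors, {"log_cb", "viz_cb", "control_cb"}))
# ['control_cb', 'log_cb', 'viz_cb']
```

Because the real-time callback lives in its own group on a high-priority executor, it no longer waits behind callbacks that were merely registered earlier.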

Enhancing the real-time capabilities of ROS 2 must also be evaluated from another aspect: correct and optimized configuration. A paper by Puck and colleagues [33] addresses this aspect. To explore the limitations of ROS 2 and set up a real-time system based on ROS, the effect of the OS must first be evaluated, because ROS-based applications are constrained by the OS configuration. In the next step, the optimized real-time capabilities can be used with a ROS 2 hardware controller. The common approach to enhancing the real-time capabilities of the Linux kernel is the PREEMPT_RT patch; however, it does not provide hard real-time capabilities and offers no mathematical proof that task deadlines are met. The main real-time challenge is to reduce the response time of each system with respect to the latency of scheduled interrupts. The paper discusses hardware and software settings, real-time requirements, and network evaluations, and introduces the configuration and adaptable validation of real-time capabilities with consumer hardware and open-source software. These evaluations also provide a modular approach to real-time critical systems using Linux. The final assessments demonstrate an acceptably stress-robust distributed network with no additional overhead in the communication procedure. Overall, since the Linux network stack causes non-deterministic latencies and jitter, it is recommended to use a dedicated network for time synchronization with the Precision Time Protocol (PTP) instead of overloading the local network.

One of the main parameters that can affect the end-to-end latency of ROS-based systems is the scheduling method. A paper by Choi, Xiang, and Kim [11] discusses a new scheduling technique. Providing timing guarantees for ROS 2 is challenging because 1) applications are usually structured as chains of mutually dependent callbacks, and 2) existing techniques for end-to-end chain latency cannot be applied to the ROS 2 framework, since its scheduling behavior depends on abstractions such as executors and nodes. The paper aims to minimize end-to-end chain latency by assigning priorities to the callbacks of a chain based on the chain's criticality. For this purpose, a priority-driven chain-aware scheduler (PiCAS) for ROS 2 in a multi-core environment is introduced. The system model for the experiments is a multi-core system in which all CPU cores run at the same clock frequency. Since the way the OS schedules executors has a great impact on the timing behavior of callbacks, each executor is allocated to one core and scheduled with SCHED_FIFO, a fixed-priority preemptive real-time scheduling policy in Linux. The implementation mainly modifies the callback scheduling policy of ROS 2 by i) updating the ready callbacks in the executor queue and ii) assigning priorities to individual callbacks. The scheduler updates the ready queues of ROS 2 whenever a callback completes, and when two or more callbacks are ready, it chooses the one to execute according to the PiCAS priority assignment instead of the default ROS 2 assignment. Case studies on uniprocessor and multi-core systems compare the ROS 2 default scheduler without analysis, ROS 2-SD (the ROS 2 default scheduler with resource reservation and WCRT analysis), and ROS 2-PiCAS, and show a noticeable decrease in latency with PiCAS. Moreover, PiCAS provides tighter upper bounds for the analyzed real-time chains and can schedule chains according to their semantic priority. Schedulability tests show that the schedulability ratio decreases as utilization increases and as chain priority decreases. Regarding the analysis running time, the results show that the proposed analysis for PiCAS is much faster.
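A minimal sketch of chain-aware priority scheduling in a non-preemptive executor is shown below. The priority rule (chain priority first, then position in the chain) is a simplified assumption for illustration and not the exact PiCAS assignment from [11]; the loop mirrors the described behavior of refreshing the ready set whenever a callback completes. Callback and chain names are hypothetical.

```python
def assign_chain_priorities(chains):
    """chains: list of (chain_priority, [callbacks in chain order]).

    Simplified rule (an assumption): a callback inherits its chain's
    priority, and within a chain a later callback ranks above an earlier
    one so that chain instances already in progress finish first.
    """
    prio = {}
    for chain_prio, callbacks in chains:
        for pos, cb in enumerate(callbacks):
            prio[cb] = (chain_prio, pos)
    return prio


def run_executor(initially_ready, successors, prio):
    """Non-preemptive executor loop: each time a callback completes, the
    ready set is refreshed and the highest-priority ready callback runs."""
    ready = set(initially_ready)
    order = []
    while ready:
        cb = max(ready, key=lambda c: prio[c])
        ready.remove(cb)
        order.append(cb)
        ready.update(successors.get(cb, []))  # completion releases successors
    return order


chains = [(2, ["sense", "plan", "act"]), (1, ["log_in", "log_out"])]
successors = {"sense": ["plan"], "plan": ["act"], "log_in": ["log_out"]}
prio = assign_chain_priorities(chains)
print(run_executor(["log_in", "sense"], successors, prio))
# ['sense', 'plan', 'act', 'log_in', 'log_out']
```

A scheduler that served callbacks strictly in arrival or registration order could instead run log_in before sense; here the higher-priority chain completes first.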

Puck et al. [12] study the communication latencies of ROS 2 and the influence of CPU and network load on them. In addition to communication latencies, the study evaluates, from a real-time perspective, the effects of the different message allocation methods of ROS 2 and the jitter across local and distributed ROS 2 nodes. Communication between ROS 2 nodes follows the publisher-subscriber method, which has several shortcomings in the real-time field. 1) Since dynamic memory allocation is usually not real-time safe, the standard ROS 2 memory allocator is not real-time ready and might not behave deterministically. Therefore, the Two-Level Segregated Fit (TLSF) allocator, which has a real-time safe implementation, is recommended so that subscriptions complete in bounded time. In addition, pre-allocating messages decreases subscription latencies. 2) All tasks must run with real-time priority to ensure minimal message delays, especially since DDS adds further processes that have to be considered. Moreover, shielding the processes decreases disturbances and allows more robust real-time communication. 3) A local ROS 2 network using the host's loopback device halves the latency and allows the control frequency to be doubled. Overall, the findings indicate that ROS 2 can meet hard real-time constraints when correctly configured.
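Pre-allocation can be sketched as a fixed-size message pool that is filled before the real-time loop starts. This Python example only illustrates the pattern; it is a plain free-list pool, not the TLSF allocator recommended in [12], and Python itself still allocates objects internally.

```python
class MessagePool:
    """Fixed-size message pool: every buffer is created up front, so the
    real-time path only moves indices between lists and never creates new
    message buffers. A simple free-list pool, not the TLSF allocator."""

    def __init__(self, count, size):
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))

    def acquire(self):
        if not self._free:
            # Bounded behavior: fail fast instead of creating a new buffer.
            raise RuntimeError("message pool exhausted")
        idx = self._free.pop()
        return idx, self._buffers[idx]

    def release(self, idx):
        self._free.append(idx)


pool = MessagePool(count=4, size=1024)  # pre-allocate before the loop starts
idx, buf = pool.acquire()
buf[:5] = b"hello"                      # fill the message in place
pool.release(idx)
```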

A paper by Kronauer and colleagues [24] is dedicated to an extensive latency analysis of ROS 2. Although ROS 2 uses a publish/subscribe mechanism similar to that of ROS 1, it relies on the DDS standard, which has real-time capabilities. During the development of ROS 2, the aim was to change as little of the ROS 1 user code as possible and to hide the DDS middleware and its API from the ROS 2 user. The paper aims to answer how a ROS 2 system copes with scalability and to provide users with guidelines for decreasing system latency. Running applications on localhost, using shared memory, and using interprocess communication reduce latency significantly, but these options are not available for distributed systems that span multiple hardware platforms.
The use case in this study is a distributed system, and the evaluation focuses on the latency caused by the ROS system rather than by DDS. Therefore, the call stack between publishing and the subscriber callback is profiled to calculate the overhead of the ROS 2 core and the middleware interfaces. Finally, the research concludes with guidelines for ROS 2 users:
- Latency increases with the payload size.
- The higher the frequency, the lower the latency.
- Latency depends on the hardware and the parameter settings.
- The most important factors in the overall latency are the DDS middleware and the delay between message notification and message retrieval by ROS 2.
- Latency depends strongly on the energy-saving features of the OS and the hardware.

Barut et al. [34] compare two frameworks, ROS 2 and OROCOS, in terms of delays when running on Linux. Generally, Linux can be given real-time capabilities in two ways: 1) PREEMPT_RT, which is a kernel patch, and 2) Xenomai, which is a co-kernel that acts as a hypervisor for the Linux kernel. Despite the many capabilities of PREEMPT_RT, such as preemptible critical sections, preemptible interrupt-disabled code sequences and interrupt handlers, and priority inheritance, it cannot guarantee hard real-time behavior. In contrast, since Xenomai runs an additional co-kernel beside the Linux kernel, it can handle real-time processes better than PREEMPT_RT. However, because Xenomai does not support ROS 2, it is not used in this study. OROCOS and ROS 2 differ in several ways; for instance, the real-time communication method in OROCOS is RTT, which consists of activities, each defined by a 'Period', a 'Priority', and a 'Scheduler'. Conversely, the real-time communication middleware of ROS 2 is DDS, which uses a publisher/subscriber technique. The experimental results show that on the vanilla kernel without stress, ROS 2 has bounded latencies, although some spikes can be seen; under stress, however, the cycle delays and request delays of ROS 2 rise linearly and become unbounded, whereas the response delay does not have this problem. In a similar test, OROCOS behaves acceptably without stress, but under stress it shows no clear trend.
In another experiment on the PREEMPT_RT kernel without stress, ROS 2 behaves similarly to the vanilla case, and under stress it performs better than on the vanilla kernel, while OROCOS performs poorly both with and without stress. A challenge noted during the tests is that some trends become visible only over an extended period, and the impact of stress is more obvious when the system contains several components; it is therefore better to run experiments over longer periods with a reasonable number of components. Finally, although both ROS 2 and OROCOS provide toolsets that simplify the creation of real-time applications, for critical cases it is recommended to use system calls directly in order to have complete control over the system characteristics.
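As an illustration of configuring a process through system calls directly, the following Python sketch pins the calling thread (pid 0) to one CPU core and switches it to SCHED_FIFO using the standard `os.sched_setaffinity` and `os.sched_setscheduler` wrappers (Linux only). The core number and priority are arbitrary assumptions, and raising the scheduling policy normally requires root or CAP_SYS_NICE, so each step reports its outcome instead of failing.

```python
import os


def apply_rt_settings(core=0, priority=50):
    """Pin the calling thread (pid 0) to one core and switch it to SCHED_FIFO.

    Each step is attempted separately and reported as "ok" or as the error,
    because the calls are Linux-only and raising the scheduling policy
    normally needs root or CAP_SYS_NICE.
    """
    report = {}
    try:
        os.sched_setaffinity(0, {core})
        report["affinity"] = "ok"
    except (AttributeError, OSError) as exc:
        report["affinity"] = repr(exc)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        report["scheduler"] = "ok"
    except (AttributeError, OSError) as exc:
        report["scheduler"] = repr(exc)
    return report
```

In a ROS 2 application, such a function would be called at the start of the thread that runs the executor, before it begins spinning; further steps such as locking memory are not shown.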

Blass and colleagues [35] introduce an automatic latency manager for ROS 2. ROS systems are composed of third-party components with standard functionality, so that robotic projects need not start from scratch, and ROS uses a topic-based publish-subscribe method to combine these "black box" components. Using the real-time capabilities of ROS 2 faces several hurdles. First, the integrator of ROS components may not have enough information about low-level system details such as the number of concurrent tasks, their activation and functional interactions, and their worst-case execution times. Second, these system details cannot be analyzed statically. Finally, and most importantly, the performance of the components depends on an environment that varies dynamically, which makes finding a constant WCET impossible.
To overcome these challenges, the ROS Live Latency Manager (ROS-Llama) is introduced, focusing on the elements relevant to latency management, such as topics, callbacks, and executors. Latency is commonly determined by two factors: (i) the processor time allocated to the thread hosting the related executor and (ii) the queueing delays caused by the executor when sequencing pending callback activations. ROS-Llama builds on the DAG (directed acyclic graph) model of Casini et al. [30] for calculating response times. One of its advantages is that it does not need expert configuration and operates largely automatically. ROS-Llama uses a model extractor to extract the model of the system (executors, callbacks, and topics). In addition, a budget manager configures all threads so that the specified latency goals are satisfied to the extent possible. To schedule these threads, the Linux SCHED_DEADLINE scheduler is used. Although partitioned scheduling is usually avoided on Linux multicore platforms (because in a dynamic environment, mapping tasks to cores requires additional calculations), ROS-Llama already has the information required for the mapping and can therefore use partitioned scheduling without any extra work for the system integrator. ROS-Llama is evaluated against two baselines: first, a standard Linux setup with the global CFS scheduler, and second, the SCHED_RR fixed-priority scheduler. The results indicate that although ROS-Llama comes with significant costs, it offers a satisfactory trade-off between performance and predictability. Overall, this research demonstrates that automatic latency management is a practical solution for real-time systems, but the degree of automation is limited unless developers favor structured solutions wherever possible.
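Partitioned scheduling with SCHED_DEADLINE-style reservations requires mapping each thread's budget onto a core. The sketch below uses first-fit decreasing by utilization with a per-core capacity; this heuristic, the thread names, and the example budgets are assumptions for illustration, not the mapping procedure of ROS-Llama. A capacity below 1.0 could be used to leave headroom, for example to respect the kernel's default real-time bandwidth limit.

```python
def partition_first_fit(threads, cores, capacity=1.0):
    """Map reservations to cores with first-fit decreasing utilization.

    threads: {name: (budget, period)}, e.g. SCHED_DEADLINE runtime/period.
    Returns the thread-to-core mapping and the resulting per-core load.
    """
    load = [0.0] * cores
    mapping = {}
    by_utilization = sorted(
        threads.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    for name, (budget, period) in by_utilization:
        u = budget / period
        for core in range(cores):
            # Per-core EDF-style admission: total utilization <= capacity.
            if load[core] + u <= capacity + 1e-9:
                load[core] += u
                mapping[name] = core
                break
        else:
            raise ValueError(f"{name} does not fit on {cores} cores")
    return mapping, load


threads = {"perception": (6, 10), "planning": (4, 10),
           "control": (2, 10), "logging": (3, 10)}
mapping, load = partition_first_fit(threads, cores=2)
print(mapping)  # {'perception': 0, 'planning': 0, 'logging': 1, 'control': 1}
```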

5.2.2. Response time

A study by Tang et al. [26] evaluates the response time and priority assignment of processing chains in the ROS 2 executor. To set up a ROS application, nodes are distributed to hosts and mapped onto OS processes. ROS 2 has two built-in executors, a sequential and a parallel one, which manage the execution of the nodes' callbacks. This paper focuses on the single-threaded executor, which executes the workload sequentially. The study proves that the response-time bound of a chain is affected only by the priority of the chain's sink callback: the bound decreases as that priority increases, but it does not depend on the relative priority order of the other callbacks in the chain. In ROS 2, the priority of callbacks is decided on two levels: 1) callback type: timer priority > subscriber priority > service priority > client priority; 2) registration order: among callbacks of the same type, the callback registered earlier has higher priority. The experimental results show that the new analysis method performs much better than Casini's method, by a significant margin under different parameter settings. Moreover, increasing the priority of sink callbacks improves the obtained response-time bounds. The case study demonstrates that changing the priority of non-sink callbacks causes no significant change in ACET (average-case execution time) or WCET (worst-case execution time). In contrast, swapping the priority of the sink callback with that of the highest-priority regular callback in the chain improves both the ACET and the WCET of the response time.
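The two-level priority rule described above can be written as a sort key. This Python sketch assumes the ready callbacks are given as (name, type, registration index) tuples with hypothetical names; it models only the ordering as described in [26], and executor behavior may differ between ROS 2 releases.

```python
# Type rank as described in the text: timers > subscriptions > services > clients.
TYPE_RANK = {"timer": 0, "subscription": 1, "service": 2, "client": 3}


def default_priority_order(ready):
    """ready: list of (name, callback_type, registration_index).

    Sort by callback type first, then by registration order (earlier first).
    """
    ordered = sorted(ready, key=lambda cb: (TYPE_RANK[cb[1]], cb[2]))
    return [name for name, _, _ in ordered]


ready = [("sub_b", "subscription", 3), ("cli", "client", 0),
         ("sub_a", "subscription", 1), ("tmr", "timer", 5),
         ("srv", "service", 2)]
print(default_priority_order(ready))
# ['tmr', 'sub_a', 'sub_b', 'srv', 'cli']
```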

Another comprehensive analysis of the response time of ROS 2 has been carried out by Blaß and colleagues [36]. Since timing correctness has been a central challenge in ROS-based systems, the real-time community has started to develop response-time analyses for systems that use ROS 2. Whereas previous works characterized the processor demand of each callback only by a scalar worst-case execution time (WCET), this paper describes execution time with execution-time curves. Callbacks are commonly executed in a round-robin fashion, which makes it difficult to prioritize them. However, the new response-time analysis introduced in this paper exploits an advantage of round robin: it reduces worst-case response times when callbacks compete in bursts. Sampling in the ROS callback scheduler follows round-robin characteristics: at each polling point, only one instance of each ready callback runs in the processing window, independent of its priority or the number of pending instances, which treats polled callbacks fairly. The results of two case studies (a synthetic workload and a real workload) demonstrate that both the round-robin and the busy-window approaches have advantages over the baseline, and since each of them excels under different conditions, combining them is more beneficial.
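The processing-window semantics can be illustrated with a small Python model. It assumes that all pending instances are known at the first polling point and that no new messages arrive later (a simplification); each window runs at most one instance of every callback that has pending work, so a callback with a large backlog cannot delay the others by more than one instance per window. Callback names are hypothetical.

```python
def process_windows(pending):
    """pending: {callback: number of queued instances}, in polling order.

    At each polling point the queues are sampled once, and at most one
    instance of every callback with pending work runs in that window,
    regardless of its backlog. This model has no new arrivals.
    """
    queues = dict(pending)
    windows = []
    while any(queues.values()):
        ready = [cb for cb, n in queues.items() if n > 0]  # polling point
        for cb in ready:
            queues[cb] -= 1
        windows.append(ready)
    return windows


print(process_windows({"a": 3, "b": 1, "c": 2}))
# [['a', 'b', 'c'], ['a', 'c'], ['a']]
```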

5.2.3. Deadline

An empirical study by Park, Delgado, and Choi [27] assesses the real-time characteristics of ROS 2. To evaluate the performance of the developed software architecture, a software-stack performance evaluation is performed that evaluates the schedulability of the Linux kernel and of ROS nodes. Since the schedulability of real-time tasks depends on their timing accuracy, a timing assessment of real-time tasks is conducted. Furthermore, to ensure that deadlines are met, a system periodicity analysis is performed. A communication performance assessment evaluates the network quality with respect to the real-time characteristics and stability of ROS. The results for scheduling latency and task periodicity in a multi-tasking environment confirm that ROS 2 satisfies real-time constraints better than ROS 1. Since ROS 2 shows deterministic behavior in task response-time tests, satisfying the periodicity and response-time requirements, its suitability for real-time tasks is confirmed. At the communication level, the message loss test shows stable communication without any lost message when the data size is less than 10 bytes. In the communication latency test, ROS 2 performed better than ROS 1, with a lower maximum latency and a larger performance difference in an unstable environment with network traffic. An experiment with a multi-agent robot confirms that a ROS 2-based system can operate stably and satisfy real-time constraints without a notable change in performance, even in an unstable environment.
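A periodicity and deadline check of the kind described can be computed from release and completion timestamps. The Python sketch below reports the largest deviation of the inter-release time from the nominal period and the jobs that miss a relative deadline; the trace in the example is synthetic and only illustrates the computation.

```python
def periodicity_report(activations_ms, completions_ms, period_ms, deadline_ms):
    """activations_ms[i] / completions_ms[i]: release and finish time of job i.

    Returns the worst deviation of inter-release times from the nominal
    period and the indices of jobs whose response time exceeds the deadline.
    """
    deviations = [abs((b - a) - period_ms)
                  for a, b in zip(activations_ms, activations_ms[1:])]
    misses = [i for i, (a, c) in enumerate(zip(activations_ms, completions_ms))
              if c - a > deadline_ms]
    return {"max_period_deviation_ms": max(deviations, default=0.0),
            "deadline_misses": misses}


# Synthetic trace for illustration only (not measured data).
act = [0.0, 10.0, 20.5, 30.0]
done = [2.0, 12.5, 29.0, 33.0]
print(periodicity_report(act, done, period_ms=10.0, deadline_ms=5.0))
# {'max_period_deviation_ms': 0.5, 'deadline_misses': [2]}
```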

5.2.4. Communication cost

Only one study, by Jiang and colleagues [31], addresses message-passing optimization in ROS 2 and evaluates its results by communication cost. Since all programming languages in ROS 2 share the same lower layer, called the ROS 2 middleware layer, optimizing this layer benefits all applications regardless of the language they are implemented in. Moreover, this layer, which adopts DDS, is pluggable and customizable by users. Although these features make ROS 2 more extensible, they may also cause higher latency. During communication, messages from all programming languages must be converted in the ROS 2 middleware layer, which results in extra overhead. The experimental results show that conversion and de-conversion account for nearly 90% of the communication cost. The study therefore aims to simplify the message structure to reduce this overhead, since the simplest structure is a serialized message. Unlike traditional serialization in the lower layer (the ROS 2 middleware layer), the proposed serialization is performed in the programming-language layer. The experimental results show that the adaptive Two-layer Serialization Algorithm (ATSA) improves the performance of passing messages with complex structures. Overall, ATSA can improve message-passing performance by 93%.
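The idea of serializing complex messages in the language layer, so that the middleware handles a single flat buffer instead of converting many fields, can be sketched in Python with the standard `struct` module. The field-count threshold used to decide between the two paths and the function names are hypothetical; the actual decision logic of ATSA is not reproduced here.

```python
import struct


def count_fields(msg):
    """Number of leaf fields in a nested message (dicts/lists of numbers)."""
    if isinstance(msg, dict):
        return sum(count_fields(v) for v in msg.values())
    if isinstance(msg, (list, tuple)):
        return sum(count_fields(v) for v in msg)
    return 1


def flatten(msg):
    """Yield the leaf values of a nested message as floats, in order."""
    if isinstance(msg, dict):
        for v in msg.values():
            yield from flatten(v)
    elif isinstance(msg, (list, tuple)):
        for v in msg:
            yield from flatten(v)
    else:
        yield float(msg)


def prepare_for_middleware(msg, threshold=8):
    """Hypothetical adaptive rule: simple messages keep the per-field
    conversion path; complex ones are serialized once in the language layer
    into a flat little-endian buffer of doubles."""
    if count_fields(msg) < threshold:
        return "per_field", msg
    values = list(flatten(msg))
    return "serialized", struct.pack(f"<{len(values)}d", *values)


mode, data = prepare_for_middleware({"x": 1.0, "y": 2.0})
print(mode)  # per_field
cloud = {"points": [[float(i)] * 3 for i in range(4)]}
mode, data = prepare_for_middleware(cloud)
print(mode, len(data))  # serialized 96
```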

5.2.5. Jitter

A few studies consider jitter as an assessment parameter of ROS 2 performance. One of them is an article by Ye et al. [25], which discusses a real-time design based on PREEMPT_RT and the timing analysis of ROS 2. Although dual-core systems have advantages in reducing latency, they require new hardware and many interface configurations. This research uses PREEMPT_RT to give the Linux kernel real-time capabilities by adding scheduling and priority configuration. ROS and ROS 2 are developed on Ubuntu Linux, which can exhibit delays of hundreds of milliseconds. Thus, to use ROS in real-time systems, the Linux kernel needs to be modified. The Real-Time Linux collaborative project started in 2016 by developing the real-time PREEMPT_RT patch. In contrast to Xenomai, which uses a dual-kernel solution for real-time and non-real-time threads, PREEMPT_RT uses a preemption approach that threads interrupts and assigns a priority to each process. In the first part of the research, the architecture of the real-time robot control system is introduced, and the jitter of the real-time system is evaluated under different frequencies and loads. The results demonstrate that the native system cannot be used for real-time, high-precision motion control. In contrast, the designed system shows acceptable real-time performance, especially a small average jitter, and a load test on it also gives positive results. Based on the constructed real-time system, the real-time performance of the timer callbacks of ROS and ROS 2 is evaluated. The maximum jitter of ROS was 10 µs; the curve fluctuated strongly but stayed within a small range. Under the same test conditions, the highest ROS 2 jitter was 80 µs and the overall jitter was considerably larger, although this depends on the configuration of the system's internal timer. Based on these results, the built-in timer of ROS 2 has a larger jitter than that of ROS. Therefore, other timing methods, such as the clock_gettime function, should be used with ROS 2.
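Timer jitter can be measured independently of a framework's built-in timer by timestamping each wake-up with clock_gettime. The Python sketch below uses `time.clock_gettime(time.CLOCK_MONOTONIC)` (available on Unix) around a sleep-based periodic loop; the 1 ms period and the function name are assumptions, and the interpreter adds overhead, so absolute values will be larger than those of a C implementation.

```python
import time


def measure_timer_jitter(period_s=0.001, cycles=200):
    """Run a sleep-based periodic loop, timestamp every wake-up with
    clock_gettime(CLOCK_MONOTONIC), and return the per-cycle jitter in
    microseconds: |measured interval - nominal period|."""
    stamps = []
    next_wake = time.clock_gettime(time.CLOCK_MONOTONIC)
    for _ in range(cycles):
        next_wake += period_s  # absolute schedule, so errors do not accumulate
        remaining = next_wake - time.clock_gettime(time.CLOCK_MONOTONIC)
        if remaining > 0:
            time.sleep(remaining)
        stamps.append(time.clock_gettime(time.CLOCK_MONOTONIC))
    return [abs((b - a) - period_s) * 1e6 for a, b in zip(stamps, stamps[1:])]


jitter_us = measure_timer_jitter()
print(f"max jitter: {max(jitter_us):.1f} us")
```

Running the same loop on a vanilla and a PREEMPT_RT kernel, with and without load, gives a rough comparison in the spirit of the experiments described above.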

5.2.6. Worst-case execution time (WCET)

Worst-case execution time (WCET) and worst-case response time (WCRT) are the factors evaluated in the
article by Dehnavi et al. [2]. The authors identify several challenges in using ROS 2 in real-time
systems: 1) the underlying hardware is not real-time; 2) there is no control over the host OS scheduler;
3) dynamic memory allocation is unpredictable; 4) resource requirements are high; and 5) the execution
model of ROS nodes is not predictable. The paper proposes an integrated hardware-software architecture
for robotic development on Multi-Processor System-on-Chip (MPSoC) platforms. Based on the related work
reviewed in the paper, CompSOC is the only platform that satisfies both cycle-accurate predictability and
composability; it is therefore used as the underlying hard real-time (HRT) subsystem. According to the
authors, previous works in the real-time embedded systems field either did not offer timing guarantees or
ignored QoS. In addition, they did not provide timing guarantees at the hardware level, had a large
memory footprint, and considered only a single processor. To address these problems, the authors
introduce CompROS, a hardware-software architecture for ROS 2. CompROS is composed of three parts: a hard
real-time (HRT) subsystem for real-time control tasks, a soft real-time (SRT) subsystem for supervisory
control tasks, and a non-real-time (NRT) PC for monitoring tasks. The software has a multilayer
architecture that includes Local Real-Time Publish-Subscribe (LRTPS) communication, a bare-metal
implementation of the XRCE-DDS standard, and a lightweight, predictable implementation of the ROS 2
layers on the proposed hardware architecture. By mapping ROS 2 entities to CompROS entities, the authors
present a formal real-time execution model of ROS 2 that facilitates computing the WCRT of a ROS 2 node.
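To show how WCET values feed into a WCRT bound, the sketch below implements the classic textbook
fixed-priority response-time recurrence for independent periodic tasks on a single processor,
R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated to a fixed point. This is
a deliberately simplified stand-in, not the CompROS execution model of [2], which accounts for
composable hardware and ROS 2 entities; the task set and function names are hypothetical.

```python
from typing import List, Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers (avoids float rounding)."""
    return -(-a // b)


def response_times(tasks: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Classic fixed-priority response-time analysis on one processor.

    tasks: (C, T) pairs, i.e. (WCET, period), ordered from highest to lowest
    priority. Deadlines are assumed equal to periods. Returns the worst-case
    response time of each task, or None if it exceeds the deadline.
    """
    results: List[Optional[int]] = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Preemption by every higher-priority task released within the window r.
            interference = sum(ceil_div(r, t_j) * c_j for c_j, t_j in tasks[:i])
            r_next = c_i + interference
            if r_next > t_i:  # deadline exceeded: task is unschedulable
                results.append(None)
                break
            if r_next == r:  # fixed point reached: r is the WCRT
                results.append(r)
                break
            r = r_next
    return results


if __name__ == "__main__":
    # Hypothetical task set: (WCET, period) in ms, highest priority first.
    print(response_times([(1, 4), (2, 6), (3, 12)]))  # [1, 3, 10]
```

The recurrence only terminates with a valid bound when each C_i is itself a safe WCET, which is why the
unpredictability of dynamic memory allocation and of the host OS, listed above, matters so much.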

5.2.7. Data size

A study by Morita et al. [29] addresses dynamically binding a proper DDS implementation in ROS 2. Data
Distribution Service (DDS) is used as the inter-module communication framework in ROS 2. Each DDS
implementation has different characteristics resulting from its internal design, shared-memory use, or
footprint size. Since messages also have different characteristics, it is hard to find a single DDS
implementation that is ideally suited to all message transmissions in a system. However, the current ROS
2 implementation requires specifying only one DDS implementation, either in the application code or at
runtime. To define the characteristics of each communication in ROS 2, the topics of the
publish/subscribe model are considered. Three parameters characterize each transmission: message size,
communication range (the location of the subscribers), and QoS (reliable or best_effort). Furthermore,
the choice of implementation must be updated dynamically whenever a new topic is registered. The proposed
DDS dynamic binder manages these characteristics and finds the optimal DDS implementation dynamically.
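The binder's role can be pictured as a per-topic selection over candidate implementations, driven by the
three characteristics above. The sketch below is hypothetical throughout: the implementation names
"shm_impl" and "udp_impl", the topic names, and the cost function are placeholders, not the DDS
implementations or the selection model evaluated by Morita et al. [29]. In practice the costs would come
from measurements of each implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class TopicProfile:
    message_size: int  # bytes per message
    inter_host: bool   # communication range: True if any subscriber is on another machine
    reliable: bool     # QoS: True = reliable, False = best_effort


CostFn = Callable[[str, TopicProfile], float]


class DdsBinder:
    """Hypothetical per-topic binder that picks the cheapest candidate implementation."""

    def __init__(self, candidates: Iterable[str], cost: CostFn) -> None:
        self.candidates = list(candidates)
        self.cost = cost
        self.bindings: Dict[str, str] = {}

    def register_topic(self, topic: str, profile: TopicProfile) -> str:
        # Called whenever a new topic is registered; computes that topic's binding.
        best = min(self.candidates, key=lambda impl: self.cost(impl, profile))
        self.bindings[topic] = best
        return best


def example_cost(impl: str, p: TopicProfile) -> float:
    """Placeholder cost model in arbitrary units, for illustration only."""
    if impl == "shm_impl":  # assumed: cheap, but only usable on the same host
        return math.inf if p.inter_host else 10.0
    if impl == "udp_impl":  # assumed: cost grows with message size and reliability
        return 5.0 + p.message_size / 1000 + (2.0 if p.reliable else 0.0)
    raise ValueError(f"unknown implementation: {impl}")


if __name__ == "__main__":
    binder = DdsBinder(["shm_impl", "udp_impl"], example_cost)
    print(binder.register_topic("/camera", TopicProfile(1_000_000, False, False)))  # shm_impl
```

Keeping the cost model as a parameter separates the binding mechanism from the measured characteristics
of each implementation, which is where the actual engineering effort lies.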


6. Gap analysis and conclusion


Tabulating the findings of the papers gives a clear overview of the research fields. Table 3 classifies
the studies selected for this thesis according to the type of study and the parameters used to evaluate
ROS 2 performance in real-time systems.

Evaluation parameters: Latency, Response time, Deadline, Communication cost, Timing jitter, WCET,
Schedulability, Number of nodes, Data size, Data loss rate, Throughput, Number of threads, Memory
consumption, Transmission speed, QoS

Type of study    Studies
Experimental     [27], [29]
Analytical       [2], [11], [12], [28], [30], [31], [32], [33], [34], [35], [36]
Simulation       [26]
Conceptual       [23], [24], [25]

Table 4: Classification of studies

Table 3 demonstrates that while the majority of studies are analytical research assessing the latency and
response time of ROS-based systems, few studies consider jitter, worst-case execution time,
schedulability, throughput, memory consumption, or the rate of data loss during communication. Moreover,
there is not enough research assessing the impact of the number of threads, the data size, and the number
of nodes on real-time performance.
On the other hand, only one study uses simulation tools to evaluate system performance, even though
observing a system's behavior during real-time execution can be highly important. Moreover, because
distributed embedded systems execute in dynamic environments with many changeable parameters, simulation
tools are useful for conducting experiments that mimic real-world properties in a controllable setting.
One of the chief parameters in real-time systems is the scheduling method and the assignment of
priorities to tasks, which influence related factors such as latency and meeting deadlines. Although new
and sophisticated scheduling methods can increase task schedulability considerably, they require
substantial computation, which may adversely affect task response time, a crucial factor in real-time
systems. Thus, more research is needed to find a balance between schedulability and computational
overhead.
Furthermore, because ROS-based systems act as distributed systems consisting of various interacting
nodes, the quality of their communication is another significant parameter in evaluating system
performance. Among the studies selected for this thesis, only one paper [31] discusses the
message-passing process comprehensively; however, several parameters should be considered for this
process, such as the number of nodes, data size, data loss rate, and communication cost. Therefore,
further research is needed in this regard.
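Of the communication parameters listed above, the data loss rate is perhaps the simplest to define
operationally. As a hedged illustration, not a method taken from [31] or any other reviewed study, one
common approach assumes the publisher stamps each message with a consecutive integer sequence number, so
that a subscriber can count the gaps over a measurement window. The function name and window bounds are
hypothetical.

```python
from typing import Iterable


def data_loss_rate(received: Iterable[int], first_seq: int, last_seq: int) -> float:
    """Fraction of messages in [first_seq, last_seq] that never arrived.

    Assumes consecutive integer sequence numbers from the publisher;
    duplicates and sequence numbers outside the window are ignored.
    """
    expected = last_seq - first_seq + 1
    if expected <= 0:
        raise ValueError("last_seq must be >= first_seq")
    delivered = {s for s in received if first_seq <= s <= last_seq}
    return (expected - len(delivered)) / expected


if __name__ == "__main__":
    # Messages 3, 6, 8 and 9 were lost; message 7 arrived twice.
    print(data_loss_rate([0, 1, 2, 4, 5, 7, 7], first_seq=0, last_seq=9))  # 0.4
```

Counting unique sequence numbers keeps duplicated deliveries from masking losses, which matters when
comparing reliable and best_effort QoS settings.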
Figure 13 shows the number of studies in each category. Because ROS 2 and its real-time functionality
are new topics, few works have addressed these issues. Consequently, most areas had only one study, which
made it impossible to compare results and assess their validity. The lack of adequate research in most of
the fields mentioned above is evident. Since ROS 2 is under active development and the use of ROS-based
systems, especially in real-time systems, is rising drastically, an increasing number of studies in this
area is expected in the coming years.

[Chart: number of studies (axis 0 to 12) per evaluation parameter (Latency, Response time, Deadline,
Communication cost, Timing jitter, WCET, Schedulability, Number of nodes, Data size, Data loss rate,
Throughput, Number of threads, Memory consumption, Transmission speed, QoS), by type of study
(Experimental, Analytical, Simulation, Conceptual)]

Figure 14: Number of studies in each category

In recent years, ROS 2 has gained popularity, particularly in the area of autonomous vehicles, where it is
used to combine software, hardware, tools, and data analytics into modern and more complex systems. Since
DDS is a communication standard suited to critical infrastructures such as spacecraft, military, and
financial systems, it can address many of the problems of building reliable real-time robotic systems
with ROS 2. ROS 2 has been used successfully in high-tech real-time robotic systems by several
organizations on land, at sea, and in the air, for example in NASA's Volatiles Investigating Polar
Exploration Rover (VIPER).

Although an extensive range of applications uses ROS 2 as their robotic operating system, several common
challenges remain, especially in real-time systems (discussed in Section 5.2). Today, a large number of
organizations, such as Canonical (Ubuntu), Intel, and Microsoft, participate in the ROS 2 Technical
Steering Committee (TSC) to contribute to the development of ROS 2 [10]. Although ROS 2 may not yet be a
secure robotic operating system for hard real-time or safety-critical systems, standardization around
ROS 2 in various industries will lead to faster development of ROS 2 and help it reach maturity.


7. Threats to validity
Although all researchers try to avoid invalidity, mistakes, or anything else that may decrease the
reliability of their results, threats to the validity of research are unavoidable. According to Zhou et
al. [37], a systematic literature review in the software engineering field can be affected by four types
of threats to validity: construct validity, internal validity, external validity, and conclusion
validity. Each of these four types of threats is evaluated below for the systematic study conducted in
this thesis.

7.1. Construct validity: This concerns identifying correct operational measures for the studied concept
[37]. The first threat in this area is that the primary studies selected for the literature review may
not sufficiently represent the information needed for the research questions. To mitigate this threat,
an accurate search string was designed based on the Kitchenham guidelines [17] to cover all studies in
the field of real-time challenges and solutions of ROS 2. However, the search string could have been
extended with words not directly related to the desired fields that might still find studies with some
useful points, for example ROS 2 followed by words such as deadline or response time. Since this thesis
focused on studies that explicitly addressed the challenges and solutions of ROS 2 in real-time systems,
it was decided to exclude any indirect work. In addition, to ensure the quality of the primary studies
selected for this literature review, the primary search was conducted in four major databases in the
software engineering field.
Another threat to construct validity concerns valuable works not published in any journal or conference,
that is, grey literature. As ROS 2 is used in many industries, analytical and experimental works are
likely to be found on companies' websites or in online discussion forums. Since these works are not
peer-reviewed, the author of this thesis could not assess their scientific validity. Therefore, all white
papers were excluded from the study.

7.2. Internal validity: This refers to establishing a causal relationship, in which a certain condition
leads to another condition, as distinguished from a spurious relationship [37]. The selection and
evaluation of the studies should be unbiased to increase the internal validity of the results. To
satisfy this requirement, the Kitchenham guidelines [17] for literature reviews were followed carefully.
In addition, to decrease uncertainty, the whole process was checked by the supervisor.

7.3. External validity: This concerns the domains to which the findings of the research can be
generalized [37]. The generalizability of the outcomes may be threatened when the selected primary
studies do not sufficiently represent the research subject. To address this challenge, in addition to
designing an accurate search string and selecting comprehensive, validated databases in the software
engineering area, precise inclusion and exclusion criteria were applied to select the studies most
relevant to the research questions, so that the outcomes can be generalized to similar cases with more
confidence.

7.4. Conclusion validity: This concerns whether the process of the study, such as data collection or data
extraction, can be repeated with the same outcome [37]. Since this study was conducted according to
approved guidelines for systematic literature reviews, all steps were carried out with clear definitions.
Moreover, the results obtained in each step were documented, so they are easily accessible and the steps
can be repeated with the same result. However, since the systematic literature review took approximately
one month, other related works might have been published by the end of the study that could affect its
outcomes. For this reason, the databases were searched again with the same search string at the end of
the research to ensure that no new study was missed.


8. Future Work
To the best of our knowledge, this thesis is the first systematic study analyzing the real-time
performance of ROS 2. As the main goal of this study was to identify gaps in previous research, we
conducted a systematic analysis. There is an extensive field for future studies on the shortcomings of
ROS 2 and its use in real-time systems. Furthermore, since new use cases of ROS 2, such as autonomous
cars, require more interacting nodes with high computational requirements, finding solutions for
optimized computation, designing new scheduling methods that consider the trade-off between complexity
and computational overhead, and meeting real-time constraints may be fields of interest for future work.


9. References

1. Saito, Y., T. Azumi, S. Kato and N. Nishio. Priority and synchronization support for ROS. in 2016 IEEE
4th International Conference on Cyber-Physical Systems, Networks, and Applications (CPSNA). 2016.
IEEE.
2. Dehnavi, S., M. Koedam, A. Nelson, D. Goswami and K. Goossens. CompROS: A composable ROS2
based architecture for real-time embedded robotic development. in 2021 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS). 2021. IEEE.
3. Sambunjak, D., M.C. and C.W., Introduction to conducting systematic reviews. 2017 [cited 2022-04-18];
Available at: https://training.cochrane.org/interactivelearning/module-1-introduction-conducting-systematic-reviews.
4. Buttazzo, G.C., Hard real-time computing systems: predictable scheduling algorithms and applications.
Vol. 24. 2011: Springer Science & Business Media.
5. Mubeen, S., E. Lisova and A. Vulgarakis Feljan, Timing predictability and security in safety-critical
industrial cyber-physical systems: A position paper. Applied Sciences, 2020. 10(9): p. 3125.
6. Kay, J. and A.R. Tsouroukdissian, Real-time control in ROS and ROS 2.0. ROSCon15, 2015.
7. de Oliveira, D.B., D. Casini, R.S. de Oliveira and T. Cucinotta. Demystifying the real-time linux
scheduling latency. in 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). 2020. Schloss
Dagstuhl-Leibniz-Zentrum für Informatik.
8. Pop, P., P. Eles, Z. Peng and T. Pop. Analysis and optimization of distributed real-time embedded
systems. in Proceedings of the 41st annual Design Automation Conference. 2004.
9. Salibekyan, S. and P. Panfilov, A new approach for distributed computing in embedded systems. Procedia
Engineering, 2015. 100: p. 977-986.
10. Available at: https://docs.ros.org/.
11. Choi, H., Y. Xiang and H. Kim. PiCAS: New design of priority-driven chain-aware scheduling for ROS2.
in 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS). 2021.
IEEE.
12. Puck, L., P. Keller, T. Schnell, C. Plasberg, A. Tanev, G. Heppner, A. Roennau and R. Dillmann.
Performance Evaluation of Real-Time ROS2 Robotic Control in a Time-Synchronized Distributed
Network. in 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE).
2021. IEEE.
13. Köksal, Ö. and B. Tekinerdogan, Obstacles in data distribution service middleware: a systematic review.
Future Generation Computer Systems, 2017. 68: p. 191-210.
14. Tsardoulias, E. and P. Mitkas, Robotic frameworks, architectures and middleware comparison. arXiv
preprint arXiv:1711.06842, 2017.
15. Balador, A., N. Ericsson and Z. Bakhshi. Communication middleware technologies for industrial
distributed control systems: A literature review. in 2017 22nd IEEE International Conference on
Emerging Technologies and Factory Automation (ETFA). 2017. IEEE.
16. Elkady, A. and T. Sobh, Robotics middleware: A comprehensive literature survey and attribute-based
bibliography. Journal of Robotics, 2012. 2012.
17. Kitchenham, B. and S. Charters, Guidelines for performing systematic literature reviews in software
engineering. 2007.
18. Hernandes, E., A. Zamboni, S. Fabbri and A.D. Thommazo, Using GQM and TAM to evaluate StArt-a
tool that supports Systematic Review. CLEI Electronic Journal, 2012. 15(1): p. 3-3.
19. Petersen, K., R. Feldt, S. Mujtaba and M. Mattsson. Systematic mapping studies in software engineering.
in 12th International Conference on Evaluation and Assessment in Software Engineering (EASE) 12.
2008.
20. Wohlin, C. Guidelines for snowballing in systematic literature studies and a replication in software
engineering. in Proceedings of the 18th international conference on evaluation and assessment in software
engineering. 2014.
21. Redwine, S.T. and W.E. Riddle, Software technology maturation, in Proceedings of the 8th international
conference on Software engineering. 1985, IEEE Computer Society Press: London, England. p. 189–200.
22. Brereton, P., B.A. Kitchenham, D. Budgen, M. Turner and M. Khalil, Lessons from applying the
systematic literature review process within the software engineering domain. Journal of systems and
software, 2007. 80(4): p. 571-583.
23. Maruyama, Y., S. Kato and T. Azumi. Exploring the performance of ROS2. in Proceedings of the 13th
International Conference on Embedded Software. 2016.


24. Kronauer, T., J. Pohlmann, M. Matthé, T. Smejkal and G. Fettweis. Latency Analysis of ROS2 Multi-
Node Systems. in 2021 IEEE International Conference on Multisensor Fusion and Integration for
Intelligent Systems (MFI). 2021. IEEE.
25. Ye, Y., P. Li, Z. Li, F. Xie, X.-J. Liu and J. Liu. Real-Time Design Based on PREEMPT_RT and Timing
Analysis of Collaborative Robot Control System. in International Conference on Intelligent Robotics and
Applications. 2021. Springer.
26. Tang, Y., Z. Feng, N. Guan, X. Jiang, M. Lv, Q. Deng and W. Yi. Response time analysis and priority
assignment of processing chains on ros2 executors. in 2020 IEEE Real-Time Systems Symposium
(RTSS). 2020. IEEE.
27. Park, J., R. Delgado and B.W. Choi, Real-time characteristics of ROS 2.0 in multiagent robot systems:
an empirical study. IEEE Access, 2020. 8: p. 154637-154651.
28. DiLuoffo, V., W.R. Michalson and B. Sunar, Robot Operating System 2: The need for a holistic security
approach to robotic architectures. International Journal of Advanced Robotic Systems, 2018. 15(3): p.
1729881418770011.
29. Morita, R. and K. Matsubara. Dynamic Binding a Proper DDS Implementation for Optimizing Inter-
Node Communication in ROS2. in 2018 IEEE 24th International Conference on Embedded and Real-
Time Computing Systems and Applications (RTCSA). 2018. IEEE.
30. Casini, D., T. Blaß, I. Lütkebohle and B. Brandenburg. Response-time analysis of ros 2 processing chains
under reservation-based scheduling. in 31st Euromicro Conference on Real-Time Systems. 2019. Schloss
Dagstuhl.
31. Jiang, Z., Y. Gong, J. Zhai, Y.-P. Wang, W. Liu, H. Wu and J. Jin, Message passing optimization in robot
operating system. International Journal of Parallel Programming, 2020. 48(1): p. 119-136.
32. Yang, Y. and T. Azumi. Exploring real-time executor on ros 2. in 2020 IEEE International Conference
on Embedded Software and Systems (ICESS). 2020. IEEE.
33. Puck, L., P. Keller, T. Schnell, C. Plasberg, A. Tanev, G. Heppner, A. Rönnau and R. Dillmann.
Distributed and synchronized setup towards real-time robotic control using ROS2 on Linux. in 2020 IEEE
16th International Conference on Automation Science and Engineering (CASE). 2020. IEEE.
34. Barut, S., M. Boneberger, P. Mohammadi and J.J. Steil. Benchmarking Real-Time Capabilities of ROS
2 and OROCOS for Robotics Applications. in 2021 IEEE International Conference on Robotics and
Automation (ICRA). 2021. IEEE.
35. Blass, T., A. Hamann, R. Lange, D. Ziegenbein and B.B. Brandenburg. Automatic Latency Management
for ROS 2: Benefits, Challenges, and Open Problems. in 2021 IEEE 27th Real-Time and Embedded
Technology and Applications Symposium (RTAS). 2021. IEEE.
36. Blaß, T., D. Casini, S. Bozhko and B.B. Brandenburg. A ROS 2 Response-Time Analysis Exploiting
Starvation Freedom and Execution-Time Variance. in 2021 IEEE Real-Time Systems Symposium
(RTSS). 2021. IEEE.
37. Zhou, X., Y. Jin, H. Zhang, S. Li and X. Huang. A map of threats to validity of systematic literature
reviews in software engineering. in 2016 23rd Asia-Pacific Software Engineering Conference (APSEC).
2016. IEEE.
