internal system states with minimal interference. The framework is user-friendly, easily extendable for evaluating custom ROS 2 computational graphs, and is developed in collaboration with major hardware acceleration vendors to provide a standardized benchmarking approach. It aims to foster research and innovation as an open-source project. We validate the framework's capabilities by conducting benchmarks on diverse hardware platforms, including CPUs, GPUs, and FPGAs, thereby showcasing RobotPerf's utility in drawing valuable performance insights. RobotPerf's source code and documentation are available at https://github.com/robotperf/benchmarks, and its methodologies are currently being used in industry to benchmark industry-strength, production-grade systems.

II. BACKGROUND & RELATED WORK

A. The Robot Operating System (ROS and ROS 2)
ROS [10] is a widely used middleware for robot development that serves as a structured communications layer and offers a comprehensive suite of additional functionality, including open-source packages and drivers for various tasks, sensors, and actuators, as well as a collection of tools that simplify development, deployment, and debugging. ROS enables the creation of computational graphs (see Figure 1) that connect software processes, known as nodes, through topics, facilitating the development of end-to-end robotic systems. Within this framework, nodes can publish to and subscribe to topics, enhancing the modularity of robotic systems.
ROS 2 builds upon ROS and addresses many of its key limitations. Designed to be industry-grade, ROS 2 adheres to the Data Distribution Service (DDS) and Real-Time Publish-Subscribe (RTPS) standards [28]. Built on DDS, it enables fine-grained, direct inter- and intra-node communication, enhancing performance, reducing latency, and improving scalability. Importantly, these improvements are also designed to support hardware acceleration [29], [5]. Over 600 companies have adopted ROS 2 and its predecessor ROS in their production environments, underscoring its significance and widespread adoption in industry [11].
ROS 2 also provides standardized APIs to connect user code through language-specific client libraries, rclcpp and rclpy, which handle the scheduling and invocation of callbacks such as timers, subscriptions, and services. Unlike ROS, which relied on a central ROS Master, ROS 2 is decentralized: nodes discover each other and manage their own parameters.
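For concreteness, a minimal rclpy sketch of such a node is shown below; the node, topic, and callback names are illustrative and are not part of RobotPerf or any particular ROS 2 package. The node publishes to and subscribes to a topic, with the client library scheduling the timer and subscription callbacks:

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class TalkerListener(Node):
    """A single node that both publishes to and subscribes to a topic."""

    def __init__(self):
        super().__init__('talker_listener')
        # Publications and subscriptions form the edges of the computational graph.
        self.pub = self.create_publisher(String, 'chatter', 10)
        self.create_subscription(String, 'chatter', self.on_msg, 10)
        # rclpy schedules and invokes callbacks such as this periodic timer.
        self.create_timer(1.0, self.on_timer)

    def on_timer(self):
        msg = String()
        msg.data = 'hello'
        self.pub.publish(msg)

    def on_msg(self, msg):
        self.get_logger().info(f'received: {msg.data}')


def main():
    rclpy.init()
    rclpy.spin(TalkerListener())
    rclpy.shutdown()

Composing many such nodes over topics yields the end-to-end computational graphs that RobotPerf benchmarks.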
B. Robotics Benchmarks

There has been much recent development of open-source robotics libraries and associated benchmarks demonstrating their performance, as well as a plethora of workshops and tutorials focusing on benchmarking robotics applications [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. However, most of these robotics benchmarks focus on algorithm correctness (functional testing) in the context of domain-specific problems, as well as end-to-end latency on CPUs [30], [31], [32], [33], [36], [34], [37], [35], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47]. A few works also analyze non-functional metrics, such as CPU performance, to explore bottleneck behaviors in selected workloads [23], [24], [48].

Recent work has also explored the implications of operating systems and task schedulers on ROS 2 computational graph performance through benchmarking [49], [50], [51], [52], [53], as well as by optimizing the scheduling and communication layers of ROS and ROS 2 themselves [54], [55], [56], [57], [58], [59], [60], [61]. These works typically focus on a specific context or a specific (set of) performance counter(s).

Finally, previous work has leveraged hardware acceleration for select ROS nodes and adaptive computing to optimize ROS computational graphs [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78]. However, these works do not provide comprehensive frameworks to quickly analyze and evaluate new heterogeneous computational graphs, except for two works that are limited to the context of UAVs [25], [27].

Research efforts most closely related to our work include ros2_tracing [79] and RobotCore [5]. ros2_tracing provided instrumentation that demonstrated the integration of the low-overhead LTTng tracer into ROS 2, while RobotCore illuminates the advantages of using vendor-specific tracing to complement ros2_tracing when assessing the performance of hardware-accelerated ROS 2 nodes. Building on these two foundational contributions, RobotPerf offers a comprehensive set of ROS 2 kernels spanning the robotics pipeline and evaluates them on diverse hardware.
Table I summarizes our unique contributions. It includes a selection of representative benchmarks from above and provides an evaluation of these benchmarks against RobotPerf, focusing on essential characteristics vital for robotic systems. We note that while our current approach focuses only on non-functional performance benchmarking tests, RobotPerf's architecture and methodology can be extended to also measure functional metrics.

TABLE I: Comparative evaluation of representative existing robotics benchmarks with RobotPerf across essential characteristics for robotic systems.

OMPL Benchmark [30]       ✓ ✗ ✗ ✗ ✗ ✓ ✗
MotionBenchMaker [31]     ✓ ✗ ✗ ✗ ✓ ✓ ✗
OpenCollBench [32]        ✗ ✗ ✓ ✗ ✓ ✗ ✗
BARN [33]                 ✗ ✗ ✗ ✓ ✓ ✗ ✗
DynaBARN [34]             ✓ ✗ ✗ ✓ ✓ ✗ ✗
MAVBench [25]             ✓ ✓ ✓ ✓ ✓ ✓ ✗
Bench-MR [35]             ✓ ✗ ✗ ✗ ✓ ✗ ✗
RTRBench [23]             ✓ ✓ ✗ ✗ ✗ ✓ ✗
RobotPerf (ours)          ✓ ✓ ✓ ✓ ✗ ✓ ✓

III. ROBOTPERF: PRINCIPLES & METHODOLOGY

RobotPerf is an open-source, industry-strength robotics benchmark designed for portability across heterogeneous hardware platforms. This section outlines its key design principles and describes the implementation methodology.
A. Non-Functional Performance Testing

Currently, RobotPerf specializes in non-functional performance testing, evaluating the efficiency and operational characteristics of robotic systems. Non-functional performance testing measures aspects that do not belong to the system's functions, such as computational latency, memory consumption, and CPU usage. In contrast, traditional functional performance testing examines the system's specific tasks and functions, verifying its effectiveness at its primary goals, such as the accuracy of a control algorithm in following a planned robot path. While functional testing confirms that a system performs its designated tasks correctly, non-functional testing ensures that it operates efficiently and reliably.
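To make the distinction concrete, non-functional metrics of the kind listed above can be sampled at the process level. The sketch below uses the psutil library purely as an illustration; it is not part of RobotPerf's tooling, and the default process name is a hypothetical placeholder for a running ROS 2 node:

import psutil


def sample_node_usage(process_name='component_container', interval_s=1.0):
    """Sample CPU utilization and resident memory for processes matching a name."""
    samples = []
    for proc in psutil.process_iter(['name', 'pid']):
        if proc.info['name'] == process_name:
            cpu = proc.cpu_percent(interval=interval_s)      # percent of one core
            rss_mib = proc.memory_info().rss / (1024 * 1024)  # resident memory in MiB
            samples.append((proc.info['pid'], cpu, rss_mib))
    return samples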
B. ROS 2 Integration & Adaptability

RobotPerf is designed specifically to evaluate ROS 2 computational graphs rather than independent robotic algorithms. We emphasize benchmarking ROS 2 workloads because the use of ROS 2 as middleware allows for the easy composition of complex robotic systems. This makes the benchmark versatile and well suited for a wide range of robotic applications, and it enables industry, where ROS is already widely used, to rapidly adopt RobotPerf.
C. Platform Independence & Portability

RobotPerf allows benchmarks to be evaluated on a variety of hardware platforms, including general-purpose CPUs and GPUs, reconfigurable FPGAs, and specialized accelerators (e.g., ray-casting accelerators). Benchmarking robotic workloads on heterogeneous platforms is vital to evaluate their respective capabilities and limitations. This facilitates optimizations for efficiency, speed, and adaptability, as well as fine-tuning of resource allocations, ensuring robust and responsive operation across diverse contexts.
D. Flexible Methodology

We offer grey-box and black-box testing methods to suit different needs. Black-box testing provides a quick-to-enable, external perspective and measures performance by eliminating the layers above the layer of interest and replacing them with a specific test application. Grey-box testing provides more granularity and dives deeper into the internal workings of ROS 2, allowing users to generate more accurate measurements at the cost of increased engineering effort. As such, each method has its trade-offs, and providing both options gives users flexibility. We describe each method in more detail below and highlight the takeaways in Table II.

TABLE II: Grey-box vs. black-box benchmarking trade-offs.

Criteria      Grey-Box                                              Black-Box
Precision     Utilizes tracers from in-code instrumentation.        Limited to ROS 2 message subscriptions.
Performance   Low overhead. Driven by kernelspace.                  Restricted to ROS 2 message callbacks. Recorded by userspace processes.
Flexibility   Multiple event types.                                 Limited to message subscriptions in the current implementation.
Portability   Requires a valid tracer. Standard format (CTF).       Standard ROS 2 APIs. Custom JSON format.
Ease of use   Requires code modifications and data postprocessing.  Tests unmodified software with minor node additions.
Real robots   Does not modify the computational graph.              Modifies the computational graph, adding extra dataflow.
1) Grey-Box Testing: Grey-box testing enables precise probe placement within a robot's computational graph, generating a chronologically ordered log of critical events using a tracer that may be proprietary or open source, such as LTTng [80]. As this approach is fully integrated with the standard ROS 2 layers and tools through ros2_tracing, it incurs a minimal average latency of only 3.3 µs [79], making it well suited for real-time systems. With this approach, RobotPerf optionally offers specialized input and output nodes positioned outside the nodes of interest, avoiding the need to instrument them. These nodes generate message tracepoints upon publish and subscribe events, which are processed to calculate end-to-end latency.
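The post-processing step can be pictured with the simplified sketch below; the event names and the tuple layout are hypothetical stand-ins for the tracepoints actually recorded by the tracer, not RobotPerf's exact trace schema:

def end_to_end_latencies(events):
    """Pair traced input publications with output receptions by message ID.

    events: time-ordered iterable of (timestamp_ns, event_name, msg_id).
    Returns the resulting end-to-end latencies in milliseconds.
    """
    pending = {}
    latencies = []
    for ts_ns, name, msg_id in events:
        if name == 'input_publish':
            pending[msg_id] = ts_ns
        elif name == 'output_receive' and msg_id in pending:
            latencies.append((ts_ns - pending.pop(msg_id)) / 1e6)
    return latencies

In practice, the grey-box path obtains such events from the tracer's output (CTF traces in the case of LTTng) rather than from in-memory tuples.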
2) Black-Box Testing: The black-box methodology uses a user-level node, the MonitorNode, to evaluate the performance of a ROS 2 node. The MonitorNode subscribes to the output of the target node and records the timestamp at which each message is received. By accessing the propagated message ID, the MonitorNode determines the end-to-end latency by comparing its timestamp against the PlaybackNode's recorded timestamp for the same message. While this approach does not need extra instrumentation and is easier to implement, it offers a less detailed analysis and alters the computational graph by introducing new nodes and dataflow.
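A minimal rclpy sketch of this idea follows; it is not RobotPerf's actual MonitorNode implementation (the topic name is hypothetical, and latency is approximated here from the message header stamp rather than from a propagated ID matched against the PlaybackNode's log):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class MonitorNode(Node):
    """Records receive timestamps for the target node's output messages."""

    def __init__(self):
        super().__init__('monitor_node')
        self.create_subscription(Image, '/benchmark/output', self.on_msg, 10)

    def on_msg(self, msg):
        # Receive time minus the stamp set on the input side approximates
        # end-to-end latency (assumes both sides share a common time base).
        recv_ns = self.get_clock().now().nanoseconds
        sent_ns = msg.header.stamp.sec * 1_000_000_000 + msg.header.stamp.nanosec
        self.get_logger().info(
            f'end-to-end latency: {(recv_ns - sent_ns) / 1e6:.3f} ms')


def main():
    rclpy.init()
    rclpy.spin(MonitorNode())
    rclpy.shutdown()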
E. Opaque Performance Tests

The requirement for packages to be instrumented directly in their source code poses a challenge for many benchmarking efforts. To overcome this hurdle, for most benchmarks we refrain from altering the workloads of interest and instead utilize specialized input and output nodes positioned outside the primary nodes of concern. This setup allows benchmarking without direct instrumentation of the target layer. We term this methodology "opaque tests," a concept that RobotPerf adheres to whenever possible.
Category        Benchmark Name                            Description
Perception      a1 perception_2nodes                      Graph with 2 components: rectify and resize [81], [82].
                a2 rectify                                rectify component [81], [82].
                a3 stereo_image_proc                      Computes disparity map from left and right images [83].
                a4 depth_image_proc                       Computes point cloud from rectified depth and color images [84].
                a5 resize                                 resize component [81], [82].
Manipulation    d1 xarm6_planning_and_traj_execution      Manipulator planning and trajectory execution [91].
                d2 collision_checking_fcl                 Collision check: manipulator and box (FCL [92]).
                d3 collision_checking_bullet              Collision check: manipulator and box (Bullet [93]).
                d4 inverse_kinematics_kdl                 Inverse kinematics (KDL plugin [94]).
                d5 inverse_kinematics_lma                 Inverse kinematics (LMA plugin [95]).
                d6 direct_kinematics                      Direct kinematics for manipulator [91].
Fig. 3: Benchmarking results on diverse hardware platforms across perception, localization, control, and manipulation
workloads defined in RobotPerf beta Benchmarks. Radar plots illustrate the latency, throughput, and power consumption
for each hardware solution and workload, with reported values representing the maximum across a series of runs. The
labels of vertices represent the workloads defined in Table III. Each hardware platform and performance testing procedure
is delineated by a separate color, with darker colors representing Black-box testing and lighter colors Grey-box testing. In
the figure’s key, the hardware platforms are categorized into four specific types: general-purpose hardware, heterogeneous
hardware, reconfigurable hardware, and accelerator hardware. Within each category, the platforms are ranked based on their
Thermal Design Power (TDP), which indicates the maximum power they can draw under load. The throughput values for
manipulation tasks and power values for localization tasks have not been incorporated into the beta version of RobotPerf.
As RobotPerf continues to evolve, more results will be added in subsequent iterations.