
White Paper

OCTEON® 10
Machine Learning
Industry’s First Integrated Inferencing Platform

June 2021
OCTEON DPU Background
OCTEON DPUs have traditionally been used extensively as integrated network and security processors in enterprise networking
equipment. This equipment was largely built to support a hierarchical network topology of core, distribution and access with firewalls
protecting the entry and exit at the perimeters. Most enterprises owned and maintained networking hardware on premises. Over the
last decade, enterprises have started moving workloads and access outside of this controlled environment and into public, private and
hybrid networks accessed through both wired and, increasingly, wireless communications networks.

The combination of remote access, the drive to reduce costs and upgrade services, and the expansion of cloud network capabilities has
led to a borderless (or perimeter-less) digital workplace and significantly changed the dynamics of building and managing connectivity
and access. This disaggregation of data center resources has opened the possibility of adding dedicated accelerators for these
workloads, so-called workload accelerators, into the cloud. In addition to networking and security, new storage, video, and 5G
workloads have been added to the list of infrastructure use cases, each needing workload acceleration. Finally, machine learning
inferencing can augment these workloads with anomalous fault detection, security and quality improvements, and deep application
insight. These capabilities must be delivered with a comprehensive, optimized, open-source software framework.

[Figure: OCTEON 10 block diagram. 16 to 24 Armv9 Neoverse N2 CPUs at up to 2 GHz, each with SVE2, AES and FPU units, I-cache/D-cache and 1 MB L2; XCalibur interconnect with up to 48 MB coherent last-level cache; 6x 40b DDR5-4800 memory controllers; 16x 56G SerDes serving 1/10/25/50/100GbE Ethernet with security offload and integrated Ethernet switching; PCIe Gen5 with 3 controllers on 6x SerDes; ML/AI accelerator; I/O and co-processor interconnect (NIX, NPA, SSO, application acceleration manager, I/O co-processor networks bridge); secure boot with eHSM/TrustZone; power management; and miscellaneous I/O including USB 3.1, XSPI, eMMC and 10GbE debug/management.]



Integrated Inferencing
Enter the 5nm OCTEON 10 DPU's integrated machine learning inference processor, which can be programmed with ML workloads either
in a fully offloaded fashion or as a hybrid offload split between the dedicated engine and the OCTEON 10 Neoverse N2 cores. Armv9
Neoverse N2 cores include vector instructions optimized for ML workloads, delivering much higher performance than previous-generation
Arm processors. This hybrid integrated inferencing capability eliminates the separate computing equipment that would otherwise perform
offline or near-real-time machine learning jobs elsewhere in the network, and it turns those jobs into real-time ones.

Integrated inferencing eliminates extra data movement between network nodes, dramatically reducing inference latency while cutting
power and network requirements. It opens the door for applications requiring ML to run fully contained on the OCTEON 10, whether
bare metal or in VMs and containers. The OCTEON 10's machine learning inference accelerator includes patented technology delivering
best-in-class performance per watt, all in an easy-to-use, open software suite.

[Figure: ML Inference Processor. An array of inferencing tiles, each with local SRAM and MAC units, connected through an XBAR/interconnect to shared/system memory.]

High Performance
- >100x improvement vs. software
- Native INT8 and FP16
- Matrix-Multiply-Add computation
- Accelerated Tanh and Sigmoid activation functions
- 8 MB on-chip SRAM

Efficient
- Sub-2W power
- Minimal I/O overhead
- Ultra-low latency

Robust
- Full-featured, optimized, open software suite
- Virtual platform support
- Optimized for infrastructure
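
To make the accelerator's native INT8 matrix-multiply-add concrete, here is a minimal NumPy sketch of the arithmetic pattern such engines execute in hardware: 8-bit inputs, 32-bit accumulation, and requantization back to 8 bits. This is an illustrative model only, not Marvell code; the scales and shapes are arbitrary assumptions.

    # Illustrative sketch of quantized matrix-multiply-add (not Marvell code).
    import numpy as np

    def int8_matmul_add(a_q, b_q, bias, a_scale, b_scale, out_scale):
        """INT8 inputs, INT32 accumulation, requantized INT8 output."""
        acc = a_q.astype(np.int32) @ b_q.astype(np.int32)   # 32-bit accumulate
        acc += bias                                         # 32-bit bias add
        real = acc * (a_scale * b_scale)                    # back to real scale
        out = np.clip(np.round(real / out_scale), -128, 127)
        return out.astype(np.int8)                          # requantized result

    a = np.random.randint(-128, 128, (4, 8), dtype=np.int8)
    b = np.random.randint(-128, 128, (8, 4), dtype=np.int8)
    y = int8_matmul_add(a, b, bias=np.zeros(4, dtype=np.int32),
                        a_scale=0.02, b_scale=0.05, out_scale=0.1)

Keeping the accumulator at 32 bits is what preserves accuracy at INT8 precision; the hardware performs the same widening before requantizing.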



Machine Learning Integrated Inferencing Workloads
Integrated inferencing benefits infrastructure workloads across multiple verticals. Here are some examples:

Networking
ML inferencing acceleration can be applied to software implementations of traditional networking equipment in switching, tunneling
and edge roles, as well as to providing recommendations based on quality and telemetry metrics (jitter, latency, reliability) observed
in network traffic. Network functions stand to benefit from using ML in combination with the traffic management, batch packet
processing and scheduling accelerators in the OCTEON 10 data path.
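
As an illustration of the kind of telemetry-driven model such a data path could host, the sketch below scores a flow's jitter, latency and loss against a baseline assumed to be learned offline from healthy traffic. All metric names, values and thresholds here are hypothetical.

    # Hypothetical sketch: flag flows whose telemetry drifts from a learned
    # baseline. Metrics, values and the threshold are illustrative assumptions.
    import numpy as np

    BASELINE_MEAN = np.array([2.0, 15.0, 0.001])   # jitter ms, latency ms, loss
    BASELINE_STD  = np.array([0.5,  3.0, 0.0005])

    def anomaly_score(flow_metrics):
        """Distance of one flow's telemetry from the baseline, in std devs."""
        z = (flow_metrics - BASELINE_MEAN) / BASELINE_STD
        return float(np.sqrt(np.sum(z * z)))

    flow = np.array([4.1, 22.0, 0.004])            # current telemetry for a flow
    if anomaly_score(flow) > 3.0:                  # far from baseline: act on it
        print("flag flow for QoS remediation")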

Security
Distributed workforces and multi-service cloud applications demand traffic analysis (DPI, QoE, TM), security (firewalls, IDS/IPS) and
network visibility to ensure data flows are mapped optimally; these functions are ideal targets for an integrated inferencing hardware
accelerator. Combining software-based DPI with ML to identify malicious traffic flows and detect bad network actors, and applying ML
to network function traffic analytics to verify traffic SLAs on bandwidth, latency, drops and jitter, are other emerging use cases of
interest.
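
A minimal sketch of the DPI-plus-ML idea follows, assuming a hypothetical feature set and pre-trained weights (neither comes from Marvell): a small classifier scores each flow, and suspicious flows are diverted for deeper inspection.

    # Illustrative sketch: tiny logistic classifier over DPI-derived flow
    # features. Features, weights and the decision rule are assumptions.
    import numpy as np

    # Hypothetical features: mean pkt size, pkt rate, payload entropy,
    # fraction of TCP SYNs. Weights would come from offline training.
    W = np.array([-0.002, 0.01, 1.5, 4.0])
    B = -3.0

    def malicious_probability(features):
        """Sigmoid over a linear score; >0.5 marks the flow suspicious."""
        return 1.0 / (1.0 + np.exp(-(features @ W + B)))

    flow_features = np.array([120.0, 850.0, 0.4, 0.9])  # one flow's features
    if malicious_probability(flow_features) > 0.5:
        print("divert flow to IPS for deeper inspection")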

Storage
The third infrastructure accelerator class is storage, inextricably tied to network quality of service through the choice of encryption
and compression methods for data-at-rest and data-in-motion. Based on storage resource usage, the inferencing block can decide the
choice of encryption and compression, the tagging of storage blocks, and the placement of data in hot, warm, and cold storage
regions. This hardware-based integrated inferencing accelerator, along with the more than 3x compute performance of Neoverse N2
(vs. A72), supports legacy software-based storage applications as well as newer computational storage use cases for edge
deployments.
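
The sketch below illustrates one shape such a placement policy could take, with hypothetical thresholds standing in for the learned model the paper describes; field names and rules are assumptions for illustration only.

    # Hypothetical sketch: tier and transform decisions from simple access
    # statistics. Thresholds and fields are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class BlockStats:
        reads_per_day: float
        days_since_write: int
        compress_ratio_est: float   # estimated from a sampled compression pass

    def place_block(s: BlockStats) -> tuple[str, str]:
        """Return (tier, transform) for one storage block."""
        tier = ("hot" if s.reads_per_day > 100
                else "warm" if s.days_since_write < 30
                else "cold")
        # Compress only when the estimated ratio pays for the CPU/latency
        # cost; colder data tolerates heavier transforms.
        transform = ("compress+encrypt"
                     if tier != "hot" and s.compress_ratio_est > 1.3
                     else "encrypt")
        return tier, transform

    print(place_block(BlockStats(reads_per_day=2, days_since_write=90,
                                 compress_ratio_est=2.1)))
    # -> ('cold', 'compress+encrypt')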

5G/Edge
5G opens a new class of DPU offloads for which machine learning holds promise: massive MIMO channel estimation, spectrum sensing,
resource allocation, and decision making under unknown network conditions are some of the relevant use cases. ML as a microservice
at the edge is another, serving monitoring, surveillance and self-healing network use cases in which the power-optimized OCTEON 10
DPU with fully containerized virtualization software can take on a role where a combination of CPU, DPU, and GPU functions is
performed in a single device.

Automotive
The automotive space for machine learning is very broad: it is not limited to machine perception and self-driving, but also spans
driver assistance, safety systems, energy efficiency, analytics, and manufacturing. OCTEON 10's excellent inferencing performance
and efficiency, married with its other workload accelerators, make it a compelling choice.



Machine Learning Software
Machine learning hardware is only as useful as the software running on it. Marvell's machine learning software suite includes a highly
optimized, broadly capable toolchain for compiling and executing machine learning models on Arm Neoverse N2 CPUs and the
OCTEON 10 ML Inference Processor. The software supports common machine learning model formats and open compilation and
deployment frameworks.

[Figure: Machine Learning software stack. Offline toolchain: the Marvell compiler, with quantizer and backend, turns a model into a compiled model plus firmware. Runtime: the application links the TVM/GLOW-compiled model with the ML library and DPDK driver, which dispatch inference to the ML Inferencing Processor.]

The Marvell ML toolchain is optimized for, and integrated into, ML compiler frameworks such as TVM and GLOW. Machine learning
models developed in these frameworks can easily be compiled for the OCTEON 10 Neoverse N2 cores and/or the ML Inference
Processor. These models can then be deployed on target hardware, or tested and tuned on Marvell virtual platforms, including a
functional and cycle-accurate ML inferencing processor emulator. Drivers are available that integrate easily into existing networking
applications.
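
As a hedged illustration of the TVM path described above, the snippet below compiles an ONNX model for a generic 64-bit Arm CPU target using standard TVM APIs. The Marvell-specific target for the ML Inference Processor is vendor-supplied and not shown here; the model file name, input name, and input shape are placeholder assumptions.

    # Illustrative TVM flow (standard TVM APIs; the Marvell ML Inference
    # Processor backend is vendor-supplied and not shown). "model.onnx" and
    # the input name/shape below are placeholder assumptions.
    import onnx
    import tvm
    from tvm import relay
    from tvm.contrib import cc

    onnx_model = onnx.load("model.onnx")                  # assumed model file
    mod, params = relay.frontend.from_onnx(
        onnx_model, {"input": (1, 3, 224, 224)})          # assumed input shape

    # Generic Armv8/v9 CPU target; an N2 build would refine -mcpu/-mattr.
    target = "llvm -mtriple=aarch64-linux-gnu"
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

    # Cross-compile the shared library for the aarch64 target.
    lib.export_library("model.so",
                       fcompile=cc.cross_compiler("aarch64-linux-gnu-gcc"))

The resulting model.so can then be loaded with TVM's graph executor on the target hardware, or exercised on a Marvell virtual platform as described above.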

The OCTEON 10 DPU with Integrated Inferencing is sampling in 2H 2021.

To deliver the data infrastructure technology that connects the world, we’re building solutions on the most powerful foundation: our partnerships with our customers.
Trusted by the world’s leading technology companies for 25 years, we move, store, process and secure the world’s data with semiconductor solutions designed for our
customers’ current needs and future ambitions. Through a process of deep collaboration and transparency, we’re ultimately changing the way tomorrow’s enterprise,
cloud, automotive, and carrier architectures transform—for the better.

Copyright © 2021 Marvell. All rights reserved. Marvell and the Marvell logo are trademarks of Marvell or its affiliates. Please visit www.marvell.com for a complete list of
Marvell trademarks. Other names and brands may be claimed as the property of others.
