
White Paper

Low Latency 5G UPF Using Priority Based 5G Packet Classification

Communications Service Providers
5G UPF Packet Core

Abstract

This paper summarizes a collaboration between Intel and SK Telecom to demonstrate a 5G Standalone (SA) User Plane Function (UPF) capability based on 2nd Generation Intel® Xeon® Scalable Processors and Intel® Ethernet 800 Series Network Adapters. The results demonstrate that low latency and jitter can be achieved for a range of applications when using standard server infrastructure for the 5G SA UPF.

About the Authors

This white paper is jointly developed by SK Telecom and Intel Corporation.

SK Telecom — 5GX Labs, R&D Center
DongJin Lee, 5G Core Architect, dongjin@sk.com
JoongGunn Park, 5G Core Architect, clark.park@sk.com

Intel Corporation — Data Platforms Group
Andriy Glustsov, Software Architect, andriy.glustsov@intel.com
Khaled Qubaiah, Software Architect, khaled.qubaiah@intel.com
Peter McCarthy, Software Architect, peter.mccarthy@intel.com
Rory Browne, Platform Solutions Architect, rory.browne@intel.com

Executive Summary

The ability to efficiently deliver on the 5G requirements (eMBB, mMTC, and URLLC) and implement a common NFV infrastructure with high levels of throughput, utilization, and determinism is a key enabler for deploying a virtualized UPF system for 5G and beyond.

The 5G SA UPF developed by Intel and SK Telecom shows improved latency and jitter for high priority traffic while still running best effort for lower priority traffic at high throughput rates and high infrastructure utilization. This is accomplished by taking advantage of intelligent packet classification and steering, coupled with enhancements to the 5G user plane software stack to selectively process high priority traffic. The solution utilizes Intel® Xeon® Gold CPU SKUs and Intel® Ethernet Network Adapters with Dynamic Device Personalization (DDP). No investment in additional acceleration technologies was required.

In summary, with the 5G UPF system loaded up to 87% CPU utilization, we demonstrated the following results:

• Normal packets resulted in 0.34ms (packet size: 175B) and 0.3ms (packet size: 550B) Round Trip Time (RTT) latency.
• Low latency packets resulted in 0.07ms (packet size: 175B) and 0.09ms (packet size: 550B) RTT latency: up to a 78% reduction in latency.
• Normal packets resulted in ±0.1ms (packet size: 175B) and ±0.087ms (packet size: 550B) jitter.
• Low latency packets resulted in a consistent ±0.014ms jitter for both 175B and 550B packets: up to an 88% reduction in jitter.

This showcases how a standards-based 5G SA UPF enables a highly scalable, flexible, and reliable system for deployments in the core, at the central office, or on premises, for both B2C and B2B services.

Table of Contents

Executive Summary
1. Introduction
2. Architecture
  2.1 Typical Network Model
  2.2 Low Latency Model
  2.3 Packet Classification and Steering
  2.4 Utilizing DDP
3. Prototype Implementation
  3.1 Hardware Based Priority Aware Packet Steering
  3.2 Software Implementation
4. Test Setup
5. Measurements Results
  5.1 Results For Test Profile 1
  5.2 Results For Test Profile 2
  5.3 Results For Test Profile 3
  5.4 Results For Test Profile 4
Conclusion
Glossary

1. Introduction

The requirements for the 5G Standalone (SA) wireless network have been standardized by 3GPP. These standards define the Cloud Network Functions / Virtualized Network Functions (CNF/VNF) as microservices which can communicate with each other via APIs in a Service Based Architecture (SBA). A modular 5G core network can be achieved by designing with NFV and SBA concepts, which allows for the easy addition and removal of Network Functions (NFs). Utilizing the SBA, the Control Plane and User Plane functions are separated. The Control Plane can be effectively managed in the core network or in the central office, and the User Plane Function (UPF) can be distributed geographically closer to end users to achieve low latency requirements.

Standards development organizations (SDOs) such as GSMA, NGMN, and ITU have specified latency requirements as low as 1 ms (ITU-R M.2083) in order to provide services such as Ultra-Reliable Low Latency Communications (URLLC). Various applications and use cases require URLLC service, such as augmented and virtual reality, telemedicine, intelligent transportation, and industrial automation. While the UPF is just one portion of overall end-to-end network latency and jitter, it is critical that the UPF behaves in a deterministic fashion under heavy load to meet performance requirements.

5G systems consist of various network elements that contribute to overall end-to-end latency (e.g. RAN, transport, core). The system architecture caters to a wide range of workloads that individually have varying end-to-end latency needs. Table 1 below illustrates packet delay budget (PDB) requirements in a 5G system. The PDB defines an upper bound on the time a packet may be delayed between the UE and the UPF that terminates the N6 interface towards the data network (ref: 3GPP TS 23.501). PDB is used across network elements to support the configuration of scheduling and link layer functions such as priority weights and HARQ target operating points in the gNB. For example, for Guaranteed Flow Bit Rate (GFBR) flows using the delay-critical resource type, packets delayed beyond the PDB bounds are considered lost if the data burst does not exceed the Maximum Data Burst Volume (MDBV) within the period of the PDB and the QoS flow does not exceed its GFBR configuration.

Generally, the PDB applicable to the radio interface is determined by subtracting the static value of the Core Network Packet Delay Budget (CN PDB).

Table 1.

One of the most stringent communication requirements is for the timeliness and availability of a service where transmission occurs over a well-defined time interval, with the two main attributes of periodicity and determinism. Periodicity means the transmission is repeated over a given time interval, whereas determinism refers to the delay between the transmission of a message and its reception at the target. Typically, message transfers are deemed deterministic when they are bound by a certain threshold for latency, as well as by bounds on its variation. Table 2 below enumerates Periodic Deterministic Communication characteristics for select services (ref: 3GPP TS 22.104).

SK Telecom's 5G core network infrastructure for UPF provides the URLLC services depicted in the table, and these are a key research focus for this paper. The ability of the UPF to process packets efficiently under maximum system utilization allows operators the flexibility to operate different network topologies, including near edge and far edge deployments.

Depending on B2C, B2B, and private network use cases, a well-designed and optimized UPF system could be extended to integrate various functions of the NSA and SA core for optimization. These functions could include:

• NSA core SGW-U and PGW-U
• DPI, NAT, and firewall
• NSA/SA RAN CU-UP
• Edge computing enabling servers and systems

While this paper focuses on the low-latency aspects of packet processing for the 5G SA core, the same solution provides the agility to process the above functions.

Typically, virtualized systems are overprovisioned and engineered so that the system utilization load does not exceed a certain threshold value, to make sure that there are no packet drops, throughput degradation, or downtime during service migrations. To support such URLLC services, research was needed to increase the system utilization while maintaining a threshold for latency and jitter across various traffic profiles. The solution shows that it is possible to achieve improved latency and jitter performance in the UPF system under various traffic load scenarios. This was achieved by using a run-to-completion (RTC) model on the user plane pipeline, a multi-queue software architecture, and Intel's Dynamic Device Personalization (DDP) within the Intel® Ethernet 800 Series Network Adapter.

Example Services | End-to-End Latency (Maximum) | Service Bit Rate (User Experienced Data Rate) | Message Size [Byte] | Transfer Interval (Target Value)
Motion control | < transfer interval value | – | 50 | 500 μs
Motion control | < transfer interval value | – | 40 | 1 ms
Wired-2-wireless 1 Gbit/s link replacement | < transfer interval value | 250 Mbit/s | – | ≤ 1 ms
Mobile robots | < transfer interval value | – | 40 to 250 | 1 ms to 50 ms
Mobile control panels – remote control of e.g. assembly robots, milling machines | < transfer interval value | – | 40 to 250 | 4 ms to 8 ms
Mobile Operation Panel: Motion control | < 1 ms | 12-16 Mbit/s | 10-100 | 1 ms
Process automation – closed loop control | < transfer interval value | – | 20 | ≥ 10 ms
Robotic Aided Surgery | < 2 ms | 2-16 Mbit/s | 250 to 2000 | 1 ms
Robotic Aided Surgery | < 20 ms | 2-16 Mbit/s | 250 to 2000 | 1 ms
Robotic Aided Diagnosis | < 20 ms | 2-16 Mbit/s | 80 | 1 ms
Cooperative carrying – fragile work pieces (ProSe communication) | < 0.5 × transfer interval | 2.5 Mbit/s | 250; 500 with localization information | > 5 ms; > 2.5 ms; > 1.7 ms

Table 2.

2. Architecture

2.1 Typical Network Model

Figure 1 below illustrates a typical packet scheduling mechanism in many of today's networks. The network adapter determines the target worker core and ensures UE-to-core pinning. It then posts packets into a queue for a worker core to pull from and process in sequential order, while ensuring packet ordering for every packet of a given UE. This typical model has an unintended consequence in which higher priority packets (e.g. VoLTE, URLLC) might get backed up behind low priority packets in the queue, known as Head-Of-Line Blocking (HOLB). This can result in potentially higher latency/jitter for such packet flows, including drops and packet loss under high CPU load and traffic conditions. Furthermore, low priority traffic processing might also need higher compute complexity capabilities like Deep Packet Inspection (DPI) or application detection, which can add to the overall latency of packet processing for high priority traffic.

2.2 Low Latency Model

To mitigate the unintended consequences that might occur in the typical network processing model, one solution could be to separate out packets based on QoS flows (e.g. Guaranteed Bit Rate, Non-Guaranteed Bit Rate, Delay Critical Guaranteed Bit Rate), and then pin the processing of the packets belonging to such flows to dedicated worker cores. The implication of this approach is that the network infrastructure may not be optimally utilized and may be overprovisioned. Hence, the goal of this work is to improve the packet latency/jitter characteristics of designated low latency traffic types under all CPU load levels and traffic conditions in the 5G system. This is implemented by:

1. Enhancements to the packet parsing and classification capabilities of the Intel® Ethernet 800 Series Network Adapter to identify QFI values in the GTP header's PDU Session Container as well as DSCP codes in IP headers. These classifications are done by software-configurable header field values via Dynamic Device Personalization (DDP) capabilities.
2. Steering of packets into one or more queue groups, with multiple queues in each queue group. An example would be steering high priority packets into one queue group, medium priority packets into a second queue group, and low priority packets into a third queue group.
3. Receive Side Scaling (RSS) based load distribution of packets within queue groups, such that queues within queue groups can be assigned to worker cores.
4. Enhancements to the packet processing scheduling logic in software that runs on each of the worker cores, which polls packets from one or more queues based on priority/weighted priority and processes them.

Each worker core keeps track of its utilization (i.e. system loading at low granularity) and implements scheduling to pull packets from queues based on loading. Under high loading, more weight is given to higher priority flows versus low priority flows, thereby ensuring that latency/jitter characteristic requirements are met.

Figure 1. Typical and Desired UPF Latency Behavior



This implementation supports QoS characteristics for all low latency traffic, independent of CPU load. The system maps Differentiated Services Code Points (DSCPs) for downlink traffic and QoS Flow Identifiers (QFIs) for uplink traffic to ensure packet steering into the correct priority queue group. The scheduling logic (a software algorithm) then ensures that traffic processing is priority-aware when dealing with multiple traffic types in the same system.

2.3 Packet Classification and Steering

The Intel® Ethernet 700 Series and Intel® Ethernet 800 Series are relatively low cost, low power network interface cards that enable a pure software VNF/CNF model. Foundational NICs do not typically offer offloads for functions such as vSwitch acceleration, VXLAN TEP, or inline IPsec. They do, however, offer advanced features that scale VNF performance by optimizing packet steering in the server.

Dynamic Device Personalization (DDP) is a capability, introduced with the Intel® Ethernet 700 Series Network Adapters, to load an additional package that enables classification and steering of additional specified packet types and the performance of additional inline actions. DDP can be used to optimize packet processing performance for different network functions, whether native or running in a virtual environment. By applying a DDP profile to the network controller, the following use cases can be addressed.

Extended support for protocols:
• 5G GTP support for the 5G user plane.
• 5G SDAP/PDCP support for the 5G NR user plane.
• 5G/4G PFCP (CP-UP separation) support.
• IP protocols as new flow types, for example L2TPv3 and ESP/AH for IPsec.
• Legacy protocols: PPPoE, PPPoL2TPv2.
• New protocols/standards: eCPRI/ORAN, Radio over Ethernet (RoE).
• Extensibility for custom protocol parsing/classification.

In Figure 2 below, we can see how this type of sophisticated traffic control may be used to realize 5G core functions. DDP can effectively classify and steer traffic within the server based on the control plane (N1/N2, N4), user plane (N3, N6), or inter-UPF handover (N9) interfaces. For example, the foundational NIC can steer control plane protocols such as PFCP into the SMF or the control plane part of the UPF, and can steer UE sessions based on PDU session, flow, QoS class, etc. on N3 and N6. Furthermore, DDP may be used to support the extension header (EH) for 5G user plane traffic.

Figure 2. DDP Classification for 5G UPF



Figure 3. DDP Classification Example

2.4 Utilizing DDP

With no DDP profile, the NIC has a coarse traffic steering capability that steers packets to the appropriate load distribution function running on RX cores based on the outer five-tuple (as shown in the upper part of Figure 3). The load balancer core then distributes UE packets to the worker cores that execute the 5GC UPF pipeline functions. The load distribution function on the RX cores adds extra latency and jitter due to the increased number of RX queue levels and the additional software function usage. This creates a performance bottleneck and introduces even more latency and jitter at high traffic loads.

The lower part of Figure 3 explains how DDP may be used to parse deeper into the packet and steer packets directly to worker cores based on inner header fields such as the GTP TEID and/or the source/destination IP address. This enables the NIC to distribute a UE's packets in a very deterministic way to the appropriate function for 5GC UPF pipeline processing without using dedicated cores for load distribution. This frees up valuable resources (cores) and decreases RX queue levels, which makes the system more deterministic, with low latency and low jitter at high traffic and CPU load.

The Intel® Ethernet 800 Series Network Adapter carries over the Intel® Ethernet 700 Series Network Adapter features described in Figure 3 and adds more options. It focuses on meeting customer requirements and targets for connectivity and latency, along with providing a 100Gbps connection. This results in reduced variability in application response time, improved predictability, and increased throughput. Intel accomplishes this through two technologies: Application Device Queues (ADQ) and Dynamic Device Personalization (DDP). Intel continues to develop more DDP profiles based on customer demand, and these are available for download here: https://www.intel.com/content/www/us/en/search.html?ws=text#q=DDP&t=Downloads&layout=table.

DDP may also be used to extend functionality into other areas by steering and applying preferential QoS to other IP protocols. Typically, operators want to ensure that control plane, management plane, and OAM protocols are prioritized in the infrastructure. Examples may include, but are not limited to:

• BGP, OSPF, or IS-IS for routing control planes and fast convergence.
• OAM protocols like Bidirectional Forwarding Detection (BFD), Multi-Protocol Label Switching Operations and Maintenance (MPLS-OAM), Two-Way Active Measurement Protocol (TWAMP), and Virtual Router Redundancy Protocol (VRRP).
• Service Based Architecture (SBA) interfaces for consistently high control plane processing in the 5G core, or related critical OSS/BSS application traffic.
• IEEE 1588 PTP (Precision Time Protocol) for synchronous operations.

3. Prototype Implementation

3.1 Hardware-Based Priority Aware Packet Steering

Figure 5 shows the multi-queue architecture implemented on foundational NICs from Intel. Traffic is prioritized and steered via configurable policies into one of two queue groups, which are served by RSS algorithms for packet placement into receive queues at ingress.

The application layer may be bare metal, VM-based, or containerized, and is decoupled from the multi-queue architecture. Figure 5 illustrates just two priorities, but the architecture is extendable to support a higher number of priority queue groups.

The PDU Type for uplink/downlink selection and the QFI values are parsed from the PDU session information in the "PDU session container" extension headers of GTP-U packets, as per Figure 4 below. The system maps the 64 possible QFI values into two or more queue groups.

Figure 5 below describes the logical processing of GTP-U and IP based traffic in the system. The NIC reads the DSCP value of the IP packet, or the QFI value in the PDU session container extension header of GTP-U, to identify the priority queue group, and steers the traffic to a specific queue within that priority group based on the inner source IP address for N3 and N9 uplink, or the inner destination IP address for N9 or N6 downlink. This is effectively the UE IP address. By consistently steering a UE's traffic to a fixed queue, the system ensures that all user plane processing for that UE can be handled by the same CPU core using run-to-completion (RTC) methods.
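The classification and per-UE queue pinning described in this section can be sketched in software terms as follows. This is an illustrative model, not the NIC's actual DDP profile or driver code: the QFI split, the DSCP policy, the group size, and the hash function are all hypothetical examples.

```c
#include <stdint.h>

/* Illustrative sketch (assumptions, not the real DDP microcode): QFI (uplink)
 * or DSCP (downlink) selects a priority queue group, and a stable hash of the
 * inner (UE) IPv4 address selects a fixed queue inside that group, so all of
 * a UE's traffic lands on the same worker core. */

#define QUEUES_PER_GROUP 8  /* hypothetical queues per priority group */

/* Uplink: map the 64 possible QFI values onto queue groups
 * (0 = high priority, 1 = low priority) -- an example policy only. */
uint8_t qfi_to_group(uint8_t qfi)
{
    return ((qfi & 0x3F) <= 5) ? 0 : 1;
}

/* Downlink: map the 6-bit DSCP onto a group, here treating Expedited
 * Forwarding (46) as high priority -- again an example policy. */
uint8_t dscp_to_group(uint8_t dscp)
{
    return (dscp == 46) ? 0 : 1;
}

/* A stable hash of the UE IPv4 address picks the queue within the group,
 * emulating RSS-style placement that keeps a UE pinned to one queue. */
uint16_t select_queue(uint8_t group, uint32_t ue_ip)
{
    uint32_t h = ue_ip * 2654435761u;  /* Knuth multiplicative hash */
    return (uint16_t)(group * QUEUES_PER_GROUP + (h % QUEUES_PER_GROUP));
}
```

Because the hash depends only on the UE IP address, a given UE always maps to the same queue (and thus the same core), which is what enables the run-to-completion processing and cache locality the section describes.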

[Figure 4 shows the header fields used for packet placement (flow director):

DL PDU Session Information (PDU Type 0), per TS 38.415: octet 1 carries the PDU Type (=0) and spare bits; octet 2 carries PPP, RQI, and the QoS Flow Identifier; octet 3 (present when PPP is set) carries PPI and spare bits; followed by 0-3 padding octets.

UL PDU Session Information (PDU Type 1): octet 1 carries the PDU Type (=1) and spare bits; octet 2 carries spare bits and the QoS Flow Identifier; followed by 0-3 padding octets.

IPv4 header (20 bytes), per RFC 791/RFC 2474: Version, IHL, DSCP, ECN, Total Length; Identification, Flags, Fragment Offset; Time to Live, Protocol, Header Checksum; Source Address; Destination Address; Options/Padding.

The highlighted fields (the QFI in the PDU session container and the DSCP in the IP header) are used for packet placement.]

Figure 4. DSCP (RFC 2474) and PDU Session Parsing (TS 38.415)

Figure 5. N3, N9 & N6 Queue Selection example

This capability ensures that:

1. UEs stay on the same cores irrespective of mobility events or other session events.
2. The UE context is in the local cache, and the full pipeline executes on that core.
3. The system becomes very linear and deterministic in terms of performance.
4. The system becomes very easy to dimension.

Similarly, on the N6 interface, downlink traffic is steered and queued based on a combination of destination IP and DSCP into either high or low priority RSS queues serving consistent CPU resources. If desired, the system can implement even further granularity by binding a UE flow type or application to a core.

3.2 Software Implementation

The Intel® Ethernet 800 Series provides the capability to steer packets of different priorities into specific queue groups, as described in Figure 5. The software receiving and processing packets must be aware of the receive queue priority mechanism to enable efficient handling of high priority packets.

The User Plane Function (UPF) application used in the context of this work is based on the FD.io Vector Packet Processor (VPP) framework. This framework utilizes Data Plane Development Kit (DPDK) functionality to fetch received packets from the NIC queues and deliver them for further processing. The DPDK plugin is a part of the VPP project that exposes receive functionality via the dpdk-input node. The default implementation of the dpdk-input node enables handling of multiple RX queues in the context of a single worker thread. However, the DPDK plugin has no notion of RX queue priority and handles packets from all RX queues with the same priority.

The packet processing logic of the VPP framework can be simplified as a continuous cycle in which:

• The input node generates a vector of packets (e.g. the dpdk-input node fetches packets from the NIC receive queues).
• The VPP framework dispatches nodes connected in the form of a directed graph to process the vector, where the vector itself can be subdivided during processing.

Packets are delivered to the graph nodes of the VPP framework in the form of frames, where the maximum number of packets in a single frame is limited by the VLIB_FRAME_SIZE compile-time parameter. The average time spent processing one packet in a VPP graph node decreases as the size of the frame (the number of packets in the frame) increases. Thus, the VPP framework implements self-adjusting mechanisms for packet handling: at low packet rates and low CPU utilization the average frame size becomes lower, while high packet rates result in larger frames and more efficient CPU usage, yielding improved performance.

In this model, the packet processing time also depends on the size of the vector generated by the input node. It takes more time to process a frame of N packets in the VPP stack than to process a frame with a single packet. If a single high priority packet is bundled with many low priority eMBB packets in one frame, the average packet processing time for the high priority packet will be the same as for the eMBB packets.

Additional functionality is introduced in the DPDK plugin of the VPP framework in order to enable priority-aware receive queue handling and latency control. This functionality is enabled and configured via two basic parameters:

• Receive queue priority (RQP): a numeric value assigned to a receive queue, where the value 0 identifies the highest priority. Each receive queue has one priority value assigned, and multiple queues with the same priority value are allowed.
• Maximum read burst size (MAX_RBS): the maximum number of packets that can be fetched from the receive queues in one call to the dpdk-input node function. Unlike VLIB_FRAME_SIZE, this parameter can be changed at runtime. The MAX_RBS value cannot exceed VLIB_FRAME_SIZE.

In every call, the dpdk-input node function executes an algorithm simplified to these steps:

1. Set the current priority P to the highest priority (e.g. P=0).
2. Try to fetch up to MAX_RBS packets from the queue(s) with the current priority P.
3. If one or more (up to MAX_RBS) packets were received from queues with priority P, return control to the VPP framework to allow further processing of the received packets.
4. If no packets were received from queues with priority P and more queues are available, switch to the next priority (P = P+1) and go to step 2.
5. Return control to the VPP framework with the actual number of fetched packets (zero or more).

The algorithm in Figure 6 implements strict priority logic: packets from higher priority queues are always fetched first, and the next priority queues are not served until all packets from the high priority queues are in the processing stage. This ensures that high-priority traffic is served even if low priority traffic has to be dropped. The MAX_RBS parameter limits the maximum number of lower priority packets that are delivered to the VPP framework in a single call to the input node and, as a result, limits the maximum interval between fetches of high-priority packets. The latency results in this paper are based on the strict priority receive queue logic described above.

Other improvements could be added to the current implementation, such as:

• Enabling a mix of packets of different priorities in one frame, e.g. fetching up to MAX_RBS packets from receive queues of multiple priorities in one dpdk-input node call.
• Using a flexible MAX_RBS parameter (e.g. a value that depends on CPU load or packet rate), or configuring MAX_RBS on a per-priority-queue basis.
• Using a weighted receive queue priority mechanism.
• Dedicating cores to serve high-priority queues.
• Ensuring high priority packets are also given priority in the transmit path.

Figure 6. Input Node Packet Processing Flow Chart

4. Test Setup

The development and testing environment for this solution is shown in Figure 7 and consists of the following:

• A 5G Core Network User Plane Function (UPF) reference stack from ASTRI (Applied Science and Technology Research Institute, www.astri.org). The stack is used to demonstrate typical pipeline performance (via compiled binaries), and we have additionally patched and configured it for our low-latency research.
• 5G Core Network Control Plane components, including the Access and Mobility Management Function (AMF) and Session Management Function (SMF), for testing the user plane as sessions are established, deactivated, and moved due to mobility events.
• A combination of standard test equipment (Spirent Landslide) and in-house tools used to characterize the system. The UE and RAN elements are simulated by Landslide test equipment. Please note that a full 5GCN characterization is not reported here, as the effort focuses on user plane performance for specific call models and use cases.

In our testing we stress the performance of the UPF, as this is the main forwarding element in the core architecture and thus the main contributor to jitter and latency for user plane traffic. The effort concentrates on technological advances that aid the implementation of low latency on Intel architecture.

The 5GCN test harness is shown in Figure 7. The device under test (DUT) hosting the UPF is the top middle server. The AMF and SMF are hosted in the top right server. The Spirent Landslide systems are connected to the ToR switch and generate the UL traffic into the UPF. On the N6 side, the traffic terminates on the uplink (UL) sink server, and downlink (DL) replies are generated with in-house software tools. Downlink traffic is sent through the UPF and terminated in the DL sink server on the bottom right. Using this harness, we can easily change traffic profile configurations and measure throughput, latency, and jitter in real time.

Details of the 5G UPF configuration are shown in Table 3 below. We deployed the UPF functionality on 16 physical cores from CPU2 on the Intel S2600WFT server shown in Figures 8 and 9 below. This is an internal Intel reference board with 2x16 PCIe Gen3 lanes and 96GB of DRAM (6x16GB) per socket. One Intel® Ethernet 800 Series Network Adapter (1x100G) feeds CPU2. For optimal performance, the PCIe x16 slot on the same NUMA node that is running the UPF is used.

Figure 7. 5G Test Harness



Category | Item | Description
Processor | Product | Intel® Xeon® Gold 6252N Processor
Processor | Frequency | 2.3 to 3.5 GHz
Processor | Cores per Processor | 24 cores / 48 hyper-threads
Memory | DIMM Slots per Processor | 6 channels per processor
Memory | Capacity | 192GB DRAM, 1.5TB DCPMM (FW version 01.02.00.5346)
Memory | Memory Speed | 2666 MHz, DDR4
Network | NIC | Intel Foundational NIC: E810-CQDA2 100GbE QSFP; CVL-FW: FW-1.1.16.40, NVM-1.02, 0x80002b68; DDP: ice-1.3.10.0.pkg; Driver: ice-0.12.34
Network | Number of Ports | 1 port from the E810-CQDA2 100GbE QSFP NIC
Server | Vendor | Intel S2600WFT
Host OS | Vendor/Version | Ubuntu 18.04.2 LTS, kernel 4.15.0-20-generic x86_64
BIOS | Vendor/Version | Intel Corporation SE5C620.86B.02.01.0010.010620200716, Release Date: 01/06/2020
5G UPF | Vendor/Version | ASTRI rel. 19.07-rc2 with code changes from Intel

Table 3.

Figure 8. Intel S2600WFT Layout

Figure 9. UPF Server (2x6252N with Intel® Network Adapters)



5. Measurements Results

We tested four scenarios and measured latency and jitter with the low-latency implementation. Each scenario has a different traffic profile, and measurements were taken under different CPU loads. All measurements were based on the Intel® Xeon® Gold 6252N (24 cores, 2.3 GHz) CPU based platform. All latencies reported include the delay in the traffic emulator transmit and receive stacks and the Ethernet switch components. All tests involved emulating 50,000 UEs with up to 100 Gbit/s of aggregated throughput on a single CPU. Each profile tested different packet size values and priority ratios, as explained in Table 4 below.

Profile | Traffic Mix | Low Latency Traffic Characteristic
1 | 100Gbps, ~20 MPPS total UPF traffic; ~1.8 MPPS low latency traffic, 2.5 Gbps; ~18.9 MPPS low priority traffic, 97.5 Gbps | 175 Byte low-latency packet size, ~1:1 UL/DL; utilizing 2.5% of aggregated TPT
2 | 100Gbps, ~22.7 MPPS total traffic through the UPF; ~2.2 MPPS low latency traffic, 10 Gbps; ~20 MPPS low priority traffic, 90 Gbps | 550 Byte low-latency packet size, 1:1 UL/DL; utilizing 10% of aggregated TPT
3 | 1:3 UL to DL traffic profile; 100Gbps, ~21 MPPS total UPF traffic; 3.9 MPPS low latency traffic, 44 Gbps; 17 MPPS low priority traffic, 56 Gbps | 1400 Byte low-latency packet size, 1:9 UL/DL; utilizing 44% of aggregated TPT
4 | 1:3 UL to DL traffic profile; 100Gbps, ~21 MPPS total UPF traffic; 705 KPPS low latency traffic, 4.4 Gbps; 20 MPPS low priority traffic, 95.6 Gbps | 780 Byte low-latency packet size, 1:9 UL/DL; utilizing 4.4% of aggregated TPT

Table 4.

Jitter measurements in this document are based on the algorithm described in RFC 3550, Appendix A.8. This is implemented on a per-flow basis, where an estimate of the statistical variance of the packet interarrival times is made. The mean deviation of the relative transit time of packets is measured in microseconds. Areas for potential future optimization of VPP performance were noted with respect to deterministic performance across different traffic profiles.
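The per-flow jitter estimator cited above can be sketched as follows. The update rule (J = J + (|D| - J)/16, where D is the difference in relative transit time between consecutive packets) is taken from RFC 3550 Appendix A.8; the struct and field names here are our own, and time is kept in microseconds to match the paper's measurements.

```c
/* Sketch of the RFC 3550 (Appendix A.8) interarrival-jitter estimator,
 * kept per flow; field names are ours, times are in microseconds. */
struct flow_jitter {
    double last_transit; /* previous packet's relative transit time (us) */
    double jitter;       /* running mean-deviation estimate (us)         */
    int    have_prev;    /* 0 until the first packet has been seen       */
};

/* Update on each packet arrival: transit = receive time - send time.
 * The send and receive clocks need not be synchronized; only the change
 * in transit time between consecutive packets matters. */
void jitter_update(struct flow_jitter *f, double send_us, double recv_us)
{
    double transit = recv_us - send_us;
    if (f->have_prev) {
        double d = transit - f->last_transit;
        if (d < 0) d = -d;                    /* |D(i-1, i)|               */
        f->jitter += (d - f->jitter) / 16.0;  /* J += (|D| - J)/16 per RFC */
    }
    f->last_transit = transit;
    f->have_prev = 1;
}
```

The 1/16 gain gives the smoothed, slowly-converging estimate the RFC intends, which is why a single delayed packet moves the reported jitter only slightly.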

5.1 Results for Test Profile – 1

This test profile has approximately 2.5% of high priority traffic (175B packets), while the rest of the traffic (645B packets) is regular mobile broadband (i.e., eMBB). The overall packet size, packet rate, and per UE packet rate are described in Table 5 below.

With this traffic profile, the peak performance at zero packet loss was measured, along with the latency and jitter characteristics. Similar measurements were also taken for additional CPU loading points by modulating the packet rate/packet size mix to check for latency and jitter at each of these sample points. Results are shown in Figure 10 below. Consistent latency measurements of ~31-38 microseconds were noted, with ~13 microseconds jitter for high priority traffic.

It was noted that the latency and jitter characteristics for high priority traffic stay consistent across the load line of various measurement points of CPU loading, demonstrating a 78% reduction in latency for high priority traffic over low priority traffic, while the jitter reductions are up to 88%.

Table 5. Profile 1

Figure 10. Profile 1
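The reduction percentages quoted throughout Section 5 come from a straightforward relative comparison of high- and low-priority measurements. A sketch (the helper name is ours; the 0.34 ms / 0.07 ms RTT pair is taken from the executive summary, and the published "up to 78%" figure is presumably computed from the unrounded measurements):

```python
def pct_reduction(low_priority, high_priority):
    """Percentage improvement of high-priority over low-priority traffic."""
    return 100.0 * (low_priority - high_priority) / low_priority

# 175B RTTs from the executive summary: 0.34 ms (normal) vs 0.07 ms (low latency)
print(round(pct_reduction(0.34, 0.07)))  # 79
```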



5.2 Results for Test Profile – 2

This test profile has approximately 10% of high priority traffic (550B packets), while the rest of the traffic is regular mobile broadband (i.e., eMBB). The overall packet size, packet rate, and per UE packet rate are described in Table 6 below.

With this traffic profile, the peak performance at zero packet loss was measured, and along with it, the latency and jitter characteristics were noted. Similar measurements were also taken for additional CPU loading points by modulating the packet rate/packet size mix to check for latency and jitter at each of these sample points, with results shown in Figure 11 below. Consistent latency of ~32-45 microseconds was measured, with ~12 microseconds jitter for high priority traffic. It is notable that the latency and jitter characteristics for high priority traffic stay consistent across the load line of various measurement points of CPU loading, demonstrating a 69% reduction in latency for high priority traffic over low priority traffic, while the jitter reductions are up to 84%.

Table 6. Profile 2

Figure 11. Profile 2



5.3 Results for Test Profile – 3

This test profile has approximately 44% of high priority traffic (1400B packets), while the rest of the traffic is regular mobile broadband (i.e., eMBB). The overall packet size, packet rate, and per UE packet rate are described in Table 7 below.

With this traffic profile, the peak performance at zero packet loss was measured, and the latency and jitter characteristics were noted. Similar measurements were also taken for additional CPU loading points by modulating the packet rate/packet size mix to check for latency and jitter at each of these sample points, with results shown in Figure 12 on the following page. Consistent latency measurements of ~32-40 microseconds were noted, with ~12 microseconds jitter for high priority traffic. It was noted that the latency and jitter characteristics for high priority traffic stay consistent across the load line of various measurement points of CPU loading, demonstrating a 61% reduction in latency for high priority traffic over low priority traffic, while the jitter reductions are up to 69%.

Table 7. Profile 3

Figure 12. Profile 3



5.4 Results for Test Profile – 4

This test profile has approximately 4.4% of high priority traffic (780B packets), while the rest of the traffic is regular mobile broadband (i.e., eMBB). The overall packet size, packet rate, and per UE packet rate are described in Table 8 below.

With this traffic profile, the peak performance at zero packet loss was measured, and the latency and jitter characteristics were noted. Similar measurements were also taken for additional CPU loading points by modulating the packet rate/packet size mix to check for latency and jitter at each of these sample points. Results are shown in Figure 13 on the following page. Consistent latency measurements of ~32-38 microseconds were noted, with ~14 microseconds jitter for high priority traffic. It was noted that the latency and jitter characteristics for high priority traffic stay consistent across the load line of various measurement points of CPU loading, demonstrating a 59% reduction in latency for high priority traffic over low priority traffic, while the jitter reductions are up to 62%.

It was observed that the UL to DL TPT ratio in this test case is significantly higher than in the first two traffic profiles, resulting in slightly lower latency and jitter reductions. This has been attributed to the traffic generators used in the setup, which generate 'n' times the number of packets in the DL for every packet received in the UL direction (i.e., the UL to DL packet ratio). In an actual live network, the reductions are expected to be similar to the first two profiles.

Table 8. Profile 4

Figure 13. Profile 4

Conclusion

The 5G SA UPF collaboration by Intel and SK Telecom demonstrates how a standards-based solution can be used to achieve low latency and jitter and deliver a scalable and flexible system for network deployments.

The ability to efficiently service eMBB, mMTC, and URLLC traffic types on the same NFV infrastructure with high levels of throughput, utilization, and determinism is a key enabler of the deployment of virtualized and containerized packet core systems for 5G and beyond.

Using intelligent classification, steering, and processing of traffic, this solution, based on the Intel® Xeon® 6252N processor and Intel® Ethernet 800 Series Network Adapters, demonstrates low latency packet processing in the UPF, with significant reduction in latency and jitter of the 5G user plane. This is accomplished while still running lower priority traffic at high rates and infrastructure utilization. Based on the test profiles executed, we demonstrated up to 78% reduction in latency and 88% reduction in jitter. Hardware-based packet steering and software-based prioritization were used to achieve these results.

These capabilities show how it is possible for MNOs to utilize Intel architecture and software optimizations to meet customer demand and generate new revenue streams for latency-sensitive 5G applications and services. Use cases that can benefit include factory automation, AI-enabled vision processing, and video analytics. This flexible architecture with associated capabilities can also reduce CapEx, as different services with different latency or jitter requirements can own different slices of the same architecture without compromising each other.

Looking forward, Intel and SK Telecom will continue to collaborate on the 5G core with new CPU architectures, Intel NIC technologies, and software optimization techniques to demonstrate further performance improvements for the 5G SA UPF.

Glossary

Term Description

AMF Access and Mobility Management Function

CAGR Compound Annual Growth Rate

COTS Commercial Off-The-Shelf

CPU Central Processing Unit

CoSP Communication Service Provider

CUPS Control and User Plane Separation

DDP Dynamic Device Personalization

DNN Data Network Name

DPDK Data Plane Development Kit

DPI Deep Packet Inspection

EPC Evolved Packet Core

FWA Fixed Wireless Access

HQoS Hierarchical Quality of Service

NEF Network Exposure Function

NFV Network Function Virtualization

NFVI NFV Infrastructure

NIC Network Interface Controller

RAN Radio Access Network

RSS Receive Side Scaling

RTC Run To Completion



SMF Session Management Function

SP Service Providers

SR-IOV Single Root I/O Virtualization

TEM Telecom Equipment Manufacturers

UPF User Plane Function

URLLC Ultra Reliable Low Latency Communication

VM Virtual Machine

VNF Virtual Network Function

Please Recycle 343269-001US
