NFV Performance – Challenges and Solutions
Ian Wells, Distinguished Engineer
Nikolai Pitaev, Engineer, Technical Marketing
BRKSDN-2411
"What's in it for me?"
This session will help you to understand bottlenecks, key performance
parameters and optimization techniques for Virtual Network Functions (VNF) on
an NFV platform (NFVI and VIM).
In this session:
• Introduction and overview: what is NFV, NFV applications versus normal cloud applications, performance measurement
• Bottlenecks on different levels:
  1. Physical level (BIOS, NIC)
  2. Host OS / Hypervisor
  3. IO and vSwitch (SR-IOV, VPP, OVS-DPDK)
  4. VNF (using hugepages, vCPU pinning)
• Performance optimization based on real-life projects with Cisco VNFs (CSR1000V, XRv)

Out of scope:
• Generic introduction to virtualization basics
• Detailed description of one specific VNF use case
• Troubleshooting and debugging deep dive
#CLUS BRKSDN-2411 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 4
Agenda
• Introduction
• Bottlenecks in a Linux/KVM/QEMU environment
• Methodology for Performance testing
• Finding optimal VNF setup
• Future and Conclusion
Cisco Webex Teams
Questions?
Use Cisco Webex Teams to chat
with the speaker after the session
How
1 Find this session in the Cisco Live Mobile App
2 Click “Join the Discussion”
3 Install Webex Teams or go directly to the team space
4 Enter messages/questions in the team space
Introduction
What is Network Function Virtualization (NFV)?
What do you need from an NFV infrastructure?
A good VNF will run badly on a poor NFVI – choose and tune your
software and hardware carefully.
Why is it different from a compute-centric cloud?
Network centric use cases – like Virtual Packet Core, virtual Managed
Services, SD-WAN, vBNG – sit in the flow of traffic rather than
answering requests – and it changes the nature of the traffic.
[Figure: the NFV deployment sits in the Internet traffic flow, carrying IMIX traffic.]
Common VNF variations
[Figure: common variations – single VMs on a server, and groups of VMs scaled out behind load balancers.]
Why it is different from physical
NFV performance is needed in all markets and segments:
• Service Providers
• Public Cloud
• Enterprise
• Private Cloud
NFV Community Landscape
Bottlenecks and benchmarks
What do you need?
Performance
• Ultimate measurements will be business relevant, e.g. customers per server
• But they usually end up meaning 'fast network performance' – measured in packets per second or Gbps
• PPS is important for vSwitches – processing cost and limitations are related to the number of packets processed

Consistency
• Maintaining this performance over extended periods
• Some features of modern hardware (e.g. SpeedStep and Turbo Boost) will let you get one VM running fast for short periods but can't maintain this on a loaded system
• How is it affected by failover, upgrades, maintenance…
Using public clouds for NFV
It can be done
• For instance, running CSRs inside AWS or Google Cloud to terminate IPsec connections
• We've even run a mobile packet core in AWS

Use it wisely
• Public cloud SLA – 'it will work except when it doesn't'
• Performance consistency – machines can be overcommitted, and VMs will slow down as hosts get more full
• Location – not always in the flow of traffic; sometimes thousands of miles out of the way
• Traffic type – clouds work for TCP and UDP – not for multicast, L2, MPLS, …
• DoS – test a VNF in a public cloud? That's a DoS attack!
Throughput interpretation
[Figure: a traffic generator sends 1 Mpps per direction into the compute node; the vSwitch forwards traffic between vlan1 and vlan2 through the VNF. Quoted throughput is what the traffic generator sends per direction.]
Bottlenecks exist on different levels
At platform level, several bottlenecks may affect throughput:
• Physical NIC capacity
• Virtual switching
• Hypervisor performance
• CPU share
• vNIC connection

[Figure: VM1…VMn run applications and I/O drivers in guest space; the vNIC boundary, the vSwitch in host user space (QEMU/vhost), KVM in the host kernel, and the pNIC driver are each potential bottlenecks.]
Symptoms for active bottlenecks at different levels
* For later reference
Symptom – what it means – what to do:
• Drops on the physical NIC – can't get packets into the host as fast as they're arriving – virtual switch issues: use a faster switch, or skip the switch with PCI passthrough
• Drops on tx from vSwitch to VM – VM is struggling to accept packets – VM starved of CPU: give it more CPUs if it can use them, and stop other processes competing with it for CPU
• VM generally accepting traffic but occasional queue overflows – VM is stalling, competing for CPU with something else – find ways to isolate it better; optimize VM placement
• VM isn't fast enough – faster VM code (DPDK), SR-IOV
• Drops on VM output – vSwitch CPU problems – give the vSwitch more cores
NFVBench Test Tool
Aim:
• Simulate a VNF running on test infrastructure
• Find out its performance
• Find out why it doesn't perform better

[Figure: NFVbench drives TRex on a test node, connected via TOR-A/TOR-B and a vPC to compute nodes 1…N.]
Different packet paths
[Figure: packet path from the traffic generator through the DC switch and the NIC to the vSwitch and VNF1b on compute node B.]
NFVBench report
End to end view of drops in the whole path! Traffic generator (TRex)
The report lists, per interface and device, the packets, drops, and drop percentage in both the forward and reverse directions.
Methodology for performance testing
Optimization depends on the use case and VNF
One high-throughput VNF that takes a whole server with 28 cores needs different optimization than 14 VNFs of 2 vCPUs each.
Methodology for VNF performance optimization
Example: vBNG CSR 1000V optimization
1. Test parameters: RFC 2544 with 8 iterations, Cisco IMIX, 1 minute per test run, PDR = 0.01%
Will a faster CPU provide a linear NFV performance increase?
Two servers with:
• 16 cores @ 3.2 GHz
• 24 cores @ 2.6 GHz
SR-IOV with 2 x 10 GE ports used; CEF (IP forwarding) tested.
For 1 VM, the performance increase is proportional to the CPU clock difference – approximately linear.

[Chart: CSR 1000v, IMIX, SR-IOV, IOS XE 16.3 – 3.2 GHz / 16 cores: 7.367 Gbps (1x2vCPU) and 20 Gbps (3x2vCPU); 2.6 GHz / 24 cores: 6.001 Gbps (1x2vCPU) and 18.101 Gbps (3x2vCPU).]
Drop rate definition has a significant impact on throughput
Typical definitions for drop rates:
• Non-drop Rate (NDR) = 0 packets lost
Allowing even a small percentage of acceptable traffic loss per VM leads to significantly higher throughput compared to NDR.

[Chart: throughput as a function of acceptable traffic loss (%, normalized, KVM, XE 3.13) – throughput gains over NDR ranging from roughly 40% to 180%.]
RFC 2544 parameters impact performance results
Binary search: the next transmission rate is halfway between the previous failed and the previous successful rate.
Example: (800 Mbps (fail) + 0 Mbps (success)) / 2 = 400 Mbps
Key parameters: resolution, duration of a single run, success criteria (drop rate)
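The binary search described above is easy to sketch in code. This is a minimal simulation, not NFVBench's actual implementation; `measure_drop_rate` is a hypothetical stand-in for a real traffic-generator run.

```python
# Hypothetical sketch of the RFC 2544 binary-search loop described above.

def rfc2544_search(line_rate_mbps, measure_drop_rate,
                   pdr=0.01, resolution_mbps=10.0):
    """Binary-search the highest rate whose drop rate (%) stays within pdr."""
    lo, hi = 0.0, line_rate_mbps        # last successful / last failed rate
    best = 0.0
    while hi - lo > resolution_mbps:    # stop at the configured resolution
        rate = (lo + hi) / 2            # halfway between success and failure
        if measure_drop_rate(rate) <= pdr:
            lo = best = rate            # run passed: search upward
        else:
            hi = rate                   # run failed: search downward
    return best

# Toy device that starts dropping heavily above 640 Mbps:
toy = lambda rate: 0.0 if rate <= 640 else 5.0
print(rfc2544_search(800, toy))
```

Tightening `resolution_mbps` or the PDR changes the answer, which is exactly why those parameters must be reported with any RFC 2544 result.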
Optimizations
Key BIOS optimization parameters
Key Host optimization parameters
CPU isolation: dedicate physical CPUs to your VMs – don't run system tasks on them.
Why hugepages?
How to avoid TLB misses
[Figure: with 4 KB pages, 5 TLB entries address only 20 KB of memory – a large working set quickly exhausts the TLB.]
How to avoid TLB misses
[Figure: with 2 MB hugepages, just 2 TLB entries cover the same working set – each entry maps 512 times more memory than a 4 KB page.]
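The arithmetic behind the two TLB figures above can be checked directly; `tlb_entries` is just an illustrative helper, not a real kernel interface.

```python
# Back-of-the-envelope math: how many TLB entries a working set needs
# with 4 KB pages versus 2 MB hugepages.

def tlb_entries(working_set_bytes, page_bytes):
    """Number of TLB entries needed to map the working set (ceiling division)."""
    return -(-working_set_bytes // page_bytes)

KB, MB = 1024, 1024 * 1024
print(tlb_entries(20 * KB, 4 * KB))   # 5 entries with 4 KB pages
print(tlb_entries(4 * MB, 2 * MB))    # 2 entries with 2 MB hugepages
```

With only a few dozen TLB slots per core, a packet-processing working set of hundreds of megabytes is the difference between constant misses and none.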
If your VM loses time in big chunks, it will drop packets when the
input queue fills up
If your VM loses time in smaller chunks, it will underperform
• Remember that 67.2ns? ‘Small’ is a relative term
• (processes usually get scheduled for 25ms at a time)
CPU scheduling
If you have two workloads on a CPU, the Linux kernel schedules them for
you
If you’re running a VM:
• It loses some slices of time while other processes run
• It loses some slices of time when the kernel runs
• It loses some slices of time when interrupts happen
• It can’t do anything about this
If you're running more VM vCPUs than physical cores, this will happen
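Pinning a workload to dedicated cores is how you take the scheduling decision above out of the kernel's hands. A minimal sketch using the Linux-only `os.sched_setaffinity` call (the same mechanism `taskset` and libvirt vCPU pinning use underneath):

```python
# Minimal sketch: restrict the current process to a fixed CPU set (Linux only).
import os

def pin_to_cpus(cpus):
    """Pin this process to the given CPUs; returns the resulting affinity set."""
    os.sched_setaffinity(0, cpus)      # 0 = the calling process
    return os.sched_getaffinity(0)

# Pin to CPU 0 (present on any machine), then inspect the result:
print(pin_to_cpus({0}))
```

In practice you would pin each QEMU vCPU thread to its own isolated core, so the VM never shares a core with system tasks or other VMs.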
Real-time kernels
There are many ways to define “real time”
• Hard realtime is what we’re aiming for – we get predictable CPU time
• Soft realtime is available from a special ‘realtime’ or ‘pre-emptive’ kernel – it keeps
the interruption lengths low so that your desktop doesn’t get jerky, but they still
happen and they still take the same amount of time away from the VM
The pre-emptive kernel doesn’t solve your problem
Tuning for isolation is required
• Keep other processes off of your VM CPUs
• Even if they have nothing to do, if they could run there, the kernel will check to see if they’re
ready
• Allocate your VMs to specific CPUs, and your CPUs to specific VMs
• Redirect hardware interrupts
• If your VM uses a clock interrupt, your CPU will receive a clock interrupt no matter what you do
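"Redirect hardware interrupts" in the list above means writing a CPU bitmask to `/proc/irq/<n>/smp_affinity`. This sketch only computes the hex mask for a CPU list; the actual write needs root, so it is shown in a comment rather than executed (the IRQ number 24 there is purely illustrative).

```python
# Compute the hex bitmask accepted by /proc/irq/<n>/smp_affinity.

def cpu_list_to_mask(cpus):
    """CPU list -> hex mask string, e.g. [2, 3] -> 'c' (bits 2 and 3 set)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

print(cpu_list_to_mask([0]))       # '1'  -> steer the IRQ to CPU 0 only
print(cpu_list_to_mask([2, 3]))    # 'c'  -> keep it off the VM's CPUs 0-1
# With root, you would then do something like:
#   open("/proc/irq/24/smp_affinity", "w").write(cpu_list_to_mask([2, 3]))
```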
Key IO / vSwitch optimization parameters
Copying a packet from one area of memory to another using CPU is expensive
Avoid multiple packet copy operations by choosing I/O technology:
PCI passthrough (SRIOV*) lets the physical NIC copy packets straight into the VM
DPDK is a userspace library – using it bypasses the kernel, context switches
VPP (a DPDK app) processes multiple packets in batches for CPU efficiency
OVS-DPDK is a DPDK-ized version of OVS
Kernel space forwarders like conventional OVS are very costly
Optimize QEMU queue size for better absorption of packet arrival rate - example will
follow later in the presentation
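The copy-avoidance idea above can be seen in miniature with Python buffers: a `bytes` slice duplicates the packet the way a kernel-space forwarder would, while a `memoryview` slice only passes a reference into the same buffer, the way vhost-user shared packet memory does. This is an analogy, not vhost-user code.

```python
# Copy vs. zero-copy buffer handling in miniature.

packet = bytearray(9000)            # one jumbo-frame-sized buffer

copied = bytes(packet[:1500])       # slicing a bytearray makes a new copy
view = memoryview(packet)[:1500]    # zero-copy reference into the same buffer

packet[0] = 0xFF                    # "the NIC writes a new packet"
assert view[0] == 0xFF              # the view sees the change...
assert copied[0] == 0x00            # ...the copy does not
```

At millions of packets per second, every one of those copies is CPU time taken away from actual forwarding.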
User space and kernel space forwarders
[Figure: compute hosts compared – a kernel-space path through a tap device and OVS / Linux bridge using kernel drivers, versus user-space forwarders that bypass the kernel.]
DPDK, the Data Plane Development Kit
DPDK forwarders
How many DPDK worker threads do I need?
The number of DPDK worker threads can have a positive impact on total system throughput if the I/O path is the bottleneck.
Placement of worker threads on sockets / NUMA nodes does matter!
Balance the interface association to worker threads across the sockets.
[Chart: system throughput (Gbps) when allocating different numbers of VPP worker threads (2vCPU, CEF, 0.01% PLR, IOS XE 16.3), for 1-5 VMs – 1 worker: 6.31, 5.286, 4.395, 4.047, 3.979; 2 workers: 6.13, 8.365, 8.199, 8.919, 9.077; 4 workers: 6.328, 11.933, 23.32.]
Key VNF optimization parameters
Design your CPU mapping for better performance
[Figure: CPU00-CPU07 on Socket0 and CPU10-CPU17 on Socket1, with physical interface 1 and physical interface 2 attached to the sockets.]
Same example, different design
Do you see any room for improvement in the following design?

[Figure: the same Socket0 / Socket1 CPU layout with physical interfaces 1 and 2.]

To improve:
1. Physical NIC – VPP mismatch
2. CSR3 – socket-crossing "tax"
3. Emulator pin for VMs 4-6 on a different socket
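The socket-crossing "tax" in the mapping exercise above can be made mechanical. This hypothetical helper (not a real tool; it assumes a simple contiguous CPU-to-socket layout) counts how many of a VM's pinned CPUs sit off the socket of the NIC feeding it:

```python
# Hypothetical helper for checking a CPU-mapping design for socket crossings.

def socket_of(cpu, cores_per_socket=8):
    """Which socket a CPU id belongs to, assuming a contiguous layout."""
    return cpu // cores_per_socket

def crossings(vm_cpus, nic_socket, cores_per_socket=8):
    """Count vCPUs placed off the NIC's socket (each pays the crossing tax)."""
    return sum(1 for c in vm_cpus
               if socket_of(c, cores_per_socket) != nic_socket)

# A CSR pinned to CPUs 2-3 with its NIC on socket 0: no crossing.
print(crossings([2, 3], nic_socket=0))      # 0
# Same VM fed by a NIC on socket 1: every packet crosses the interconnect.
print(crossings([2, 3], nic_socket=1))      # 2
```

The same check applies to the emulator thread and the vSwitch workers: keep every piece of one VM's packet path on a single socket whenever the core budget allows.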
Example
Example: Multi-VM and Multi-Feature CSR
1000V performance with SR-IOV and VPP
x86 Host (FD.io VPP)
Test methodology and profile:
• Features: bi-directional NAT, firewall, …
• 2 min test time
• UDP IMIX IPv4

Server details:
• RHEL 7.2 on UCS C240 M4L

[Figure: VMs 1…n run applications over DPDK-virtio drivers in guest user space; FD.io VPP in host user space connects their vNICs through vhost-user shared packet memory to the pNIC driver and the traffic generator.]
Multi-Feature Test Results
Insignificant difference between SR-IOV and VPP.

[Chart: total system throughput vs. number of VNFs (2vCPU, multi-feature set, IOS XE 3.16, 0.01% PLR).]
Default QEMU queue size was the main bottleneck
CEF results with SR-IOV; variability due to features (NAT, firewall).

[Chart: total system throughput vs. number of VNFs (2vCPU, multi-feature set, IOS XE 16.3, 0.01% PLR).]
The Future
Realtime workloads
Realtime has many definitions – here, we mean ‘guaranteeing to do work
before it is due 100% of the time’, hard realtime
• 5G is coming, and with it comes Cloud RAN
• Cloud RAN puts cell site radio control into virtual machines
• Cell site radio likes to hear from its software regularly
• … like, 1000 times a second regularly
• Do your work within 1 ms, or don't bother
• And if you don’t bother, everyone’s calls hang up
This is a new field for NFV, one that we’re already working in; we’ll keep you
updated as we learn more
Why are we talking about VMs?
What are we looking for in a platform?
Containers as we use them today
One vision for NFV
What is memif?
Community progress
Divide and conquer – what NFV requires for packets is not what
Kubernetes offers for traditional microservices
Network service mesh project
• Inspired by Istio microservice mesh
• Delivers packets to containers based on service definitions
• Provides appropriate – SRIOV, memif or other – network
interfaces that are efficient at delivering packets
Summary: Key messages
Complete your online session evaluation
• Please complete your session survey after each session. Your feedback is very important.
• Complete a minimum of 4 session
surveys and the Overall Conference
survey (starting on Thursday) to
receive your Cisco Live water bottle.
• All surveys can be taken
in the Cisco Live Mobile App.
Cisco Live sessions will be available for viewing
on demand after the event at ciscolive.cisco.com.
Continue your education
• Demos in the Cisco campus
• Walk-in labs
Thank you