
NSX Design: Networking

Data Center Profile


Experience, background, and history
• Name: Paul A. Mancuso
• VMware Instructor/VCI: 2006
• Cisco Instructor/CCSI: 1996
• Industry Experience: 24+ years
– VCDX-NV, VCI, CCSI, CCNP Data Center,
MCSE 2012 (Since 1994), MCT, CISSP
• Contact/Email/Twitter
• pmancuso@vmware.com
• datacentertrainer@gmail.com
• @pmancuso
• 954.551.6081

• Background:
• Current Emphasis: Datacenter technologies, Network Virtualization, Server and Desktop virtualization, SAN switching
• Overview: Started in LAN Networking, Network Mgmt Systems, and gradually moved into Data Center Architecture and Management
• Publications: Author of MCITPro 70-647: Windows Server 2008 R2 Enterprise Administration
• Publication: MCITPro 70-237: Designing Messaging Solutions with Microsoft® Exchange Server 2007
• Certifications: Dozens of Networking/Data Center/SAN switching/Server Administration technical certifications from Cisco, VMware, Microsoft, and Novell, and industry-recognized security certifications from ISC2 (CISSP)
• Education: Graduated with honors from Ohio State University with a Bachelor of Science degree in Zoology (Pre-Med) and minors in Finance and Economics.

• History:
• VMware NSBU: Technical Enablement Architect 02/2015 – Present
• Firefly Director of Cisco Integration for VMware and Microsoft Integration 01/2013- 2/2015
• Firefly Senior Instructor and PLD for Cisco Data Center Virtualization 11/2009 – 12/2012
• CEO NITTCI: Prof Services, Training, Content Development (Courseware and books) 10/2003 – Present
• CEO Dynacomp Network Systems: Consulting, Training, Courseware Development 02/1989 – 10/2003
• Including LAN networking specializing in Novell NetWare, Directory Services, LAN design
Agenda – Part 1

1 vSphere Distributed Switch (Whiteboard)

2 NSX for vSphere Overview

3 Physical Network Design Considerations

4 vSphere Design Considerations for NSX

5 NSX Design Considerations


Session Objectives

 vSphere Distributed Switching


 NSX Introduction
 Overview of Physical Network designs for Network Virtualization
 Cover vSphere design impacts on VMware NSX
 Highlight key design considerations in NSX for vSphere deployments
 Also refer to the NSX-v Design Guide:
 https://www.vmware.com/files/pdf/products/nsx/vmw-nsx-network-virtualization-design-guide.pdf
Agenda – Part 1

1 vSphere Distributed Switch (Whiteboard)

2 NSX for vSphere Overview

3 Physical Network Design Considerations

4 vSphere Design Considerations for NSX

5 NSX Design Considerations


vSphere and
vSphere Distributed Switch
Whiteboard Discussion

CONFIDENTIAL
6
Agenda – Part 1

1 vSphere Distributed Switch (Whiteboard)

2 NSX for vSphere Overview

3 Physical Network Design Considerations

4 vSphere Design Considerations for NSX

5 NSX Design Considerations


NSX for vSphere Overview
Quick refresh on NSX components and basic architecture
NSX Customer and Business Momentum

1200+
NSX Customers

250+
Production Deployments
(adding 25-50 per QTR)

100+
Organizations have spent
over US$1M on NSX

Stats as of end of Q4 2015


Today’s situation
Internal and external forces:
• Better security
• Faster time to market
• Higher availability
• Be more efficient
• Run things cheaper

Our vision: Deliver
• Inherently secure infrastructure
• IT at the speed of business
• Data center anywhere
• Be more efficient: improved data center operations
• Run things cheaper: CapEx (increase compute efficiency, ensure full life of network hardware, etc.)
Primary Use Cases with NSX
• Security – Inherently Secure Infrastructure
– Lead project: Micro-segmentation
– Value: Secure infrastructure at 1/3 the cost
– Other projects: DMZ Anywhere, Secure End User Infrastructure
• Automation – IT at the Speed of Business
– Lead project: IT Automating IT
– Value: Reduce infrastructure provisioning time from weeks to minutes
– Other projects: Developer Cloud, Multi-tenant Infrastructure
• Application Continuity – Datacenter Anywhere
– Lead project: Disaster Recovery
– Value: Reduce RTO by 80%
– Other projects: Metro/Geo Pooling, NSX in Public Cloud
What is NSX?

NSX provides a faithful reproduction of network & security services in software:
• Switching
• Routing
• Firewalling
• Load Balancing
• VPN
• Connectivity to Physical
NSX Components

Cloud Consumption
• Self Service Portal
• vCloud Automation Center, OpenStack, Custom CMP

Management Plane – NSX Manager
• Single configuration portal
• REST API entry-point

Control Plane – NSX Controller
• Manages logical networks
• Control-plane protocol
• Separation of control and data plane

Data Plane – Distributed services via hypervisor kernel modules in ESXi (Logical Switch, Distributed Firewall, Logical Router) and NSX Edge, over the physical network
• High-performance data plane
• Scale-out distributed forwarding model

NSX vSwitch and NSX Edge

NSX vSwitch (VDS)
 VMkernel modules (vSphere VIBs) installed in ESXi:
• VXLAN
• Distributed Routing
• Distributed Firewall
• Switch Security
• Message Bus

NSX Edge Logical Router (Logical Router Control VM)
 Control functions only
 Handles dynamic routing and updates the controller
 Determines the active ESXi host for VXLAN-to-VLAN layer 2 bridging

NSX Edge Services Gateway
 ECMP, dynamic routing (BGP & OSPF)
 L3-L7 services: NAT, DHCP, Load Balancer, VPN, Firewall
 VM form factor
 High availability
Virtual Networks (VMware NSX)

Design decision: Should VMware NSX™ be included in the design?

 A virtual network is a software container that delivers network services.


 VMware NSX provides virtualized logical switching (layer 2) over existing physical networks.
 VMware NSX provides virtualized logical routing (layer 3) over existing physical networks.
 VMware NSX also provides the following features:
• Logical Firewall: Distributed firewall, kernel integrated, high performance
• Logical Load Balancer: Application load balancing in software
• Logical Virtual Private Network (VPN): Site-to-site and remote access VPN in software
• VMware® NSX API™: REST API for integration into any cloud management platform
 For more information see the VMware NSX Network Virtualization Design Guide at
http://www.vmware.com/files/pdf/products/nsx/vmw-nsx-network-virtualization-design-guide.pdf

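The REST API mentioned above can be exercised directly from any HTTP client. A minimal sketch, assuming an NSX-v Manager reachable as `nsxmgr.corp.local` with basic-auth credentials (the hostname and credentials are placeholders; `/api/2.0/vdn/scopes` is the NSX for vSphere endpoint that lists transport zones):

```python
# Hedged sketch: building an authenticated request against the NSX Manager
# REST API. Hostname and credentials are placeholders for illustration.
import base64
import urllib.request

def scopes_url(host):
    """URL of the transport-zone ("network scope") listing endpoint."""
    return "https://%s/api/2.0/vdn/scopes" % host

def build_request(host, user, password):
    """Return a urllib Request with HTTP basic auth, ready to send."""
    req = urllib.request.Request(scopes_url(host))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/xml")  # NSX-v replies in XML
    return req

# To run against a live NSX Manager (lab only; the default certificate
# is self-signed, so production use should pin or verify it):
#   with urllib.request.urlopen(build_request("nsxmgr.corp.local",
#                                             "admin", "password")) as r:
#       print(r.read().decode())
```

The same pattern applies to the rest of the API surface; cloud management platforms simply wrap these calls.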
Virtual Extensible LANs (VXLAN)
Design decision: Should VXLAN be included in the design?

• Ethernet-in-IP overlay network:
• The entire L2 frame is encapsulated in User Datagram Protocol (UDP)
• 50+ bytes of overhead
• VXLAN can cross layer 3 network boundaries:
• Allows network boundary devices to extend virtual network boundaries over physical IP networks
• Expands the number of available logical Ethernet segments from 4094 to over 16 million
• Encapsulates the source Ethernet frame in a new UDP packet
• VXLAN is transparent to virtual machines:
• VXLAN is an overlay between VMware ESXi hosts; virtual machines do not see the VXLAN ID
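The segment-count arithmetic above is easy to verify. A quick sketch of the 12-bit VLAN space versus the 24-bit VNI space (the VNI floor of 5000 matches NSX for vSphere's default allocation):

```python
# VXLAN scale vs. VLANs: a 12-bit VLAN ID yields 4094 usable segments,
# while the 24-bit VXLAN Network Identifier yields over 16 million.
VLAN_SEGMENTS = 2**12 - 2      # IDs 1-4094 (0 and 4095 are reserved)
VNI_SEGMENTS = 2**24           # full 24-bit VNI space

# NSX for vSphere allocates VNIs starting at 5000.
NSX_VNI_START = 5000

assert VLAN_SEGMENTS == 4094
assert VNI_SEGMENTS == 16777216   # "over 16 million logical segments"
```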
VXLAN Terms
– A VTEP is an entity that encapsulates an Ethernet frame in a VXLAN frame or de-
encapsulates a VXLAN frame and forwards the inner Ethernet frame.
– A VTEP proxy is a VTEP that forwards VXLAN traffic to its local segment from another
VTEP in a remote segment.
– A transport zone defines members or VTEPs of the VXLAN overlay:
• Can include ESXi hosts from different VMware vSphere® clusters
• A cluster can be part of multiple transport zones
– A VXLAN Network Identifier (VNI) is a 24-bit number that gets added to the VXLAN frame:
• The VNI uniquely identifies the segment to which the inner Ethernet frame belongs
• Multiple VNIs can exist in the same transport zone
• VMware NSX for vSphere starts with VNI 5000

VXLAN Frame Format
The original L2 frame header and payload is encapsulated in a UDP/IP packet.
 50 bytes of VXLAN overhead
• The original L2 header becomes payload; outer MAC, outer IP, UDP, and VXLAN headers are prepended

Field sizes in bytes:
• Outer MAC Header (14+): Destination Address (6), Source Address (6), optional VLAN Type 0x8100 (2) and VLAN ID Tag (2), Ether Type 0x0800 (2)
• Outer IP Header (20): Misc Data (9), Protocol 0x11 (1), Header Checksum (2), Source IP (4), Destination IP (4)
• Outer UDP Header (8): Source Port (2), VXLAN Port (2), UDP Length (2), Checksum 0x0000 (2)
• VXLAN Header (8): VXLAN Flags (1), Reserved (3), VNI (3), Reserved (1)
• Inner L2 Frame: original frame header (14+), payload (up to 1500), and FCS
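The field sizes above can be checked by packing the outer headers a VTEP prepends. A sketch only: addresses and most IP fields are zeroed, and the UDP destination shown is the IANA VXLAN port (NSX deployments have historically used 8472):

```python
# Sketch: pack the outer headers a VTEP prepends to the original frame,
# using the field sizes from the frame-format breakdown above
# (untagged outer MAC header, so 14 + 20 + 8 + 8 bytes).
import struct

def vxlan_headers(vni, src_mac=b"\0" * 6, dst_mac=b"\0" * 6):
    """Return the encapsulation bytes prepended to the inner L2 frame."""
    outer_mac = struct.pack("!6s6sH", dst_mac, src_mac, 0x0800)  # 14 B
    # Outer IPv4 header with illustrative zeroed fields; protocol 0x11 = UDP.
    outer_ip = struct.pack("!BBHHHBBH4s4s",
                           0x45, 0, 0, 0, 0, 64, 0x11, 0,
                           b"\0" * 4, b"\0" * 4)                 # 20 B
    # Dst port 4789 is the IANA VXLAN port (NSX-v defaulted to 8472).
    outer_udp = struct.pack("!HHHH", 0, 4789, 0, 0)              # 8 B
    # Flags byte 0x08 marks a valid VNI; the 24-bit VNI sits above
    # a final reserved byte.
    vxlan = struct.pack("!II", 0x08000000, vni << 8)             # 8 B
    return outer_mac + outer_ip + outer_udp + vxlan

overhead = len(vxlan_headers(5001))
print(overhead)  # 50
```

This is also why the transport network needs a larger MTU: a 1500-byte guest frame plus 50 bytes of encapsulation exceeds a standard 1500-byte MTU.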
NSX for vSphere VXLAN Replication Modes
NSX for vSphere provides three modes of traffic replication: one data-plane based and two controller based.

 Multicast Mode
Requires IGMP for a Layer 2 topology and
Multicast Routing for L3 topology

 Unicast Mode
All replication occurs using unicast

 Hybrid Mode
Local replication offloaded to physical network,
while remote replication occurs via unicast

 All modes require an MTU of 1600 bytes


VXLAN Replication: Control Plane
• In unicast or hybrid mode, the ESXi host sending the communication selects one VTEP in every remote segment from its VTEP mapping table as a proxy. This selection is per VNI, which balances load across proxy VTEPs.
• In unicast mode, this proxy is called a Unicast Tunnel End Point (UTEP).
• In hybrid mode, this proxy is called a Multicast Tunnel End Point (MTEP).
• The list of UTEPs or MTEPs is NOT synced to each VTEP.
• If a UTEP or MTEP leaves a VNI, the sending ESXi host selects a new proxy in that segment.
• The NSX Controller provides the VXLAN directory service, maintaining the MAC table, ARP table, and VTEP table.
NSX – Logical View
[Diagram: a Web Logical Switch (172.16.10.0/24, VM1–VM3) and an App Logical Switch (172.16.20.0/24, VM4–VM5) attach to a Distributed Logical Router with Logical Firewall; a Transit Logical Switch connects the DLR to an NSX Edge providing routing, NAT, firewall, and load-balancing services, which reaches the physical routers through a VLAN port group.]
Enterprise Topology
• A common enterprise-level topology.
[Diagram: a physical router on the external network connects over a VLAN 20 uplink to an NSX Edge Services Gateway; the Edge connects over a VXLAN 5020 uplink to a single Logical Router instance serving the Web, App, and DB logical switches of all workloads (Web1/App1/DB1 through Webn/Appn/DBn).]
Service Provider: Multiple Tenant Topology
• Multiple tenants connect to the same NSX Edge gateway.
[Diagram: the NSX Edge Services Gateway uplinks Tenant 1 over VXLAN 5020 and Tenant 2 over VXLAN 5030; each tenant has its own Logical Router instance fronting dedicated Web, App, and DB logical switches.]
NSX Multiple Tenant Topology (IP Domain Separation)
[Diagram: the NSX Edge Services Gateway uplinks Tenants 1–9 over VXLAN 5021–5029 to LR Instance 1, and Tenants 10–19 over VXLAN 5031–5039 to LR Instance 10; each tenant keeps its own Web, App, and DB logical switches.]
NSX – Physical View
[Diagram: a transport zone spans the compute clusters, the edge cluster (NSX Edge), and the management cluster (NSX Controller and NSX Manager); logical switches such as the Web Logical Switch (VM1–VM3) and App LS (VM4–VM5) stretch across Transport Subnet A 192.168.150.0/24 and Transport Subnet B 192.168.250.0/24 over the physical network.]
Management Plane Components
[Diagram: vRA/OpenStack/custom CMPs consume the vSphere APIs on vCenter and the NSX REST APIs on NSX Manager; vCenter and NSX Manager are paired 1:1. NSX Manager is administered through the vSphere plugin or a 3rd-party management console – a single pane of glass.]
NSX Control Plane Components

NSX Controllers
 Deployed into a vSphere cluster protected by vSphere HA and DRS with anti-affinity
 Properties:
• Virtual form factor (4 vCPU, 4 GB RAM)
• Data plane programming (via the host agent and data-path kernel modules on each ESXi host)
• Control plane isolation
 Benefits:
• Scale out
• High availability
• VXLAN with no multicast
• ARP suppression
Deploying and Configuring VMware NSX

One-time component deployment (Deploy VMware NSX):
 Deploy NSX Manager
 Deploy NSX Controller Cluster
 Host Preparation
 Logical Network Preparation

Recurring logical network/security services (programmatic virtual network deployment through the consumption layer):
 Deploy logical switches per tier
 Deploy a Distributed Logical Router, or connect to an existing one
 Create a bridged network
Cross-VC NSX Logical Networks
[Diagram: universal objects are configured through the NSX UI & API on the primary NSX Manager (vCenter & NSX Manager A); the Universal Synchronization Service (USS) replicates the universal configuration to the secondary NSX Managers (B through H), each paired with its own vCenter. A Universal Controller Cluster serves all sites, and Universal Logical Switches, a Universal Distributed Logical Router, and the Universal DFW span the local VC inventories.]
Cross-VC NSX Components & Terminology
• Cross-VC NSX objects use the term Universal and include:
– Universal Synchronization Service (USS)
– Universal Controller Cluster (UCC)
– Universal Transport Zone (UTZ)
– Universal Logical Switch (ULS)
– Universal Distributed Logical Router (UDLR)
– Universal IP Set/MAC Set
– Universal Security Group/Service/Service Group

• NSX Managers have the following roles:


– Standalone
– Primary
– Secondary
– Transit

• Universal Distributed Logical Routing adds:


– Locale ID
Agenda – Part 1

1 vSphere Distributed Switch (Whiteboard)

2 NSX for vSphere Overview

3 Physical Network Design Considerations

4 vSphere Design Considerations for NSX

5 NSX Design Considerations


Classical Access/Agg/Core Network
 VLANs are carried throughout the fabric
 L2 application scope is limited to a single POD
 Default gateway – HSRP/VRRP at the aggregation layer
 Ideally multiple aggregation PODs to limit the Layer 2 domain size, although this is not always the case
 Inter-POD traffic is L3 routed
Physical Network Trends
• From 2- or 3-tier designs to spine/leaf fabrics
• Density & bandwidth jump
• ECMP for layer 3 (and layer 2)
• Reduced network oversubscription
• Wire & configure once
• Uniform configurations
L3 Fabric Topologies & Design Considerations
 L3 ToR designs run a dynamic routing protocol between leaf and spine
 BGP, OSPF or IS-IS can be used
 Each rack advertises a small set of prefixes (one per VLAN/subnet), with equal-cost paths to the other racks' prefixes
 802.1Q trunks carry a small set of VLANs for VMkernel traffic
 The ToR provides default gateway service for each VLAN subnet
• L2 fabric designs are also available
Physical Fabric Options with NSX

 Network Virtualization enables greater scale and flexibility regardless


of physical network design
 NSX works over any reliable IP network supporting 1600-byte MTU; these are the
only requirements
 Most customers are still using hierarchical networks – which work
great with NSX
 NSX capabilities are independent of network topology

 NSX enables both choice and protection of existing investments


Agenda – Part 1

1 vSphere Distributed Switch (Whiteboard)

2 NSX for vSphere Overview

3 Physical Network Design Considerations

4 vSphere Design Considerations for NSX

5 NSX Design Considerations


vSphere Design Considerations
for NSX
vSphere Cluster Design – Collapsed Edge/Infra Racks
[Diagram: a spine/leaf fabric (L3 at the leaf, L2 within the rack) hosts compute clusters under vCenter 1 and vCenter 2, each scaled to the max supported number of VMs. The infrastructure clusters – edge clusters, storage, and the management cluster (vCenter and Cloud Management System) – share the same racks behind an edge leaf that is L3 to the DC fabric and L2 to external networks (WAN/Internet), with L2 VLANs for bridging.]

Cluster location is determined by connectivity requirements.
vSphere Cluster Design – Separated Edge/Infra
[Diagram: the same spine/leaf fabric, but the edge clusters (Logical Router Control VMs and NSX Edges) occupy their own racks behind a dedicated edge leaf (L3 to the DC fabric, L2 to external networks), separate from the infrastructure clusters (storage, vCenter and cloud management).]

Cluster location is determined by connectivity requirements.
Management and Edge Cluster Requirements

 Management Cluster (routed DC fabric)
• L2 is required for management workloads such as vCenter Server, NSX Controllers, NSX Manager and IP storage, which use VLAN-backed networks
• The leaf switches carry the VMkernel VLANs plus VLANs for the management VMs

 Edge Cluster (routed DC fabric)
• L2 is required for external 802.1Q VLANs & the Edge default gateway toward the WAN/Internet
• Needed because Edge HA uses GARP to announce the new MAC in the event of a failover
• The leaf switches carry the VMkernel VLANs plus VLANs for L2 and L3 NSX services
L2 Fabric – Network Addressing and VLAN Definition Considerations

Compute racks – IP address allocations and VLANs (Y identifies the POD number):

Function     VLAN ID    IP Subnet
Management   66         10.66.Y.0/24
vMotion      77         10.77.Y.0/24
VXLAN        88         10.88.Y.0/24
Storage      99         10.99.Y.0/24

The VMkernel VLAN/IP subnet scope is per L2 fabric POD (e.g. Compute Cluster A and Compute Cluster B, 32 hosts each), while the VXLAN transport zone scope extends across ALL PODs/clusters.
L3 Fabric – Network Addressing and VLAN Definition Considerations

Compute racks – IP address allocations and VLANs (R identifies the rack number):

Function     VLAN ID    IP Subnet
Management   66         10.66.R.x/26
vMotion      77         10.77.R.x/26
VXLAN        88         10.88.R.x/26
Storage      99         10.99.R.x/26

The VMkernel VLAN/IP subnet scope is per rack (e.g. Compute Cluster A and Compute Cluster B, 32 hosts each), while the VXLAN transport zone scope extends across ALL racks/clusters.

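The addressing scheme above can be sketched with Python's ipaddress module. The mapping of the rack number R onto the third octet is an assumption for illustration; the per-function /16 blocks and per-rack /26 carving follow the table:

```python
# Sketch of the L3-fabric addressing scheme: each VMkernel function owns
# a /16 whose second octet equals its VLAN ID, and each rack R takes a
# /26 from it. "Third octet = rack number" is an illustrative convention.
import ipaddress

FUNCTIONS = {"Management": 66, "vMotion": 77, "VXLAN": 88, "Storage": 99}

def rack_subnet(function, rack):
    """Return the /26 for a given VMkernel function in a given rack."""
    block = ipaddress.ip_network("10.%d.0.0/16" % FUNCTIONS[function])
    subnet = ipaddress.ip_network("10.%d.%d.0/26" % (FUNCTIONS[function], rack))
    assert subnet.subnet_of(block)   # still summarizable as the /16
    return subnet

print(rack_subnet("VXLAN", 3))
# A /26 gives 64 addresses (62 usable hosts) - comfortably more than the
# 32 hosts per cluster shown above, with room for gateway and spares.
```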

VMkernel Networking
[Diagram: an L3 ToR switch with routed uplinks (ECMP) hosts SVI 66: 10.66.1.1/26, SVI 77: 10.77.1.1/26, SVI 88: 10.88.1.1/26 and SVI 99: 10.99.1.1/26; a VLAN trunk (802.1Q) carries VLANs 66/77/88/99 to the vSphere host (ESXi), whose Mgmt, vMotion, VXLAN and Storage VMkernel interfaces (10.66.1.25/26, 10.77.1.25/26, 10.88.1.25/26, 10.99.1.25/26) use the matching SVI as their gateway. The span of these VLANs is limited to the rack.]
VMkernel Network Addressing
 To keep static routes manageable as the fabric scales, larger address blocks can be
allocated for the VMkernel functions (/16 as an example):
• 10.66.0.0/16 for Management
• 10.77.0.0/16 for vMotion
• 10.88.0.0/16 for VXLAN
• 10.99.0.0/16 for Storage

 Dynamic routing protocols (OSPF, BGP) used to advertise to the rest of the fabric
 Scalability and predictable network addressing, based on number of ESXi hosts per rack or cluster
 Reduces VLAN usage by reusing VLANs within a rack (L3) or POD (L2)
VMkernel Networking
 Multi-instance TCP/IP stack
• Introduced with vSphere 5.5 and leveraged by VXLAN (the NSX vSwitch transport network)
• Separate routing table, ARP table and default gateway per stack instance
• Provides increased isolation and reservation of networking resources
• Enables VXLAN VTEPs to use a gateway independent from the default TCP/IP stack
 Management, vMotion, FT, NFS and iSCSI leverage the default TCP/IP stack in 5.5
 VMkernel VLANs do not extend beyond the rack in an L3 fabric design or beyond the cluster with an L2 fabric; static routes are therefore required for Management, Storage and vMotion traffic
 Host Profiles reduce the overhead of managing static routes and ensure persistence
VMkernel Networking
 Static Routing
• VMkernel VLANs do not extend beyond the rack in an L3 fabric design or beyond the cluster with an L2
fabric, therefore static routes are required for Management, Storage and vMotion Traffic
• Host Profiles reduce the overhead of managing static routes and ensure persistence
• Follow the RPQ (Request for Product Qualification) process for official support of routed vMotion.
Routing of IP Storage traffic also has some caveats

• A number of customers have been through the RPQ process and use routed vMotion with full support from VMware today
• Future enhancements will simplify ESXi host routing and enable greater support for L3 network topologies
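Where static routes are required, they are typically pushed per host. A hedged sketch that generates esxcli commands for the vMotion and storage summary blocks from the earlier addressing slides (the rack-local .1 gateways follow the SVI example; `esxcli network ip route ipv4 add` is the standard ESXi static-route command, and in practice Host Profiles would carry these settings):

```python
# Hedged sketch: generate per-host static-route commands for routed
# vMotion and storage traffic. Summary blocks and rack-local gateways
# follow the earlier addressing slides; values are illustrative.

SUMMARY_BLOCKS = {"vMotion": "10.77.0.0/16", "Storage": "10.99.0.0/16"}
LOCAL_GW = {"vMotion": "10.77.1.1", "Storage": "10.99.1.1"}  # rack-1 SVIs

def static_route_cmds():
    """Return the esxcli invocations a host in rack 1 would need."""
    cmds = []
    for fn, block in SUMMARY_BLOCKS.items():
        cmds.append("esxcli network ip route ipv4 add --network %s --gateway %s"
                    % (block, LOCAL_GW[fn]))
    return cmds

for c in static_route_cmds():
    print(c)
```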
VMkernel Networking
 VMkernel Teaming Recommendations
• LACP (802.3ad) provides optimal use of available bandwidth and quick convergence,
but does require physical network configuration

• Load Based Teaming is also a good option for VMkernel traffic where there is a desire NSX vSwitch

to simplify configuration and reduce dependencies on the physical network, while still
effectively using multiple uplinks
ESXi Host
• Explicit Failover allows for predictable traffic flows and manual balancing of VMkernel
traffic

• Refer to the VDS best practices white paper for more details on common configurations:
http://www.vmware.com/files/pdf/techpaper/vsphere-distributed-switch-best-practices.pdf

• 2x 10GbE network adapters per server is the most common configuration


• Network partitioning technologies tend to increase complexity

 Overlay Networks are used for VMs


• Use VLANs for VMkernel interfaces to avoid circular dependencies Physical Switch

• NSX introduces support for multiple VTEPs per host with VXLAN

Recap: vCenter – Scale Boundaries
[Diagram: one vCenter Server supports up to 10,000 powered-on VMs, 1,000 ESXi hosts and 128 VDS; a datacenter object holds a maximum of 500 hosts, and a cluster a maximum of 32 hosts. DRS-based vMotion operates within a cluster, while manual vMotion can span clusters (e.g. across VDS 1 and VDS 2) within the vCenter inventory.]
NSX for vSphere – Scale Boundaries
[Diagram: a Cloud Management System drives multiple NSX domains through the NSX API; each NSX Manager maps 1:1 to a vCenter Server and has its own Controller Cluster. DRS-based vMotion stays within a cluster, manual vMotion within a vCenter, and the logical network span is bounded by the VDSs prepared under a single vCenter/NSX Manager pair.]
vSphere Cluster Design for NSX
 There are two common models for cluster design with NSX for vSphere:
• Option 1 with a single vCenter Server attached to Management, Edge and
Compute Clusters
• This allows NSX Controllers to be deployed into the Management Cluster
• Reduces vCenter Server licensing requirements
• More common in POCs or small environments

[Diagram: the Management Cluster hosts vCenter Server A, NSX Manager, vCAC and the NSX Controller Cluster; the Edge Cluster hosts the NSX Edges & NSX DLR Control VMs; Compute Clusters A through Z host the Web/App workload VMs.]
vSphere Cluster Design for NSX
 Option 2
• A common VMware services best practice to have the Management Cluster managed
by a dedicated vCenter Server
• In this case NSX Manager would be attached to the vCenter Server managing the Edge
and Compute Clusters
• NSX Controllers must be deployed into the same vCenter Server NSX Manager is
attached to, therefore the Controllers are also deployed into the Edge Cluster

[Diagram: the Management Cluster hosts vCenter Server A, vCenter Server B, vCAC and NSX Manager; the Edge Cluster hosts the NSX Controller Cluster plus the NSX Edges & DLR Control VMs; Compute Clusters A through Z host the workload VMs. NSX Manager is attached to vCenter Server B, which manages the Edge and Compute Clusters.]
Agenda – Part 1

1 vSphere Distributed Switch (Whiteboard)

2 NSX for vSphere Overview

3 Physical Network Design Considerations

4 vSphere Design Considerations for NSX

5 NSX Design Considerations (Multiple Sections)
NSX Manager and Controller
Design Considerations
NSX Manager

 NSX Manager is deployed as a virtual appliance


• 4 vCPU, 12 GB of RAM per node
• Consider reserving memory for VC to ensure good Web Client performance
 Resiliency of NSX Manager provided by vSphere HA
 Catastrophic failure of NSX Manager is rare, however periodic backup is recommended to
restore to the last known configuration
• If NSX Manager is unavailable, existing data plane connectivity is not impacted

[Diagram: NSX Manager is administered through the vSphere plugin or a 3rd-party management console.]
NSX Controllers
 Controller nodes are also deployed as virtual appliances
• 4 vCPU, 4 GB of RAM per controller node
• CPU reservation of 2048 MHz; no memory reservation required
• Modifying these settings is not supported
 Each controller cluster node runs the following roles: API provider, persistence server, logical manager, switch manager, directory server
 Can be deployed in the Management or Edge clusters
 A cluster size of 3 controller nodes is the only supported configuration
 Controller majority is required for a functional controller cluster
• The existing data plane is maintained even under complete controller cluster failure
 By default, DRS and anti-affinity rules are not enforced for controller deployment
• The recommendation is to manually enable DRS and anti-affinity rules
• A minimum of 3 hosts is required to enforce an anti-affinity rule keeping the Controller VMs on separate hosts
NSX Controllers
 NSX Controllers must be deployed into same vCenter Server that NSX Manager is
attached to
 Controller password is defined during deployment of the first node and is consistent
across all nodes
 Controllers require connectivity to NSX Manager and vmk0 (Management VMkernel
interface) on all ESXi hosts participating in NSX Logical Networks
 NSX Control Plane Protocol operates on TCP port 1234 – connections are initiated
from ESXi hosts to Controllers
 Internal API on TCP port 443 – NSX Manager is the only consumer
 Controller interaction is via CLI, while configuration operations are also available
through NSX for vSphere API
NSX Control Plane Security

• NSX Control Plane communication occurs over the management


network
• The Control Plane is protected by:
• Certificate based authentication
• SSL

• NSX Manager generates self-signed certificates for each of the ESXi


Hosts and Controllers
• These certificates are pushed to the Controller and ESXi hosts over
secure channels
• Mutual authentication occurs by verifying these certificates
NSX Control Plane Security
[Diagram: (1) NSX Manager generates certificates and stores them in its database; (2) it deploys the Controller Cluster via OVF; (3) certificates are pushed to the ESXi hosts over the message bus and (4) to the Controllers over the REST API; (5) SSL then protects the channels between NSX Manager, the Controller Cluster, and the user-world agents (UW Agent)/VTEPs in each vSphere cluster.]
NSX Management Plane Security

• NSX Management Plane communication also occurs over the


management network
• The following secure protocols are used:
– REST API (HTTPS)
– VC APIs (HTTPS)
– Message bus (AMQP)
– Fallback to VIX (SSL)
Designing VXLAN Logical
Switching and vDS
Design Considerations – VDS and Transport Zone
[Diagram: the Management Cluster hosts vCenter Server, NSX Manager, the Controller Cluster and NSX Edges. A single VXLAN transport zone spans three clusters: Compute Cluster 1 and Compute Cluster N share the Compute VDS (VTEPs 192.168.230.100/.101 and 192.168.240.100/.101), while the Edge Cluster uses the Edge VDS (VTEPs 192.168.220.100/.101).]
VDS Uplink Connectivity Options in NSX
 NSX supports multiple teaming policies for VXLAN traffic
 NSX for vSphere also supports multiple VTEPs per ESXi host (to load balance VXLAN traffic across
available uplinks)

Teaming and Failover Mode NSX Support Multi-VTEP Support

Route based on Originating Port ✓ ✓

Route based on Source MAC hash ✓ ✓

LACP ✓ ×
Route based on IP Hash (Static Ether Channel) ✓ ×
Explicit Failover Order ✓ ×
Route based on Physical NIC Load (LBT) × ×

Uplink Connectivity Recommendation for VXLAN Traffic
 The teaming and failover mode recommendation for VXLAN traffic depends on:
• VXLAN bandwidth requirements per ESXi host
• The NSX administrator’s familiarity with networking configuration

Explicit Failover
- Simplicity of configuration and troubleshooting
- A single uplink can handle the VXLAN traffic requirements (current-generation blade servers can do 20+ Gbps bi-directional traffic, e.g. UCS B200 M3)
- Separates all infrastructure traffic onto one uplink and all VXLAN traffic onto the other

LACP
- Standards based, multiple active uplinks for VXLAN traffic
- More advanced configuration compared to Explicit Failover
- Dependency on MLAG/vPC support on the physical switches (for ToR redundancy)

Load Balance – SRC ID
- Use where LACP isn’t available or bandwidth requirements for VXLAN traffic exceed a single uplink
- Recommended for the Edge cluster to avoid the complexity/support implications of routing over LACP

 Route based on Src-ID with Multi-VTEP works well, but it is a more advanced configuration
Network Adapter Offloads

VXLAN TCP Segmentation Offload (VXLAN TSO)
- The operating system sends large TCP packets (VXLAN encapsulated) to the NIC
- The NIC segments the packets per the physical MTU

Receive Side Scaling (RSS)
- The NIC distributes packets among queues
- A unique receive thread per queue drives multiple CPUs

These are important features for NSX performance.
VMware internal slide – do not share

NIC Drivers Support for VXLAN TSO and RSS

• Intel (ixgbe) – 82599, X540, I350: vSphere 5.5 inbox driver 3.7.13.7.14iov-NAPI – VXLAN TSO yes, RSS yes; async driver 3.21.4 – VXLAN TSO yes, RSS yes
• Broadcom (bnx2x) – 57810, 57711: inbox driver 1.72.56.v55.2 – VXLAN TSO no, RSS no; async driver 1.78.58.v55.3 – VXLAN TSO yes, RSS yes
• Mellanox (mlx4_en) – Connect X-2, X3, X3 Pro: inbox driver 1.9.7.0 – VXLAN TSO no, RSS yes; Mellanox is planning an async release to support VXLAN TSO for Connect X3-Pro
• Cisco VIC (enic) – VIC 12xx: inbox driver 2.1.2.50 – VXLAN TSO no, RSS no; async driver 2.1.2.59 – VXLAN TSO no, RSS no
• Emulex (elxnet) – BE2, BE3, Skyhawk: inbox driver – VXLAN TSO no (BE2/BE3) / yes (Skyhawk), RSS no; Emulex is planning an async release to support RSS
VXLAN Design Recommendations
 Unicast Mode is appropriate for small deployments, or L3 Fabric
networks where the number of hosts in a segment is limited
 Hybrid Mode is generally recommended for Production deployments
and particularly for L2 physical network topologies
 Hybrid mode also helps when there is multicast traffic sourced from VMs
 Validate connectivity and MTU on transport network before moving on
to L3 and above
 Not all network adapters are created equal for VXLAN

 Don’t overlap Segment IDs across NSX Domains
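The MTU-validation step above can be sized precisely. A sketch of the arithmetic behind the usual vmkping check (the `vmk3` interface and target VTEP address are placeholders; `++netstack=vxlan` selects the VXLAN TCP/IP stack and `-d` sets the don't-fragment bit):

```python
# To prove a 1600-byte transport MTU end to end, an unfragmentable ping
# must carry 1600 - 20 (IP header) - 8 (ICMP header) bytes of payload.
TRANSPORT_MTU = 1600
IP_HEADER, ICMP_HEADER = 20, 8

payload = TRANSPORT_MTU - IP_HEADER - ICMP_HEADER
# vmk interface and target VTEP IP below are illustrative placeholders.
cmd = "vmkping ++netstack=vxlan -d -s %d -I vmk3 192.168.250.51" % payload

print(payload)  # 1572
print(cmd)
```

If this ping fails while a default-size ping succeeds, the transport path is dropping or fragmenting jumbo frames somewhere between the VTEPs.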


L2 Bridging – VXLAN to VLAN
Overlay to VLAN Gateway Functionality
• The overlay-to-VLAN gateway allows communication between the virtual and physical worlds
[Diagram: VXLAN tunnels on the NSX virtual network terminate at a VXLAN-to-VLAN gateway, which forwards the L2 payload onto the VLAN-backed physical network toward the physical workload.]
Use Cases: Migration
• L2 as well as L3
• Virtual to virtual, physical to virtual
• Temporary, bandwidth not critical

[Diagram: physical-to-virtual migration moves a physical workload to a VM on VXLAN; virtual-to-virtual migration moves a virtualized workload from a VLAN-backed network onto VXLAN.]
Use Cases: Integration of non-Virtualized Workloads
• Typically necessary for integrating a non-virtualized appliance
• A gateway takes care of the on ramp/off ramp

[Diagram: a VM on VXLAN reaches the physical services/workload on the VLAN through the gateway.]
Software Layer 2 Gateway Form Factor
• Native capability of NSX
• High-performance VXLAN-to-VLAN gateway in the hypervisor kernel
• Scale up: rides the x86 performance curve; encapsulation & encryption offloads
• Scale out as you grow: a single gateway can handle all P/V traffic, then additional gateways can be introduced (e.g. one per VLAN 10/20/30)
• Flexibility & operations: rich set of stateful services, multi-tier logical routing, advanced monitoring
Hardware Layer 2 Gateway Form Factor
• Some partner switches integrate with NSX and provide VXLAN-to-VLAN gateway in hardware
• Main benefits of this form factor: bandwidth, scale and low latency
• Also allows extending VXLAN to areas that cannot host a software gateway
[Diagram: with a software gateway, the VLAN is L2-extended from the database racks toward the virtualized compute racks; with a hardware gateway, VXLAN runs L3 end-to-end between the virtualized compute racks and the database racks.]
L2 Connectivity of Physical Workloads
Physical workloads in the same subnet (L2)
 The bridging function is performed in the kernel of the ESXi host (an NSX bridging instance between VXLAN and VLAN)
 10+ Gbps performance
 1:1 mapping between VXLAN and VLAN
 Primary use cases:
– Migrate workloads without changing IP addressing (P2V or V2V)
– Extend logical networks to physical devices
– Allow logical networks to leverage a physical gateway
– Access existing physical network and security resources
Logical to Physical – NSX L2 Bridging
[Diagram: the active DLR Control VM in the Edge Cluster (standby in the Compute Cluster) anchors a bridge between VXLAN 5001 and VLAN 100, connecting VMs to a physical workload and a physical gateway.]
 Migrate workloads (P2V or V2V)
 Extend logical networks to physical
 Leverage network/security services on VLAN-backed networks
NSX L2 Bridging Design Considerations
Usage of ESXi dvUplinks

 Bridged traffic enters and leaves the host via the dvUplink that is used for VXLAN traffic
– The VDS teaming/failover policy configured for the VLAN is not used for bridged traffic
 Need to ensure the bridged VLAN (e.g., VLAN 10) is carried on the uplink used for VXLAN traffic
– The physical switch port must also allow traffic from/to that VLAN
 Can achieve more than 10G for bridged traffic by bundling together two 10G physical interfaces

(Diagram: VXLAN 5000 bridged to VLAN 10 — bridged traffic, VXLAN traffic, and other traffic types sharing the host uplinks)
VXLAN to VLAN SW L2 Bridging – Considerations
 Multiple Bridge Instances vs. separate Logical Routers
– Bridge instances are limited to the throughput of a single ESXi host
– Bridged traffic enters and leaves the host via the dvUplink that is used for VXLAN traffic – the VDS teaming/failover policy is not used
 Interoperability
– The VLAN dvPortgroup and VXLAN logical switches must be available on the same VDS
– Distributed Logical Routing cannot be used on a logical switch that is bridged
– Bridging a VLAN ID of 0 is not supported
 Scalability
– L2 bridging provides line-rate throughput
– Latency and CPU usage are comparable with standard VXLAN
 Loop prevention
– Only one bridge is active for a given VXLAN-VLAN pair
– Packets received via a different uplink are detected and filtered by matching the MAC address
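The loop-prevention idea above — learn which side a source MAC lives on and filter frames that reappear from the other side — can be sketched conceptually. This mimics the behavior for illustration only; it is not NSX code:

```python
# Illustrative sketch of MAC-based loop filtering for a VXLAN-VLAN
# bridge: remember the side a source MAC was first learned on, and
# drop frames whose source MAC later arrives from the other side.

class BridgeLoopFilter:
    def __init__(self):
        self.mac_side = {}  # source MAC -> side it was first learned on

    def accept(self, src_mac, side):
        """side: 'vxlan' or 'vlan'. Returns False for a looped frame."""
        learned = self.mac_side.get(src_mac)
        if learned is None:
            self.mac_side[src_mac] = side   # first sighting: learn it
            return True
        return learned == side              # other side = loop, filter it

f = BridgeLoopFilter()
assert f.accept("00:50:56:aa:bb:cc", "vlan")       # learned on VLAN side
assert not f.accept("00:50:56:aa:bb:cc", "vxlan")  # looped back: filtered
```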
NSX L2 Bridging Design Considerations
Routing + Bridging Use Case – Not Supported

(Diagram: DLR Instance 1 connecting a Web VM on VXLAN 5001 and an App VM on VXLAN 5002; Bridge Instance 1 bridging VXLAN 5002 to VLAN 10 for a physical workload. VXLAN 5002 and VLAN 10 form the same Layer 2 domain — distributed routing and bridging on the same logical switch is not supported.)
NSX L2 Bridging Gateway Design Considerations
Routing + Bridging Use Case – Supported

(Diagram: an NSX Edge — instead of a DLR — connecting a Web VM on VXLAN 5001 and an App VM on VXLAN 5002; Bridge Instance 1 bridging VXLAN 5002 to VLAN 10 for a physical workload. VXLAN 5002 and VLAN 10 form the same Layer 2 domain; with centralized routing on the Edge, this combination is supported.)
NSX Layer 2 Gateway Design Considerations
Single Instance per VXLAN/VLAN Pair

 The current implementation only allows a single active bridging instance per Logical Switch
– Bandwidth is limited by the single bridging instance
– The bridged VLAN must be extended between racks to reach physical devices spread across racks
 Scale-out model: multiple bridging instances active for separate VXLAN/VLAN pairs
– May allow reducing the span of a VLAN to a single rack if the physical servers in that VLAN are contained in that rack

(Diagram: Bridging Instance 1 (VXLAN 5000 to VLAN 10) and Bridging Instance 2 (VXLAN 5001 to VLAN 20) as software VTEPs — contrasting a VLAN-extended network with an L3 (VXLAN)-only fabric where each VLAN of physical servers stays within its rack)
VXLAN to VLAN L2 Bridging – Summary
NSX-v SW L2 Bridging Instance vs. HW VTEPs

 Always lead with NSX-v native software bridging: its performance is sufficient for nearly all use cases and it is hardware agnostic
 Some customers believe they need a HW L2 VTEP when they don't, due to positioning by network vendors. Find out what their use cases are first and whether L2 bridging is actually a requirement
 The following are potential use cases for a HW L2 VTEP:
– Low-latency traffic
– Very large volumes of physical servers
– A high amount of guest-initiated storage traffic

 Data-plane-only, multicast-based options are available today
– Validated on Nexus 9000 and Arista 7150; expected to work on all capable hardware
 OVSDB support with NSX-v is planned for 2015
NSX-v and HW VTEPs Integration
Deployment Considerations Pre NSX 6.2.2

 Mandates deploying multicast in the network infrastructure to handle the delivery of VXLAN-encapsulated multi-destination traffic
– Broadcast, Unknown Unicast, Multicast (BUM) traffic
– Multicast mode is only needed for the VXLAN segments that are bridged to VLANs
 NSX-v has no direct control over the hardware VTEP devices
– No control-plane communication with the Controller, nor orchestration/automation capabilities (manual configuration required for HW VTEPs)
– Note: full control-plane/data-plane integration is only available with NSX-MH
 End-to-end loop exposure
– No capability on HW VTEPs to detect an L2 loop caused by a physical L2 backdoor connection
 Unsupported coexistence of HW and SW VTEPs
– Can only connect bare-metal servers (or VLAN-attached VMs) to a pair of HW VTEP ToRs
Let's Compare: NSX Virtualization Model vs. Hardware Vendor Model

(Diagram: in the NSX model, VMs run on hypervisors that provide the virtualization in software; in the hardware vendor model, VLAN-backed VMs on vSphere rely on the HW switch vendor to provide the virtualization solution)
VXLAN Hardware Encapsulation Benefits

(Diagram: with NSX virtualization, the VM's L2 payload is VXLAN-encapsulated by the hypervisor vSwitch; in the HW vendor model, the vSwitch sends a VLAN-tagged L2 payload to a HW gateway, which encapsulates it in VXLAN — same performance either way)

No performance* benefit for the HW Vendor Model:
• vSwitch performance is independent of the output encapsulation
• HW switches and HW gateways have similar performance too

* Here "performance" is defined as packets per second and throughput
HW Gateways Make Sense for non-Virtualized Payloads

(Diagram: a HW gateway bridging an L2 payload into VXLAN offers raw performance; a SW gateway doing the same is feature-rich with no hardware requirement)

This is the use case we're advocating for HW Gateways with NSX
NSX Bridging Instance vs. Hardware Gateway

NSX bridging instance:
• A single bridging instance per Logical Switch
• Bandwidth limited by the single bridging instance
• The L2 network must be extended between racks to reach all the physical devices

Hardware Gateway:
• Several Hardware Gateways can be deployed at several locations simultaneously
• VLANs can be kept local to a rack and don't need to be extended

(Diagram: VLAN extended between racks vs. L3 (VXLAN)-only between racks, with non-virtualized devices remaining part of the same L2 segment)
Logical Routing
Distributed and Centralized
NSX Logical Routing Components

(Diagram: the Distributed Logical Router is implemented as hypervisor kernel modules (VIBs) with LIFs on each vSphere host, supported by a DLR Control VM; the NSX Edge is a VM providing centralized routing)

• Distributed Logical Routing: optimized for East-West traffic patterns
• Centralized Routing (NSX Edge): optimized for North-South routing
NSX Logical Routing: Component Interaction

1. A new Distributed Logical Router instance is created on NSX Manager, with dynamic routing configured
2. The Controller pushes the new logical router configuration, including LIFs, to the ESXi hosts
3. OSPF/BGP peering is established between the NSX Edge and the logical router Control VM
4. Routes learned from the NSX Edge are pushed to the Controller for distribution
5. The Controller sends the route updates to all ESXi hosts
6. The routing kernel modules on the hosts handle the data-path traffic

(Diagram: NSX Edge (192.168.10.1) acting as next-hop router, peering with the DLR Control VM (192.168.10.3); Controller Cluster (192.168.10.2) distributing routes to hosts serving 172.16.10.0/24, 172.16.20.0/24 and 172.16.30.0/24)
Logical Routing
Logical Topologies
DLR – Design Considerations (Multiple VDS)

(Diagram: Compute Clusters A and B on a Compute VDS; Management/Edge Cluster — vCenter Server, NSX Manager, Controller Cluster, NSX Edges — on a separate Edge VDS; a single VXLAN Transport Zone spans all three clusters, with host VTEPs in 192.168.230.0/24, 192.168.240.0/24 and 192.168.220.0/24)

• With the DLR spanning multiple VDS, only VXLAN LIFs are supported
DLR – Design Considerations (Single VDS)

(Diagram: the same three clusters sharing a single Compute/Edge VDS; the VXLAN Transport Zone spans all three clusters)

• With a single VDS, both VXLAN and VLAN LIFs are supported
Distributed Logical Routing - Key Takeaways

 VLAN LIFs introduce important constraints on the vSphere & network design:
• Only one VDS supported
• The same L2 VLAN must span all hosts in the VDS (potentially up to 500 hosts)
• In the recommended design for network virtualization, VLAN span is contained
• Limited testing
• Supporting VLAN LIFs with new product features is a low priority
• Failover of the Designated Instance is based on Controller keep-alives
• Slow – up to 45 seconds
 VXLAN LIFs don't have any of these constraints
 Bottom line: don't use VLAN LIFs
Single DLR Routing Topology

 Typical enterprise topology optimizing East-West communication
 Single DLR instance with multiple LIFs (992 is the max value supported in the 6.1 release; 8 are reserved as uplinks)
 Recommended use of a VXLAN (not VLAN) segment for the transit link between DLR and NSX Edge

(Diagram: physical router on VLAN 20 peering with the NSX Edge uplink; VXLAN 5020 as the transit link between the NSX Edge and distributed routing; Web, App and DB logical switches attached to the DLR)
Multi Tenant Routing Topology

 A single NSX Edge can provide centralized routing for multiple connected tenants
– Up to 9 tenants supported on a single NSX Edge for pre-6.1 NSX releases
 East-West communication is optimized per tenant
 Inter-tenant communication goes through the Edge, which can apply centralized security policies to provide isolation
 No overlapping IP addresses supported between tenants connected to a shared Edge

(Diagram: one NSX Edge with transit links VXLAN 5020 through VXLAN 5029 down to DLR Instances 1 through 9 — one per tenant — each fronting its own Web, App and DB logical switches)
Multi Tenant Routing Topology (Post-6.1 NSX Release)

 From NSX software release 6.1, a new type of interface is supported on the NSX Edge (in addition to Internal and Uplink): the "Trunk" interface
 This allows creating many sub-interfaces on a single NSX Edge vNic and establishing peering with a separate DLR instance on each sub-interface
– Scales up the number of tenants supported by a single ESG (assuming no overlapping IP addresses across tenants)
 An aggregate of 200 sub-interfaces per NSX Edge is supported in 6.1
 Only static routing & BGP are supported on sub-interfaces in 6.1
– OSPF support will be introduced in a following 6.1.x maintenance release
– Scale numbers for dynamic routing (max peers/adjacencies) are under review

(Diagram: a single NSX Edge vNIC configured as a VXLAN trunk interface, carrying routing peerings to the DLRs of Tenant 1, Tenant 2, … Tenant n)
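The trunk-interface scale limit above lends itself to a quick capacity check. A minimal sketch, assuming one sub-interface per tenant DLR peering (an assumption for illustration, not a design rule):

```python
# Illustrative capacity check for the trunk model: NSX 6.1 supports an
# aggregate of 200 sub-interfaces per NSX Edge.

MAX_SUBIFS_PER_EDGE = 200

def edges_needed(tenant_count, subifs_per_tenant=1):
    """Smallest number of ESGs needed for the given tenant count."""
    total = tenant_count * subifs_per_tenant
    return -(-total // MAX_SUBIFS_PER_EDGE)  # ceiling division

assert edges_needed(150) == 1   # fits on a single Edge
assert edges_needed(450) == 3   # spills over onto a third Edge
```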
High Scale Multi Tenant Topology

 Used to scale up the number of tenants (the only option before the VXLAN trunk was introduced)
 Supports overlapping IP addresses between tenants connected to different first-tier NSX Edges
– NAT can be configured on the first-tier NSX Edge

(Diagram: an X-Large NSX Edge as a route-aggregation layer towards the external network, with a VXLAN 5100 transit segment down to per-tenant NSX Edge Services Gateways; each tenant ESG uses VXLAN uplinks — or a VXLAN trunk, supported from NSX release 6.1 onward — to its Web, App and DB logical switches)
Multi Tenant Topology - NSX (Today)

 The NSX Edge is currently not VRF aware
 A single routing table does not allow keeping tenants logically isolated
 Each dedicated tenant Edge can instead connect to a separate VRF in the upstream physical router (PE or multi-VRF CE)
 This is the current deployment option to integrate with an MPLS network

(Diagram: Tenant 1 and Tenant 2 ESGs uplink over VLAN 10 and VLAN 20 to per-tenant VRFs — T1 and T2 — on the physical router, which connects into the MPLS network; each tenant ESG uses VXLAN uplinks or a VXLAN trunk — supported from NSX release 6.1 onward — to its Web, App and DB logical switches)
Hierarchical Topology – Option 1 - Not Supported

(Diagram: DLR Instance 3 peering towards the external network, with transit segments VXLAN 5020 and VXLAN 5030 down to DLR Instance 1 and DLR Instance 2, each fronting Web, App and DB logical switches — stacking DLR instances this way is not supported)
Hierarchical Topology – Option 2 - Not Supported

(Diagram: an NSX Edge Services Gateway peering towards the external network, with a single transit segment VXLAN 5020 shared by DLR Instance 1 and DLR Instance 2, each fronting Web, App and DB logical switches — this topology is also not supported)
Logical Routing
High Availability Models
Logical Routing High Availability (HA)

Two HA models:
1. Active/Standby HA Model
2. ECMP Model – introduced with the NSX 6.1 release

Active/Standby HA Model
(Diagram: an Active and a Standby NSX Edge towards the external network, with the DLR Control VM likewise deployed as an Active/Standby pair; distributed routing below serves the Web1, App1 and DB1 logical switches)
Active/Standby HA Model

 All North-South traffic is handled by the Active NSX Edge
– The Active NSX Edge (E1) is the only one establishing adjacencies to the DLR and the physical router

(Diagram: the physical router's table — R1> show ip route — lists 172.16.1.0/24, 172.16.2.0/24 and 172.16.3.0/24 via 172.16.1.2, i.e. via E1; in the ESXi host kernel, net-vdr -l --route Default+Edge-1 shows the DLR default route 0.0.0.0 via 192.168.1.2, also E1, on the 192.168.1.0/24 transit)
Active/Standby HA Model (Continued)

 On failure of the Active NSX Edge E1:
– The Standby NSX Edge detects the failure at the expiration of the "Declare Dead Time" timer – 15 seconds by default; it can be tuned, but values below 9 seconds are not recommended in production
– At that point traffic forwarding restarts, leveraging the FIB entries (kept in sync with the failed Edge) while the new Active Edge restarts its network services
– For this to happen, it is required to set longer routing protocol timers (40, 120 sec) so that the physical router and the DLR Control VM keep the adjacencies up and maintain routing entries in their forwarding tables
 Other HA recommendations:
– vSphere HA should be enabled for the NSX Edge VMs
 Stateful services are supported on the NSX Edge pair
– FW, load balancing, NAT
Active/Standby HA Model
Failure of the Control VM

 Failure of the Active Control VM triggers a failover to the Standby Control VM

(Diagram: Active Edge E1 keeps its routing adjacency to the physical router; in the ESXi host kernel, net-vdr -l --route Default+Edge-1 still shows the default route 0.0.0.0 via 192.168.1.2 while the DLR Control VM fails over)
Active/Standby HA Model
Failure of the Control VM (Continued)

 Heartbeat Dead Timer tuning on the Control VM is not required to improve convergence in this failure scenario
 South-to-North flows keep flowing based on the forwarding information programmed in the kernel of the ESXi hosts
– This is true despite the fact that the routing protocol is not yet running on the newly activated DLR Control VM
 North-to-South flows keep flowing based on the information programmed in the NSX Edge forwarding table
– The (30, 120 sec) protocol timer setting ensures that the NSX Edge keeps the routing adjacency to the DLR active, preventing the info in the forwarding table from being flushed
 Within the 120-second period, the newly activated Control VM restarts its routing services and re-establishes the routing adjacency with the NSX Edge (leveraging Graceful Restart capabilities)
What is ECMP (Introduced in NSX 6.1)?

 ECMP support on the DLR and on the NSX Edge
– Both can install up to 8 equal-cost routes for a given destination in their forwarding tables
 Up to 8 NSX Edges can be simultaneously deployed for a given tenant
– Increases the available bandwidth for North-South communication (up to 80 Gbps)
– Reduces the traffic outage in an ESG failure scenario (only 1/Nth of the flows are affected)
 Load-balancing algorithm on the NSX Edge:
– Based on the Linux kernel's flow-based random round-robin algorithm for next-hop selection – a flow is a pair of source IP and destination IP
 Load-balancing algorithm on the DLR:
– A hash of the source IP and destination IP defines the chosen next hop

(Diagram: Edges E1 through E8 all active between the physical routers and the DLR)
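The DLR's hash-based next-hop selection described above can be sketched conceptually. The real kernel hash differs; this standard-library sketch only illustrates that a given (source IP, destination IP) flow sticks to one Edge, and that an Edge failure re-hashes only the affected flows:

```python
# Conceptual sketch of ECMP next-hop selection on the DLR: a hash of
# (source IP, destination IP) picks one of up to 8 equal-cost next hops.
# Not the actual NSX algorithm -- an illustration of the mechanism.
import ipaddress

def pick_next_hop(src_ip, dst_ip, next_hops):
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return next_hops[key % len(next_hops)]

edges = ["192.168.1.10", "192.168.1.11", "192.168.1.12", "192.168.1.13"]

# A given flow always maps to the same NSX Edge...
hop = pick_next_hop("172.16.1.5", "10.0.0.9", edges)
assert pick_next_hop("172.16.1.5", "10.0.0.9", edges) == hop

# ...and when that Edge fails, the flow is re-hashed onto a survivor
surviving = [e for e in edges if e != hop]
assert pick_next_hop("172.16.1.5", "10.0.0.9", surviving) in surviving
```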
Enabling ECMP on DLR and NSX Edge

 ECMP is disabled by default on the DLR and on the NSX Edge
– Use the "Enable" button in the UI, or API calls, to enable ECMP
 Active/Active ECMP currently implies stateless behavior
– No support for stateful FW, load balancing or NAT across ESG nodes
– VDS ACLs can be used for traffic filtering (only if really required)
 From release 6.1.2, enabling ECMP does NOT disable FW services on the Edge
– The user must explicitly disable the FW when deploying ESG nodes in ECMP mode (to avoid traffic drops because of asymmetric routing)
Why Is the FW Not Disabled from 6.1.2?

The main reason is to support use cases where ECMP-enabled ESG nodes coexist with ESG nodes deployed in Active/Standby mode

(Diagram: off the same DLR, a tenant requiring stateful services sits behind Active/Standby Edges, while a tenant not requiring stateful services sits behind a set of ECMP Edges towards the physical routers and core)
ECMP HA Model (Up to 8 NSX Edges)

 North-South traffic is handled by all Active NSX Edges
 Active routing adjacencies are established with the DLR Control VM and the physical router
 Traffic is hashed across the equal-cost paths based on source/destination IP address values

(Diagram: Edges E1 through E8 all maintaining routing adjacencies between the physical router and the DLR)
ECMP HA Model (Up to 8 NSX Edges) – Edge Failure

 On failure of an NSX Edge, the corresponding flows are re-hashed through the remaining active units
– The DLR and the physical router time out the routing adjacencies with the failed Edge and remove the routing table entries pointing to that next-hop IP address
– Recommended to aggressively tune the keep-alive/hold-down routing timers (1/3 seconds) to speed up traffic recovery
 Other HA recommendations:
– No need to deploy a Standby for each Active NSX Edge
– vSphere HA should remain enabled
Logical Routing
ECMP Deployment Considerations
ECMP - DLR Active Control VM Failure

 North-to-South traffic initially flows based on the dynamic routing information provided by the Active DLR Control VM to the ESGs
 South-to-North traffic flows based on the routing information (usually just a default route) programmed in the kernel of the ESXi hosts

(Diagram: each ESG forwarding table — E1> show ip route through E8> show ip route — lists 172.16.1.0/24, 172.16.2.0/24 and 172.16.3.0/24 via 192.168.1.1, the DLR; the ESXi host kernel holds equal-cost default routes 0.0.0.0/0 via 192.168.1.2 … 192.168.1.9, the ESGs)
ECMP - DLR Active Control VM Failure (Continued)

 After the failure of the Active Control VM, all the adjacencies with the NSX Edges are brought down (until the Standby takes over and restarts the routing services)
– This is because of the aggressive timer settings required to speed up convergence
– On the ESGs, Logical Network routes dynamically learned from the DLR are removed from the forwarding tables, so north-to-south traffic flows stop
– South-to-north traffic keeps flowing based on the forwarding table information available in the ESXi hypervisors at the time of failure
ECMP - DLR Active Control VM Failure
Use of Static Routes to Remove the Traffic Outage

 The North-to-South traffic outage can be avoided by leveraging static routes on the ESGs to reach the Logical Switch prefixes
– If possible, the recommendation is to configure a single static route summarizing all the logical address space (e.g., S 172.16.0.0/16 via 192.168.1.1)
– This static route may also be used to send the physical router a summary for the LS prefixes
 South-to-North traffic is still forwarded based on the information in the forwarding tables of the ESXi hosts (in the kernel)
 With this configuration, failure of the DLR Active Control VM results in a zero-packet outage
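A quick standard-library check that the suggested summary static route actually covers the Logical Switch prefixes used in this topology:

```python
# Verify that the 172.16.0.0/16 summary static route on the ESGs
# covers every Logical Switch prefix in the example topology.
import ipaddress

summary = ipaddress.ip_network("172.16.0.0/16")
ls_prefixes = ["172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24"]

assert all(ipaddress.ip_network(p).subnet_of(summary) for p in ls_prefixes)
```

The same check is a handy pre-change test whenever a new Logical Switch is carved out, to confirm it still falls inside the summarized logical address space.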
ECMP - Simultaneous Failure of NSX Edge and Control VM

 A specific failure scenario is one where the DLR Active Control VM fails at the same time as an ESG
– This happens if both VMs are co-located on the same ESXi host
 The forwarding tables in the ESXi hosts are "frozen" with the information that was available before the failure
– Equal-cost paths remain active across all the ESGs, including the one that has failed
 All the South-to-North traffic flows originally sent through the failed ESG are black-holed until the newly activated Control VM is able to restart the routing services
– Could lead to a worst-case outage of 120 seconds
 The recommendation is to use anti-affinity rules to prevent deploying the DLR Control VM on the same ESXi host as an Active ESG
– DLR Control VMs could be deployed on dedicated hosts that are part of the Edge Cluster or (recommended option) on the Compute Clusters
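The anti-affinity recommendation above can be expressed as a simple placement check. This is a sketch with hypothetical VM and host names, not vCenter/DRS code:

```python
# Sketch of the anti-affinity rule: the DLR Control VM must never share
# an ESXi host with an ESG, or a single host failure takes out both.
# The placement data and naming convention here are hypothetical.

def violates_anti_affinity(placement):
    """placement: dict of VM name -> ESXi host name."""
    control_hosts = {h for vm, h in placement.items()
                     if vm.startswith("dlr-control")}
    esg_hosts = {h for vm, h in placement.items() if vm.startswith("esg")}
    return bool(control_hosts & esg_hosts)   # any shared host is a violation

good = {"dlr-control-0": "esxi-21", "esg-1": "esxi-05", "esg-2": "esxi-06"}
bad  = {"dlr-control-0": "esxi-05", "esg-1": "esxi-05", "esg-2": "esxi-06"}
assert not violates_anti_affinity(good)
assert violates_anti_affinity(bad)
```

In practice this constraint is enforced declaratively with a DRS VM-VM anti-affinity rule rather than checked by hand.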
VLAN Traffic and ESXi Uplinks Design (Option 1)
North-South Communication

 Design principle: the number of ESG logical uplinks matches the number of ESXi physical uplinks
– ESXi uplink = VLAN ID = routing adjacency = active path
 Assuming an ESXi host is equipped with 2x10GE uplinks, this implies two logical uplinks on each ESG
 Each ESG logical uplink is mapped to a unique VLAN-backed port-group carried only on a specific ESXi host uplink (i.e., VLAN 10 on uplink 1, VLAN 20 on uplink 2)
– Both physical uplinks can be utilized by a single NSX Edge
 If an ESXi host physical uplink fails, the adjacency goes down and traffic is recovered via the second uplink

(Diagram — physical view: edge racks behind ToR1–ToR4 with 802.1Q trunks into a routed DC L3 fabric up to core routers R1/R2; logical view: Edges E1–E8 peering with R1 and R2 over external VLAN 10 and VLAN 20, with a transit VXLAN down to the DLR and the Web, App and DB logical switches)
VLAN Traffic and ESXi Uplinks Design (Option 2)
North-South Communication

 As an alternative deployment model, the VLANs used on the ESXi uplinks can remain confined to each edge rack
 Allows positioning the default gateway for those VLANs on the ToR devices
– The span of all VLANs is always limited to the inside of each rack
– No need for L2 extension between the edge racks
 After a failure, an ESG can only be moved within the same rack (by leveraging vSphere HA)

(Diagram — physical view: routed DC fabric with L3 links from ToR routers R1–R4 up to the core; logical view: Edges E1–E8 peering with the per-rack routers R1–R4, with a transit VXLAN down to the DLR and the Web, App and DB logical switches)
Edge HA Models Comparison – BW, Services & Convergence

Active/Standby HA Model:
• Bandwidth: single path (~10 Gbps/tenant)
• Stateful services: supported – NAT, SLB, FW
• Experienced traffic outage: > 20 seconds with stateful services enabled; 10-20 seconds with only routing enabled

ECMP HA Model:
• Bandwidth: up to 8 paths (~80 Gbps/tenant)
• Stateful services: not supported
• Experienced traffic outage: ~3 seconds with (1, 3 sec) timer tuning
NSX Edge Services Gateway
NSX Edge Gateway: Integrated network services – Routing/NAT, Firewall, Load Balancing, L2/L3 VPN, DDI (DHCP/DNS relay)

• Multi-functional & multi-use VM model. Deployment varies based on its use, place in the topology, performance, etc.
• Functional use – N/S routing only, LB only, perimeter FW, etc.
• Form factor – X-Large to Compact (one license)
• Stateful switchover of services (FW/NAT, LB, DHCP & IPsec/SSL)
• Multi-interface routing support – OSPF & BGP
• Can be deployed in high-availability or standalone mode
• Per-tenant edge services – scaling by interface and instance
• Requires design consideration for the following:
• Edge placement for north-south traffic
• Edge services with multi-tenancy
NSX Edge Services Gateway Sizing

Form         vCPU   Memory (MB)   Specific Usage
X-Large      6      8192          Suitable for high-performance L7 LB
Quad-Large   4      2048          Suitable for high-performance routing, FW
Large        2      1024          Suitable for most deployments
Compact      1      512           Small deployments or a single service used

• The Edge Services Gateway can be deployed in four sizes depending on the services used
• Multiple Edge nodes can be deployed at once, e.g. ECMP, LB and Active-Standby for NAT
• When needed, the Edge size can be increased or decreased as required
• X-Large is required for high-performance L7 load balancer configurations only
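The sizing guidance above amounts to a small decision rule. A hedged helper sketching one reading of the table — not an official VMware sizing algorithm:

```python
# Illustrative reading of the ESG sizing guidance: pick the smallest
# form factor suitable for the service profile. Assumption-laden
# sketch, not an official sizing tool.

def pick_edge_size(high_perf_l7_lb=False, high_perf_routing_fw=False,
                   small_single_service=False):
    if high_perf_l7_lb:
        return "X-Large"     # required for high-performance L7 LB only
    if high_perf_routing_fw:
        return "Quad-Large"  # high-performance routing / FW
    if small_single_service:
        return "Compact"     # small deployment or a single service
    return "Large"           # suitable for most deployments

assert pick_edge_size(high_perf_l7_lb=True) == "X-Large"
assert pick_edge_size() == "Large"
```

Since the Edge size can be increased or decreased later, starting with the smaller suitable size and resizing on demand is a reasonable default.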
NSX Edge Design Considerations

 High Availability considerations
– Use Edge HA where stateful services are required
– Minimum of 3 hosts for the Edge Cluster
– Use dynamic routing failover if Edges are only performing N/S routing
 Edge placement (N/S vs. services only)
– N/S Edge Gateways performing L3 are located in the edge cluster
– Edges running services (one-arm LB, DHCP, etc.) typically reside with the application logical switch in the compute cluster
– Except for specific cases (in-line LB, VPN) where Services Edges would be located in the edge cluster
3-Tier App Logical to Physical Mapping

(Diagram: Management Cluster — Hosts 1-2 — running vCenter, NSX Manager, vCAC and the NSX Controller Cluster; Compute Cluster — Hosts 3-5 — running the Web, App and DB VMs behind the Logical Router; Edge Cluster — Hosts 6-7 — running the Logical Router Control VMs and the Edge VMs)
NSX Edge Services Gateway High Availability
▪ Active/Standby model
• Health-check interval for the heartbeat: 1 second
• Failover time ~15 seconds by default (can be tuned down to 6 seconds)
• NSX Manager also performs keep-alives to verify the Edge is alive
▪ Modes of configuration
• Advanced/Manual mode: Internal vNic designated by the user
• Auto configure mode: NSX Manager uses first available internal vNic
▪ Other Redundancy
• Physical redundancy with host monitoring and vSphere HA
• Process restart redundancy with process monitoring

NSX Edge High Availability Failover Behavior

Firewall / NAT: Stateful failover for firewall connections; connection entries are synced to the standby appliance. Failover to standby in 15 seconds by default; can be configured down to 6.

DHCP: When the Standby becomes active, the HA link synchronization preserves the DHCP allocation table state.

Load Balancer: For L7, sticky tables are synced, as is the health of backend pool servers. Performs a back-end status health check before becoming available.

Dynamic Routing: Forwarding table (FIB) entries are synced. Failover to standby in 15 seconds by default; can be configured down to 6.

IPsec VPN: When the Standby becomes active, the tunnels reconnect automatically.

SSL VPN / L2VPN: When the Standby becomes active, the client reconnects automatically.