Network Virtualization in IP Fabric with BGP EVPN
Version 2.0
9035383
February 2018
© 2018, Extreme Networks, Inc. All Rights Reserved.
Extreme Networks and the Extreme Networks logo are trademarks or registered trademarks of Extreme Networks, Inc. in the United States and/or other countries. All
other names are the property of their respective owners. For additional information on Extreme Networks Trademarks please see
www.extremenetworks.com/company/legal/trademarks. Specifications and product availability are subject to change without notice.
Brocade, the B-wing symbol, and MyBrocade are registered trademarks of Brocade Communications Systems, Inc., in the United States and in other countries. Other
brands, product names, or service names mentioned that are trademarks of Brocade Communications Systems, Inc. are listed at http://www.brocade.com/en/legal/brocade-Legal-intellectual-property/brocade-legal-trademarks.html. Other marks may belong to third parties.
Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature,
or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no
responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on
feature and product availability. Export of technical data contained in this document may require an export license from the United States government.
The authors and Brocade Communications Systems, Inc. assume no liability or responsibility to any person or entity with respect to the accuracy of this document
or any loss, cost, liability, or damages arising from the information contained herein or the computer programs that accompany it.
The product described by this document may contain open source software covered by the GNU General Public License or other open source license agreements. To find
out which open source software is included in Brocade products, view the licensing terms applicable to the open source software, and obtain a copy of the programming
source code, please visit http://www.brocade.com/support/oscd.
Contents
Contents ............................................................................................................................................................................................................................... 3
List of Figures........................................................................................................................................................................................................................ 5
Preface .................................................................................................................................................................................................................................. 7
Extreme Validated Designs .......................................................................................................................................................................................... 7
Purpose of This Document ........................................................................................................................................................................................... 7
Target Audience ........................................................................................................................................................................................................... 7
Authors ......................................................................................................................................................................................................................... 8
Document History ........................................................................................................................................................................................................ 8
Introduction ......................................................................................................................................................................................................................... 9
Design Considerations ...................................................................................................................................................................................................... 112
BGP Route scale considerations ............................................................................................................................................................................... 112
BGP TTL Security ...................................................................................................................................................................................................... 112
List of Figures
Figure 1 Leaf-Spine L3 Clos Topology ......................................................................................................................................................................... 14
Figure 2 Optimized 5-Stage L3 Clos Topology............................................................................................................................................................. 15
Figure 3 eBGP for Underlay ........................................................................................................................................................................................ 17
Figure 4 iBGP for Underlay ......................................................................................................................................................................................... 18
Figure 5 VTEPs and L2 Extension with Flood and Learn.............................................................................................................................................. 20
Figure 6 Routing Between VxLAN networks in Flood and Learn topology.................................................................................................................. 21
Figure 7 VTEPs and L2 Extension with BGP EVPN Control-plane ................................................................................................................................ 22
Figure 8 ARP Suppression ........................................................................................................................................................................................... 25
Figure 9 VLAN Scoping at the Leaf/ToR Level ............................................................................................................................................................. 26
Figure 10 VLAN Scoping at the Port Level within a ToR ............................................................................................................................................... 26
Figure 11 Asymmetric IRB ............................................................................................................................................................................................ 27
Figure 12 Symmetric IRB .............................................................................................................................................................................................. 28
Figure 13 Multitenancy ................................................................................................................................................................................................ 29
Figure 14 MCT Pair for Dual-homing and Leaf Redundancy ........................................................................................................................................ 30
Figure 15 EBGP based 3-stage IP Fabric ....................................................................................................................................................................... 31
Figure 16 EBGP based Optimized 5-stage IP Fabric...................................................................................................................................................... 32
Figure 17 Illustration topology for a simple 3-stage fabric .......................................................................................................................................... 53
Figure 18 L2/L3 Extension between Racks ................................................................................................................................................................... 66
Figure 19 VLAN Scoping at the ToR Level..................................................................................................................................................................... 78
Figure 20 Port VLAN scoping within the ToR ............................................................................................................................................................... 87
Figure 21 Layer-2 Handoff with VPLS ........................................................................................................................................................................... 95
Figure 22 Layer-3 Handoff with MPLS/L3VPN ............................................................................................................................................................ 103
Preface
Extreme Validated Designs
Helping customers consider, select, and deploy network solutions for current and planned needs is our mission. Extreme Validated Designs offer a fast track
to success by accelerating that process.
Validated designs are repeatable reference network architectures that have been engineered and tested to address specific use cases and deployment
scenarios. They document systematic steps and best practices that help administrators, architects, and engineers plan, design, and deploy physical and
virtual network technologies. Leveraging these validated network architectures accelerates deployment speed, increases reliability and predictability,
and reduces risk.
Extreme Validated Designs incorporate network and security principles and technologies across the ecosystem of service provider, data center,
campus, and wireless networks. Each Extreme Validated Design provides a standardized network architecture for a specific use case, incorporating
technologies and feature sets across Extreme products and partner offerings.
All Extreme Validated Designs follow best-practice recommendations and allow for customer-specific network architecture variations that deliver
additional benefits. The variations are documented and supported to provide ongoing value, and all Extreme Validated Designs are continuously
maintained to ensure that every design remains supported as new products and software versions are introduced.
By accelerating time-to-value, reducing risk, and offering the freedom to incorporate creative, supported variations, these validated network
architectures provide a tremendous value-add for building and growing a flexible network infrastructure.
Note that not all features of the Extreme IP Fabric, such as automation practices, zero-touch provisioning, and monitoring, are covered in this document; future versions are planned to include these aspects of the Extreme IP Fabric solution. The design practices documented here follow the best-practice recommendations, but variations to the design are supported as well.
Target Audience
This document is written for Extreme systems engineers, partners, and customers who design, implement, and support data center networks. It is intended for experienced data center architects and engineers and assumes a good understanding of data center switching and routing features, as well as of Multiprotocol BGP/MPLS VPN [5], as background for multitenancy in VXLAN EVPN networks.
Authors
• Krish Padmanabhan
Sr Principal Engineer, System and Solution Engineering
• Eldho Jacob
Principal Engineer, System and Solution Engineering
The authors would like to acknowledge the following at Extreme Networks for their technical guidance in developing this validated design:
• Abdul Khader
Director, System and Solution Engineering
• Vivek Baveja
Director, Product Management
The authors would also like to acknowledge the following for their meticulous review of the document.
• Wim van Laarhoven
• Lavanya Venkatesan
Document History
Date: February 2018
Part Number: 9035383
Description: Network Virtualization with BGP EVPN in IP Fabric (SLX platforms)
Introduction
Extreme has expanded its product portfolio with SLX platforms and SLX-OS positioned for network virtualization architectures to meet the growing
customer demand for higher levels of scale, agility, and operational efficiency.
This document describes cloud-optimized network designs using Extreme IP Fabrics for building data-center sites. The configurations and design
practices documented here are fully validated and conform to the Extreme IP Fabric reference architectures. The intention of this Extreme Validated
Design document is to provide reference configurations and document best practices for building cloud-scale data-center networks using Extreme SLX
switches and Extreme IP Fabric architectures. The design covers the following:
• Extreme IP Fabric deployed in 3-stage and optimized 5-stage folded Clos topologies
• Network virtualization and multitenancy using BGP EVPN in these 3-stage and 5-stage fabrics
Technology Overview
Extreme IP Fabric provides a Layer 3 Clos deployment architecture for data center sites; in an Extreme IP Fabric, all links in the Clos topology are Layer 3 links. The solution includes the networking architecture; the protocols used to build the network; turnkey automation features used to provision, manage, and monitor the networking infrastructure; and the hardware differentiation of the Extreme SLX and VDX switches. The following sections describe the validated design for data center sites with Extreme IP Fabrics. Because the infrastructure is built on IP, it leverages advantages such as loop-free communication using industry-standard routing protocols, ECMP, very high solution scale, and standards-based interoperability.
Terminology
Term Description
Functional Components of IP Fabric
Leaf-Spine Layer 3 Clos Topology (Two-Tier)
The leaf-spine topology has become the de facto standard for networking topologies when building medium- to large-scale data center
infrastructures. The leaf-spine topology is adapted from Clos telecommunications networks. The Extreme IP Fabric within a PoD resembles a two-tier
or 3-stage folded Clos fabric. The two-tier leaf-spine topology is shown in Figure 1.
The bottom layer of the IP fabric has the leaf devices (top-of-rack switches), and the top layer has spines. The role of the leaf is to provide connectivity
to the endpoints in the data center network. These endpoints include tenant workloads such as compute and storage devices, as well as other
networking devices like routers, switches, load balancers, firewalls, and any other physical or virtual networking endpoints. Because all endpoints
connect only to the leaf, policy enforcement, including security, traffic-path selection, QoS marking, traffic policing, and shaping, is implemented at
the leaf.
More importantly, leafs act as the first-hop gateways, with anycast gateway addresses for the server segments, to facilitate mobility with the VXLAN overlay.
A set of leafs act as border leafs (or edge leafs) that provide connectivity to services such as firewalls, load balancers, and storage, as well as external connectivity to the PoD.
The role of the spine is to provide connectivity between leafs, participating in both the control-plane and data-plane operations for traffic forwarded between them. The spine devices serve two purposes: the BGP control plane (route distribution using the BGP protocol and its extensions) and data-plane IP forwarding based on the outer IP header in the underlay network. Because no network endpoints connect to the spine, tenant VRFs and VXLAN segments are not created on spines, and their routing-table requirements are light: they need only underlay reachability. Note that not all spine devices need to act as BGP route reflectors; selected spines in the spine layer can act as BGP route reflectors in the overlay design. More details are provided in BGP EVPN Control Plane.
• Each compute/storage rack has a leaf or ToR (top-of-rack) switch. The rack may instead have a pair of redundant switches in an MCT/vLAG pair, referred to as a dual ToR, an MCT-pair ToR, or a vLAG-pair ToR. A dual ToR provides node and link redundancy to the workloads in the rack.
• Leafs and border leafs connect to all spines in the PoD. These links are referred to as fabric infrastructure links.
• Leafs are not interconnected with each other for data-plane purposes. (Leafs in an MCT or vLAG pair are interconnected for control-plane operations such as forming a server-facing LAG.)
This topology provides predictable latency as well as ECMP forwarding in the underlay network; the number of hops between any two leaf devices within the fabric is always two. It also enables easy horizontal scale-out as the data center expands, limited only by the port density and bandwidth supported by the spine devices.
This validated design recommends using the same hardware model throughout the spine layer; mixing different hardware is not recommended.
• All these links are configured as Layer 3 interfaces with /31 IPv4 addresses.
• The MTU for these links is set to jumbo MTU. This is a requirement to handle the VXLAN encapsulation of Ethernet frames.
• Multiple parallel links between two nodes in the fabric must be avoided.
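The /31 point-to-point addressing above can be sketched with Python's standard ipaddress module. The fabric supernet (10.0.0.0/24) and the device names are illustrative assumptions, not addresses from this design:

```python
# Sketch: carving /31 point-to-point subnets for the fabric infrastructure
# links. Supernet and device names are hypothetical.
import ipaddress

fabric_pool = ipaddress.ip_network("10.0.0.0/24")
links = [("leaf1", "spine1"), ("leaf1", "spine2"),
         ("leaf2", "spine1"), ("leaf2", "spine2")]

assignments = {}
for link, subnet in zip(links, fabric_pool.subnets(new_prefix=31)):
    lo, hi = list(subnet)                  # a /31 holds exactly two addresses (RFC 3021)
    assignments[link] = (str(lo), str(hi))

for (a, b), (ip_a, ip_b) in assignments.items():
    print(f"{a} {ip_a}/31 <-> {b} {ip_b}/31")
```

Using /31s rather than /30s halves the address consumption of the fabric links, which matters as the leaf and spine counts grow.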
Server-Facing Links
The server-facing or access links are the links on the leaf nodes connecting the workloads. These links are either individual links or LAG members in the case of a dual ToR. In the validated design:
• Spanning tree is typically disabled for server connectivity unless there are downstream Layer 2 switches.
[Figure: topology diagram labels — Internet, MPLS Network, Spine, WAN Edge, Border Leaf]
The connection between the spines and the super-spines follows the Clos principles:
• Each spine connects to all super-spines in the network. In the validated design, both 40-GbE and 100-GbE links were tested independently. Mixing links of different bandwidths between two layers of the IP fabric is not recommended.
Figure 2 Optimized 5-Stage L3 Clos Topology
There are several ways that the border leafs connect to the data center site. In three-tier (super-spine) architectures, the border leafs are connected to
the super-spines as depicted in Figure 2. In two-tier topologies, the border leafs are connected to the spines as depicted in Figure 1. Certain topologies may use the spines as border leafs (known as border spines), overloading the two functions onto one device. This adds forwarding requirements to the spines: they need to be aware of the tenants, the VNIs, and the VXLAN tunnel encapsulation and de-encapsulation functions.
Planning the fabric underlay involves the following:
• IPv4 network address assignments for the links connecting the nodes in the fabric: spines, leafs, super-spines, and border leafs.
• The control-plane protocol used for reachability between the nodes. A smaller-scale topology might benefit from a link-state protocol such as OSPF (note that this is not validated in this design, though it is supported). Large-scale topologies, however, typically use BGP, and this Extreme validated design recommends BGP as the protocol for underlay network reachability.
There are several underlay deployment options. When using BGP as the only routing protocol in the fabric, there are two models:
• eBGP for Underlay—eBGP peering between each tier of nodes: between the leaf and the spine; between the spine and the super-spine; and
between the super-spine and the border leaf.
• iBGP for Underlay—iBGP peering between the leaf and the spine within the PoD, with spines acting as BGP route reflectors, and eBGP peering between the PoDs through the super-spine layer for inter-PoD reachability. (Note that this is not validated in this design, though it is supported.)
• Each leaf in a PoD is assigned its own AS number. (A dual ToR is considered a single leaf, and both nodes in the pair have the same AS number.)
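The per-leaf AS numbering can be sketched as follows. The ASN values and the shared-spine-ASN convention below are hypothetical illustrations, not values from this design; 4-byte private ASNs avoid exhausting the 2-byte private range in large fabrics:

```python
# Sketch: a possible ASN scheme for the eBGP underlay (hypothetical values).
# Spines share one ASN; each leaf (rack) gets its own, and both nodes of a
# dual-ToR MCT/vLAG pair share that leaf ASN.
SPINE_ASN = 4200000000          # common to all spines in the PoD
LEAF_ASN_BASE = 4200000100

def leaf_asn(rack_id: int) -> int:
    """Both ToRs in rack `rack_id`'s MCT/vLAG pair use the same ASN."""
    return LEAF_ASN_BASE + rack_id

# eBGP sessions form between differing ASNs: every leaf peers with the spines.
peerings = [(leaf_asn(rack), SPINE_ASN) for rack in (1, 2, 3)]
print(peerings)
```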
Figure 3 eBGP for Underlay
Note that the iBGP model shown in Figure 4 is provided for informational purposes only; it is supported but is not part of this validated design.
Figure 4 iBGP for Underlay
Network Virtualization with VXLAN-Based BGP EVPN
Network virtualization is the process of creating virtual, logical networks on physical infrastructures. With network virtualization, multiple physical
networks can be consolidated to form a logical network. Conversely, a physical network can be segregated to form multiple virtual networks. Virtual
networks are created through a combination of hardware and software elements spanning the networking, storage, and computing infrastructure.
Network virtualization solutions leverage the benefits of software in terms of agility and programmability, along with the performance acceleration
and scale of application-specific hardware.
Virtual Extensible LAN (VXLAN) is an overlay technology that provides Layer 2 connectivity for workloads residing across the data center network.
VXLAN creates a logical network overlay on top of physical networks, extending Layer 2 domains across Layer 3 boundaries. VXLAN provides
decoupling of the virtual topology provided by the VXLAN tunnels from the physical topology of the network. It leverages Layer 3 benefits in the
underlay, such as load balancing on redundant links, which leads to higher network utilization. In addition, VXLAN provides a large number of logical
network segments, allowing for large-scale multitenancy in the network. VXLAN is based on the IETF RFC 7348 standard. VXLAN has a 24-bit Virtual
Network ID (VNI) space, which allows for 16 million logical networks compared to a traditional VLAN, which supports a maximum of 4096 logical
segments. VXLAN eliminates the need for Spanning Tree Protocol (STP) in the data center network, and it provides increased scalability and improved
resiliency. VXLAN has become the de facto standard for overlays that are terminated on physical switches or virtual network elements.
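The scale comparison above follows directly from the field widths of the two technologies:

```python
# The VXLAN header carries a 24-bit VNI; an 802.1Q tag carries a 12-bit VLAN ID.
VNI_BITS, VLAN_BITS = 24, 12
print(2 ** VNI_BITS)   # 16,777,216 possible VXLAN segments
print(2 ** VLAN_BITS)  # 4,096 possible VLAN IDs
```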
The traditional Layer 2 extension mechanisms using VXLAN rely on "flood and learn" behavior, which is inefficient: it delays MAC address convergence and results in unnecessary flooding. Also, in a data center environment with VXLAN-based Layer 2 extension mechanisms, a Layer 2 domain and an associated subnet might exist across multiple racks and even across all racks in a data center site. With
traditional underlay routing mechanisms, routed traffic destined to a VM or a host belonging to the subnet follows an inefficient path in the network,
because the network infrastructure is aware only of the existence of the distributed Layer 3 subnet, but it is not aware of the exact location of the
hosts behind a leaf switch.
With the Extreme BGP-EVPN, network virtualization is achieved by creating a VXLAN-based overlay network. It leverages BGP EVPN to provide a
control plane for the virtual overlay network. BGP EVPN enables control-plane learning for end hosts behind remote VXLAN tunnel endpoints
(VTEPs). This learning includes reachability for Layer 2 MAC addresses and Layer 3 host routes.
Some key features and benefits of Extreme BGP-EVPN network virtualization are summarized as follows:
Active-active MCT/vLAG pairs—Multi-chassis port channels for dual-homing of network endpoints are supported at the leaf. Both switches in an MCT/vLAG pair participate in the BGP-EVPN operations and are capable of actively forwarding traffic.
Static anycast gateway—With static anycast gateway technology, each leaf is assigned the same default gateway IP and MAC addresses for all connected subnets. This ensures that local traffic is terminated and routed at Layer 3 at the leaf and eliminates the suboptimal inefficiencies found with centralized gateways. All leafs are simultaneously active forwarders for all default traffic for which they are enabled. Also, because the static anycast gateway does not rely on any control-plane protocol, it can scale to large deployments.
Efficient VXLAN routing—With the gateway moved to the leaf, routing of packets between VXLAN networks occurs at the leaf. Routed traffic from the network endpoints is terminated at the leaf and then encapsulated in the VXLAN header to be sent to the remote leaf. Similarly, VXLAN-encapsulated traffic from a remote leaf node is decapsulated and routed to the destination. This VXLAN routing in to and out of the tunnel (RIOT) is enabled in the Extreme SLX and VDX platform ASICs on the leaf switches and is performed in a single pass, which is more efficient than multi-pass implementations in competitive ASICs.
Data-plane IP and MAC learning with control-plane advertisement—With IP host routes and MAC addresses learned from the data plane and advertised with BGP EVPN, the leaf switches are aware of the reachability of the hosts in the network. Any traffic destined to the hosts takes the most efficient route in the network.
Layer 2 and Layer 3 multitenancy—BGP EVPN provides the control plane for VRF routing and for Layer 2 VXLAN extension. BGP EVPN enables a
multitenant infrastructure and extends it across the data center to enable traffic isolation between the Layer 2 and Layer 3 domains, while
providing efficient routing and switching between the tenant endpoints.
Dynamic tunnel discovery—With BGP EVPN, the remote VTEPs are automatically discovered. The resulting VXLAN tunnels are also automatically
created. This significantly reduces operational expense (OpEx) and eliminates errors in configuration.
ARP/ND suppression—The BGP-EVPN EVI leafs discover remote IP and MAC addresses and use this information to populate their local ARP tables.
Using these entries, the leaf switches respond to any local ARP queries. This eliminates the need for flooding ARP requests in the network
infrastructure.
Conversational ARP/ND learning—Conversational ARP/ND reduces the number of cached ARP/ND entries by programming only active flows into the forwarding plane, which helps to optimize the utilization of hardware resources. In many scenarios, the number of ARP and ND entries required exceeds the hardware capacity. Conversational ARP/ND limits in-hardware storage to active ARP/ND entries; aged-out entries are deleted automatically.
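A rough model of conversational learning is sketched below: only entries with recent traffic stay programmed, and idle entries age out. The class, timer value, and method names are hypothetical illustrations, not the SLX implementation:

```python
# Sketch: conversational ARP learning. Only active conversations remain in
# the (limited) hardware table; idle entries age out automatically.
import time

class ConversationalArpCache:
    def __init__(self, age_out_secs=300):
        self.age_out = age_out_secs
        self.entries = {}  # ip -> (mac, last_seen timestamp)

    def traffic_seen(self, ip, mac, now=None):
        # Each packet in an active flow refreshes the entry's timestamp.
        self.entries[ip] = (mac, time.time() if now is None else now)

    def active_entries(self, now=None):
        # Drop aged-out entries; only active conversations stay programmed.
        now = time.time() if now is None else now
        self.entries = {ip: (mac, seen) for ip, (mac, seen) in self.entries.items()
                        if now - seen < self.age_out}
        return dict(self.entries)

cache = ConversationalArpCache(age_out_secs=300)
cache.traffic_seen("10.10.1.1", "00:00:00:aa:aa:aa", now=0)
cache.traffic_seen("10.10.1.2", "00:00:00:bb:bb:bb", now=200)
print(cache.active_entries(now=350))  # only the 10.10.1.2 conversation remains
```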
VM mobility support—When a VM moves behind a new leaf switch, the leaf switch discovers the VM through data-plane learning and learns its addressing information. It advertises this reachability to its peers, and when the peers receive the updated reachability information for the VM, they update their forwarding tables accordingly. BGP-EVPN-assisted VM mobility leads to faster convergence in the network.
Open standards and interoperability—BGP EVPN is based on the open standard protocol and is interoperable with implementations from other
vendors. This allows the BGP-EVPN-based solution to fit seamlessly in a multivendor environment.
[Figure 5 VTEPs and L2 Extension with Flood and Learn: VTEP-A (10.10.10.1, VLAN10), VTEP-B (10.10.10.2, VLAN20), and VTEP-C (10.10.10.3, VLAN30) all map their local VLAN to VNI10 over the IP network underlay, with ingress replication paths between them. Hosts MAC-H1 (10.10.1.1), MAC-H2 (10.10.1.2), and MAC-H3 (10.10.1.3) attach to the VTEPs. The VTEP-C L2 table maps H1 to VTEP IP 10.10.10.1, H2 to VTEP IP 10.10.10.2, and H3 to a local Ethernet port.]
A VXLAN tunnel endpoint (VTEP) may be implemented in hardware (a leaf or ToR switch) or in software in virtualized environments. Each VTEP has a unique IP address and MAC address, and each VTEP can reach the other VTEPs over the underlay IP network.
Each VTEP has its own end host/server segment connected to it. In this topology, all hosts belong to one Layer 2 broadcast domain or, in simple
terms, one VLAN and one IP subnet. The local VLAN numbers may be different in each VTEP, but they are bound to one VNI number, which is
common on all VTEPs. So for all practical purposes, the LAN segment is now identified by a VXLAN VNI, and the VLAN numbers are only locally
significant.
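The local significance of VLAN IDs can be shown in a few lines, mirroring the Figure 5 mappings (three different local VLANs, one fabric-wide VNI):

```python
# Sketch: VLAN numbers are locally significant per VTEP; the VNI identifies
# the segment fabric-wide. Mappings mirror the Figure 5 topology.
vlan_to_vni = {
    "VTEP-A": {10: 10},   # local VLAN 10 -> VNI 10
    "VTEP-B": {20: 10},   # local VLAN 20 -> VNI 10
    "VTEP-C": {30: 10},   # local VLAN 30 -> VNI 10
}

# Despite three different local VLAN IDs, there is exactly one L2 segment.
segment = {vni for mapping in vlan_to_vni.values() for vni in mapping.values()}
print(segment)  # {10}
```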
The logical dashed lines shown inside the IP network between the VTEPs represent the head-end or ingress replication paths. This is used to send
what is known as the BUM traffic: Broadcast, Unknown Unicast, and Multicast frames on the Layer 2 segment. The VTEP unicasts these packets to all
other VTEPs connected to a VXLAN segment. This may require additional configuration or provisioning of tunnels on each VTEP device to all other
devices.
Let's consider that H1 wants to communicate with H2:
• VTEP-A learns H1 as a local MAC and also maps this host to the VNI, and because the packet is a broadcast packet, it is encapsulated into the
VXLAN packet and replicated; it is then unicast to each of the remote VTEPs participating in this VNI segment. The outer-src-ip is set to
10.10.10.1, and the outer-dst-ip is the remote VTEP IP.
• VTEP-B and VTEP-C decapsulate the packet and flood it into their local VXLAN network.
• They also learn three pieces of information: the source-ip of VTEP-A, the inner-src-mac of H1, and the VNI. This creates an L2-MAC-to-VTEP-
IP binding: {mac H1, VTEP-ip 10.10.10.1, VNI 10}.
• When H2 responds to the ARP request, the packet is unicast to H1. This packet is encapsulated in a VXLAN packet by VTEP-B and sent as a
unicast IP packet based on its routing table.
• VTEP-A decapsulates the packet and sends it to H1. It also creates an L2-MAC-to-VTEP-IP binding: {MAC H2, VTEP-IP 10.10.10.2, VNI 10}.
• Now the communication between H1 and H2 will be unicast. VTEP-A and VTEP-B now know sufficient information to encapsulate the packets
directly between them.
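The learning steps above can be sketched as follows, reusing the Figure 5 names and addresses. The functions and data structures are illustrative, not an implementation:

```python
# Sketch of flood and learn: each VTEP builds a MAC -> (VTEP IP, VNI) binding
# from the outer source IP and inner source MAC of packets it decapsulates.
l2_tables = {"VTEP-A": {}, "VTEP-B": {}, "VTEP-C": {}}
vtep_ips = {"VTEP-A": "10.10.10.1", "VTEP-B": "10.10.10.2", "VTEP-C": "10.10.10.3"}

def deliver(dst_vtep, inner_src_mac, outer_src_ip, vni):
    # The decapsulating VTEP learns the L2-MAC-to-VTEP-IP binding.
    l2_tables[dst_vtep][inner_src_mac] = (outer_src_ip, vni)

# H1's broadcast ARP request: VTEP-A head-end replicates it, unicasting a
# copy to each remote VTEP in VNI 10 (BUM handling).
for vtep in ("VTEP-B", "VTEP-C"):
    deliver(vtep, "MAC-H1", vtep_ips["VTEP-A"], 10)

# H2's unicast ARP reply is encapsulated by VTEP-B toward VTEP-A only.
deliver("VTEP-A", "MAC-H2", vtep_ips["VTEP-B"], 10)

print(l2_tables["VTEP-A"])  # {'MAC-H2': ('10.10.10.2', 10)}
```

After this exchange, VTEP-A and VTEP-B each hold the binding needed to encapsulate unicast traffic directly to one another.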
When the hosts are in different subnets, we need a Layer 3 gateway in the network to connect to all VNI segments. As seen in Figure 6, VTEP-C is
configured with all VNI numbers in the network and acts as the router or gateway between these VNI segments (see the blue and red dotted arrows
routing between VLAN10 and VLAN20). When hosts send ARP messages for the gateway in their respective VLANs, VTEP-C will respond. For first-hop
router redundancy, multiple VTEPs may be configured with all VNIs, and they may run an FHRP protocol between them.
[Figure 6 Routing Between VXLAN networks in Flood and Learn topology: VTEP-C (10.10.10.3) is configured with VLAN10/VNI10, VLAN20/VNI20, and VLAN30/VNI30, each with a gateway IP; Host-C resides in VLAN10.]
BGP EVPN for VXLAN
As we have seen in the VXLAN flood-and-learn case, MAC learning is driven by data frames, and the flooding of broadcast or unknown unicast frames depends on ingress replication by the VTEPs in the network.
With the BGP EVPN control plane, MAC learning happens via BGP, similar to IPv4/IPv6 route learning in a Layer 3 network. This reduces flooding in the underlay network, except for traffic toward silent hosts that have not yet been learned. This control-plane-based MAC learning enables several additional functions, with BGP as the unified control plane for both Layer 2 and Layer 3 forwarding in the overlay network.
In Figure 7, each VTEP, being a BGP speaker, advertises the MAC and IP addresses of its local hosts to other VTEPs using the BGP EVPN control plane. A
BGP route reflector may be used for distribution of this information to the VTEPs. Both VTEP discovery and MAC/IP or MAC/IPv6 host learning happen
through the control plane.
Since IPv4/IPv6 addresses are also exchanged in the control plane, each VTEP may act as a gateway for the VNI subnets configured on it; a centralized Layer 3 gateway is not required. This feature is referred to as a distributed gateway. Also, since each VTEP is aware of MAC/IP or MAC/IPv6 host bindings, ARP requests need not be flooded between the VTEPs. A VTEP may respond to ARP requests on behalf of the target host if the host address has already been learned. This is referred to as ARP/ND suppression in the fabric.
(Figure 7 shows each VTEP advertising its local hosts over the BGP EVPN control plane: MAC-H1/10.10.1.1 and MAC-H2/10.10.1.2 behind their local VTEPs, and MAC-H3/10.10.1.3 behind VTEP-C at 10.10.10.3.)
BGP EVPN control-plane-based learning allows more flexibility to control the information flow between the VTEPs. It enables Layer 2 multitenancy using the MAC-VRF construct: in simple terms, each VLAN or bridge domain can be considered a MAC-VRF, into which MAC addresses from the remote VTEPs are downloaded.
BGP EVPN also enables Layer 3 multitenancy using VRFs, similar to MPLS VPN. Each VTEP may host several tenants, each with its own set of VXLAN segments. Depending on interest, other VTEPs may import the tenant-specific information. In this way, both Layer 2 and Layer 3 extensions can be provisioned on a per-tenant basis.
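The tenant-specific import behavior described above can be sketched as route-target filtering in the MPLS-VPN style. This is an illustrative sketch only; the route-target values and the function name are hypothetical:

```python
# Route-targets this VTEP is configured to import (hypothetical values
# for two locally hosted tenants).
local_import_rts = {"101:101", "11:11"}

def should_import(route_rts: set) -> bool:
    """Import a received EVPN route only if it carries at least one
    route-target that this VTEP imports; routes for tenants not hosted
    here are dropped, which limits state to what each leaf needs."""
    return bool(route_rts & local_import_rts)
```

For example, a route tagged with an unknown route-target such as "21:21" would not be imported on this VTEP.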
BUM traffic is accommodated using ingress replication at the VTEP. Since VTEP discovery also happens through the control plane, setting up ingress
replication does not require additional provisioning or configuration about remote VTEPs.
Let’s look at the functional components of the BGP EVPN implementation of a data center IP Fabric.
VTEP
In an IP fabric, the leaf and border leaf act as VTEPs. Note that only one VTEP is allowed per device. Every VTEP has an overlay interface, which
identifies the VTEP IP address. The VTEP information is exchanged, and remote VTEPs are discovered over BGP EVPN.
Static Anycast Gateway
Each leaf or VTEP has a set of server-facing VLANs that are mapped to VXLAN segments by a VNI number. These VLAN segments have an associated VE
interface (a Layer 3 interface for the VLAN). Each tenant VLAN has anycast gateway IPv4/IPv6 addresses and associated anycast gateway MAC
addresses. These gateway IP/IPv6 addresses and gateway MAC address are consistent for the VLAN segments shared on all leafs in the fabric.
Overlay Gateway
Each VTEP or leaf is configured with an overlay gateway. This defines the VTEP IP address, which is used as the source IP when encapsulating packets
and is used as the next-hop IP in the EVPN NLRIs. In this validated design, we are using an IPv4 underlay; hence the overlay interface is associated with
the IPv4 address of a loopback interface on the leaf.
In the leaf-spine topology (3-stage Clos or 5-stage Clos), all leafs and border leafs should be enabled with the BGP EVPN Address-Family to exchange EVPN routes (NLRI) and participate in VTEP discovery. Spines and super-spines do not participate in the VTEP functionality. However, selected spines in the spine layer are enabled with the BGP EVPN Address-Family for distribution of routes, and all leafs, including border leafs, must peer with these EVPN-enabled spines.
In the deployment model where eBGP is used, a minimum of two spines in a 3-stage PoD should be enabled with the EVPN Address-Family. Note that
all spines participate in the eBGP underlay, but only a few designated spines participate in the EVPN.
In the deployment model where iBGP is used, two spines are selected as route reflectors for the EVPN Address-Family, and each VTEP leaf has two
iBGP neighbors that are the two spine BGP route reflectors. Each spine BGP route reflector has all VTEP leaf nodes as route-reflector clients and
reflects EVPN routes for the VTEP leaf nodes.
In the 5-stage Clos topology, a minimum of two super-spines should be enabled with the EVPN Address-Family.
The following EVPN route types are used in the fabric:
• Route Type-1—Ethernet Auto-Discovery route. This route is used in multi-homing cases to achieve split-horizon, aliasing, and fast convergence.
• Route Type-2—MAC/IP Advertisement route. This route has two forms:
– A MAC-only route that carries {MAC address of the host, L2VNI of the VXLAN segment}. This route carries only the Layer 2 information of a host. Whenever a VTEP learns a MAC from its server-facing subnets, it advertises this route into BGP.
– A MAC/IP route that carries {MAC address of the host, IPv4/IPv6 address of the host, L2VNI of the VXLAN segment, L3VNI of the tenant VRF of the host}. This route carries both the Layer 2 and Layer 3 information of a host. It is advertised by the VTEP when it learns the IPv4/IPv6 host addresses via ARP or ND from the server-facing subnets. This information enables ARP/ND suppression on other VTEPs.
• Route Type-3—Inclusive Multicast Ethernet Tag route. This route is required for sending BUM traffic to all VTEPs interested in a given bridge domain or VXLAN segment.
• Route Type-4—Ethernet Segment route. This route is used for multi-homing of server VLAN segments. Note that only MCT or vLAG-based multi-homing is supported.
• Route Type-5—IPv4/IPv6 prefix advertisement route {IPv4/IPv6 route, L3VNI, Router-MAC}. This route is advertised for every Layer 3 server-facing subnet behind a VTEP, and for external routes.
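As a quick reference, the route types above can be collected into a small lookup table. This is an illustrative sketch; the type numbers and names follow RFC 7432 (Type-5 comes from the EVPN prefix-advertisement extension), and the helper function is hypothetical:

```python
# EVPN NLRI route types as described in this section.
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery",          # split-horizon, aliasing, fast convergence
    2: "MAC/IP Advertisement",             # host MAC, optional IP, L2VNI/L3VNI
    3: "Inclusive Multicast Ethernet Tag", # builds BUM/ingress-replication lists
    4: "Ethernet Segment",                 # multi-homing (MCT/vLAG only here)
    5: "IP Prefix",                        # {prefix, L3VNI, Router-MAC}
}

def describe(route_type: int) -> str:
    """Return a human-readable name for an EVPN route type."""
    return EVPN_ROUTE_TYPES.get(route_type, f"Unknown type {route_type}")
```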
Tunnel Attribute
This is the BGP encapsulation extended community (type 0x03, sub-type 0x0c) carrying tunnel encapsulation type 8 (VXLAN). It is included with all EVPN routes.
Each tenant VRF is configured with a unique Layer 3 VNI. This is required for inter-subnet routing. This VNI must be the same for a tenant VRF on all
VTEPs including the border leaf. Both Type-2 and Type-5 routes carry this Layer 3 VNI.
The router-mac is the MAC address of the VTEP advertising a route. It is also required, along with the Layer 3 VNI, for inter-subnet routing, as explained in the Integrated Routing and Bridging section, and is carried in both Type-2 MAC/IP routes and Type-5 prefix routes. In the data plane, this MAC address is used as the inner destination MAC address when a packet is routed.
MAC-Mobility Attribute
This is the EVPN extended community (type 0x06), sub-type 0x00; it carries a 32-bit sequence number.
This enables MAC or station moves between the VTEPs. When a MAC moves, for example from VTEP-1 to VTEP-2, VTEP-2 advertises a MAC (or MAC/IP) route with a higher sequence number. This update triggers a best-path calculation on the other VTEPs, thereby detecting the host move to VTEP-2.
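The sequence-number comparison described above can be sketched as a simple table update. This is a minimal illustrative sketch; the table structure, function name, and values are hypothetical:

```python
mac_table = {}  # MAC -> (advertising VTEP, MAC Mobility sequence number)

def update_mac(mac: str, vtep: str, seq: int) -> bool:
    """Install a MAC route if it is new or carries a higher sequence
    number (MAC Mobility extended community). Returns True when the
    table changed, i.e. the host was learned or a move was detected."""
    current = mac_table.get(mac)
    if current is None or seq > current[1]:
        mac_table[mac] = (vtep, seq)
        return True
    return False
```

For example, after H1 is learned behind VTEP-1 with sequence 0, an advertisement from VTEP-2 with sequence 1 wins the best-path comparison, while a stale re-advertisement with sequence 0 does not.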
ARP Suppression
Control-plane distribution of MAC/IP addresses enables ARP suppression in the fabric for Layer 2 extensions between racks. A portion of the fabric is
shown in Figure 8 to illustrate the ARP suppression functionality in the fabric.
When the hosts come up, they typically ARP for the gateway IP that is hosted by leafs. Let's consider the case where H2 ARPs for the gateway address.
Note that both leafs have the same anycast gateway address for the host VXLAN segment.
• Leaf2 will advertise the MAC/IP route into the BGP EVPN Address-Family.
• Leaf1 will learn this route and populate it in its MAC/IP binding table.
• Extending the same information flow for H1, when Leaf2 learns H1's MAC/IP route, it will respond to ARP requests on behalf of H1.
Compared to the data-plane-based learning in Layer 2 extension technologies such as VPLS or VXLAN flood and learn, where ARP traffic is also sent
over an overlay network, VXLAN EVPN significantly reduces ARP/ND flooding in the fabric.
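The suppression decision can be sketched as a lookup against the MAC/IP binding table populated from EVPN Type-2 routes. This is an illustrative sketch; the bindings and function name are hypothetical:

```python
# MAC/IP bindings learned from EVPN Type-2 routes (hypothetical data
# matching the H1/H2 example above).
arp_bindings = {"10.10.1.1": "M1", "10.10.1.2": "M2"}

def handle_arp_request(target_ip: str):
    """If the target MAC is already known from the control plane, the
    leaf answers locally (ARP suppression); otherwise the request would
    be flooded via ingress replication."""
    mac = arp_bindings.get(target_ip)
    if mac is not None:
        return ("proxy-reply", mac)
    return ("flood", None)
```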
Figure 8 ARP Suppression
(Figure 8 shows Leaf1 and Leaf2 with VNI 10 and a shared anycast gateway. H1 (MAC M1, 10.10.1.1) and H2 (MAC M2, 10.10.1.2) are on VLAN 10. Step 1: H2 sends an ARP request for the gateway IP, and Leaf2 updates its host table. Step 3: H1 sends an ARP request for H2, and Leaf1 responds on behalf of H2.)
VLAN Scoping
As discussed earlier, in VXLAN networks, each VLAN is mapped to a VNI number of a VXLAN segment. This provides an interesting option to break
the 4K limit of the 802.1Q VLAN space. The VLAN tag (or c-tag) on the wire or the port VLAN membership may be locally scoped or locally significant
at the leaf level or at the port level within a leaf.
Refer to Figure 9. In this example, by mapping to the same VNI, the two VLAN segments (VLAN 10 on Leaf1 and VLAN 20 on Leaf2) are on the same bridge domain. With this mapping, hosts on these VLANs have Layer 2 extension between them, and they belong to one VXLAN segment identified by VNI 10.
Figure 9 VLAN Scoping at the Leaf/ToR Level
(Leaf1 maps VLAN10 to VNI 10 and Leaf2 maps VLAN20 to VNI 10; both use anycast gateway 10.10.1.254. H1 (MAC M1, 10.10.1.1) attaches on VLAN 10 and H2 (MAC M2, 10.10.1.2) on VLAN 20.)
Refer to Figure 10. In this example, Port1 with VLAN tag 10 and Port2 with VLAN tag 20 are mapped to bridge domain BD 100, and BD 100 is mapped to VNI 4196. With this mapping, the hosts H1 (VLAN 10), H2 (VLAN 20), and H3 (VLAN 501) are bound to one VXLAN segment identified by VNI 4196.
(Figure 10 shows Leaf1 with BD 100 members Port1 tag 10 and Port2 tag 20, and Leaf2 with BD 100 member Port1 tag 501; on both leafs BD 100 is mapped to VNI 4196 with anycast gateway 10.10.1.254. H1 (MAC M1, 10.10.1.1), H2 (MAC M2, 10.10.1.2), and H3 (MAC M3, 10.10.1.3) attach with VLAN tags 10, 20, and 501 respectively.)
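The locally significant tag-to-VNI resolution in this example can be sketched as two lookups: one from the (leaf, port, c-tag) tuple to the bridge domain, and one from the bridge domain to the VNI. The values come from the example above; the data structures and function name are hypothetical:

```python
# Port-local c-tag membership in bridge domains (per the Figure 10 example).
bd_membership = {
    ("Leaf1", "Port1", 10): 100,
    ("Leaf1", "Port2", 20): 100,
    ("Leaf2", "Port1", 501): 100,
}
bd_to_vni = {100: 4196}

def vni_for(leaf: str, port: str, ctag: int):
    """Resolve a locally significant (port, c-tag) pair to its VXLAN VNI;
    the c-tag has no fabric-wide meaning, only the VNI does."""
    bd = bd_membership.get((leaf, port, ctag))
    return bd_to_vni.get(bd) if bd is not None else None
```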
Conversational Learning
Conversational learning helps conserve the hardware forwarding table by programming only those ARP/ND or MAC entries for which there are active
conversations or traffic flows. With this feature, the control plane may hold more host entries than what the hardware table can support. When there
is sufficient space in hardware, all host entries are programmed. When there is no space, conversational learning kicks in and starts aging out the
inactive entries. Note that the host subnets are inserted into the hardware (LPM table) regardless of the activity. The host entries are inserted in the
hardware (/32 IPv4 or /128 IPv6 host route table) based on the traffic.
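Conversational learning behaves like an activity-ordered host table with aging. The following is a minimal sketch under stated assumptions: the class name is hypothetical, and `capacity` stands in for the hardware host-route table size; real aging is timer-driven rather than strictly LRU:

```python
from collections import OrderedDict

class ConversationalTable:
    """Sketch of a hardware host-route table: entries are programmed on
    traffic activity, and when the table is full the least recently
    active entry is aged out to make room."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # host IP -> next hop, in activity order

    def touch(self, host: str, next_hop: str):
        if host in self.entries:
            self.entries.move_to_end(host)        # active conversation
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # age out the least active entry
            self.entries[host] = next_hop
```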
Asymmetric IRB
Figure 11 Asymmetric IRB
(Leaf1 and Leaf2 are both configured with VLAN 10/VNI 10 (anycast GW 10.10.1.254) and VLAN 20/VNI 20 (anycast GW 10.20.1.254) in tenant VRF SALES. H1 (MAC M1, 10.10.1.1) and H2 (MAC M2, 10.10.1.2) are on VLAN 10/VNI 10; H3 (MAC M3, 10.20.1.1) is on VLAN 20/VNI 20.)
In Figure 11, a tenant, SALES, is provisioned in the fabric with two VNI segments, VNI 10 and VNI 20. Leaf1 has servers connected to it on VNI 10 only.
However, it is provisioned with both VLAN 10 and VLAN 20 and mapped to VNI 10 and VNI 20 respectively. Similar configuration is done on Leaf2. Both
Leafs act as first-hop gateways for these VLANs with anycast gateway address.
If H1 in VNI 10 needs to communicate with H3 in VNI 20, Leaf1 first routes the packet between the segments and then bridges it on VNI 20, sending it over the overlay. Leaf2 decapsulates the VXLAN headers and delivers the packet to H3.
Essentially, the ingress VTEP both routes and bridges the packet; this method is referred to as asymmetric IRB. It also means that every VTEP must be configured with all VLANs, irrespective of whether local workloads exist on those VLANs.
Symmetric IRB
Figure 12 depicts symmetric IRB. Here, every tenant is assigned a Layer 3 VNI. This is analogous to a Layer 3 routing interface between two switches.
This VNI must be the same for a given tenant on all leafs where it is provisioned.
The MAC/IP host routes are advertised by the VTEP with the L2 VNI as well as an L3 VNI and the router-mac address of the VTEP. When a packet is routed over the L3 VNI, the destination MAC of the inner Ethernet frame is set to the router-mac of the remote VTEP. In Figure 12, routing from H1 to H3 always occurs over this L3 VNI: the ingress leaf routes the packet from the server VLAN/VNI to the L3 VNI, and the egress leaf routes it from the L3 VNI to the server VLAN/VNI, so each leaf routes the packet exactly once.
A significant advantage of this method is that all VNIs of a given tenant need not be created on all leafs. They are created only when there is server
connectivity to those VNIs. In Figure 12, Leaf1 is not configured with VNI 20. Also note that on Leaf2, even though VNI 10 is present, a packet from H3
to H1 will be routed directly on to the L3 VNI of the tenant. This adds the additional requirement that the host routes on all VXLAN segments in a given
tenant need to be downloaded to the leaf's forwarding table.
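The symmetric IRB forwarding step can be sketched as follows. The values follow the Figure 12 example; the table layout and function name are hypothetical:

```python
# Host routes learned from EVPN Type-2 routes carrying the L3 VNI and
# router-mac (hypothetical values matching Figure 12).
host_routes = {"10.20.1.1": {"vtep": "Leaf2", "l3vni": 2000, "router_mac": "RM2"}}

def route_over_l3vni(dst_ip: str, payload: bytes):
    """Route a packet over the tenant L3 VNI: rewrite the inner Ethernet
    destination MAC to the remote VTEP's router-mac, then VXLAN-encapsulate
    with the tenant's L3 VNI toward the remote VTEP."""
    entry = host_routes[dst_ip]
    inner = {"dst_mac": entry["router_mac"], "payload": payload}
    return {"vni": entry["l3vni"], "remote_vtep": entry["vtep"], "inner": inner}
```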
(Figure 12 shows Leaf1 (router-mac RM1) configured with VLAN 10/VNI 10, and Leaf2 (router-mac RM2) configured with VLAN 10/VNI 10 and VLAN 20/VNI 20; both carry tenant VRF Sales with L3 VNI 2000. H1 (10.10.1.1) and H2 (10.10.1.2) are on VNI 10; H3 (10.20.1.1) is on VNI 20.)
In the Extreme BGP EVPN implementation, we get the best of both schemes:
• There is no need to create all server VNIs on all leafs for a tenant.
• If a target VNI segment is not local and is extended behind one or more remote VTEPs, download the host routes on that target segment
into hardware based on traffic activity. Traffic to these hosts will be routed over the L3 VNI.
Multitenancy
Layer 2 multitenancy is achieved by a MAC-VRF construct used for extending a VLAN between multiple VTEPs or ToRs.
In BGP EVPN, multiple tenants can coexist at the Layer 3 level and share a common IP transport network while having their own separate routing domains in the VXLAN overlay network. Every tenant in the EVPN network is identified by a VRF (VPN routing and forwarding instance), and these tenant VRFs can span multiple leafs in a data center, similar to Layer 3 MPLS VPNs with tenant VRFs on multiple PE devices. Each VRF can have a set of server-facing VLANs, routing interfaces for those VLANs with anycast gateways, and a Layer 3 VNI used for symmetric routing. This VNI must be the same wherever the same tenant VRF is provisioned on other leafs, including the border leaf.
We recommend the separation of the tenant routing domain from the underlay routing domain (or default VRF), which is used for setting up the
overlays or tunnels between the VTEPs.
Even if Layer 3 multitenancy is not required in a deployment (as is the case with a single tenant), we recommend moving the server subnets to a separate VRF and keeping a clear separation between the underlay and overlay routing domains. Using a separate VRF for server subnets provides visibility into host routes and lets us leverage host-route optimization in the data plane. A tenant VRF also allows provisioning an L3 VNI, which enables symmetric IRB.
A tenant VRF may not be needed in the case of pure L2 or VLAN extension between the VTEPs.
Figure 13 Multitenancy
(Figure 13 shows tenant VRFs spanning multiple leafs: VRF101 with L3 VNI 5001 and VLANs 200, 201, and 202; VRF11 with L3 VNI 5002 and VLANs 100 and 102; and VRF21 with L3 VNI 5003 and VLANs 300 and 301.)
Ingress Replication
Although host reachability information is exchanged over the control plane to drastically reduce flooding in a VLAN, certain situations still require flooding of frames, as in traditional Ethernet networks, including but not limited to:
• MAC aging
• Silent hosts
• L2 multicast or broadcast
Ingress replication is a technique used to accommodate flooding in such cases by the VTEPs in the IP fabric. Each VTEP for a given VXLAN segment (or
server VLAN) computes the list of VTEPs having the same segment using the IMR (Inclusive Multicast Route) routes. Whenever the VTEP must flood a
frame in a VXLAN segment, it replicates the frame in hardware and unicasts the frame to each of the VTEPs in the IMR list for that segment.
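The replication step can be sketched as follows, assuming the per-VNI flood list has already been derived from Type-3 (IMR) routes as described above. The VTEP addresses and data structures are hypothetical:

```python
# Flood lists per VNI, built from received Inclusive Multicast (Type-3)
# routes (hypothetical VTEP addresses).
imr_lists = {10: ["10.10.10.2", "10.10.10.3"], 20: ["10.10.10.3"]}

def ingress_replicate(vni: int, frame: bytes, local_vtep: str):
    """Replicate a BUM frame once per remote VTEP in the VNI's IMR list,
    unicasting each VXLAN-encapsulated copy; no multicast underlay is
    needed. Returns (src, dst, vni, frame) tuples for each copy."""
    return [(local_vtep, remote, vni, frame)
            for remote in imr_lists.get(vni, [])
            if remote != local_vtep]
```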
MCT Pair
SLX platforms support MCT between two nodes. This is the recommended solution for leaf level redundant connectivity for the workloads. Multi-
homing is supported only using a MCT pair. Multi-homing to two separate VTEPs or leaf nodes, is not supported.
When the two leafs are in an MCT pair, they act as one logical VTEP or endpoint. As shown in Figure 14, both leafs are configured with the same VTEP IP address, so from the other VTEPs in the network the pair appears as a single VTEP. This is important because having two physical switches in this mode on each rack does not increase the number of VTEPs or impose additional tunneling requirements on other VTEPs in the network.
VDX platforms support similar functionality with two devices in a vLAG pair.
Figure 14 MCT Pair for Dual-homing and Leaf Redundancy
(Leaf1 and Leaf2 share VTEP-IP 10.121.1.1 and are interconnected by the ICL; both are configured with VLAN 10/VNI 10 (anycast GW 10.10.1.254) and VLAN 20/VNI 20 (anycast GW 10.20.1.254). H1 (10.10.1.1), H2 (10.10.1.2), and H3 (10.20.1.1) are dual-homed to the pair.)
Validated Designs
This section provides the details of the deployment model with the validated configuration templates. The Extreme validated design recommends a deployment model that uses eBGP as the control protocol between the tiers of nodes in the fabric. Depending on the scale, the user may choose either a 3-stage or a 5-stage fabric. The number of racks inside a PoD is determined by the port density of the spines; the location of the workloads and connectivity requirements also determine the size of a PoD.
Figure 15 shows the design for a 3-stage IP fabric using eBGP as the control protocol to exchange both underlay (IPv4 unicast) and overlay
(L2VPN/EVPN) routes. Note that the border leafs are connected to the spines in this design. BGP uses IPv4 as transport for peering between the tiers.
There is no BGP peering between the nodes in the same tier.
(Figure 15 shows a 3-stage fabric with the spine tier in AS 4200000000, leafs and border leafs below it, and the border leafs connecting through the WAN edge to the Internet and the MPLS network.)
The design shown in Figure 16 is a 5-stage IP fabric in which a super-spine layer interconnects multiple 3-stage PoDs.
(Figure 16 shows the super-spine layer in AS 4200003000 interconnecting POD1 (spines in AS 4200000000) and POD2 (spines in AS 4200002000). Border leafs in AS 4200007000 provide edge services and connect through the WAN edge to the Internet and the MPLS network.)
Hardware and Software Matrix
TABLE 1 Platforms Used in This Validated Design
3-Stage Fabric
Fabric Infrastructure Configuration
This section covers the aspects of provisioning the building blocks of the IP fabric underlay infrastructure. This involves the common configurations
on the fabric nodes, the loopback interfaces used as the router ID and VTEP address, and the interfaces or links between the fabric nodes (also
referred to as the fabric infrastructure links).
• A loopback interface with a unique IPv4 address as the VTEP-IP on each leaf and edge leaf.
• Dual-homing and redundancy at the leaf and edge leaf using a pair of nodes in MCT or vLAG.
• All these links are configured as Layer 3 interfaces with /31 IPv4 addresses.
• The MTU for these links is set to a jumbo MTU. This is a requirement to handle the VXLAN encapsulation of Ethernet frames. In the configuration shown below, the IPv4 MTU of the link between spine and leaf is set to 9100.
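The jumbo-MTU requirement follows from the VXLAN encapsulation overhead. A sketch of the arithmetic (the header sizes are the standard VXLAN/UDP/IP values; the function name is hypothetical, and 9100 is the value chosen in this design):

```python
def vxlan_mtu_needed(payload_mtu: int = 1500, ipv6_underlay: bool = False) -> int:
    """Minimum underlay MTU to carry a payload_mtu-sized frame over VXLAN:
    inner Ethernet header (14) + VXLAN header (8) + UDP header (8) +
    outer IP header (20 for IPv4, 40 for IPv6)."""
    outer_ip = 40 if ipv6_underlay else 20
    return payload_mtu + 14 + 8 + 8 + outer_ip
```

For a standard 1500-byte payload over an IPv4 underlay this gives 1550 bytes, so a jumbo setting such as 9100 leaves ample headroom.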
The link configurations on two nodes – a spine and a leaf – interconnected with a fabric link, are shown below.
Loopback Interfaces
Each device in the fabric needs one loopback interface with a unique IPv4 address for the purpose of Router-ID.
interface Loopback 1
ip address 10.1.1.11/32
no shutdown
!
ip router-id 10.1.1.11
Each leaf and border leaf needs a second loopback interface with a unique IPv4 address to use as the VTEP-IP. This interface is referenced under the overlay-gateway configuration as shown below. It is not required on spines and super-spines.
interface Loopback 2
ip address 10.1.1.1/32
no shutdown
!
overlay-gateway mct-leaf
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Each device in the MCT pair connects to every spine in the network with Layer 3 links, also referred to as fabric infrastructure links. Each device must be configured with its own loopback interface for a unique router ID. However, both devices share the same VTEP IP address; this is achieved by configuring the same IP address on another loopback interface on both nodes.
Cluster Configuration
This involves the following steps:
• ICL member ports and a port-channel. We strongly recommend a minimum of two ports in this bundle, and they must be of the same type and speed. In the validated design, as shown in the illustration below, the MCT pair is interconnected by two 40G Ethernet ports in the ICL port-channel.
• EVPN instance and BGP peering between the peers. The BGP AS number is the same as the one assigned to the leaf pair in the fabric; both devices in a cluster use the same AS number.
The server-facing port-channel configuration is provided in the "Tenant Provisioning" section under "Network Virtualization with BGP EVPN". VDX platforms support dual-homing with the vLAG feature; refer to the appendix for provisioning VDX vLAG as ToR.
The two nodes of the MCT pair are interconnected by the ICL links, over which the mLAG is formed. The two peer configurations were originally shown side by side; they are listed sequentially here.

MCT-Peer1:

node-id 1
cluster management principal-priority 1
!
vlan 4090
router-interface Ve 4090
description MCT control-vlan
!
interface Ve 4090
ip address 10.0.1.8/31
no shutdown
!
cluster pod1-cluster 1
peer-interface Port-channel 1
peer 10.0.1.9
df-load-balance
deploy
!
evpn default
route-target both auto ignore-as
rd auto
!
router bgp
neighbor 10.0.1.9 remote-as 4200000001
neighbor 10.0.1.9 bfd
address-family ipv4 unicast
no neighbor 10.0.1.9 activate
address-family l2vpn evpn
neighbor 10.0.1.9 encapsulation nsh
neighbor 10.0.1.9 activate

MCT-Peer2 is identical except for the following: node-id 2 (and no principal-priority), interface Ve 4090 uses ip address 10.0.1.9/31, the cluster peer is 10.0.1.8, and the BGP neighbor statements reference 10.0.1.8.
BGP Underlay Configuration
When enabling network virtualization with EVPN overlay, the underlay configuration needs to accommodate the BGP peers that exchange only
IPv4 routes and the BGP peers that exchange both IPv4 and EVPN routes. This is accomplished by using BGP peer groups.
Leaf Configuration
This is applicable to all leafs; they form BGP peerings with all spines inside the 3-stage PoD.
For an MCT leaf pair or border-leaf pair, each node in the pair must be configured with BGP peering to the spines in the same manner. Nodes in an MCT pair advertise a common VTEP IP address into the underlay.
• Configure the directly connected IP addresses of the spines into a peer group: spine-group.
router bgp
local-as 4200000001                          4-byte AS number of this leaf
capability as4-enable                        Enable 4-byte AS capability
fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3
!
neighbor spine-group peer-group              Peer-group config pointing to spines;
neighbor spine-group remote-as 4200000000    enable MD5 authentication and BFD
neighbor spine-group description To spine
neighbor spine-group password <password>
neighbor spine-group bfd
!
neighbor 10.0.1.1 peer-group spine-group
neighbor 10.0.1.3 peer-group spine-group
neighbor 10.0.1.5 peer-group spine-group
neighbor 10.0.1.7 peer-group spine-group
!
address-family ipv4 unicast                  Enable the IPv4 Address-Family;
network 10.1.1.1/32                          advertise the VTEP-IP;
maximum-paths 8                              enable graceful-restart
graceful-restart
!
Border/Edge leaf Configuration
The configuration is similar to the leaf, but the local AS number is different. Both edge leafs are in the same AS. In addition, the border leaf may also have BGP peering to WAN/DCI edge nodes.
router bgp
local-as 4200007000
capability as4-enable
fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3
!
neighbor spine-group peer-group
neighbor spine-group remote-as 4200000000
neighbor spine-group description To spine
neighbor spine-group password <password>
neighbor spine-group bfd
!
neighbor 10.31.1.0 peer-group spine-group
neighbor 10.32.1.0 peer-group spine-group
neighbor 10.33.1.0 peer-group spine-group
neighbor 10.34.1.0 peer-group spine-group
!
address-family ipv4 unicast
network 10.61.1.1/32
maximum-paths 8
graceful-restart
!
Spine Configuration
This is applicable to all spines in the 3-stage fabric. Note that each leaf (or MCT leaf pair) and each border-leaf pair is in a separate autonomous system, so the remote AS must be specified per neighbor and cannot be configured under the peer group.
• Configure the directly connected leafs' IP addresses in one peer group: leaf-group.
• Configure the directly connected edge leafs' IP addresses into a peer group: edge-group.
router bgp
local-as 4200000000                           4-byte AS number of this device;
capability as4-enable                         enable 4-byte AS number capability
fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3
!
neighbor edge-group peer-group                Configure a peer-group "edge-group" for the
neighbor edge-group remote-as 4200000021      directly connected links' IPv4 addresses of
neighbor edge-group password <password>       the border leafs; enable MD5 authentication
neighbor edge-group bfd                       and BFD
!
neighbor 10.32.1.1 peer-group edge-group
neighbor 10.32.2.1 peer-group edge-group
!
neighbor leaf-group peer-group                Configure a peer-group "leaf-group";
neighbor leaf-group description To leaf/TOR   enable MD5 authentication and BFD
neighbor leaf-group password <password>
neighbor leaf-group bfd
!
neighbor 10.0.1.0 remote-as 4200000001        Add the directly connected leafs' IPv4
neighbor 10.0.1.0 peer-group leaf-group       addresses into leaf-group. Each leaf is in
neighbor 10.0.2.0 remote-as 4200000001        a different AS, so the remote AS must be
neighbor 10.0.2.0 peer-group leaf-group       specified separately and not under the
neighbor 10.0.3.0 remote-as 4200000003        peer-group
neighbor 10.0.3.0 peer-group leaf-group
neighbor 10.0.4.0 remote-as 4200000004
neighbor 10.0.4.0 peer-group leaf-group
neighbor 10.0.5.0 remote-as 4200000004
neighbor 10.0.5.0 peer-group leaf-group
neighbor 10.0.6.0 remote-as 4200000006
neighbor 10.0.6.0 peer-group leaf-group
!
address-family ipv4 unicast                   Enable the IPv4 address-family
maximum-paths 8
graceful-restart
!
!
• Associate the loopback interface whose IPv4 address is used as the VTEP IP. (Loopback 2 interface)
• Map the VLANs to the VNI number. In this validated design, we're using the auto mapping of VLAN to a VNI. For instance, VLAN 101 is
mapped to VNI 101. (This simplified mapping option should work for most implementations unless there is a specific requirement to map the
server VLAN range to a specific VNI range in the VXLAN domain.)
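The auto-mapping behavior described above can be sketched as an identity function. This is an illustrative sketch; the function name and the 802.1Q range check are assumptions about valid input, not part of the device configuration:

```python
def auto_vni(vlan_id: int) -> int:
    """Sketch of 'map vlan vni auto': the VNI simply equals the VLAN ID
    (for instance, VLAN 101 maps to VNI 101)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("invalid 802.1Q VLAN ID")
    return vlan_id
```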
overlay-gateway leaf1               Configure the overlay gateway with a name
type layer2-extension               Enable Layer 2 extension
ip interface Loopback 2             Specify the loopback interface used as the VTEP IP
map vlan vni auto                   Auto-map VLANs to VNIs
activate
!
• Enable the next-hop unchanged configuration to the peers. When EVPN routes are advertised into eBGP by a node, the next hop is set to its
peering address. This follows standard BGP behavior. The next hop should always point to the IP address of the VTEP that originated these
routes.
Spine Configuration
This is applicable to the two spines in the 3-stage fabric to exchange EVPN routes with leafs and edge leafs.
• Since spines have neither L2 nor L3 tenants, EVPN routes would ordinarily be filtered out by the route-target import check. To avoid this, spines must retain all EVPN routes so they can be propagated to all leafs and border leafs in the fabric.
• Activate both leaf and border-leaf peer-groups into the EVPN Address-Family.
• Enable the next-hop unchanged configuration to the peers. When EVPN routes are advertised into eBGP by a node, the next hop is set to its
peering address. This follows standard BGP behavior. The next hop should always point to the IP address of the VTEP that originated these
routes.
router bgp
address-family l2vpn evpn                    Address-Family l2vpn evpn
graceful-restart
!
retain route-target all                      Enables spines to advertise all EVPN routes
!                                            received from peers without any filtering
neighbor edge-group encapsulation vxlan
neighbor edge-group next-hop-unchanged       Activate EVPN route exchange with the edge-leaf
neighbor edge-group activate                 peer-group; set next-hop unchanged
!
neighbor leaf-group encapsulation vxlan
neighbor leaf-group next-hop-unchanged       Activate EVPN route exchange with the leaf
neighbor leaf-group activate                 peer-group; again, set next-hop unchanged
!
Tenant Provisioning
Tenant provisioning refers to the configuration on leafs that enables server/workload VLANs and network connectivity to tenant VRF contexts, and maps these VLANs and VRFs to the overlay control and forwarding planes to establish Layer 2 extension between racks and Layer 3 multitenancy. This section is common to both 3-stage and 5-stage Clos fabrics.
VLANs are represented as MAC-VRFs in BGP EVPN, providing multitenancy at Layer 2. Extending a VLAN or bridge domain (BD) between two racks or two PoDs is a Layer 2 extension and uses the MAC-VRF construct in EVPN.
VRFs provide multitenancy at Layer 3. For instance, a tenant may have multiple VLAN subnets extended across multiple racks or PoDs and require inter-VLAN routing. The L3 subnets of these VLANs are configured within a virtual routing context referred to as an L3 VRF, or simply VRF. Even for deployments that do not require Layer 3 multitenancy, we recommend using one VRF for workload connectivity if gateway services are needed at the fabric leafs. Workload- or server-facing VLAN subnets must never be in the underlay or default VRF context of the leafs.
The anycast gateway MAC addresses must be different for IPv4 and IPv6, but the OUI portion (the first three bytes) must be the same.
ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202
Tenant VRF
The underlay routing domain is in the default VRF of a leaf device; it provides the reachability needed to provision tunnels or overlays to other VTEPs in the network. A VRF provides multitenancy at Layer 3. For server subnets or workloads, we recommend using a separate VRF, which keeps the underlay and overlay routing domains separate. It also allows the use of an L3 VNI for symmetric IRB, host-route visibility, and optimization in the forwarding plane.
The following are the steps involved in tenant VRF configuration.
1. Assign a unique RD. Every tenant must have a unique RD value per leaf/ToR where it is provisioned. In the validated design, we are using the format IPv4_Address:nn, where:
• "nn" is a unique number for the tenant VRF. This value is reused on the other leafs where the same tenant is provisioned.
For example, vrf201 has the following RD values on leafs where it is provisioned.
– On leaf1: 10.1.1.11:101
– On leaf5: 10.1.2.11:101
– On border-leaf1: 10.123.4.1:201
2. Assign a unique L3 VE routing interface. This gets mapped to a L3 VNI used for symmetric routing in the tenant VRF.
3. Assign import and export route targets for IPv4 and IPv6 tenant routes.
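The RD scheme in step 1 can be sketched as follows. The function name is hypothetical; router_id is assumed to be the leaf's own loopback/router-ID address, as in the examples above:

```python
def tenant_rd(router_id: str, tenant_nn: int) -> str:
    """Build a per-leaf RD in the IPv4_Address:nn format used in this
    design: the leaf's router ID plus a tenant-unique number that is
    reused on every leaf where the tenant is provisioned."""
    return f"{router_id}:{tenant_nn}"
```

For example, on a leaf with router ID 10.1.1.11, a tenant with nn 101 would get RD 10.1.1.11:101, matching the leaf1 example above.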
In the configuration templates below, the following tenant profile is enabled on a leaf:
• Name: vrf101
• Route-target 101:101
This is the routing interface for the Integrated Routing and Bridging (IRB) operation on the leaf. The bridge-domain is equivalent to an underlying VLAN
for the L3 interface.
Bridge-domain for the L3 VNI

bridge-domain 3001                           Bridge domain used for the L3 VNI
router-interface ve 3001                     Specify the router interface for this bridge domain
!
interface Ve 3001                            Router interface for the L3 VNI
vrf forwarding vrf101                        Associate the interface with the tenant VRF
ipv6 address use-link-local-only             Enable IPv6 forwarding on this interface
no shutdown
!
vlan 101
description L2 Tenant Vlan; Tenant VRF vrf101
router-interface Ve 101                      Routing interface for the VLAN
suppress-nd                                  Enable ARP/ND suppression
suppress-arp
!
interface Ve 101                             VLAN subnet belongs to the tenant vrf101
vrf forwarding vrf101
ip anycast-address 10.0.101.254/24           IPv4 anycast gateway address; same on all
ipv6 anycast-address fdf8:10:0:65::254/96    leafs where this VLAN is extended (likewise
no shutdown                                  the IPv6 anycast gateway address)
!
The next step is to enable this VLAN on the server-facing port of the leaf.
• The MTU for these links is set to the default: 1500 bytes.
Note: If there are L2 switches or bridges between the leaf and the servers, spanning tree protocol must be enabled. If there is a possibility of bridges being enabled inadvertently below the leaf nodes, we recommend enabling spanning tree and configuring the server ports as edge ports.
interface Ethernet 0/34                      Enable as a trunk port; add the
switchport                                   required VLANs to the trunk
switchport mode trunk
switchport trunk allowed vlan add 101
switchport trunk tag native-vlan
spanning-tree shutdown
no shutdown
The RD and RT configuration is set to auto in this design for simplicity, which is suitable for most deployments. Advanced users may define a different RD and RT scheme. User-defined RD/RT is not covered in this document.
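As a rough illustration of how auto-derived values are typically formed (the exact scheme is platform-specific; the helper names and the ASN below are hypothetical, not taken from this design):

```python
def auto_rd(router_id: str, instance: int) -> str:
    # RDs are commonly derived as <router-id>:<instance>.
    return f"{router_id}:{instance}"

def auto_rt(asn: int, vni: int) -> str:
    # A common auto-RT scheme is <ASN>:<VNI>; an "ignore-as" variant keeps
    # the value stable across an eBGP fabric where every leaf has its own AS.
    return f"{asn}:{vni}"

print(auto_rd("10.1.1.11", 101))  # 10.1.1.11:101
print(auto_rt(65001, 101))        # 65001:101
```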
For L3 symmetric IRB, the bridge-domain associated with the L3 VNI of the tenant VRF is added to the EVPN instance.
! Enable the server VLAN's VNI in the EVPN instance; this enables L2 extension
! of the workload VLAN. For any additional VLANs, add the respective VNIs.
! Enable the bridge-domain used for the L3 VNI of the tenant VRF; for additional
! tenant VRFs, add the respective L3 VNI bridge-domains.
!
! IPv6
router bgp
 address-family ipv6 unicast vrf vrf101   ! Activate layer-3 route exchange for the tenant
  redistribute connected                  ! Advertise the connected subnets inside the
  maximum-paths 8                         ! tenant VRF - the VLAN subnets in this case
 !
The configuration is broadly divided into two blocks:
• MCT cluster configuration, shown in the section Fabric Infrastructure Configuration > SLX-9140 MCT pair as Leaf.
• MCT tenant VLAN, VRF, and L2/L3 extension over the VxLAN overlay.
The configuration of the two switches in the dual-ToR MCT pair is shown side by side for comparison.
• The Loopback 1 interface has a unique IP address on each node; this is used as the IP router ID for the node.
• The Loopback 2 interface has the same IP address on both nodes; this is used as the VTEP IP under the overlay-gateway. Refer to the section Fabric Infrastructure Configuration > SLX-9140 MCT pair as Leaf.
Conversational Learning
Anycast gateway
MC-LAG or Dual-Homed Server Port-Channel
This section shows the creation of dual-homed server connectivity using an MLAG. Here a server is connected to the dual ToR over a two-port bundle, or LAG. This LAG is deployed, or activated, in the MCT cluster. The MCT client number identifies this particular bundle as dual-homed between the two switches in the MCT cluster.
Tenant VRF and Layer3 Extension
5-Stage Fabric
This configuration is applicable to the model shown in Figure 16, where eBGP is used as the control protocol for the underlay. A 5-stage fabric includes multiple 3-stage PoDs interconnected by a set of super-spines. This section includes the incremental configuration required on top of a 3-stage fabric. The configuration templates shown for each PIN in a PoD, and for the super-spines, can be used as references and replicated across devices in multiple PoDs.
The fabric link and router-ID configurations for super-spines are similar to those on the spines. Super-spines also do not act as VTEPs, so they do not need a VTEP IP address. Refer to Fabric Infrastructure Configuration under the 3-Stage Fabric section.
• The super-spine tier is in one AS; that is, every super-spine is configured with the same BGP AS number.
• Each spine in a PoD is connected to every super-spine with IPv4 fabric links.
• Each spine in a PoD has eBGP peering with every super-spine and exchanges both underlay and overlay routes.
• Each edge leaf is connected to every super-spine with IPv4 fabric links.
• Edge leafs exchange IPv4 underlay and overlay routes with the super-spines.
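The AS-plan rules listed above can be expressed as a quick sanity check (a hypothetical helper; the ASNs are illustrative):

```python
# Hypothetical sanity check for the 5-stage AS plan: the super-spine tier
# shares one AS, each PoD's spine tier shares one AS, and those tiers' ASes
# never collide with each other or with any leaf AS.
def check_as_plan(superspine_as, spine_as_by_pod, leaf_as_by_pod):
    for pod, spine_as in spine_as_by_pod.items():
        if spine_as == superspine_as:
            raise ValueError(f"{pod}: spine tier must differ from super-spine AS")
        for leaf_as in leaf_as_by_pod.get(pod, []):
            if leaf_as in (spine_as, superspine_as):
                raise ValueError(f"{pod}: leaf AS collides with a fabric tier")
    return True

# ASNs from the private 32-bit range used in this design (values illustrative).
print(check_as_plan(
    4200000010,
    {"pod1": 4200000000, "pod2": 4200002000},
    {"pod1": [4200000001, 4200000003], "pod2": [4200002001]},
))  # True
```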
Spine Configuration
This configuration applies to the spines, which exchange both IPv4 and EVPN routes with the leafs and super-spines.
• Configure the directly connected leafs' IP addresses in one peer group: leaf-group.
• Configure the directly connected super-spine IPs in another peer group: superspine-group.
• Enable the IPv4 address family. Both peer groups are activated by default for this AFI.
• Enable the L2VPN/EVPN AFI and activate both the leaf-group and superspine-group peer groups.
router bgp
local-as 4200000000
capability as4-enable
fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3
!
neighbor leaf-group peer-group
neighbor leaf-group description To leaf/TOR
neighbor leaf-group password <password>
neighbor leaf-group bfd
!
neighbor 10.0.1.0 remote-as 4200000001
neighbor 10.0.1.0 peer-group leaf-group
neighbor 10.0.2.0 remote-as 4200000001
neighbor 10.0.2.0 peer-group leaf-group
neighbor 10.0.3.0 remote-as 4200000003
neighbor 10.0.3.0 peer-group leaf-group
neighbor 10.0.4.0 remote-as 4200000004
neighbor 10.0.4.0 peer-group leaf-group
neighbor 10.0.5.0 remote-as 4200000004
neighbor 10.0.5.0 peer-group leaf-group
neighbor 10.0.6.0 remote-as 4200000006
neighbor 10.0.6.0 peer-group leaf-group
!
neighbor superspine-group peer-group
neighbor superspine-group remote-as 4200000010
neighbor superspine-group password <password>
neighbor superspine-group bfd
!
neighbor 10.0.51.0 peer-group superspine-group
neighbor 10.0.52.0 peer-group superspine-group
!
address-family ipv4 unicast
maximum-paths 8
graceful-restart
!
address-family l2vpn evpn
graceful-restart
retain route-target all
!
neighbor leaf-group encapsulation vxlan
neighbor leaf-group next-hop-unchanged
neighbor leaf-group activate
!
neighbor superspine-group encapsulation vxlan
neighbor superspine-group next-hop-unchanged
neighbor superspine-group activate
!
!
Super-Spine Configuration
• Create a peer group for each PoD:
o pod1-spine-group: add the directly connected neighbor addresses of all spines in PoD1 to this group.
o pod2-spine-group: add the directly connected neighbor addresses of all spines in PoD2 to this group.
• Create a separate peer group, edge-leaf, for the edge leafs. Add the directly connected neighbor addresses of the edge leafs to this group.
• Enable the IPv4 AFI. All peer groups are activated under this AFI by default.
• Enable the L2VPN/EVPN AFI and activate all three peer groups.
router bgp
local-as 4200000010
capability as4-enable
fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3
!
neighbor edge-leaf peer-group
neighbor edge-leaf remote-as 4200007000
neighbor edge-leaf password <password>
neighbor edge-leaf bfd
!
neighbor 10.0.61.0 peer-group edge-leaf
neighbor 10.0.62.0 peer-group edge-leaf
!
neighbor pod1-spine-group peer-group
neighbor pod1-spine-group remote-as 4200000000
neighbor pod1-spine-group password <password>
neighbor pod1-spine-group bfd
!
neighbor 10.0.51.1 peer-group pod1-spine-group
neighbor 10.0.52.1 peer-group pod1-spine-group
neighbor 10.0.53.1 peer-group pod1-spine-group
neighbor 10.0.54.1 peer-group pod1-spine-group
!
neighbor pod2-spine-group peer-group
neighbor pod2-spine-group remote-as 4200002000
neighbor pod2-spine-group password <password>
neighbor pod2-spine-group bfd
!
neighbor 10.22.1.9 peer-group pod2-spine-group
neighbor 10.22.1.11 peer-group pod2-spine-group
neighbor 10.22.1.13 peer-group pod2-spine-group
neighbor 10.22.1.15 peer-group pod2-spine-group
!
address-family ipv4 unicast
maximum-paths 8
graceful-restart
!
address-family l2vpn evpn
graceful-restart
retain route-target all
neighbor pod2-spine-group encapsulation vxlan
neighbor pod2-spine-group next-hop-unchanged
neighbor pod2-spine-group activate
neighbor pod1-spine-group encapsulation vxlan
neighbor pod1-spine-group next-hop-unchanged
neighbor pod1-spine-group activate
neighbor edge-leaf encapsulation vxlan
neighbor edge-leaf next-hop-unchanged
neighbor edge-leaf activate
!
!
Edge-Leaf Configuration
The configuration of edge or border leafs is similar to that of leafs. They peer with the super-spines and exchange both IPv4 and EVPN routes with them.
• Configure another peer group, super-spine-group. Add the super-spine addresses to this group. These super-spines exchange both IPv4 and EVPN routes.
• In addition, the border-leaf may be configured with peering to WAN Edge or DCI Edge devices.
router bgp
local-as 4200000061
capability as4-enable
fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3
!
neighbor super-spine-group peer-group
neighbor super-spine-group remote-as 4200000010
neighbor super-spine-group description To super spines
neighbor super-spine-group password <password>
neighbor super-spine-group bfd
!
neighbor 10.0.61.1 peer-group super-spine-group
neighbor 10.0.61.3 peer-group super-spine-group
!
address-family ipv4 unicast
network 10.61.1.1/32
maximum-paths 8
graceful-restart
!
address-family l2vpn evpn
graceful-restart
neighbor super-spine-group encapsulation vxlan
neighbor super-spine-group next-hop-unchanged
neighbor super-spine-group activate
!
Use Cases
This section illustrates the use cases using sections of the validated design network topology as appropriate, to help the reader further understand the deployment scenarios.
Simple 3-stage BGP VxLAN Based EVPN Fabric Illustration
This case is included to illustrate a complete fabric by putting together all the building blocks. As shown in the diagram below, it includes two spines and three workload racks. To illustrate the various platforms that can be used as a leaf, we have chosen an MCT pair based on SLX-9140 switches and a vLAG pair based on VDX-6740 switches. We also include a stand-alone SLX-9140 leaf.
Note that the number of spines depends on the oversubscription ratio desired in the fabric.
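As a rough aid for sizing the spine tier (the port counts and speeds below are illustrative, not from the validated design), the leaf oversubscription ratio is server-facing capacity divided by fabric-facing capacity:

```python
def oversubscription(downlinks: int, downlink_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Leaf oversubscription = server-facing capacity / fabric-facing capacity."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)

# Example: 48 x 10G server ports and 4 x 100G uplinks (one per spine).
print(oversubscription(48, 10, 4, 100))  # 1.2
```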
Configuration
Configuration on the SLX-9240 Spines
Note: Spines require only the underlay configuration.
UNDERLAY CONFIGURATION
! EVPN address-family configuration, identical on both spines:
 retain route-target all
 neighbor leaf-group encapsulation vxlan
 neighbor leaf-group next-hop-unchanged
 neighbor leaf-group activate
UNDERLAY CONFIGURATION
SLX: MCT Leaf peer1

! BGP configuration for underlay and MCT
router bgp
 local-as 4200000001
 capability as4-enable
 fast-external-fallover
 bfd interval 300 min-rx 300 multiplier 3
 neighbor evpn-spine peer-group
 neighbor evpn-spine remote-as 4200000000
 neighbor evpn-spine description To spine
 neighbor evpn-spine password password
 neighbor evpn-spine bfd
 neighbor 10.0.1.1 peer-group evpn-spine
 neighbor 10.0.1.3 peer-group evpn-spine
 neighbor 10.0.1.9 remote-as 4200000001
 neighbor 10.0.1.9 bfd
 address-family ipv4 unicast
  network 10.1.1.1/32
  maximum-paths 8
  no neighbor 10.0.1.9 activate
 address-family l2vpn evpn
  neighbor 10.0.1.9 encapsulation nsh
  neighbor 10.0.1.9 activate

SLX: MCT Leaf peer2 is identical, except that the spine-facing neighbors are 10.0.2.1 and 10.0.2.3, and the MCT peer address is 10.0.1.8.
OVERLAY CONFIGURATION

SLX: MCT Leaf peer1

! LAG towards Host
interface Ethernet 0/1
 description MCT lag member to Server Racks
 channel-group 101 mode active type standard
 no shutdown
interface Port-channel 101
 description MCT LAG
 speed 10000
 switchport
 switchport mode trunk-no-default-native
 switchport trunk allowed vlan add 101,131
 no shutdown
cluster pod1-cluster 1
 client MCT 101
  client-interface Port-channel 101
  deploy
interface Ve 3001
 vrf forwarding vrf101
 ipv6 address use-link-local-only
 no shutdown
interface Ve 101
 vrf forwarding vrf101
 ip anycast-address 10.0.101.254/24
 ip arp learn-any
 ipv6 anycast-address fdf8:10:0:65::254/96
 no shutdown
interface Ve 131
 vrf forwarding vrf101
 ip anycast-address 10.0.131.254/24
 ip arp learn-any
 ipv6 anycast-address fdf8:10:0:83::254/96
 no shutdown
!
router bgp
 address-family ipv4 unicast vrf vrf101
  redistribute connected
  maximum-paths 8
 address-family ipv6 unicast vrf vrf101
  redistribute connected
  maximum-paths 8
 address-family l2vpn evpn
  neighbor evpn-spine encapsulation vxlan
  neighbor evpn-spine activate
  neighbor evpn-spine enable-peer-as-check
!!! User must clear BGP sessions from exec-prompt
!! "clear ip bgp neighbor all"

SLX: MCT Leaf peer2 is identical to peer1.

SLX: non-MCT Leaf

! Ethernet Interface towards Host
interface Ethernet 0/1
 switchport
 switchport mode trunk-no-default-native
 switchport trunk allowed vlan add 101,331
 no shutdown
interface Ve 3001
 vrf forwarding vrf101
 ipv6 address use-link-local-only
 no shutdown
interface Ve 101
 vrf forwarding vrf101
 ip anycast-address 10.0.101.254/24
 ip arp learn-any
 ipv6 anycast-address fdf8:10:0:65::254/96
 no shutdown
interface Ve 331
 vrf forwarding vrf101
 ip anycast-address 10.1.75.254/24
 ip arp learn-any
 ipv6 anycast-address fdf8:10:0:14b::254/96
 no shutdown
!
router bgp
 address-family ipv4 unicast vrf vrf101
  redistribute connected
  maximum-paths 8
 address-family ipv6 unicast vrf vrf101
  redistribute connected
  maximum-paths 8
 address-family l2vpn evpn
  neighbor evpn-spine encapsulation vxlan
  neighbor evpn-spine activate
  neighbor evpn-spine next-hop-unchanged
!!! User must clear BGP sessions from exec-prompt
!! "clear ip bgp neighbor all"
Configuration on the VDX-6740 Leaf vLAG pair
UNDERLAY CONFIGURATION
!! Configuring vcs-id and rbridge-id is not a requirement if both vLAG peers
!! are pre-configured to be in the same VCS fabric.
!! The box requires a reboot after this command.
vcs vcsid 1 rbridge-id 1 logical-chassis enable
!! After the VCS fabric is up, configuration is expected to be performed from
!! the primary node.
!! ISL trunks between vLAG switches
interface FortyGigabitEthernet 1/0/51
 description vLAG ISL
 fabric isl enable
 fabric trunk enable
 no shutdown
interface FortyGigabitEthernet 1/0/52
 description vLAG ISL
 fabric isl enable
 fabric trunk enable
 no shutdown

The second vLAG peer is identical, with rbridge-id 2 and ISL interfaces FortyGigabitEthernet 2/0/51 and 2/0/52.
!! BGP underlay config
rbridge-id 1
 router bgp
  local-as 4200000006
  capability as4-enable
  fast-external-fallover
  neighbor evpn-spine peer-group
  neighbor evpn-spine remote-as 4200000000
  neighbor evpn-spine password password
  neighbor evpn-spine bfd
  neighbor 10.0.6.1 peer-group evpn-spine
  neighbor 10.0.6.3 peer-group evpn-spine
  address-family ipv4 unicast
   maximum-paths 8
   network 10.1.6.1/32

On rbridge-id 2 the configuration is identical, except that the spine-facing neighbors are 10.0.7.1 and 10.0.7.3.
OVERLAY CONFIGURATION

rbridge-id 1
 ip anycast-gateway-mac 0201.0101.0101
 ipv6 anycast-gateway-mac 0201.0102.0202
rbridge-id 1
 vrf vrf101
  rd 10.1.6.11:1
  vni 7097
  address-family ipv4 unicast
   route-target export 101:101 evpn
   route-target import 101:101 evpn
  address-family ipv6 unicast
   route-target export 101:101 evpn
   route-target import 101:101 evpn
 evpn-instance pod1-vdx
  route-target both auto ignore-as
  rd auto
  vni add 101,431
rbridge-id 1
 interface Ve 7097
  vrf forwarding vrf101
  ipv6 address use-link-local-only
  no shutdown
 interface Ve 101
  vrf forwarding vrf101
  ip anycast-address 10.0.101.254/24
  ip arp learn-any
  ip arp-aging-timeout 25
  ipv6 anycast-address fdf8:10:0:65::254/96
  no shutdown
 interface Ve 431
  vrf forwarding vrf101
  ipv6 anycast-address fdf8:10:0:1af::254/96
  ip anycast-address 10.1.175.254/24
  ip arp learn-any
  ip arp-aging-timeout 25
  no shutdown
rbridge-id 1
 router bgp
  address-family ipv4 unicast vrf vrf101
   redistribute connected
   maximum-paths 8
  address-family ipv6 unicast vrf vrf101
   redistribute connected
   maximum-paths 8
  address-family l2vpn evpn
   neighbor evpn-spine activate
   neighbor evpn-spine allowas-in 1
   neighbor evpn-spine enable-peer-as-check
!!! User must clear BGP sessions from exec-prompt
!! "clear ip bgp neighbor all"

On rbridge-id 2 the configuration is identical, except that the VRF RD is 10.1.6.12:1.
Verification
Verification after Underlay and Spine Configuration
The BGP sessions to the spines should be in the established state:
SLX-Leaf1-1# show ip bgp summary
BGP4 Summary
Router ID: 10.1.1.11 Local AS Number: 4200000001
Confederation Identifier: not configured
Confederation Peers:
Maximum Number of IP ECMP Paths Supported for Load Sharing: 8
Number of Neighbors Configured: 2, UP: 2
Number of Routes Installed: 3, Uses 324 bytes
Number of Routes Advertising to All Neighbors: 3 (2 entries), Uses 104 bytes
Number of Attribute Entries Installed: 3, Uses 342 bytes
Neighbor Address AS# State Time Rt:Accepted Filtered Sent ToSend
10.0.1.1 4200000000 ESTAB 0h10m44s 1 0 1 0
10.0.1.3 4200000000 ESTAB 0h10m44s 1 0 2 0
SLX-Leaf1-1#
The cluster client BGP session should be up:
SLX-Leaf1-1# show bgp evpn summary
BGP4 Summary
Router ID: 10.1.1.11 Local AS Number: 4200000001
Confederation Identifier: not configured
Confederation Peers:
Maximum Number of IP ECMP Paths Supported for Load Sharing: 1
Number of Neighbors Configured: 1, UP: 1
Number of Routes Installed: 0
Number of Routes Advertising to All Neighbors: 0 (0 entries)
Number of Attribute Entries Installed: 0
Neighbor Address AS# State Time Rt:Accepted Filtered Sent ToSend
10.0.1.9 4200000001 ESTAB 0h10m 9s 0 0 0 0
SLX-Leaf1-1#
SLX-Leaf1-1# show vlan 4090
VLAN Name State Ports Classification
(R)-RSPAN (u)-Untagged
(t)-Tagged
================ =============== ========================== =============== ====================
4090 VLAN4090 ACTIVE Po 1(t)
BGP route output for the remote VTEP IP addresses, taken from Leaf3:
SLX-Leaf3# show ip route bgp
IP Routing Table for VRF "default-vrf"
Total number of IP routes: 7
'*' denotes best ucast next-hop
'[x/y]' denotes [preference/metric]
10.1.1.1/32
 *via 10.0.3.1, Eth 0/49, [20/0], 30m31s, eBgp, tag 0
 *via 10.0.3.3, Eth 0/50, [20/0], 30m31s, eBgp, tag 0
10.1.6.1/32
 *via 10.0.3.1, Eth 0/49, [20/0], 1h11m, eBgp, tag 0
 *via 10.0.3.3, Eth 0/50, [20/0], 1h11m, eBgp, tag 0
SLX-Leaf3#
Verify that the overlay-gateway is up on all leafs and that the BGP neighbors are established. If things are in order, VxLAN tunnels to the neighbors should be up. Execute these commands on all leafs.
SLX-Leaf1-1# show bgp evpn summ
BGP4 Summary
Router ID: 10.1.1.11 Local AS Number: 4200000001
Confederation Identifier: not configured
Confederation Peers:
Maximum Number of IP ECMP Paths Supported for Load Sharing: 1
Number of Neighbors Configured: 3, UP: 3
Number of Routes Installed: 125, Uses 13500 bytes
Number of Routes Advertising to All Neighbors: 81 (57 entries), Uses 2964 bytes
Number of Attribute Entries Installed: 112, Uses 12768 bytes
Neighbor Address AS# State Time Rt:Accepted Filtered Sent ToSend
10.0.1.1 4200000000 ESTAB 4h44m43s 31 24 24 0
10.0.1.3 4200000000 ESTAB 4h44m43s 31 24 24 0
10.0.1.9 4200000001 ESTAB 6h 4m27s 30 0 33 0
SLX-Leaf1-1#
SLX-Leaf1-1# show tunnel br
Tunnel 61441, mode VXLAN, node-ids 1-2
Admin state up, Oper state up
Source IP 10.1.1.1, Vrf default-vrf
Destination IP 10.1.3.1
On the MCT leaf, the cluster should be deployed and the cluster clients should be up:
SLX-Leaf1-1# show cluster
Cluster pod1-cluster 1
================
Cluster State: Deployed
Client Isolation Mode: Loose
DF Hold Time: 3
Configured Member Vlan Range: 101,131
Active Member Vlan Range: 101,131
Cluster Control Vlan: 4090
Configured Member BD Range: 3001
Active Member BD Range: 3001
No. of Peers: 1
No. of Clients: 3
Peer Info:
==========
Peer IP: 10.0.1.9, State: Up
Peer Interface: Port-channel 1
ICL Tunnel Type: NSH, State: Up
Client Info:
============
Name Id ESI Interface Local/Remote State
---- -- --- --------- ------------------
MCT 101 0:0:0:0:0:0:0:1:0:65 Port-channel 101 Up / Up
tu61441 2064 0:0:a:1:3:1:a:1:1:1 Tunnel-61441 Up / Up
tu61442 2065 0:0:a:1:6:1:a:1:1:1 Tunnel-61442 Up / Up
SLX-Leaf1-1#
VLAN 101 is an extended L2 tenant, so the output shows the VTEP endpoints where the VLAN is extended:
SLX-Leaf1-1# show vlan brief
Total Number of VLANs configured : 4
VLAN Name State Ports Classification
(R)-RSPAN (u)-Untagged
(t)-Tagged
================ =============== ========================== =============== ====================
1 default INACTIVE(no member port)
101 VLAN0101 ACTIVE Po 1(t)
Po 101(t)
Tu 61441(t) vni 101
Tu 61442(t) vni 101
SLX-Leaf1-1#
SLX-Leaf1-1# show ip arp suppression-cache
Flags: L - Locally Learnt Adjacency
R - Remote Learnt Adjacency
RS - Remote Static Adjacency
Vlan/Bd IP Mac Interface Age Flags
---------------------------------------------------------------------------------------------------
0101 (V) 10.0.101.3 50eb.1a95.6bf7 Po 101 00:23:04 L
0101 (V) 10.0.101.101 0011.9400.044d Po 101 00:00:41 L
0101 (V) 10.0.101.102 0011.9400.044e Po 101 00:00:39 L
0101 (V) 10.0.101.201 0010.9400.1b96 Tu 61441 (10.1.3.1) Never R
0101 (V) 10.0.101.202 0010.9400.1b97 Tu 61441 (10.1.3.1) Never R
0101 (V) 10.0.101.211 0010.9400.2046 Tu 61442 (10.1.6.1) Never R
0101 (V) 10.0.101.212 0010.9400.2047 Tu 61442 (10.1.6.1) Never R
0131 (V) 10.0.131.101 0011.9400.04e3 Po 101 00:00:41 L
0131 (V) 10.0.131.102 0011.9400.04e4 Po 101 00:00:39 L
SLX-Leaf1-1#
SLX-Leaf1-1# show ipv6 nd suppression-cache
Flags: L - Locally Learnt Adjacency
R - Remote Learnt Adjacency
RS - Remote Static Adjacency
Vlan/Bd IP Mac Interface Age Flags
-------------------------------------------------------------------------------------------------------------------------
0101 (V) fdf8:10:0:65::101 0011.9400.044d Po 101 00:15:23 L
0101 (V) fdf8:10:0:65::102 0011.9400.044e Po 101 00:15:21 L
0101 (V) fdf8:10:0:65::201 0010.9400.1b96 Tu 61441 (10.1.3.1) Never R
0101 (V) fdf8:10:0:65::202 0010.9400.1b97 Tu 61441 (10.1.3.1) Never R
0101 (V) fdf8:10:0:65::211 0010.9400.2046 Tu 61442 (10.1.6.1) Never R
0101 (V) fdf8:10:0:65::212 0010.9400.2047 Tu 61442 (10.1.6.1) Never R
...
SLX-Leaf1-1#
Host ARP and IPv6 ND entries learned locally and over BGP from remote VTEPs are shown in the outputs above.
L2 and L3 Extension between Racks
Figure 18 shows a section of the topology to illustrate the following with configuration and verification. Two racks are shown in the diagram.
• Rack1 has a redundant MCT ToR, leaf1-1 and leaf1-2, referred to as leaf1 collectively.
• The tenant has two server VLANs 101 and 131 mapped to VNIs 101 and 131 respectively.
• Tenant VRF is provisioned with a L3 VNI for routing between VLANs. This is also auto mapped based on the Bridge-domain identifier. (For BD,
VNI = 4096 + BD-Number)
• Server VLAN 101 is extended (L2 extension) between these two racks. VLAN/VNI 101 is provisioned on both racks, and there are hosts on
these racks.
• Server VLAN 131 is a VLAN provisioned on Rack1 only, but it belongs to the same tenant. Routing between VNI 101 and 131 is required within
this tenant both in the same rack and across the racks (L3 extension).
• This example also illustrates the symmetric and asymmetric routing operation.
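The automatic bridge-domain-to-VNI mapping noted above (VNI = 4096 + BD number) can be sketched as follows (the range check is illustrative):

```python
def bd_to_l3_vni(bd: int) -> int:
    """Auto-derived L3 VNI for a bridge-domain: VNI = 4096 + BD number."""
    if bd < 1:
        raise ValueError("bridge-domain ID must be positive")
    return 4096 + bd

# Bridge-domain 3001, the tenant VRF's IRB bridge-domain, maps to L3 VNI 7097.
print(bd_to_l3_vni(3001))  # 7097
```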
The configuration is identical on each leaf except for the VTEP IP, router ID, and RD. The vLAG pair is represented with one VTEP IP address. Using anycast gateway addresses for the server-facing VLAN interfaces drastically simplifies the configuration.
Figure 18 L2/L3 Extension between Racks
Configuration
Check the MCT configuration on Leaf1
In SLX-OS, the MCT cluster configuration is done independently on each node, while the logical VTEP configuration is done only on the principal node of the cluster.
For this MCT pair, Leaf1-1 is the principal node for cluster management. The cluster configuration is done on both Leaf1-1 and Leaf1-2, but for cluster services such as the logical VTEP, the EVPN overlay-gateway configuration is done on the principal node.
Peer Info:
==========
Peer IP: 10.0.1.9, State: Up
Peer Interface: Port-channel 1
ICL Tunnel Type: NSH, State: Up
Client Info:
============
Name Id ESI Interface Local/Remote State
---- -- --- --------- ------------------
MCT 1 0:0:0:0:0:0:0:1:0:1 Port-channel 101 Up / Up
MCT 4 0:0:0:0:0:0:0:1:0:4 Port-channel 104 Up / Up
The configuration is largely identical on each node except for the router ID and the RD of the tenant VRF. This makes it easier to automate provisioning across the various nodes.
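Because only the router ID and the tenant-VRF RD vary per node, the per-leaf portion can be generated from a shared template; a minimal sketch (the template text is illustrative):

```python
# Shared per-leaf template; only the router ID (and the RD derived from it)
# changes from node to node.
TEMPLATE = """\
ip router-id {router_id}
vrf vrf101
 rd {router_id}:101
"""

def render_leaf(router_id: str) -> str:
    # Fill the per-node values into the shared template.
    return TEMPLATE.format(router_id=router_id)

print(render_leaf("10.1.1.11"))
```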
Tenant and L2 Extension Between Racks in a 3-Stage Clos Fabric
!MCT pair
vlan 101
description VLAN 101, VNI 101, Tenant vrf101
router-interface Ve 101
!
vlan 131
description VLAN 131, VNI 131, Tenant vrf101
router-interface Ve 131
!
bridge-domain 3001
description VLAN 3001, L3 VNI 7097, Tenant vrf101
router-interface Ve 3001
!
interface Port-channel 101
switchport trunk allowed vlan add 101
!
interface Port-channel 104
switchport trunk allowed vlan add 131
!
The following is Leaf1-1; Leaf1-2 is identical except that Loopback 1 is 10.1.1.12/32, the router ID is 10.1.1.12, and the tenant-VRF RD is 10.1.1.12:101.

interface Loopback 1
 ip address 10.1.1.11/32
 no shutdown
!
interface Loopback 2
 ip address 10.1.1.1/32
 no shutdown
!
ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202
ip router-id 10.1.1.11
!
vrf vrf101
 rd 10.1.1.11:101
 evpn irb ve 3001
 address-family ipv4 unicast
  route-target export 101:101 evpn
  route-target import 101:101 evpn
 !
 address-family ipv6 unicast
  route-target export 101:101 evpn
  route-target import 101:101 evpn
!
evpn default
 route-target both auto ignore-as
 rd auto
 duplicate-mac-timer 5 max-count 3
 bridge-domain add 3001
 vlan add 101,131
!
router bgp
 address-family ipv4 unicast vrf vrf101
  redistribute connected
  maximum-paths 8
 !
 address-family ipv6 unicast vrf vrf101
  redistribute connected
  maximum-paths 8
 !
!
interface Ve 101
 vrf forwarding vrf101
 ip anycast-address 10.0.101.254/24
 ip arp learn-any
 ipv6 anycast-address fdf8:10:0:65::254/96
 no shutdown
!
interface Ve 131
 vrf forwarding vrf101
 ip anycast-address 10.0.131.254/24
 ip arp learn-any
 ipv6 anycast-address fdf8:10:0:83::254/96
 no shutdown
!
interface Ve 3001
 vrf forwarding vrf101
 ipv6 address use-link-local-only
 no shutdown
!
Note that the overlay-gateway configuration is applied from the principal node of the two-node MCT pair.
overlay-gateway mct-leaf
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Configuration on Leaf3
vlan 101
description VLAN 101, VNI 101, Tenant vrf101
router-interface Ve 101
!
bridge-domain 3001
description VLAN 3001, L3 VNI 7097, Tenant vrf101
router-interface Ve 3001
!
interface Ethernet 0/34
switchport trunk allowed vlan add 101
!
interface Loopback 1
ip address 10.1.3.2/32
no shutdown
!
interface Loopback 2
ip address 10.1.3.1/32
no shutdown
!
ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202
ip router-id 10.1.3.2
!
vrf vrf101
rd 10.1.3.2:101
evpn irb ve 3001
address-family ipv4 unicast
route-target export 101:101 evpn
route-target import 101:101 evpn
!
address-family ipv6 unicast
route-target export 101:101 evpn
route-target import 101:101 evpn
!
evpn default
route-target both auto ignore-as
rd auto
duplicate-mac-timer 5 max-count 3
bridge-domain add 3001
vlan add 101
!
router bgp
address-family ipv4 unicast vrf vrf101
redistribute connected
maximum-paths 8
!
address-family ipv6 unicast vrf vrf101
redistribute connected
maximum-paths 8
!
!
interface Ve 101
vrf forwarding vrf101
ip anycast-address 10.0.101.254/24
ipv6 anycast-address fdf8:10:0:65::254/96
no shutdown
!
interface Ve 3001
vrf forwarding vrf101
ipv6 address use-link-local-only
no shutdown
!
overlay-gateway leaf3
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Verification
Verify VLAN Extension between the Racks
Check the L2 extended VLAN on each node. This should show the local L2 trunk ports and also the tunnels to all remote VTEPs where the same VLAN
segment is extended.
In the following output from the Leaf1 MCT peer, there are five tunnels for VLAN 101, which indicates that the same VLAN/VNI segment is
provisioned on five other VTEPs or ToRs. Note that one of the tunnels, Tu 61446, is destined to Leaf3. Also note that there are four underlay next
hops to reach this tunnel destination in the fabric.
POD1-Leaf1-1#
In the following output shown from Leaf3, Tunnel 61441 is destined to the vLAG Leaf1 pair's VTEP IP: 10.1.1.1.
POD1-Leaf3#
POD1-Leaf1-2# show ip int ve 101
Ve 101 is up protocol is up
Vlan is 101
Hardware is Virtual Ethernet, address is 609c.9fb0.f801
Current address is 609c.9fb0.f801
Interface index (ifindex) is 1207959653
Primary Internet Address is 10.0.101.254/24 broadcast is 10.0.101.255
IP MTU is 9000
...
Vrf : vrf101
POD1-Leaf1-2# show ip int ve 131
Ve 131 is up protocol is up
Vlan is 131
Hardware is Virtual Ethernet, address is 609c.9fb0.f801
Current address is 609c.9fb0.f801
Interface index (ifindex) is 1207959683
Primary Internet Address is 10.0.131.254/24 broadcast is 10.0.131.255
IP MTU is 9000
...
Vrf : vrf101
POD1-Leaf1-2# show ip int ve 3001
Ve 3001 is up protocol is up
Bridge domain is 3001
Hardware is Virtual Ethernet, address is 609c.9fb0.f801
Current address is 609c.9fb0.f801
Interface index (ifindex) is 1207961553
IP unassigned
IP MTU is 9000
...
Vrf : vrf101
Local Host Entries on Each Leaf
Depending on the port-channel hashing on server-facing links, the ARP entries may be learned on any of the nodes in the MCT pair. Make sure that all
host entries are learned collectively in the MCT pair.
Remote Host Entries in the Extended VLAN
The following output from Leaf3 shows the BGP and ARP entries of the remote hosts behind the Leaf1 pair. Note that the next hop is set to 10.1.1.1, which is the common VTEP IP of the vLAG pair. This makes the redundant leaf pair appear as a single VTEP in the underlay network, so load balancing across both nodes is achieved.
In the ARP suppression-cache, local and remote entries are indicated with different flags: an "R" on remote entries signifies that they were learned over BGP EVPN, and local entries are marked with "L".
POD1-Leaf3# show bgp evpn routes type arp 10.0.101.101 mac 0011.9400.044d ethernet-tag 0
Status A:AGGREGATE B:BEST b:NOT-INSTALLED-BEST C:CONFED_EBGP D:DAMPED
E:EBGP H:HISTORY I:IBGP L:LOCAL M:MULTIPATH m:NOT-INSTALLED-MULTIPATH
S:SUPPRESSED F:FILTERED s:STALE
1 Prefix: ARP:[0][0011.9400.044d]:[IPv4:10.0.101.101], Status: BE, Age: 3h50m24s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.1 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 59905:1073741925 RT 42000:1 RT 101:101
ExtCom:06:03:60:9c:9f:b0:f5:01 RT 59905:268435557 RT 59905:101 ExtCom:03:0c:00:00:00:00:00:08
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Adj_RIB_out count: 3, Admin distance 20
L2 Label: 101 L3 Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
ESI : 00.000000000000000000
RD: 10.1.1.11:32869
2 Prefix: ARP:[0][0011.9400.044d]:[IPv4:10.0.101.101], Status: E, Age: 3h50m24s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.3 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 59905:1073741925 RT 42000:1 RT 101:101
ExtCom:06:03:60:9c:9f:b0:f5:01 RT 59905:268435557 RT 59905:101 ExtCom:03:0c:00:00:00:00:00:08
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
L2 Label: 101 L3 Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
ESI : 00.000000000000000000
RD: 10.1.1.11:32869
POD1-Leaf3# show ip arp suppression-cache vlan 101
Flags: L - Locally Learnt Adjacency
R - Remote Learnt Adjacency
RS - Remote Static Adjacency
Vlan/Bd IP Mac Interface Age Flags
---------------------------------------------------------------------------------------------------
0101 (V) 10.0.101.103 0011.9400.044f Tu 61441 (10.1.1.1) Never R
0101 (V) 10.0.101.104 0011.9400.0450 Tu 61441 (10.1.1.1) Never R
0101 (V) 10.0.101.105 0011.9400.0451 Tu 61441 (10.1.1.1) Never R
0101 (V) 10.0.101.106 0011.9400.0452 Tu 61441 (10.1.1.1) Never R
0101 (V) 10.0.101.107 0011.9400.0453 Tu 61441 (10.1.1.1) Never R
0101 (V) 10.0.101.201 0010.9400.1b96 Eth 0/34 00:21:38 L
0101 (V) 10.0.101.202 0010.9400.1b97 Eth 0/34 00:21:38 L
As shown in Figure 18, VNI segment 131 is provisioned only on the MCT ToR but is part of the tenant on both ToRs. Let's go over the verification steps required to ensure communication between the hosts in VNI 101 on Leaf3 and the hosts in VNI 131 on the MCT Leaf1 pair.
RMAC of Each Node
There is one RMAC assigned to every VTEP. This information can be obtained by looking at any of the L3 interfaces or the L3 VNI's associated VLAN interface. For the MCT pair, even though both nodes share the same VTEP IP, each is assigned a unique router MAC.
L3 VNI State on the Nodes
Bridge-domain 3001 (VNI 7097) is assigned to the tenant VRF. Make sure that the MCT ToR and Leaf3 have tunnels established to each other and that this bridge-domain is activated on both.
As seen in the following output from Leaf1, the tunnel source is the VTEP IP of the vLAG, 10.1.1.1, and the destination IP is the VTEP IP of Leaf3, 10.1.3.1. (Notice the additional tunnels in the list; these are destined to other VTEPs where the same tenant is provisioned.)
POD1-Leaf1-1#
POD1-Leaf1-1# show tunnel 61446
Tunnel 61446, mode VXLAN, node-ids 1-2
Ifindex 0x7c00f006, Admin state up, Oper state up
Overlay gateway "mct-leaf", ID 1
Source IP 10.1.1.1, Vrf default-vrf
Destination IP 10.1.3.1
Configuration source BGP-EVPN
MAC learning BGP-EVPN
Active next hops on node 1:
IP: 10.0.1.5, Vrf: default-vrf
Egress L3 port: Eth 0/53, Outer SMAC: 609c.9fb0.f539
Outer DMAC: 609c.9fb0.ac05
Egress L2 Port: Eth 0/53, Outer ctag: 0, stag:0, Egress mode: Local
POD1-Leaf1-1#
L3 Bridge-domain/VNI state from Leaf3
The following output shows the BGP entries on Leaf3 for the remote subnet of VLAN/VNI 131.
There are eight entries in the BGP table: the two originators in the MCT pair, each learned from the four spines exchanging EVPN routes. Again, the next hop is the same for all paths due to the common VTEP IP used by the MCT pair.
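The entry count follows from a simple product, which can be sketched in Python (the node and spine labels here are illustrative, not hostnames from the fabric):

```python
from itertools import product

# The two route originators (the MCT pair) each advertise the prefix, and
# every spine re-advertises each originator's route to Leaf3.
originators = ["Leaf1-1", "Leaf1-2"]
spines = ["Spine1", "Spine2", "Spine3", "Spine4"]  # illustrative labels

# Leaf3 holds one BGP table entry per (originator, advertising spine) pair,
# all with the same next hop: the pair's shared VTEP IP.
entries = list(product(originators, spines))
print(len(entries))  # 8
```

Only the two best entries (one per originator router MAC) are installed in the routing table, as the route output later in this section shows.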
POD1-Leaf3# show bgp evpn routes type ipv4-prefix 10.0.131.0/24 tag 0
Status A:AGGREGATE B:BEST b:NOT-INSTALLED-BEST C:CONFED_EBGP D:DAMPED
E:EBGP H:HISTORY I:IBGP L:LOCAL M:MULTIPATH m:NOT-INSTALLED-MULTIPATH
S:SUPPRESSED F:FILTERED s:STALE
1 Prefix: IP4Prefix:[0][10.0.131.0/24], Status: BE, Age: 3d1h49m53s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.1 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 42000:1 RT 101:101 ExtCom:06:03:60:9c:9f:b0:f5:01 ExtCom:03:0d:00:00:00:00:00:00 RT
59905:1073748921 ExtCom:03:0c:00:00:00:00:00:08
Default Extd Gw Community: Received
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Adj_RIB_out count: 3, Admin distance 20
Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
RD: 10.1.1.11:1
2 Prefix: IP4Prefix:[0][10.0.131.0/24], Status: E, Age: 3d1h49m53s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.3 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 42000:1 RT 101:101 ExtCom:06:03:60:9c:9f:b0:f5:01 ExtCom:03:0d:00:00:00:00:00:00 RT
59905:1073748921 ExtCom:03:0c:00:00:00:00:00:08
Default Extd Gw Community: Received
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
RD: 10.1.1.11:1
3 Prefix: IP4Prefix:[0][10.0.131.0/24], Status: E, Age: 3d1h49m53s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.5 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 42000:1 RT 101:101 ExtCom:06:03:60:9c:9f:b0:f5:01 ExtCom:03:0d:00:00:00:00:00:00 RT
59905:1073748921 ExtCom:03:0c:00:00:00:00:00:08
Default Extd Gw Community: Received
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
RD: 10.1.1.11:1
4 Prefix: IP4Prefix:[0][10.0.131.0/24], Status: BE, Age: 2d21h2m32s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.1 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 42000:1 RT 101:101 ExtCom:06:03:60:9c:9f:b0:d8:01 ExtCom:03:0d:00:00:00:00:00:00 RT
59905:1073748921 ExtCom:03:0c:00:00:00:00:00:08
Default Extd Gw Community: Received
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Adj_RIB_out count: 3, Admin distance 20
Label: 7097 (VNI) Router Mac : 609c.9fb0.d801
RD: 10.1.1.12:1
5 Prefix: IP4Prefix:[0][10.0.131.0/24], Status: E, Age: 2d21h2m32s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.5 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 42000:1 RT 101:101 ExtCom:06:03:60:9c:9f:b0:d8:01 ExtCom:03:0d:00:00:00:00:00:00 RT
59905:1073748921 ExtCom:03:0c:00:00:00:00:00:08
Default Extd Gw Community: Received
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Label: 7097 (VNI) Router Mac : 609c.9fb0.d801
RD: 10.1.1.12:1
6 Prefix: IP4Prefix:[0][10.0.131.0/24], Status: E, Age: 2d21h2m32s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.3 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 42000:1 RT 101:101 ExtCom:06:03:60:9c:9f:b0:d8:01 ExtCom:03:0d:00:00:00:00:00:00 RT
59905:1073748921 ExtCom:03:0c:00:00:00:00:00:08
Default Extd Gw Community: Received
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Label: 7097 (VNI) Router Mac : 609c.9fb0.d801
RD: 10.1.1.12:1
...
10.0.131.0/24
*via 10.1.1.1%default-vrf, Ve 3001, [20/0], 2d21h, eBgp, tag 0, (VNI 7097, GW MAC 609c.9fb0.d801, Tu 61441)
*via 10.1.1.1%default-vrf, Ve 3001, [20/0], 2d21h, eBgp, tag 0, (VNI 7097, GW MAC 609c.9fb0.f501, Tu 61441)
POD1-Leaf3#
VLAN Scoping at the ToR Level
VLAN scoping is briefly discussed in the Technology Overview section.
Refer to Figure 19 for the topology used to illustrate VLAN scoping at the leaf or ToR level. For the purpose of illustration, we've chosen an MCT pair and an individual leaf.
As seen in the diagram, each leaf has a server VLAN that requires a Layer 2 extension to the other rack. Also note that the VLAN numbers are different on each rack. By mapping these VLANs to the same VNI number (5257 in this case), we achieve bridging, or L2 extension, between them. The servers now have L2 adjacency: they are in the same bridge domain, or broadcast domain. In essence, the VLAN tag on the wire between the servers and the leaf is decoupled from the bridge domain, so the tag need not be identical on both sides to provide Layer 2 adjacency or extension. In other words, the VLAN number is relevant only at the ToR level.
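The decoupling can be sketched with a minimal Python model (the leaf names and tag values follow this use case; the dictionary layout is purely illustrative, not a device data structure):

```python
# Per-leaf mapping of locally significant VLAN tags to fabric-wide VNIs.
vlan_to_vni = {
    "Leaf1": {161: 5257},  # servers behind Leaf1 tag traffic with VLAN 161
    "Leaf3": {361: 5257},  # servers behind Leaf3 tag traffic with VLAN 361
}

def l2_adjacent(leaf_a, vlan_a, leaf_b, vlan_b):
    """Two {leaf, vlan} endpoints share a broadcast domain iff they map to the same VNI."""
    return vlan_to_vni[leaf_a][vlan_a] == vlan_to_vni[leaf_b][vlan_b]

# Different wire tags, same VNI: the servers are L2-adjacent.
print(l2_adjacent("Leaf1", 161, "Leaf3", 361))  # True
```

The fabric only ever sees the VNI; the VLAN tag is stripped at the ingress leaf and re-applied (possibly as a different number) at the egress leaf.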
VLAN scoping uses bridge-domains on the leafs. It can also be implemented using traditional VLANs; however, the VLAN on each leaf must then be mapped manually to a common VNI, because automatic VLAN-to-VNI mapping cannot be used in that case. Bridge-domains simplify the configuration for the VLAN scoping case by allowing automatic VLAN-to-VNI mapping.
(Figure: the leafs extend L2 VNI 5257 across the spines; L3 VNI 7097 carries the tenant's routed traffic.)
Configuration
The configuration steps are similar to the L2 extension illustrated in the use case "L2 and L3 Extension between Racks". The difference is that VLAN scoping is achieved using bridge-domain configuration instead of VLAN configuration. The difference is shown below by comparing an L2 tenant configuration based on VLANs with one based on bridge-domains.
• With the "map vni auto" command under the "overlay-gateway" configuration, a bridge-domain gets its VNI assigned automatically. The VNI for a BD equals 4096 + the BD value; for example, BD 100 maps to VNI 4196 with "map vni auto". This method of VNI assignment is the recommended practice.
• The user can also manually map a BD to a desired VNI value, using the BD-to-VNI mapping configuration under the overlay-gateway configuration. However, every VLAN and BD must then be manually mapped to a VNI.
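The auto-assignment rule in the first bullet can be expressed as a one-line function (a sketch of the rule as stated, not an SLX API):

```python
def auto_vni(bd: int) -> int:
    """VNI chosen by "map vni auto" for a bridge-domain: 4096 + BD value."""
    return 4096 + bd

# BD 100 -> VNI 4196, as in the bullet above. The BDs used in this guide
# follow the same rule: BD 1161 -> VNI 5257 and BD 3001 -> VNI 7097.
print(auto_vni(100), auto_vni(1161), auto_vni(3001))  # 4196 5257 7097
```

This is why the configuration blocks below never state a VNI explicitly: the BD number alone determines it.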
Provisioning steps on Leaf1 and Leaf3:
• Server traffic is tagged with VLAN 161 on Leaf1 and with VLAN 361 on Leaf3.
• Create a logical interface to match the VLAN tag (161 on Leaf1, 361 on Leaf3).
• Add the logical interface as a member of bridge-domain 1161 on each leaf.
• Add BD 1161 under evpn, which maps it to VNI 5257.
• Assign Ve 1161 as the router-interface for BD 1161.
• Create the Ve 1161 Layer 3 interface for first-hop routing.
• Assign the anycast gateway address 10.4.137.254 to Ve 1161.
Complete configurations and verification steps on leafs in the Figure 21 topology are given in the sections that follow.
• Common configurations, such as port channel and VLANs, are shown in one block.
• The tenant, Layer 3 interfaces, and BGP EVPN configurations are shown in the second block.
interface Port-channel 107
switchport mode trunk
switchport trunk tag native-vlan
logical-interface port-channel 107.161
vlan 161
!
bridge-domain 1161
description L2 Tenant vlan
router-interface Ve 1161
logical-interface port-channel 107.161
!
bridge-domain 3001
description BD 3001, L3 VNI 7097, Tenant vrf101
router-interface Ve 3001
!
evpn default
bridge-domain add 1161, 3001
!
overlay-gateway mct-leaf1
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Configuration on Leaf3
interface Ethernet 0/34
switchport mode trunk
switchport trunk tag native-vlan
logical-interface ethernet 0/34.361
vlan 361
!
bridge-domain 1161
description L2 Tenant vlan
logical-interface ethernet 0/34.361
!
bridge-domain 3001
description BD 3001, L3 VNI 7097, Tenant vrf101
router-interface Ve 3001
!
evpn default
bridge-domain add 1161, 3001
!
interface Loopback 1
ip address 10.1.3.2/32
no shutdown
!
interface Loopback 2
ip address 10.1.3.1/32
no shutdown
!
ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202
ip router-id 10.1.3.2
!
vrf vrf101
rd 10.1.3.2:101
evpn irb ve 3001
address-family ipv4 unicast
route-target export 101:101 evpn
route-target import 101:101 evpn
!
address-family ipv6 unicast
route-target export 101:101 evpn
route-target import 101:101 evpn
!
evpn default
route-target both auto ignore-as
rd auto
duplicate-mac-timer 5 max-count 3
!
router bgp
address-family ipv4 unicast vrf vrf101
redistribute connected
maximum-paths 8
!
address-family ipv6 unicast vrf vrf101
redistribute connected
maximum-paths 8
!
!
interface Ve 1161
vrf forwarding vrf101
ip anycast-address 10.4.137.254/24
ip arp learn-any
ipv6 anycast-address fdf8:10:0:489::254/96
no shutdown
!
interface Ve 3001
vrf forwarding vrf101
ipv6 address use-link-local-only
no shutdown
!
overlay-gateway leaf3
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Verification
Verify VLAN Extension between the Racks
Check the L2-extended bridge-domain on each node. This should show the local L2 logical ports with their local VLAN marking, and also the tunnels to all remote VTEPs where the same bridge-domain/VNI is extended.
In the output below from the Leaf1 MCT pair, there are three tunnels for bridge-domain 1161, which indicates that the same bridge-domain/VNI segment is provisioned on three other VTEPs or ToRs. Note that one of the tunnels, Tu 61446, is destined to Leaf3. Also note that there are four underlay next hops to reach this tunnel destination in the fabric.
POD1-Leaf1-1# show bridge-domain 1161
Bridge-domain 1161
-------------------------------
Bridge-domain Type: MP
Description:
Number of configured end-points: 6 , Number of Active end-points: 6
VE id: 1161, if-indx: 1207960713
VLAN: 161, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: po107.161
Un-tagged Ports:
VNI: 5257, Tunnels: 3(3 up)
Tunnels: tu61445.5257 tu61446.5257 tu61441.5257
VLAN: N/A, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: po1.5769
Un-tagged Ports:
The output lists the local server-facing ports and the tunnels to each VTEP where the VLAN is extended; Tu 61446 is destined to Leaf3's VTEP IP 10.1.3.1.
POD1-Leaf1-1# show tunnel 61446
Tunnel 61446, mode VXLAN, node-ids 1-2
Ifindex 0x7c00f006, Admin state up, Oper state up
Overlay gateway "mct-leaf", ID 1
Source IP 10.1.1.1, Vrf default-vrf
Destination IP 10.1.3.1
Configuration source BGP-EVPN
MAC learning BGP-EVPN
Active next hops on node 1:
IP: 10.0.1.5, Vrf: default-vrf
Egress L3 port: Eth 0/53, Outer SMAC: 609c.9fb0.f539
Outer DMAC: 609c.9fb0.ac05
Egress L2 Port: Eth 0/53, Outer ctag: 0, stag:0, Egress mode: Local
POD1-Leaf1-1#
The active next hops are the underlay IP next hops from each vLAG peer to reach the remote VTEP Leaf3; there are four such paths because there are four spine links.
In the output below from Leaf3, Tunnel 61441 is destined to the vLAG Leaf1 pair's VTEP IP 10.1.1.1.
POD1-Leaf3# show bridge-domain 1161
Bridge-domain 1161
-------------------------------
Bridge-domain Type: MP
Description:
Number of configured end-points: 4 , Number of Active end-points: 4
VE id: 1161, if-indx: 1207960713
VLAN: 361, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: eth0/34.361
Un-tagged Ports:
VNI: 5257, Tunnels: 3(3 up)
Tunnels: tu61445.5257 tu61441.5257 tu61446.5257
The output lists the local member ports and the tunnels to each VTEP where the VLAN is extended; Tu 61441 is destined to the Leaf1 pair's VTEP IP 10.1.1.1.
POD1-Leaf3# show tunnel 61441
Tunnel 61441, mode VXLAN, node-ids 1
Ifindex 0x7c00f001, Admin state up, Oper state up
Overlay gateway "pod1-leaf3", ID 1
Source IP 10.1.3.1, Vrf default-vrf
Destination IP 10.1.1.1
Configuration source BGP-EVPN
MAC learning BGP-EVPN
Active next hops on node 1:
IP: 10.0.3.5, Vrf: default-vrf
Egress L3 port: Eth 0/51, Outer SMAC: 609c.9fb1.5637
Outer DMAC: 609c.9fb0.ac07
Egress L2 Port: Eth 0/51, Outer ctag: 0, stag:0, Egress mode: Local
POD1-Leaf3#
The active next hops are the underlay IP next hops from Leaf3 to reach the remote VTEP on the MCT pair Leaf1-1/Leaf1-2; there are four such paths because there are four spine links.
VLAN Layer 3 Interfaces State on the Leaf3 ToR
Depending on the port-channel hashing on server-facing links, the ARP entries may be learned on any of the nodes in the MCT pair. Make sure that all
host entries are learned collectively in the MCT pair.
The output below from Leaf3 shows the BGP and ARP entries of a remote host behind the Leaf1 pair for bridge-domain 1161, or VNI 5257. Note that the next hop is set to 10.1.1.1, which is the common VTEP IP of the MCT pair.
In the ARP suppression-cache, local and remote entries are indicated with different flags: an "R" on remote entries signifies that they were learned over BGP EVPN, and local entries are marked with "L".
POD1-Leaf3# show bgp evpn routes type arp 10.4.137.101 mac 0011.9400.0579 ethernet-tag 0
Status A:AGGREGATE B:BEST b:NOT-INSTALLED-BEST C:CONFED_EBGP D:DAMPED
E:EBGP H:HISTORY I:IBGP L:LOCAL M:MULTIPATH m:NOT-INSTALLED-MULTIPATH
S:SUPPRESSED F:FILTERED s:STALE
1 Prefix: ARP:[0][0011.9400.0579]:[IPv4:10.4.137.101], Status: BE, Age: 0h26m19s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.1 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 59905:1073747081 RT 42000:1 RT 101:101
ExtCom:06:03:60:9c:9f:b0:f5:01 RT 59905:268440713 RT 59905:5257 ExtCom:03:0c:00:00:00:00:00:08
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Adj_RIB_out count: 3, Admin distance 20
L2 Label: 5257 L3 Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
ESI : 00.000000000000000000
RD: 10.1.1.11:38025
2 Prefix: ARP:[0][0011.9400.0579]:[IPv4:10.4.137.101], Status: E, Age: 0h26m19s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.5 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 59905:1073747081 RT 42000:1 RT 101:101
ExtCom:06:03:60:9c:9f:b0:f5:01 RT 59905:268440713 RT 59905:5257 ExtCom:03:0c:00:00:00:00:00:08
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
L2 Label: 5257 L3 Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
ESI : 00.000000000000000000
RD: 10.1.1.11:38025
...
VLAN Scoping at the Port Level within a ToR
Port VLAN scoping enables complete abstraction of a bridge domain: the VLAN tags on server-side data frames on two ports can differ, and the frames can still be bridged between the ports. The VLAN tag is localized at the port level rather than at the ToR level.
On the vLAG leaf, there are two port channels, or LAG bundles: po107 and po110, carrying server traffic tagged with 802.1Q VLAN tags 162 and 262, respectively. From the port VLAN scoping perspective, these tags are referred to as c-tags. Each {port, vlan} pair is added as a member of a virtual-fabric VLAN; in this case, fabric VLAN ID 6000. (Note that this number is above the 802.1Q VLAN range of 4096.)
In summary, BD 1162 comprises two {port, vlan} members, unlike the port-only membership of traditional VLANs.
On Leaf3, VLAN 362 is mapped to BD 1162/VNI 5258. On the Leaf1 pair, bridge-domain 1162 is mapped to VNI 5258. Thus we're providing Layer 2
extension within and between the ToRs for server-side traffic with different dot1q VLAN tags.
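The {port, c-tag} membership model can be made explicit with a minimal Python sketch (the names are illustrative, not an SLX data structure):

```python
# Bridge-domain membership keyed by (port, c-tag): a c-tag is significant
# only on its own port, so different tags can land in the same bridge domain.
bd_members = {
    ("po107", 162): 1162,  # {po107, tag 162} -> BD 1162
    ("po110", 262): 1162,  # {po110, tag 262} -> BD 1162
}

def bridged(ep_a, ep_b):
    """Two {port, c-tag} endpoints are bridged iff they are in the same BD."""
    return bd_members[ep_a] == bd_members[ep_b]

print(bridged(("po107", 162), ("po110", 262)))  # True
```

Compare this with ToR-level scoping, where the key is the VLAN alone: here the same tag on a different port could belong to a different bridge domain entirely.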
(Figure: Leaf1-1 and Leaf1-2 share VTEP IP 10.1.1.1 and Leaf3 uses VTEP IP 10.1.3.1; all three provision BD 1162/VNI 5258, with L2 VNI 5258 extended across the spines and L3 VNI 7097 for the tenant.)
Configuration
A sample configuration is given below as a quick reference for port-VLAN scoping. In this example, {po107, tag 162} and {po110, tag 262} are mapped to BD 1162/VNI 5258. With this configuration, it is possible to bridge traffic on these ports with the specified dot1q tags.
interface Port-channel 107
switchport
switchport mode trunk-no-default-native
logical-interface port-channel 107.162
vlan 162
!
interface Port-channel 110
switchport
switchport mode trunk-no-default-native
logical-interface port-channel 110.262
vlan 262
!
bridge-domain 1162 p2mp
router-interface Ve 1162
logical-interface port-channel 107.162
logical-interface port-channel 110.262
• Common configurations, such as port channel and bridge-domains, are shown in one block.
• The tenant, Layer 3 interfaces, and BGP EVPN configurations are shown in the second block for each MCT node.
interface Loopback 1 interface Loopback 1
ip address 10.1.1.11/32 ip address 10.1.1.12/32
no shutdown no shutdown
! !
interface Loopback 2 interface Loopback 2
ip address 10.1.1.1/32 ip address 10.1.1.1/32
no shutdown no shutdown
! !
ip anycast-gateway-mac 0201.0101.0101 ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202 ipv6 anycast-gateway-mac 0201.0102.0202
ip router-id 10.1.1.11 ip router-id 10.1.1.12
! !
vrf vrf101 vrf vrf101
rd 10.1.1.11:101 rd 10.1.1.12:101
evpn irb ve 3001 evpn irb ve 3001
address-family ipv4 unicast address-family ipv4 unicast
route-target export 101:101 evpn route-target export 101:101 evpn
route-target import 101:101 evpn route-target import 101:101 evpn
! !
address-family ipv6 unicast address-family ipv6 unicast
route-target export 101:101 evpn route-target export 101:101 evpn
route-target import 101:101 evpn route-target import 101:101 evpn
! !
evpn default evpn default
route-target both auto ignore-as route-target both auto ignore-as
rd auto rd auto
duplicate-mac-timer 5 max-count 3 duplicate-mac-timer 5 max-count 3
bridge-domain add 1162, 3001 bridge-domain add 1162,3001
! !
router bgp router bgp
address-family ipv4 unicast vrf vrf101 address-family ipv4 unicast vrf vrf101
redistribute connected redistribute connected
maximum-paths 8 maximum-paths 8
! !
address-family ipv6 unicast vrf vrf101 address-family ipv6 unicast vrf vrf101
redistribute connected redistribute connected
maximum-paths 8 maximum-paths 8
! !
! !
interface Ve 1162 interface Ve 1162
vrf forwarding vrf101 vrf forwarding vrf101
ip anycast-address 10.4.138.254/24 ip anycast-address 10.4.138.254/24
ip arp learn-any ip arp learn-any
ipv6 anycast-address fdf8:10:0:48a::254/96 ipv6 anycast-address fdf8:10:0:48a::254/96
no shutdown no shutdown
! !
interface Ve 3001 interface Ve 3001
vrf forwarding vrf101 vrf forwarding vrf101
ipv6 address use-link-local-only ipv6 address use-link-local-only
no shutdown no shutdown
! !
overlay-gateway mct-leaf1
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Configuration on Leaf3
interface Ethernet 0/34
switchport mode trunk
switchport trunk tag native-vlan
logical-interface ethernet 0/34.362
vlan 362
!
bridge-domain 1162
description L2 Tenant vlan
router-interface Ve 1162
logical-interface ethernet 0/34.362
!
bridge-domain 3001
description BD 3001, L3 VNI 7097, Tenant vrf101
router-interface Ve 3001
!
evpn default
bridge-domain add 1162, 3001
!
interface Loopback 1
ip address 10.1.3.2/32
no shutdown
!
interface Loopback 2
ip address 10.1.3.1/32
no shutdown
!
ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202
ip router-id 10.1.3.2
!
vrf vrf101
rd 10.1.3.2:101
evpn irb ve 3001
address-family ipv4 unicast
route-target export 101:101 evpn
route-target import 101:101 evpn
!
address-family ipv6 unicast
route-target export 101:101 evpn
route-target import 101:101 evpn
!
evpn default
route-target both auto ignore-as
rd auto
duplicate-mac-timer 5 max-count 3
!
router bgp
address-family ipv4 unicast vrf vrf101
redistribute connected
maximum-paths 8
!
address-family ipv6 unicast vrf vrf101
redistribute connected
maximum-paths 8
!
!
interface Ve 1162
vrf forwarding vrf101
ip anycast-address 10.4.138.254/24
ip arp learn-any
ipv6 anycast-address fdf8:10:0:48a::254/96
no shutdown
!
interface Ve 3001
vrf forwarding vrf101
ipv6 address use-link-local-only
no shutdown
!
overlay-gateway leaf3
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
Verification
Verify VLAN Extension between the Racks
Check the L2-extended bridge-domain/VNI on each node. This should show the local L2 member ports and also the tunnels to all remote VTEPs where the same VNI segment is extended.
In the output below from the Leaf1 MCT pair, there are three VXLAN tunnels for the bridge-domain, which indicates that the same bridge-domain/VNI segment is provisioned on three other VTEPs or ToRs. Note that one of the tunnels, Tu 61446, is destined to Leaf3. Also note that there are four underlay next hops to reach this tunnel destination in the fabric.
POD1-Leaf1-1# show bridge-domain 1162
Bridge-domain 1162
-------------------------------
Bridge-domain Type: MP
Description: L2 Tenant vlan
Number of configured end-points: 6 , Number of Active end-points: 6
VE id: 1162, if-indx: 1207960714
VLAN: 162, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: po107.162
Un-tagged Ports:
VLAN: 262, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: po110.262
Un-tagged Ports:
VNI: 5258, Tunnels: 3(3 up)
Tunnels: tu61445.5258 tu61446.5258 tu61441.5258
VLAN: N/A, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: po1.5770
Un-tagged Ports:
The output lists the local server-facing ports and the tunnels to each VTEP where the VLAN is extended; Tu 61446 is destined to Leaf3's VTEP IP 10.1.3.1.
POD1-Leaf1-1#
In the output below from Leaf3, Tunnel 61441 is destined to the MCT Leaf1 pair's VTEP IP 10.1.1.1.
POD1-Leaf3# show bridge-domain 1162
Bridge-domain 1162
-------------------------------
Bridge-domain Type: MP
Description:
Number of configured end-points: 4 , Number of Active end-points: 4
VE id: 1162, if-indx: 1207960714
VLAN: 362, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: eth0/34.362
Un-tagged Ports:
VNI: 5258, Tunnels: 3(3 up)
Tunnels: tu61445.5258 tu61441.5258 tu61446.5258
The output lists the local member ports and the tunnels to each VTEP where the VLAN is extended; Tu 61441 is destined to the Leaf1 pair's VTEP IP 10.1.1.1.
POD1-Leaf3# show tunnel 61441
Tunnel 61441, mode VXLAN, node-ids 1
Ifindex 0x7c00f001, Admin state up, Oper state up
Overlay gateway "pod1-leaf3", ID 1
Source IP 10.1.3.1, Vrf default-vrf
Destination IP 10.1.1.1
Configuration source BGP-EVPN
MAC learning BGP-EVPN
Active next hops on node 1:
IP: 10.0.3.5, Vrf: default-vrf
Egress L3 port: Eth 0/51, Outer SMAC: 609c.9fb1.5637
Outer DMAC: 609c.9fb0.ac07
Egress L2 Port: Eth 0/51, Outer ctag: 0, stag:0, Egress mode: Local
POD1-Leaf3#
The active next hops are the underlay IP next hops from Leaf3 to reach the remote VTEP on the MCT pair Leaf1-1/Leaf1-2; there are four such paths because there are four spine links.
Depending on the port-channel hashing on server-facing links, the ARP entries may be learned on any of the nodes in the MCT pair. Make sure that all
host entries are learned collectively in the MCT pair.
POD1-Leaf3# show arp ve 1162 vrf vrf1
Entries in VRF vrf1 : 10
Address Mac-address Interface MacResolved Age Type
--------------------------------------------------------------------------------
10.4.138.201 0010.9400.262c Ve 1162 yes 00:23:51 Dynamic
10.4.138.202 0010.9400.262d Ve 1162 yes 00:23:40 Dynamic
...
POD1-Leaf3#
The output below from Leaf3 shows the BGP and ARP entries of the remote hosts behind the Leaf1 pair. Note that the next hop is set to 10.1.1.1, which is the common VTEP IP of the MCT pair.
In the ARP table, local and remote entries are indicated with different types: BGP-EVPN for remote entries, signifying that they were learned over BGP EVPN, and Dynamic for local entries. Note that the remote host entries are imported into the virtual interface of the local BD 1162 on Leaf3.
POD1-Leaf3# show bgp evpn routes type arp 10.4.138.101 mac 0011.9400.0583 ethernet-tag 0
Status A:AGGREGATE B:BEST b:NOT-INSTALLED-BEST C:CONFED_EBGP D:DAMPED
E:EBGP H:HISTORY I:IBGP L:LOCAL M:MULTIPATH m:NOT-INSTALLED-MULTIPATH
S:SUPPRESSED F:FILTERED s:STALE
1 Prefix: ARP:[0][0011.9400.0583]:[IPv4:10.4.138.101], Status: BE, Age: 3d4h5m25s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.1 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 59905:1073747082 RT 42000:1 RT 101:101
ExtCom:06:03:60:9c:9f:b0:f5:01 RT 59905:268440714 RT 59905:5258 ExtCom:03:0c:00:00:00:00:00:08
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
Adj_RIB_out count: 3, Admin distance 20
L2 Label: 5258 L3 Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
ESI : 00.000000000000000000
RD: 10.1.1.11:38026
2 Prefix: ARP:[0][0011.9400.0583]:[IPv4:10.4.138.101], Status: E, Age: 3d4h5m25s
NEXT_HOP: 10.1.1.1, Learned from Peer: 10.0.3.5 (4200000000)
LOCAL_PREF: 100, MED: none, ORIGIN: incomplete, Weight: 0
AS_PATH: 4200000000 4200000001
Extended Community: RT 59905:1073747082 RT 42000:1 RT 101:101
ExtCom:06:03:60:9c:9f:b0:f5:01 RT 59905:268440714 RT 59905:5258 ExtCom:03:0c:00:00:00:00:00:08
Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
L2 Label: 5258 L3 Label: 7097 (VNI) Router Mac : 609c.9fb0.f501
ESI : 00.000000000000000000
RD: 10.1.1.11:38026
...
Layer-2 Handoff with VPLS
This use case illustrates VLAN, or Layer 2, extension between two EVPN fabrics over VPLS. The choice of VPLS depends on the VNI ranges used in the two fabrics. This use case may also be extended to interconnecting an EVPN fabric to a traditional L2 network.
If both fabrics use a consistent VNI range (that is, VNI X maps to the same VLAN segment in both fabrics) and the workloads on them are L2 adjacent, a BGP EVPN-based fabric extension would be an appropriate option instead.
As shown in the topology below, a pair of switches in an MCT configuration acts as border leafs that terminate the VXLAN-encapsulated packets from the EVPN fabric and hand off classical Ethernet packets to the MPLS edge devices. Similarly, in the other direction, these border leafs map and encapsulate the CE packets into VNI segments toward the EVPN fabric.
The following devices (PIN and platform) are used in this illustration, and they are part of the validated design.
Note that only the relevant portion of the validated design topology is shown in the diagram, and the configuration section includes only the relevant and incremental configuration needed for this use case. For building the fabric underlay and overlay, refer to the validated design section.
(Figure: two data centers, DC1 and DC2, each with super-spines and an edge-leaf pair Leaf1-1/Leaf1-2; the edge-leaf pairs connect over Po 211 and Po 212 to the MPLS network.)
Configuration
Configuration steps:
• Border/Edge-leaf MCT pair configuration. Refer to the validated design section for more details on MCT pair configuration.
• Configure the VLAN and VNI mappings for all VLAN segments that must be extended outside the fabric.
• Enable the VLANs on the CE edge ports or port-channels connected to the DCI edge nodes.
• Configure a VE interface on the border leafs for these VLAN segments to enable ARP suppression for remote hosts in the other data centers. In other words, the border leaf will respond to ARP requests sent from external hosts to internal hosts.
cluster management node-id 1                    cluster management node-id 2
cluster management principal-priority 1         !
!
cluster Edge-Leaf 6162 cluster Edge-Leaf 6162
peer-interface Port-channel 1 peer-interface Port-channel 1
peer 10.61.62.1 peer 10.61.62.0
df-load-balance df-load-balance
deploy deploy
client DCI 1 client DCI 1
client-interface Port-channel 2 client-interface Port-channel 2
deploy deploy
! !
overlay-gateway Edge-Leaf
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
A sample MPLS and VPLS configuration is included for completeness; detailed MPLS/VPLS configuration is beyond the scope of this document. There are
various ways of setting up the MPLS tunnels (LDP or traffic engineering) and the VPLS domains over them.
Cluster, Access Circuit and BGP
interface Loopback 1 interface Loopback 1
ip address 10.55.55.1/32 ip address 10.56.56.1/32
no shutdown no shutdown
! !
ip router-id 10.55.55.1 ip router-id 10.56.56.1
ip route 10.56.56.1/32 10.55.56.1 ip route 10.55.55.1/32 10.55.56.0
! !
vlan 4090 vlan 4090
router-interface Ve 4090 router-interface Ve 4090
description MCT peering vlan description MCT peering vlan
! !
interface Ve 4090 interface Ve 4090
ip mtu 9100 ip mtu 9100
ip address 10.55.56.0/31 ip address 10.55.56.1/31
ipv6 mtu 9100 ipv6 mtu 9100
no shutdown no shutdown
! !
interface Port-channel 1 interface Port-channel 1
mtu 9216 mtu 9216
description MCT peer-link LAG description MCT peer-link LAG
switchport switchport
switchport mode trunk switchport mode trunk
switchport trunk allowed vlan add 4090 switchport trunk allowed vlan add 4090
switchport trunk tag native-vlan switchport trunk tag native-vlan
no shutdown no shutdown
! !
interface Port-channel 2 interface Port-channel 2
description MCT LAG to Edge-Leaf description MCT LAG to Edge-Leaf
switchport switchport
switchport mode trunk-no-default-native switchport mode trunk-no-default-native
no shutdown no shutdown
logical-interface port-channel 2.101 logical-interface port-channel 2.101
vlan 101 vlan 101
! !
! !
cluster DCI-cluster 5556 cluster DCI-cluster 5556
member bridge-domain add 101 member bridge-domain add 101
peer-interface Ve 4090 peer-interface Ve 4090
peer 10.56.56.1 peer 10.55.55.1
client-isolation loose client-isolation loose
deploy deploy
client Edge-Leaf 1 client Edge-Leaf 1
client-interface Port-channel 2 client-interface Port-channel 2
esi 0:0:0:0:0:0:0:1:1 esi 0:0:0:0:0:0:0:1:1
deploy deploy
! !
client-pw client-pw
esi 0:0:0:0:0:0:0:2:2 esi 0:0:0:0:0:0:0:2:2
deploy deploy
! !
! !
router bgp router bgp
local-as 100 local-as 100
neighbor 10.56.56.1 remote-as 100 neighbor 10.55.55.1 remote-as 100
neighbor 10.56.56.1 update-source loopback 1 neighbor 10.55.55.1 update-source loopback 1
neighbor 10.56.56.1 bfd neighbor 10.55.55.1 bfd
address-family ipv4 unicast address-family ipv4 unicast
no neighbor 10.56.56.1 activate no neighbor 10.55.55.1 activate
address-family l2vpn evpn address-family l2vpn evpn
neighbor 10.56.56.1 encapsulation mpls neighbor 10.55.55.1 encapsulation mpls
neighbor 10.56.56.1 activate neighbor 10.55.55.1 activate
! !
MPLS Link and IGP
router isis router isis
net 49.0001.0010.0055.5501.00 net 49.0001.0010.0056.5601.00
fast-flood 5 fast-flood 5
is-type level-1 is-type level-1
log adjacency log adjacency
lsp-gen-interval 5 lsp-gen-interval 5
lsp-refresh-interval 64000 lsp-refresh-interval 64000
max-lsp-lifetime 65535 max-lsp-lifetime 65535
partial-spf-interval 5000 150 300 partial-spf-interval 5000 150 300
spf-interval level-1 5 150 300 spf-interval level-1 5 150 300
address-family ipv4 unicast address-family ipv4 unicast
metric-style wide level-1 metric-style wide level-1
default-metric 10 default-metric 10
! !
interface Loopback 2 interface Loopback 2
ip router isis ip router isis
ip address 55.1.1.1/32 ip address 56.1.1.1/32
no shutdown no shutdown
! !
interface Ethernet 0/48 interface Ethernet 0/48
description BL to MPLS cloud description BL to MPLS cloud
ip router isis ip router isis
ip address 55.0.1.0/31 ip address 56.0.1.0/31
isis auth-mode md5 level-1 isis auth-mode md5 level-1
isis auth-key level-1 $9$BwrsDbB+tABWGWpINOVKoQ== isis auth-key level-1 $9$BwrsDbB+tABWGWpINOVKoQ==
isis point-to-point isis point-to-point
no shutdown no shutdown
! !
MPLS LSP and VPLS Pseudowire
pw-profile vpls pw-profile vpls
mtu 9100 mtu 9100
! !
router mpls router mpls
policy policy
traffic-engineering isis level-1 traffic-engineering isis level-1
ingress-tunnel-accounting ingress-tunnel-accounting
transit-session-accounting transit-session-accounting
! !
ldp ldp
lsr-id 55.1.1.1 lsr-id 56.1.1.1
! !
dynamic-bypass dynamic-bypass
enable-all-interfaces enable-all-interfaces
! !
mpls-interface ethernet 0/48 mpls-interface ethernet 0/48
! !
mpls-interface ve 4090 mpls-interface ve 4090
! !
path to-MLX-1 path to-MLX
hop 55.0.1.1 strict hop 56.0.1.1 strict
! !
lsp to-MLX-DC2 lsp to-MLX-DC2
to 66.1.1.1 to 66.1.1.1
primary-path to-MLX-1 primary-path to-MLX
adaptive adaptive
frr frr
facility-backup facility-backup
! !
enable enable
! !
bridge-domain 101 p2mp bridge-domain 101 p2mp
vc-id 101 vc-id 101
peer 66.1.1.1 load-balance peer 66.1.1.1 load-balance
statistics statistics
logical-interface port-channel 2.101 logical-interface port-channel 2.101
pw-profile vpls pw-profile vpls
bpdu-drop-enable bpdu-drop-enable
local-switching local-switching
! !
Verification
Verification steps are shown on the DC1 nodes; the same steps can be followed on the DC2 nodes as well.
• VXLAN tunnel membership toward the internal leafs where the VLAN is provisioned.
• Local and remote MAC learning on both the border-leaf and the DCI edge.
Edge/Border-Leaf MCT Pair
The output below, taken from one of the nodes of the MCT pair, shows the VLAN state, its member ports, and the tunnels to the leafs inside DC1 where
the VLAN is provisioned. The VLAN is extended on the CE trunk port-channel (Po 2) connected to the DCI/VPLS edge switch.
SLX-EdgeLeaf1#
MAC and ARP entries of local and remote hosts (the host in DC2 is highlighted in the output below):
DCI VPLS/Edge
Cluster, Bridge-domain and Pseudowire
DCI-9540-1# show bridge-domain 101
Bridge-domain 101
-------------------------------
Bridge-domain Type: MP, VC-ID: 101 MCT Enabled: TRUE
Number of configured end-points: 3, Number of Active end-points: 2
VE if-indx: 0, Local switching: TRUE, bpdu-drop-enable: TRUE
MAC Withdrawal: Disabled
PW-profile: vpls, mac-limit: 0
VLAN: 101, Tagged ports: 1(1 up), Un-tagged ports: 0 (0 up)
Tagged Ports: po2.101
Un-tagged Ports:
Total VPLS peers: 1 (1 Operational):
VC id: 101, Peer address: 66.1.1.1, State: Operational, uptime: 12 min 20 sec
Load-balance: True, Cos Enabled: False,
Tunnel cnt: 1
rsvp to-MLX-DC2 (cos_enable:False cos_value:0)
Assigned LSPs count:0 Assigned LSPs:
Local VC lbl: 983116, Remote VC lbl: 983121,
Local VC MTU: 9100, Remote VC MTU: 9100,
Local VC-Type: 5, Remote VC-Type: 5
Local PW preferential Status: Standby, Remote PW preferential Status: Active
VC id: 101, Peer address: 66.1.1.1, State: Operational, uptime: 27 min 30 sec
Load-balance: True, Cos Enabled: False,
Tunnel cnt: 1
rsvp to-MLX-DC2 (cos_enable:False cos_value:0)
Assigned LSPs count:0 Assigned LSPs:
Local VC lbl: 983134, Remote VC lbl: 983110,
Local VC MTU: 9100, Remote VC MTU: 9100,
Local VC-Type: 5, Remote VC-Type: 5
Local PW preferential Status: Active, Remote PW preferential Status: Active
Peer Info:
==========
Peer IP: 10.55.55.1, State: Up
Peer Interface: Vlan 4090
Client Info:
============
Name Id ESI Interface Local/Remote State
---- ---- ----------- --------- ------------------
Edge-Leaf 1 0:0:0:0:0:0:0:1:1 Port-channel 2 Up / Up
Client-PW 34816 0:0:0:0:0:0:0:2:2 PW Up / Up
DCI-9540-2# show mac-address-table bridge-domain 101
BDId Mac-address Type State Ports/LIF/PW
101 (B) 0010.9400.1b96 Dynamic-CL Active 66.1.1.1
101 (B) 0011.9400.044d Dynamic-CCL Active Po 2.101

Host 0010.9400.1b96, behind the DC2 fabric, is learned over the VPLS pseudowire.

DCI-9540-1# show mac-address-table bridge-domain 101 | inc 2046|44d|1b96
101 (B) 0010.9400.1b96 CR Active 10.56.56.1
101 (B) 0011.9400.044d CCR Active Po 2.101

Host 0011.9400.044d, behind the DC1 fabric, is learned over the port-channel logical interface between this DCI node and the border-leaf.
Layer-3 Handoff with MPLS/L3VPN
This use case illustrates Layer-3 extension between two EVPN fabrics over an MPLS/L3VPN network. It may also be extended to
interconnecting an EVPN fabric with a traditional multi-tenant L3 network.
If both fabrics use a consistent L3 VNI range, that is, VNI X maps to the same L3 tenant VRF on both fabrics, BGP EVPN based fabric extension would be an
appropriate option.
As shown in the topology below, a pair of switches configured as an MCT pair acts as the border leaf, terminating the VXLAN-encapsulated packets from the EVPN fabric
and handing off Layer-3 routed packets to the MPLS edge devices, which act as L3VPN PE routers. In the reverse direction, these border-leafs
map and encapsulate the Layer-3 routed packets into the appropriate tenant L3 VNI segments toward the EVPN fabrics.
The following devices are used in this illustration; they are part of the validated design.
[Table: PIN and platform of the devices used]
Note that only the relevant portion of the validated design topology is shown in the diagram, and the configuration section includes only the relevant
and incremental configuration needed for this use case. For building the fabric underlay and overlay, refer to the validated design section.
[Topology diagram: DC1 and DC2 EVPN fabrics with super-spines handing off to the MPLS/L3VPN cloud]
Configuration
Following are the configuration steps involved:
• Provision the tenant VRFs on the border-leaf pair. This is similar to tenant VRF provisioning on the leaf in the EVPN/VxLAN fabric.
• Provision the tenant VRFs on the DCI edge pair. This pair acts as PE edge for L3VPN.
• Provision IPv4 and IPv6 interfaces between the border-leafs and DCI edge nodes in the tenant L3 VRF context.
o Each node in the border-leaf MCT pair is connected to both DCI edge nodes by L2 trunk ports.
o Each trunk will carry a separate VLAN (for isolation) and has an associated VE interface inside the tenant VRF.
o This will be repeated for every tenant VRF on each of the MCT nodes.
• Enable BGP peering to the DCI nodes inside the tenant VRF context. This exchanges the IPv4 and IPv6 routes to/from the DCI nodes. The DCI node
also has a VRF for the tenant (the border-leaf acts as CE and the DCI node as PE; they exchange routes into the tenant VRFs on each side).
This is also known as vrf-lite peering. Each tenant VRF has a separate BGP peering session with each of the DCI nodes.
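A minimal sketch of the per-tenant vrf-lite peering on one border-leaf, following the patterns used elsewhere in this guide (the VLAN/VE numbers and addresses are illustrative, taken from the verification output; the interface subnet is an assumption):

```
! Per-tenant VLAN and VE towards one DCI edge node
vlan 2601
 router-interface Ve 2601
!
interface Ve 2601
 vrf forwarding vrf1
 ip address 10.55.61.2/24
 no shutdown
!
! eBGP session to the DCI node inside the tenant VRF (vrf-lite)
router bgp
 address-family ipv4 unicast vrf vrf1
  neighbor 10.55.61.1 remote-as 64511
!
```

Repeat the VLAN/VE/neighbor block for the second DCI node and for every tenant VRF, as described in the steps above.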
cluster management node-id 1 cluster management node-id 2
node-id 1 !
cluster management principal-priority 1
!
cluster Edge-Leaf 6162 cluster Edge-Leaf 6162
peer-interface Port-channel 1 peer-interface Port-channel 1
peer 10.61.62.1 peer 10.61.62.0
df-load-balance df-load-balance
deploy deploy
client DCI 1 client DCI 1
client-interface Port-channel 2 client-interface Port-channel 2
deploy deploy
! !
router bgp router bgp
local-as 4200007000 local-as 4200007000
capability as4-enable capability as4-enable
fast-external-fallover fast-external-fallover
bfd interval 300 min-rx 300 multiplier 3 bfd interval 300 min-rx 300 multiplier 3
neighbor 10.61.62.1 remote-as 4200007000 neighbor 10.61.62.0 remote-as 4200007000
neighbor 10.61.62.1 bfd neighbor 10.61.62.0 bfd
address-family ipv4 unicast address-family ipv4 unicast
network 10.61.1.1/32 network 10.61.1.1/32
no neighbor 10.61.62.1 activate no neighbor 10.61.62.0 activate
maximum-paths 8 maximum-paths 8
! !
address-family l2vpn evpn address-family l2vpn evpn
neighbor 10.61.62.1 encapsulation nsh neighbor 10.61.62.0 encapsulation nsh
neighbor 10.61.62.1 activate neighbor 10.61.62.0 activate
! !
overlay-gateway Edge-Leaf
type layer2-extension
ip interface Loopback 2
map vni auto
activate
!
vrf vrf1 vrf vrf1
rd 10.61.1.11:1 rd 10.61.1.12:1
evpn irb ve 3001 evpn irb ve 3001
address-family ipv4 unicast address-family ipv4 unicast
route-target export 101:101 evpn route-target export 101:101 evpn
route-target import 101:101 evpn route-target import 101:101 evpn
! !
address-family ipv6 unicast address-family ipv6 unicast
route-target export 101:101 evpn route-target export 101:101 evpn
route-target import 101:101 evpn route-target import 101:101 evpn
! !
! !
vlan 2601 vlan 2603
router-interface Ve 2601 router-interface Ve 2603
SLX-9540 MCT Pair DCI Edge
IGP and MPLS configuration
router isis router isis
net 49.0001.0010.0055.5501.00 net 49.0001.0010.0056.5601.00
fast-flood 5 fast-flood 5
is-type level-1 is-type level-1
log adjacency log adjacency
lsp-gen-interval 5 lsp-gen-interval 5
lsp-refresh-interval 64000 lsp-refresh-interval 64000
max-lsp-lifetime 65535 max-lsp-lifetime 65535
partial-spf-interval 5000 150 300 partial-spf-interval 5000 150 300
spf-interval level-1 5 150 300 spf-interval level-1 5 150 300
address-family ipv4 unicast address-family ipv4 unicast
metric-style wide level-1 metric-style wide level-1
default-metric 10 default-metric 10
! !
interface Loopback 2 interface Loopback 2
ip router isis ip router isis
ip address 55.1.1.1/32 ip address 56.1.1.1/32
no shutdown no shutdown
! !
interface Ethernet 0/48 interface Ethernet 0/48
description BL to MPLS cloud description BL to MPLS cloud
ip router isis ip router isis
ip address 55.0.1.0/31 ip address 56.0.1.0/31
isis auth-mode md5 level-1 isis auth-mode md5 level-1
isis auth-key level-1 $9$BwrsDbB+tABWGWpINOVKoQ== isis auth-key level-1 $9$BwrsDbB+tABWGWpINOVKoQ==
isis point-to-point isis point-to-point
no shutdown no shutdown
! !
The MPLS tunneling configuration is beyond the scope of this document. A sample configuration is provided below for completeness.
router mpls router mpls
policy policy
traffic-engineering isis level-1 traffic-engineering isis level-1
ingress-tunnel-accounting ingress-tunnel-accounting
transit-session-accounting transit-session-accounting
! !
ldp ldp
lsr-id 55.1.1.1 lsr-id 56.1.1.1
! !
dynamic-bypass dynamic-bypass
enable-all-interfaces enable-all-interfaces
! !
mpls-interface ethernet 0/48 mpls-interface ethernet 0/48
! !
mpls-interface ve 4090 mpls-interface ve 4090
! !
path to-DCI_2 path to-DCI_1
hop 10.55.56.1 strict hop 10.55.56.0 strict
! !
path to-MLX-1 path to-MLX
hop 55.0.1.1 strict hop 56.0.1.1 strict
! !
lsp to-DCI_2 lsp to-DCI_1
to 56.1.1.1 to 55.1.1.1
primary-path to-DCI_2 primary-path to-DCI_1
adaptive adaptive
frr frr
facility-backup facility-backup
! !
enable enable
! !
lsp to-MLX-DC2 lsp to-MLX-DC2
to 66.1.1.1 to 66.1.1.1
primary-path to-MLX-1 primary-path to-MLX
adaptive adaptive
frr frr
facility-backup facility-backup
! !
enable enable
! !
VRF-lite peering between Border-leaf and DCI nodes
router bgp router bgp
local-as 64511 local-as 64511
capability as4-enable capability as4-enable
fast-external-fallover fast-external-fallover
neighbor 66.1.1.1 remote-as 100 neighbor 66.1.1.1 remote-as 64511
neighbor 66.1.1.1 update-source loopback 2 neighbor 66.1.1.1 update-source loopback 2
address-family ipv4 unicast address-family ipv4 unicast
no neighbor 66.1.1.1 activate no neighbor 66.1.1.1 activate
! !
address-family vpnv4 unicast address-family vpnv4 unicast
neighbor 66.1.1.1 send-community extended neighbor 66.1.1.1 send-community extended
neighbor 66.1.1.1 activate neighbor 66.1.1.1 activate
! !
address-family vpnv6 unicast address-family vpnv6 unicast
neighbor 66.1.1.1 send-community extended neighbor 66.1.1.1 send-community extended
neighbor 66.1.1.1 activate neighbor 66.1.1.1 activate
! !
Verification
Edge/Border-Leaf MCT Pair
The verification outputs are shown from one of the nodes in the MCT pair.
SLX-EdgeLeaf1# show ip bgp vrf vrf1
Total number of BGP Routes: 189
Status codes: s suppressed, d damped, h history, * valid, > best, i internal, S stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop RD MED LocPrf Weight Path
*> 10.0.131.0/24 10.1.1.1 none 100 0 4200003000 4200000000 4200000001 ?
* 10.0.131.0/24 10.1.1.1 none 100 0 4200003000 4200000000 4200000001 ?
*> 10.1.75.0/24 10.55.61.1 none 100 0 64511 4200000200 4200000201 ?
* 10.1.75.0/24 10.56.61.1 none 100 0 64511 4200000200 4200000201 ?
*i 10.1.75.0/24 10.61.62.1 none 100 0 64511 4200000200 4200000201 ?

10.0.131.0/24 is behind leafs on the local fabric; 10.1.75.0/24 is behind the DCI edge (DC2) and is learned over the vrf-lite peers.

SLX-EdgeLeaf1# show ip route 10.1.75.0/24 vrf vrf1
IP Routing Table for VRF "vrf1"
Total number of IP routes: 87
'*' denotes best ucast next-hop
'[x/y]' denotes [preference/metric]
10.1.75.0/24
*via 10.55.61.1, Ve 2601, [20/0], 19h37m, eBgp, tag 0
*via 10.56.61.1, Ve 2602, [20/0], 19h37m, eBgp, tag 0

SLX-EdgeLeaf1# show ip route 10.0.131.0/24 vrf vrf1
IP Routing Table for VRF "vrf1"
Total number of IP routes: 87
'*' denotes best ucast next-hop
'[x/y]' denotes [preference/metric]
10.0.131.0/24
*via 10.1.1.1%default-vrf, Ve 3001, [20/0], 19h58m, eBgp, tag 0, (VNI 7097, GW MAC 609c.9fb0.d801, Tu 61441)
*via 10.1.1.1%default-vrf, Ve 3001, [20/0], 19h58m, eBgp, tag 0, (VNI 7097, GW MAC 609c.9fb0.f501, Tu 61441)
SLX-EdgeLeaf1#

The route outputs show both vrf-lite routes (via the VE interfaces to the DCI nodes) and EVPN routes (via the VXLAN tunnel).
Design Considerations
BGP Route scale considerations
In the overlay or EVPN fabric, the following route types are used for the workloads. When a dual-stack host is provisioned, three routes are inserted
into the BGP EVPN address family: a MAC-only route for the host's MAC address, an ARP (MAC/IP) route, and an ND (MAC/IPv6) route. On top of this,
a prefix route is advertised for the VLAN subnet on which the host resides.
In a typical rack of 2000 VMs or hosts, assuming all of them are dual-stack, about 6000 BGP EVPN entries are expected, apart from the subnet
routes (2 * subnets: one for IPv4 and one for IPv6).
These routes are sent to the spines, which in turn send them to the other leafs. Each leaf therefore receives a number of copies of each route equal
to the number of spines.
When the same VLANs/VNIs or tenant VRFs are not extended or provisioned on another ToR, the associated host routes and subnet routes are filtered
out based on route targets and therefore do not enter the BGP RIB.
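The arithmetic above can be sketched as a quick back-of-the-envelope calculation (a hypothetical helper, not part of any product tooling):

```python
# Estimate the BGP EVPN entries contributed by one rack. Each
# dual-stack host adds 3 routes: a MAC-only route, a MAC/IP (ARP)
# route, and a MAC/IPv6 (ND) route. Each VLAN subnet adds 2 prefix
# routes (one IPv4, one IPv6).
def evpn_entries(hosts, subnets, dual_stack=True):
    per_host = 3 if dual_stack else 2  # no ND route without IPv6
    return hosts * per_host + 2 * subnets

# Each leaf receives one copy of every path per spine that reflects
# EVPN routes to it.
def paths_on_leaf(entries, evpn_spines):
    return entries * evpn_spines

print(evpn_entries(2000, 0))    # 6000 host entries for 2000 dual-stack hosts
print(paths_on_leaf(6000, 4))   # 24000 paths when all 4 spines exchange EVPN
print(paths_on_leaf(6000, 2))   # 12000 paths when only 2 spines do
```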
These are considerations one must take into account when designing the fabric. Network designers and architects are encouraged to
consult the verified scale parameters of the platforms used in the fabric.
For large-scale DC fabrics, one may also consider using only two spines to exchange EVPN routes instead of all spines. This reduces the number of
copies of BGP paths on the leaf nodes. Similarly, two super-spines may be designated to exchange EVPN routes.
Appendix1: VDX Leaf configuration
This section provides the configuration steps for VDX platforms used as Leaf in the BGP EVPN fabric.
Node ID Configuration
The VDX platforms used as leaf, spine, and super-spine nodes are enabled with VCS ID 1 by default. Since these nodes are independent in an IP fabric,
we need to ensure that they do not form a VCS fabric among themselves. This is achieved by configuring a unique VCS ID on each node.
In the validated design, each node (spine, leaf, super-spine, and edge leaf) is configured with a unique VCS ID. The RBridge ID may be re-used; we
recommend rbridge-id 1 for individual leafs and rbridge-ids 1 and 2 for a vLAG pair.
The vLAG pair is assigned its own unique VCS ID, and each node in the vLAG pair has a separate RBridge ID. For example, in the validated design, Leaf1
is a 2-node vLAG pair.
vLAG peer 1:
vLAG peer 2:
From the primary node of the vLAG pair, enable virtual-fabric. For instance, as shown above, RBridge 2 is the primary node in the Leaf1 vLAG pair.
IP Fabric Infrastructure Links
All nodes in the IP fabric—leafs, spines, and super-spines—are interconnected with Layer 3 interfaces. In the validated design,
• All these links are configured as Layer 3 interfaces with /31 IPv4 address.
• The MTU for these links is set to jumbo. This is required to handle the VXLAN encapsulation of Ethernet frames.
Loopback Interfaces
Each device in the fabric needs one loopback interface with a unique IPv4 address to serve as the router-id.
rbridge-id 1
interface Loopback 1
no shutdown
ip address 10.121.1.11/32
rbridge-id 1
ip router-id 10.121.1.11
Each leaf needs a loopback interface with a unique IPv4 address to use as VTEP-IP.
rbridge-id 1
interface Loopback 2
no shutdown
ip address 10.121.1.1/32
vLAG Pair/ToR
vLAG configuration involves a few additional steps apart from those shown above.
• Loopback interfaces for router-id and VTEP-IP. The pair acts as one logical VTEP and therefore shares a common VTEP IP address; however, each node
in the pair has a separate router-id.
• Fabric link configuration. This is the same as the configuration shown for an individual leaf/ToR. Each node in the pair has separate links
connecting to the spines.
• Configure the server-facing port-channels, and add the required VLANs on them.
Node ID Configuration on vLAG Pair
Refer to the Node ID configuration section for assigning the Node ID to the vLAG pair.
• Pod1-Leaf1-1, rbridge-id 1
• Pod1-Leaf1-2, rbridge-id 2
ISL Configuration
As shown in the illustration below, the vLAG pair is interconnected by two 40G Ethernet ports for ISL.
[Diagram: RBridge 1 and RBridge 2 interconnected by two 40G ISL ports, forming the vLAG pair]
Loopback interfaces
Each node has a unique router-id, but both nodes share a common VTEP-IP and act as one logical VTEP. In the configuration below, the Loopback 2
interface has the same IP address on both nodes.
rbridge-id 1 rbridge-id 2
interface Loopback 1 interface Loopback 1
no shutdown no shutdown
ip address 10.121.1.11/32 ip address 10.121.1.12/32
! !
interface Loopback 2 interface Loopback 2
no shutdown no shutdown
ip address 10.121.1.1/32 ip address 10.121.1.1/32
! !
ip router-id 10.121.1.11 ip router-id 10.121.1.12
! !
BGP Configuration
For VDX platforms, BGP configuration is done in the rbridge configuration mode; for a vLAG pair, the BGP configuration is done on each rbridge node.
Configuration for an individual node is shown below. This must be replicated with the appropriate spine neighbor IP addresses for a dual-ToR or
vLAG pair. Note that the leaf negotiates and exchanges the EVPN AFI with only two spines.
!POD1-leaf1-1
rbridge-id 1
router bgp
local-as 4200000001
capability as4-enable
!
neighbor spine-evpn-group peer-group
neighbor spine-evpn-group remote-as 4200000000
neighbor spine-evpn-group password 2 $PVNHITJVPWQ=
neighbor spine-evpn-group bfd
!
neighbor 10.12.1.0 peer-group spine-evpn-group
neighbor 10.13.1.0 peer-group spine-evpn-group
!
neighbor spine-ip-group peer-group
neighbor spine-ip-group remote-as 4200000000
neighbor spine-ip-group password 2 $PVNHITJVPWQ=
neighbor spine-ip-group bfd
!
neighbor 10.11.1.0 peer-group spine-ip-group
neighbor 10.14.1.0 peer-group spine-ip-group
!
address-family ipv4 unicast
network 10.121.1.1/32
maximum-paths 8
graceful-restart
!
address-family l2vpn evpn
graceful-restart
neighbor spine-evpn-group activate
neighbor spine-evpn-group allowas-in 1
neighbor spine-evpn-group next-hop-unchanged
!
Tenant Provisioning
Tenant provisioning refers to the configuration on leafs that enables server VLAN and network connectivity in tenant VRF contexts, and maps these
VLANs and VRFs to the overlay control and forwarding planes to establish Layer-2 extension and multitenancy. This applies to both 3-stage and 5-
stage Clos fabrics.
Anycast Gateway MAC Configuration
The anycast gateway MAC configuration is applied to all leafs (except edge leafs) in the data center. It is used as the gateway MAC or router-mac for all
server-facing subnets and enables seamless workload moves within and across the PoDs in the data center. It is recommended to set the U/L bit
to 1 in the MAC address, indicating a locally administered address, so that it does not conflict with any real MAC addresses.
The MAC addresses must be different for IPv4 and IPv6, but the OUI portion (first three bytes) must be the same.
rbridge-id 1
ip anycast-gateway-mac 0201.0101.0101
!
rbridge-id 1
ipv6 anycast-gateway-mac 0201.0102.0202
!
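The U/L-bit and shared-OUI recommendations above can be checked programmatically; this is a hypothetical helper for validating a planned anycast gateway MAC (dot-separated SLX/VDX notation assumed):

```python
def _hexdigits(mac):
    # Strip '.', ':' or '-' separators from a MAC address string.
    return mac.replace(".", "").replace(":", "").replace("-", "")

def is_locally_administered(mac):
    # The U/L bit is bit 1 (value 0x02) of the first octet; 1 means
    # locally administered, as recommended for the anycast gateway MAC.
    return bool(int(_hexdigits(mac)[:2], 16) & 0x02)

def same_oui(mac_a, mac_b):
    # The OUI is the first three bytes (six hex digits); the IPv4 and
    # IPv6 anycast gateway MACs must share it.
    return _hexdigits(mac_a)[:6].lower() == _hexdigits(mac_b)[:6].lower()

print(is_locally_administered("0201.0101.0101"))     # True
print(same_oui("0201.0101.0101", "0201.0102.0202"))  # True
```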
1. Assign a unique RD. Every tenant must have a unique RD value per leaf/ToR where it is provisioned. In the validated design, we use the
format IPv4_Address:nn, where nn is a unique number for the tenant VRF. This value is re-used on other leafs where the same tenant is
provisioned. For example, vrf201 has the following RD values on the leafs where it is provisioned:
o On leaf1: 10.121.1.11:201
o On leaf5: 10.121.1.51:201
o On border-leaf1: 10.123.4.1:201
2. Assign the unique L3 VNI for the tenant VRF; this is required for symmetric routing.
3. Assign import and export route-targets for IPv4 and IPv6 tenant routes.
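The RD scheme above can be expressed as a tiny hypothetical helper (the function name is illustrative, not product tooling):

```python
def tenant_rd(leaf_router_id, tenant_nn):
    # RD format used in the validated design: IPv4_Address:nn. The IPv4
    # router-id keeps the RD unique per leaf, while nn identifies the
    # tenant VRF and is re-used across leafs.
    return "%s:%d" % (leaf_router_id, tenant_nn)

print(tenant_rd("10.121.1.11", 201))  # 10.121.1.11:201 on leaf1
print(tenant_rd("10.121.1.51", 201))  # 10.121.1.51:201 on leaf5
```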
In the configuration template below, the following tenant profile is enabled on a leaf:
Name: vrf101
L3 VNI: 7101
Route-target: 101:101

rbridge-id 45
 vrf vrf101
  rd 10.121.1.11:101
  !
  vni 7101
  !
  address-family ipv4 unicast
   route-target export 101:101 evpn
   route-target import 101:101 evpn
  !
  address-family ipv6 unicast
   route-target export 101:101 evpn
   route-target import 101:101 evpn
  !
 !
!

Notes: the RD (route-distinguisher) must be unique per tenant; the VNI is the unique L3 VNI required for each tenant for symmetric routing; the route-targets control export and import of tenant IPv4 and IPv6 routes, similar to MPLS VPNs.
This is the routing interface for the Integrated Routing-Bridging (IRB) operation on the leaf.
Assign a VE (L3) interface for the server-facing VLAN:

rbridge-id 45
 interface Ve 2001
  vrf forwarding vrf101
  ipv6 anycast-address fd2d:d47f:107:1::254/64
  ipv6 nd cache expire 270
  ip anycast-address 10.107.1.254/24
  ip arp-aging-timeout 4
  no shutdown
 !
!

Notes: the VLAN subnet belongs to the tenant vrf101. The IPv4 and IPv6 anycast gateway addresses are the same on all leafs where this VLAN is extended. The IPv6 neighbor cache timeout is set below 5 minutes (270 seconds) and the IPv4 ARP cache timeout to 4 minutes, both within the 5-minute MAC aging time.
The RD and RT configuration is set to auto in this design for simplicity and may be followed for most of the deployments. Advanced users may define a
different scheme of RD and RT. User-defined RD/RT is not covered in this document.
rbridge-id 45
 evpn-instance pod1-leaf1
  route-target both auto ignore-as
  rd auto
  duplicate-mac-timer 5 max-count 3
  vni add 2001
 !
!

Notes: the route-target enables import of host routes (MAC/IP) from remote VTEPs; the ignore-as keyword allows downloading routes from VTEPs in a remote AS (as in the eBGP underlay, where each leaf is in a separate AS). The RD is set to auto for simplified configuration. "vni add 2001" enables the server VLAN's VNI; for any additional VLANs, add the respective VNIs.
Please note that the configuration for both switches in the vLAG pair can be done from the primary node. The configuration required on each node of
the dual-ToR vLAG pair is shown in two config blocks side by side; the global configuration is shown in one config block.
• The Loopback 1 interface has a unique IP address on each node; this is used as the IP router-id for the node.
• The Loopback 2 interface has the same IP address on both nodes; this is used as the VTEP-IP under the overlay-gateway.
• Create the tenant VLAN and VE interface, and enable the VLAN on the server-facing vLAG.
• Extend the L2 segment under the EVPN instance on each node or rbridge.
rbridge-id 1 rbridge-id 2
ip anycast-gateway-mac 0201.0101.0101 ip anycast-gateway-mac 0201.0101.0101
ipv6 anycast-gateway-mac 0201.0102.0202 ipv6 anycast-gateway-mac 0201.0102.0202
host-table aging-mode conversational host-table aging-mode conversational
! !
interface Loopback 1 interface Loopback 1
no shutdown no shutdown
ip address 10.121.1.11/32 ip address 10.121.1.12/32
! !
interface Loopback 2 interface Loopback 2
no shutdown no shutdown
ip address 10.121.1.1/32 ip address 10.121.1.1/32
! !
ip router-id 10.121.1.11 ip router-id 10.121.1.12
! !
vrf vrf101 vrf vrf101
rd 10.121.1.11:101 rd 10.121.1.12:101
vni 7101 vni 7101
address-family ipv4 unicast address-family ipv4 unicast
route-target export 101:101 evpn route-target export 101:101 evpn
route-target import 101:101 evpn route-target import 101:101 evpn
address-family ipv6 unicast address-family ipv6 unicast
route-target export 101:101 evpn route-target export 101:101 evpn
route-target import 101:101 evpn route-target import 101:101 evpn
! !
! !
interface Ve 7101 interface Ve 7101
vrf forwarding vrf101 vrf forwarding vrf101
ipv6 address use-link-local-only ipv6 address use-link-local-only
no shutdown no shutdown
! !
! !
Overlay Gateway
overlay-gateway leaf1
type layer2-extension
ip interface Loopback 2
attach rbridge-id add 1-2
map vlan vni auto
activate
!
rbridge-id 45 rbridge-id 46
interface Ve 2001 interface Ve 2001
vrf forwarding vrf101 vrf forwarding vrf101
ipv6 anycast-address fd2d:d47f:107:1::254/64 ipv6 anycast-address fd2d:d47f:107:1::254/64
ipv6 nd cache expire 270 ipv6 nd cache expire 270
ip anycast-address 10.107.1.254/24 ip anycast-address 10.107.1.254/24
ip arp-aging-timeout 4 ip arp-aging-timeout 4
no shutdown no shutdown
! !
! !
rbridge-id 45 rbridge-id 46
evpn-instance pod1-leaf1 evpn-instance pod1-leaf1
route-target both auto ignore-as route-target both auto ignore-as
rd auto rd auto
duplicate-mac-timer 5 max-count 3 duplicate-mac-timer 5 max-count 3
vni add 2001 vni add 2001
! !
! !
Advertise Tenant Subnet routes
rbridge-id 45 rbridge-id 46
router bgp router bgp
address-family ipv4 unicast vrf vrf101 address-family ipv4 unicast vrf vrf101
redistribute connected redistribute connected
maximum-paths 8 maximum-paths 8
! !
address-family ipv6 unicast vrf vrf101 address-family ipv6 unicast vrf vrf101
redistribute connected redistribute connected
maximum-paths 8 maximum-paths 8
! !
! !
! !
References
1. BGP MPLS-Based Ethernet VPN
https://tools.ietf.org/html/rfc7432