Chapter 4: VXLAN
Advanced Data Center Switching
We Will Discuss:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
Layer 2 Apps
The needs of the applications that run on the servers in a data center usually drive the designs of those data centers. There
are many server-to-server applications that have strict requirements for layer 2 connectivity between servers. A switched
infrastructure that is built around xSTP or a layer 2 fabric (like Juniper Networks’ Virtual Chassis Fabric or Junos Fusion) is
perfectly suited for this type of connectivity. These types of infrastructure allow broadcast domains to be stretched across
the data center using some form of VLAN tagging.
IP Fabric
Many of today’s next generation data centers are being built around IP Fabrics which, as their name implies, provide IP
connectivity between the racks of a data center. How can a next generation data center based on IP-only connectivity
support the layer 2 requirements of the traditional server-to-server applications? The rest of this section of this chapter will
discuss the possible solutions to the layer 2 connectivity problem.
Layer 2 VPNs
One possible solution to providing layer 2 connectivity over an IP-based data center would be to implement some form of
layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers would
be the top-of-rack (TOR) routers/switches. In this scenario, each TOR router would act as a layer 2 VPN gateway. A gateway
is the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a layer 2 VPN based on Ethernet, a
gateway (router on left) will take Ethernet frames destined for a remote MAC address, encapsulate the original Ethernet
frame in some other data type (like IP, MPLS, IPSec, etc.) and transmit the newly formed packet to the remote gateway. The
receiving gateway (router on right) will receive the VPN data, decapsulate the data by removing the outer encapsulation, and
then forward the remaining original Ethernet frame to the locally attached server. Notice in the diagram that the IP Fabric
simply has to forward IP data. The IP Fabric has no knowledge of the Ethernet connectivity that exists between Host A and Host B.
Data Plane
There are generally two components of a VPN. There is the data plane (as described on this slide) and the control plane (as
described on the next slide).
The data plane of a VPN describes the method by which a gateway encapsulates and decapsulates the original data. Also, with
regard to an Ethernet layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and
remote servers, much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the gateways
learn the MAC addresses of locally attached servers in the data plane (i.e., from received Ethernet frames). Remote MAC
addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the control
plane.
Control Plane
One question that must be asked is, “How does a gateway learn about remote gateways?” The learning of remote gateways
can happen in one of two ways. Remote gateways can be statically configured on each gateway participating in a VPN or they
can be learned through some dynamic VPN signaling protocol.
Static configuration works fine but it does not really scale. For example, imagine that you have 20 TOR routers participating
in a statically configured layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each of
the 20 existing routers to recognize the newly added gateway.
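The scaling problem can be put into numbers with a short sketch. It assumes a full mesh in which every gateway carries one static entry per remote gateway; the function names are hypothetical and exist only for illustration.

```python
def full_mesh_entries(gateways: int) -> int:
    """Total static remote-gateway entries when every gateway lists every other one."""
    return gateways * (gateways - 1)


def entries_added_by_new_gateway(existing_gateways: int) -> int:
    """Entries to configure when one gateway joins.

    Each existing gateway needs one new entry, and the new gateway
    needs one entry per existing gateway.
    """
    return 2 * existing_gateways


if __name__ == "__main__":
    print(full_mesh_entries(20))             # 380 entries across 20 TOR routers
    print(entries_added_by_new_gateway(20))  # 40 new entries, touching 21 devices
```

The total grows roughly with the square of the number of gateways, which is why a dynamic signaling protocol is usually preferred.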
Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for
dynamic adds and deletions of gateways from the VPN. Some signaling protocols also allow a gateway to advertise its locally
learned MAC addresses to remote gateways. Without such advertisements, a gateway has to receive an Ethernet frame from a
remote host before it can learn the host’s MAC address. Learning remote MAC addresses in the control plane allows the MAC
tables of all gateways to be more in sync. This has the positive side effect of making the forwarding behavior of the VPN more
efficient (less flooding of data over the fabric).
Virtualization
Data centers are relying on virtualization more and more. The slide shows the concepts of virtualizing servers in a data
center. Instead of installing a bare metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM
is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that
houses the VMs that run inside it.
One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical
network interface card (NIC) to attach to the network. In the virtualized world, the VMs also use NICs; however, these NICs
are virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same
host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual
switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the
virtual switches appear to be standard switches attached to the network. VLANs can simply be stretched from one virtual
switch, across the physical switched network, and terminate on one or more remote virtual switches. This works great when
the physical network is some sort of Ethernet switched network. However, what happens when the physical network is based
on IP routing?
VTEP: Part 1
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of
Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the
VTEP.
VTEP: Part 2
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here
is the step-by-step process taken by Virtual Switch 1 (VS1):
1. VS1 receives an Ethernet frame with a destination MAC of VM3.
2. VS1 performs a MAC table lookup and determines that the frame must be sent over the VXLAN tunnel to the
remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining
Ethernet frame using VXLAN encapsulation while also setting the destination IP address to VS2’s VTEP address
as well as setting the VNI appropriately.
4. VS1 forwards the VXLAN packet towards the IP Fabric.
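Step 3 above can be sketched in a few lines. The 8-byte header layout (a flags byte with the VNI-valid bit 0x08, 24 reserved bits, the 24-bit VNI, and 8 reserved bits) and UDP port 4789 come from RFC 7348 rather than from the slide. The function names are hypothetical, the sketch handles only a single 802.1Q tag, and the outer UDP/IP headers are assumed to be added by the sender's IP stack.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def strip_vlan_tag(frame: bytes) -> bytes:
    """Remove a single 802.1Q tag (TPID 0x8100) if one is present."""
    if frame[12:14] == b"\x81\x00":
        return frame[:12] + frame[16:]
    return frame


def vxlan_encapsulate(frame: bytes, vni: int) -> bytes:
    """Strip the VLAN tag and prepend the 8-byte VXLAN header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, remaining 24 bits reserved.
    # Word 2: VNI in the top 24 bits, low byte reserved.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + strip_vlan_tag(frame)
```

The result would be sent as the payload of a UDP datagram addressed to the remote VTEP (VS2's VTEP address in the example) on `VXLAN_UDP_PORT`.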
VTEP: Part 3
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local
VM. Here is the step-by-step process taken by the network and VS2:
1. The routers in the IP fabric simply route the VXLAN packet to its destination, VS2’s VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine which MAC table to use for the
lookup.
3. VS2 strips the VXLAN encapsulation, leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface on which to send the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches is that since the VLAN tags
are stripped before sending over the IP Fabric, the VLAN tags do not have to match between remote VMs. This actually allows
for more flexibility in VLAN assignments from server to server and rack to rack.
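The receive side (steps 2 through 5) can be sketched the same way. The per-VNI MAC table structure, the port name, and the function names are assumptions made for illustration; the header layout follows RFC 7348. Because the VLAN tag is chosen from the local table entry, it does not need to match the tag used on the sending side.

```python
import struct
from typing import Dict, Optional, Tuple

# Hypothetical per-VNI MAC tables: VNI -> {destination MAC -> (virtual port, local VLAN or None)}
MacTables = Dict[int, Dict[bytes, Tuple[str, Optional[int]]]]


def vxlan_decapsulate(packet: bytes) -> Tuple[int, bytes]:
    """Split a VXLAN payload (outer IP/UDP already removed) into (VNI, inner frame)."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & 0x08:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8, packet[8:]


def push_vlan_tag(frame: bytes, vlan_id: int) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    tag = struct.pack("!HH", 0x8100, vlan_id & 0x0FFF)
    return frame[:12] + tag + frame[12:]


def deliver(packet: bytes, tables: MacTables) -> Optional[Tuple[str, bytes]]:
    """Return (outgoing virtual port, frame to send), or None if the MAC is unknown."""
    vni, frame = vxlan_decapsulate(packet)        # the VNI selects the MAC table
    entry = tables.get(vni, {}).get(frame[:6])    # look up the destination MAC
    if entry is None:
        return None  # unknown destination; a real switch would flood within the VNI
    port, vlan_id = entry
    if vlan_id is not None:
        frame = push_vlan_tag(frame, vlan_id)     # the local VLAN may differ from the sender's
    return port, frame
```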
BUM Traffic
The slide discusses the handling of broadcast, unknown unicast, and multicast (BUM) traffic by VTEPs according to the VXLAN
standard model. In this model, the underlay network must support a multicast routing protocol, preferably some form of Protocol
Independent Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support Internet Group Management Protocol (IGMP) so that
they can inform the underlay network that they are members of the multicast group associated with a VNI.
For every VNI used in the data center, there must also be a multicast group assigned. Remember that there are 2^24 (~16M)
possible VNIs, so your customer could need up to 2^24 group addresses. Luckily, 239/8 is a reserved set of administratively scoped
multicast group addresses (2^24 group addresses in total) that can be used freely within your customer’s data center.
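Because the VNI space and the 239/8 block are both 2^24 in size, one possible assignment is to add the VNI to the base address 239.0.0.0. This one-to-one scheme is only an illustrative assumption; the text does not prescribe how groups are assigned, and the function name is hypothetical.

```python
import ipaddress

GROUP_BLOCK = ipaddress.IPv4Network("239.0.0.0/8")


def vni_to_group(vni: int) -> ipaddress.IPv4Address:
    """Map a 24-bit VNI to a group address in 239/8 (hypothetical 1:1 scheme)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return GROUP_BLOCK.network_address + vni


if __name__ == "__main__":
    print(vni_to_group(0x010101))   # 239.1.1.1
    print(GROUP_BLOCK.num_addresses == 2**24)  # True
```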
Multicast Forwarding
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame into the appropriate
VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI’s group address
(239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B’s DR (the PIM router closest to VTEP B) encapsulates
the multicast packet into unicast PIM register messages that are destined to the IP address of the RP. Upon receiving the
register messages, the RP decapsulates the register messages and forwards the resulting multicast packets down the
(*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch;
3. If VTEP B was unknown, VTEP A learns the IP address of VTEP B; and
4. Learns the remote MAC address of the sending VM and maps it to VTEP B’s IP address.
For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP
functions.
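The learning that VTEP A performs in steps 3 and 4 can be sketched as a small state update. The class and attribute names are hypothetical; the sketch assumes the outer source IP address, the VNI, and the inner source MAC address have already been parsed from the received packet.

```python
from typing import Dict, Set, Tuple


class VtepLearningState:
    """Toy model of what a VTEP learns from received multicast VXLAN packets."""

    def __init__(self) -> None:
        self.known_vteps: Set[str] = set()
        # (VNI, MAC address of the remote VM) -> IP address of the remote VTEP
        self.remote_macs: Dict[Tuple[int, str], str] = {}

    def on_multicast_vxlan(self, outer_src_ip: str, vni: int, inner_src_mac: str) -> bool:
        """Record the sender; return True if the remote VTEP was previously unknown."""
        newly_learned = outer_src_ip not in self.known_vteps
        self.known_vteps.add(outer_src_ip)                        # step 3
        self.remote_macs[(vni, inner_src_mac)] = outer_src_ip     # step 4
        return newly_learned
```

Once the mapping exists, later unicast frames destined to that MAC address can be sent directly to the remote VTEP's IP address instead of to the multicast group.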
It is not shown on this slide, but once R1 receives the first native multicast packet from the RP (source address is VTEP B’s
address), R1 will build a shortest path tree (SPT) to the DR closest to VTEP B, which will establish (S,G) state on all routers
along that path.
VXLAN Configuration
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used for the subsequent slides.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of
VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and 64513 are in the same broadcast
domain as well as the same IP subnet. Also, VRRP will run between the two routers to provide a redundant default gateway to the
two hosts.
Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks’ routers as the VTEP interface. Therefore, you must make sure that the loopback
addresses of the routers are reachable.
PIM
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically
configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs
to be enabled on the IP Fabric facing interfaces.
Source Address
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use
the vtep-source-interface statement to specify the interface from which the IP address will be taken. This command is
the same for both MX and QFX5100 Series devices.
PIM Neighbors
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each
remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide. These interfaces
represent the VXLAN tunnel established between the local gateway and the remote gateway. These interfaces are actually
used for forwarding as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
Review Questions:
1.
2.
3.
Lab: VXLAN
The slide provides the objective for this lab.