
Advanced Data Center Switching

Chapter 4: VXLAN

We Will Discuss:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.

Chapter 4–2 • VXLAN www.juniper.net


Layer 2 Connectivity Over a Layer 3 Network


The slide lists the topics we will discuss. We discuss the highlighted topic first.

Layer 2 Apps
The needs of the applications that run on the servers in a data center usually drive the design of that data center. Many
server-to-server applications have strict requirements for layer 2 connectivity between servers. A switched
infrastructure that is built around xSTP or a layer 2 fabric (like Juniper Networks’ Virtual Chassis Fabric or Junos Fusion) is
perfectly suited for this type of connectivity. These types of infrastructure allow broadcast domains to be stretched across
the data center using some form of VLAN tagging.

IP Fabric
Many of today’s next-generation data centers are being built around IP Fabrics which, as their name implies, provide IP
connectivity between the racks of a data center. How can a next-generation data center based on IP-only connectivity
support the layer 2 requirements of the traditional server-to-server applications? The rest of this section
discusses the possible solutions to the layer 2 connectivity problem.

Layer 2 VPNs
One possible solution to providing layer 2 connectivity over an IP-based data center would be to implement some form of
layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers would
be the top-of-rack (TOR) routers/switches. In this scenario, each TOR router would act as a layer 2 VPN gateway. A gateway
is the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a layer 2 VPN based on Ethernet, a
gateway (router on left) will take Ethernet frames destined for a remote MAC address, encapsulate the original Ethernet
frame in some other data type (like IP, MPLS, IPsec, etc.) and transmit the newly formed packet to the remote gateway. The
receiving gateway (router on right) will receive the VPN data, decapsulate the data by removing the outer encapsulation, and
then forward the remaining original Ethernet frame to the locally attached server. Notice in the diagram that the IP Fabric
simply has to forward IP data. The IP Fabric has no knowledge of the Ethernet connectivity that exists between Host A and Host B.

Data Plane
There are generally two components of a VPN. There is the data plane (as described on this slide) and the control plane (as
described on the next slide).
The data plane of a VPN describes the method by which a gateway encapsulates and decapsulates the original data. Also, with
regard to an Ethernet layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and
remote servers, much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the gateways
learn the MAC addresses of locally attached servers in the data plane (i.e., from received Ethernet frames). Remote MAC
addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the control
plane.
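
The learning behavior described above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the Gateway class, its method names, and the MAC addresses, port names, and gateway names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy layer 2 VPN gateway with a single MAC table (hypothetical model)."""
    name: str
    # MAC address -> ("local", port name) or ("remote", gateway name)
    mac_table: dict = field(default_factory=dict)

    def learn_local(self, src_mac: str, port: str) -> None:
        # Data-plane learning from a frame received on a server-facing port.
        self.mac_table[src_mac] = ("local", port)

    def learn_remote(self, src_mac: str, remote_gateway: str) -> None:
        # Data-plane learning after decapsulating traffic from a remote gateway.
        self.mac_table[src_mac] = ("remote", remote_gateway)

    def lookup(self, dst_mac: str):
        # Unknown destinations are flooded, as on an ordinary Ethernet switch.
        return self.mac_table.get(dst_mac, ("flood", None))


gw = Gateway("tor-left")
gw.learn_local("00:00:00:00:00:0a", "port-1")
gw.learn_remote("00:00:00:00:00:0b", "tor-right")
print(gw.lookup("00:00:00:00:00:0b"))  # ('remote', 'tor-right')
print(gw.lookup("00:00:00:00:00:0c"))  # ('flood', None)
```

A control-plane protocol would populate the same table by calling something like learn_remote from received advertisements instead of from decapsulated frames.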

Control Plane
One question that must be asked is, “How does a gateway learn about remote gateways?” The learning of remote gateways
can happen in one of two ways. Remote gateways can be statically configured on each gateway participating in a VPN or they
can be learned through some dynamic VPN signaling protocol.
Static configuration works fine but it does not really scale. For example, imagine that you have 20 TOR routers participating
in a statically configured layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each of
the 20 existing routers to recognize the newly added gateway.
Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for
dynamic additions and deletions of gateways from the VPN. Some signaling protocols also allow a gateway to advertise its locally
learned MAC addresses to remote gateways. Without such advertisements, a gateway has to receive an Ethernet frame from a remote
host before it can learn the host’s MAC address. Learning remote MAC addresses in the control plane allows the MAC tables of all
gateways to be more in sync. This has a positive side effect of causing the forwarding behavior of the VPN to be more
efficient (less flooding of data over the fabric).
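
The scaling argument for static configuration can be made concrete with a little arithmetic. The sketch below assumes a full mesh in which every gateway lists every other gateway exactly once; the function name is hypothetical.

```python
def static_entries(gateways: int) -> int:
    """Remote-gateway entries needed across a full mesh of static gateways."""
    return gateways * (gateways - 1)


before = static_entries(20)   # 20 gateways, each listing 19 peers
after = static_entries(21)    # 21 gateways, each listing 20 peers
print(before, after, after - before)  # 380 420 40
```

Of the 40 new entries, 20 go on the new gateway and one goes on each of the 20 existing gateways, which is the manual work the text describes.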

Layer 2 VPN Options


The slide lists some of the layer 2 VPNs that exist today.

Virtualization
Data centers are relying on virtualization more and more. The slide shows the concepts of virtualizing servers in a data
center. Instead of installing a bare metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM
is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that
houses the VMs that run inside it.
One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical
network interface card (NIC) to attach to the network. In the virtualized world, the VMs also use NICs; however, those NICs
are virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same
host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual
switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the
virtual switches appear to be standard switches attached to the network. VLANs can simply be stretched from one virtual
switch, across the physical switched network, and terminate on one or more remote virtual switches. This works great when
the physical network is some sort of Ethernet switched network. However, what happens when the physical network is based
on IP routing?

VXLAN is Supported by Major Vendors


As described in the previous slides, a layer 2 VPN can solve the problem by tunneling Ethernet frames over the IP network. In
the case of virtualized networks, the virtual switches running on the host machines will act as the VPN gateways. Many
vendors of virtualized products have chosen to support VXLAN as the layer 2 VPN. VXLAN functionality can be found in
virtual switches like VMware’s Distributed vSwitch, Open vSwitch, and Juniper Networks’ Contrail vRouter. If virtualizing the
network is the future, it would seem that VXLAN has become the de facto layer 2 VPN in the data center.

VXLAN Using Multicast Control Plane


The slide highlights the topic we discuss next.

VXLAN—An Ethernet VPN


VXLAN is defined in RFC 7348, which describes a method of tunneling Ethernet frames over an IP network. RFC 7348
describes the data plane and a signaling plane for VXLAN. Although RFC 7348 discusses PIM and multicast in the signaling
plane, other signaling methods for VXLAN exist, including Multiprotocol Border Gateway Protocol (MP-BGP) Ethernet VPN
(EVPN) as well as Open vSwitch Database (OVSDB). This chapter covers the multicast method of signaling.

VXLAN Packet Format


The VXLAN packet consists of the following:
1. Original Ethernet Frame: The Ethernet frame being tunneled over the underlay network minus the VLAN tagging.
2. VXLAN Header (64 bits): Consists of an 8-bit flags field, the 24-bit VNI, and two reserved fields (24 bits and 8 bits).
The I flag must be set to 1 and the other 7 reserved flags must be set to 0.
3. Outer UDP Header: Usually contains the well-known destination UDP port 4789. Some VXLAN implementations
allow this destination port to be configured to some other value. The source port is typically a hash of the inner
Ethernet frame’s headers, which gives the underlay the entropy it needs to load-balance flows across equal-cost paths.
4. Outer IP Header: The source address is the IP address of the sending VXLAN Tunnel End Point (VTEP). The
destination address is the IP address of the receiving VTEP.
5. Outer MAC: As with any packet being sent over a layer 3 network, the source and destination MAC addresses will
change at each hop in the network.
6. Frame Check Sequence (FCS): New FCS for the outer Ethernet frame.
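
The 64-bit VXLAN header layout in item 2 and the hashed UDP source port in item 3 can be sketched with Python's standard library. This is an illustration under stated assumptions, not a production encoder: the function names are hypothetical, and the choice of CRC32 over the first 14 bytes of the inner frame is only one possible hash (real implementations choose their own hash and input fields).

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789        # well-known destination port
VXLAN_FLAG_I = 0x08          # "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI from a VXLAN header, checking that the I flag is set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set")
    return word2 >> 8


def source_port(inner_frame: bytes) -> int:
    """Derive an outer UDP source port in 49152-65535 from the inner Ethernet header."""
    return 49152 + zlib.crc32(inner_frame[:14]) % 16384


header = pack_vxlan_header(5100)
print(header.hex())          # 080000000013ec00
print(unpack_vni(header))    # 5100
```

Because the source port depends only on the inner headers, all frames of one inner flow keep the same outer 5-tuple and therefore follow the same path through the underlay.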

VTEP: Part 1
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of
Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the
VTEP.

VTEP: Part 2
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here
is the step-by-step process taken by Virtual Switch 1 (VS1):
1. VS1 receives an Ethernet frame with a destination MAC of VM3.
2. VS1 performs a MAC table lookup and determines that the frame must be sent over the VXLAN tunnel to the
remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining
Ethernet frame using VXLAN encapsulation while also setting the destination IP address to VS2’s VTEP address
as well as setting the VNI appropriately.
4. VS1 forwards the VXLAN packet towards the IP Fabric.
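
Steps 2 and 3 above can be sketched as a small Python function. The data classes, the encapsulate function, and every address, VLAN ID, and VNI value below are hypothetical and chosen only for illustration; frames are modeled as objects rather than raw bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EthernetFrame:
    dst_mac: str
    src_mac: str
    vlan: Optional[int]   # 802.1Q tag on the VM-facing side, if any
    payload: bytes


@dataclass(frozen=True)
class VxlanPacket:
    outer_src_ip: str     # local VTEP address
    outer_dst_ip: str     # remote VTEP address
    vni: int
    inner: EthernetFrame  # original frame with the VLAN tag removed


def encapsulate(frame, local_vtep_ip, vlan_to_vni, mac_table):
    """MAC lookup, VLAN tag removal, then VXLAN encapsulation."""
    vni = vlan_to_vni[frame.vlan]                       # VLAN-to-VNI mapping
    remote_vtep_ip = mac_table[(vni, frame.dst_mac)]    # step 2: MAC table lookup
    untagged = EthernetFrame(frame.dst_mac, frame.src_mac, None, frame.payload)
    return VxlanPacket(local_vtep_ip, remote_vtep_ip, vni, untagged)  # step 3


# VM3's MAC was previously learned behind VS2 (192.0.2.2) in VNI 5100.
vs1_mac_table = {(5100, "00:00:00:00:00:03"): "192.0.2.2"}
frame = EthernetFrame("00:00:00:00:00:03", "00:00:00:00:00:01", 100, b"hello")
packet = encapsulate(frame, "192.0.2.1", {100: 5100}, vs1_mac_table)
print(packet.outer_dst_ip, packet.vni, packet.inner.vlan)  # 192.0.2.2 5100 None
```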

VTEP: Part 3
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local
VM. Here is the step-by-step process taken by the network and VS2:
1. The routers in the IP fabric simply route the VXLAN packet to its destination, VS2’s VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine which MAC table to use for the
lookup.
3. VS2 strips the VXLAN encapsulation, leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface for the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches is that since the VLAN tags
are stripped before sending over the IP Fabric, the VLAN tags do not have to match between remote VMs. This actually allows
for more flexibility in VLAN assignments from server to server and rack to rack.
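
The receive-side steps, and the point that VLAN IDs need not match between VTEPs, can be sketched as follows. The decapsulate function, the interface name, and the VLAN and VNI values are hypothetical; only the inner frame and the received VNI are modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EthernetFrame:
    dst_mac: str
    src_mac: str
    vlan: Optional[int]
    payload: bytes


def decapsulate(vni, inner, mac_tables, vni_to_vlan):
    """Select the MAC table by VNI, look up the exit interface, re-tag locally."""
    mac_table = mac_tables[vni]                 # step 2: VNI selects the MAC table
    out_interface = mac_table[inner.dst_mac]    # step 4: MAC table lookup
    local_vlan = vni_to_vlan.get(vni)           # None means forward untagged
    return out_interface, replace(inner, vlan=local_vlan)  # step 5


# The inner frame arrives untagged because the sending VTEP stripped its tag.
inner = EthernetFrame("00:00:00:00:00:03", "00:00:00:00:00:01", None, b"hello")
port, frame = decapsulate(
    5100,
    inner,
    mac_tables={5100: {"00:00:00:00:00:03": "vnic-vm3"}},
    vni_to_vlan={5100: 200},  # VS2 maps VNI 5100 to VLAN 200, whatever VS1 used
)
print(port, frame.vlan)  # vnic-vm3 200
```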

VXLAN Gateways: Part 1


We have discussed VTEPs that exist on virtual switches that sit on the host machines. However, what happens when the VMs
on the host machine need to communicate with a standard BMS that doesn’t support VXLAN? The VXLAN RFC describes how
a networking device like a router or switch can handle the VTEP role. A networking device that can perform that role is called
a VXLAN Gateway. There are two types of VXLAN Gateways: layer 2 and layer 3. The slide shows how a VXLAN Layer 2
Gateway (router on the right) handles VXLAN packets received from a remote VTEP. It simply provides layer 2 connectivity
between hosts on the same VLAN.

VXLAN Gateways: Part 2


Another form of gateway is the VXLAN Layer 3 Gateway. A layer 3 gateway acts as the default gateway for hosts on the same
VXLAN segment (i.e., broadcast domain). In the slide, the default gateway for VM1 and VM2 is 10.1.1.254, which belongs to
Router B’s IRB interface. To send a packet to 1.1.1.1 (on a remote IP subnet), VM1 must use Address Resolution Protocol (ARP)
to determine the MAC address of 10.1.1.254. Once VM1 knows the MAC address for 10.1.1.254, VM1 and the devices
along the way to 1.1.1.1 will use the following procedure to forward an IP packet to its destination:
1. VM1 creates an IP packet destined to 1.1.1.1.
2. Since 1.1.1.1 is on a different subnet than VM1, VM1 encapsulates the IP packet in an Ethernet frame with a
destination MAC address of the default gateway’s MAC address and sends the Ethernet frame to VS1.
3. VS1 receives the Ethernet frame, performs a MAC table lookup, and determines that the Ethernet frame
must be sent over the VXLAN tunnel to Router B. Router B appears to VS1 as the VTEP that is directly attached
to the host that owns the destination MAC address. The reality is that the destination MAC address is the MAC
address of Router B’s IRB interface for that VLAN/VXLAN segment.
4. Router B receives the VXLAN packet, determines the VNI which maps to a particular MAC table, and strips the
VXLAN encapsulation, leaving the original Ethernet frame.
5. Router B performs a MAC table lookup and determines that the destination MAC belongs to its own IRB
interface.
6. Router B strips the remaining Ethernet framing and performs a routing table lookup to determine the next hop to
the destination network.
7. Router B encapsulates the IP packet in the outgoing interface’s encapsulation and forwards it to the next hop.
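
The decision Router B makes in steps 5 and 6 (bridge or route) can be sketched with Python's ipaddress module. Everything concrete here is an assumption for illustration: the IRB MAC address, the /24 prefix lengths, the next-hop address, and the function names do not come from the slide.

```python
import ipaddress

IRB_MAC = "02:00:00:00:00:fe"   # hypothetical MAC of Router B's IRB interface

# Hypothetical routing table: prefix -> forwarding action.
ROUTES = {
    ipaddress.ip_network("10.1.1.0/24"): "direct via irb.0",
    ipaddress.ip_network("1.1.1.0/24"): "next hop 172.16.0.2",
}


def route_lookup(dst_ip: str):
    """Longest-prefix match over ROUTES; returns None when nothing matches."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


def handle_inner_frame(dst_mac: str, dst_ip: str, mac_table: dict):
    """Route when the frame is addressed to the IRB MAC, otherwise bridge it."""
    if dst_mac == IRB_MAC:                               # step 5
        return ("route", route_lookup(dst_ip))           # step 6
    return ("bridge", mac_table.get(dst_mac, "flood"))   # plain layer 2 gateway behavior


print(handle_inner_frame(IRB_MAC, "1.1.1.1", {}))
# ('route', 'next hop 172.16.0.2')
```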

VXLAN MAC Learning


This slide discusses the MAC learning behavior of a VTEP. The next few slides will discuss the details of how remote MAC
addresses are learned by VTEPs when using PIM as the control protocol.

BUM Traffic
The slide discusses the handling of BUM (broadcast, unknown unicast, and multicast) traffic by VTEPs according to the VXLAN
standard model. In this model, you should note that the underlay network must support a multicast routing protocol, preferably
some form of Protocol Independent Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support Internet Group Management
Protocol (IGMP) so that they can inform the underlay network that they are members of the multicast group associated with a VNI.
For every VNI used in the data center, there must also be a multicast group assigned. Remember that there are 2^24 (~16M)
possible VNIs, so in the worst case your customer would need 2^24 group addresses. Luckily, 239/8 is the reserved block of
administratively scoped multicast group addresses (2^24 group addresses in total) that can be used freely within your customer’s data center.
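
Because the VNI space and the 239/8 block are both 2^24 in size, one simple way to assign groups is to add the VNI to the base address 239.0.0.0. The sketch below shows that arithmetic; this one-to-one scheme is an illustrative convention, not something the slide or the VXLAN RFC prescribes, and the function name is hypothetical.

```python
import ipaddress

GROUP_BASE = int(ipaddress.IPv4Address("239.0.0.0"))


def vni_to_group(vni: int) -> ipaddress.IPv4Address:
    """Map a 24-bit VNI onto a unique address inside 239.0.0.0/8."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return ipaddress.IPv4Address(GROUP_BASE + vni)


print(2**24)                      # 16777216
print(vni_to_group(5100))         # 239.0.19.236
print(vni_to_group(2**24 - 1))    # 239.255.255.255
```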

Building the Multicast Tree


The slide shows an example of a PIM-SM enabled network where the (*,G) rendezvous point tree (RPT) is established from
VTEP A to R1 and finally to the rendezvous point (RP). Only this part of the RPT is shown for simplicity, but keep in mind
that each VTEP that belongs to 239.1.1.1 will also build its branch of the RPT (including VTEP B).

Multicast Forwarding
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame into the appropriate
VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI’s group address
(239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B’s DR (the PIM router closest to VTEP B) encapsulates
the multicast packet into unicast PIM register messages that are destined to the IP address of the RP. Upon receiving the
register messages, the RP decapsulates them and forwards the resulting multicast packets down the
(*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch;
3. If VTEP B was unknown, VTEP A learns the IP address of VTEP B; and
4. Learns the remote MAC address of the sending VM and maps it to VTEP B’s IP address.

For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP
functions.
It is not shown on this slide, but once R1 receives the first native multicast packet from the RP (source address is VTEP B’s
address), R1 will build a shortest path tree (SPT) to the DR closest to VTEP B, which will establish (S,G) state on all routers
along that path.
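
The learning that VTEP A performs in steps 3 and 4 of the list above can be sketched as follows. The Vtep class, its attribute names, and the addresses are hypothetical; multicast delivery itself is not modeled, only what the receiving VTEP records.

```python
class Vtep:
    """Toy VTEP that learns remote VTEPs and MACs from received BUM traffic."""

    def __init__(self, ip: str):
        self.ip = ip
        self.remote_vteps = set()
        self.mac_table = {}   # (vni, MAC) -> remote VTEP IP

    def receive_bum(self, outer_src_ip: str, vni: int, inner_src_mac: str) -> bool:
        """Record the sender; return True if the remote VTEP was previously unknown."""
        newly_discovered = outer_src_ip not in self.remote_vteps
        self.remote_vteps.add(outer_src_ip)                    # step 3
        self.mac_table[(vni, inner_src_mac)] = outer_src_ip    # step 4
        return newly_discovered


vtep_a = Vtep("192.0.2.1")
# A broadcast from a VM behind VTEP B (192.0.2.2) arrives on group 239.1.1.1.
print(vtep_a.receive_bum("192.0.2.2", 5100, "00:00:00:00:00:0b"))  # True
print(vtep_a.receive_bum("192.0.2.2", 5100, "00:00:00:00:00:0c"))  # False
print(vtep_a.mac_table[(5100, "00:00:00:00:00:0b")])               # 192.0.2.2
```

Once the MAC is in the table, later unicast frames to that MAC can be sent directly to VTEP B's address instead of to the multicast group.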

VXLAN Configuration
The slide highlights the topic we discuss next.

Example Topology
The slide shows the example topology that will be used for the subsequent slides.

Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of
VXLAN, it will appear that Host A, Host B, and the IRB interfaces of the routers in AS 64512 and AS 64513 are in the same broadcast
domain as well as the same IP subnet. Also, VRRP will run between the two routers to provide a redundant default gateway to the
two hosts.

Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface is used as the VTEP interface on Juniper Networks routers. Therefore, you must make sure that the loopback
addresses of the routers are reachable.

PIM
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically
configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs
to be enabled on the IP Fabric facing interfaces.

Source Address
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use
the vtep-source-interface statement to specify the interface from which the IP address will be taken. This statement is
the same for both MX and QFX5100 Series devices.

VXLAN Layer 2 Gateway Configuration: Part 1


The slide shows the configuration necessary to enable VXLAN Layer 2 Gateway functionality on a QFX5100 Series switch.

VXLAN Layer 2 Gateway Configuration: Part 2


The slide shows the configuration necessary to enable VXLAN Layer 2 Gateway functionality on an MX Series router.

VXLAN Layer 3 Gateway


The slide shows how to enable VXLAN Layer 3 Gateway functionality on an MX Series router (not supported on
QFX5100 Series). Also, notice that VRRP has been enabled on router as64512.
The VRRP/IRB configuration for router as64513 is as follows:
[edit interfaces irb]
lab@vmx2# show
unit 0 {
    family inet {
        address 10.1.1.11/24 {
            vrrp-group 1 {
                virtual-address 10.1.1.254;
                priority 100;
            }
        }
    }
}
The bridge domain configuration on router as64513 would be identical to that shown on the slide.

Multicast Transit Traffic


As you know, multicast is used in the control plane for VXLAN. It helps in the forwarding of BUM traffic (here we care about
the multicast traffic). Normally, when a VTEP receives multicast traffic from an attached server, it will send a copy to all other
locally attached servers on the same VLAN. It will also send a VXLAN-encapsulated copy over the IP fabric using the
multicast group for the VXLAN segment. That is, every remote VTEP will receive a copy of the original multicast packet,
regardless of whether or not it has any attached receivers. If you know that there are no receivers attached to any
remote VTEPs for a particular multicast group, you can use the command on the slide to help stop the transmission of transit
multicast traffic to uninterested VTEPs.

Preserve Original VLAN Tag


As you know, the default behavior of a Juniper Networks device acting as a VXLAN Layer 2 Gateway is to strip the original
VLAN tag of Ethernet frames received from locally attached servers. Another default behavior of those same devices is to
automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN-tagged Ethernet frame. The
slide shows the commands that can override those default behaviors. One reason that you might want to preserve the VLAN
tagging is to preserve the 802.1p bits for class of service purposes.

PIM State Verification


The command on the slide helps determine the current (*,G) and (S,G) state for a router. From the point of view of a VXLAN
Gateway, the (*,G) state should instantiate as soon as you commit the vxlan statement in the configuration. Any (S,G) state
means that the gateway has received multicast traffic (BUM traffic encapsulated in VXLAN) from a remote VTEP, allowing it to
learn the remote VTEP’s IP address, so the local gateway has instantiated an SPT towards that remote VTEP.

PIM Neighbors
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.

VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, its presence in the output of this command allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each
remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide. These interfaces
represent the VXLAN tunnels established between the local gateway and the remote gateways. These interfaces are actually
used for forwarding, as you can tell from the input and output packet counts.

VTEP Source and Remote


The source command allows you to see the locally configured values for a gateway. The remote command allows you to see
the details of the remotely learned gateways/VTEPs.

MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.

We Discussed:
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.

Review Questions:
1.

2.

3.

Lab: VXLAN
The slide provides the objective for this lab.

Answers to Review Questions
1.
Major vendors of virtualization products support VXLAN to provide the layer 2 stretch over an IP-based data center. If the vSwitches
of your virtualized product ONLY support VXLAN, then more than likely your other networking devices will need to support
VXLAN as well.
2.
A VXLAN Gateway automatically removes the VLAN tag from Ethernet frames received from a locally attached server.
3.
show ethernet-switching vxlan-tunnel-end-point remote mac-table on a QFX5100 Series switch or
show l2-learning vxlan-tunnel-end-point remote mac-table on an MX Series router can be used to view the
MAC addresses learned from remote gateways.
