Multicast Traffic in Vxlan Using Underlay

eos.arista.com/eos-4-20-5f/multicast-in-vxlan-using-underlay

By Saravanan Balasubramanian

This document provides an Arista-specific solution for delivering multicast
traffic in a Vxlan environment where an L2 subnet has been extended over an
L3 cloud. Prior to 4.20.5F, multicast traffic ingressing on a Vxlan VLAN
was flooded to all Vxlan Tunnel Endpoints (VTEPs), which may not be
optimal in terms of bandwidth utilization. The solution described below
uses PIM in the underlay to build a path between the source and the
receivers. There are two main parts:

1. Injecting the source IP address (an address in the overlay) into the
underlay, which is needed for RPF checks.
2. Keeping link-local IGMP and PIM traffic and non-link-local multicast
traffic off the overlay: on PIM-enabled interfaces, none of this traffic
is sent or received on the overlay.

Contents

Example
Source Route (Multicast Host Route)
RPF Lookup
MLAG Example
Platform compatibility
Configuration
Troubleshooting
Multi-Agent Protocol Model
Ribd Protocol Model
Limitations
General
Unicast Host Route
Resources

Example
The rest of the document will use the following topology and configuration
to explain how the feature works. In the example network, the source is at
VTEP A and the receivers are at VTEP B. Notice one receiver is in the same
subnet as the source, while the other receiver is in a different subnet.

When the receivers join the group:

1. VTEP B receives an IGMPv2 join on VLAN 20 and VLAN 30. The IGMP
reports remain local. Note: Prior to 4.20.5F, the IGMP reports
would have been Vxlan encapsulated and sent on the overlay to VTEP A.
2. If VTEP B has PIM enabled and an RP configured, VTEP B sends a (*,G)
join on the underlay towards the RP. In the case of an IGMPv3 source
join, a PIM S,G join is sent towards the source only after VTEP B learns
the unicast route to the source.

When the source starts sending multicast traffic:

1. VTEP A creates an S,G route with VLAN 10 as the incoming interface
(IIF).
2. VTEP A advertises the source route (a /32 host route) into the underlay
so that every router in the underlay is aware of the source.
3. The traffic is not Vxlan encapsulated; it is sent natively on the
underlay. From this point, PIM operates in its usual way. PIM registers
with the RP. The RP forwards the multicast data down the RP tree towards
VTEP B. The RP learns the source route and switches to the SPT.
4. On VTEP B, the S,G route switches to the SPT when VTEP B learns the
source route.

Source Route (Multicast Host Route)


Source routes can be advertised in both the multi-agent and ribd protocol
models. Support for the ribd protocol model was added in 4.20.5.1F. When
using the multi-agent protocol model, the source routes are injected into
the Multicast Routing Information Base (MRIB). This is an alternate routing
table used specifically by multicast for RPF lookups. When using the ribd
protocol model, the source routes are injected into the Unicast Routing
Information Base (URIB). The first hop router (e.g. VTEP A) injects the
source /32 when the following criteria are met:

1. ip multicast source route export is configured on the incoming
interface
2. Incoming interface is DR
3. Incoming interface is PIM enabled (configured with ip pim sparse-mode
or ip pim border-router)
4. Source is directly connected
5. At least one (S,G) is present for the source
6. An ARP entry is present for the source (required in 4.20.5.1F)
7. MAC address of source is learnt from a local port (required in
4.20.5.1F)
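
The criteria above can be read as a single conjunction: the source route is
injected only while every condition holds. The following Python sketch is a
toy model of that check, not EOS code. The SourceState class, its field
names, and should_inject_source_route are hypothetical names introduced only
for illustration.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    """Hypothetical snapshot of first hop router state for one source."""
    source_route_export: bool        # ip multicast source route export is configured
    is_dr: bool                      # incoming interface is DR
    pim_enabled: bool                # ip pim sparse-mode or ip pim border-router
    source_directly_connected: bool  # source is directly connected
    sg_count: int                    # number of (S,G) routes present for the source
    arp_entry_present: bool          # required in 4.20.5.1F
    mac_on_local_port: bool          # required in 4.20.5.1F


def should_inject_source_route(state: SourceState) -> bool:
    """Return True only when all seven criteria from the list are met."""
    return all([
        state.source_route_export,
        state.is_dr,
        state.pim_enabled,
        state.source_directly_connected,
        state.sg_count >= 1,
        state.arp_entry_present,
        state.mac_on_local_port,
    ])
```

For example, a state with every flag set and sg_count of 1 yields True,
while clearing is_dr or dropping sg_count to 0 yields False.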

Multicast traffic sent by the source does not trigger ARP resolution, so
PIM internally forces an ARP resolution for the source. This helps PIM
determine whether the source is on the local VTEP or a remote VTEP. The
source route is injected only by the local VTEP. After the source route is
injected, it has to be redistributed. Currently, only BGP supports
redistribute attached-host for the URIB and the MRIB. A source route is
withdrawn when the S,G route is deleted or one of the above conditions
fails. If several S,G routes exist for the source, all of them have to age
out or be deleted before the source route is withdrawn.
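
The withdrawal rule (the source route stays until the last S,G route for
that source is gone) behaves like a reference count per source. The sketch
below is a minimal Python illustration under that reading. It assumes the
other injection criteria continue to hold. SourceRouteTracker and its
methods are hypothetical names, not part of EOS.

```python
from collections import defaultdict


class SourceRouteTracker:
    """Toy model: remembers which groups have an S,G route per source."""

    def __init__(self):
        # source address -> set of group addresses with an S,G route
        self._groups_by_source = defaultdict(set)

    def add_sg(self, source, group):
        """Record an S,G route. Return True if it is the first for the source,
        i.e. the point at which the /32 source route would be injected."""
        first = not self._groups_by_source[source]
        self._groups_by_source[source].add(group)
        return first

    def remove_sg(self, source, group):
        """Delete an S,G route. Return True if no S,G routes remain for the
        source, i.e. the point at which the source route would be withdrawn."""
        groups = self._groups_by_source.get(source)
        if not groups or group not in groups:
            return False
        groups.remove(group)
        if groups:
            return False
        del self._groups_by_source[source]
        return True
```

With two groups for source 10.1.1.2, removing the first S,G returns False
(the source route stays) and removing the second returns True (the source
route is withdrawn).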

RPF Lookup
When PIM does an RPF lookup, it first looks up the MRIB. If no route is
found, PIM falls back on the URIB. On a first hop router, where the source
is directly connected, the MRIB will not have any source routes. On VTEP A,
the RPF interface will be Vlan 10, where the multicast data traffic was
seen.

In the multi-agent protocol model, all the source routes are injected into
the MRIB, and are therefore advertised to and stored in the peer’s MRIB.
Once a PIM router in the underlay receives the BGP update containing the
source advertisement, PIM finds the source route in the MRIB. This allows
PIM to send joins towards the correct VTEP. On the last hop router
(VTEP B), after receiving the BGP update, the MRIB will have the source
route while the URIB will have the directly connected route. Since PIM
looks up the MRIB first, the last hop router knows that the source is not
local.

In the ribd protocol model, all the source routes are injected into the
URIB. BGP advertises the source routes and stores them in the peer’s URIB.
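
The MRIB-first, URIB-fallback order can be sketched in a few lines of
Python. This is an illustrative model, not the EOS implementation. The
tables are plain dictionaries mapping a prefix to a next hop, and
longest-prefix matching within each table is an assumption that the text
above does not spell out. The function names are hypothetical.

```python
import ipaddress


def longest_match(table, address):
    """Return (network, next_hop) for the most specific prefix containing
    address, or None if no prefix in the table matches."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best


def rpf_lookup(source, mrib, urib):
    """Look up the MRIB first and fall back on the URIB.

    Returns (table_code, prefix, next_hop), where table_code is "M" or "U"
    as in the RPF route legend of show ip mroute, or None if neither table
    has a matching route.
    """
    match = longest_match(mrib, source)
    if match is not None:
        return ("M", str(match[0]), match[1])
    match = longest_match(urib, source)
    if match is not None:
        return ("U", str(match[0]), match[1])
    return None
```

On the last hop router in the example, an MRIB holding 10.1.1.2/32 via
3.0.0.1 and a URIB holding the connected 10.1.1.0/24 resolve the source
10.1.1.2 through the MRIB. On the first hop router, where the MRIB has no
source route, the same lookup falls back on the connected route in the
URIB.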

MLAG Example
In an MLAG scenario, PIM agents run on both MLAG peers and work as
independent routers. PIM Hellos resolve DR-ship, while PIM asserts
maintain correct forwarding state. A Vxlan VLAN bridges PIM Hellos
between the peers locally, but the Hellos are not sent on the overlay. Each
SVI on each peer of an MLAG needs to have a unique address so that
neighborship can be established. Multicast does not work with ip address
virtual because the virtual address performs source NAT on all packets
originating with the virtual address, which causes PIM and IGMP control
packets to be dropped by the kernel. Use ip virtual-router address instead.
Because the PIM Hellos are not sent on the overlay, the set of addresses
used for an SVI can be repeated in each VTEP. In the example below,
10.1.1.1 and 10.1.1.3 configured on VTEP A are reused on the same subnet
for VTEP B.

Platform compatibility
DCS-7050QX, DCS-7050SX, DCS-7050TX support in 4.20.5.1F
DCS-7060CX, DCS-7060CX2, DCS-7060SX2 support in 4.20.5.1F
DCS-7260CX, DCS-7260CX3, DCS-7260QX support in 4.20.5.1F
DCS-7500 and DCS-7280 series support in 4.20.5F
DCS-7300X series support in 4.20.5.1F
DCS-7320X series support in 4.20.5.1F

Configuration
To inject a source route, configure ip multicast source route export on
the incoming interface.

Arista(config)#interface Vlan10
Arista(config-Vl10)#ip pim sparse-mode
Arista(config-Vl10)#ip multicast source route export

To redistribute the source routes in the MRIB via BGP while running multi-
agent protocol model, configure redistribute attached-host for the IPv4
multicast address-family. Activate the neighbor to establish a BGP
connection.

Arista(config-router-bgp)#address-family ipv4 multicast
Arista(config-router-bgp-af)#neighbor 3.0.0.2 activate
Arista(config-router-bgp-af)#redistribute ?
  attached-host  Multicast source routes
  connected      Connected routes
  isis           IS-IS routes
  static         Static multicast routes
Arista(config-router-bgp-af)#redistribute attached-host

To redistribute the source routes in the URIB via BGP while running ribd
protocol model, configure redistribute attached-host under router bgp.

Arista(config-router-bgp)#redistribute attached-host

This is a sample configuration for a VTEP for the setup above using multi-
agent protocol model.

service routing protocols model multi-agent

ip pim rp-address 15.15.15.15 225.1.1.1/32

interface Loopback0
   ip address 1.1.1.1/32

interface vxlan1
   vxlan source-interface Loopback0
   vxlan vlan 10 vni 10000

! Interface to the underlay
interface Ethernet1
   ip address 3.0.0.1/24
   ip pim sparse-mode

interface vlan10
   ip address 10.1.1.1/24
   ip pim sparse-mode
   ip multicast source route export

router bgp 10
   router-id 0.0.0.2

   address-family ipv4 multicast
      neighbor 3.0.0.2 activate
      redistribute attached-host

This is a sample configuration for a VTEP for the setup above using ribd
protocol model.

service routing protocols model ribd

ip pim rp-address 15.15.15.15 225.1.1.1/32

interface Loopback0
   ip address 1.1.1.1/32

interface vxlan1
   vxlan source-interface Loopback0
   vxlan vlan 10 vni 10000

! Interface to the underlay
interface Ethernet1
   ip address 3.0.0.1/24
   ip pim sparse-mode

interface vlan10
   ip address 10.1.1.1/24
   ip pim sparse-mode
   ip multicast source route export

router bgp 10
   router-id 0.0.0.2
   redistribute attached-host

Troubleshooting
On the first-hop router, use show ip mroute to verify that the S,G has been
created. The RPF should not be using the source route. Instead, the RPF
should use the directly connected route to the source in the URIB.

Arista#show ip mroute sparse-mode


PIM Sparse Mode Multicast Routing Table
Flags: E - Entry forwarding on the RPT, J - Joining to the SPT
R - RPT bit is set, S - SPT bit is set, L - Source is attached
W - Wildcard entry, X - External component interest
I - SG Include Join alert rcvd, P - (*,G) Programmed in hardware
H - Joining SPT due to policy, D - Joining SPT due to protocol
Z - Entry marked for deletion, C - Learned from a DR via a register
A - Learned via Anycast RP Router, M - Learned via MSDP
N - May notify MSDP, K - Keepalive timer not running
T - Switching Incoming Interface, B - Learned via Border Router
RPF route: U - From unicast routing table
M - From multicast routing table
225.1.1.1
  10.1.1.2, 0:00:52, flags: SL
    Incoming interface: Vlan10
    RPF route: [U] 10.1.1.0/24 [0/0]
    Outgoing interface list:
      Ethernet1

As of 4.20.5.1F, an ARP entry should exist for each source. The MAC address
of the source should be learnt on a local port.

Arista#show arp 10.1.1.2
Address         Age (min)  Hardware Addr   Interface
10.1.1.2              N/A  0012.0100.0001  Vlan10, Port-Channel1

Arista#show mac address-table address 0012.0100.0001 vlan 10


          Mac Address Table
------------------------------------------------------------------
Vlan    Mac Address       Type        Ports      Moves   Last Move
----    -----------       ----        -----      -----   ---------
  10    0012.0100.0001    DYNAMIC     Po1        1       7:40:43 ago
Total Mac Addresses for this criterion: 1

          Multicast Mac Address Table
------------------------------------------------------------------
Vlan    Mac Address       Type        Ports
----    -----------       ----        -----
Total Mac Addresses for this criterion: 0

Multi-Agent Protocol Model


On the first-hop router, use show rib multicast route ip to verify that the
source route has been injected into the underlay. The source route should
be advertised by BGP, which can be checked with show bgp ipv4 multicast
or, more specifically, show bgp neighbor advertised-routes.

Arista#show rib multicast route ip route-input


VRF name: default, VRF ID: 0x0, Protocol: route-input
Codes: C - Connected, S - Static, P - Route Input
B - BGP, O - Ospf, O3 - Ospf3, I - Isis
> - Best Route, * - Unresolved Nexthop
L - Part of a recursive route resolution loop
A - Nexthop not resolved in ARP/ND
>P 0.0.0.0/8 [1/0]
>P 10.1.1.2/32 [250/0]
>P 127.0.0.0/8 [1/0]

Arista#show bgp neighbor 3.0.0.1 ipv4 multicast advertised-routes


BGP routing table information for VRF default
Router identifier 0.0.0.2, local AS number 10
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Queued for advertisement
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

        Network          Next Hop    Metric  LocPref  Weight  Path
 * >    0.0.0.0/8        3.0.0.2     0       -        0       10 ?
 * >    10.1.1.2/32      3.0.0.2     0       -        0       10 ?
 * >    127.0.0.0/8      3.0.0.2     0       -        0       10 ?

On the underlay and last-hop router, verify that the S,G is using the source
route in MRIB with show ip mroute.

Arista#show ip mroute sparse-mode
PIM Sparse Mode Multicast Routing Table
Flags: E - Entry forwarding on the RPT, J - Joining to the SPT
R - RPT bit is set, S - SPT bit is set, L - Source is attached
W - Wildcard entry, X - External component interest
I - SG Include Join alert rcvd, P - (*,G) Programmed in hardware
H - Joining SPT due to policy, D - Joining SPT due to protocol
Z - Entry marked for deletion, C - Learned from a DR via a register
A - Learned via Anycast RP Router, M - Learned via MSDP
N - May notify MSDP, K - Keepalive timer not running
T - Switching Incoming Interface, B - Learned via Border Router
RPF route: U - From unicast routing table
M - From multicast routing table
225.1.1.1
  10.1.1.2, 0:00:52, flags: SR
    Incoming interface: Vlan10
    RPF route: [M] 10.1.1.2/32 [200/0] via 3.0.0.1
      Vlan20

On the underlay and last-hop router, verify that the route has been
received by BGP using show bgp commands.

Arista#show bgp neighbor 3.0.0.2 ipv4 multicast received-routes


BGP routing table information for VRF default
Router identifier 0.0.0.1, local AS number 20
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

        Network          Next Hop    Metric  LocPref  Weight  Path
 *      0.0.0.0/8        3.0.0.1     0       -        0       10 ?
 * >    10.1.1.2/32      3.0.0.1     0       -        0       10 ?
 *      127.0.0.0/8      3.0.0.1     0       -        0       10 ?

On the underlay and last-hop router, the route should be present in the
MRIB, which can be checked using show ip route multicast.

Arista#show ip route multicast 10.1.1.2/32


Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B I - iBGP, B E - eBGP,
R - RIP, I - IS-IS, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route
B E 10.1.1.2/32 [0/200] via 3.0.0.1, None, Ethernet5

Ribd Protocol Model


On the first-hop router, to verify the source route has been injected into
the underlay and redistributed use show ip bgp. Verify using the same
command in the underlay and last hop router.

Arista#show ip bgp
BGP routing table information for VRF default
Router identifier 0.0.0.2, local AS number 10
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

        Network          Next Hop    Metric  LocPref  Weight  Path
 * >    3.0.0.0/24       3.0.0.2     0       100      0       20 i
 * >    10.1.1.2/32      -           0       0        -       ?

Limitations

General
1. This feature is supported only in the default VRF on both the underlay
and the overlay.
2. Topologies where PIM routers are connected to the edge of Vxlan
VLANs are not supported. This solution expects hosts to be
connected to the Vxlan VLANs.
3. With the current implementation, the VTEPs cannot be non-Arista
routers. In a future release, a solution for interop will be provided.
4. With the current implementation, the VTEPs have to be layer 3 with
PIM and IGMP running. In a future release, we will provide a solution
where a VTEP can be layer 2 and still manage to get the multicast
traffic through the underlay.
5. In the scenario where the source moves, there is a possibility for
traffic loss. For example, in the above topology, if the source moves
from VTEP A to VTEP B, the S,G route on VTEP A will never age out. On
VTEP B, the S,G route has IIF pointing towards the underlay and any
traffic seen on VLAN 20 would cause PIM to install a fastdrop. The
activity on the fastdrop route on VTEP B will keep the S,G route alive,
and VTEP B will continue to send joins towards the source. Since the
first hop router (VTEP A) is receiving joins, it prevents the S,G route
from aging out. VTEP A continues to advertise the source route. Only
when the source route is withdrawn can VTEP B create a route with
VLAN 20 as the new IIF.
6. In the scenario where the BGP update message for the source route
is processed after the multicast data traffic and the last hop router is
configured with ip multicast source route export, the last hop
router might assume it is the first hop and start advertising the
source route. Since both first and last hop routers are advertising the
source routes, the S,G joins might be sent to the last hop router
causing traffic loss.

Issues described in #5 and #6 are bugs resolved in 4.20.5.1F.

Unicast Host Route


Instead of using ip multicast source route export, configure ip
attached-host route export on the incoming interface. This injects the host
route into the URIB when the ARP entry is populated. The ARP entry is
created only when unicast traffic is sent to the source. To advertise the
source routes, configure redistribute attached-host under the ipv4 unicast
BGP address-family, since the source routes are in the URIB. This will
work, but requires further thought on an MLAG. For example, on an MLAG,
unicast traffic flowing to the source might flow through the non-DR, which
results in the non-DR advertising the source route. A BGP filter is needed
to prevent the source route from reaching the DR over the peer-link.
Otherwise, the DR might assume that the peer has a better path to the
source and set its incoming interface to the peer-link.

Resources
1. ARP converted Host Routes injection into BGP
2. MP BGP for IPv4 Multicast
