
Cisco ACI Multi-Pod/Multi-Site

Deployment Options
Max Ardica – Principal Engineer

BRKACI-2003
Agenda
• ACI Introduction and Multi-Fabric Use Cases
• ACI Multi-Fabric Design Options
• ACI Stretched Fabric Overview
• ACI Multi-Pod Deep Dive
• ACI Multi-Site Solutions Overview
• Conclusions

Session Objectives

At the end of the session, participants should be able to:

• Articulate the different Multi-Fabric deployment options offered with Cisco ACI
• Understand the design considerations associated with those options

Initial assumption:
• The audience already has a good knowledge of ACI main concepts (Tenant, BD, EPG, L2Out, L3Out, etc.)
Introducing: Application Centric Infrastructure (ACI)
[Diagram: a three-tier application policy (Web, App and DB EPGs connected by filters and service insertion, with QoS, inside a tenant VRF with an Outside connection) pushed by the APIC – Application Policy Infrastructure Controller – to the ACI Fabric, which runs an integrated GBP VXLAN overlay]
ACI Multi-Pod/Multi-Site Use Cases

• Single Site Multi-Fabric
Multiple fabrics connected within the same DC (between halls, buildings, etc. within the same Campus location)
Cabling limitations, HA requirements, scaling requirements

• Single Region Multi-Fabric (classic Active/Active scenario)
Scoped by the application mobility domain of 10 msec RTT
BDs/IP subnets can be stretched between sites
The desire is to reduce fate sharing across sites as much as possible, while maintaining operational simplicity

• Multi Region Multi-Fabric
Creation of separate Availability Zones
Disaster Recovery – minimal cross-site communication
Deployment of applications not requiring Layer 2 adjacency
Agenda
• ACI Introduction and Multi-Fabric Use Cases
• ACI Multi-Fabric Design Options
• ACI Stretched Fabric Overview
• ACI Multi-Pod Deep Dive
• ACI Multi-Site Solutions Overview
• Conclusions

ACI Multi-Fabric Design Options
Single APIC Cluster / Single Domain
• Stretched Fabric – one ACI fabric stretched across Site 1 and Site 2
• Multi-Pod (Q3CY16) – Pod ‘A’ … Pod ‘n’ interconnected by an IP network, MP-BGP EVPN between Pods, single APIC cluster

Multiple APIC Clusters / Multiple Domains
• Dual-Fabric Connected (L2 and L3 extension) – ACI Fabric 1 and ACI Fabric 2 interconnected via L2/L3
• Multi-Site (Future) – Site ‘A’ … Site ‘n’ interconnected by an IP network, MP-BGP EVPN between sites
Agenda
• ACI Introduction and Multi-Fabric Use Cases
• ACI Multi-Fabric Design Options
• ACI Stretched Fabric Overview
• ACI Multi-Pod Deep Dive
• ACI Multi-Site Solutions Overview
• Conclusions

Stretched ACI Fabric
For more information on ACI Stretched Fabric deployment: BRKACI-3503

[Diagram: one ACI fabric stretched across DC Site 1 and DC Site 2, a vCenter instance spanning both, and transit leafs in each site]

• Fabric stretched to two sites – works as a single fabric deployed within a DC
• One APIC cluster – one management and configuration point
• Anycast GW on all leaf switches
• Works with one or more transit leafs per site – any leaf node can be a transit leaf
• Number of transit leafs and links dictated by redundancy and bandwidth capacity decisions
• Different options for inter-site links (dark fiber, DWDM, EoMPLS PWs)

Stretched ACI Fabric
Support for 3 Interconnected Sites (Q2CY16)

• Transit leafs in all sites connect to the local and remote spines
• Inter-site links of 2x40G or 4x40G

[Diagram: Sites 1, 2 and 3 interconnected through transit leafs]
Agenda
• ACI Introduction and Multi-Fabric Use Cases
• ACI Multi-Fabric Design Options
• ACI Stretched Fabric Overview
• ACI Multi-Pod Solution Deep Dive
• ACI Multi-Site Solutions Overview
• Conclusions

ACI Multi-Pod Solution
Overview
[Diagram: Pod ‘A’ … Pod ‘n’ interconnected by an Inter-Pod Network; MP-BGP EVPN between Pods; IS-IS, COOP and MP-BGP running independently within each Pod; single APIC cluster]

• Multiple ACI Pods connected by an IP Inter-Pod L3 network; each Pod consists of leaf and spine nodes
• Managed by a single APIC Cluster
• Single Management and Policy Domain
• Forwarding control plane (IS-IS, COOP) fault isolation
• Data plane VXLAN encapsulation between Pods
• End-to-end policy enforcement
ACI Multi-Pod Solution
Use Cases
• Handling 3-tier physical cabling layouts
Cabling constraints (multiple buildings, campus, metro) require a second tier of “spines”
Preferred option when compared to a ToR FEX deployment

• Evolution of the Stretched Fabric design
Metro Area (dark fiber, DWDM), L3 core
More than two interconnected sites

[Diagram: leaf nodes in each Pod connected to a second tier of spine nodes through the Inter-Pod Network; Pod 1 and Pod 2 under one APIC cluster]
ACI Multi-Pod Solution
SW and HW Requirements

• Software
The solution will be available from a Q3CY16 SW release

• Hardware
The Multi-Pod solution is supported on all currently shipping Nexus 9000 platforms
Multicast is required in the Inter-Pod Network for handling BUM (L2 Broadcast, Unknown Unicast, Multicast) traffic across Pods
ACI Multi-Pod Solution
Supported Topologies
• Intra-DC: Pods interconnected by a 10G/40G/100G IPN, with 40G/100G links from the spines to the IPN
• Two DC sites connected back-to-back: 40G/100G links over dark fiber/DWDM (up to 10 msec RTT)
• 3 DC sites: 40G/100G dark fiber/DWDM links between sites (up to 10 msec RTT)
• Multiple sites interconnected by a generic L3 network (40G/100G spine links)
ACI Multi-Pod Solution
Scalability Considerations

These scalability values may change without warning before the Multi-Pod solution is officially released

• At FCS, the maximum number of supported ACI leaf nodes is 400 (across all Pods)
• 200 is the maximum number of leaf nodes per Pod

• Use case 1: larger number of Pods (up to 20), each with a small number of leaf nodes (20-30)
• Use case 2: small number of Pods (2-3), each with a large number of leaf nodes (up to 200)
ACI Multi-Pod Solution
Inter-Pod Network (IPN) Requirements
• Not managed by APIC; must be pre-configured
• IPN topology can be arbitrary – it is not mandatory to connect to all spine nodes
• Main requirements:
– 40G/100G interfaces to connect to the spine nodes
– Multicast (BiDir PIM), needed to handle BUM traffic
– DHCP Relay, to enable spine/leaf node discovery across Pods
– OSPF, to peer with the spine nodes and learn VTEP reachability
– Increased MTU support, to handle VXLAN-encapsulated traffic
– QoS, to prioritize intra-APIC-cluster communication

[Diagram: Pod ‘A’ and Pod ‘B’ spines attached to the IPN with 40G/100G links; MP-BGP EVPN across the IPN; single APIC cluster]
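The increased-MTU requirement follows directly from the VXLAN encapsulation overhead. A minimal sketch of the arithmetic, assuming the standard 50-byte outer header stack (exact values should be validated against the shipping documentation):

```python
# VXLAN encapsulation overhead: outer Ethernet + outer IP + UDP + VXLAN header.
OUTER_ETH = 14   # outer Ethernet header (add 4 more if the IPN uses 802.1Q tagging)
OUTER_IP = 20    # outer IPv4 header
UDP = 8          # outer UDP header
VXLAN = 8        # VXLAN header
OVERHEAD = OUTER_ETH + OUTER_IP + UDP + VXLAN  # 50 bytes

def required_ipn_mtu(host_mtu: int) -> int:
    """Minimum IPN MTU needed to carry VXLAN-encapsulated host frames."""
    return host_mtu + OVERHEAD

print(required_ipn_mtu(1500))  # 1550; in practice a jumbo MTU (e.g. 9150) is typical
```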
APIC – Distributed Multi-Active Data Base

• The Data Base is replicated across APIC nodes
• One copy is ‘active’ for every specific portion of the Data Base

[Diagram: shards 1, 2 and 3 distributed across three APIC nodes]

• Processes are active on all nodes (not active/standby)
• The Data Base is distributed as active + 2 backup instances (shards) for every attribute
APIC – Distributed Multi-Active Data Base

[Diagram: after an APIC node fails, the surviving nodes still hold replicas of every shard]

• When an APIC fails, a backup copy of each affected shard is promoted to active and takes over all tasks associated with that portion of the Data Base
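A toy model of the shard behaviour sketched above; the replica placement and promotion logic are illustrative assumptions, not APIC internals:

```python
# Toy model: each shard has one active and two backup replicas spread
# across the three APIC nodes; on node failure a backup is promoted.
from dataclasses import dataclass

@dataclass
class Shard:
    name: str
    replicas: list  # APIC node names; replicas[0] holds the active copy

cluster = [
    Shard("shard1", ["apic1", "apic2", "apic3"]),
    Shard("shard2", ["apic2", "apic3", "apic1"]),
    Shard("shard3", ["apic3", "apic1", "apic2"]),
]

def fail_node(node: str) -> None:
    """Drop a failed APIC node; promote a backup where it held the active copy."""
    for shard in cluster:
        if node in shard.replicas:
            was_active = shard.replicas[0] == node
            shard.replicas.remove(node)
            if was_active and shard.replicas:
                print(f"{shard.name}: backup on {shard.replicas[0]} promoted to active")

fail_node("apic1")  # shard1: backup on apic2 promoted to active; others just lose a backup
```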
APIC – Design Considerations

• Additional APIC nodes increase the system scale (today up to 5 nodes are supported) but do not add more redundancy
• APIC allows read-only access to the DB when only one node remains active (standard DB quorum)
• There is a maximum supported distance between data base (APIC) nodes: 800 km
• NOT RECOMMENDED: stretching the cluster so that one site holds most of the nodes at maximum distance – failure of site 1 may cause irreparable loss of data for some shards and inconsistent behaviour for others
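The read-only fallback follows from a simple majority quorum. A toy check under that assumption (not APIC code):

```python
def db_writable(active_nodes: int, cluster_size: int) -> bool:
    """Standard majority quorum: writes need more than half the nodes up."""
    return active_nodes > cluster_size // 2

print(db_writable(2, 3))  # True  -> read/write
print(db_writable(1, 3))  # False -> DB falls back to read-only
```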


ACI Multi-Pod Solution
APIC Cluster Deployment Considerations

• The APIC cluster is stretched across multiple Pods
Central management for all the Pods (VTEP addresses, VNIDs, class-IDs, GIPo, etc.)
Centralized policy definition
Recommended not to connect more than two APIC nodes per Pod (due to the creation of three replicas per ‘shard’)

• The first APIC node connects to the ‘Seed’ Pod
Drives the auto-provisioning of all the remote Pods

• Pods can be auto-provisioned and managed even without a locally connected APIC node
ACI Multi-Pod Solution
Auto-Provisioning of Pods
The ‘Seed’ Pod is brought up first; remote Pods are then discovered and provisioned across the IPN:

1. APIC Node 1 connects to a leaf node in ‘Seed’ Pod 1
2. Discovery and provisioning of all the devices in the local Pod
3. Provisioning of the spine interfaces facing the IPN and of the EVPN control plane configuration
4. Spine 1 in Pod 2 connects to the IPN and generates DHCP requests
5. The DHCP requests are relayed by the IPN devices back to the APIC in Pod 1
6. The DHCP response reaches Spine 1, allowing its full provisioning
7. Discovery and provisioning of all the devices in the local Pod (Pod 2)
8. APIC Node 2 connects to a leaf node in Pod 2
9. APIC Node 2 joins the cluster
10. Other Pods are discovered following the same procedure
ACI Multi-Pod Solution
IPN Control Plane
• Separate IP address pools for VTEPs are assigned by APIC to each Pod
Summary routes are advertised toward the IPN via OSPF routing

• Spine nodes redistribute the other Pods’ summary routes into the local IS-IS process (mutual IS-IS to OSPF redistribution)
Needed for local VTEPs to communicate with remote VTEPs

IPN Global VRF
IP Prefix      Next-Hop
10.0.0.0/16    Pod1-S1, Pod1-S2, Pod1-S3, Pod1-S4
10.1.0.0/16    Pod2-S1, Pod2-S2, Pod2-S3, Pod2-S4

Leaf Node Underlay VRF (Pod 1)
IP Prefix      Next-Hop
10.1.0.0/16    Pod1-S1, Pod1-S2, Pod1-S3, Pod1-S4
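A rough sketch of that redistribution step, using the pools from the tables above (hypothetical helper, illustration only): each leaf ends up reaching every remote Pod’s VTEP pool via its local spines.

```python
# Each Pod advertises its VTEP pool into the IPN via OSPF; spines redistribute
# the *other* Pods' summaries into local IS-IS so leafs learn them.
pod_pools = {"pod1": "10.0.0.0/16", "pod2": "10.1.0.0/16"}

def leaf_underlay_routes(local_pod: str, local_spines: list) -> dict:
    """Routes a leaf in `local_pod` learns via IS-IS for remote VTEP pools."""
    return {
        prefix: local_spines
        for pod, prefix in pod_pools.items()
        if pod != local_pod
    }

print(leaf_underlay_routes("pod1", ["Pod1-S1", "Pod1-S2", "Pod1-S3", "Pod1-S4"]))
# {'10.1.0.0/16': ['Pod1-S1', 'Pod1-S2', 'Pod1-S3', 'Pod1-S4']}
```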
ACI Fabric – Integrated Overlay
Decoupled Identity, Location & Policy

[Diagram: APIC-controlled fabric with a VTEP at every leaf; packets carry VTEP | VXLAN | IP payload between VTEPs]

• The ACI Fabric decouples the tenant end-point address, its “identifier”, from the location of that end-point, which is defined by its “locator” or VTEP address
• Forwarding within the Fabric is between VTEPs (ACI VXLAN tunnel endpoints) and leverages an extended VXLAN header format referred to as the ACI VXLAN policy header
• The mapping of the internal tenant MAC or IP address to location is performed by the VTEP using a distributed mapping database
Host Routing - Inside
Inline Hardware Mapping DB - 1,000,000+ hosts
Proxy Station Table (spines) – contains addresses of ‘all’ hosts attached to the fabric
10.1.3.11                    Leaf 1
10.1.3.35                    Leaf 3
fe80::462a:60ff:fef7:8e5e    Leaf 4
fe80::62c5:47ff:fe0a:5b1a    Leaf 6

Global Station Table (leaf) – contains a local cache of the fabric endpoints
10.1.3.35    Leaf 3
*            Proxy A

Local Station Table (leaf) – contains addresses of ‘all’ hosts attached directly to the leaf
10.1.3.11    Port 9

• The forwarding table on the leaf switch is divided between local (directly attached) and global entries
• The leaf global table is a cached portion of the full global table
• If an endpoint is not found in the local cache, the packet is forwarded to the ‘default’ forwarding table in the spine switches (1,000,000+ entries in the spine forwarding table)
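A simplified model of the two-stage lookup, seeded with the example entries from the tables above:

```python
# Leaf forwarding decision: local station table first, then the cached
# global table, and finally the spine proxy as the catch-all default.
local_station = {"10.1.3.11": "Port 9"}
global_cache = {"10.1.3.35": "Leaf 3"}
SPINE_PROXY = "Proxy A"  # full mapping DB (1,000,000+ entries) lives on the spines

def forward(dst_ip: str) -> str:
    if dst_ip in local_station:
        return f"deliver locally on {local_station[dst_ip]}"
    if dst_ip in global_cache:
        return f"VXLAN-encapsulate to {global_cache[dst_ip]}"
    return f"VXLAN-encapsulate to {SPINE_PROXY} (spine resolves the real location)"

print(forward("10.1.3.11"))  # deliver locally on Port 9
print(forward("10.1.3.35"))  # VXLAN-encapsulate to Leaf 3
print(forward("10.1.3.99"))  # VXLAN-encapsulate to Proxy A (spine resolves it)
```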
ACI Multi-Pod Solution
Inter-Pods MP-BGP EVPN Control Plane

Pod 1 COOP database              Pod 2 COOP database
172.16.1.10    Leaf 1            172.16.1.10    Proxy A
172.16.2.40    Leaf 3            172.16.2.40    Proxy A
172.16.1.20    Proxy B           172.16.1.20    Leaf 4
172.16.3.50    Proxy B           172.16.3.50    Leaf 6

[Diagram: spine proxies Proxy A and Proxy B; endpoints 172.16.1.10 and 172.16.2.40 in Pod 1, 172.16.1.20 and 172.16.3.50 in Pod 2; single APIC cluster]

• MP-BGP EVPN is used to communicate Endpoint (EP) and Multicast Group information between Pods
All remote Pod entries are associated with a Proxy VTEP next-hop address
• Single BGP AS across all the Pods
• BGP EVPN runs on multiple spines in each Pod (minimum of two for redundancy)
Some spines may also provide the route reflector functionality (one in each Pod)
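A toy illustration of the next-hop rewrite described above (hypothetical entries; the real COOP/EVPN exchange is far richer): endpoints imported from a remote Pod are installed with that Pod’s spine proxy, not the originating leaf, as next-hop.

```python
# Pod 1 exports its locally learned endpoints over MP-BGP EVPN; Pod 2
# installs them with Pod 1's proxy VTEP as next-hop (and vice versa).
pod1_local = {"172.16.1.10": "Leaf 1", "172.16.2.40": "Leaf 3"}
pod2_local = {"172.16.1.20": "Leaf 4", "172.16.3.50": "Leaf 6"}

def import_remote(remote_endpoints: dict, remote_proxy: str) -> dict:
    """Install remote endpoints with the remote Pod's proxy VTEP as next-hop."""
    return {ip: remote_proxy for ip in remote_endpoints}

pod2_coop = {**pod2_local, **import_remote(pod1_local, "Proxy A")}
print(pod2_coop)
# {'172.16.1.20': 'Leaf 4', '172.16.3.50': 'Leaf 6',
#  '172.16.1.10': 'Proxy A', '172.16.2.40': 'Proxy A'}
```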
ACI Multi-Pod Solution
Overlay Data Plane
[Packet format: VTEP IP | VNID | Group Policy | tenant packet]

1. VM1 (172.16.2.40, attached to Leaf 4 in Pod 1) sends traffic destined to remote VM2 (172.16.1.20)
2. VM2 is unknown, so the leaf encapsulates the traffic to the local Proxy A spine VTEP, adding S_Class information
3. The Pod 1 spine knows VM2 via the remote proxy (172.16.1.20 → Proxy B) and encapsulates the traffic to the Proxy B spine VTEP
4. The Pod 2 spine encapsulates the traffic to the local leaf (172.16.1.20 → Leaf 4)
5. The leaf learns the remote VM1 location (172.16.2.40 → Pod1 L4) and enforces policy
6. If policy allows it, VM2 receives the packet
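The S_Class added by the ingress leaf travels inside the VXLAN header itself. A rough sketch using the public VXLAN Group Based Policy (GBP) layout, to which the ACI policy header is closely related (this is not the exact iVXLAN bit layout):

```python
import struct

def vxlan_gbp_header(vni: int, group_policy_id: int) -> bytes:
    """8-byte VXLAN-GBP header: flags, 16-bit group (source class), 24-bit VNI."""
    flags = 0x88  # I bit (VNI valid) + G bit (group policy ID present)
    word1 = (flags << 24) | group_policy_id  # flags, reserved bits, group policy ID
    word2 = (vni << 8)                       # 24-bit VNI, 8 reserved bits
    return struct.pack("!II", word1, word2)

hdr = vxlan_gbp_header(vni=0x1234, group_policy_id=0x0A0B)
print(hdr.hex())  # 88000a0b00123400
```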
ACI Multi-Pod Solution
Overlay Data Plane (2)
7. VM2 sends traffic back to remote VM1
8. The leaf already knows the remote VM1 location (learned in step 5), so it enforces policy at ingress and, if allowed, encapsulates the traffic directly to the remote leaf node L4
9. The receiving leaf learns the remote VM2 location (172.16.1.20 → Pod2 L4); no need to enforce policy again
10. VM1 receives the packet
11. From this point on, VM1-to-VM2 communication is encapsulated leaf to leaf (VTEP to VTEP)
ACI Multi-Pod Solution
Handling of Multi-Destination Traffic (BUM*)
1. VM1 generates a BUM frame
2. The BUM frame is associated to GIPo 1 and flooded intra-Pod via the corresponding tree
3. Spine 2 is responsible for sending GIPo 1 traffic toward the IPN
4. The IPN replicates the traffic to all the Pods that joined GIPo 1 (optimized delivery to Pods)
5. The BUM frame is flooded along the tree associated to GIPo 1; the receiving VTEP learns the remote VM1 location (172.16.2.40 → Pod1 L4)
6. VM2 receives the BUM frame

*L2 Broadcast, Unknown Unicast and Multicast
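A toy model of the GIPo-based replication in step 4 (group addresses and Pod names invented for illustration):

```python
# Each bridge domain maps to a multicast group (GIPo); the IPN replicates
# BUM traffic only to Pods that joined that group via BiDir PIM.
gipo_joins = {
    "225.0.1.1": {"pod1", "pod2"},  # BD stretched across both Pods
    "225.0.1.2": {"pod1"},          # BD local to Pod 1 only
}

def replicate(gipo: str, source_pod: str) -> set:
    """Pods the IPN must deliver a BUM frame to (all joiners except the source)."""
    return gipo_joins.get(gipo, set()) - {source_pod}

print(replicate("225.0.1.1", "pod1"))  # {'pod2'}
print(replicate("225.0.1.2", "pod1"))  # set() -> no inter-Pod flooding needed
```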


ACI Multi-Pod Solution
Traditional WAN Connectivity

• A Pod does not need to have a dedicated WAN connection
• Multiple WAN connections can be deployed across Pods
• Traditional L3Out configuration
Shared between tenants or dedicated per tenant (VRF-Lite)
• VTEPs always select the WAN connection in the local Pod, based on preferred metric
Inbound traffic may require “hair-pinning” across the IPN network
Recommended to deploy clustering technology when stateful services are deployed
ACI Integration with WAN at Scale
‘Project GOLF’ Overview
[Diagram: ‘GOLF’ devices between the WAN and the ACI spines, MP-BGP EVPN across an IP network, per-tenant VRFs (VRF-1, VRF-2) extended to the WAN]

• Addresses both control plane and data plane scale
VXLAN data plane between ACI spines and WAN routers
MP-BGP EVPN control plane between ACI spines and WAN routers
OpFlex for exchanging config parameters (VRF names, BGP Route-Targets, etc.)

• Consistent policy enforcement on ACI leaf nodes (for both ingress and egress directions)

• ‘GOLF’ router support (Q3CY16): Nexus 7000, ASR 9000 and ASR 1000 (ASR 1000 not yet committed)
ACI Integration with WAN at Scale
Supported Topologies
• Directly connected WAN routers
• Remote WAN routers (MP-BGP EVPN across an IP network)
• Multi-Pod + GOLF (MP-BGP EVPN across the IPN)
Multi-Pod and GOLF
Intra-DC Deployment – Control Plane
• WAN routes are received on the Pod spines as EVPN routes and translated to VPNv4/VPNv6 routes with the spine proxy TEP as next-hop
• Public BD subnets are advertised to the GOLF devices with the external spine-proxy TEP as next-hop

[Diagram: GOLF devices at the WAN edge, MP-BGP EVPN control plane toward the spines across the IPN; multiple Pods under a single APIC cluster/domain]
Multi-Pod and GOLF
Intra-DC Deployment – Control Plane

• Option to consolidate ‘GOLF’ and ‘IPN’ devices*
They perform pure L3 routing for Inter-Pod VXLAN traffic
They perform VXLAN encap/decap for WAN-to-DC traffic flows

*Not available at FCS
Multi-Pod and GOLF
Multi-DC Deployment – Control Plane

• GOLF devices inject host routes into the WAN or register them in the LISP database
• Each Pod’s spines advertise, via their MP-BGP EVPN control plane, the host routes for endpoints belonging to public BD subnets in that Pod (‘A’ and ‘B’ respectively)

[Diagram: Pod ‘A’ and Pod ‘B’ connected through the IPN, each with an EVPN session to its local GOLF devices; single APIC cluster]
Multi-Pod and GOLF
Multi-DC Deployment – Data Plane

1. Traffic from an external user is steered toward the GOLF devices (via routing or LISP)
2. The GOLF devices VXLAN-encapsulate the traffic and send it to the Spine Proxy VTEP address
3. The spine encapsulates the traffic to the destination VTEP, which can then apply policy
Multi-Pod and GOLF
Multi-DC Deployment – Data Plane (2)

4. The leaf applies policy and encapsulates the traffic directly to the local GOLF VTEP address
5. The GOLF devices de-encapsulate the traffic and route it into the WAN (or LISP-encapsulate it to the remote router)
6. The traffic is received by the external user
ACI Multi-Pod Solution
Summary

• The ACI Multi-Pod solution represents the natural evolution of the Stretched Fabric design
• It combines the advantages of a centralized management and policy domain with fault-domain isolation (each Pod runs independent control planes)
• Control and data plane integration with WAN edge devices (Nexus 7000/7700 and ASR 9000) completes and enriches the solution
• The solution is planned to be available in Q3CY16 and will be released with a companion Design Guide
Agenda
• ACI Introduction and Multi-Fabric Use Cases
• ACI Multi-Fabric Design Options
• ACI Stretched Fabric Overview
• ACI Multi-Pod Solution Deep Dive
• ACI Multi-Site Solutions Overview
• Conclusions

ACI Dual-Fabric Solution
Overview
For more information on ACI Dual-Fabric deployment: BRKACI-3503

[Diagram: ACI Fabric 1 and ACI Fabric 2 interconnected via L2/L3 DCI]

• Independent ACI Fabrics interconnected via L2 and L3 DCI technologies
• Each ACI Fabric is independently managed by a separate APIC cluster
• Separate Management and Policy Domains
• Data plane VXLAN encapsulation terminated at the edge of each Fabric
VLAN hand-off to the DCI devices for providing the Layer 2 extension service
• Requires classifying inbound traffic to provide end-to-end policy extensibility
ACI Multi-Site (Future)
Overview
[Diagram: Site ‘A’ … Site ‘n’ connected by an Inter-Site Network; MP-BGP EVPN between sites; IS-IS, COOP and MP-BGP running independently in each site; separate APIC clusters]

• Multiple ACI fabrics connected via an IP network
• Separate availability zones with maximum isolation
• Separate APIC clusters, separate management and policy domains, separate fabric control planes
• End-to-end policy enforcement with policy collaboration
• Support for multiple sites
• Not bound by distance
ACI Multi-Site
Reachability
[Diagram: Site ‘A’ … Site ‘n’ connected by an Inter-Site Network, MP-BGP EVPN, separate APIC clusters]

• Host-level reachability advertised between Fabrics via BGP
• Transit network is IP-based
• Host routes do not need to be advertised into the transit network
• Policy context is carried with packets as they traverse the transit IP network
• Forwarding between multiple Fabrics is allowed (not limited to two sites)
ACI Multi-Site
Policy Collaboration
• EPG policy is exported by the source site to the desired peer target site fabrics
Fabric ‘A’ advertises which of its endpoints it allows other sites to see
• Target site fabrics selectively import EPG policy from the desired source sites
Fabric ‘B’ controls what it wants to allow its endpoints to see in other sites
• Policy export between multiple Fabrics is allowed (not limited to two sites)

[Diagram: Site ‘A’ (Web1/Web2, App1/App2, dB1) exports Web, App and DB to Fabric ‘B’ and imports Web and App from it; Site ‘B’ (Web1/Web2, App1/App2, dB1/dB2) exports Web and App to Fabric ‘A’ and imports Web, App and DB from it]
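A minimal sketch of the selective export/import semantics above, using the EPG names from the diagram (the actual exchange mechanism is not public):

```python
# Each site declares what it exports; a site only sees what it chose to
# import AND the peer chose to export.
site_a_exports = {"Web", "App", "DB"}
site_b_exports = {"Web", "App"}

def visible_epgs(imports_wanted: set, peer_exports: set) -> set:
    """EPGs a site can reference in policy: its imports intersected with peer exports."""
    return imports_wanted & peer_exports

print(visible_epgs({"Web", "App", "DB"}, site_b_exports))  # Site A can use Web and App
print(visible_epgs({"Web", "App", "DB"}, site_a_exports))  # Site B can use Web, App and DB
```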
ACI Multi-Site
Scope of Policy

[Diagram: Site ‘A’ … Site ‘n’ (Web and App EPGs in each) connected by an Inter-Site Network, MP-BGP EVPN, separate APIC clusters]

• Policy is applied at the provider side of the contract (always at the fabric where the provider endpoint is connected)
Scoping of changes
No need to propagate all policies to all fabrics
Different policy can be applied based on the source EPG (i.e. which fabric it sits in)
Agenda
• ACI Introduction and Multi-Fabric Use Cases
• ACI Multi-Fabric Design Options
• ACI Stretched Fabric Overview
• ACI Multi-Pod Solution Deep Dive
• ACI Multi-Site Solutions Overview
• Conclusions

Conclusions
• Cisco ACI offers different multi-fabric options that can be deployed today
• There is a solid roadmap to evolve those options in the short and mid term
• Multi-Pod represents the natural evolution of the existing Stretched Fabric design
• Multi-Site will replace the Dual-Fabric approach
• Cisco will offer a smooth and gradual migration path to drive the adoption of those new solutions
Where to Go for More Information

• ACI Stretched Fabric White Paper
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_kb-aci-stretched-fabric.html#concept_524263C54D8749F2AD248FAEBA7DAD78
• ACI Dual Fabric Design Guide
Coming soon!
• ACI Dual Fabric Live Demos
Active/Active ASA Cluster Integration: https://youtu.be/Qn5Ki5SviEA
vCenter vSphere 6.0 Integration: http://videosharing.cisco.com/p.jsp?i=14394
Thank you
