
Segment Routing in Datacenter

using Nexus 9000 and 3000


Ambrish Mehta - Principal Engineer (INSBU Engineering)
Swami Narayanan - Principal Engineer (INSBU Engineering)
BRKDCN-2050
Agenda
• What is Segment Routing
• Challenges in Datacenter Networks
• Segment Routing Architecture on Nexus 9000/3000
• Configuration Walk Through
• Deployment Use Cases
• Q&A
What is Segment Routing
“Segment Routing (SR) leverages the source routing paradigm. A
node steers a packet through an ordered list of instructions, called
segments. A segment can represent any instruction, topological or
service-based. A segment can have a local semantic to an SR
node or global within an SR domain”

Source: draft-filsfils-spring-segment-routing

BRKDCN-2050 © 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public 5
Segment Routing in DataCenter
Standardized Control Plane
• Simple extensions in BGP protocol
• No LDP/RSVP complexities and limits

Simplified Traffic Engineering
• Optimal path creation directly at source
• Remove complexities of RSVP

Adaptive SLA
• Dedicated forwarding path & bandwidth
• Performance guarantees

Single Operational Model
• End-to-end forwarding and TE
• Removes multiple layers of technology

Efficient Datapath with scalable network
• Support ECMP
• Minimize LSP state in network
• CPU & memory saving

Programmatic Interface
• Support for NXAPI/DME

Challenges in Datacenter Networks
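Data Center Network
[Figure: BGP-based data center topology, nodes labeled by BGP AS per tier: Internet at the top, Spine (4 … 4), Leaf (2 … 2 and 3 … 3), Top of the Rack (1 … 1), Applications at the bottom.]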

Challenges with Existing DC deployments
• Applications always take the shortest path, as determined by the protocol algorithm.
• Data traffic is not aware of link utilization and load in the network.
• A link failure in one part of the network can create hot spots and bandwidth challenges, and can cause re-hashing along the end-to-end data forwarding path.
[Figure: BGP AS topology with Spine (4 … 4), Leaf (2 … 2 and 3 … 3) and Top of the Rack (1 … 1) tiers.]

Challenges with Existing DC deployments
• Long-lived elephant flows can potentially starve short-lived mouse flows of bandwidth.
• Lack of agility in effectively utilizing available capacity.
• Operational complexity in tweaking protocol parameters.
[Figure: BGP AS topology with Spine (4 … 4), Leaf (2 … 2 and 3 … 3) and Top of the Rack (1 … 1) tiers.]

Segment Routing Architecture on
Nexus 9000 and 3000
Overview
• Built on top of existing MPLS forwarding infrastructure.
• MPLS label as a forwarding construct to identify segment (Segment ID).
• Predictable Label allocation schema across the network.
• BGP as a control protocol to distribute Label.
• Realizes Source Routing, where a label stack can be pushed by an application.
• Built for Software Defined Networking!!!

Segment Routing Control Plane
1) MPLS Label allocation for a given IP prefix
• Dynamic Label Range
• Segment Routing Global Block (SRGB)

2) MPLS Label Exchange with peers.


• BGP is being used as control plane.
• New address families Labeled-Unicast (a.k.a BGP-LU) and Link-State (a.k.a BGP-LS)
have been added.

Control Plane: Segment Routing Global Block (SRGB)
• Consistent and predictable label values across network.
• Carve a subset of Label block from wider MPLS Label range.
• Default SRGB range is 16,000 to 23,999.
• New attribute “Label Index” is carried in BGP update.
• Label at every node is calculated based on following formula.
Label = SRGB base + Label Index (Received in BGP update)
E.g. Prefix 172.0.11.0/24 with Label Index of 1 gets label 16001
• Recommended to have same SRGB at every node in the network.
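The label formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not switch code: the function name `srgb_label` and the range check are assumptions added for the example, while the default range 16000 to 23999 comes from the slide.

```python
# Default SRGB from the slide: 16,000 to 23,999.
SRGB_BASE = 16000
SRGB_END = 23999


def srgb_label(label_index, base=SRGB_BASE, end=SRGB_END):
    """Label = SRGB base + Label Index (received in the BGP update)."""
    if label_index < 0:
        raise ValueError("label index must be non-negative")
    label = base + label_index
    # Assumption for this sketch: an index that lands outside the block
    # is treated as an error rather than silently accepted.
    if label > end:
        raise ValueError(f"label {label} is outside SRGB [{base};{end}]")
    return label


if __name__ == "__main__":
    # Prefix 172.0.11.0/24 with Label Index 1
    print(srgb_label(1))
```

Because every node applies the same formula, nodes that share one SRGB derive the same label for a prefix, which is why a common SRGB is recommended.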

Control Plane: Segment Routing Global Block
[Figure: nodes A, B, C and D in a row, each with SRGB [16000;23999]; prefix 172.0.11.0/24 is attached to D; BGP-LU sessions run between A-B, B-C and C-D.]

BGP-LU updates for the prefix:
From -> To   IP              Label      Nexthop   Label Index
D -> C       172.0.11.0/24   Imp-Null   D         1
C -> B       172.0.11.0/24   16001      C         1
B -> A       172.0.11.0/24   16001      B         1

Label table for 172.0.11.0/24:
Node   In-label   Out-label
A      -          16001
B      16001      16001
C      16001      POP
D      Imp-Null   -
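To make the label table concrete, the following minimal sketch walks one packet from A to D. The table values are transcribed from the slide; the dictionary layout and the `trace` helper are hypothetical names introduced only for this illustration.

```python
# Label table for 172.0.11.0/24, transcribed from the slide above.
# "POP" marks penultimate-hop popping: D advertised Imp-Null, so C strips
# the label and D receives a plain IP packet.
LABEL_TABLE = {
    "A": {"in": None,  "out": 16001, "next": "B"},
    "B": {"in": 16001, "out": 16001, "next": "C"},
    "C": {"in": 16001, "out": "POP", "next": "D"},
    "D": {"in": None,  "out": None,  "next": None},
}


def trace(ingress="A", table=LABEL_TABLE):
    """Return (node, label stack on arrival) for each hop of one packet."""
    hops, stack, node = [], [], ingress
    while node is not None:
        hops.append((node, list(stack)))
        entry = table[node]
        if stack and stack[-1] != entry["in"]:
            raise ValueError(f"{node}: unexpected top label {stack[-1]}")
        if entry["out"] == "POP":
            stack.pop()                    # penultimate hop removes the label
        elif entry["out"] is not None and stack:
            stack[-1] = entry["out"]       # transit node swaps the top label
        elif entry["out"] is not None:
            stack.append(entry["out"])     # ingress node pushes the label
        node = entry["next"]
    return hops


if __name__ == "__main__":
    for node, labels in trace():
        print(node, labels)
```

With a common SRGB the swap at B replaces 16001 with 16001, so the label stays constant hop by hop.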

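Control Plane: MPLS Label Allocation
[Figure: eBGP LU topology with Spine, Leaf and Top of the Rack tiers and an Application on prefix 172.0.11.0/24 (Label Index: 1). With SRGB allocation every node shows the same MPLS label 16001 for the prefix; with dynamic allocation each node shows a different label (21, 24, 22, 20, 41, 34, 36). Legend: eBGP LU, SVI, SRGB MPLS Label, Dynamic MPLS Label.]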
SID (Segment IDs)
• Prefix SID
• Node SID
• Peer Node SID
• Peer Adjacency SID
• Peerset SID

Prefix SID
• Associate MPLS Label with an IP prefix.
• Prefix is typically a subnet on which application is hosted inside the Datacenter.
• Advertise in BGP with Label Index.
[Figure: eBGP LU topology (tiers 4 … 4, 2 … 2 and 3 … 3, 1 … 1); the Top of the Rack switches advertise 172.0.1.0/24 with Label Index 1 and 172.0.2.0/24 with Label Index 2.]

Node SID
• Associate MPLS Label with an IP prefix.
• Prefix is loopback configured on a given node.
• Tag node in the network with MPLS Label.
• More Scalable than Prefix SID.
[Figure: eBGP LU topology in which every node carries a node label: 16401 … 16402 (tier 4), 16201, 16202 and 16203, 16204 (tiers 2 and 3), 16101 … 16102 and 16103 … 16104 (tier 1). Example loopbacks: 4.4.4.1/32 (Label Index: 401), 4.4.4.2/32 (Label Index: 402), 3.3.3.2/32 (Label Index: 202), 1.1.1.2/32 (Label Index: 102).]

Egress Peer Engineering
• Used on Peering Router on Datacenter Edge.
• Peering Routers may not support BGP-LU.
• Allocate MPLS label for engineered peer.
• Exchange EPE data sets via BGP-LS to outside entity (e.g. Orchestrator).
• Orchestrator computes forwarding path based on various user defined policies.
• Orchestrator sends the label stack associated with a data path to the Host or the
Top of the Rack (ToR) Switch.

Egress Peer Engineering
[Figure: Ingress Node A, Transit Node B and EPE Node C (Loopback0: 4.4.4.1/32, Label Index: 1), each with SRGB [16000;23999] and connected by BGP-LU. C peers over BGP-V4 with Peer Routers D, E and F toward the Internet and has a BGP LS session to the Orchestrator. The Application asks the Orchestrator "Do you have egress engineered path for me?" and is told "Use Label Stack {16001, 16}". The packet enters the fabric as 16001 / 16 / Payload, is shown as 16 / Payload toward the EPE node, and leaves toward the peer router as Payload only.]

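Egress Peer Engineering
[Figure: Ingress Node A, Transit Node B and EPE Node C (Loopback0: 4.4.4.1/32, Label Index: 1), each with SRGB [16000;23999] and connected by BGP-LU. C peers over BGP-V4 with Peer Routers D, E and F toward the Internet and has a BGP LS session to the Orchestrator. Instead of handing the stack to the Application, the Orchestrator imposes label stack {16001, 16} on the ingress switch via NX-API/DME. The packet is shown as 16001 / 16 / Payload in the fabric, 16 / Payload toward the EPE node, and Payload toward the peer router.]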

Peer-Node SID
[Figure: Ingress Node A, Transit Node B and EPE Node C (Loopback0: 4.4.4.1/32, Label Index: 1), each with SRGB [16000;23999] and connected by BGP-LU. C peers over BGP-V4 with Peer Routers D, E and F (next hops D1, E1, F1) toward the Internet, and has a BGP LS session to the Orchestrator, which imposes the label stack via NX-API/DME for the Application.]

Label   Nexthop
31      D1
32      E1
33      F1

Peer Adjacency SID
[Figure: Ingress Node A, Transit Node B and EPE Node C (Loopback0: 4.4.4.1/32, Label Index: 1), each with SRGB [16000;23999] and connected by BGP-LU. Peer Router D is a multihop BGP peer (BGP V4, static route) reached over two links, D1 and D2; Peer Routers E and F are reached over E1 and F1 toward the Internet. C has a BGP LS session to the Orchestrator, which imposes the label stack via NX-API/DME for the Application.]

Label   Nexthop
41      ECMP {D1,D2}
31      D1
32      D2
33      E1
34      F1
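As a rough sketch of how an orchestrator could turn this label table into a label stack, the example below pairs the EPE node's label with one of the peer labels. The label values come from the slide (16001 is SRGB base 16000 plus Label Index 1); the names `EPE_LABELS`, `labels_for_nexthop` and `build_stack` are hypothetical and do not correspond to any NX-OS or orchestrator API.

```python
# Label table transcribed from the Peer Adjacency SID slide.
EPE_LABELS = {
    41: ["D1", "D2"],   # load-shared (ECMP) over both links to D
    31: ["D1"],
    32: ["D2"],
    33: ["E1"],
    34: ["F1"],
}

# Label of the EPE node: SRGB base 16000 + Label Index 1.
EPE_NODE_LABEL = 16001


def labels_for_nexthop(nexthop, table=EPE_LABELS):
    """Labels that pin traffic to exactly this one next hop."""
    return [label for label, nexthops in table.items() if nexthops == [nexthop]]


def build_stack(peer_label, table=EPE_LABELS, node_label=EPE_NODE_LABEL):
    """Label stack to impose: EPE node label on top, peer label below it."""
    if peer_label not in table:
        raise KeyError(f"label {peer_label} is not in the EPE label table")
    return [node_label, peer_label]


if __name__ == "__main__":
    # Steer traffic out of link D2 specifically.
    label = labels_for_nexthop("D2")[0]
    print(build_stack(label))
```

Choosing label 41 instead would let the EPE node load-share across D1 and D2, while 31 or 32 pins the flow to a single link.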

PeerSet SID
[Figure: Ingress Node A, Transit Node B and EPE Node C (Loopback0: 4.4.4.1/32, Label Index: 1), each with SRGB [16000;23999] and connected by BGP-LU. C peers over BGP V4 with Peer Routers D, E and F (next hops D1, E1, F1) toward the Internet; D and E are grouped into a Peer Set. C has a BGP LS session to the Orchestrator, which imposes the label stack via NX-API/DME for the Application.]

Label   Nexthop
41      ECMP {D1, E1}
34      F1

Configuration Walkthrough
Global Configuration
!Enable Required Feature sets
N9K1#config terminal
N9K1(config)#install feature-set mpls
N9K1(config)#feature-set mpls
N9K1(config)#feature bgp
N9K1(config)#feature mpls segment-routing
N9K1(config)#segment-routing mpls
N9K1(config-segment-routing-mpls)#end
N9K1#
..
!Enable mpls forwarding on respective interfaces
N9K1#config terminal
N9K1(config)#interface <x>
N9K1(config-if)#mpls ip forwarding
N9K1(config-if)#end

BGP Configuration: Node/Prefix SID
[Figure: eBGP LU chain A (AS 1) .0 -- 10.10.10.x/24 -- .1 B (AS 2) .1 -- 20.20.20.x/24 -- .0 C (AS 3); C has Loopback0: 1.1.1.3/32, Label Index: 1.]

!Node B
router bgp 2
  ..
  template peer AF-LABEL
    !BGP-LU AF Capability
    address-family ipv4 labeled-unicast
  neighbor 10.10.10.0
    inherit peer AF-LABEL
    remote-as 1
  neighbor 20.20.20.0
    inherit peer AF-LABEL
    remote-as 3

!Node C
router bgp 3
  address-family ipv4 unicast
    !Advertise Network and set Label Index via Route Map
    network 1.1.1.3/32 route-map ADD-LABEL-INDEX
    !Label Allocation
    allocate-label route-map ALLOCATE-LABEL-FILTER
  template peer AF-LABEL
    address-family ipv4 labeled-unicast
  neighbor 20.20.20.1
    inherit peer AF-LABEL
    remote-as 2
..
route-map ALLOCATE-LABEL-FILTER permit 10
  match ip address prefix-list P1
ip prefix-list P1 seq 5 permit 1.1.1.3/32

route-map ADD-LABEL-INDEX permit 10
  set label-index 1

Egress Peer Engineering Configuration
[Figure: B (AS 2) .1 -- 20.20.20.x/24 -- .0 C (AS 3) .0 -- 30.30.30.x/24 -- .1 D (AS 4); eBGP LU between B and C, eBGP V4 between C and D; C has Loopback0: 1.1.1.3/32, Label Index: 1.]

!Node B
router bgp 2
  ..
  template peer AF-LABEL
    address-family ipv4 labeled-unicast
  neighbor 10.10.10.0
    inherit peer AF-LABEL
    remote-as 1
  neighbor 20.20.20.0
    inherit peer AF-LABEL
    remote-as 3

!Node C
router bgp 3
  address-family ipv4 unicast
    network 1.1.1.3/32 route-map ADD-LABEL-INDEX
    allocate-label route-map ALLOCATE-LABEL-FILTER
  template peer AF-LABEL
    address-family ipv4 labeled-unicast
  template peer AF-V4
    address-family ipv4 unicast
  neighbor 20.20.20.1
    inherit peer AF-LABEL
    remote-as 2
  neighbor 30.30.30.1
    remote-as 4
    inherit peer AF-V4
    !Egress Engineer Traffic to this peer
    egress-engineering

!Node D
router bgp 4
  template peer AF-V4
    address-family ipv4 unicast
  neighbor 30.30.30.0
    remote-as 3
    inherit peer AF-V4

Label Stack Imposition

in-label 100002 allocate policy 168.0.1.0 255.255.255.0
forward
path 1 next-hop 10.0.0.10 out-label-stack 16004 16002 16001

Orchestration
!Enable Required Feature sets
N9K1#config terminal
N9K1(config)#feature nx-api
N9K1(config)#end
N9K1#

Orchestration
import json

import requests

# NX-API JSON-RPC endpoint and switch credentials
url = 'http://172.31.203.123/ins'
switchuser = 'administrator'
switchpassword = 'cisco123'

myheaders = {'content-type': 'application/json-rpc'}

# Each entry carries one CLI command; entries are listed in execution order
payload = [
    {
        "jsonrpc": "2.0",
        "method": "cli",
        "params": {
            "cmd": "config t",
            "version": 1
        },
        "id": 1
    },
    {
        "jsonrpc": "2.0",
        "method": "cli",
        "params": {
            "cmd": "segment-routing mpls",
            "version": 1
        },
        "id": 2
    }
]

# POST the batch to the switch and decode the JSON reply
response = requests.post(url, data=json.dumps(payload), headers=myheaders,
                         auth=(switchuser, switchpassword)).json()
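The hand-written payload in the script above follows a regular pattern, so it can be generated from a plain list of CLI commands. The helper below is a small sketch of that idea; `build_jsonrpc_payload` is a hypothetical name, and it only reproduces the envelope fields already present in the script (`jsonrpc`, `method`, `params`, `id`) without contacting a switch.

```python
import json


def build_jsonrpc_payload(commands):
    """Wrap CLI commands in the JSON-RPC envelope used by the script above.

    Ids are numbered from 1 in command order, as in the original payload.
    """
    return [
        {
            "jsonrpc": "2.0",
            "method": "cli",
            "params": {"cmd": cmd, "version": 1},
            "id": request_id,
        }
        for request_id, cmd in enumerate(commands, start=1)
    ]


if __name__ == "__main__":
    commands = ["config t", "segment-routing mpls"]
    print(json.dumps(build_jsonrpc_payload(commands), indent=2))
```

The resulting list can be passed to `json.dumps` and posted exactly as the script above does, which keeps the command sequence in one readable place.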
Deployment Use Cases
DC Multi Clos Design - Reference
[Figure: multi-Clos reference design with four tiers, top to bottom: Peering Router, Spine, Leaf, ToR.]

Why Source Routing in DC?
• Workloads / compute systems (servers / VMs) come and go on an as-needed
basis and are provisioned automatically
• High-volume traffic within the Data Center (East <-> West)
• Host connectivity to network switches is known, and VM movement from
server -> server is automated. Hence the application end point is very well
known in the DC
• It is easy to steer traffic when the location of systems is well known in the
network
• It makes perfect sense for the network infra to support path steering
capability

What if the Network offers?
• Flexibility for the Application to instruct the path
• Creating a logical private cloud for Network/Application segmentation
• Encoding a signature for differential treatment in the network (CDN –
Voice/Video compared to backup data)
• Capacity Management and On Demand provisioning
• All of the above and more, while staying easy to Configure, Scale and Orchestrate

Segment ID as Transport
POD – POD Transport
• Map label index (SID) to a prefix (172.0.11.0 -> 1)
• The same label index is exchanged throughout the network using BGP LU. With the same SRGB (16000-23999), all nodes have the same Prefix -> label mapping
• Ingress TOR / vSwitch pushes the label and forwards with underlying ECMP in network.
Advantages
• Simple to configure, troubleshoot and automate (consistent SRGB)
• Makes use of underlying ECMP paths
• Simple migration from traditional IP based network.
[Figure: POD1 and POD2 joined by Spine, Leaf and ToR tiers running BGP LU. The destination ToR advertises P1: 172.0.11.0/24 with Label Index: 1. A packet from the Application to 172.0.11.1 carries label 16001 across the Leaf and Spine tiers and is delivered as a plain IP packet.]

Prefix          SRGB Label
172.0.11.0/24   16001

Over the Top Solution
Anycast IP: 1.1.1.1, Label Index: 100 (Loopback on all peering routers)
• Configure the same Anycast IP (1.1.1.1) as loopback on all PR and advertise label index 100 (Anycast SID)
• Form eBGP Multi-Hop session (over BGP LU) between TOR<->PR and advertise with NH 1.1.1.1.
• Ingress TOR pushes 16100 (NH label) for any traffic outside DC.
• In case the Application can push the label of the PR, the eBGP Multi-Hop session can be avoided.
Advantages
• With Anycast SID from PR layer and combination of BGP Multihop with Anycast IP as next-hop, same label can be reused.
• Simple to manage and troubleshoot
[Figure: POD1 with ToR, Leaf, Spine and Peering Router tiers running BGP LU. The Peering Routers advertise 100.0.1.0/24 - 100.0.100.0/24 with NH: 1.1.1.1. A packet to 100.0.51.5 carries label 16100 from the ToR through Leaf and Spine to a Peering Router and leaves the DC as a plain IP packet.]

Prefix                          SRGB Label
1.1.1.1                         16100
100.0.1.0/24 - 100.0.100.0/24   16100 (NH label pushed)
Traffic Steering
Multi-plane Segmentation
• Network fully meshed physically.
• Partition the network logically into Orange & Blue plane (SID / Node filtering).
• ToR advertises networks with label index mapped according to policy (172.0.1.0/24 -> 100, 172.0.2.0/24 -> 101)
• Policy is consistent throughout the network so only the allowed SIDs get through. (Orchestrator for pushing policy).
• Ingress TOR selects the plane based on the policy. Alternatively, the Host could select the Plane through a label.
Advantages
• Network segmentation for Private Cloud.
• Application Segmentation for isolation
• Cost Effective and investment protection
• Effective utilization of the available bandwidth.
[Figure: BGP LU fabric split into Plane 1 and Plane 2, with an Orchestrator. Node labels shown: 16001, 16002, 17001, 17002, 17101, 17102. Packets to 172.0.1.1 are shown with label 16100 in the fabric and as plain Data at the edge. Mappings: 172.0.1.0/24 -> Label Index 100, 172.0.2.0/24 -> Label Index 101.]
Segment ID as Service
Network Policy using Segment Routing
• Enterprise policy may restrict direct communication between different groups
• Policy exposed to Controller and Pushed to both Host and TOR
• Ingress Node validates Top Label and Local Network
• PUSH the verified label (for Egress policy check)
• Egress Node validates the Outer Label (destination) and Inner Label (Verified).
• Once cleared, POP label and forward. Drop and Log if not matching the policy.
Advantages
• Provides access restriction across groups
[Figure: Host1 (Label 100) and Host2 (Label 200) attached to a fabric with a Controller. Ingress callout: "Verify the top label with the Source Network and Push Clean label". Egress callout: "POP outer label, Verify Inner Clean Label, POP and Forward". Label values shown on the packets and nodes: 16001, 16050, 200, 500.]
Zero Touch Node Isolation
• In steady state, applications send data traffic with the destination label (underlying ECMP)
• Node 17001 needs to enter maintenance mode.
• Controller creates a new Anycast SID 20001 with only Active members. Anycast groups could also be pre-provisioned (to avoid dynamic creation)
• Controller pushes the new SID (label) along with the original destination label as a Stack to the vSwitch
• Label imposition can be done at ToR as well using DME (label imposition)
Advantages
• With zero touch and impact, switches can be commissioned / de-commissioned
[Figure: fabric with Spine (16001, 16002, … 16004), Leaf (17001 to 17004 and 17101 to 17104) and ToR (18001 … 18050 and 18101 … 18150) tiers, plus a Controller and Applications. The Controller tells the Hypervisor to impose label stack 20001, 18101. Packets to 172.0.11.1 are shown with label 20001 at the ingress and label 18101 across the rest of the fabric.]
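The controller logic described on this slide can be sketched as two small steps: pick the active members for the anycast group, then hand the vSwitch a two-label stack. This is a minimal sketch under stated assumptions: the slide does not say which nodes belong to Anycast SID 20001, so treating the leaf nodes 17001 to 17004 as the group is an assumption, and both function names are hypothetical.

```python
# Assumption for illustration: the anycast group covers the leaf nodes
# 17001 to 17004 shown on the slide.
LEAF_GROUP = [17001, 17002, 17003, 17004]

ANYCAST_SID = 20001        # new Anycast SID created by the controller
DESTINATION_LABEL = 18101  # original destination label from the slide


def active_members(group, maintenance):
    """Nodes that stay in the anycast group (not in maintenance mode)."""
    return [node for node in group if node not in maintenance]


def steering_stack(anycast_sid, destination_label):
    """Stack pushed to the vSwitch: anycast SID on top, destination below."""
    return [anycast_sid, destination_label]


if __name__ == "__main__":
    # Node 17001 enters maintenance mode.
    print(active_members(LEAF_GROUP, maintenance={17001}))
    print(steering_stack(ANYCAST_SID, DESTINATION_LABEL))
```

Because only the ingress changes its label stack, the node under maintenance needs no configuration change of its own, which matches the "zero touch" claim.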

DC Egress Engineering
Bandwidth Management and Egress Engineering
• Egress adjacency (link) chosen based on policy to meet SLA (Gold customer)
• Provision on ToR switch to impose Label Stack (16100, 2002)
• ToR receives Data from Gold Customer
• ToR pushes the Egress Node (16100) and Egress Adjacency (2002) based on policy
• Advantage:
  • Service selection & honor SLA requirements
[Figure: EPE Node (16100) with egress adjacencies 2001, 2002 and 2003 toward the PE over eBGP Multihop. The Orchestrator holds a BGP LS session with the Border Router and provisions Label Stack (16100, 2002) for Customer X on the ToR. The packet is shown as 16100 / 2002 / Data in the fabric, 2002 / Data at the EPE node, and Data toward the PE.]
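The policy step on the ToR can be pictured as a lookup from customer class to egress adjacency label. The sketch below uses the values from the slide (Egress Node 16100, Egress Adjacency 2002 for the Gold customer). The `ADJACENCY_BY_CLASS` table, the function name, and the fallback of pushing only the node label for other classes are assumptions made for this illustration.

```python
EGRESS_NODE_LABEL = 16100          # Egress (EPE) node label from the slide

# Only the Gold mapping appears on the slide; other classes are not shown.
ADJACENCY_BY_CLASS = {"gold": 2002}


def label_stack_for(customer_class, policy=ADJACENCY_BY_CLASS,
                    node_label=EGRESS_NODE_LABEL):
    """Label stack the ToR imposes for traffic of one customer class."""
    adjacency = policy.get(customer_class)
    if adjacency is None:
        # Assumed fallback: steer to the egress node only and let it pick
        # the outgoing link by its normal lookup.
        return [node_label]
    return [node_label, adjacency]


if __name__ == "__main__":
    print(label_stack_for("gold"))
    print(label_stack_for("bronze"))
```

Keeping the policy in a table on the orchestrator side means a new SLA class only adds a row, while the ToR keeps imposing whatever stack it is given.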

Benefits of Segment Routing
• Power of Segment Routing
• Simple, Flexible and easy to troubleshoot
• Consistent Label/Segment across network with SRGB. Easy to Automate /
Orchestrate
• Scalable as State maintained at ingress node

• End-to-End control over the network infrastructure to transport your applications


• Network/Application Segmentation for guaranteed SLA
• Adaptive traffic switching and bandwidth management
• Investment protection and significant cost reduction
• Built for SDN era to simplify Network Operations through centralized monitoring /
orchestration

More Information
• White Paper
http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-737536.html

• Blog
http://blogs.cisco.com/datacenter/application-level-intelligence-in-the-data-center-using-segment-routing?_ga=1.127143757.1347823405.1468366647

• Segment Routing
http://www.segment-routing.net/

Complete Your Online Session Evaluation
• Give us your feedback to be
entered into a Daily Survey
Drawing. A daily winner will
receive a $750 Amazon gift card.
• Complete your session surveys
through the Cisco Live mobile
app or from the Session Catalog
on CiscoLive.com/us.

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online

Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Lunch & Learn
• Meet the Engineer 1:1 meetings
• Related sessions
WISP: LABRST-2020
Segment Routing in Datacenter using Nexus 9000/3000

Q&A
Thank you
