
MPLS Traffic Engineering
NANOG18

Robert Raszuk - IOS Engineering
raszuk@cisco.com

1999, Cisco Systems, Inc.

Location of files
This presentation, handouts & demo are located at:
ftp://ftpeng.cisco.com/rraszuk/nanog18
RR_MPLS_TE_Nanog.pdf - this presentation
TE_Monitor.pdf - show & debug commands
TE_Config.pdf - full configuration syntax
TE_SampleCfg.pdf - configuration sample
TE_DEMO.tar - Tared TE offline demo (HTML)
TEisistdp_1.pdf - Demos Lab Topology
NANOG18 - Robert Raszuk

2000, Cisco Systems, Inc.

Traffic Engineering: Motivations

Reduce the overall cost of operations by


more efficient use of bandwidth resources
by preventing a situation where some parts of
a service provider network are over-utilized
(congested), while other parts under-utilized

The ultimate goal is cost saving !



Traffic Engineering: Motivations

MPLS and Traffic Eng allows for one to


spread the traffic and distribute it across
the entire network infrastructure like
magnetic fields between poles while
also providing the redundancy required
for high availability service.
(Eric Dean)


Without Traffic Engineering


Cars: SFO-LAX, SAN-SMF, LAX-SFO, SMF-SAN

No Traffic Engineering: analogy to Human Drivers


With Traffic Engineering


Cars: SFO-LAX, SAN-SMF, LAX-SFO, SMF-SAN

Traffic Engineering: analogy to Auto Pilot


Routing solution to Traffic Engineering
R2
R3

R1

Construct routes for traffic streams within a service provider in such
a way as to avoid causing some parts of the provider's network to
be over-utilized while other parts remain under-utilized

The Overlay Solution


[Figure: network of L3 and L2 nodes; panels: Physical, Logical]

Routing at layer 2 (ATM or FR) is used for traffic engineering


Analogy to direct highways between SFO-LAX & SAN-SMF.
Nobody enters the highway in between.

Traffic engineering with overlay


R2
R3

R1

PVC for R2 to R3 traffic


PVC for R1 to R3 traffic

Overlay solution: drawbacks


Extra network devices (cost)
More complex network management (cost)
two-level network without integrated network
management
additional training, technical support, field
engineering

IGP routing scalability issue for meshes


Additional bandwidth overhead (cell tax)

Traffic engineering with Layer 3


R2
R3

R1

IP routing: destination-based least-cost routing


Path for R2 to R3 traffic
Path for R1 to R3 traffic
under-utilized alternate path


Traffic engineering with Layer 3


what is missing ?
Path computation based just on IGP metric is
not enough
Support for explicit routing (aka source
routing) is not available
Analogy: [Figure labelled San Jose]

MPLS Traffic
Engineering


TE - key mechanisms
Explicit routing (aka source routing)
Constraint-based Path Selection Algorithm
(Example: Choose path with no congestion, avoid
highways, select scenic roads etc)

Extensions to OSPF/ISIS for flooding of
resources / policy information (Live collection of
traffic statistics - pilot tests in Europe)

MPLS as the forwarding mechanism (Auto Pilot
programmed in each car when entering city)


TE - key mechanisms

Explicit routing (aka source routing)


RSVP as the mechanism for establishing
Label Switched Paths (LSPs)
use of the explicitly routed LSPs in the
forwarding table


What is a traffic trunk ?


A

Aggregation of (micro) flows that are:


forwarded along a common path (within a service provider)
often from a POP to another POP
share a common QoS requirement (if L-LSPs are used)

Essential for scalability



TE basics

Traffic within a Service Provider as a


collection of POP to POP traffic trunks with
known bandwidth and policy requirements
TE provides traffic trunk routing that meets
the goal of Traffic Engineering
via a combination of on-line and off-line
procedures


Requirements:
Differentiating traffic trunks:
large, critical traffic trunks must be well routed in
preference to other trunks

Handling failures:
automated re-routing in the presence of failures

Pre-configured paths:
for use in conjunction with the off-line route
computation procedures

Support of multiple Classes of Service



Requirements (cont.)
Constraining sub-optimality:
should re-optimize on new/restored bandwidth
in a non-disruptive fashion - maintain the existing route until the
new route is established, without any double counting

Ability to spread traffic trunk across multiple Label


Switched Paths (LSPs)
could provide more efficient use of networking
resources

Ability to include / exclude certain links for certain traffic


trunks

Design Constraints
Constrained to a single routing domain
initially constrained to a single area

Requires OSPF or IS-IS


Unicast traffic
Focus on supporting routing based on a
combination of administrative +
bandwidth constraints

Trunks Attributes


Trunk Attributes

Configured at the head-end of the trunk


Bandwidth
Priorities
setup priority: priority for taking a resource
holding priority: priority for holding a
resource


Trunk attributes
Ordered list of Path Options
possible administratively specified paths (via
an off-line central server) - {explicit list}
Constraint-based dynamically computed
paths based on a combination of bandwidth and policies

Re-optimization
each path option is either enabled or disabled for re-optimization; the interval is given in seconds.
Maximum 1 week (7*24*3600 = 604800 s), 0 disables, default 1 hour (3600 s).

Trunk Attributes
Resource class affinity (Policy)
supports the ability to include/exclude certain links for
certain traffic trunks based on a user-defined Policy
Tunnel is characterized by a
32-bit resource-class affinity bit string
32-bit resource-class mask (0 = don't care, 1 = care)

Link is characterized by a 32-bit resource-class attribute string
Default-value of tunnel/link bits is 0
Default value of the tunnel mask = 0x0000FFFF
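
The affinity rule can be sketched in a few lines. This is a minimal illustration, assuming the usual interpretation that a link is acceptable when its attribute bits agree with the tunnel affinity bits in every position where the mask is 1; the function name `link_allowed` is hypothetical and not taken from the presentation. The values use 4-bit strings like the slide examples.

```python
def link_allowed(link_attr: int, affinity: int, mask: int) -> bool:
    """Return True if the link may carry the tunnel.

    Only bit positions where mask == 1 are compared ("care" bits);
    positions where mask == 0 are ignored ("don't care").
    """
    return (link_attr & mask) == (affinity & mask)


if __name__ == "__main__":
    # Link attribute 0010, tunnel 0000, mask 0011: the link is excluded.
    print(link_allowed(0b0010, 0b0000, 0b0011))
    # Clearing the corresponding mask bit (mask 0001) allows the link again.
    print(link_allowed(0b0010, 0b0000, 0b0001))
```

With the default link value of 0 and a tunnel affinity of 0, every link matches whatever the mask is, which is why setting a single link bit under a "care" position is enough to drive default tunnels off that link.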

Example0: 4-bit string, default


C
A

0000

0000
0000
D

0000

B
0000

Trunk A to B:
tunnel = 0000, t-mask = 0011

ADEB and ADCEB are possible



Example1a: 4-bit string


C
A

0000

0000
0000
D

0010

B
0000

Setting a link bit in the lower half drives all tunnels off the
link, except those specially configured
Trunk A to B:
tunnel = 0000, t-mask = 0011

Only ADCEB is possible



Example1b: 4-bit string


C
A

0000

0000
0000
D

0010

B
0000

A specific tunnel can then be configured to allow such


links by clearing the bit in its affinity attribute mask
Trunk A to B:
tunnel = 0000, t-mask = 0001

Again, ADEB and ADCEB are possible



Example1c: 4-bit string


C
A

0000

0000
0000
D

0010

B
0000

A specific tunnel can be restricted to only such links by


instead turning on the bit in its affinity attribute bits
Trunk A to B:
tunnel = 0010, t-mask = 0011

No path is possible

Example2a: 4-bit string


C
A

0000

0000
0000
D

0100

B
0000

Setting a link bit in the upper half has no immediate effect
Trunk A to B:
tunnel = 0000, t-mask = 0011

ADEB and ADCEB are both possible



Example2b: 4-bit string


C
A

0000

0000
0000
D

0100

B
0000

A specific tunnel can be driven off the link by setting the bit
in its mask
Trunk A to B:
tunnel = 0000, t-mask = 0111

Only ADCEB is possible



Example2c: 4-bit string


C
A

0000

0000
0000
D

0100

B
0000

A specific tunnel can be restricted to only such links


Trunk A to B:
tunnel = 0100, t-mask = 0111

No path is possible

Trunk Attribute

Resource Class Affinity (Policy)

The user defines the semantics:


this bit/mask says low-delay path
excluded

Flexible (maybe too flexible :)


1c vs 2c ? in 1c, the default tunnels
will not be willing to flow via the special
links


Link Attributes and their flooding


Link Resource Attributes

Resource attributes are configured on


every link in a network
bandwidth
Link Attributes
TE-specific link metric


Link Resource Attributes

Resource attributes are flooded throughout


the network
bandwidth per priority (0-7)
Link Attributes (Policy)
TE-specific link metric
draft-li-mpls-igp-te-00.txt


Per-Priority Available BW
[Timeline: T=0 to T=4]

Link L, BW=100
D advertises: AB(0)=...=AB(7)=100
AB(i) = Available Bandwidth at priority i

Setup of a tunnel over L at priority=3 for 30 units

Link L, BW=100
D advertises: AB(0)=AB(1)=AB(2)=100
AB(3)=AB(4)=...=AB(7)=70

Setup of an additional tunnel over L at priority=5 for 30 units

Link L, BW=100
D advertises: AB(0)=AB(1)=AB(2)=100
AB(3)=AB(4)=70
AB(5)=AB(6)=AB(7)=40
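
The per-priority accounting above can be reproduced with a short sketch. It assumes, as the advertised values suggest, that a reservation at priority p reduces the available bandwidth at priority p and at every numerically higher (less important) priority; the function name `reserve` is hypothetical.

```python
def reserve(available, priority, bandwidth):
    """Return a new AB(0..7) list after reserving `bandwidth` at `priority`.

    Priorities numerically lower than `priority` are unaffected, because a
    more important tunnel could still preempt this reservation.
    """
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    return [
        ab - bandwidth if level >= priority else ab
        for level, ab in enumerate(available)
    ]


if __name__ == "__main__":
    ab = [100] * 8                 # Link L, BW=100, nothing reserved
    ab = reserve(ab, 3, 30)        # tunnel at priority 3 for 30 units
    print(ab)
    ab = reserve(ab, 5, 30)        # additional tunnel at priority 5 for 30 units
    print(ab)
```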

Information Distribution

Re-use the flooding service from the


Link-State IGP
opaque LSA for OSPF
draft-katz-yeung-ospf-traffic-00.txt

new wide TLV for IS-IS


draft-ietf-isis-traffic-00.txt


Information Distribution

Periodic (timer-based)
On significant changes of available
bandwidth (threshold scheme)
On link configuration changes
On LSP Setup failure


Periodic Timer

Periodically, a node checks whether the
current TE status is the same as the
one it last broadcast.
If different, it floods its updated TE
link status


Significant Change

Thresholds: 50%, 70%, 85%, 92%, 100%

Each time a threshold is crossed, an update is sent
Thresholds are more densely spaced as utilization increases
Different thresholds for up and down (more stable)
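
A threshold scheme of this kind can be sketched as follows. The "up" thresholds are the percentages shown on the slide; the "down" thresholds are hypothetical values chosen only to illustrate that the two directions use different sets, and the function names are assumptions rather than anything from the presentation.

```python
import bisect

UP_THRESHOLDS = [50, 70, 85, 92, 100]      # from the slide
DOWN_THRESHOLDS = [45, 65, 80, 90, 99]     # hypothetical, for illustration only


def band(utilization, thresholds):
    """Index of the band the utilization (in percent) falls into."""
    return bisect.bisect_right(thresholds, utilization)


def needs_flood(old, new, up=UP_THRESHOLDS, down=DOWN_THRESHOLDS):
    """True if moving from `old` to `new` utilization crosses a threshold.

    Rising utilization is checked against the up thresholds, falling
    utilization against the down thresholds.
    """
    thresholds = up if new > old else down
    return band(old, thresholds) != band(new, thresholds)


if __name__ == "__main__":
    print(needs_flood(40, 55))   # crosses the 50% up threshold
    print(needs_flood(55, 60))   # stays within the same band
```

Using separate sets for the two directions gives hysteresis: a link whose utilization oscillates around one value does not trigger an update on every small change.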


LSP Setup Failure

Due to the threshold scheme, it is possible
that a node believes it can signal an LSP
tunnel via node Z while, in fact, Z does not
have the required resources
When Z receives the Resv message and
refuses the LSP tunnel, it broadcasts an
update of its status


Constraint-based
Computation


Constraint-Based Routing

In general, path computation for an LSP may seek to
satisfy a set of requirements associated with the LSP,
taking into account a set of constraints imposed by
administrative policies and the prevailing state of the
network -- which usually relates to topology data and
resource availability. Computation of an engineered
path that satisfies an arbitrary set of constraints is
referred to as "constraint-based routing".
draft-li-mpls-igp-te-00.txt


Path Computation
On demand by the trunk's head-end:
for a new trunk
for an existing trunk whose (current)
LSP failed
for an existing trunk when doing reoptimization


Path Computation
Input:
configured attributes of traffic trunks originated
at this router
attributes associated with resources
available from IS-IS or OSPF
topology state information
available from IS-IS or OSPF


Path Computation
Prune links if:
insufficient resources (e.g., bandwidth)
violates policy constraints

Compute shortest distance path


TE uses its own metric
Tie-break: selects the path with the
highest minimum bandwidth so far, then
the one with the smallest hop-count
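
The prune-then-shortest-path procedure can be sketched as below. This is a simplified illustration, not the router implementation: links are modelled as directed dictionaries with hypothetical keys (`a`, `b`, `metric`, `attr`, `avail`), the policy check assumes the affinity/mask comparison described earlier, and the tie-break is applied greedily per node by ordering candidates on (TE metric, highest minimum bandwidth so far, hop count).

```python
import heapq


def cspf(links, src, dst, bw, priority, affinity, mask):
    """Constrained shortest path first: prune, then run Dijkstra.

    links: iterable of dicts with keys
        a, b     - endpoints (directed a -> b)
        metric   - TE metric of the link
        attr     - resource-class attribute bits of the link
        avail    - list of 8 per-priority available bandwidths
    Returns the node list of the chosen path, or None if no path exists.
    """
    # Step 1: prune links with insufficient bandwidth or violating policy.
    adjacency = {}
    for link in links:
        if link["avail"][priority] < bw:
            continue
        if (link["attr"] & mask) != (affinity & mask):
            continue
        adjacency.setdefault(link["a"], []).append(
            (link["b"], link["metric"], link["avail"][priority])
        )

    # Step 2: Dijkstra on the TE metric. Heap entries are ordered by
    # (metric, -min_bandwidth_so_far, hop_count) to apply the tie-breaks.
    heap = [(0, float("-inf"), 0, src, [src])]
    settled = set()
    while heap:
        metric, neg_min_bw, hops, node, path = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == dst:
            return path
        for nbr, link_metric, link_bw in adjacency.get(node, []):
            if nbr not in settled:
                min_bw = min(-neg_min_bw, link_bw)
                heapq.heappush(
                    heap,
                    (metric + link_metric, -min_bw, hops + 1, nbr, path + [nbr]),
                )
    return None
```

The returned node list plays the role of the explicit route that is handed to the path setup component.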

Path Computation

Output:
explicit route - expressed as a sequence of
router IP addresses
interface addresses for numbered links
loopback address for unnumbered links
used as an input to the path setup
component


Example
C
BW(3)=80
0100

1000
BW(3)=60

0000
BW(3)=50

0000
BW(3)=20

B
0000
BW(3)=80

0010
BW(3)=70

1000
BW(3)=50
G

Tunnel's request:
Priority 3, BW = 30 units,
Policy string: 0000, mask: 0011

MPLS as the forwarding mechanism


MPLS Labels
Two types of MPLS Labels:
Prefix Labels & Tunnel Labels
Distributed by: LDP, RSVP, MP-BGP, CR-LDP, PIM


MPLS as forwarding engine

Traffic engineering requires explicit routing capability


IP supports only the destination-based routing
not adequate for traffic engineering

MPLS provides simple and efficient support for


explicit routing
label swapping
separation of routing and forwarding


LSP tunnel Setup


RSVP Extensions to RFC2205 for LSP Tunnels
downstream-on-demand label distribution
instantiation of explicit label switched paths
allocation of network resources (e.g., bandwidth) to explicit LSPs
rerouting of established LSP-tunnels in a smooth fashion using
the concept of make-before-break
tracking of the actual route traversed by an LSP-tunnel
diagnostics on LSP-tunnels
the concept of nodal abstraction
preemption options that are administratively controllable
draft-ietf-mpls-rsvp-lsp-tunnel-0X.txt


RSVP Extensions: new objects


LABEL_REQUEST found in Path
LABEL found in Resv
EXPLICIT_ROUTE found in Path
RECORD_ROUTE found in Path, Resv
SESSION_ATTRIBUTE found in Path: 0x01 Fast Reroute Capable,
0x02 Permit Merging, 0x04 May Reoptimize => SE
New C-Types are also assigned for the SESSION,
SENDER_TEMPLATE, FILTER_SPEC and FLOWSPEC objects.
All new objects are optional with respect to RSVP (RFC2205).
The LABEL_REQUEST and LABEL objects are mandatory with
respect to the MPLS LSP signalling specification.
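
As a small illustration of the SESSION_ATTRIBUTE flag bits listed above, the following sketch decodes a flags value into names. The bit values and their meanings are the ones given on this slide; the class and function names are hypothetical.

```python
from enum import IntFlag


class SessionFlags(IntFlag):
    """SESSION_ATTRIBUTE flag bits as listed on the slide."""
    FAST_REROUTE_CAPABLE = 0x01
    PERMIT_MERGING = 0x02
    MAY_REOPTIMIZE = 0x04


def decode_flags(value):
    """Return the names of the flag bits set in `value`, lowest bit first."""
    return [flag.name for flag in SessionFlags if value & flag.value]


if __name__ == "__main__":
    # 0x04 is the value carried in the Path message examples.
    print(decode_flags(0x04))
```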


LSP Setup

Initiated at the head-end of a trunk


Uses RSVP (with extensions) to
establish Label Switched Paths
(LSPs) for traffic trunks


Path Setup - Example


R9

R8
R3
R4
R2

Pop

R5

R1

Label 32
Label 49
Label 17

R6

R7
Label 22

Setup: Path (ERO = R1->R2->R6->R7->R4->R9)


Reply: Resv communicates labels and
reserves bandwidth on each link

Path Setup - more details


R1

R2

R3
1

Path:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R1-2)
Label_Request(IP)
ERO (R2-1, R3-1)
Session_Attribute (S(3), H(3), 0x04)
Sender_Template(R1-lo0, 00)
Sender_Tspec(2Mbps)
Record_Route(R1-2)


Path Setup - more details


R1

R2

R3
1

Path State:
Session(R3-lo0, 0, R1-lo0)
PHOP(R1-2)
Label_Request(IP)
ERO (R2-1, R3-1)
Session_Attribute (S(3), H(3), 0x04)
Sender_Template(R1-lo0, 00)
Sender_Tspec(2Mbps)
Record_Route (R1-2)


Path Setup - more details


R1

R2

R3
1

Path:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R2-2)
Label_Request(IP)
ERO (R3-1)
Session_Attribute (S(3), H(3), 0x04)
Sender_Template(R1-lo0, 00)
Sender_Tspec(2Mbps)
Record_Route (R1-2, R2-2)


Path Setup - more details


R1

R2

R3
1

Path State:
Session(R3-lo0, 0, R1-lo0)
PHOP(R2-2)
Label_Request(IP)
ERO ()
Session_Attribute (S(3), H(3), 0x04)
Sender_Template(R1-lo0, 00)
Sender_Tspec(2Mbps)
Record_Route (R1-2, R2-2, R3-1)
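
The way the ERO shrinks and the Record_Route grows hop by hop in the Path messages above can be mimicked with a short sketch. It models only those two objects, under the assumption (consistent with the slides) that a transit node records its outgoing interface and the tail-end records the interface the Path arrived on; `process_path` is a hypothetical helper.

```python
def process_path(ero, rro, incoming_intf, outgoing_intf=None):
    """Process a Path message at one node.

    ero: list of remaining explicit hops, first entry must be this node
    rro: list of interfaces recorded so far
    Returns (new_ero, new_rro) as stored/forwarded by this node.
    """
    if not ero or ero[0] != incoming_intf:
        raise ValueError("first ERO hop does not match this node")
    remaining = ero[1:]                     # strip our own hop
    if remaining:
        if outgoing_intf is None:
            raise ValueError("transit node needs an outgoing interface")
        return remaining, rro + [outgoing_intf]
    return remaining, rro + [incoming_intf]  # tail-end: ERO is now empty


if __name__ == "__main__":
    # R1 sends Path with ERO (R2-1, R3-1) and Record_Route (R1-2).
    ero, rro = ["R2-1", "R3-1"], ["R1-2"]
    ero, rro = process_path(ero, rro, "R2-1", "R2-2")   # at R2
    print(ero, rro)
    ero, rro = process_path(ero, rro, "R3-1")           # at R3 (tail-end)
    print(ero, rro)
```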


Path Setup - more details


R1

R2

R3
1

Resv:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R3-1)
Style=SE
FlowSpec(2Mbps)
Sender_Template(R1-lo0, 00)
Label=POP
Record_Route(R3-1)


Path Setup - more details


R1

R2

R3
1

Resv State
Session(R3-lo0, 0, R1-lo0)
PHOP(R3-1)
Style=SE
FlowSpec (2Mbps)
Sender_Template(R1-lo0, 00)
OutLabel=POP
InLabel=5
Record_Route(R3-1)


Path Setup - more details


R1

R2

R3
1

Resv:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R2-1)
Style=SE
FlowSpec (2Mbps)
Sender_Template(R1-lo0, 00)
Label=5
Record_Route(R2-1, R3-1)


Path Setup - more details


R1

R2

R3
1

Resv state:
Session(R3-lo0, 0, R1-lo0)
PHOP(R2-1)
Style=SE
FlowSpec (2Mbps)
Sender_Template(R1-lo0, 00)
Label=5
Record_Route(R1-2, R2-1, R3-1)


Trunk Admission Control


Performed by routers along a Label Switched
Path (LSP)
Determines if resources are available
May tear down (existing) LSPs with a lower
priority
Does the local accounting
Triggers IGP information distribution when
resource thresholds are crossed


Link Admission Control

Already invoked by the Path message

if BW is available, this BW is put aside in a waiting pool
(waiting for the Resv message)
if this process requires the pre-emption of resources, LCAC
notifies RSVP of the pre-emption, and RSVP then sends PathErr and/or
ResvErr for the preempted tunnel
if BW is not available, LCAC says No to RSVP and a PathErr
is sent. Flooding of the node's resource info is triggered,
if needed
draft-ietf-mpls-rsvp-lsp-tunnel-02.txt
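
The admission decision with pre-emption can be sketched as follows. This is an illustrative model only: it assumes that priority 0 is the most important, that a new LSP may preempt existing LSPs whose holding priority is numerically greater than its setup priority, and that the least important LSPs are preempted first. The victim ordering and the name `admit` are assumptions, not taken from the presentation.

```python
def admit(link_bw, lsps, setup_priority, bw):
    """Decide whether a new LSP fits on a link.

    lsps: list of (name, holding_priority, bandwidth) already on the link
    Returns (admitted, preempted_names).
    """
    free = link_bw - sum(b for _, _, b in lsps)
    if free >= bw:
        return True, []

    # Candidates for pre-emption: strictly less important than the new LSP.
    victims = sorted(
        (lsp for lsp in lsps if lsp[1] > setup_priority),
        key=lambda lsp: lsp[1],
        reverse=True,
    )
    preempted = []
    for name, _, victim_bw in victims:
        if free >= bw:
            break
        free += victim_bw
        preempted.append(name)

    if free >= bw:
        return True, preempted
    return False, []          # refuse; nothing is preempted


if __name__ == "__main__":
    existing = [("t1", 3, 30), ("t2", 5, 30)]
    print(admit(100, existing, setup_priority=4, bw=50))
```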


Path Monitoring

Use of new Record Route Object


keep track of the exact tunnel path
detects loops
copy of RRO to ERO allows for route
pinning


Path Re-Optimization

Looks for opportunities to re-optimize


make before break
no double counting of reservations
via RSVP shared explicit style!


Non-disruptive rerouting - new path setup
R9

R8
R3
R4
R2

Pop

R5

R1

32
49
17

R6

R7
22

Current Path (ERO = R1->R2->R6->R7->R4->R9)


New Path (ERO = R1->R2->R3->R4->R9) - shared with Current Path
Until R9 gets new Path Message, current Resv is refreshed

Non-disruptive rerouting - switching paths


R9

R8
R3
R4
R2

Pop
Pop

26

89

R5

R1

32
38
49

17

R6

R7
22

Resv: allocates labels for both paths


Reserves bandwidth once per link
PathTear can then be sent to remove old path (and release
resources)
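
Why make-before-break does not double count can be shown with a small calculation. The sketch assumes that, with the shared explicit (SE) style, LSPs of the same session share one reservation per link sized to the largest request, instead of adding up; the function name and the bandwidth figures are illustrative.

```python
def reserved_per_link(lsps):
    """Bandwidth reserved on each link for ONE session under SE style.

    lsps: list of (path, bandwidth), where path is a list of node names.
    LSPs of the same session share the reservation on common links, so
    the per-link value is the maximum request, not the sum.
    """
    per_link = {}
    for path, bw in lsps:
        for link in zip(path, path[1:]):
            per_link[link] = max(per_link.get(link, 0), bw)
    return per_link


if __name__ == "__main__":
    current = (["R1", "R2", "R6", "R7", "R4", "R9"], 2)
    new = (["R1", "R2", "R3", "R4", "R9"], 3)
    for link, bw in sorted(reserved_per_link([current, new]).items()):
        print(link, bw)
```

On the shared links R1-R2 and R4-R9 the reservation is 3 units rather than 5, so the new path can be set up even when the link could not hold both reservations side by side.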

Reroute - More Details


[Figure: R1, R2 and R3 with two LSPs in Session(R3-lo0, 0, R1-lo0)
LSP 00: ERO (R2-1, R3-1), Sender_Template(R1-lo0, 00)
LSP 01: ERO (R2-1, , R3-3), Sender_Template(R1-lo0, 01)
Resource Sharing]

Reroute - More Details


R1

R2

R3
2
3

Path:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R1-2)
Label_Request(IP)
ERO (R2-1, ,R3-3)
Session_Attribute (S(3), H(3), 0x04)
Sender_Template(R1-lo0, 01)
Sender_Tspec(3Mbps)
Record_Route(R1-2)


Reroute - More Details


R1

R2

R3
3

Path State:
Session(R3-lo0, 0, R1-lo0)
PHOP(R1-2)
Label_Request(IP)
ERO (R2-1, ,R3-3)
Session_Attribute (S(3), H(3), 0x04)
Sender_Template(R1-lo0, 01)
Sender_Tspec(3Mbps)
Record_Route (R1-2)


Reroute - More Details


[Figure: R1, R2, R3]

Reroute - More Details


R1

R2

R3
3

Resv:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R3-3)
Style=SE
FlowSpec(3Mbps)
Sender_Template(R1-lo0, 01)
Label=POP
Record_Route(R3-3)


Reroute - More Details


[Figure: R1, R2, R3]

Reroute - More Details


R1

R2

R3
3

Resv:
Common_Header
Session(R3-lo0, 0, R1-lo0)
PHOP(R2-1)
Style=SE
FlowSpec (3Mbps)
Sender_Template(R1-lo0, 01)
Label=6
Record_Route(R2-1, , R3-3)
Sender_Template(R1-lo0, 00)
Label=5
Record_Route(R2-1, R3-1)


Reroute - More Details


R1

R2

R3
3

Resv state:
Session(R3-lo0, 0, R1-lo0)
PHOP(R2-1)
Style=SE
FlowSpec
Sender_Template(R1-lo0, 01)
Label=6
Record_Route(R2-1, , R3-3)
Sender_Template(R1-lo0, 00)
Label=5
Record_Route(R2-1, R3-1)


Fast Restoration

Handling link failures - two


complementary mechanisms:
Path protection
Link/Node protection


Path Protection


Path Protection

Step1: link failure detection

O(depends on L2/L1)

Step2a: IGP reaction (ISIS case)

Triggered either via Step1 or via IGP hello expiration (30s by default for ISIS)
5s (default) must elapse before the generation of a new LSP
5.5s (default) must elapse between a change of the LSPDB and the ensuing SPF run.
The next SPF run can only occur 10s later (default)
Flooding time (LSPs are paced: 16ms for the first LSP, 33ms between LSPs; also depends
on link speed)
Once the RIB is updated, this change must be incorporated into CEF.
The head-end finally computes the new topology and finds out that some
established LSPs are affected. It schedules a reoptimization for them


Path Protection

Step2b: RSVP signalling

RSVP path states with the failed interface as outgoing interface (oif) are detected

check if another oif is available (if loose ERO)

if not, clear the path state and send a tear to the head-end
Step2: either Step2a or Step2b alerts the head-end
Step3: Re-optimization
Dijkstra computation: O(0.5)ms per node (rule of thumb)

RSVP signalling time to install the rerouted tunnel

convergence in the order of several seconds (at least).


Path Protection
Speed it Up

Fine-tune the IGP convergence

Through adequate tuning, ISIS can be made to converge in 2-3s, thus
ensuring that the convergence time bottleneck is the signalling time for the
new tunnel.

Several tunnels in parallel with load-balancing

if combined with the IGP convergence tuning, the path resilience could be brought to
around 2-3s

One end-to-end tunnel in parallel but in backup mode

feature under development (Fast Path Protection)


Fast ReRoute
(aka Link Protection)
An Overview


Objective

FRR allows for temporarily routing


around a failed link or node while the
head-end may reoptimize the entire
LSP
rerouting under 50ms
scalable (must support lots of LSPs)

Fast reroute Overview


Controlled by the routers at ends of a failed
link
link protection is configured on a per link basis
Session_Attributes Flag 0x01 allows the use of
Link Protection for the signalled LSP

Uses nested LSPs (stack of labels)


original LSP nested within link protection LSP


Static backup Tunnel


R9

R8
R4
R2

R5

R1

Pop
17

R6

R7
22

Setup: Path (R2->R6->R7->R4)


Labels Established on Resv message


Routing prior R2-R4 link failure


R9

R8
R4
R2

Pop
R1

R5

14
37
R6

R7

Setup: Path (R1->R2->R4->R9)


Labels Established on Resv message


Link Protection Active


R9

R8
R4
R2

R5

R1

R6

R7

On failure of link from R2 -> R4, R2 simply changes outgoing


Label Stack from 14 to <17, 14>


Link Protection Active


Label operations along the rerouted path:
R1: Push 37
R2: Swap 37->14, Push 17
R6: Swap 17->22
R7: Pop 22
R4: Pop 14

Label Stack (as received at each node):
R2: 37
R6: 17, 14
R7: 22, 14
R4: 14
R9: None
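
The label operations of this example can be replayed with a small stack model. It is only a sketch: the stack is a Python list with the top label first, matching the <17, 14> notation used earlier, and `apply_ops` is a hypothetical helper.

```python
def apply_ops(stack, ops):
    """Apply label operations to a label stack (top of stack first).

    ops: list of tuples ("push", label), ("pop", expected_top) or
         ("swap", expected_top, new_label)
    Returns the resulting stack as a new list.
    """
    stack = list(stack)
    for op in ops:
        kind = op[0]
        if kind == "push":
            stack.insert(0, op[1])
        elif kind in ("pop", "swap"):
            if not stack or stack[0] != op[1]:
                raise ValueError(f"unexpected top label for {op}")
            if kind == "pop":
                stack.pop(0)
            else:
                stack[0] = op[2]
        else:
            raise ValueError(f"unknown operation {kind}")
    return stack


if __name__ == "__main__":
    hops = [
        ("R1", [("push", 37)]),
        ("R2", [("swap", 37, 14), ("push", 17)]),   # link protection active
        ("R6", [("swap", 17, 22)]),
        ("R7", [("pop", 22)]),
        ("R4", [("pop", 14)]),
    ]
    stack = []
    for node, ops in hops:
        stack = apply_ops(stack, ops)
        print(node, "sends", stack)
```

R4 receives label 14 exactly as it would over the failed R2-R4 link, which is why the backup tunnel is transparent to the protected LSP.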

Fast ReRoute
More details on Link
Protection (FRR v1)


V1 Constraints

We protect the facility (link), not individual LSPs


scalability vs granularity

No node resilience
Static backup tunnel
The protected link must use the Global Label space
A backup tunnel can backup at most one link, but n
LSPs travelling via this link


Terminology
R9

R8
R4
R2

R5

R1

R6

R7

LSP: end-to-end tunnel onto which data


normally flows (eg R1 to R9)
BackUp tunnel: temporary route to take in
the event of a failure

Terminology

Link Protection
In the event of a link failure, an LSP is
rerouted to the next-hop using a
preconfigured backup tunnel


How to indicate a link is protected and which tunnel is the backup?

On R2 (For LSPs flowing from R2 to R4):


interface pos <r2tor4>
mpls traffic-eng backup tunnel 1000 link

LSPs are unidirectional, so the same
protection should be enabled for the
opposite direction if a reverse LSP is configured.


How to setup the backup tunnel?


Just as a normal tunnel whose head-end
is R2 and tail-end is R4
v1 requires a manually configured ERO
interface Tunnel1000
 ip unnumbered Loopback0
 tunnel destination R4
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng priority 7 7
 tunnel mpls traffic-eng bandwidth 800
 tunnel mpls traffic-eng path-option 1 explicit name backuppath1

ip explicit-path name backuppath1 enable
 next-address R6
 next-address R7
 next-address R4

Which LSPs can be rerouted on R2 in the event of R2-R4 failure?

The LSPs flowing through R2 that


have R2-R4 as Outgoing Interface
have been signalled by their respective
head-ends with a session attribute flag
0x01=ON (may use fast-reroute tunnels)
int tunnel 1     ## config on the head-end
 tunnel mpls traffic-eng fast-reroute


Global Label Allocation


POP

R8
14

R9

R4

R2
R5

R1
R6

R7

For the blue LSP, R4 bound a global label of 14


Any MPLS frame received by R4, with label 14,
will be switched onto the link to R9 with a POP,
whatever the incoming interface


How fast is fast?

Link Failure Notification


Usual PoS alarm detection
PoS driver optimisation to interrupt RP in < 1ms
Expected call to net_cstate(idb, UP/DOWN) identifying the DOWN
state of the protected int to start our protection action.

RP updates the master TFIB (replace a swap by a swap-push)


< 1ms

Master TFIB change notified to the linecards


< 1ms


Path state while Rerouting


Path (, PHOP=R2, )
R8
BackUP tunnel

R9

Path
state
R4

R2

R5

R1

R6

R7

PathError (Reservation in Place)


Path & Resv Msgs [Error & Tear]


R2

R1

R4

R3

When no link protection:


Resv Tear

Conf.

Path Tear

Conf.

Resv Tear

When link protection:

Path Error (Resv in place)
R4 waits for refresh

LSP reoptimization

Head-end notified by PathError


special flag (reservation in place)
indicates that the path states must not be
destroyed. It is just a hint to the head-end
that the path should be reoptimized

Head-end notified by IGP



Why the Patherror?

The Patherror might be faster


In case of multi-area IGP, the IGP will not provide the
information
In case of very fast up-down-up, the LSP will be put on the
backup tunnel and will stay there as the IGP will not have
originated a new LSP/LSA
a router waits a certain time before originating a new
LSP/LSA after a topological change

Reliable PathErr optimization


Resv state while Rerouting


The loss of the interface does not affect the Path
and Resv states for the LSPs received on that
interface that are marked fast reroutable!
R9

R8

Resv
state

BackUP tunnel

R4

R2

R5

R1

Resv
R6

R7

Resv Message is unicast to the PHOP (R2)

R2's Path State has been informed that the Resv might arrive over a different interface than
the one used by the Path message

DiffServ and LSP Reoptimization

In order to optimize bandwidth usage, backup tunnels
might be configured with 0kbps
no non-working bandwidth as in SDH!

Although the backbone is usually thought of as being
congestion-free, some local congestion might occur
during rerouting
Use diffserv to handle this short-term congestion
Use LSP reoptimization to handle the long-term
congestion


Layer1/2 and Layer3

Backup Tunnel should not use


the protected L3 link
the protected L1/L2 links!!!

Use WANDL (loaded with both L3 and L1/2


topologies) to compute the best paths for backup
tunnels
Download this as static backup tunnels to the routers


Fast ReRoute
Node Protection


Overview
R9

R8
R4
R3
R2

R5

R1

R6

R7

Backup Tunnel to the next-hop of the LSP's
next-hop

A few More details


Assume
R2 is configured with resilience for R3
R2 receives a path message for a new LSP whose
ERO is {R3, R4, }, whose Session is (R9, 1, R1),
whose sender is (R1, 1) and whose session attribute is
(0x01 ON, 0x02 OFF)
0x01: may use local fast-reroute if available
0x02: merge capable


A few More details

Then
R2 checks if it already has a tunnel to R4
If not, R2 builds a backup tunnel to R4 (currently
just like in link protection - manual explicit setup).
R2 sends a Path onto the tunnel with Session
(R9, 1, R1), Sender (R2, 1), Session Attribute (0x01
OFF, 0x02 ON) and PHOP R2


A few More details

When R4 receives this Path message, it

matches the session with the LSP's one
merges (and thus stops) this Path message
sends a RESV back to R2 (unicast) and allocates
the appropriate label L


A few More details

When R2 detects R3's failure,

In the TFIB entry for the LSP, R2 replaces the existing swap
with a swap to L and a push of the backup tunnel label

R4's states are refreshed by the secondary Path messages
(over the backup tunnels)
ERO of the original path is adjusted at R2
NHOP is modified in R2 (from R3 to R4)
PHOP is modified in R4 (from R3 to R2)


A few More Details

The RESV is sent back from R4 to R2 directly
If R3 is still active and only the R2-R3 link
failed, R4 needs to ignore and drop any
Tear-Down message that R3 sends once it
stops receiving Path refreshes from R2.

How to detect R3's failure?

A node may fail while the link is still up
A node's linecard processes might
survive a main process failure (freeze
of the RP process)


A possible solution
[Figure: route processors (RP) and linecards (LC)]

Keepalives between LCs
Keepalives between an LC and its master RP

Assigning traffic to Paths
(aka autoroute)


Enhancement to SPF
During SPF each new node found is moved from the
TENTative list to the PATHS list. The first-hop is now
determined as follows:
A. Check if there is any TE tunnel terminating at
this node from the current router and, if so, do
the metric check
B. If there is no TE tunnel and the node is directly
connected, use the first-hop from the adjacency database
C. If none of the above applies, the first-hop is
copied from the parent of this new node.
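
A minimal sketch of the A/B/C decision above, assuming simple dictionaries stand in for the tunnel table, the adjacency database and the first-hops already computed for nodes on the PATHS list. All names here are hypothetical, and step A only reports that the metric check is needed rather than performing it.

```python
def determine_first_hop(node, parent, te_tunnels, adjacencies, first_hop_of):
    """Return (rule, first_hop) for a node just moved from TENT to PATHS."""
    # A. A TE tunnel from this router terminates at the node: go to the metric check.
    if node in te_tunnels:
        return ("metric-check", te_tunnels[node])
    # B. No tunnel, but the node is directly connected: use the adjacency database.
    if node in adjacencies:
        return ("adjacency", adjacencies[node])
    # C. Otherwise inherit the first-hop of the parent node.
    return ("inherited", first_hop_of[parent])


# Hypothetical view from router R1: a tunnel to R4, a direct adjacency to R2.
te_tunnels = {"R4": "Tunnel1"}
adjacencies = {"R2": "Serial0"}
first_hop_of = {"R2": "Serial0", "R4": "Tunnel1"}

for node, parent in [("R2", "R1"), ("R3", "R2"), ("R4", "R2"), ("R5", "R4")]:
    print(node, determine_first_hop(node, parent, te_tunnels, adjacencies, first_hop_of))
```

Note how R5, which sits behind the tunnel tail R4, inherits the tunnel as its first-hop through rule C.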


Enhancement to SPF - metric check


Tunnel metric:
A. Relative +/- X
B. Absolute Y
The default is relative metric of 0.
Example:
Metric of native IP path to the found node = 50
1. Tunnel with relative metric of -10 => 40
2. Tunnel with relative metric of +10 => 60
3. Tunnel with absolute metric of 10 => 10
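
The arithmetic of the example can be written out as a small function. The function name and the `mode`/`value` parameters are illustrative, not IOS keywords.

```python
def tunnel_metric(native_metric, mode="relative", value=0):
    """Metric used for a TE tunnel during SPF, given the native IP path metric."""
    if mode == "relative":
        return native_metric + value  # +/- X applied to the native metric
    if mode == "absolute":
        return value                  # Y replaces the native metric
    raise ValueError("mode must be 'relative' or 'absolute'")


native = 50
print(tunnel_metric(native, "relative", -10))
print(tunnel_metric(native, "relative", +10))
print(tunnel_metric(native, "absolute", 10))
print(tunnel_metric(native))  # default: relative metric of 0
```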


Enhancement to SPF - metric check


If the metric of the found TE tunnel at this node is
higher than the metric of other tunnels or of the native
IGP path, this tunnel is not installed as a next-hop
If the metric of the found TE tunnel is equal to that of other
TE tunnels, the tunnel is added to the existing next-hops
If the metric of the found TE tunnel is lower than
the metric of other TE tunnels or of the native IGP path, the
tunnel replaces them as the only next-hop.
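
The three rules can be sketched as one comparison. This is a simplified model with hypothetical names: the current best set is treated as a list of tunnels, and the case of a tunnel whose metric equals that of a native IGP path is not modelled because the rules above do not cover it.

```python
def consider_tunnel(next_hops, best_metric, tunnel, metric):
    """Return the (next_hops, best_metric) pair after considering one more TE tunnel."""
    if best_metric is None or metric < best_metric:
        # Lower metric: the tunnel replaces everything as the only next-hop.
        return [tunnel], metric
    if metric == best_metric:
        # Equal metric: the tunnel is added to the existing next-hops.
        return next_hops + [tunnel], best_metric
    # Higher metric: the tunnel is not installed.
    return next_hops, best_metric


hops, best = consider_tunnel([], None, "Tu1", 40)
hops, best = consider_tunnel(hops, best, "Tu2", 40)   # equal: load-share with Tu1
hops, best = consider_tunnel(hops, best, "Tu3", 45)   # higher: ignored
print(hops, best)
hops, best = consider_tunnel(hops, best, "Tu4", 30)   # lower: replaces both
print(hops, best)
```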


Other TE New Features


Auto-Bandwidth
Global command:
Monitor the 5-min average counters of marked tunnels
every X seconds
default: X = 300 (seconds)
(config)# mpls traffic-eng auto-bw timers frequency <seconds>


Auto-Bandwidth
Per tunnel command:
Every Y seconds, update the BW constraint of the tunnel with
the maximum of:
the largest 5-min values sampled during the last Y seconds
(default Y = 24 * 3600 sec, i.e. 24h)
a configured maximum value
(config-if)# tunnel mpls traffic-eng auto-bw {frequency <seconds>} {max-bw <kbs>}

If the new BW is not available, the old one is maintained (the
new BW is signalled via a second tunnel, following the make-before-break
model)
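
A rough sketch of the per-tunnel update, under two assumptions that are not stated on the slide: the configured maximum is treated as an upper bound on the new value, and availability of the new bandwidth is represented by a caller-supplied check (`is_available`, a hypothetical stand-in for signalling the second LSP). All names are illustrative.

```python
def auto_bw_update(samples_kbps, max_bw_kbps, current_bw_kbps, is_available):
    """Return the tunnel bandwidth to use for the next period."""
    candidate = max(samples_kbps)  # largest 5-min sample seen during the period
    if max_bw_kbps is not None:
        # ASSUMPTION: max-bw caps the result; the slide wording is ambiguous here.
        candidate = min(candidate, max_bw_kbps)
    if is_available(candidate):
        return candidate           # new LSP signalled, make-before-break
    return current_bw_kbps         # new BW not available: keep the old one


samples = [100, 400, 250]
print(auto_bw_update(samples, 300, 200, lambda bw: bw <= 350))
print(auto_bw_update(samples, 300, 200, lambda bw: False))
```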



Verbatim
Applies to explicitly routed LSPs
Disable any check against TE/IGP database
of the head end
RSVP still checks BW (and policy, once this
is carried in Path) hop by hop
Application: manual TE through multi-area
IGP
CLI: tunnel mpls traffic-eng path-option verbatim


In-Progress

Allows a head-end to account for bandwidth
consumed by tunnels that it has just
signalled and for which the IGP LSA/LSP
update has not yet reflected the change in available
bandwidth


Example
In-Prog Bw: 10
55
Avail Bw: 100

All tunnels require 45 units of BW


In-progress counters reset upon new LSA/LSP reception
In-progress counter decremented upon receipt of path-error
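
The counter behaviour can be sketched as follows. The `LinkView` class is a hypothetical head-end view of one link, and the figures (100 units advertised, 45 units per tunnel) are reused from the example purely for illustration.

```python
class LinkView:
    """Head-end view of one link: IGP-advertised bandwidth plus an in-progress counter."""

    def __init__(self, advertised_bw):
        self.advertised_bw = advertised_bw
        self.in_progress = 0

    def usable_bw(self):
        # Bandwidth the head-end should assume is still free on this link.
        return self.advertised_bw - self.in_progress

    def on_tunnel_signalled(self, bw):
        self.in_progress += bw

    def on_path_error(self, bw):
        # The signalled tunnel failed, so its bandwidth was never consumed.
        self.in_progress = max(0, self.in_progress - bw)

    def on_igp_update(self, advertised_bw):
        # A new LSA/LSP carries the real figure: reset the counter.
        self.advertised_bw = advertised_bw
        self.in_progress = 0


link = LinkView(advertised_bw=100)
link.on_tunnel_signalled(45)
link.on_tunnel_signalled(45)
fits_third = link.usable_bw() >= 45   # a third 45-unit tunnel avoids this link
link.on_igp_update(10)
print(fits_third, link.usable_bw())
```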

Benefits

Speeds up the installation of tunnels by
avoiding time spent trying paths that will
not work
Allows for better load-balancing
IGP metric, then max(min(path-bw))!
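
The tie-break in the last line can be read as: prefer the lowest IGP metric, and among equal-metric paths prefer the one whose bottleneck link has the most bandwidth. A minimal sketch of that reading, with hypothetical path records:

```python
def pick_path(candidates):
    """Choose by lowest IGP metric, then by largest minimum link bandwidth."""
    return min(candidates, key=lambda p: (p["igp_metric"], -min(p["link_bw"])))


candidates = [
    {"name": "A", "igp_metric": 20, "link_bw": [100, 40]},   # bottleneck 40
    {"name": "B", "igp_metric": 20, "link_bw": [60, 70]},    # bottleneck 60
    {"name": "C", "igp_metric": 30, "link_bw": [200, 200]},  # worse metric
]
best = pick_path(candidates)
print(best["name"])
```

Because in-progress bandwidth is subtracted before the comparison, successive tunnels signalled in quick succession tend to spread over equal-metric paths instead of piling onto one.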


Under/Overbook
ML: Maximum link bandwidth:
This sub-TLV contains the maximum bandwidth that can be used on this link in
this direction (from the system originating the LSP to its neighbors). This is useful
for traffic engineering.

MR: Maximum reservable link bandwidth:


This sub-TLV contains the maximum amount of bandwidth that can be reserved
in this direction on this link. Note that for oversubscription purposes, this can be
greater than the bandwidth of the link.

UR(I): Unreserved bandwidth at Priority i:


This sub-TLV contains the amount of bandwidth reservable on this direction on
this link, at a certain priority. Note that for oversubscription purposes, this can be
greater than the bandwidth of the link.


Under/Overbook
A's config:
int s0
 bandwidth <B1> (eg 1500 kbps)
 ip rsvp bandwidth <B2> (eg 4000 kbps)

[Figure: router A, interface s0, physical T1]

ML is set to B1 (eg 1500)
MR is set to B2 (eg 4000)
At t=0, for all i 0 to 7, UR(i) = MR (eg 4000)
Router A's LCAC will not accept an LSP tunnel asking for more than ML, even if there is available
bandwidth at the requested priority.
However, LCAC would allow, for example, 5 trunks each asking for 700 kbps (thus each asking for less than
ML) while the aggregate is smaller than MR: because { 700 < ML=1500 } and { 3500 < MR=4000 }
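
The admission rule can be sketched with the same numbers. This is a simplified model, not the actual LCAC: the class and method names are hypothetical, preemption is not modelled, and a reservation at priority p is assumed to reduce the unreserved bandwidth at priorities p through 7.

```python
class Lcac:
    """Toy link admission control using ML, MR and per-priority unreserved bandwidth."""

    def __init__(self, ml_kbps, mr_kbps):
        self.ml = ml_kbps
        self.mr = mr_kbps
        self.unreserved = [mr_kbps] * 8   # UR(i) = MR for i = 0..7 at t=0

    def admit(self, bw_kbps, priority):
        if bw_kbps > self.ml:
            return False                  # a single LSP may not ask for more than ML
        if bw_kbps > self.unreserved[priority]:
            return False                  # not enough left at this priority
        for p in range(priority, 8):      # ASSUMPTION: affects this and lower priorities
            self.unreserved[p] -= bw_kbps
        return True


lcac = Lcac(ml_kbps=1500, mr_kbps=4000)
results = [lcac.admit(700, priority=7) for _ in range(5)]  # 3500 in total
sixth = lcac.admit(700, priority=7)                        # would exceed MR
too_big = Lcac(1500, 4000).admit(1600, priority=7)         # exceeds ML
print(results, sixth, too_big)
```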


Standby

Current solution
[Figure: four tunnels from A to B: Tu1 (bw1), Tu2 (bw2), Tu3 (bw3), Tu4 (bw4)]

Solution:
4 tunnels from A to B:
Tu1's relative metric: -3
Tu2's and Tu3's relative metric: -2
Tu4's relative metric: -1


Last hop label


IETF draft-ietf-mpls-label-encaps-07.txt
A value of 0 represents the "IPv4 Explicit NULL Label"
A value of 1 represents the "Router Alert Label"
A value of 2 represents the "IPv6 Explicit NULL Label"
A value of 3 represents the "Implicit NULL Label"

New CLI forces the tail-end to send implicit-null (3) instead of the default explicit-null (0).
# [no] mpls traffic-eng signalling advertise implicit-null [<acl>]
On receipt, the (n-1) node must map 0, 1 or 3 to the internal Implicit Null [1
only for historical reasons]
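
The mapping on the (n-1) node can be sketched as below. The constant and function names are illustrative; only the four reserved values and the 0/1/3 mapping come from the text above.

```python
# Reserved label values listed above.
IPV4_EXPLICIT_NULL = 0
ROUTER_ALERT = 1
IPV6_EXPLICIT_NULL = 2
IMPLICIT_NULL = 3


def normalize_tail_label(advertised_label):
    """Map the label advertised by the tail-end to what the (n-1) node uses internally."""
    if advertised_label in (IPV4_EXPLICIT_NULL, ROUTER_ALERT, IMPLICIT_NULL):
        return IMPLICIT_NULL   # 1 is accepted only for historical reasons
    return advertised_label    # any other value is left untouched


for label in (0, 1, 2, 3, 17):
    print(label, normalize_tail_label(label))
```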


QoS and RRR


QoS and RRR

MPLS TE can operate simultaneously (and


orthogonally) with MPLS Diff-Serv
All Precedence/DSCP packets follow the
same TE tunnels
Diff-Serv provides selective discard (via WRED),
and selective scheduling (via WFQ)


QoS and RRR


Future:
Scalable per-tunnel scheduling and
policing
Guaranteed PIPE in MPLS-VPN CoS

per-DSCP/per-FEC traffic engineering


diffserv backbone capacity management


DiffServ and fast-reroute/TE

In order to optimize bandwidth usage, backup tunnels
might be configured with 0 kbps
no non-working bandwidth as in SDH!

Although the backbone is usually thought of as being
congestion-free, during rerouting some local congestion
might occur
Use diffserv to handle this short-term congestion
Use LSP reoptimization to handle the long-term
congestion


RSVP
LSP Signalling Protocol
for Traffic Engineering


MPLS-TE Signalling Protocol

Two proposed signalling mechanisms for
MPLS traffic engineering are being
considered by the IETF's MPLS working group:
RSVP (Cisco and a number of Gigabit router
startups (Avici, Argon, Ironbridge, Juniper, and
Torrent))
CR-LDP (Ericsson, Ennovate, GDC, Nortel)


Why RSVP ?
What is needed: An IP signalling Protocol!
ability to establish and maintain Label Switched
Path along an explicit route
ability to reserve resources when establishing a
path

Interdependent, not independent tasks


benefit from consolidation


Do I need RSVP only for TE ?


NO !

Other uses of RSVP in today's networks:
Voice over IP call setup, Video (IPTV)
Hybrid deployments (only where needed)
QoS DiffServ Engineering (COPS)
Qualitative Service for DiffServ with RSVP
(as opposed to Quantitative RSVP IntServ model)


RSVP is a natural choice

RFC2205: provides a general facility for


creating and maintaining distributed
reservation state across a mesh of multicast
and unicast delivery paths
TE: use as a general facility for creating and
maintaining distributed forwarding &
reservation state across a mesh of delivery
paths


RSVP is a natural choice


RFC2205: transfers and manipulates QoS
control parameters as opaque data, passing
them to the appropriate traffic control
module for interpretation
TE: transfer and manipulate explicit route
and label control parameters as opaque data
pass explicit route parameter to the
appropriate routing module, and label
parameter to the MPLS module


RSVP is a natural choice


Leverage Standardized Protocols
PIM for Multicast MPLS
BGP for MPLS VPNs
RSVP for MPLS Traffic Engineering
LDP (TDP) has been designed because it was easier than fixing all
IGPs (RIP, EIGRP, OSPF, ISIS)

fast deployments and engineering consistency

Leverage Deployed Experience


RSVP deployed since 1996 (IOS 11.2)
ww.isi.edu/rsvp/DOCUMENTS/ietf_rsvp_qos_survey
for a list of RSVP implementations


RSVP is a natural choice

RSVP easily supports


Dynamic resizing of tunnels or paths through
refresh messages
Supports strict as well as loose source routes
No double counting of bandwidth when rerouting sub-optimal routes

Extensible via definition of new objects



RSVP/TE and Scalability

Very Different than IntServ context

State applies to a collection of flows (i.e. a traffic trunk),


rather than to a single (micro) flow
RSVP sessions are used between routers, not hosts
Sessions are long-lived (up to a few weeks)
Paths are not bound by destination-based routing
Reference: Applicability Statement for Extensions to
RSVP for LSP-Tunnels (draft-awduche-mpls-rsvp-tunnel-applicability-01.txt)


RSVP/TE and Scalability

Very Different than IntServ context


RFC2208: the resource requirements for running
RSVP on a router increases proportionally with the
number of separate sessions
TE: that is why using traffic trunks to aggregate flows
is essential
RFC2208: supporting numerous small reservations
on a high-bandwidth link may easily overtax the
routers and is inadvisable
TE : n/a in the context of TE - traffic trunks aggregate
multiple flows


TE/RSVP Scalability

With basic RSVP (RFC2205), 10000 RRR LSP tunnels


flowing through a 75x0 or 12000 is not a problem
Already Deployed on a number of Tier-1 ISP backbones
http://www.nanog.org/mtg-9905/hanna.html
Ship with 12.0(5)S

Refresh Aggregation work will further enhance this
scalability


Conclusion

Using RSVP as MPLS/TE signalling protocol is the natural


and consistent choice
It is however only one part of a whole solution:
MPLS as forwarding engine
IGP (OSPF/ISIS) extensions
Constraint-Based Routing (RRR)
RSVP as MPLS/TE Signalling Protocol
Installation of Tunnels in the FIB


Summary


Traffic Eng

Provides traffic engineering capabilities at
Layer 3
above and beyond what is provided with
ATM

Could be used for other applications as well

Shipping and deployed in production

