
Routing Design for Large Scale Data Centers

Global Networking Services Team, Global Foundation Services, Microsoft Corporation


2
Problem Statement

3
Hundreds of thousands of servers
10G NICs

Aware of the network


Explicit parallelism
Example: Web Index computation

4
[Diagram: query traffic and background traffic]

5
The simpler the better

Single protocol
Simple behavior
Wide vendor support

6
What We Started With

7
Folded on diagram

ECMP-based
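A minimal Python sketch of how ECMP typically spreads traffic: the switch hashes each flow's 5-tuple and uses the result to pick one of the equal-cost next hops, so every packet of a flow follows the same path while different flows land on different paths. SHA-256 here is only a stand-in for a vendor-specific hardware hash, and the function name `ecmp_next_hop` and the spine names are hypothetical.

```python
import hashlib
from typing import List, Tuple

# A flow is identified by its 5-tuple: (src IP, dst IP, protocol, src port, dst port).
FiveTuple = Tuple[str, str, int, int, int]


def ecmp_next_hop(flow: FiveTuple, next_hops: List[str]) -> str:
    """Pick one of several equal-cost next hops for a flow.

    Hashing the 5-tuple keeps every packet of a flow on the same path
    (no reordering) while different flows spread across the paths.
    SHA-256 is only a stand-in for a switch's vendor-specific hash.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


# Hypothetical uplinks from one leaf switch to four spines.
spines = ["spine1", "spine2", "spine3", "spine4"]
flow = ("10.0.1.5", "10.0.9.7", 6, 49152, 443)
print(ecmp_next_hop(flow, spines))  # same spine for every packet of this flow
```

Hashing per flow rather than per packet trades perfectly even link utilization for in-order delivery within each TCP connection.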

10
Why BGP over IGP

11
Better vendor interoperability
Fewer state machines, data structures, etc.

Can be used for an unequal-cost Anycast load-balancing solution

12
BGP RIB structure is simpler than a link-state LSDB
Clear picture of what was sent where (RIB-In, RIB-Out)

More stability due to smaller event-flooding domains
E.g., link failures have limited propagation scope
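To illustrate the "what was sent where" point, here is a simplified Python model of per-neighbor RIB-In and RIB-Out tables. It is a sketch under stated assumptions: routes carry only an AS_PATH (real RIB entries also hold next hop, MED, communities and more), and the class names, prefixes and private ASNs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerRibs:
    """Routes exchanged with one BGP neighbor, keyed by prefix.

    Values are AS_PATHs (lists of ASNs). Simplified: real RIB entries also
    carry next hop, MED, communities and other attributes.
    """
    rib_in: Dict[str, List[int]] = field(default_factory=dict)   # received from the peer
    rib_out: Dict[str, List[int]] = field(default_factory=dict)  # advertised to the peer


class BgpSpeaker:
    def __init__(self, local_asn: int) -> None:
        self.local_asn = local_asn
        self.peers: Dict[str, PeerRibs] = {}

    def _ribs(self, peer: str) -> PeerRibs:
        return self.peers.setdefault(peer, PeerRibs())

    def receive(self, peer: str, prefix: str, as_path: List[int]) -> None:
        self._ribs(peer).rib_in[prefix] = list(as_path)

    def advertise(self, peer: str, prefix: str, as_path: List[int]) -> None:
        # eBGP prepends the local ASN when sending a route to a neighbor.
        self._ribs(peer).rib_out[prefix] = [self.local_asn] + list(as_path)

    def sent_where(self, prefix: str) -> Dict[str, List[int]]:
        """Which neighbors was this prefix sent to, and with what AS_PATH?"""
        return {peer: ribs.rib_out[prefix]
                for peer, ribs in self.peers.items() if prefix in ribs.rib_out}


# Hypothetical leaf in private AS 65001 advertising its server subnet to two spines.
leaf = BgpSpeaker(65001)
leaf.advertise("spine1", "10.0.1.0/24", [])
leaf.advertise("spine2", "10.0.1.0/24", [])
leaf.receive("spine1", "10.0.2.0/24", [65100, 65002])
print(leaf.sent_where("10.0.1.0/24"))  # {'spine1': [65001], 'spine2': [65001]}
```

Answering the same question with a link-state protocol means reasoning about the whole LSDB and SPF results rather than reading two per-neighbor tables.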

13
Not a problem with automated configuration generation

Convergence speed is not our primary goal anyway; a few seconds is OK

Practical convergence in under a second

14
The New Approach

15
Broadcast storms
Hard to troubleshoot

Bandwidth scales up, not out
18
No need to buy higher-radix boxes
Cheaper infrastructure

No protocol interworking or route redistribution

19
Details and Design Choices

20
We rely on ECMP for routing
Needed for Anycast prefixes

Simplifies path hiding at the WAN edge (remove private AS)
Simplifies route filtering at the WAN edge (a single regexp)
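A hedged Python sketch of both WAN-edge benefits, assuming every data center device uses a 16-bit private ASN (64512 to 65534, per RFC 6996; the 32-bit private range is not covered). One regular expression then recognizes any AS_PATH made up only of data center hops, and stripping private ASNs hides the internal path. The function names are hypothetical, and real "remove private AS" features differ between vendors.

```python
import re
from typing import List

# One pattern for the 16-bit private ASN range 64512-65534 (RFC 6996).
# The 32-bit private range is not covered by this sketch.
PRIVATE_ASN = (r"(?:64(?:5(?:1[2-9]|[2-9][0-9])|[6-9][0-9]{2})"
               r"|65(?:[0-4][0-9]{2}|5(?:[0-2][0-9]|3[0-4])))")

# A single regexp that matches an AS_PATH made up only of private ASNs,
# e.g. "65001 65100 65002": the shape of a route originated inside the DC.
DC_ONLY_PATH = re.compile(rf"{PRIVATE_ASN}(?: {PRIVATE_ASN})*")


def is_private_asn(asn: int) -> bool:
    return re.fullmatch(PRIVATE_ASN, str(asn)) is not None


def is_dc_path(as_path: List[int]) -> bool:
    return DC_ONLY_PATH.fullmatch(" ".join(map(str, as_path))) is not None


def remove_private_as(as_path: List[int]) -> List[int]:
    """Hide DC-internal hops before advertising a route to the WAN.

    Simplified: vendor 'remove private AS' features differ in detail
    (some only strip when the whole path is private).
    """
    return [asn for asn in as_path if not is_private_asn(asn)]


print(is_dc_path([65001, 65100, 65002]))         # True
# 64500 is a documentation ASN (RFC 5398), not a private one.
print(remove_private_as([65001, 65100, 64500]))  # [64500]
```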

21
Allow AS In
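What "Allow AS In" changes can be shown with a simplified Python model of the eBGP loop check. Normally a route is dropped if its AS_PATH already contains the receiver's own ASN; the override tolerates a configured number of occurrences. The scenario below, two ToRs deliberately reusing the same private ASN, is an assumption for illustration, and `accept_route` is a hypothetical name, not a vendor API.

```python
from typing import List


def accept_route(as_path: List[int], local_asn: int, allowas_in: int = 0) -> bool:
    """eBGP AS_PATH loop check with an 'allow AS in' style override.

    Standard BGP rejects a route whose AS_PATH already contains the local ASN.
    Tolerating up to `allowas_in` occurrences lets routes through when the
    same ASN is deliberately reused on more than one device.
    """
    return as_path.count(local_asn) <= allowas_in


# Hypothetical: a ToR in AS 65001 hears a prefix from another ToR that reuses AS 65001.
path = [65100, 65001]
print(accept_route(path, local_asn=65001))                # False: dropped as a loop
print(accept_route(path, local_asn=65001, allowas_in=1))  # True
```

Keeping the allowed count small preserves most of the protection that the loop check provides.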

22
AS_PATH Multipath Relax
Allow AS In
Fast eBGP Fall-over
Remove Private AS
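The effect of "AS_PATH Multipath Relax" can be sketched in Python, assuming every other best-path attribute ties. Strict BGP multipath only installs paths whose AS_PATH is identical to the best path; with the relax option, equal AS_PATH length is enough. This matters for Anycast prefixes announced from devices in different ASNs. The `Path` class, `multipath_set` and the ASNs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: Tuple[int, ...]


def multipath_set(paths: List[Path], relax: bool) -> List[Path]:
    """Return the paths installed as ECMP next hops for one prefix.

    Simplified: assumes every other best-path attribute ties and takes the
    first shortest path as "best". Strict BGP multipath only adds paths whose
    AS_PATH is identical to the best one; with AS_PATH multipath relax, an
    equal AS_PATH length is enough.
    """
    if not paths:
        return []
    best = min(paths, key=lambda p: len(p.as_path))
    if relax:
        return [p for p in paths if len(p.as_path) == len(best.as_path)]
    return [p for p in paths if p.as_path == best.as_path]


# Hypothetical Anycast prefix originated by servers behind two ToRs
# in different ASNs (65001 and 65002), seen through one upstream AS (65100).
paths = [Path("spine1", (65100, 65001)), Path("spine2", (65100, 65002))]
print(len(multipath_set(paths, relax=False)))  # 1: only one path used
print(len(multipath_set(paths, relax=True)))   # 2: load shared across both
```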

23
Otherwise: route black-holing on link failure!

24
Otherwise: route black-holing on link failure!

25
This made it the perfect choice for us!

26
Questions?
