
SEATTLE
– A Scalable Ethernet Architecture for Large Enterprises

M.Sc. Pekka Hippeläinen
IBM
phippela@gmail

1.10.2012 T-110.6120 – Special Course in Future Internet Technologies


SEATTLE

 Based on, with pictures borrowed from:
Kim, Changhoon; Caesar, Matthew; Rexford, Jennifer.
Floodless in SEATTLE: A Scalable Ethernet
Architecture for Large Enterprises

 Is it possible to build a protocol that maintains the
same configuration-free properties as Ethernet
bridging, yet scales to large networks?
Contents

 Motivation: network management challenge


 Ethernet features: ARP and DHCP broadcasts
 1) Ethernet Bridging
 2) Scaling with Hybrid networks
 3) Scaling with VLANs
 Distributed Hashing
 SEATTLE approach
 Results
 Conclusions

Network management challenge
 IP networks require massive effort to configure and manage
 As much as 70% of an enterprise network's cost goes to maintenance and configuration
 Ethernet is much simpler to manage
 However, Ethernet does not scale well beyond small LANs
 The SEATTLE architecture aims to provide the scalability of IP with the simplicity of Ethernet management

Why is Ethernet so wonderful?
 Easy to set up, easy to manage
 A DHCP server, some hubs, plug'n play

Flooding query 1: DHCP requests
 Let's say node A joins the Ethernet
 To obtain or confirm an IP address, node A sends a DHCP request as a broadcast
 The request floods through the broadcast domain

Flooding query 2: ARP
 For node A to communicate with node B in the same broadcast domain, the sender needs the MAC address of node B
 Let's assume that node B's IP address is known
 Node A sends an Address Resolution Protocol (ARP) broadcast to find out the MAC address of node B
 As with the DHCP broadcast, the request is flooded through the whole broadcast domain
 This is essentially an {IP -> MAC} lookup

Why is flooding bad?
 Large Ethernet deployments contain a vast number of hosts and thousands of bridges
 Ethernet was not designed for such a scale
 Virtualization and mobile deployments can cause many dynamic events, which generate control traffic
 Broadcast messages need to be processed by the end hosts, interrupting the CPU
 The bridges' forwarding tables grow roughly linearly with the number of hosts

1) Ethernet bridging
 An Ethernet consists of segments, each comprising a single physical layer
 Ethernet bridges interconnect the segments into a multi-hop network, i.e., a LAN
 This forms a single broadcast domain
 A bridge learns how to reach a host by inspecting incoming frames and associating the source MAC address with the incoming port
 The bridge stores this information in a forwarding table and uses the table to forward frames in the correct direction
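
A minimal sketch of this learning logic in Python (class and variable names are illustrative, not from the paper):

# Minimal sketch of an Ethernet learning bridge.
class LearningBridge:
    def __init__(self, ports):
        self.ports = ports             # port identifiers of this bridge
        self.table = {}                # forwarding table: MAC -> port

    def handle_frame(self, src_mac, dst_mac, in_port):
        self.table[src_mac] = in_port  # learn: src_mac is reachable via in_port
        if dst_mac in self.table:
            return [self.table[dst_mac]]                 # known: forward directly
        return [p for p in self.ports if p != in_port]   # unknown: flood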
Bridge spanning tree
 One bridge is configured to be the root bridge
 The other bridges collectively compute a spanning tree based on their distance to the root
 Thus traffic is not forwarded along the shortest path but along the spanning tree
 This approach avoids forwarding loops and broadcast storms

2) Hybrid IP/Ethernet
 In this approach, multiple LANs are interconnected with IP routing
 In hybrid networks each LAN contains at most a few hundred hosts that form an IP subnet
 Each IP subnet is associated with an IP prefix
 Assigning IP prefixes to subnets and associating subnets with router interfaces is a manual process
 Unlike a MAC address, which is a host identifier, an IP address denotes the host's current location in the network

Drawbacks of the hybrid approach
 The biggest drawback is configuration overhead
 Router interfaces must be configured
 Hosts must have an IP address corresponding to the subnet they are located in (DHCP can be used)
 Networking policies are usually defined per network prefix, i.e., tied to the topology
 When the network changes, the policies must be updated
 Limited mobility support
 Mobile users & virtualized hosts in datacenters
 If the IP address is to stay constant, the host must stay in the same subnet
3) Virtual LANs
 Overcome some problems of Ethernet and IP networks
 Administrators can logically group hosts into the same broadcast domain
 VLANs can be configured to overlap, by configuring the bridges rather than the hosts
 Broadcast overhead is reduced by the isolated domains
 Mobility is simplified: the IP address can be retained while moving between bridges
Virtual LANs
 Traffic from B1 to B2 can be 'trunked' over multiple bridges
 Inter-domain traffic needs to be routed

Drawbacks of VLANs
 Trunk configuration overhead
 Extending a VLAN across multiple bridges requires the VLAN to be configured at each participating bridge, often manually
 Limited control-plane scalability
 Bridges keep forwarding-table entries and handle broadcast traffic for every active host in every VLAN visible to them
 Insufficient data-plane efficiency
 A single spanning tree is still used within each VLAN
 Inter-VLAN traffic must be routed via IP gateways

Distributed Hash Tables
 Hash tables store {key -> value} pairs
 With multiple nodes, consistent hashing gives a nice way to
 Keep nodes symmetric
 Distribute the hash-table entries evenly among nodes
 Keep reshuffling of entries small when nodes are added or removed
 The idea is to calculate H(key), which is mapped to a host; one can visualize this as mapping to an angle (or to a point on a circle)
Distributed Hash Tables
 Each node is mapped to several randomly distributed points on the circle
 Thus each node owns multiple buckets
 One calculates H(key) and stores the entry at the node owning that bucket
 If a node is removed, its values are reassigned to the next buckets
 If a node is added, only the entries falling into its new buckets are moved
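
A compact Python sketch of such a ring, assuming SHA-256 as the hash function and a fixed number of points ("virtual nodes") per node; all names are illustrative:

import bisect, hashlib

def hash_point(key):
    # Map any string key to a point on the circle [0, 2**32).
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % 2**32

class ConsistentHashRing:
    def __init__(self, points_per_node=8):
        self.points_per_node = points_per_node
        self.points = []    # sorted bucket boundaries on the circle
        self.owner = {}     # point -> node owning the bucket ending there

    def add_node(self, node):
        for i in range(self.points_per_node):    # one node -> many buckets
            p = hash_point(f"{node}#{i}")
            bisect.insort(self.points, p)
            self.owner[p] = node

    def remove_node(self, node):
        # The removed node's buckets merge into the next ones clockwise.
        self.points = [p for p in self.points if self.owner[p] != node]
        self.owner = {p: n for p, n in self.owner.items() if n != node}

    def lookup(self, key):
        # The entry for key is stored at the node owning the next point.
        p = hash_point(key)
        i = bisect.bisect(self.points, p) % len(self.points)
        return self.owner[self.points[i]]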
SEATTLE approach 1/2
 1) Switches calculate shortest paths among themselves
 This is a link-state protocol, essentially Dijkstra's algorithm
 A switch-level discovery protocol; Ethernet hosts do not respond to it
 The switch topology is much more stable than the host topology
 It is also much more scalable than routing at the host level
 Each switch has an ID: the MAC address of one of the switch's interfaces
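
As a sketch, the per-switch computation is plain Dijkstra over the switch graph (the example topology and names below are hypothetical; the real protocol first floods link-state advertisements to build the graph):

import heapq

# Sketch: single-source shortest paths over a link-state switch graph.
# graph maps switch ID -> {neighbor switch ID: link cost}.
def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # skip stale heap entries
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist                           # switch ID -> distance from source

# Hypothetical three-switch topology:
# dijkstra({"s1": {"s2": 1}, "s2": {"s1": 1, "s3": 2}, "s3": {"s2": 2}}, "s1")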

SEATTLE approach 2/2
 2) A DHT is run across the switches
 {IP -> MAC} mapping
 This essentially answers ARP requests while avoiding flooding
 {MAC -> location} mapping
 Once the host's switch is located, routing along the shortest path can be used
 The DHCP service location can also be stored
 SEATTLE thus reduces flooding, allows shortest-path forwarding, and offers a nice way to locate the DHCP service
SEATTLE
 Control overhead is reduced with consistent hashing
 When the set of switches changes due to network failure or recovery, only some entries must be moved
 Load is balanced with virtual switches
 A more powerful switch can represent itself as several virtual switches, attracting more load
 Flexible service discovery is enabled
 This is mainly for DHCP, but could be something like {"PRINTER" -> location}

Topology changes
 Adding and removing switches/links can alter the topology
 Switch/link failures and recoveries can also lead to partitioning events (rarer)
 Non-partitioning link failures are easy to handle: the resolver for a hash entry does not change

Switch failures
 If a switch fails or recovers, hash entries need to be moved
 The switch that published a value monitors the liveness of its resolver, republishing the entry when needed
 The entries also have a TTL

Partitioning events
 Each switch must also keep track of the location entries it stores locally
 If a switch s_old is removed or becomes unreachable, all switches need to remove the location entries pointing to s_old
 This approach correctly handles partitioning events

Scaling: location
 A directory service is used to publish and maintain {MAC -> location} mappings
 When host a with MAC address mac_a arrives, it attaches to switch S_a (steps 1-3 in the paper's figure)
 Switch S_a publishes {mac_a -> location} by calculating the correct bucket F(mac_a), i.e., the resolver switch
 When node b wants to send a message to node a
 F(mac_a) is calculated to fetch the location
 'Reactive resolution': even cache misses do not lead to flooding
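
Assuming F(.) is modeled by the ring lookup sketched earlier, the publish/resolve flow could look like this ('directory' and all other names are illustrative):

# Illustrative sketch: each resolver switch holds a slice of the directory.
# directory maps resolver switch -> {key: value}.
def publish_location(ring, directory, mac, access_switch):
    resolver = ring.lookup(mac)                   # F(mac_a): resolver switch
    directory.setdefault(resolver, {})[mac] = access_switch

def resolve_location(ring, directory, mac):
    resolver = ring.lookup(mac)                   # same hash -> same resolver
    return directory.get(resolver, {}).get(mac)   # unicast lookup, no flooding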
Scaling: ARP
 When node b makes an ARP request, SEATTLE converts it into a unicast lookup at resolver F(IP_a), which returns mac_a
 The resolver switch for F(IP_a) is usually different from the one for F(mac_a)
 Optimization for hosts making ARP requests
 The F(IP_a) address resolver can also store mac_a and S_a together
 When node b makes the F(IP_a) ARP request, the {mac_a -> S_a} mapping is also cached at S_b
 The shortest path can now be used directly (path 10 in the paper's figure)
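
A sketch of this optimization, reusing the hypothetical ring and directory from above: the F(IP_a) resolver returns both the MAC and the location, so a single lookup also fills the cache at node b's switch:

# Sketch: the F(IP_a) resolver stores (mac_a, S_a) together.
def publish_arp(ring, directory, ip, mac, access_switch):
    resolver = ring.lookup(ip)                    # F(IP_a)
    directory.setdefault(resolver, {})[ip] = (mac, access_switch)

def resolve_arp(ring, directory, ip, local_cache):
    mac, location = directory[ring.lookup(ip)][ip]
    local_cache[mac] = location                   # cache {mac_a -> S_a} at S_b
    return mac                                    # shortest path usable at once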
Handling host dynamics
 Location change
 Wireless handoff
 VM moved while retaining its MAC
 Host MAC address changes
 NIC replaced
 Failover event
 VM migration forcing a MAC change
 Host changes IP address
 DHCP lease expires
 Manual reconfiguration
Insert, delete and update
 Location change
 Host h moves from s_old to s_new
 s_new updates the existing MAC-to-location entry
 MAC change
 IP-to-MAC update
 MAC-to-location deletion (old) and insertion (new)
 IP change
 S_h deletes the old IP-to-MAC entry and inserts the new one

Ethernet: Bootstrapping hosts
 Hosts are discovered by access switches
 SEATTLE switches snoop ARP requests
 Most OSes generate an ARP request at boot-up or when an interface comes up
 DHCP messages, or detecting that a host went down, can also be used
 Host configuration without broadcast
 The DHCP server's switch hashes the string "DHCP_SERVER" and stores the server's location at the resulting resolver switch
 Other switches hash the same "DHCP_SERVER" string to locate the service
 No need to broadcast for ARP or DHCP
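
As a usage example of the location sketch above, service discovery simply hashes a well-known string instead of a MAC address (names remain illustrative):

# The DHCP server's switch publishes under the well-known key...
publish_location(ring, directory, "DHCP_SERVER", dhcp_access_switch)
# ...and any switch can later locate the service without a broadcast.
dhcp_switch = resolve_location(ring, directory, "DHCP_SERVER")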
Scalable and flexible VLANs
 To support broadcast, the authors suggest using groups
 Similar to a VLAN, a group is defined as a set of hosts that share the same broadcast domain
 Groups are not limited to layer-2 reachability
 Multicast-based group-wide broadcasting
 A multicast tree with a broadcast root for each group
 F(group_id) is used to locate the broadcast root

Simulations
 1) Campus: ~40,000 students
 517 routers and switches
 2) AP-Large (Access Provider)
 315 routers
 3) Datacenter (DC)
 4 core routers with 21 aggregation switches
 Routers were converted to SEATTLE switches

Cache timeout: AP-large with 50k hosts
 The shortest-path cache timeout affects the number of location lookups
 Even with a 60 s timeout, 99.98% of packets were forwarded without a lookup
 Control overhead (the blue curve in the paper's plot) decreases very fast, whereas the table size increases only moderately
 The shortest path is used for the majority of traffic in these simulations

Table size increase in DC
 Ethernet bridges store an entry for each destination, giving ~O(sh) state across the network (s switches, h hosts)
 SEATTLE requires only ~O(h) state, since only the access and resolver switches need to store location information for each host
 With this topology the table size was reduced by a factor of 22
 In the AP-large case the factor increased to 64

Control overhead in AP-large
 Measured as the number of control messages over all links in the topology, divided by the number of switches and the duration of the trace
 SEATTLE significantly reduces control overhead in the simulations
 This is mainly because Ethernet generates network-wide floods for a significant number of packets

Effect of switch failure in DC
 Switches were allowed to fail randomly
 The average recovery time was 30 seconds
 SEATTLE can use all the links in the topology, whereas Ethernet is restricted to the spanning tree
 Ethernet must recompute the tree, causing outages

Effect of host mobility in Campus
 Hosts were randomly moved between access switches
 For high mobility rates, SEATTLE's loss rate was lower than Ethernet's
 With Ethernet it takes some time for switches to evict stale location information and re-learn the new location
 SEATTLE provided low loss and low broadcast overhead

What was omitted
 The authors suggest multi-level one-hop DHTs
 In large, dynamic networks it can be beneficial to store entries close by
 This is achieved with regions and a backbone; border switches connect the regions to backbone switches
 On topology changes
 An approach to seamless mobility is described in the paper
 Updating remote host caches requires switch-based MAC revocation lists
 Some simulation results
 The authors also made a sample implementation
Conclusions
 Operators today face challenges in managing and configuring large networks. This is largely due to the complexity of administering IP networks.
 Ethernet is not a viable alternative
 Poor scaling and inefficient path selection
 SEATTLE promises scalable self-configuring routing
 Simulations suggest efficient routing and low latency with quick recovery
 Host mobility is supported with low control overhead
 Ethernet stacks at the end hosts are not modified

Thank you for your attention!
Questions? Comments?
