1.10.2012 2
Contents
Network management
challenge
IP Networks require massive effort to configure
and manage
As much as 70% of an enterprise network’s cost goes to
maintenance and configuration
Ethernet is much simpler to manage
However, Ethernet does not scale well beyond
small LANs
SEATTLE architecture aims to provide scalability
of IP with simplicity of Ethernet management
Why Ethernet is so wonderful?
Easy to set up, easy to manage
DHCP server, some hubs, plug and play
Flooding query 1: DHCP
requests
Let's say node A joins the Ethernet
To obtain or confirm its IP address, node A sends a DHCP request as a
broadcast
Request floods through the broadcast domain
Flooding query 2: ARP
For node A to communicate with node B in
the same broadcast domain, the sender needs
the MAC address of node B
Let's assume that node B's IP address is known
Node A sends an Address Resolution Protocol (ARP)
broadcast to find out the MAC address of node B
As with the DHCP broadcast, the request is
flooded through the whole broadcast domain
This is basically an {IP -> MAC} mapping
Why is flooding bad?
Large Ethernet deployments contain a vast number
of hosts and thousands of bridges
Ethernet was not designed for such a scale
Virtualization and mobile deployments can cause
many dynamic events, each generating control traffic
Broadcast messages must be processed by the
end hosts, interrupting the CPU
Bridge forwarding tables grow roughly
linearly with the number of hosts
1) Ethernet bridging
Ethernet consists of segments, each comprising a
single physical layer
Ethernet bridges interconnect segments
into a multi-hop network, i.e. a LAN
This forms a single broadcast domain
A bridge learns how to reach a host by inspecting
incoming frames and associating the source
MAC with the incoming port
A bridge stores this information in a forwarding
table and uses the table to forward frames in the
correct direction
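The learning behaviour can be illustrated with a minimal sketch. The class name LearningBridge, the port numbers and the shortened MAC strings are made up for illustration; real bridges also age out entries, which is left out here.

```python
from typing import Dict, List


class LearningBridge:
    """Toy model of a transparent learning bridge (hypothetical names)."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.table: Dict[str, int] = {}  # source MAC -> port it was last seen on

    def receive(self, src: str, dst: str, in_port: int) -> List[int]:
        """Learn the source, then return the ports the frame is sent out on."""
        self.table[src] = in_port              # learn or refresh the sender
        out = self.table.get(dst)
        if out is None:                        # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        if out == in_port:                     # destination sits on the arrival segment
            return []
        return [out]


bridge = LearningBridge(ports=[1, 2, 3])
first = bridge.receive("aa:aa", "bb:bb", in_port=1)  # bb:bb unknown, so flooded
reply = bridge.receive("bb:bb", "aa:aa", in_port=2)  # aa:aa was learned on port 1
again = bridge.receive("aa:aa", "bb:bb", in_port=1)  # bb:bb was learned on port 2
```

Only the first frame is flooded; once both hosts have sent something, the table holds one entry per host, which is why table size grows with the number of hosts.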
Bridge spanning tree
One bridge is configured to be the root bridge
Other bridges collectively compute a spanning
tree based on the distance to the root
Thus traffic is not forwarded along the shortest path
but along the spanning tree
This approach avoids broadcast storms
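A small sketch of why tree forwarding can be longer than the shortest path. It assumes a made-up triangle of bridges R, A and B with R as root, and it uses a plain breadth-first tree as a stand-in for the real spanning tree protocol, which additionally compares bridge IDs and port costs.

```python
from collections import deque
from typing import Dict, List

Links = Dict[str, List[str]]


def bfs_parents(links: Links, root: str) -> Dict[str, str]:
    """Parent pointers of a breadth-first tree rooted at `root`."""
    parent = {root: root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(links[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent


def path_to_root(parent: Dict[str, str], node: str) -> List[str]:
    """Follow parent pointers from `node` up to the root."""
    path = [node]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def tree_path(parent: Dict[str, str], src: str, dst: str) -> List[str]:
    """Path from src to dst that uses tree links only."""
    up, down = path_to_root(parent, src), path_to_root(parent, dst)
    meet = next(n for n in up if n in down)    # lowest common ancestor
    return up[: up.index(meet) + 1] + down[: down.index(meet)][::-1]


# Hypothetical topology: three bridges, fully meshed.
links = {"R": ["A", "B"], "A": ["R", "B"], "B": ["R", "A"]}
tree = bfs_parents(links, root="R")
via_tree = tree_path(tree, "A", "B")                        # goes through the root
direct = path_to_root(bfs_parents(links, root="A"), "B")[::-1]  # shortest path
```

The link between A and B exists but is not part of the tree, so traffic from A to B takes two hops through R instead of one.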
2) Hybrid IP/Ethernet
In this approach multiple LANs are
interconnected with IP routing
In hybrid networks each LAN contains at most a
few hundred hosts that form an IP subnet
Each IP subnet is associated with an IP prefix
Assigning IP prefixes to subnets and associating
subnets with router interfaces is a manual process
Unlike a MAC address, which is a host identifier, an IP address
denotes the host's current location in the network
Drawbacks of Hybrid
approach
The biggest drawback is the configuration overhead
Router interfaces must be configured
Each host must have an IP address matching
the subnet it is located in (DHCP can be used)
Networking policies are usually defined per
network prefix, i.e. tied to topology
When the network changes, the policies must be updated
Limited mobility support
Mobile users & virtualized hosts at datacenters
To keep its IP address, a host must stay on the same
subnet
3) Virtual LANs
Overcome some problems of Ethernet and IP
networks
Administrators can logically group hosts into
the same broadcast domain
VLANs can be configured to overlap by configuring
the bridges, not the hosts
Broadcast overhead is reduced because the
domains are isolated
Mobility is simplified: the IP address can be retained
while moving between bridges
Virtual LANs
Traffic from B1 to B2 can be ‘trunked’ over
multiple bridges
Inter-domain traffic needs to be routed
Drawbacks of VLANs
Trunk configuration overhead
Extending a VLAN across multiple bridges requires
the VLAN to be configured at each participating
bridge. Often manual work.
Limited control plane scalability
Bridges keep forwarding table entries and process broadcast
traffic for every active host in every VLAN visible to them
Insufficient data plane efficiency
A single spanning tree is still used within each VLAN
Inter-VLAN traffic must be routed via IP gateways
Distributed Hash Tables
Hash tables are used to store {key -> value} pairs
With multiple nodes there is a nice way to
make nodes symmetric
distribute the hash table entries evenly among nodes
keep reshuffling of entries small when
adding/removing nodes
The idea is to calculate H(key), which is mapped to a
host; one can visualize this as mapping to an
angle (or to a point on a circle)
Distributed Hash Tables
Each node is mapped to randomly distributed
points on the circle
Thus each node owns multiple buckets
One calculates H(key) and stores the entry
at the node owning that bucket
If a node is removed, its entries are reassigned
to the next buckets
If a node is added, only the entries falling into its
new buckets are moved
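The circle idea can be sketched as a small consistent-hash ring. This is an illustration under assumptions: SHA-1 as the hash, 50 points per node, and the node names s1 to s4 are arbitrary choices, not taken from the slides.

```python
import bisect
import hashlib
from typing import List, Tuple


def h(text: str) -> int:
    """Map a string to a point on the circle (here: a 64-bit integer)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, points_per_node: int = 50) -> None:
        self.points_per_node = points_per_node
        self.ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs

    def add(self, node: str) -> None:
        for i in range(self.points_per_node):
            bisect.insort(self.ring, (h(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def owner(self, key: str) -> str:
        """Node owning the first point at or after H(key), wrapping around."""
        idx = bisect.bisect_left(self.ring, (h(key), ""))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing()
for name in ("s1", "s2", "s3"):
    ring.add(name)
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.remove("s2")
after_remove = {k: ring.owner(k) for k in keys}

ring.add("s4")
after_add = {k: ring.owner(k) for k in keys}
```

Removing s2 only reassigns the keys that s2 owned, and adding s4 only moves keys onto s4; every other key stays where it was.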
SEATTLE approach 1/2
1) Switches calculate shortest
paths among themselves
This is a link-state protocol, basically Dijkstra
Switch-level discovery protocol: Ethernet hosts do
not respond
Switch topology is much more stable than host-level topology
Much more scalable than at host level
Each switch has an ID: the MAC address of one of the
switch's interfaces
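For reference, a compact sketch of Dijkstra's algorithm over a switch-level topology. The four switches and the link costs are invented for the example; a link-state protocol would first flood the link information so that every switch holds this graph.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]


def shortest_paths(graph: Graph, source: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Return (distance, predecessor) maps from `source` to every reachable switch."""
    dist = {source: 0}
    prev: Dict[str, str] = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path(prev: Dict[str, str], source: str, target: str) -> List[str]:
    """Rebuild the hop list from the predecessor map."""
    hops = [target]
    while hops[-1] != source:
        hops.append(prev[hops[-1]])
    return hops[::-1]


# Hypothetical switch topology with symmetric link costs.
graph: Graph = {
    "s1": {"s2": 1, "s3": 5},
    "s2": {"s1": 1, "s3": 1},
    "s3": {"s1": 5, "s2": 1, "s4": 1},
    "s4": {"s3": 1},
}
dist, prev = shortest_paths(graph, "s1")
route = path(prev, "s1", "s4")
```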
SEATTLE approach 2/2
2) DHT used in switches
{IP->MAC} mapping
This is essentially an ARP request that avoids flooding
{MAC->location} mapping
Once the switch is located, routing along the shortest path
can be used
DHCP service location can also be stored
SEATTLE thus reduces flooding, allows the use of
shortest paths and offers a nice way to locate the
DHCP service
SEATTLE
Control overhead reduced with consistent
hashing
When the set of switches changes due to network failure
or recovery, only some entries must be moved
Balancing load with virtual switches
A more powerful switch can
represent itself as several virtual switches and take more load
Enabling flexible service discovery
This is mainly DHCP, but could be something like
{“PRINTER”->location}
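A sketch of how virtual switches shift load and how a service name can be resolved like any other key. The switch names, the weights 1 and 40, the "/v" naming of virtual switches and the stored location string are all assumptions made for the example.

```python
import bisect
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def H(text: str) -> int:
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


def build_ring(weights: Dict[str, int]) -> List[Tuple[int, str]]:
    """Each switch appears once per virtual switch it advertises."""
    return sorted(
        (H(f"{switch}/v{i}"), switch)
        for switch, count in weights.items()
        for i in range(count)
    )


def resolver(key: str, ring: List[Tuple[int, str]]) -> str:
    """Switch responsible for `key`: next point clockwise from H(key)."""
    idx = bisect.bisect_left(ring, (H(key), ""))
    return ring[idx % len(ring)][1]


weights = {"small": 1, "big": 40}            # "big" advertises 40 virtual switches
ring = build_ring(weights)
load = Counter(resolver(f"host-{i}", ring) for i in range(2000))

# Service discovery: the service name is hashed like any other key.
directory: Dict[str, Dict[str, str]] = {switch: {} for switch in weights}
directory[resolver("DHCP_SERVER", ring)]["DHCP_SERVER"] = "switch big, port 7"
found = directory[resolver("DHCP_SERVER", ring)].get("DHCP_SERVER")
```

Because the publisher and the requester hash the same string, both arrive at the same resolver without any broadcast.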
Topology changes
Adding and removing switches/links can alter
topology
Switch/link failures and recoveries can also lead
to partitioning events (rarer)
Non-partitioning link failures are easy to handle:
the resolver for a hash entry does not change
Switch failures
If a switch fails or recovers, hash entries need to be
moved
The switch that published a value monitors the
liveness of the resolver, republishing the entry when
needed
The entries have a TTL
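One way to picture TTLs and republishing is the sketch below. ResolverStore and Publisher are hypothetical names, the 60-second TTL is arbitrary, and time is passed in explicitly so the example stays deterministic; how SEATTLE actually detects a dead resolver is not modelled.

```python
from typing import Dict, Optional, Tuple


class ResolverStore:
    """Entries held by one resolver switch, each with an expiry time."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)

    def publish(self, key: str, value: str, now: float) -> None:
        self.entries[key] = (value, now + self.ttl)

    def lookup(self, key: str, now: float) -> Optional[str]:
        item = self.entries.get(key)
        if item is None or item[1] <= now:
            self.entries.pop(key, None)        # drop expired entries lazily
            return None
        return item[0]


class Publisher:
    """The switch that published an entry and keeps watching its resolver."""

    def __init__(self, key: str, value: str) -> None:
        self.key, self.value = key, value

    def check(self, resolver_alive: bool, fallback: ResolverStore, now: float) -> bool:
        """Republish to the new resolver if the old one is gone."""
        if resolver_alive:
            return False
        fallback.publish(self.key, self.value, now)
        return True


primary, backup = ResolverStore(ttl=60), ResolverStore(ttl=60)
primary.publish("mac_a", "s1", now=0)
fresh = primary.lookup("mac_a", now=30)      # still valid
expired = primary.lookup("mac_a", now=61)    # TTL has passed

pub = Publisher("mac_a", "s1")
moved = pub.check(resolver_alive=False, fallback=backup, now=10)
recovered = backup.lookup("mac_a", now=20)
```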
Partitioning events
Each switch must also keep track of its locally stored
location entries
If switch s_old is removed or becomes unreachable, all
switches need to remove the location entries pointing to s_old
This approach correctly handles partitioning
events
Scaling:
location
Switches use the directory service to publish and maintain
{mac->location} mappings for their hosts
When host a with mac_a arrives, it attaches to switch
s_a (steps 1-3)
Switch s_a publishes {mac_a, location} by calculating the
correct bucket F(mac_a), i.e. the resolver switch
When node b wants to send a message to node a,
F(mac_a) is calculated to fetch the location
'Reactive resolution': even cache misses do not lead to
flooding
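The publish and lookup steps can be sketched as follows. The Fabric and Switch classes, the switch names and the choice of "next switch clockwise" for F are illustrative assumptions; the point is only that a cache miss costs one unicast lookup at the resolver rather than a flood.

```python
import hashlib
from typing import Dict, List, Optional


def H(text: str) -> int:
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


def F(key: str, switches: List[str]) -> str:
    """Resolver for `key`: first switch clockwise from H(key) on the ring."""
    k = H(key)
    ahead = [s for s in switches if H(s) >= k]
    return min(ahead or switches, key=H)      # wrap around if nothing is ahead


class Switch:
    def __init__(self, name: str) -> None:
        self.name = name
        self.directory: Dict[str, str] = {}   # entries this switch resolves
        self.cache: Dict[str, str] = {}       # locations learned from lookups


class Fabric:
    def __init__(self, names: List[str]) -> None:
        self.switches = {n: Switch(n) for n in names}
        self.lookups = 0                       # unicast queries sent to resolvers

    def publish(self, mac: str, access_switch: str) -> None:
        resolver = F(mac, list(self.switches))
        self.switches[resolver].directory[mac] = access_switch

    def locate(self, mac: str, from_switch: str) -> Optional[str]:
        ingress = self.switches[from_switch]
        if mac in ingress.cache:
            return ingress.cache[mac]
        self.lookups += 1                      # cache miss: ask the resolver
        resolver = F(mac, list(self.switches))
        location = self.switches[resolver].directory.get(mac)
        if location is not None:
            ingress.cache[mac] = location
        return location


fabric = Fabric(["s1", "s2", "s3", "s4"])
fabric.publish("mac_a", "s1")                  # host a attached to s1
first = fabric.locate("mac_a", from_switch="s3")
second = fabric.locate("mac_a", from_switch="s3")  # served from s3's cache
```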
Scaling:
ARP
When node b makes an ARP request, SEATTLE
converts it to a {F(IP_a) -> mac_a} lookup
The resolver switch for F(IP_a) is usually
different from the one for F(mac_a)
Optimization for hosts making ARP requests
The F(IP_a) resolver can also store mac_a and s_a
When node b makes the F(IP_a) ARP request, the
mac_a->s_a mapping is also cached at s_b
The shortest path (-> path 10) can now be used
Handling host dynamics
Location change
Wireless handoff
VM moved but retaining MAC
Host MAC address changes
NIC replaced
Failover event
VM migration forcing MAC change
Host changes IP
DHCP lease expires
Manual reconfiguration
Insert, delete and update
Location change
Host h moves from s_old to s_new
s_new updates the existing mac-to-location entry
MAC change
IP-to-MAC update
MAC-to-location deletion (old) and insertion (new)
IP change
S_h deletes old IP-to-MAC and inserts new IP-to-MAC
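The three cases map onto a handful of table operations. In this sketch the two mappings are kept in one hypothetical Directory object for brevity; in SEATTLE they would be spread over the resolver switches, and the addresses used are made up.

```python
from typing import Dict


class Directory:
    """Both mappings in one place; a simplification for illustration."""

    def __init__(self) -> None:
        self.ip_to_mac: Dict[str, str] = {}
        self.mac_to_loc: Dict[str, str] = {}

    def host_arrives(self, ip: str, mac: str, switch: str) -> None:
        self.ip_to_mac[ip] = mac
        self.mac_to_loc[mac] = switch

    def location_change(self, mac: str, s_new: str) -> None:
        """Host moved: s_new updates the existing MAC-to-location entry."""
        self.mac_to_loc[mac] = s_new

    def mac_change(self, ip: str, mac_new: str) -> None:
        """Update IP-to-MAC; delete the old MAC-to-location, insert the new one."""
        mac_old = self.ip_to_mac[ip]
        switch = self.mac_to_loc.pop(mac_old)
        self.mac_to_loc[mac_new] = switch
        self.ip_to_mac[ip] = mac_new

    def ip_change(self, ip_old: str, ip_new: str) -> None:
        """Delete the old IP-to-MAC entry and insert the new one."""
        mac = self.ip_to_mac.pop(ip_old)
        self.ip_to_mac[ip_new] = mac


d = Directory()
d.host_arrives("10.0.0.5", "mac_h", "s_old")
d.location_change("mac_h", "s_new")
d.mac_change("10.0.0.5", "mac_h2")
d.ip_change("10.0.0.5", "10.0.0.9")
```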
Ethernet: Bootstrapping
hosts
Hosts are discovered by access switches
SEATTLE switches snoop ARP requests
Most OSes generate an ARP request at boot-up / if-up
DHCP messages or a host going down can also be used
Host configuration without broadcast
The DHCP server hashes the string “DHCP_SERVER” and
stores its location in the switches
The “DHCP_SERVER” string is used to locate the service
No need to broadcast for ARP or DHCP
Scalable and flexible VLANs
Simulations
1) Campus ~40 000 students
517 routers and switches
2) AP-Large (Access Provider)
315 routers
3) Datacenter (DC)
4 core routers with 21 aggregation switches
Routers were converted to SEATTLE switches
Cache timeout and AP-large
with 50k hosts
Shortest path cache timeout
has impact on number of
location lookups
Even with a 60 s timeout, 99.98% of
packets were forwarded without a lookup
Control overhead (blue) decreases very fast, whereas the
table size increases only moderately
The shortest path is used for the majority of routing in these
simulations
Table size increase in DC
Control overhead in AP-
large
Number of control messages
over all links in the topology
divided by the number of switches
and the duration of the trace
SEATTLE significantly reduces control overhead in
the simulations
This is mainly because Ethernet generates
network-wide floods for a significant number of packets
Effect of switch failure in
DC
Switches were allowed to fail
randomly
The average recovery time was
30 seconds
SEATTLE can use all the links in the topology, whereas
Ethernet is restricted to the spanning tree
Ethernet must re-compute the tree, causing outages
Effect of host mobility in
Campus
Hosts were randomly moved
between access switches
For high mobility rates,
SEATTLE's loss rate was
lower than Ethernet's
On Ethernet it takes some time for switches to evict
the stale location information and
re-learn the new location
SEATTLE provided low loss and low broadcast overhead
What was omitted
The authors suggest multi-level one-hop DHTs
In large dynamic networks it can be beneficial that
entries are stored close by
This is achieved with regions and a backbone: border
switches connect to the backbone switches
With topology changes
An approach to seamless mobility is described in the paper
Updating remote host caches is required, using
switch-based MAC revocation lists
Some simulation results
The authors also built a sample implementation
Conclusions
Operators today face challenges in managing and
configuring large networks. This is largely due to the
complexity of administering IP networks.
Ethernet is not a viable alternative
poor scaling and inefficient path selection
SEATTLE promises scalable self-configuring routing
Simulations suggest efficient routing, low latency with
quick recovery
Host mobility supported with low control overhead
Ethernet stacks at end hosts are not modified
Thank you for your attention!
Questions? Comments?