You are on page 1of 154

Chapter 4

Network Layer:
Data Plane
A note on the use of these PowerPoint slides:
We’re making these slides freely available to all (faculty, students,
readers). They’re in PowerPoint form so you see the animations; and
can add, modify, and delete slides (including this one) and slide content
to suit your needs. They obviously represent a lot of work on our part.
In return for use, we only ask the following:
 If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
 If you post any slides on a www site, that you note that they are
adapted from (or perhaps identical to) our slides, and note our
copyright of this material.
Computer Networking: A
For a revision history, see the slide note for this page.
Top-Down Approach
Thanks and enjoy! JFK/KWR 8th edition
Jim Kurose, Keith Ross
All material copyright 1996-2020
J.F Kurose and K.W. Ross, All Rights Reserved Pearson, 2020
Network layer: our goals
 understand principles  instantiation, implementation
behind network layer in the Internet
services, focusing on data • IP protocol
plane: • NAT, middleboxes
• network layer service models
• forwarding versus routing
• how a router works
• addressing
• generalized forwarding
• Internet architecture
Network Layer: 4-2
Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual Circuit and Datagram  hierarchical routing
Networks  Routing in the Internet
 IP: the Internet Protocol  RIP
 datagram format  OSPF
 addressing  Routing among ISPs
 network address translation
 BGP
 IPv6

Network Layer: 4-3


Network-layer services and protocols
 transport segment from sending to mobile network

receiving host national or global ISP

• sender: encapsulates segments into


datagrams, passes to link layer application
transport

• receiver: delivers segments to network


link

transport layer protocol physical


network
link
network
link
physical
 network layer protocols in every
physical

Internet device: hosts, routers network


link network
link
physical

 routers:
physical network
link datacenter
physical network

• examines header fields in all IP


datagrams passing through it application
transport
network
enterprise
• moves datagrams from input ports to network
link
physical

output ports to transfer datagrams


along end-end path Network Layer: 4-4
Two key network-layer functions
network-layer functions: analogy: taking a trip
 forwarding: move packets from  forwarding: process of getting
a router’s input link to through single interchange
appropriate router output link  routing: process of planning trip
 routing: determine route taken from source to destination
by packets from source to
destination
• routing algorithms

forwarding

routing
Network Layer: 4-5
Network layer: data plane, control plane
Data plane: Control plane
 local, per-router function  network-wide logic
 determines how datagram  determines how datagram is
arriving on router input port routed among routers along end-
is forwarded to router end path from source host to
output port destination host
values in arriving  two control-plane approaches:
packet header
• traditional routing algorithms:
0111 1 implemented in routers
2
3 • software-defined networking (SDN):
implemented in (remote) servers

Network Layer: 4-6


Per-router control plane
Individual routing algorithm components in each and every
router interact in the control plane

Routing
Algorithm
control
plane

data
plane

values in arriving
packet header
0111 1
2
3

Network Layer: 4-7


Network layer: roadmap
 Network layer: overview  Routing algorithms
 data plane  link state
 control plane  distance vector
 Virtual Circuit and Datagram  hierarchical routing
Networks  Routing in the Internet
 IP: the Internet Protocol  RIP
 datagram format  OSPF
 addressing  Routing among ISPs
 network address translation
 BGP
 IPv6

Network Layer: 4-8


Connection, connection-less service
datagram network provides network-layer connectionless service
virtual-circuit network provides network-layer connection service
analogous to TCP/UDP connecton-oriented / connectionless
transport-layer services, but:
 service: host-to-host
 no choice: network provides one or the other
 implementation: in network core

Network Layer 4-9


Virtual circuits
“source-to-dest path behaves much like telephone
circuit”
• performance-wise
• network actions along source-to-dest path

 call setup, teardown for each call before data can flow
 each packet carries VC identifier (not destination host address)
 every router on source-dest path maintains “state” for each
passing connection
 link, router resources (bandwidth, buffers) may be allocated to
VC (dedicated resources = predictable service)

Network Layer 4-10


VC implementation
a VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along path
3. entries in forwarding tables in routers along path
 packet belonging to VC carries VC number (rather than dest
address)
 VC number can be changed on each link.
 new VC number comes from forwarding table

Network Layer 4-11


VC forwarding table
12 22 32

1 3
2
VC number
interface
forwarding table in number
northwest router:
Incoming interface Incoming VC # Outgoing interface Outgoing VC #

1 12 3 22
2 63 1 18
3 7 2 17
1 97 3 87
… … … …

VC routers maintain connection state information!


Network Layer 4-12
Virtual circuits: signaling
protocols
 used to setup, maintain teardown VC
 used in ATM, frame-relay, X.25
 not used in today’s Internet

application application
5. data flow begins 6. receive data
transport transport
network 4. call connected 3. accept call
1. initiate call network
data link 2. incoming call
data link
physical physical

Network Layer 4-13


Datagram networks
 no call setup at network layer
 routers: no state about end-to-end connections
• no network-level concept of “connection”
 packets forwarded using destination host address

application application
transport transport
network 1. send datagrams 2. receive datagrams network
data link data link
physical physical

Network Layer 4-14


Datagram forwarding table
4 billion IP addresses, so
routing algorithm rather than list individual
destination address
local forwarding table
list range of addresses
dest address output (aggregate table entries)
address-range 1 3 link
address-range 2 2
address-range 3 2
address-range 4 1

IP destination address in
arriving packet’s header
1
3 2

Network Layer 4-15


Datagram forwarding table
Destination Address Range Link Interface

11001000 00010111 00010000 00000000


through 0
11001000 00010111 00010111 11111111

11001000 00010111 00011000 00000000


through 1
11001000 00010111 00011000 11111111

11001000 00010111 00011001 00000000


through 2
11001000 00010111 00011111 11111111

otherwise 3

Q: but what happens if ranges don’t divide up so nicely?


Network Layer 4-16
Destination-based forwarding

Q: but what happens if ranges don’t divide up so nicely?


Network Layer: 4-17
Longest prefix matching
longest prefix match
when looking for forwarding table entry for given
destination address, use longest address prefix that
matches destination address.

Destination Address Range Link interface


11001000 00010111 00010*** ******** 0
11001000 00010111 00011000 ******** 1
11001000 00010111 00011*** ******** 2
otherwise 3

11001000 00010111 00010110 10100001 which interface?


examples:
11001000 00010111 00011000 10101010 which interface?
Network Layer: 4-18
Longest prefix matching
longest prefix match
when looking for forwarding table entry for given
destination address, use longest address prefix that
matches destination address.

Destination Address Range Link interface


11001000 00010111 00010*** ******** 0
11001000 00010111 00011000 ******** 1
11001000 match!
00010111 00011*** ******** 2
otherwise 3

11001000 00010111 00010110 10100001 which interface?


examples:
11001000 00010111 00011000 10101010 which interface?
Network Layer: 4-19
Longest prefix matching
longest prefix match
when looking for forwarding table entry for given
destination address, use longest address prefix that
matches destination address.

Destination Address Range Link interface


11001000 00010111 00010*** ******** 0
11001000 00010111 00011000 ******** 1
11001000 00010111 00011*** ******** 2
otherwise 3
match!
11001000 00010111 00010110 10100001 which interface?
examples:
11001000 00010111 00011000 10101010 which interface?
Network Layer: 4-20
Longest prefix matching
longest prefix match
when looking for forwarding table entry for given
destination address, use longest address prefix that
matches destination address.

Destination Address Range Link interface


11001000 00010111 00010*** ******** 0
11001000 00010111 00011000 ******** 1
11001000 00010111 00011*** ******** 2
otherwise 3
match!
11001000 00010111 00010110 10100001 which interface?
examples:
11001000 00010111 00011000 10101010 which interface?
Network Layer: 4-21
Datagram or VC network: why?
Internet (datagram) ATM (VC)
 data exchange among  evolved from telephony
computers  human conversation:
• “elastic” service, no strict timing • strict timing, reliability
req. requirements
• need for guaranteed service
 many link types
• different characteristics  “dumb” end systems
• uniform service difficult
• telephones
• complexity inside network
 “smart” end systems
(computers)
• can adapt, perform control, error
recovery
• simple inside network,
complexity at “edge”

Network Layer 4-22


Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual Circuit and Datagram  hierarchical routing
Networks  Routing in the Internet
 IP: the Internet Protocol  RIP
• datagram format  OSPF
• addressing  Routing among ISPs
• network address translation  BGP
• IPv6

Network Layer: 4-23


Network Layer: Internet
host, router network layer functions:

transport layer: TCP, UDP

Path-selection
IP protocol
• datagram format
algorithms: • addressing
network implemented in • packet handling conventions
• routing protocols forwarding
layer (OSPF, BGP) table ICMP protocol
• SDN controller • error reporting
• router “signaling”

link layer

physical layer

Network Layer: 4-24


IP Datagram format
32 bits
IP protocol version number total datagram
ver head. type of length length (bytes)
header length(bytes) len service
fragment fragmentation/
“type” of service: 16-bit identifier flgs
 diffserv (0:5) offset reassembly
time to upper header
 ECN (6:7) header checksum
live layer checksum
TTL: remaining max hops source IP address 32-bit source IP address
(decremented at each router)
Maximum length: 64K bytes
destination IP address 32-bit destination IP address
upper layer protocol (e.g., TCP or UDP) Typically: 1500 bytes or less
options (if any) e.g., timestamp, record
overhead route taken
 20 bytes of TCP payload data
 20 bytes of IP (variable length,
 = 40 bytes + app typically a TCP
layer overhead for or UDP segment)
TCP+IP
Network Layer: 4-25
IP fragmentation/reassembly
 network links have MTU (max.
transfer size) - largest possible link-
fragmentation:
level frame


in: one large datagram
• different link types, different MTUs out: 3 smaller datagrams

 large IP datagram divided


(“fragmented”) within net reassembly
• one datagram becomes several
datagrams
• “reassembled” only at destination


• IP header bits used to identify, order
related fragments

Network Layer: 4-26


IP fragmentation/reassembly
example: length ID fragflag offset
=4000 =x =0 =0
 4000 byte datagram
 MTU = 1500 bytes one large datagram becomes
several smaller datagrams

1480 bytes in length ID fragflag offset


data field =1500 =x =1 =0

offset = length ID fragflag offset


1480/8 =1500 =x =1 =185

length ID fragflag offset


=1040 =x =0 =370

Network Layer: 4-27


IP addressing: introduction
223.1.1.1

 IP address: 32-bit identifier 223.1.2.1


associated with each host or router 223.1.1.2
interface 223.1.1.4 223.1.2.9

 interface: connection between 223.1.1.3


223.1.3.27

host/router and physical link 223.1.2.2

• router’s typically have multiple


interfaces 223.1.3.1 223.1.3.2

• host typically has one or two


interfaces (e.g., wired Ethernet,
wireless 802.11) dotted-decimal IP address notation:
 IP addresses associated with each 223.1.1.1 = 11011111 00000001 00000001 00000001
interface
223 1 1 1
 Written in dotted-decimal notations Network Layer: 4-28
IP addressing: introduction
223.1.1.1

 IP address: 32-bit identifier 223.1.2.1


associated with each host or 223.1.1.2
router interface 223.1.1.4 223.1.2.9

 interface: connection between 223.1.1.3


223.1.3.27

host/router and physical link 223.1.2.2

• router’s typically have multiple


interfaces 223.1.3.1 223.1.3.2

• host typically has one or two


interfaces (e.g., wired Ethernet,
wireless 802.11) dotted-decimal IP address notation:
223.1.1.1 = 11011111 00000001 00000001 00000001

223 1 1 1
Network Layer: 4-29
IP addressing: introduction
223.1.1.1

Q: how are interfaces 223.1.2.1

actually connected? 223.1.1.2

A: we’ll learn about A: wired


223.1.1.4 223.1.2.9

that in chapters 6, 7 Ethernet interfaces


connected by 223.1.1.3
223.1.3.27
223.1.2.2
Ethernet switches

223.1.3.1 223.1.3.2

For now: don’t need to worry


about how one interface is
connected to another (with no
intervening router) A: wireless WiFi interfaces
connected by WiFi base station

Network Layer: 4-30


IP addressing

Binary Notation Hexadecimal Notation


Network Layer 4-31
IP addressing

Network Layer 4-32


IP addressing (Classful
Addressing)

Network Layer 4-33


IP addressing (Classful
Addressing)
Exercise:

Network Layer 4-34


IP addressing (Classful
Addressing)
Solution:

Network Layer 4-35


IP addressing (Classful
Addressing)

Network Layer 4-36


IP addressing (Classful
Addressing)

Network Layer 4-37


IP addressing (Classful
Addressing)

Network Layer 4-38


IP addressing (Classful
Addressing)

Network Layer 4-39


IP addressing (Classful
Addressing)

Network Layer 4-40


IP addressing (Classful
Addressing)

Network Layer 4-41


IP addressing (Classful
Addressing)

Network Layer 4-42


IP addressing: Classless
Addressing
Note Limitation of Classful Addressing

In classful addressing, a large part of the


available addresses were wasted.

Classful addressing, which is almost


obsolete, is replaced with classless
addressing.

43
IP addressing: CIDR
CIDR: Classless InterDomain Routing (pronounced “cider”)
• subnet portion of address of arbitrary length
• address format: a.b.c.d/x, where x is # bits in subnet portion
of address
subnet host
part part
11001000 00010111 00010000 00000000
200.23.16.0/23

Network Layer: 4-44


Subnets
223.1.1.1

 What’s a subnet ? 223.1.2.1

• device interfaces that can 223.1.1.2


223.1.1.4 223.1.2.9
physically reach each other
without passing through an 223.1.1.3
223.1.3.27

intervening router 223.1.2.2

 IP addresses have structure:


• subnet part: devices in same subnet 223.1.3.1 223.1.3.2

have common high order bits


• host part: remaining low order bits network consisting of 3 subnets

Network Layer: 4-45


Subnets subnet 223.1.1.0/24
223.1.1.1 subnet 223.1.2.0/24

Recipe for defining subnets: 223.1.2.1

 detach each interface from its 223.1.1.2


223.1.1.4 223.1.2.9

host or router, creating


“islands” of isolated networks 223.1.1.3
223.1.3.27
223.1.2.2

 each isolated network is


subnet
called a subnet 223.1.3.0/24 223.1.3.1 223.1.3.2

subnet mask: /24


(high-order 24 bits: subnet part of IP address)

Network Layer: 4-46


Subnets 223.1.1.2

subnet 223.1.1/24
223.1.1.1
 where are the 223.1.1.4

subnets? 223.1.1.3

 what are 223.1.7.0


223.1.9.2
the /24 subnet 223.1.9/24
subnet 223.1.7/24

subnet
addresses? 223.1.9.1 223.1.7.1
223.1.8.1 223.1.8.0

subnet 223.1.2/24 223.1.2.6 subnet 223.1.8/24 223.1.3.27


subnet 223.1.3/24
223.1.2.1 223.1.2.2 223.1.3.1 223.1.3.2

Network Layer: 4-47


IP addressing: CIDR
MASK
 How do we know how many bits represent the network
portion and how many bits represent the host portion?
 The prefix length is the number of bits in the address that
gives us the network portion. For example, in 172.16.4.0 /24,
the /24 is the prefix length - it tells us that the first 24 bits are
the network address.
 another entity that is used to specify the network portion is
called subnet mask. The subnet mask consists of 32 bits, just
as the address does, and uses 1s and 0s to indicate which bits
of the address are network bits and which bits are host bits.

Network Layer 4-48


IP addressing (CIDR)

Solution
The binary representation of the given address is
11001111 00010000 00100101 00011000
For first addresses, we set (32−29) = 3 rightmost bits to 0, we get
11001111 00010000 00100101 00011000
For last addresses, we set (32 − 29) = 3 rightmost bits to 1, we get
11001111 00010000 00100101 00011111
The value of n is 29, which means that the range of the block is
2 32−29or 8.
So, There are only 8 addresses in this block
Network Layer 4-49
IP addressing (CIDR)

Network Layer 4-50


IP addressing (CIDR)
 Subnetting allows for creating multiple logical networks from a single
address block
 Subnets are created using one or more of the host bits as network bits
 done by extending the mask to borrow some of the bits from the host portion
to create additional network bits

Network Layer 4-51


IP addressing (CIDR)
Calculating Subnets and Hosts
 The number of subnets is calculated using 2n, where n is the number of
bits borrowed
 21 = 2 subnets
 the more bits borrowed, the more subnets can be defined

 The number of useable hosts per subnet is calculated using 2h - 2 where h


is the number of host bits left
 27 – 2 = 126 useable hosts per subnet
 with each bit borrowed, fewer host addresses are available per subnet

Network Layer 4-52


IP addressing (CIDR)
Subnetting Example 1
 Need to borrow a minimum of 2 host bits.
 22 = 4 subnets

Network Layer 4-53


IP addressing (CIDR)
Subnetting Example 2
 Need to borrow a minimum of 3 host bits.
 23 = 8 subnets

Network Layer 4-54


IP addressing (CIDR)

Network Layer 4-55


IP addressing (CIDR)

Network Layer 4-56


IP addressing (CIDR)

Network Layer 4-57


IP addressing (CIDR)

Network Layer 4-58


IP addressing (CIDR)

Network Layer 4-59


IP addresses: how to get one?
Host address can be configured manually.
Q: How does a host get IP address?

 hard-coded by system admin in a file


• Windows: control-panel->network->configuration->tcp/ip->properties
• UNIX: /etc/rc.config
 DHCP: Dynamic Host Configuration Protocol: dynamically get
address from as server
• “plug-and-play”
• DHCP allows a host to obtain an IP address automatically.

Network Layer 4-60


DHCP: Dynamic Host Configuration
Protocol
goal: host dynamically obtains IP address from network server when it
“joins” network
 can renew its lease on address in use
 allows reuse of addresses (only hold address while connected/on)
 support for mobile users who join/leave network

DHCP overview:
 host broadcasts DHCP discover msg [optional]
 DHCP server responds with DHCP offer msg [optional]
 host requests IP address: DHCP request msg
 DHCP server sends address: DHCP ack msg
Network Layer: 4-61
DHCP client-server scenario
Typically, DHCP server will be co-
DHCP server located in router, serving all subnets
223.1.1.1
223.1.2.1
to which router is attached

223.1.2.5
223.1.1.2
223.1.1.4 223.1.2.9

223.1.1.3
223.1.3.27 arriving DHCP client needs
223.1.2.2 address in this network

223.1.3.1 223.1.3.2

Network Layer: 4-62


DHCP client-server scenario
DHCP server: 223.1.2.5 DHCP discover Arriving client
src : 0.0.0.0, 68
Broadcast: is there a
dest.: 255.255.255.255,67
DHCPyiaddr:
server 0.0.0.0
out there?
transaction ID: 654

DHCP offer
src: 223.1.2.5, 67
Broadcast: I’m a DHCP
dest: 255.255.255.255, 68
server!
yiaddrr:Here’s an IP
223.1.2.4
transaction
address youID:can
654use
lifetime: 3600 secs
The two steps above can
DHCP request be skipped “if a client
src: 0.0.0.0, 68 remembers and wishes to
dest:: 255.255.255.255, 67
Broadcast: OK. I would reuse a previously
yiaddrr: 223.1.2.4 allocated network address”
like totransaction
use this ID:
IP655
address!
lifetime: 3600 secs
[RFC 2131]

DHCP ACK
src: 223.1.2.5, 67
Broadcast: OK. You’ve
dest: 255.255.255.255, 68
yiaddrr: 223.1.2.4
got that IPID:
transaction address!
655
lifetime: 3600 secs
Network Layer: 4-63
Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 What’s inside a router  hierarchical routing
• input ports, switching, output ports Routing in the Internet
• buffer management, scheduling  RIP
 IP: the Internet Protocol  OSPF
• datagram format  Routing among ISPs
• addressing  BGP
• network address translation
• IPv6
Network Layer: 4-64
NAT: network address translation
NAT: all devices in local network share just one IPv4 address as
far as outside world is concerned
rest of local network (e.g., home
Internet network) 10.0.0/24

10.0.0.1
138.76.29.7 10.0.0.4

10.0.0.2

10.0.0.3

all datagrams leaving local network have datagrams with source or destination in
same source NAT IP address: 138.76.29.7, this network have 10.0.0/24 address for
but different source port numbers source, destination (as usual)
Network Layer: 4-65
NAT: network address translation
 all devices in local network have 32-bit addresses in a “private” IP
address space (10/8, 172.16/12, 192.168/16 prefixes) that can only
be used in local network
 advantages:
 just one IP address needed from provider ISP for all devices
 can change addresses of host in local network without notifying
outside world
 can change ISP without changing addresses of devices in local
network
 security: devices inside local net not directly addressable, visible
by outside world

Network Layer: 4-66


NAT: network address translation
implementation: NAT router must (transparently):
 outgoing datagrams: replace (source IP address, port #) of every
outgoing datagram to (NAT IP address, new port #)
• remote clients/servers will respond using (NAT IP address, new port
#) as destination address
 remember (in NAT translation table) every (source IP address, port #)
to (NAT IP address, new port #) translation pair
 incoming datagrams: replace (NAT IP address, new port #) in
destination fields of every incoming datagram with corresponding
(source IP address, port #) stored in NAT table
Network Layer: 4-67
NAT: network address translation
NAT translation table
2: NAT router changes 1: host 10.0.0.1 sends
WAN side addr LAN side addr datagram to
datagram source address
from 10.0.0.1, 3345 to 138.76.29.7, 5001 10.0.0.1, 3345 128.119.40.186, 80
138.76.29.7, 5001, …… ……
updates table
S: 10.0.0.1, 3345
D: 128.119.40.186, 80
10.0.0.1
1
S: 138.76.29.7, 5001
2 D: 128.119.40.186, 80 10.0.0.4
10.0.0.2
138.76.29.7 S: 128.119.40.186, 80
D: 10.0.0.1, 3345
4
S: 128.119.40.186, 80 10.0.0.3
D: 138.76.29.7, 5001 3
3: reply arrives, destination
address: 138.76.29.7, 5001

Network Layer: 4-68


ICMP: internet control message
protocol
 used by hosts & routers to
Type Code description
communicate network- 0 0 echo reply (ping)
level information 3 0 dest. network unreachable
• error reporting: 3 1 dest host unreachable
unreachable host, network, 3 2 dest protocol unreachable
port, protocol 3 3 dest port unreachable
• echo request/reply (used by 3 6 dest network unknown
ping) 3 7 dest host unknown
4 0 source quench (congestion
 network-layer “above” IP: control - not used)
• ICMP msgs carried in IP 8 0 echo request (ping)
datagrams 9 0 route advertisement
10 0 router discovery
 ICMP message: type, code 11 0 TTL expired
plus first 8 bytes of IP 12 0 bad IP header
datagram causing error
Network Layer 4-69
Traceroute and ICMP
source sends series of when ICMP messages
UDP segments to dest arrives, source records
 first set has TTL =1 RTTs
 second set has TTL=2, etc.
 unlikely port number stopping criteria:
 UDP segment eventually
when nth set of arrives at destination host
datagrams arrives to nth  destination returns ICMP
router: “port unreachable”
 router discards datagrams message (type 3, code 3)
 and sends source ICMP  source stops
messages (type 11, code 0)
 ICMP messages includes
name of router & IP address
3 probes 3 probes

3 probes
Network Layer 4-70
IPv6: motivation
 initial motivation: 32-bit IPv4 address space would be
completely allocated
 additional motivation:
• speed processing/forwarding: 40-byte fixed length header
• enable different network-layer treatment of “flows”

IPv6 datagram format:


• fixed-length 40 byte header
• no fragmentation allowed

Network Layer: 4-71


IPv6: motivation

Fig: IPv6 Address Format


Network Layer 4-72
IPv6: Addreviated Address

Network Layer 4-73


IPv6: Addreviated Address

Network Layer 4-74


IPv6: Addreviated Address

Network Layer 4-75


IPv6 datagram format
flow label: identify
priority: identify
32 bits datagrams in same
priority among ver pri flow label "flow.” (concept of
datagrams in flow
payload len next hdr hop limit “flow” not well defined).
source address
128-bit (128 bits)
IPv6 addresses destination address
(128 bits)

payload (data)

What’s missing (compared with IPv4):


 no checksum (to speed processing at routers)
 no fragmentation/reassembly
 no options (available as upper-layer, next-header protocol at router)
Network Layer: 4-76
Transition from IPv4 to IPv6
 not all routers can be upgraded simultaneously
• no “flag days”
• how will network operate with mixed IPv4 and IPv6 routers?
 tunneling: IPv6 datagram carried as payload in IPv4 datagram among
IPv4 routers (“packet within a packet”)
• tunneling used extensively in other contexts (4G/5G)

IPv4 header fields IPv6 header fields


IPv4 payload
IPv4 source, dest addr IPv6 source dest addr
UDP/TCP payload

IPv6 datagram
IPv4 datagram
Network Layer: 4-77
Tunneling and encapsulation
A B Ethernet connects two E F
Ethernet connecting IPv6 routers
two IPv6 routers: IPv6 IPv6 IPv6 IPv6

IPv6 datagram
Link-layer frame The usual: datagram as payload in link-layer frame

IPv4 network A B E F
connecting two
IPv6 routers IPv6 IPv6/v4 IPv6/v4 IPv6

IPv4 network

Network Layer: 4-78


Tunneling and encapsulation
A B Ethernet connects two E F
Ethernet connecting IPv6 routers
two IPv6 routers: IPv6 IPv6 IPv6 IPv6

IPv6 datagram
Link-layer frame The usual: datagram as payload in link-layer frame

IPv4 tunnel A B IPv4 tunnel E F


connecting IPv6 routers
connecting two
IPv6 routers IPv6 IPv6/v4 IPv6/v4 IPv6

IPv6 datagram
IPv4 datagram tunneling: IPv6 datagram as payload in a IPv4 datagram
Network Layer: 4-79
Tunneling
A B IPv4 tunnel E F
connecting IPv6 routers
logical view:
IPv6 IPv6/v4 IPv6/v4 IPv6

A B C D E F
physical view:
IPv6 IPv6/v4 IPv4 IPv4 IPv6/v4 IPv6

flow: X src:B src:B src:B flow: X


src: A dest: E dest: E src: A
dest: F
dest: E
dest: F
Flow: X Flow: X Flow: X
Src: A Src: A Src: A
Note source and data Dest: F Dest: F Dest: F data
destination
addresses! data data data

A-to-B: E-to-F:
B-to-C: B-to-C: B-to-C:
IPv6 IPv6
IPv6 inside IPv6 inside IPv6 inside
IPv4 IPv4 IPv4
Network Layer: 4-80
IPv6: adoption
 Google1: ~ 30% of clients access services via IPv6
 NIST: 1/3 of all US government domains are IPv6 capable

https://www.google.com/intl
/en/ipv6/statistics.html
Network Layer: 4-81
IPv6: adoption
 Google1: ~ 30% of clients access services via IPv6
 NIST: 1/3 of all US government domains are IPv6 capable
 Long (long!) time for deployment, use
• 25 years and counting!
• think of application-level changes in last 25 years: WWW, social
media, streaming media, gaming, telepresence, …
• Why?

1
https://www.google.com/intl/en/ipv6/statistics.html
Network Layer: 4-82
Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual and Datagram circuits  Routing in the Internet
 IP: the Internet Protocol  RIP
• datagram format  OSPF
• addressing  Routing among ISPs
• network address translation  BGP
• IPv6

Network Layer: 4-83


Routing protocols mobile network
national or global ISP
Routing protocol goal: determine “good”
paths (equivalently, routes), from sending
hosts to receiving host, through network
application
transport
network
of routers link
physical
network network

 path: sequence of routers packets link


physical
link
physical

traverse from given initial source host to network

final destination host link


physical
network
link
physical network
datacenter
 “good”: least “cost”, “fastest”, “least
link
physical network

congested” application
transport
 routing: a “top-10” networking enterprise
network
link

challenge! network physical

Network Layer: 5-84


Graph abstraction: link
costs
5
ca,b: cost of direct link connecting a and b
3
v w 5 e.g., cw,z = 5, cu,z = ∞
2
u 2 1 z
3
1 cost defined by network operator:
2
x 1
y could always be 1, or inversely related
to bandwidth, or inversely related to
congestion
graph: G = (N,E)
N: set of routers = { u, v, w, x, y, z }
E: set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

Network Layer: 5-85


Routing algorithm classification
global: all routers have complete
topology, link cost info
• “link state” algorithms
How fast
dynamic: routes change
do routes static: routes change more quickly
change? slowly over time • periodic updates or in
response to link cost
changes
decentralized: iterative process of
computation, exchange of info with neighbors
• routers initially only know link costs to
attached neighbors
• “distance vector” algorithms
global or decentralized information? Network Layer: 5-86
Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual and Datagram circuits  Routing in the Internet
 IP: the Internet Protocol  RIP
• datagram format  OSPF
• addressing  Routing among ISPs
• network address translation  BGP
• IPv6

Network Layer: 4-87


Dijkstra’s link-state routing algorithm
 centralized: network topology, link notation
costs known to all nodes
• accomplished via “link state
 cx,y: direct link cost from node
broadcast” x to y; = ∞ if not direct
neighbors
• all nodes have same info
 D(v): current estimate of cost
 computes least cost paths from one of least-cost-path from source
node (“source”) to all other nodes to destination v
• gives forwarding table for that node  p(v): predecessor node along
path from source to v
 iterative: after k iterations, know
least cost path to k destinations  N': set of nodes whose least-
cost-path definitively known

Network Layer: 5-88


Dijkstra’s link-state routing algorithm
1 Initialization:
2 N' = {u} /* compute least cost path from u to all other nodes */
3 for all nodes v
4 if v adjacent to u /* u initially knows direct-path-cost only to direct neighbors */
5 then D(v) = cu,v /* but may not be minimum cost! */
6 else D(v) = ∞
7
8 Loop
9 find w not in N' such that D(w) is a minimum
10 add w to N'
11 update D(v) for all v adjacent to w and not in N' :
12 D(v) = min ( D(v), D(w) + cw,v )
13 /* new least-path-cost to v is either old least-cost-path to v or known
14 least-cost-path to w plus direct-cost from w to v */
15 until all nodes in N'
Network Layer: 5-89
Dijkstra’s algorithm: an example
v w x y z
Step N' D(v),p(v) D(w),p(w) D(x),p(x) D(y),p(y) D(z),p(z)
0 u 2,u 5,u 1,u ∞ ∞
1 ux 2,u 4,x 2,x ∞
2 uxy 2,u 3,y 4,y
3 uxyv 3,y 4,y
4 uxyvw 4,y
5 uxyvwz
Initialization (step 0): For all a: if a adjacent to then D(a) = cu,a
5
3 find a not in N' such that D(a) is a minimum
v w 5 add a to N'
2
u 2 1 z update D(b) for all b adjacent to a and not in N' :
3 D(b) = min ( D(b), D(a) + ca,b )
1 2
x 1
y

Network Layer: 5-90


Dijkstra’s algorithm: an example
5
3
v w 5
2
u 2 1 z
3
1 2
x 1
y

resulting least-cost-path tree from u: resulting forwarding table in u:


destination outgoing link
v w
v (u,v) route from u to v directly
u z x (u,x)
y (u,x) route from u to all
x y w (u,x) other destinations
x (u,x) via x
Network Layer: 5-91
Dijkstra’s algorithm: another example
v w x y z
D(v), D(w), D(x), D(y), D(z), x
9
Step N' p(v) p(w) p(x) p(y) p(z)

0 u 7,u 3,u 5,u ∞ ∞ 5 7


4
1 uw 6,w 5,u 11,w ∞ 8
2 uwx 6,w 11,w 14,x 3 w z
u y
2
3 uwxv 10,v 14,x
3
4 uwxvy 12,y 7 4

5 uwxvyz v

notes:
 construct least-cost-path tree by tracing predecessor nodes
 ties can exist (can be broken arbitrarily)
Network Layer: 5-92
Dijkstra’s algorithm: discussion
algorithm complexity: n nodes
 each of n iteration: need to check all nodes, w, not in N
 n(n+1)/2 comparisons: O(n2) complexity
 more efficient implementations possible: O(nlogn)
message complexity:
 each router must broadcast its link state information to other n routers
 efficient (and interesting!) broadcast algorithms: O(n) link crossings to disseminate a
broadcast message from one source
 each router’s message crosses O(n) links: overall message complexity: O(n2)

Network Layer: 5-93


Dijkstra’s algorithm: oscillations possible
 when link costs depend on traffic volume, route oscillations possible
 sample scenario:
• routing to destination a, traffic entering at b, c, d with rates 1, e (<1), 1
• link costs are directional, and volume-dependent

a 2+e
a 0
a 2+e a
1 1+e 0 2+e 0
d b d 1+e 1 b d 0 0 b d 1+e 1 b
0 0
e 1 0 1
1 0
c 0 1 1
c 1+e 1 1 0 0 1
c 1 c
e e e
e
given these costs, given these costs, given these costs,
initially find new routing…. find new routing…. find new routing….
resulting in new costs resulting in new costs resulting in new costs

Network Layer: 5-94


Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual and Datagram circuits  Routing in the Internet
 IP: the Internet Protocol  RIP
• datagram format  OSPF
• addressing  Routing among ISPs
• network address translation  BGP
• IPv6

Network Layer: 4-95


Distance vector algorithm
Based on Bellman-Ford (BF) equation (dynamic programming):
Bellman-Ford equation

Let Dx(y): cost of least-cost path from x to y.


Then:
Dx(y) = minv { cx,v + Dv(y) }

v’s estimated least-cost-path cost to y


min taken over all neighbors v of x direct cost of link from x to v
Network Layer: 5-96
Bellman-Ford Example
Suppose that u’s neighboring nodes, x,v,w, know that for destination z:
Dv(z) = 5 Dw(z) = 3 Bellman-Ford equation says:
5
Du(z) = min { cu,v + Dv(z),
3 w
v 5 cu,x + Dx(z),
2
u 2 1 z cu,w + Dw(z) }
3
1 2
= min {2 + 5,
x 1
y 1 + 3,
5 + 3} = 4
Dx(z) = 3
node achieving minimum (x) is
next hop on estimated least-
cost path to destination (z)
Network Layer: 5-97
Distance vector algorithm
key idea:
 from time-to-time, each node sends its own distance vector estimate
to neighbors
 when x receives new DV estimate from any neighbor, it updates its
own DV using B-F equation:
Dx(y) ← minv{cx,v + Dv(y)} for each node y ∊ N

 under minor, natural conditions, the estimate Dx(y) converge to the


actual least cost dx(y)

Network Layer: 5-98


Distance vector algorithm:
each node: iterative, asynchronous: each local
iteration caused by:
wait for (change in local link  local link cost change
cost or msg from neighbor)  DV update message from neighbor

distributed, self-stopping: each


recompute DV estimates using node notifies neighbors only when
DV received from neighbor its DV changes
 neighbors then notify their
if DV to any destination has neighbors – only if necessary
changed, notify neighbors  no notification received, no
actions taken!

Network Layer: 5-99


Distance vector: example
DV in a:
Da(a)=0
Da(b) = 8
Da(c) = ∞ a b c
8 1
Da(d) = 1

t=0 Da(e) = ∞
Da(f) = ∞ 1 1
Da(g) = ∞
 All nodes have
Da(h) = ∞
distance estimates
Da(i) = ∞ A few asymmetries:
to nearest d e f  missing link
neighbors (only) 1 1
 larger cost
 All nodes send
their local
distance vector to 1 1 1
their neighbors

g h i
1 1

Network Layer: 5-100


Distance vector example: iteration

a b c
8 1

t=1 1 1
All nodes:
 receive distance
vectors from
neighbors d e f
 compute their new 1 1
local distance
vector
 send their new 1 1 1
local distance
vector to neighbors

g h i
1 1

Network Layer: 5-101


Distance vector example: iteration

a
compute compute
b compute
c
8 1

t=1 1 1
All nodes:
 receive distance
vectors from
neighbors d
compute compute
e compute
f
 compute their new 1 1
local distance
vector
 send their new 1 1 1
local distance
vector to neighbors

g
compute h
compute compute
i
1 1

Network Layer: 5-102


Distance vector example: iteration

a b c
8 1

t=1 1 1
All nodes:
 receive distance
vectors from
neighbors d e f
 compute their new 1 1
local distance
vector
 send their new 1 1 1
local distance
vector to neighbors

g h i
1 1

Network Layer: 5-103


Distance vector example: iteration

a b c
8 1

t=2 1 1
All nodes:
 receive distance
vectors from
neighbors d e f
 compute their new 1 1
local distance
vector
 send their new 1 1 1
local distance
vector to neighbors

g h i
1 1

Network Layer: 5-104


Distance vector example: iteration

compute
a compute
b compute
c
2 1

t=2 1 1
All nodes:
 receive distance
vectors from
neighbors d
compute compute
e compute
f
 compute their new 1 1
local distance
vector
 send their new 1 1 1
local distance
vector to neighbors

g
compute compute
h compute
i
8 1

Network Layer: 5-105


Distance vector example: iteration

a b c
8 1

t=2 1 1
All nodes:
 receive distance
vectors from
neighbors d e f
 compute their new 1 1
local distance
vector
 send their new 1 1 1
local distance
vector to neighbors

g h i
1 1

Network Layer: 5-106


Distance vector example: iteration

…. and so on

Let’s next take a look at the iterative computations at nodes

Network Layer: 5-107


DV in c:
Distance vector example: computation DV in b:
Db(a) = 8 Db(f) = ∞
Dc(a) = ∞
Db(c) = 1 Db(g) = ∞ Dc(b) = 1
DV in a: Db(d) = ∞ Db(h) = ∞ Dc(c) = 0
Da(a)=0 Db(e) = 1 Db(i) = ∞ Dc(d) = ∞
Da(b) = 8 Dc(e) = ∞
Da(c) = ∞ a b c Dc(f) = ∞
8 1
Da(d) = 1 Dc(g) = ∞

t=1 Da(e) = ∞
Da(f) = ∞ 1 1
Dc(h) = ∞
Dc(i) = ∞
 b receives DVs Da(g) = ∞ DV in e:
from a, c, e Da(h) = ∞ De(a) = ∞
Da(i) = ∞ De(b) = 1
d e f De(c) = ∞
1 1
De(d) = 1
De(e) = 0
De(f) = 1
1 1 1
De(g) = ∞
De(h) = 1
De(i) = ∞
g h i
1 1

Network Layer: 5-108


DV in c:
Distance vector example: computation DV in b:
Db(a) = 8 Db(f) = ∞
Dc(a) = ∞
Db(c) = 1 Db(g) = ∞ Dc(b) = 1
DV in a: Db(d) = ∞ Db(h) = ∞ Dc(c) = 0
Da(a)=0 Db(e) = 1 Db(i) = ∞ Dc(d) = ∞
Da(b) = 8 Dc(e) = ∞
Da(c) = ∞ a b c Dc(f) = ∞
8 compute 1
Da(d) = 1 Dc(g) = ∞

t=1 Da(e) = ∞
Da(f) = ∞ 1 1
Dc(h) = ∞
Dc(i) = ∞
 b receives DVs Da(g) = ∞ DV in e:
from a, c, e, Da(h) = ∞ De(a) = ∞
computes: e
Da(i) = ∞ De(b) = 1
d e f De(c) = ∞
1
Db(a) = min{cb,a+Da(a), cb,c +Dc(a), cb,e+De(a)} = min{8,∞,∞} =8 1
De(d) = 1
Db(c) = min{cb,a+Da(c), cb,c +Dc(c), c b,e +De(c)} = min{∞,1,∞} = 1
De(e) = 0
Db(d) = min{cb,a+Da(d), cb,c +Dc(d), c b,e +De(d)} = min{9,2,∞} = 2 De(f) = 1
1 1 1
Db(e) = min{cb,a+Da(e), cb,c +Dc(e), c b,e +De(e)} = min{∞,∞,1} = 1 De(g) = ∞
Db(f) = min{cb,a+Da(f), cb,c +Dc(f), c b,e +De(f)} = min{∞,∞,2} = 2
DV in b: De(h) = 1
Db(g) = min{cb,a+Da(g), cb,c +Dc(g), c b,e+De(g)} = min{∞, ∞, ∞} = ∞ Db(a) = 8 Db(f) =2 De(i) = ∞
g h 1Db(c) = 1 Db(g)i = ∞
1 ∞, 2} = 2
Db(h) = min{cb,a+Da(h), cb,c +Dc(h), c b,e+De(h)} = min{∞,
Db(d) = 2 Db(h) = 2
Db(i) = min{cb,a+Da(i), cb,c +Dc(i), c b,e+De(i)} = min{∞, ∞, ∞} = ∞ Db(e) = 1 Db(i) = ∞
Network Layer: 5-109
DV in c:
Distance vector example: computation DV in b:
Db(a) = 8 Db(f) = ∞
Dc(a) = ∞
Db(c) = 1 Db(g) = ∞ Dc(b) = 1
DV in a: Db(d) = ∞ Db(h) = ∞ Dc(c) = 0
Da(a)=0 Db(e) = 1 Db(i) = ∞ Dc(d) = ∞
Da(b) = 8 Dc(e) = ∞
Da(c) = ∞ a b c Dc(f) = ∞
8 1
Da(d) = 1 Dc(g) = ∞

t=1 Da(e) = ∞
Da(f) = ∞ 1 1
Dc(h) = ∞
Dc(i) = ∞
 c receives DVs Da(g) = ∞ DV in e:
from b Da(h) = ∞ De(a) = ∞
Da(i) = ∞ De(b) = 1
d e f De(c) = ∞
1 1
De(d) = 1
De(e) = 0
De(f) = 1
1 1 1
De(g) = ∞
De(h) = 1
De(i) = ∞
g h i
1 1

Network Layer: 5-110


DV in c:
Distance vector example: computation DV in b:
Db(a) = 8 Db(f) = ∞
Dc(a) = ∞
Db(c) = 1 Db(g) = ∞ Dc(b) = 1
Db(d) = ∞ Db(h) = ∞ Dc(c) = 0
Db(e) = 1 Db(i) = ∞ Dc(d) = ∞
Dc(e) = ∞
a b c
compute Dc(f) = ∞
8 1
Dc(g) = ∞

t=1 1 1
Dc(h) = ∞
Dc(i) = ∞
 c receives DVs
from b computes:

d b(a}} = 1 + 8 = 9
Dc(a) = min{cc,b+D e f
DV in c:
Dc(b) = min{cc,b+Db(b)} = 1 + 0 = 1
Dc(a) = 9
Dc(d) = min{cc,b+Db(d)} = 1+ ∞ = ∞ Dc(b) = 1
Dc(e) = min{cc,b+Db(e)} = 1 + 1 = 2 Dc(c) = 0
Dc(f) = min{cc,b+Db(f)} = 1+ ∞ = ∞ Dc(d) = 2
Dc(g) = min{cc,b+Db(g)} = 1+ ∞ = ∞ Dc(e) = ∞ * Check out the online interactive
Dc(f) = ∞ exercises for more examples:
g b(h)} = 1+ ∞ = ∞
Dc(h) = min{cbc,b+D h i http://gaia.cs.umass.edu/kurose_ross/interactive/
Dc(g) = ∞
Dc(i) = min{cc,b+Db(i)} = 1+ ∞ = ∞
Dc(h) = ∞
Network Layer: 5-111
Dc(i) = ∞
Distance vector example: computation DV in b:
Db(a) = 8 Db(f) = ∞
Db(c) = 1 Db(g) = ∞
Db(d) = ∞ Db(h) = ∞ DV in e:
DV in d:
Db(e) = 1 Db(i) = ∞ De(a) = ∞
Dc(a) = 1
De(b) = 1
Dc(b) = ∞ a b c De(c) = ∞
Dc(c) = ∞ 8 1
De(d) = 1
Dc(d) = 0
t=1 Dc(e) = 1
1
Q: what is new DV computed in e at
1t=1?
De(e) = 0
De(f) = 1
 e receives DVs Dc(f) = ∞
De(g) = ∞
from b, d, f, h Dc(g) = 1
De(h) = 1
Dc(h) = ∞
De(i) = ∞
Dc(i) = ∞ d compute
e f DV in f:
1 1
DV in h: Dc(a) = ∞
Dc(a) = ∞ Dc(b) = ∞
Dc(b) = ∞ Dc(c) = ∞
Dc(c) = ∞ 1 1 1
Dc(d) = ∞
Dc(d) = ∞ Dc(e) = 1
Dc(e) = 1 Dc(f) = 0
Dc(f) = ∞ g h i Dc(g) = ∞
1 1
Dc(g) = 1 Dc(h) = ∞
Dc(h) = 0 Dc(i) = 1 Network Layer: 5-112
Dx(z) = min{c(x,y) +
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2 Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
node x cost to cost to
table x y z x y z
x 0 2 7 x 0 2 3

from
from
y ∞∞ ∞ y 2 0 1
z ∞∞ ∞ z 7 1 0

node y cost to
table x y z y
2 1
x ∞ ∞ ∞
x z
from

y 2 0 1 7
z ∞∞ ∞

node z cost to
table x y z
x ∞∞ ∞
from

y ∞∞ ∞
z 7 1 0
time
5-113
Dx(z) = min{c(x,y) +
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2 Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
node x cost to cost to cost to
table x y z x y z x y z
x 0 2 7 x 0 2 3 x 0 2 3

from
from
y ∞∞ ∞ y 2 0 1 y 2 0 1

from
z ∞∞ ∞ z 7 1 0 z 3 1 0
node y cost to cost to cost to
table x y z x y z x y z y
2 1
x ∞ ∞ ∞ x 0 2 7 x 0 2 3 x z
from

y 2 0 1 y 2 0 1 7

from
y 2 0 1

from
z ∞∞ ∞ z 7 1 0 z 3 1 0

node z cost to cost to cost to


table x y z x y z x y z

x ∞∞ ∞ x 0 2 7 x 0 2 3
from

from
y 2 0 1 y 2 0 1
from

y ∞∞ ∞
z 7 1 0 z 3 1 0 z 3 1 0
time
5-114
Distance vector: link cost changes
1
link cost changes: y
4 1
 node detects local link cost change x z
 updates routing info, recalculates local DV 50

 if DV changes, notify neighbors

t0 : y detects link-cost change, updates its DV, informs its neighbors.


“good news t1 : z receives update from y, updates its table, computes new least
travels fast”
cost to x , sends its neighbors its DV.
t2 : y receives z’s update, updates its distance table. y’s least costs
do not change, so y does not send a message to z.

Network Layer: 5-115


Distance vector: link cost changes
60
link cost changes: y
4 1
 node detects local link cost change x z
 “bad news travels slow” – count-to-infinity problem: 50

• y sees direct link to x has new cost 60, but z has said it has a path at cost of 5. So
y computes “my new cost to x will be 6, via z); notifies z of new cost of 6 to x.
• z learns that path to x via y has new cost 6, so z computes “my new cost to
x will be 7 via y), notifies y of new cost of 7 to x.
• y learns that path to x via z has new cost 7, so y computes “my new cost to
x will be 8 via y), notifies z of new cost of 8 to x.
• z learns that path to x via y has new cost 8, so z computes “my new cost to
x will be 9 via y), notifies y of new cost of 9 to x.

 see text for solutions. Distributed algorithms are tricky! Network Layer: 5-116
Comparison of LS and DV algorithms
message complexity robustness: what happens if router
LS: n routers, O(n2) messages sent malfunctions, or is compromised?
DV: exchange between neighbors; LS:
convergence time varies • router can advertise incorrect link cost
• each router computes only its own
speed of convergence table
LS: O(n2) algorithm, O(n2) messages DV:
• may have oscillations
• DV router can advertise incorrect path
DV: convergence time varies cost (“I have a really low cost path to
• may have routing loops everywhere”): black-holing
• count-to-infinity problem
• each router’s table used by others:
error propagate thru network

Network Layer: 5-117


Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual and Datagram circuits  Routing in the Internet
 IP: the Internet Protocol  RIP
• datagram format  OSPF
• addressing  Routing among ISPs
• network address translation  BGP
• IPv6

Network Layer: 4-118


Internet approach to scalable routing
aggregate routers into regions known as “autonomous
systems” (AS) (a.k.a. “domains”)

intra-AS (aka “intra-domain”): inter-AS (aka “inter-domain”):


routing among within same AS routing among AS’es
(“network”)  gateways perform inter-domain
 all routers in AS must run same intra- routing (as well as intra-domain
domain protocol routing)
 routers in different AS can run different
intra-domain routing protocols
 gateway router: at “edge” of its own AS,
has link(s) to router(s) in other AS’es
Network Layer: 5-119
Interconnected ASes
forwarding table configured by intra-
and inter-AS routing algorithms
Intra-AS Inter-AS
Routing Routing  intra-AS routing determine entries for
forwarding destinations within AS
table
 inter-AS & intra-AS determine entries
for external destinations

intra-AS
3c
routing3a inter-AS routing intra-AS
2c
3b 2a routing
2b
1c
AS3 intra-AS
1a routing 1b AS2
1d
AS1

Network Layer: 5-120


Inter-AS routing: a role in intradomain forwarding
 suppose router in AS1 receives AS1 inter-domain routing must:
datagram destined outside of AS1: 1. learn which destinations reachable
• router should forward packet to through AS2, which through AS3
gateway router in AS1, but which 2. propagate this reachability info to all
one? routers in AS1

3c
3a other
2c
3b 2a networks
2b
1c
AS3
other 1a 1b AS2
networks
1d
AS1

Network Layer: 5-121


Inter-AS routing: routing within an AS
most common intra-AS routing protocols:
 RIP: Routing Information Protocol [RFC 1723]
• classic DV: DVs exchanged every 30 secs
• no longer widely used
 EIGRP: Enhanced Interior Gateway Routing Protocol
• DV based
• formerly Cisco-proprietary for decades (became open in 2013 [RFC 7868])
 OSPF: Open Shortest Path First [RFC 2328]
• link-state routing
• IS-IS protocol (ISO standard, not RFC standard) essentially same as OSPF

Network Layer: 5-122


RIP ( Routing Information Protocol)
 included in BSD-UNIX distribution in 1982
 distance vector algorithm
• distance metric: # hops (max = 15 hops), each link has cost 1
• DVs exchanged with neighbors every 30 sec in response message (aka advertisement)
• each advertisement: list of up to 25 destination subnets (in IP addressing sense)

from router A to destination subnets:


u v subnet hops
w u 1
A B
v 2
w 2
x x 3
z C D y 3
y z 2
Network Layer 4-123
RIP: example

z
w x y
A D B

C
routing table in router D
destination subnet next router # hops to dest
w A 2
y B 2
z B 7
x -- 1
…. …. ....
Network Layer 4-124
RIP: example
A-to-D advertisement
dest next hops
w - 1
x - 1
z C 4
…. … ... z
w x y
A D B

C
routing table in router D
destination subnet next router # hops to dest
w A 2
y B 2
A 5
z B 7
x -- 1
…. …. ....
Network Layer 4-125
RIP: link failure, recovery
if no advertisement heard after 180 sec -->
neighbor/link declared dead
 routes via neighbor invalidated
 new advertisements sent to neighbors
 neighbors in turn send out new advertisements (if tables
changed)
 link failure info quickly propagates to entire net

Network Layer 4-126


OSPF (Open Shortest Path First) routing
 “open”: publicly available
 classic link-state
• each router floods OSPF link-state advertisements (directly over IP
rather than using TCP/UDP) to all other routers in entire AS
• multiple link costs metrics possible: bandwidth, delay
• each router has full topology, uses Dijkstra’s algorithm to compute
forwarding table
 security: all OSPF messages authenticated (to prevent malicious
intrusion)

Network Layer: 5-127


Hierarchical OSPF
 two-level hierarchy: local area, backbone.
• link-state advertisements flooded only in area, or backbone
• each node has detailed area topology; only knows direction to reach
other destinations
area border routers: boundary router:
“summarize” distances to connects to other ASes
backbone
destinations in own area, backbone router:
advertise in backbone runs OSPF limited
to backbone
local routers:
• flood LS in area only area 3
• compute routing within
area
• forward packets to outside internal
area 1 routers
via area border router
area 2 Network Layer: 5-128
Network layer: roadmap
 Network layer: overview  Routing algorithms
• data plane  link state
• control plane  distance vector
 Virtual and Datagram circuits  Routing in the Internet
 IP: the Internet Protocol  RIP
• datagram format  OSPF
• addressing  Routing among ISPs
• network address translation  BGP
• IPv6

Network Layer: 4-129


Internet inter-AS routing: BGP
 BGP (Border Gateway Protocol): the de facto inter-domain routing
protocol
• “glue that holds the Internet together”
 allows subnet to advertise its existence, and the destinations it can
reach, to rest of Internet: “I am here, here is who I can reach, and how”
 BGP provides each AS a means to:
• eBGP: obtain subnet reachability information from neighboring ASes
• iBGP: propagate reachability information to all AS-internal routers.
• determine “good” routes to other networks based on reachability information
and policy

Network Layer: 5-130


eBGP, iBGP connections
2b

2a 2c

1b 3b
2d
1a 1c 3a
∂ 3c
AS 2
1d 3d

AS 1 eBGP connectivity AS 3
logical iBGP connectivity

1c gateway routers run both eBGP and iBGP protocols

Network Layer: 5-131


BGP basics
 BGP session: two BGP routers (“peers”) exchange BGP messages over
semi-permanent TCP connection:
• advertising paths to different destination network prefixes (BGP is a “path
vector” protocol)
 when AS3 gateway 3a advertises path AS3,X to AS2 gateway 2c:
• AS3 promises to AS2 it will forward datagrams towards X
AS 3 3b
AS 1 1b 3a 3c
1a 1c AS 2 3d
2b
1d BGP advertisement:
2a 2c X
AS3, X
2d
Network Layer: 5-132
BGP basics
 BGP session: two BGP routers (“peers”) exchange BGP messages over
semi-permanent TCP connection:
• advertising paths to different destination network prefixes (BGP is a “path
vector” protocol)
 when AS3 gateway 3a advertises path AS3,X to AS2 gateway 2c:
• AS3 promises to AS2 it will forward datagrams towards X
AS 3 3b
AS 1 1b 3a 3c
1a 1c AS 2 3d
2b
1d BGP advertisement:
2a 2c X
AS3, X
2d
Network Layer: 5-133
Path attributes and BGP routes
 BGP advertised route: prefix + attributes
• prefix: destination being advertised
• two important attributes:
• AS-PATH: list of ASes through which prefix advertisement has passed
• NEXT-HOP: indicates specific internal-AS router to next-hop AS
 policy-based routing:
• gateway receiving route advertisement uses import policy to
accept/decline path (e.g., never route through AS Y).
• AS policy also determines whether to advertise path to other other
neighboring ASes

Network Layer: 5-134


BGP path advertisement
AS 3 3b
AS 1 1b 3a 3c
1a 1c AS 2 3d X
2b
1d AS3, X
AS2,AS3,X 2a 2c

2d

 AS2 router 2c receives path advertisement AS3,X (via eBGP) from AS3 router 3a
 based on AS2 policy, AS2 router 2c accepts path AS3,X, propagates (via iBGP) to all
AS2 routers
 based on AS2 policy, AS2 router 2a advertises (via eBGP) path AS2, AS3, X to
AS1 router 1c
Network Layer: 5-135
BGP path advertisement (more)
AS 3 3b
AS 1 1b AS3,X 3a 3c
AS3,X
AS3,X
1a 1c AS 2 3d X
2b
AS3,X
1d AS3, X
AS2,AS3,X 2a 2c

2d

gateway router may learn about multiple paths to destination:


 AS1 gateway router 1c learns path AS2,AS3,X from 2a
 AS1 gateway router 1c learns path AS3,X from 3a
 based on policy, AS1 gateway router 1c chooses path AS3,X and advertises path
within AS1 via iBGP
Network Layer: 5-136
BGP messages
 BGP messages exchanged between peers over TCP connection
 BGP messages:
• OPEN: opens TCP connection to remote BGP peer and authenticates
sending BGP peer
• UPDATE: advertises new path (or withdraws old)
• KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs
OPEN request
• NOTIFICATION: reports errors in previous msg; also used to close
connection

Network Layer: 5-137


BGP path advertisement
AS 3 3b
AS 1 1b AS3,X 3a 3c
AS3,X
1
AS3,X
1a 1c AS 2 3d X
2 2b
local link AS3,X
2 1
interfaces 1d AS3, X
at 1a, 1d AS2,AS3,X 2a 2c

2d

dest interface  recall: 1a, 1b, 1d learn via iBGP from 1c: “path to X goes through 1c”
… …
1c 1
 at 1d: OSPF intra-domain routing: to get to 1c, use interface 1
X 1  at 1d: to get to X, use interface 1
… …

Network Layer: 5-138


BGP path advertisement
AS 3 3b
AS 1 1b 3a 3c
1
1a 1c AS 2 3d X
2 2b
1d
2a 2c

2d

dest interface
… …  recall: 1a, 1b, 1d learn via iBGP from 1c: “path to X goes through 1c”
1c 2
 at 1d: OSPF intra-domain routing: to get to 1c, use interface 1
X 2
… …  at 1d: to get to X, use interface 1
 at 1a: OSPF intra-domain routing: to get to 1c, use interface 2
 at 1a: to get to X, use interface 2
Network Layer: 5-139
Why different Intra-, Inter-AS routing ?
policy:
 inter-AS: admin wants control over how its traffic routed, who
routes through its network
 intra-AS: single admin, so policy less of an issue
scale:
 hierarchical routing saves table size, reduced update traffic
performance:
 intra-AS: can focus on performance
 inter-AS: policy dominates over performance

Network Layer: 5-140


Hot potato routing
AS 3 3b
AS 1 1b 3a 3c
1a 1c AS 2 3d X
2b 112
1d AS1,AS3,X AS3,X
2a 2c
201 263

2d
OSPF link weights

 2d learns (via iBGP) it can route to X via 2a or 2c


 hot potato routing: choose local gateway that has least intra-domain
cost (e.g., 2d chooses 2a, even though more AS hops to X): don’t worry
about inter-domain cost!
Network Layer: 5-141
BGP route selection
 router may learn about more than one route to destination
AS, selects route based on:
1. local preference value attribute: policy decision
2. shortest AS-PATH
3. closest NEXT-HOP router: hot potato routing
4. additional criteria

For more details go to page 405.

Network Layer: 5-142


Software defined networking (SDN)
3. controlplane functions
4. programmable routing
access
… load
control balance
external to data-plane
control switches
applications Remote Controller

control
plane

data
plane

CA 2. control, data plane


CA CA CA CA
separation

1: generalized “flow-based”
forwarding (e.g., OpenFlow)
Network Layer: 5-143
Software defined networking (SDN)
network-control
Data-plane switches: applications

routing
 fast, simple, commodity switches
access load
implementing generalized data-plane control balance
forwarding (Section 4.4) in hardware control
plane
 flow (forwarding) table computed, northbound API

installed under controller supervision SDN Controller


 API for table-based switch control (network operating system)

(e.g., OpenFlow)
southbound API
• defines what is controllable, what is not
 protocol for communicating with data
controller (e.g., OpenFlow) plane

SDN-controlled switches
Network Layer: 5-144
Software defined networking (SDN)
network-control
SDN controller (network OS): applications

routing
 maintain network state load
access
information control balance

 interacts with network control northbound API


control
plane
applications “above” via
northbound API SDN Controller
(network operating system)
 interacts with network switches
“below” via southbound API southbound API

 implemented as distributed system


data
for performance, scalability, fault- plane

tolerance, robustness
SDN-controlled switches
Network Layer: 5-145
Software defined networking (SDN)
network-control
network-control apps: applications

routing
 “brains” of control: implement access load
control functions using lower- control balance

level services, API provided by northbound API


control
plane
SDN controller
SDN Controller
 unbundled: can be provided by (network operating system)
3rd party: distinct from routing
vendor, or SDN controller southbound API

data
plane

SDN-controlled switches
Network Layer: 5-146
Components of SDN controller
routing access load
control balance

interface layer to network Interface, abstractions for network control apps

control apps: abstractions API network


graph
RESTful
API
… intent

network-wide state statistics … flow tables


SDN
management : state of
networks links, switches,
Network-wide distributed, robust state management
controller
services: a distributed database Link-state info host info … switch info

communication: communicate OpenFlow … SNMP


between SDN controller and Communication to/from controlled devices
controlled switches

Network Layer: 5-147


OpenFlow protocol
 operates between controller, switch OpenFlow Controller
 TCP used to exchange messages
• optional encryption
 three classes of OpenFlow messages:
• controller-to-switch
• asynchronous (switch to controller)
• symmetric (misc.)
 distinct from OpenFlow API
• API used to specify generalized
forwarding actions

Network Layer: 5-148


OpenFlow: controller-to-switch messages
Key controller-to-switch messages OpenFlow Controller
 features: controller queries switch
features, switch replies
 configure: controller queries/sets
switch configuration parameters
 modify-state: add, delete, modify flow
entries in the OpenFlow tables
 packet-out: controller can send this
packet out of specific switch port

Network Layer: 5-149


OpenFlow: switch-to-controller messages
Key switch-to-controller messages OpenFlow Controller
 packet-in: transfer packet (and its
control) to controller. See packet-out
message from controller
 flow-removed: flow table entry deleted
at switch
 port status: inform controller of a
change on a port.

Fortunately, network operators don’t “program” switches by creating/sending


OpenFlow messages directly. Instead use higher-level abstraction at controller
Network Layer: 5-150
SDN: control/data plane interaction example
Dijkstra’s link-state
routing 1 S1, experiencing link failure uses
4 OpenFlow port status message to
network
graph
RESTful
API
… intent notify controller

statistics
3 … flow tables
2 SDN controller receives OpenFlow
message, updates link status info
Link-state info host info … switch info
2 3 Dijkstra’s routing algorithm
OpenFlow … SNMP
application has previously registered
to be called when ever link status
changes. It is called.
1
4 Dijkstra’s routing algorithm access
s2 network graph info, link state info
s1
s4 in controller, computes new
s3 routes
Network Layer: 5-151
SDN: control/data plane interaction example
Dijkstra’s link-state
routing

4 5
network
graph
RESTful
API
… intent 5 link state routing app interacts
3 … with flow-table-computation
statistics flow tables component in SDN controller,
Link-state info host info … switch info
which computes new flow tables
2 needed
OpenFlow … SNMP
6 controller uses OpenFlow to
6
install new tables in switches
1 that need updating
s2
s1
s4
s3
Network Layer: 5-152
SDN: selected challenges
 hardening the control plane: dependable, reliable, performance-
scalable, secure distributed system
• robustness to failures: leverage strong theory of reliable distributed
system for control plane
• dependability, security: “baked in” from day one?
 networks, protocols meeting mission-specific requirements
• e.g., real-time, ultra-reliable, ultra-secure
 Internet-scaling: beyond a single AS
 SDN critical in 5G cellular networks

Network Layer: 5-153


Chapter 4: done!
 Network layer: overview
 virtual circuit and datagram networks
 IP: the Internet Protocol
 datagram format,
 IPv4 addressing
 DHCP, NAT, ICMP,
 IPv6
 Routing algorithms
 link state, distance vector
 Routing in the Internet
 RIP, OSPF
 Routing among ISPs : BGP

You might also like