
Chapter 3

Transport Layer

A note on the use of these ppt slides:

We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
• If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)
• If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR

All material copyright 1996-2012
J.F Kurose and K.W. Ross, All Rights Reserved

Computer Networking: A Top Down Approach, 6th edition. Jim Kurose, Keith Ross. Addison-Wesley, March 2012.
Transport Layer 3-1
Chapter 3: Transport Layer
Our goals:
• understand principles behind transport layer services:
  • multiplexing/demultiplexing
  • reliable data transfer
  • flow control
  • congestion control
• learn about transport layer protocols in the Internet:
  • UDP: connectionless transport
  • TCP: connection-oriented transport
  • TCP congestion control

Transport Layer 3-2


Chapter 3 outline
• 3.1 Transport-layer services
• 3.2 Multiplexing and demultiplexing
• 3.3 Connectionless transport: UDP
• 3.4 Principles of reliable data transfer
• 3.5 Connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
• 3.6 Principles of congestion control
• 3.7 TCP congestion control

Transport Layer 3-3


Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  • send side: breaks app messages into segments, passes to network layer
  • rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  • Internet: TCP and UDP

[Figure: protocol stacks on two end systems: application, transport, network, data link, physical]

Transport Layer 3-4


Transport vs. network layer
• network layer: logical communication between computers
• transport layer: logical communication between processes
  • relies on, enhances, network layer services

Household analogy:
12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service

Transport Layer 3-5


Internet transport-layer protocols
• reliable, in-order delivery (TCP)
  • congestion control
  • flow control
  • connection setup
• unreliable, unordered delivery: UDP
  • no-frills extension of “best-effort” IP
• services not available:
  • delay guarantees
  • bandwidth guarantees

[Figure: end systems run the full stack (application, transport, network, data link, physical); routers along the path run only network, data link, physical]

Transport Layer 3-6




Multiplexing/demultiplexing
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
Demultiplexing at rcv host: delivering received segments to correct socket

[Figure: host 1, host 2, host 3, each with application, transport, network, link, physical layers; processes P1 to P4 attach to the transport layer through sockets]
Transport Layer 3-8
How demultiplexing works
• host receives IP datagrams
  • each datagram has source IP address, destination IP address
  • each datagram carries 1 transport-layer segment
  • each segment has source, destination port number (recall: well-known port numbers for specific applications)
• host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (each row 32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)

Transport Layer 3-9


Connectionless demultiplexing
• Q: The Unix system call to associate port number with a socket?
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
  • checks destination port number in segment
  • directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers directed to same socket

Transport Layer 3-10


Connectionless demux (cont)

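[Figure: server (IP: C) with clients at IP: A and IP: B]
• client IP: A to server: SP: 9157, DP: 6428
• server to client IP: A: SP: 6428, DP: 9157
• client IP: B to server: SP: 5775, DP: 6428
• server to client IP: B: SP: 6428, DP: 5775

SP provides “return address”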

Transport Layer 3-11


Connection-oriented demux
• TCP socket identified by 4-tuple:
  • source IP address
  • source port number
  • dest IP address
  • dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
  • each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  • non-persistent HTTP will have different socket for each request
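
The difference between the two lookup rules can be sketched in a few lines. This is only an illustration: the dictionaries stand in for the operating system's socket table, and the socket labels and function names are hypothetical. The addresses and ports are the ones used in the demux figures.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins for the OS socket table.
UdpKey = Tuple[str, int]              # (dest IP, dest port)
TcpKey = Tuple[str, int, str, int]    # (source IP, source port, dest IP, dest port)


def demux_udp(sockets: Dict[UdpKey, str], src_ip: str, src_port: int,
              dst_ip: str, dst_port: int) -> str:
    """UDP: only the destination pair selects the socket; the source is ignored."""
    return sockets[(dst_ip, dst_port)]


def demux_tcp(sockets: Dict[TcpKey, str], src_ip: str, src_port: int,
              dst_ip: str, dst_port: int) -> str:
    """TCP: all four values select the socket, so each connection gets its own."""
    return sockets[(src_ip, src_port, dst_ip, dst_port)]


udp_sockets = {("C", 6428): "udp-socket"}
tcp_sockets = {
    ("A", 9157, "C", 80): "tcp-socket-A-9157",
    ("B", 9157, "C", 80): "tcp-socket-B-9157",
    ("B", 5775, "C", 80): "tcp-socket-B-5775",
}

# Two different senders reach the same UDP socket ...
print(demux_udp(udp_sockets, "A", 9157, "C", 6428))
print(demux_udp(udp_sockets, "B", 5775, "C", 6428))
# ... while one destination port maps to three distinct TCP sockets.
for key in tcp_sockets:
    print(demux_tcp(tcp_sockets, *key))
```

The source address still matters for UDP, but only to the application, which uses it as the “return address”.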

Transport Layer 3-12


Connection-oriented demux
(cont)

[Figure: server (IP: C) with server processes P4, P5, P6; clients at IP: A and IP: B]
Segments arriving at the server:
• S-IP: A, D-IP: C, SP: 9157, DP: 80
• S-IP: B, D-IP: C, SP: 9157, DP: 80
• S-IP: B, D-IP: C, SP: 5775, DP: 80

Transport Layer 3-13


Connection-oriented demux:
Threaded Web Server

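[Figure: same scenario, but the server (IP: C) runs a single process P4 that serves all connections; clients at IP: A and IP: B]
Segments arriving at the server:
• S-IP: A, D-IP: C, SP: 9157, DP: 80
• S-IP: B, D-IP: C, SP: 9157, DP: 80
• S-IP: B, D-IP: C, SP: 5775, DP: 80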

Transport Layer 3-14






UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
  • lost
  • delivered out of order to app
• connectionless:
  • no handshaking between UDP sender, receiver
  • each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

Transport Layer 3-17


UDP: more
• often used for streaming multimedia apps
  • loss tolerant
  • rate sensitive
• other UDP uses
  • DNS
  • SNMP
• reliable transfer over UDP: add reliability at application layer
  • application-specific error recovery!

UDP segment format (each row 32 bits wide):
  source port # | dest port #
  length | checksum
  application data (message)
Length: in bytes, of UDP segment, including header

Transport Layer 3-18


UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: 1’s complement of the 1’s complement sum of segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  • NO - error detected
  • YES - no error detected. But maybe errors nonetheless? More later ….

Transport Layer 3-19


Internet Checksum Example
• Note
  • When adding numbers, a carryout from the most significant bit needs to be added to the result
• Example: add two 16-bit integers

               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
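
The wraparound step is easy to get wrong by hand, so here is a minimal sketch that reproduces the example. The function names are illustrative, not taken from any library.

```python
from typing import Iterable


def ones_complement_sum16(words: Iterable[int]) -> int:
    """Add 16-bit words, folding any carry out of bit 15 back into the low bits."""
    total = 0
    for word in words:
        total += word & 0xFFFF
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total


def internet_checksum(words: Iterable[int]) -> int:
    """Checksum field value: bitwise complement of the ones' complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
print(format(ones_complement_sum16([a, b]), "016b"))   # 1011101110111100
print(format(internet_checksum([a, b]), "016b"))       # 0100010001000011

# Receiver check: summing the data words plus the checksum gives all ones.
assert ones_complement_sum16([a, b, internet_checksum([a, b])]) == 0xFFFF
```

The receiver-side assertion shows why the complement is what gets transmitted: an undamaged segment always sums to sixteen 1 bits.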
Transport Layer 3-20


Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 3-22
Reliable data transfer: getting started
• rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper layer

[Figure: send side and receive side of the rdt protocol, joined by the unreliable channel]

Transport Layer 3-25


Reliable data transfer: getting started
We’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  • but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver

FSM notation: an arrow from state 1 to state 2 is labeled with the event causing the state transition (above the line) and the actions taken on the state transition (below the line).
state: when in this “state”, next state uniquely determined by next event

Transport Layer 3-26


Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  • no bit errors
  • no loss of packets
• separate FSMs for sender, receiver:
  • sender sends data into underlying channel
  • receiver reads data from underlying channel

sender FSM (single state "Wait for call from above"):
  rdt_send(data)
    packet = make_pkt(data)
    udt_send(packet)

receiver FSM (single state "Wait for call from below"):
  rdt_rcv(packet)
    extract(packet,data)
    deliver_data(data)

Transport Layer 3-27


Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
• Q: how to detect bit errors?
  • recall: UDP checksum to detect bit errors
• the question: how to recover from errors:
  • acknowledgements (ACKs): receiver explicitly tells sender that packet received OK
  • negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
  • sender retransmits pkt on receipt of NAK
  • human scenarios using ACKs, NAKs?
• new mechanisms in rdt2.0 (beyond rdt1.0):
  • error detection
  • receiver feedback: control messages (ACK,NAK) receiver->sender

Transport Layer 3-28


rdt2.0: FSM specification
sender FSM:
  state "Wait for call from above":
    rdt_send(data)
      sndpkt = make_pkt(data, checksum)
      udt_send(sndpkt)
      -> Wait for ACK or NAK
  state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt)
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt)
      Λ (no action)
      -> Wait for call from above

receiver FSM:
  state "Wait for call from below":
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      udt_send(NAK)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      udt_send(ACK)

Transport Layer 3-29


rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs with the no-error path highlighted: sender sends sndpkt, receiver finds it notcorrupt, delivers data and sends ACK, sender returns to "Wait for call from above"]

Transport Layer 3-30


rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs with the error path highlighted: receiver finds the packet corrupt and sends NAK, sender retransmits sndpkt, receiver then delivers data and sends ACK]

Transport Layer 3-31




rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!

What to do?
• sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
• retransmit, but this might cause retransmission of correctly received packet!

Handling duplicates:
• sender adds sequence number to each packet
• sender retransmits current packet if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate packet

stop and wait
Sender sends one packet, then waits for receiver response

Transport Layer 3-33


rdt2.1: sender, handles garbled ACK/NAKs
state "Wait for call 0 from above":
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    -> Wait for ACK or NAK 0
state "Wait for ACK or NAK 0":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    -> Wait for call 1 from above
state "Wait for call 1 from above":
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    -> Wait for ACK or NAK 1
state "Wait for ACK or NAK 1":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    -> Wait for call 0 from above

Transport Layer 3-34


rdt2.1: receiver, handles garbled ACK/NAKs
state "Wait for 0 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    -> Wait for 1 from below
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
state "Wait for 1 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    -> Wait for 0 from below
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)

Transport Layer 3-35


rdt2.1: discussion
Sender:
• sequence # added to packet
• two sequence #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  • state must “remember” whether “current” packet has 0 or 1 sequence #

Receiver:
• must check if received packet is duplicate
  • state indicates whether 0 or 1 is expected packet sequence #
• note: receiver can not know if its last ACK/NAK received OK at sender

Transport Layer 3-36


rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last packet received OK
  • receiver must explicitly include sequence # of packet being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current packet

Transport Layer 3-37


rdt2.2: sender, receiver fragments
sender FSM fragment:
  state "Wait for call 0 from above":
    rdt_send(data)
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      -> Wait for ACK 0
  state "Wait for ACK 0":
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
      Λ (no action)
      -> next state (not shown in fragment)

receiver FSM fragment:
  transition into "Wait for 0 from below":
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK1, chksum)
      udt_send(sndpkt)
  state "Wait for 0 from below":
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
      sndpkt = make_pkt(ACK1, chksum)
      udt_send(sndpkt)
Transport Layer 3-39
rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
• checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Q: how to deal with loss?
• sender waits until certain data or ACK lost, then retransmits
• yuck: drawbacks?

Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  • retransmission will be duplicate, but use of seq. #’s already handles this
  • receiver must specify seq # of pkt being ACKed
• requires countdown timer
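
To see the timer and the 1-bit sequence number work together, here is a small deterministic sketch of the stop-and-wait exchange. It is a toy model, not the FSM itself: the channel is hypothetical and simply drops the transmissions whose indices are listed, and a loss is modelled as an immediate sender timeout.

```python
from typing import List, Set, Tuple


def run_rdt3(messages: List[str], drop_data: Set[int],
             drop_ack: Set[int]) -> Tuple[List[str], int]:
    """Alternating-bit stop-and-wait over a channel that drops chosen transmissions.

    drop_data / drop_ack hold the 0-based indices of the data / ACK
    transmissions that are lost. Returns (delivered messages, number of
    data transmissions including retransmissions).
    """
    delivered: List[str] = []
    expected_seq = 0      # receiver state: sequence # it will accept next
    data_tx = 0           # every data transmission, including retransmits
    ack_tx = 0            # every ACK transmission

    for i, msg in enumerate(messages):
        seq = i % 2       # sender alternates 0, 1, 0, 1, ...
        while True:
            lost = data_tx in drop_data
            data_tx += 1
            if lost:
                continue  # no ACK arrives: timer expires, retransmit

            # Receiver: deliver new data, discard duplicates, ACK either way.
            if seq == expected_seq:
                delivered.append(msg)
                expected_seq ^= 1

            ack_lost = ack_tx in drop_ack
            ack_tx += 1
            if ack_lost:
                continue  # timer expires, sender retransmits a duplicate
            break         # ACK for this sequence # arrived: next message
    return delivered, data_tx


delivered, transmissions = run_rdt3(["a", "b", "c"], drop_data={1}, drop_ack={0})
print(delivered, transmissions)   # ['a', 'b', 'c'] 5
```

In this run the first ACK is lost, so "a" is retransmitted twice (one retransmission is itself lost), yet the receiver delivers it only once because the duplicate carries the sequence number it is no longer expecting.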

Transport Layer 3-40


rdt3.0 sender
state "Wait for call 0 from above":
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    -> Wait for ACK0
  rdt_rcv(rcvpkt)
    Λ (no action)
state "Wait for ACK0":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    Λ (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    -> Wait for call 1 from above
state "Wait for call 1 from above":
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    -> Wait for ACK1
  rdt_rcv(rcvpkt)
    Λ (no action)
state "Wait for ACK1":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    Λ (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    -> Wait for call 0 from above

Transport Layer 3-41


rdt3.0 in action

Transport Layer 3-42


rdt3.0 in action

Transport Layer 3-43


Performance of rdt3.0

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

T_transmit = L (packet length in bits) / R (transmission rate, bps) = (8 kb/pkt) / (10**9 b/sec) = 8 microsec

• U_sender: utilization, the fraction of time sender busy sending

U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 = 0.00027   (times in msec)

• 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
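
The numbers above can be checked with a few lines of arithmetic. The sketch below assumes 1 KB = 1000 bytes (as the 8 kb/pkt figure does) and RTT = 2 x 15 ms; the `window` parameter is an addition for experimenting with more than one packet in flight and is only meaningful while window * L/R stays below RTT + L/R.

```python
def sender_utilization(packet_bits: float, rate_bps: float, rtt_s: float,
                       window: int = 1) -> float:
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R)."""
    t_transmit = packet_bits / rate_bps
    return window * t_transmit / (rtt_s + t_transmit)


L = 8000        # packet length in bits
R = 1e9         # link rate: 1 Gbps
RTT = 0.030     # round-trip time in seconds

print(f"T_transmit = {L / R * 1e6:.0f} microsec")
print(f"U_sender   = {sender_utilization(L, R, RTT):.5f}")
print(f"throughput = {L / 8 / (RTT + L / R) / 1e3:.0f} kB/sec")
```

Calling `sender_utilization(L, R, RTT, window=3)` gives roughly three times the stop-and-wait value, which is the motivation for pipelining.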

Transport Layer 3-44


rdt3.0: stop-and-wait operation

[Figure: sender/receiver timeline]
sender:
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• ACK arrives, send next packet, t = RTT + L / R
receiver:
• first packet bit arrives
• last packet bit arrives, send ACK

U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 = 0.00027

Transport Layer 3-45


Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged packets
• range of sequence numbers must be increased
• buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
Transport Layer 3-46
Pipelining: increased utilization

[Figure: sender/receiver timeline with three packets in flight]
sender:
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• ACK arrives, send next packet, t = RTT + L / R
receiver:
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK

Increase utilization by a factor of 3!

U_sender = (3*L/R) / (RTT + L/R) = 0.024 / 30.008 = 0.0008

Transport Layer 3-47


Go-Back-N
Sender:
 Sequence # in packet header, k-bit
 “window” of up to N, consecutive unack’ed packets allowed

 ACK(n): ACKs all packets up to, including sequence # n


 Cumulative ACK
 Timer for each in-flight packet batch (per send_base)
 timeout(n): retransmit packet n and all higher sequence #
packets in window (send_base to nextseqnum)
Transport Layer 3-48
GBN: sender extended FSM
initialization:
  base=1
  nextseqnum=1

state "Wait":
  rdt_send(data)
    if (nextseqnum < base+N) {
      sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
      udt_send(sndpkt[nextseqnum])
      if (base == nextseqnum)
        start_timer
      nextseqnum++
    }
    else
      refuse_data(data)

  timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])

  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isOLDACK(rcvpkt))
    Λ (no action)

  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isNEWACK(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
      stop_timer (no pkt outstanding)
    else {
      stop_timer (for the old base)
      start_timer (for the new base)
    }
Transport Layer 3-49
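
The sender FSM above translates fairly directly into a small class. This is a sketch under stated assumptions: `send` is a hypothetical stand-in for udt_send, the timer is reduced to a boolean flag because there is no real clock, and corruption checks are left out.

```python
from typing import Callable, Dict, List


class GBNSender:
    """Go-Back-N sender state, following the extended FSM."""

    def __init__(self, window: int, send: Callable[[int, str], None]) -> None:
        self.N = window
        self.base = 1
        self.nextseqnum = 1
        self.sndpkt: Dict[int, str] = {}
        self.timer_running = False
        self.send = send

    def rdt_send(self, data: str) -> bool:
        """Returns False when the window is full (refuse_data)."""
        if self.nextseqnum >= self.base + self.N:
            return False
        self.sndpkt[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        if self.base == self.nextseqnum:
            self.timer_running = True
        self.nextseqnum += 1
        return True

    def timeout(self) -> None:
        """Go back N: resend every packet from base up to nextseqnum-1."""
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.sndpkt[seq])

    def ack(self, acknum: int) -> None:
        """Cumulative ACK: everything up to and including acknum is confirmed."""
        if acknum < self.base:
            return                      # old ACK: no action
        for seq in range(self.base, acknum + 1):
            self.sndpkt.pop(seq, None)
        self.base = acknum + 1
        self.timer_running = self.base != self.nextseqnum


log: List[int] = []
sender = GBNSender(window=3, send=lambda seq, data: log.append(seq))
accepted = [sender.rdt_send(d) for d in "abcd"]   # fourth call hits the window limit
sender.ack(1)                                     # window slides forward by one
sender.timeout()                                  # resends packets 2 and 3
print(accepted, log, sender.base, sender.nextseqnum)
```

Note that a single timeout resends the whole outstanding window, which is the cost GBN pays for keeping the receiver free of buffering.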
GBN: receiver extended FSM
initialization:
  expectedseqnum=1

state "Wait":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isEXPECTED(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

  default
    udt_send(sndpkt)

 Principle:
 if it’s the expected data packet, send ACK
 Else, send NAK
 Making it ACK-only:
 Send ACK for correctly-received packet with highest in-order
sequence #
• Need to remember expectedseqnum
 For corrupted or out-of-order packet:
• discard (don’t buffer) -> no receiver buffering!
• ACK packet with highest in-order sequence #
Transport Layer 3-50
GBN in action

Transport Layer 3-51


Selective Repeat
 Actually easier to understand
 Receiver individually acknowledges all correctly
received packets
 buffers packets, as needed, for eventual in-order
delivery to upper layer
 Sender only re-sends packets for which ACK not
received
 sender timer for each unACKed packet
 Sender window
 N consecutive sequence #’s
 again limits sequence #s of sent, unACKed packets

Transport Layer 3-52


Selective repeat: sender, receiver windows

Transport Layer 3-53


Selective repeat

sender
Data from above:
• If next available sequence # in window, send packet
ACK(n) in [sendbase,sendbase+N-1]:
• Mark packet n as received
• If n smallest unACKed packet, advance window base to next unACKed sequence #
timeout(n):
• Resend packet n, restart timer

receiver
pkt n in [rcvbase, rcvbase+N-1]
• Send ACK(n)
• Out-of-order: buffer
• In-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
pkt n in [rcvbase-N, rcvbase-1]
• Send ACK(n)
otherwise:
• Ignore
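
The receiver rules can be sketched as follows. One simplifying assumption is made loudly here: sequence numbers grow without bound, so the wraparound problem of a finite sequence space does not arise. The class and method names are illustrative.

```python
from typing import Dict, List, Tuple


class SRReceiver:
    """Selective-repeat receiver window (no sequence-number wraparound)."""

    def __init__(self, window: int) -> None:
        self.N = window
        self.rcvbase = 0
        self.buffer: Dict[int, str] = {}

    def on_packet(self, n: int, data: str) -> Tuple[bool, List[str]]:
        """Returns (send_ack, data delivered in order to the upper layer)."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer.setdefault(n, data)        # buffer, even if out of order
            delivered: List[str] = []
            while self.rcvbase in self.buffer:     # deliver the in-order run
                delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return True, delivered
        if self.rcvbase - self.N <= n < self.rcvbase:
            return True, []                        # already delivered: re-ACK only
        return False, []                           # otherwise: ignore


rcv = SRReceiver(window=4)
print(rcv.on_packet(0, "p0"))   # in order: delivered at once
print(rcv.on_packet(2, "p2"))   # out of order: buffered, nothing delivered
print(rcv.on_packet(1, "p1"))   # fills the gap, releases p1 and p2
print(rcv.on_packet(0, "p0"))   # below the window: ACK again, deliver nothing
print(rcv.on_packet(9, "p9"))   # outside both ranges: ignored
```

The re-ACK for packets in [rcvbase-N, rcvbase-1] matters: without it, a sender whose earlier ACK was lost could never advance its window.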

Transport Layer 3-54


Selective repeat in action

Transport Layer 3-55


Selective repeat: dilemma
Example:
 sequence #’s: 0, 1, 2, 3
 window size=3

 receiver sees no
difference in two
scenarios!
 incorrectly passes
duplicate data as new
in (a)

Q: what relationship
between sequence #
size and window size?

Transport Layer 3-56




TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

• point-to-point:
  • one sender, one receiver
• reliable, in-order byte stream:
  • no “message boundaries”
• pipelined:
  • TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  • bi-directional data flow in same connection
  • MSS: maximum segment size
• connection-oriented:
  • handshaking (exchange of control messages) init’s sender, receiver state before data exchange
• flow controlled:
  • sender will not overwhelm receiver

[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door]

Transport Layer 3-58


TCP segment structure

TCP segment format (each row 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)

Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)

Transport Layer 3-59




TCP seq. #’s and ACKs
Seq. #’s:
• byte stream “number” of first byte in segment’s data
ACKs:
• seq # of next byte expected from other side
• cumulative ACK
Q: how receiver handles out-of-order segments
• A: TCP spec doesn’t say, - up to the implementor

[Figure: simple telnet scenario, Host A and Host B over time: user types ‘C’; host ACKs receipt of ‘C’, echoes back ‘C’; host ACKs receipt of echoed ‘C’]

Transport Layer 3-61


TCP Round Trip Time and Timeout
Q: how to set TCP
timeout value?
 1 sec? 1 min? Or else?
 too short?
 too long?

Transport Layer 3-62


TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  • but RTT varies
• too short: premature timeout
  • unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  • ignore retransmissions
  • Why?
• SampleRTT will vary, want estimated RTT “smoother”
  • How?
  • average several recent measurements, not just current SampleRTT

Transport Layer 3-63


TCP Round Trip Time and Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

Transport Layer 3-64


Example RTT estimation:
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]

Transport Layer 3-65


TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  • large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT
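
The three formulas fit together as a small running estimator. The sketch below makes two assumptions that the slides do not state: the first sample initialises EstimatedRTT with DevRTT starting at 0, and DevRTT is updated against the estimate held before the new sample is folded in.

```python
class RttEstimator:
    """EWMA estimator for EstimatedRTT, DevRTT and TimeoutInterval."""

    def __init__(self, first_sample: float, alpha: float = 0.125,
                 beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seeded by first sample
        self.dev_rtt = 0.0                  # assumption: no deviation known yet

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation is measured against the estimate held before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval


est = RttEstimator(first_sample=100.0)      # all values in milliseconds
for sample in (100.0, 180.0, 100.0):
    timeout = est.update(sample)
    print(sample, est.estimated_rtt, est.dev_rtt, timeout)
```

A single 180 ms outlier moves EstimatedRTT only from 100 to 110 ms but pushes the timeout to 190 ms, which shows how the 4*DevRTT term supplies the safety margin.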

Transport Layer 3-66




TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  • timeout events
  • duplicate acks
• Initially consider simplified TCP sender:
  • ignore duplicate acks
  • ignore flow control, congestion control

Transport Layer 3-68


TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

Ack rcvd:
• If acknowledges previously unacked segments
  • update what is known to be acked
  • start timer if there are outstanding segments

Transport Layer 3-69


TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }

} /* end of loop forever */
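
The event loop above can be exercised with a small event-driven class. This is a sketch, not a TCP implementation: `to_ip` is a hypothetical callback standing in for "pass segment to IP", the timer is a flag that the caller drives by invoking `on_timeout()`, and the byte counts in the usage lines are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


class SimpleTcpSender:
    """Simplified TCP sender: one timer, cumulative ACKs, no flow/congestion control."""

    def __init__(self, initial_seq: int, to_ip: Callable[[int, bytes], None]) -> None:
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked: Dict[int, bytes] = {}   # seq # of first byte -> payload
        self.timer_running = False
        self.to_ip = to_ip

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq_num] = data
        if not self.timer_running:
            self.timer_running = True
        self.to_ip(self.next_seq_num, data)
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        # Retransmit only the not-yet-acknowledged segment with the smallest seq #.
        if self.unacked:
            seq = min(self.unacked)
            self.to_ip(seq, self.unacked[seq])
            self.timer_running = True

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            fully_acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in fully_acked:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sent: List[Tuple[int, int]] = []
tcp = SimpleTcpSender(initial_seq=92, to_ip=lambda seq, data: sent.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
tcp.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
tcp.on_timeout()              # resends Seq=92 only
tcp.on_ack(120)               # cumulative ACK covers both segments
print(sent, tcp.send_base, tcp.timer_running)
```

Unlike Go-Back-N, the timeout resends a single segment; the cumulative ACK of 120 then clears both outstanding segments at once.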


Transport Layer 3-70
TCP: retransmission scenarios
[Figure: two Host A / Host B timelines, captioned “lost ACK scenario” and “premature timeout”. Labels recoverable from the figure: Seq=92 timeout, loss (X), SendBase = 92, SendBase = 100, SendBase = 120]
Transport Layer 3-71
TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline, captioned “Cumulative ACK scenario”. Labels recoverable from the figure: timeout, loss (X), SendBase = 120]

Transport Layer 3-72


TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected | Immediately send duplicate ACK, indicating seq. # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Transport Layer 3-73


Fast Retransmit
• Time-out period often relatively long:
  • long delay before resending lost packet
• Detect lost segments via duplicate ACKs.
  • Sender often sends many segments back-to-back
  • If segment is lost, there will likely be many duplicate ACKs.
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
  • fast retransmit: resend segment before timer expires

Transport Layer 3-74


Figure 3.37 Resending a segment after triple duplicate ACK
[Figure: Host A / Host B timeline; the retransmission happens before the timeout expires]


Transport Layer 3-75
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {    /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y    /* fast retransmit */
    }
  }
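
The duplicate-ACK counting can be isolated in a few lines. In this sketch `resend` is a hypothetical callback that retransmits the segment starting at the given sequence number; timer handling is omitted so the counting logic stays visible.

```python
from typing import Callable, List


class DupAckCounter:
    """Counts duplicate ACKs and triggers a fast retransmit on the third one."""

    def __init__(self, send_base: int, resend: Callable[[int], None]) -> None:
        self.send_base = send_base
        self.dup_acks = 0
        self.resend = resend

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0
        else:
            self.dup_acks += 1        # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.resend(y)        # fast retransmit, before the timer expires


resent: List[int] = []
counter = DupAckCounter(send_base=92, resend=resent.append)
for ack in (100, 100, 100, 100):      # one new ACK followed by three duplicates
    counter.on_ack(ack)
print(resent)                         # [100]
```

The first ACK of 100 advances SendBase; only the three that follow count as duplicates, so the retransmission fires on the fourth identical ACK.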

Transport Layer 3-76




TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

• speed-matching service: matching the send rate to the receiving app’s drain rate
Transport Layer 3-78
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer-[LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  • guarantees receive buffer doesn’t overflow
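
As a worked example of the window arithmetic, the sketch below computes the advertised window and applies the sender-side limit. The sender variables `last_byte_sent` and `last_byte_acked` are names assumed for illustration; the buffer sizes are made up.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent: int, last_byte_acked: int, nbytes: int, window: int) -> bool:
    """Sender rule: unACKed data, including the new bytes, must fit in RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                   # 2096
print(can_send(3000, 2500, 1000, window))       # True: 1500 unACKed bytes fit
print(can_send(3000, 2500, 2000, window))       # False: 2500 bytes would overflow
```

As the application reads more data, LastByteRead grows and the advertised window reopens.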

Transport Layer 3-79




TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  • seq. #s
  • buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect();
• server: contacted by client
  listen();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
• specifies initial seq #
• no data
Step 2: server host receives SYN, replies with SYNACK segment
• server allocates buffers
• specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

Transport Layer 3-81
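
The three segments of the handshake can be traced with a small simulation. This sketch only models the sequence and acknowledgment numbers; the dictionary layout and the function name are assumptions, and the only TCP facts relied on beyond the slide are that a SYN consumes one sequence number and that sequence numbers are 32-bit.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYNACK, and ACK segments as dictionaries."""
    syn = {"from": "client", "flags": "SYN",
           "seq": client_isn, "ack": None}                       # Step 1: initial seq #, no data
    synack = {"from": "server", "flags": "SYN,ACK",
              "seq": server_isn,                                 # Step 2: server initial seq #
              "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"from": "client", "flags": "ACK",
           "seq": (client_isn + 1) % SEQ_SPACE,                  # Step 3: may also carry data
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=42, server_isn=79):
    print(segment)
```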


TCP Connection Management (cont.)

Closing a connection:

client closes socket: close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client and server timelines showing close on each side, then timed wait and closed]


TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
 Enters “timed wait” - will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client and server timelines showing closing on each side, then timed wait and closed]
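
The client side of the four-step close can be sketched as a small transition table. The state names follow the labels on the slides ("closing", "timed wait", "closed"); the "established" state, the event strings, and the `run` helper are assumptions made for this example, and simultaneous FINs are not modeled.

```python
# (state, event) -> (next state, segment to send or None)
CLIENT_CLOSE = {
    ("established", "close()"):      ("closing", "FIN"),       # Step 1: client sends FIN
    ("closing", "recv ACK"):         ("closing", None),        # Step 2: server ACKed the FIN
    ("closing", "recv FIN"):         ("timed wait", "ACK"),    # Step 3: ACK the server's FIN
    ("timed wait", "recv FIN"):      ("timed wait", "ACK"),    # re-ACK a retransmitted FIN
    ("timed wait", "timer expires"): ("closed", None),
}


def run(events, state="established"):
    """Feed events to the client and collect the segments it sends."""
    sent = []
    for event in events:
        state, segment = CLIENT_CLOSE[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


state, sent = run(["close()", "recv ACK", "recv FIN", "recv FIN", "timer expires"])
print(state, sent)   # closed ['FIN', 'ACK', 'ACK']
```

The second "recv FIN" shows why the timed wait exists: if the client's ACK is lost, the server resends its FIN and the client can still answer it.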


TCP Connection Management (cont)

[Figures: TCP server lifecycle; TCP client lifecycle]




Principles of Congestion Control

Congestion:
 informally: “too many sources sending too much
data too fast for network to handle”
 different from flow control!
 manifestations:
 lost packets (buffer overflow at routers)
 long delays (queuing in router buffers)
 a top-10 problem!



Causes/costs of congestion: scenario 1
 two senders, two receivers
 one router, infinite buffers
 no retransmission

[Figure: Host A and Host B send through one router with unlimited shared output link buffers; λ_in: original data; λ_out]

 maximum achievable throughput
 large delays when congested

Causes/costs of congestion: scenario 2

 one router, finite buffers
 sender retransmission of lost packet

[Figure: Host A and Host B send through one router with finite shared output link buffers. Labels: Application Layer, λ_in: original data; Transport Layer, λ'_in: original data, plus retransmitted data; λ_out]


Causes/costs of congestion: scenario 2
 “perfect” case, always: λ_in = λ_out (goodput)
 retransmission only when loss: λ'_in > λ_out
 retransmission of lost packet makes λ'_in larger (than perfect case) for same λ_out

“costs” of congestion:
 more work (retransmission) for given “goodput”
 unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
 Four senders
 Multi-hop paths
 Timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]


Causes/costs of congestion: scenario 3

[Figure: Host A, Host B, λ_out]

Another “cost” of congestion:
 when packet gets dropped, any “upstream” transmission capacity used for that packet was wasted!

Message: Congestion is bad

But what can we do about it?



Try this:
Driving on the Highway

 You are a taxi driver in a big alliance serving the Taipei-CKS airport line
 The Taipei-Linkou section is congested
 How do you inform your fellow drivers?


Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system observed loss, delay
 approach taken by TCP

Network-assisted congestion control:
 routers provide feedback to end systems
 single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
 explicit rate that sender should send at




TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
CongWin = CongWin * 0.5

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
CongWin = CongWin + 1 MSS

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

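
The AIMD sawtooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: the window is updated once per RTT, the rounds in which a loss event occurs are supplied by the caller (`loss_rounds` is a hypothetical parameter), and the window is kept at or above 1 MSS.

```python
def aimd(cong_win: float, loss_rounds: set, rounds: int, mss: float = 1.0) -> list:
    """Return CongWin (in MSS units by default) at the end of each RTT round."""
    trace = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cong_win = max(cong_win * 0.5, mss)   # multiplicative decrease after a loss event
        else:
            cong_win = cong_win + mss             # additive increase: probe for bandwidth
        trace.append(cong_win)
    return trace


print(aimd(cong_win=8, loss_rounds={4, 9}, rounds=12))
# [9.0, 10.0, 11.0, 12.0, 6.0, 7.0, 8.0, 9.0, 10.0, 5.0, 6.0, 7.0]
```
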
TCP Congestion Control
 end-end control (no network assistance)
 sender limits transmission:
LastByteSent - LastByteAcked ≤ CongWin
 Roughly,
rate = CongWin / RTT Bytes/sec
 CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
 loss event
 How to tell whether there’s a loss event?
 TCP sender reduces rate (CongWin) after loss event

three mechanisms:
 AIMD
 slow start
 conservative after timeout events

TCP Slow Start
 When connection begins, CongWin = 1 MSS
 Example: MSS = 500 bytes, RTT = 200 msec
 initial rate = 20 kbps
 available bandwidth may be >> MSS/RTT
 desirable to quickly ramp up to respectable rate
 When connection begins, increase rate exponentially fast until first loss event


TCP Slow Start (more)
 When connection begins, increase rate exponentially:
 double CongWin every RTT
 done by incrementing CongWin for every ACK received
 Summary: initial rate is slow but ramps up exponentially fast

[Figure: segment exchange between Host A and Host B over time, with one RTT marked]
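
The slow-start numbers can be checked with a short sketch. It assumes no loss and exactly one ACK per segment, and the function names are made up for this example. With MSS = 500 bytes and RTT = 200 msec, one segment per RTT gives the 20 kbps initial rate, and adding 1 MSS per ACK doubles the window every RTT.

```python
def initial_rate_bps(mss_bytes: int, rtt_sec: float) -> float:
    """Rate when CongWin = 1 MSS: one segment per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_sec


def slow_start(mss_bytes: int, rtts: int) -> list:
    """CongWin (bytes) at the start of each RTT, assuming no loss and one ACK per segment."""
    cong_win = mss_bytes
    trace = []
    for _ in range(rtts):
        trace.append(cong_win)
        acks = cong_win // mss_bytes      # one ACK per segment sent in this RTT
        cong_win += acks * mss_bytes      # +1 MSS per ACK doubles the window
    return trace


print(initial_rate_bps(500, 0.2))   # about 20000 bits/sec, i.e. 20 kbps
print(slow_start(500, 5))           # [500, 1000, 2000, 4000, 8000]
```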


Refinement
Q: When should the
exponential
increase switch to
linear?
A: When CongWin
gets to 1/2 of its
value before
timeout.

Implementation:
 Variable Threshold
 At loss event, Threshold is
set to 1/2 of CongWin just
before loss event



Refinement
 After 3 dup ACKs:
 CongWin is cut in half
 window then grows linearly
 But after timeout event:
 CongWin instead set to 1 MSS;
 window then grows exponentially
 to a threshold, then grows linearly

Philosophy: Why half the CongWin vs. 1?
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs is “more alarming”


Let’s Play a Game:
Guessing a number
 You can
 Increase your guess any way you want
 But decrease only when your guess exceeds the number


Summary: TCP Congestion Control

 When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
 When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
 When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.


TCP sender congestion control

State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK receipt for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance” | Resulting in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance” | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
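
The sender actions in the table can be collected into a small state machine. This is a sketch of the table rows only, not a full TCP sender: the class and method names are assumptions, all quantities are in bytes, and retransmission itself is left out.

```python
class TcpCongestionControl:
    """Sketch of the sender actions in the table; 'SS' = Slow Start, 'CA' = Congestion Avoidance."""

    def __init__(self, mss: int, threshold: int):
        self.mss = mss
        self.cong_win = float(mss)          # connection begins with CongWin = 1 MSS
        self.threshold = float(threshold)
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                        # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)   # about +1 MSS per RTT

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)    # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"
        self.dup_acks = 0


cc = TcpCongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)    # CA 5000.0
for _ in range(3):
    cc.on_dup_ack()
print(cc.state, cc.cong_win)    # CA 2500.0
cc.on_timeout()
print(cc.state, cc.cong_win)    # SS 1000.0
```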


TCP throughput
 What’s the average throughput of TCP as a function of window size and RTT?
 Ignore slow start
 Let W be the window size when loss occurs.
 When window is W, throughput is W/RTT
 Just after loss, window drops to W/2, throughput to W/(2·RTT).
 The window then grows linearly from W/2 back to W, so the average throughput is the midpoint: 0.75 W/RTT
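
The 0.75 W/RTT figure can be checked numerically by averaging the window over one sawtooth cycle. The sketch assumes the window grows by one segment per RTT from W/2 to W; the function name and the values of W and RTT are chosen only for illustration.

```python
def average_throughput(w_segments: int, rtt: float) -> float:
    """Average throughput (segments/sec) over one cycle from W/2 up to W."""
    windows = range(w_segments // 2, w_segments + 1)   # window grows by 1 segment per RTT
    return sum(windows) / len(windows) / rtt


W, RTT = 100, 0.25
print(average_throughput(W, RTT))   # 300.0 segments/sec
print(0.75 * W / RTT)               # 300.0
```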


TCP Futures: TCP over “long, fat pipes”

 Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
 Requires window size W = 83,333 in-flight segments
 Throughput in terms of loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

 ➜ $L = 2 \cdot 10^{-10}$ Wow
 New versions of TCP for high-speed
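
The two numbers on this slide follow directly from the example parameters. The sketch below recomputes them; the constant names are assumptions, and the loss rate comes from solving throughput = 1.22 · MSS / (RTT · sqrt(L)) for L.

```python
MSS_BITS = 1500 * 8      # 1500-byte segments
RTT = 0.1                # 100 ms
TARGET_BPS = 10e9        # 10 Gbps

# Window needed to keep the pipe full: throughput * RTT, expressed in segments.
window_segments = TARGET_BPS * RTT / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(round(window_segments))   # 83333
print(f"{loss_rate:.1e}")       # 2.1e-10
```

A loss rate near 2·10^-10 means roughly one loss event per five billion segments, which is why the slide treats it as unrealistic for standard TCP.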


TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]


Why is TCP fair?
Two competing sessions:
 Additive increase gives slope of 1, as throughput increases
 multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with axes up to R (x-axis: Connection 1 throughput), showing the equal bandwidth share line and a trajectory alternating between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
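
The convergence argument can be illustrated with a toy simulation of two AIMD flows. It rests on strong assumptions that are not from the slides: both flows have the same RTT, both see every loss at the same moment, and rates are in arbitrary units. Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it.

```python
def share_bottleneck(x1: float, x2: float, capacity: float, rounds: int):
    """Two AIMD flows on one link: both add 1 per round, both halve when the link overflows."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: decrease window by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: additive increase
    return x1, x2


x1, x2 = share_bottleneck(x1=10, x2=70, capacity=100, rounds=1000)
print(abs(x1 - x2) < 1e-6)   # True: the rates have converged to an equal share
```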


Fairness (more)

Fairness and UDP
 Multimedia apps often do not use TCP
 do not want rate throttled by congestion control
 Instead use UDP:
 pump audio/video at constant rate, tolerate packet loss
 Research area: TCP friendly

Fairness and parallel TCP connections
 nothing prevents app from opening parallel connections between 2 hosts.
 Web browsers do this
 Example: 10 users, link of rate R supporting 9 connections;
 new app asks for 1 TCP, gets rate R/10
 new app asks for 9 TCPs, gets R/2 !
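
The parallel-connection example is simple arithmetic if every TCP connection on the link is assumed to get an equal share of R. The helper below is a hypothetical function written for this example and returns the new app's share as a fraction of R.

```python
from fractions import Fraction


def new_app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R the new app gets if every connection gets an equal share."""
    return Fraction(new_connections, existing_connections + new_connections)


print(new_app_share(9, 1))   # 1/10, i.e. R/10
print(new_app_share(9, 9))   # 1/2, i.e. R/2
```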


Chapter 3: Summary
 principles behind transport layer services:
 multiplexing, demultiplexing
 reliable data transfer
 flow control
 congestion control
 instantiation and implementation in the Internet
 UDP
 TCP

Next:
 leaving the network “edge” (application, transport layers)
 into the network “core”
