Transport Layer

Chapter 6

Slides courtesy: Sweta & Chebrolu


Revised: August 2011
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Outline

• Transport Layer Overview


• Multiplexing & Demultiplexing
• Internet Protocols – UDP
• Internet Protocols – TCP

Transport Layer Overview

Milestones
• Progression in scale of networks
• Point-to-point link (2 nodes)
• Small local area networks (tens of nodes)
• Extended local area networks (thousands of nodes)
• Heterogeneous inter-networks (millions of nodes)
• Can now handle host-to-host delivery
• Network layer (determines which next hop) uses
services of link layer (delivers to next hop) which in
turn uses services of physical layer (converts bits to
signals) to deliver packets
• Next: process to process communication -> role of the
transport layer
Transport Layer Services (1)
• Hosts run many application processes
• Provide logical communication between application processes running on different hosts
• Help multiplex/demultiplex packets to deliver to the right process
• Enhance network layer services
[Figure: two end hosts with full application/transport/network/data link/physical stacks, connected through routers that implement only the network, data link, and physical layers]
Transport Layer Services (2)
• Run on end hosts
− Sender: breaks application messages into segments,
and passes to network layer
− Receiver: reassembles segments into messages, passes to
application layer
• The unit of data at transport layer is termed “segment”
Application Layer Expectations

• Guaranteed message delivery


• Ordered delivery
• No duplication
• Delay guarantees
• Support arbitrarily large messages
• Support flow control

Network Layer Limitations

• Best effort service model


• Packet losses
• Re-ordering
• Duplicate copies
• Limit on maximum message size
• Long delay

Challenge
• Enhance network layer services to meet application
expectations
• Cannot provide services that inherently cannot be
supported by network layer (e.g., delay guarantees,
bandwidth guarantees)
• Different transport protocols offer different tradeoffs
• TCP (transmission control protocol): reliable, in-order
delivery
− congestion control
− flow control
− connection setup
• UDP (user datagram protocol): unreliable, unordered
delivery
− no-frills extension of “best-effort” IP
Multiplexing & Demultiplexing

• Multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)
• Demultiplexing at receiver: use header info to deliver received segments to correct socket
[Figure: three hosts, each with application, transport, network, link, and physical layers; processes P1 to P4 attach to the transport layer through sockets]
How Demultiplexing Works
• host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (payload)]
Socket (1)
• An interface between an application process and the
transport layer
• The application process sends/receives messages
to/from another application process (local or remote)
via a socket
• Also referred to as the Application Programming
Interface (API) between the application and the network

[Figure: on each host the application process is controlled by the app developer, while the transport, network, link, and physical layers are controlled by the OS; the socket sits between them, and the two hosts communicate over the Internet]
Socket (2)
• Application developer can
• Specify type of transport protocol
• Configure a few parameters related to the transport
layer
• To help multiplex/demultiplex a segment
• Sockets have unique identifiers (port & IP address)
• Segments carry fields that help identify the right
socket
− Fields of relevance: source and destination port
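A minimal sketch of these choices using Python's standard `socket` module. The loopback address, the buffer size, and the use of port 0 (let the OS pick a free port) are assumptions made only for this demo.

```python
import socket

# Choose the transport protocol through the socket type.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP byte stream
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # UDP datagrams

# One of the few transport-related parameters an application can set:
# request a receive buffer size (the OS may round or adjust the value).
udp_sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)

# Binding gives the socket its identifier: (IP address, port).
# Port 0 asks the OS for any free port (demo assumption).
udp_sock.bind(("127.0.0.1", 0))
local_ip, local_port = udp_sock.getsockname()
print("UDP socket bound to", local_ip, local_port)

tcp_sock.close()
udp_sock.close()
```

A real server would bind to a fixed, well-known port instead of port 0 so that clients know where to send.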

Connectionless Multiplexing &
Demultiplexing
• Used with UDP sockets
• Socket identified by two-tuple:
• Destination IP address, Destination port number
• Transport layer checks port information in segment and
directs to the right socket
• IP datagrams with different source IP addresses and/or
source port numbers, but the same destination IP
address and destination port number, are directed to the
same socket
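The two-tuple lookup can be sketched as a dictionary keyed only on the destination address and port. The table `udp_sockets`, the function `udp_demux`, and the IP addresses are hypothetical names and values for illustration; a real OS keeps this state inside the kernel.

```python
# Hypothetical demultiplexing table: (dest IP, dest port) -> socket name.
udp_sockets = {("10.0.0.1", 6428): "socket_P1"}

def udp_demux(src_ip, src_port, dst_ip, dst_port):
    """Pick the receiving socket; the source fields are not part of the key."""
    return udp_sockets.get((dst_ip, dst_port))

# Two different senders, same destination: both reach the same socket.
first = udp_demux("10.0.0.2", 9157, "10.0.0.1", 6428)
second = udp_demux("10.0.0.3", 6000, "10.0.0.1", 6428)
print(first, second)
```

The source IP address and port still travel in the datagram, so the receiving process can use them as the return address.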

Connectionless Multiplexing &
Demultiplexing: Example

[Figure: three hosts running processes P3, P1, and P4, each with application, transport, network, link, and physical layers]
Segments shown:
• source port: 6428, dest port: 9157
• source port: 9157, dest port: 6428 (reverse direction)
• source port: 6428, dest port: 6000
• source port: 6000, dest port: 6428 (reverse direction)
Connection-oriented Multiplexing &
Demultiplexing
• Used with TCP sockets
• Socket identified by 4-tuple:
• Source IP address
• Source port number
• Destination IP address
• Destination port number
• All four values are used to direct segment to the right
socket
• A server host may support many simultaneous TCP
sockets, with each socket attached to a process, and
with each socket identified by its 4-tuple
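A sketch of four-tuple demultiplexing, assuming a hypothetical table `tcp_sockets` and symbolic addresses A, B, and C. All three connections target the same server address and port, yet each maps to its own socket.

```python
# Hypothetical table: (src IP, src port, dst IP, dst port) -> socket name.
tcp_sockets = {
    ("A", 9157, "B", 80): "socket_1",
    ("C", 5775, "B", 80): "socket_2",
    ("C", 9157, "B", 80): "socket_3",
}

def tcp_demux(src_ip, src_port, dst_ip, dst_port):
    """All four values are needed to find the right socket."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))

targets = [
    tcp_demux("A", 9157, "B", 80),
    tcp_demux("C", 5775, "B", 80),
    tcp_demux("C", 9157, "B", 80),
]
print(targets)
```

Note that hosts A and C may both use source port 9157: the source IP address keeps the two connections apart.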
Connection-oriented Multiplexing &
Demultiplexing: Example
[Figure: host at IP address A, server at IP address B, and host at IP address C, each with application, transport, network, link, and physical layers]
Segments shown:
• source IP,port: B,80; dest IP,port: A,9157
• source IP,port: A,9157; dest IP,port: B,80
• source IP,port: C,5775; dest IP,port: B,80
• source IP,port: C,9157; dest IP,port: B,80
Three segments, all destined to IP address B, dest port 80, are demultiplexed to different sockets
Obtaining Port Information
• Client contacts server
• Client picks a random port and sends message
• Server knows identity of client process (based on the
source IP address and the source port in the received
message)
• How does client know server’s port information?
• Server listens to messages on well known ports
• Refer to /etc/services file in Unix systems

Review - 1
• The role of transport layer is to provide logical
communication between processes
• All transport protocols provide multiplexing and
demultiplexing capability
• Others try to enhance network services to meet
application specific requirements
• Different types of multiplexing and demultiplexing
• Role of sockets

Internet Protocols – UDP

User Datagram Protocol
• Provides multiplexing and demultiplexing capability
over best-effort network layer service
• ‘bare bones’ transport protocol
− UDP segments can be lost, duplicated, delivered out of order
• Connectionless:
− No handshaking between UDP sender and receiver
− Each UDP segment is handled independently of others
• Reliable transfer over UDP:
− add reliability at application layer

Why Would Anyone Use UDP?
• Fine control over what data is sent and when
• As soon as an application process writes into the socket
• … UDP will package the data and send the packet
• No delay for connection establishment
• UDP just blasts away without any formal preliminaries
• … which avoids introducing any unnecessary delays
• No connection state
• No allocation of buffers, parameters, sequence #s, etc.
• … making it easier to handle many active clients at once
• Small packet header overhead
• UDP header is only eight-bytes long

Popular Applications that use UDP
Multimedia streaming
• Retransmitting lost/corrupted packets is not worthwhile
• By the time the packet is retransmitted, it’s too late
• E.g., telephone calls, video conferencing, gaming
Simple query protocols like Domain Name System (DNS)
• Overhead of connection establishment is overkill
• Easier to have the application retransmit if needed
“Address for www.cnn.com?”

“12.3.4.15”

UDP Segment Format
[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (payload)]
• Length: length, in bytes, of UDP segment, including header
• Source/Destination port: identifies sending/receiving process
• Client: random port
• Server: well-known port
• Checksum: covers UDP segment and IP pseudoheader
• Detect “errors” in transmitted segment
• Provides end-to-end delivery check
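The header layout and checksum can be sketched with Python's `struct` module. This is an illustrative sketch for IPv4 only; the helper names (`internet_checksum`, `pseudo_header`, `build_udp_segment`, `udp_checksum_ok`) and the addresses are assumptions, not part of any library.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudoheader: src IP, dst IP, zero, protocol 17 (UDP), UDP length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 17, udp_length)

def build_udp_segment(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)  # the UDP header is 8 bytes long
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    if csum == 0:
        csum = 0xFFFF  # in UDP over IPv4, a zero field means "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload

def udp_checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """Receiver check: summing everything, checksum included, must give 0."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0

segment = build_udp_segment("10.0.0.1", "10.0.0.2", 9157, 6428, b"hello")
print(len(segment), udp_checksum_ok("10.0.0.1", "10.0.0.2", segment))
```

Because the pseudoheader includes the IP addresses, the check also catches segments delivered to the wrong host, which is what makes it an end-to-end check.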

Review - 2
• UDP is a simple transport protocol
• Provides multiplexing/demultiplexing and simple error
detection capability
• Finds good use in many applications in spite of its
simplicity

Internet Protocols – TCP

• TCP Overview
• TCP Segment Header
• TCP Connection Management
• TCP Flow Control
• TCP Timer Management
• TCP Congestion Control

TCP Overview

• Connection-oriented:
• handshaking (exchange of control msgs) inits sender, receiver state before data exchange
• Reliable, in-order byte stream:
• no “message boundaries”
• TCP Services:
• Multiplexing & Demultiplexing
• Reliable point-to-point data transfer
− One sender, one receiver
• Full-duplex
− bi-directional data flow in same connection
• Flow control
• Congestion control
The TCP Service Model
TCP provides applications with a reliable byte stream
between processes

Recap: Sliding Window Protocol

[Figure: sliding window timeline between Sender and Receiver showing Frame0 to Frame7 and ACK0 to ACK3, with TX Time and RTT marked]
Link vs Transport: Connection Management

• Data Link Layer: dedicated physical link connects same two hosts
• Transport Layer: connects processes running on any two hosts in the Internet
• Need at Transport Layer: explicit connection establishment before data exchange and tear down after done
Link vs Transport: RTT

• Data Link Layer: fixed (almost) RTT


• Transport Layer: varies from connection to connection
and can be highly variable within connection
• Need at Transport Layer: time out mechanism has to
be adaptive

Link vs Transport: Reordering

• Data Link Layer: no reordering


• Transport Layer: packets can take different paths and
suffer arbitrary delays
• Need at Transport Layer: transport protocol should be
robust against old packets suddenly showing up

Link vs Transport: Flow Control

• Data Link Layer: end points can be engineered to support the link
• Transport Layer: any kind of computer can be connected to the Internet
• Need at Transport Layer: mechanisms to ensure one side doesn’t overwhelm other side’s resources (e.g., buffer space)
Link vs Transport: Congestion Control

• Data Link Layer: not possible to unknowingly congest the link
• Transport Layer: no idea what links will be traversed, network capacity can dynamically vary due to competing traffic
• Need at Transport Layer: mechanisms to alter sending rate in response to network congestion
TCP Segment Header

Sequence Number & Acknowledgement

• Each byte has a sequence number


• Sequence number field contains the sequence number
of the first byte in the data stream encapsulated in a
segment.
• Acknowledgement number field carries information about flow in the other direction
• Carries sequence number of next byte a host is expecting
• Unless specified, ACK is cumulative
Sequence Number & Acknowledgement:
Example

A B

Seq: 20, ACK: 857, Data: 1000B

Seq: 857, ACK: 1020, Data: 100B

Seq: 1020, ACK: 957, Data: 500B
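The numbers in this exchange follow one rule: the acknowledgement number is the sender's sequence number plus the number of data bytes received. A small sketch, where `next_ack` is a hypothetical helper name:

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around

def next_ack(seq: int, data_len: int) -> int:
    """Sequence number of the next byte the receiver expects."""
    return (seq + data_len) % SEQ_SPACE

# A sends Seq 20 with 1000 bytes, so B acknowledges 1020.
ack_from_b = next_ack(20, 1000)
# B sends Seq 857 with 100 bytes, so A acknowledges 957.
ack_from_a = next_ack(857, 100)
print(ack_from_b, ack_from_a)
```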


Flags
• CWR & ECE: used to signal congestion when ECN (Explicit Congestion Notification) is used.
• URG: set to 1 if the Urgent Pointer field is in use
• indicates that segment contains urgent data (not used)
• Urgent Pointer (bytes) indicates where in the segment non-urgent data begins
• ACK: set to 1 if the Acknowledgement number field is valid
• PSH: indicates receiver should pass data to higher layers immediately (not used)
• RST: used to abruptly reset a connection that has become confused
• SYN/FIN: used during connection establishment and termination
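These eight flags occupy one byte of the TCP header, so they can be decoded with bit masks. A minimal sketch; the dictionary `TCP_FLAGS` and the function `decode_flags` are illustrative names, while the bit positions are the standard ones.

```python
# Bit masks for the flag byte of the TCP header, lowest bit first.
TCP_FLAGS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}

def decode_flags(flag_byte: int) -> list:
    """Return the names of all flags set in the given byte."""
    return [name for name, mask in TCP_FLAGS.items() if flag_byte & mask]

print(decode_flags(0x02))  # a connection request carries only SYN
print(decode_flags(0x12))  # the reply to it carries SYN and ACK
```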

Options

• Can negotiate maximum segment size (MSS)


• Can perform window scaling
• Permits use of selective ACKs

Review - 3
• TCP: provides quite a few features at the transport layer
• Multiplexing & demultiplexing
• Reliable point-to-point data transfer
• Full-duplex
• Flow control
• Congestion control
• Heart of TCP is the sliding window protocol
• Examined TCP header

TCP Connection Management

Background
• TCP is a connection-oriented protocol
• Processes can run on any type of machine in the
Internet
• Connection establishment helps
• Exchange and initiate state variables
− MSS, initial sequence number, ACK type
• Allocate resources (e.g., send and receive buffers)

Connection Setup

2-way handshake:
[Figure: one side sends “Let’s talk” and the other replies “OK”; each side then enters the ESTAB state]
Q: will 2-way handshake always work in network?
2-Way Handshake Failure Scenario 1

[Figure: the client's connection request times out and is retransmitted; both sides reach ESTAB, the connection completes, and the client terminates; the delayed duplicate request then arrives and the server enters ESTAB again]
Half open connection! (no client!)
2-Way Handshake Failure Scenario 2

[Figure: both the connection request and the data “Transfer $1M to account x” time out and are retransmitted; the connection completes and the client terminates; the delayed duplicate request then arrives and the server enters ESTAB again, after which the duplicate data arrives]
The server cannot tell that the duplicate “Transfer $1M to account x” is stale
TCP 3-Way Handshake

• Client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x)
• Server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1)
• Client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, Seq=x+1, ACKnum=y+1); this segment may contain client-to-server data
• Server: received ACK(y) indicates client is live
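The three messages can be sketched as plain dictionaries to make the numbering explicit. The function `three_way_handshake` and the field names are hypothetical; no packets are sent, and the initial sequence numbers x and y are passed in by the caller.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers wrap at 32 bits

def three_way_handshake(x: int, y: int):
    """Build the SYN, SYNACK, and ACK for client ISN x and server ISN y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "ack": (x + 1) % SEQ_SPACE}
    ack = {"ACK": 1, "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE}

    # Each side accepts only if the peer acknowledged its own ISN + 1.
    client_accepts = synack["ack"] == (syn["seq"] + 1) % SEQ_SPACE
    server_accepts = ack["ack"] == (synack["seq"] + 1) % SEQ_SPACE
    return syn, synack, ack, client_accepts and server_accepts

syn, synack, ack, established = three_way_handshake(x=100, y=300)
print(synack, ack, established)
```

The acknowledgement check is what lets each side reject a delayed duplicate from an older connection, since the stale segment acknowledges a number the receiver never sent.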

Solution for Failure Scenario 1

[Figure: failure scenario 1 replayed with the 3-way handshake: the request times out and is retransmitted, the connection completes, and the client terminates; when the delayed duplicate arrives, the handshake cannot complete because no client is waiting for it]
Abort connection
Solution for Failure Scenario 2

[Figure: failure scenario 2 replayed with the 3-way handshake: the request and the data time out and are retransmitted, the connection completes, and the client terminates; the delayed duplicates then arrive]
• The sequence numbers expose the stale exchange: “I sent Seq=z. Why acks y? Stop”
• Abort connection
Initial Sequence Number (ISN)

• Why not start with Sequence number 0?
• Segments from different connections can get mixed up
• Security risk when ISNs are predictable
• Original solution: use a clock (e.g., increments every 4 microsec) to choose ISN
• Easy for an attacker to predict the next ISN
• Solution: use random ISN
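A minimal sketch of the random choice, assuming Python's `secrets` module as the source of unpredictable bits. The function name `choose_isn` is hypothetical, and real TCP stacks may combine a clock with a keyed hash rather than drawing a plain random number.

```python
import secrets

def choose_isn() -> int:
    """Draw an unpredictable 32-bit initial sequence number."""
    return secrets.randbits(32)

isn = choose_isn()
print(isn)
```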

Connection Termination
• Asymmetric release (just hang-up) leads to loss of
data
• Symmetric release
• Treat connection as two separate unidirectional
connections
• Each side should be released separately

Two Army Problem

• The attack will succeed if and only if both armies attack the enemy at the same time.
• If neither side is ready to disconnect until it is convinced that the other side is ready to disconnect, the disconnection will never happen.
TCP Connection Termination

• Follows simple 2-way handshake
• Each side independently closes connection
• Client sends FINbit=1, seq=x; it can no longer send but can receive data
• Server replies ACKbit=1; ACKnum=x+1; it can still send data while the client waits for server close
• Server sends FINbit=1, seq=y
• Client replies ACKbit=1; ACKnum=y+1 and enters time-wait
Review - 4
• TCP is a connection-oriented protocol
• Connection management complicated by the fact
that packets can get retransmitted, delayed,
delivered out of order, etc.
• Connection establishment governed by 3-way
handshake
• Connection termination is based on symmetric
release and managed by 2-way handshake
TCP Flow Control

TCP Sliding Window

TCP adds flow control to the sliding window
TCP Buffers

[Figure: send buffer (application process writes bytes) with pointers LastByteAcked, LastByteSent, LastByteWritten; receive buffer (application process reads bytes) with pointers LastByteRead, NextByteExpected, LastByteRcvd]
• Send buffer: LastByteAcked <= LastByteSent <= LastByteWritten
• Receive buffer: LastByteRead < NextByteExpected <= LastByteRcvd+1
Both buffers are of finite size: MaxSendBuffer and MaxRcvBuffer
Buffer Overflow
• Send buffer: LastByteWritten – LastByteAcked <= MaxSendBuffer
• Receive buffer: LastByteRcvd – LastByteRead <= MaxRcvBuffer
• Two cases that can cause receive buffer overflow
• Receiver has limited resources (small MaxRcvBuffer)
• Application is slow to read
Flow Control (1)
• Receiver side: avoid overflowing receive buffer
• WIN = MaxRcvBuffer - [(NextByteExpected - 1) - LastByteRead] (free space available at the receive buffer)
Flow Control (2)
• Sender side: avoid overflowing receive buffer and also congesting the network
• LastByteSent – LastByteAcked <= WIN
• LastByteSent – LastByteAcked <= cwnd
• MaxWindow = min(cwnd, WIN)
• EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked) (how much data it can send)
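The receiver-side and sender-side window formulas can be checked with a short calculation. The function names and the sample byte counts are assumptions for illustration; clamping the result at zero is also an assumption, added so that a shrinking window never yields a negative amount.

```python
def advertised_window(max_rcv_buffer, next_byte_expected, last_byte_read):
    """WIN: free space left in the receive buffer."""
    return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)

def effective_window(cwnd, win, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight right now."""
    max_window = min(cwnd, win)
    in_flight = last_byte_sent - last_byte_acked
    return max(0, max_window - in_flight)  # clamp: assumption of this sketch

# Receiver: 4096-byte buffer, bytes up to 1000 received in order, 500 read.
win = advertised_window(4096, next_byte_expected=1001, last_byte_read=500)
# Sender: cwnd of 3000 bytes, 1000 bytes sent but not yet acknowledged.
can_send = effective_window(cwnd=3000, win=win,
                            last_byte_sent=2000, last_byte_acked=1000)
print(win, can_send)
```

Here cwnd is the tighter limit, so congestion control rather than flow control bounds the sender.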
TCP Timer Management

Retransmission Timeout
How long should the retransmission timeout be?
• Should be longer than RTT
• Too short: unnecessary retransmissions
• Too long: slow reaction to segment loss
How to estimate RTT?

[Figure: LAN case – small, regular RTT; Internet case – large, varied RTT]
Estimate RTT (1)
• SampleRTT: measured time from segment
transmission until ACK receipt
• ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
• average several recent measurements, not just
current SampleRTT

Estimate RTT (2)
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
▪ exponential weighted moving average
▪ influence of past sample decreases exponentially fast
▪ typical value: α = 0.125
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (100 to 350) versus time in seconds]
Timeout Interval
timeout interval: EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
(typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
(EstimatedRTT is the estimated RTT; 4*DevRTT is the “safety margin”)
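The estimator can be sketched as a small class that is fed one SampleRTT at a time. Several details are assumptions of this sketch rather than statements from the slides: the class name `RttEstimator`, seeding EstimatedRTT with the first sample and DevRTT with half of it, and updating EstimatedRTT before DevRTT.

```python
class RttEstimator:
    """Exponentially weighted moving averages of RTT and its deviation."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = first_sample / 2     # assumption: seed with half of it

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

estimator = RttEstimator(first_sample=100.0)   # milliseconds
timeout = estimator.update(200.0)              # one slow sample arrives
print(estimator.estimated_rtt, estimator.dev_rtt, timeout)
```

A single sample twice as large as the estimate moves EstimatedRTT only slightly but widens the safety margin noticeably, which is the intended behaviour.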

Review - 5
• Looked at how TCP implements flow control in the
context of the sliding window protocol
• Receiver advertises the free space available at the
receive buffer
• Sender determines the amount of data that can be
sent based on the advertised window size, as well
as cwnd
• Looked at how to determine the retransmission timeout
interval
• Function of the RTT
• Dynamic approach

TCP Congestion Control

The Problem of Congestion
What is congestion?
• Load is higher than capacity
What do IP routers do?
• Drop the excess packets
Why is this bad?
• Wasted bandwidth for retransmissions

“Congestion collapse”: increase in load that results in a decrease in useful work done.
[Figure: goodput versus load, showing congestion collapse]
Important Questions
• How does the sender know there is congestion?
• IP layer provides no explicit feedback regarding
network congestion
• Must infer based on network performance
• What should the sender do?
• Limit the rate at which it sends traffic into its
connection as a function of perceived network
congestion
• How does a sender limit the rate at which it sends
traffic into its connection?
• What algorithm should the sender use to change
its send rate as a function of perceived network
congestion?
Inferring From Implicit Feedback

• What does the end host see?


• Packet delay – RTT estimate
• Packet loss – Timeout, duplicate ACKs
• TCP views packet loss as a signal of congestion.
• Assume that packet loss caused by damage is very
small in wired networks
How to Adjust Send Rate?
• Upon detecting congestion (timeout, duplicate ACKs)
• Decrease the send rate (e.g., divide in half)

• But, what if conditions change?


• Suppose there is more bandwidth available
• Would be a shame to stay at a low send rate

• Upon not detecting congestion (receive ACKs for previously unacked segments)
• Increase the send rate, a little at a time
• And see if the packets are successfully delivered
How Does Sender Limit Send Rate?
[Figure: sliding window timeline between Sender and Receiver showing Segment0 to Segment7 and ACK0 to ACK3, with TX Time and RTT marked; send buffer with pointers LastByteAcked, LastByteSent, LastByteWritten]
• Send rate ≈ MaxWindow/RTT
• LastByteSent – LastByteAcked <= WIN
• LastByteSent – LastByteAcked <= cwnd
• MaxWindow = min(cwnd, WIN)
TCP Congestion Avoidance: Additive
Increase Multiplicative Decrease (AIMD)
• How much to increase and decrease?
• Increase linearly, decrease multiplicatively
• A necessary condition for stability of TCP
• Consequences of over-sized window are much
worse than having an under-sized window
− Over-sized window: packets dropped and retransmitted
− Under-sized window: somewhat lower throughput

• Multiplicative decrease
• On loss of packet, cut cwnd in half
• Additive increase
• On success for last window of data, increase cwnd
by 1 MSS every RTT until loss (linearly)
TCP Additive increase

Additive increase grows cwnd slowly
• Adds 1 MSS every RTT
Implemented by
• Increase cwnd by 1/cwnd on every ACK of new data
TCP Slow Start
• When connection begins, increase rate exponentially until first
loss event:
• initially cwnd = 1 MSS; double cwnd every RTT
• Implementation: increase cwnd by 1 on every ACK of new
data

• Slow start ends when cwnd reaches a threshold (ssthresh) or packet loss occurs.
TCP Tahoe (1)

At the beginning: slow-start phase


• Initially, cwnd = 1; ssthresh is set arbitrarily large.
When new data is acked
• If (cwnd < ssthresh), cwnd += 1 (slow start)
• Else, cwnd += 1/cwnd (additive increase)
On packet loss, congestion avoidance
• Set ssthresh = cwnd/2; cwnd = 1
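These rules can be traced with a small simulation that counts cwnd in units of MSS. The function names and the assumption that one round trip delivers int(cwnd) ACKs are simplifications made for this sketch.

```python
def tahoe_on_ack(cwnd: float, ssthresh: float) -> float:
    """New cwnd (in MSS) after an ACK of new data."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start
    return cwnd + 1.0 / cwnd       # additive increase

def tahoe_on_loss(cwnd: float):
    """Return (new cwnd, new ssthresh) after a loss."""
    return 1.0, cwnd / 2

def simulate_rounds(rounds: int, ssthresh: float) -> list:
    """cwnd at the start of each round, assuming int(cwnd) ACKs per RTT."""
    cwnd, trace = 1.0, []
    for _ in range(rounds):
        trace.append(cwnd)
        for _ in range(int(cwnd)):
            cwnd = tahoe_on_ack(cwnd, ssthresh)
    return trace

trace = simulate_rounds(rounds=5, ssthresh=8.0)
print(trace)                       # doubles up to ssthresh, then grows slowly
print(tahoe_on_loss(trace[-1]))    # back to 1 MSS, threshold halved
```

The window doubles each round while below ssthresh and then gains a little less than 1 MSS per round, because each ACK adds 1/cwnd of a window that is itself growing.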

TCP Tahoe (2)
Slow start followed by additive increase
Threshold is half of previous loss cwnd

TCP Reno (1)

• Incorporate two new mechanisms:


• Fast retransmission
• Fast recovery
• Fast retransmission: retransmit packets after 3
duplicate ACKs
• Cut the window by half (loss event)
• Avoid having to timeout which keeps the link idle for
longer duration

TCP Reno (2) - Fast Recovery
• On 3rd duplicate ACK,
1. Set ssthresh = cwnd/2
2. Retransmit the missing segment
3. Set cwnd = ssthresh + 3
4. Each time the same duplicate ACK arrives, set cwnd
= cwnd + 1. Transmit a new packet, if allowed by
cwnd
5. If a non-duplicated ACK arrives, set cwnd =
ssthresh, and continue with a linear increase of
cwnd (additive increase)
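The five steps can be sketched as a small state machine, again counting cwnd in units of MSS. The class `RenoSender` is hypothetical; retransmission is only counted rather than performed, and window growth outside fast recovery is left out to keep the sketch short.

```python
class RenoSender:
    """Tracks cwnd and ssthresh through fast retransmit and fast recovery."""

    def __init__(self, cwnd: float, ssthresh: float):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmissions = 0

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1                      # step 4: inflate the window
        elif self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2       # step 1
            self.retransmissions += 1           # step 2 (counted, not sent)
            self.cwnd = self.ssthresh + 3       # step 3
            self.in_fast_recovery = True

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh           # step 5: deflate the window
            self.in_fast_recovery = False
        # Normal slow start / additive increase is omitted in this sketch.

sender = RenoSender(cwnd=16, ssthresh=64)
for _ in range(3):
    sender.on_duplicate_ack()
print(sender.cwnd, sender.ssthresh)             # window after the 3rd duplicate
sender.on_duplicate_ack()
sender.on_new_ack()
print(sender.cwnd, sender.in_fast_recovery)     # window after recovery
```

The temporary "+3" and "+1" inflation accounts for segments that have left the network, since each duplicate ACK means the receiver got another segment.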

TCP Reno (3)
With fast recovery, we get the classic sawtooth
• Retransmit lost packet after 3 duplicate ACKs
• New packet for each dup. ACK until loss is repaired
[Figure: cwnd over time, with the points where 3 duplicate ACKs arrive marked]
Tahoe vs Reno (1)
• loss indicated by timeout:
− cwnd set to 1 MSS;
− window then grows exponentially (as in slow start) to threshold, then grows linearly
• loss indicated by 3 duplicate ACKs: TCP Reno
− dup ACKs indicate network capable of delivering some segments
− cwnd is cut in half; window then grows linearly
• TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate ACKs)
Tahoe vs Reno (2)

[Figure: cwnd over time for Tahoe versus Reno, with the point where 3 duplicate ACKs arrive marked]
Review - 6
• Congestion control is a complex problem
• TCP relies on a variety of techniques to achieve this
• Slow start, additive increase multiplicative decrease
(AIMD)
• TCP Tahoe: slow start with AIMD
• Loss recovery slow;
• TCP Reno: improves upon Tahoe
• Better loss recovery via duplicate ACKs (fast
retransmit)
