Computer Networks

Transport Layer

Jamali@iust.ac.ir ITransport Layer 3 -1


Chapter 3 Outline

• 3. Introduction
• 3.1 Transport-layer services
• 3.2 Multiplexing and demultiplexing
• 3.3 Connectionless transport: UDP
• 3.4 Principles of reliable data transfer
• 3.5 Connection-oriented transport: TCP
   - segment structure
   - reliable data transfer
   - flow control
   - connection management
• 3.6 Principles of congestion control
• 3.7 TCP congestion control
• 3.8 Multimedia Stream & TCP
• 3.9 TCP fairness
• 3.10 TCP modeling
• 3.11 HTTP modeling


Protocols/Services

[Figure: protocol stack (application, transport, network, link, physical). The application program sits at the application layer; the layers below it provide the data transport services. The transport layer runs end-to-end protocols; the lower layers run hop-to-hop protocols.]


Data Transport Services (1)

[Figure: two end hosts, each with the full stack. The application layer is app. software, controlled by the application through the API. The data transport services (transport, network, link, physical) are controlled by the OS.]


Data Transport Services (2)

[Figure: application processes on different hosts exchange the application's messages, objects, and files through the data transport services, across the computer network.]

Data transport services are provided to the application process. The main services are:
• Breaking down the messages in the source, and assembling the messages in the destination.
• Source-destination (end-to-end) flow control. It makes it possible for a slow-running process to communicate with a fast-running process.
• Source-destination routing: finding the path through links and routers (switches) in the network.
• Error detection and correction.
• ...


Services & Layer Protocols
• Transport-layer protocols:
   - They take care of application processes. The processes are distinguished by means of port numbers.
   - They control flow intensity between the communicating processes.
   - They apply an acknowledgment scheme to make inter-process communication reliable.
• Network-layer protocols:
   - They pass packets router by router, from source host to destination host. Hosts are distinguished by means of IP addresses.
   - They do accounting for inter-host traffic.
• Link-layer and physical-layer protocols:
   - They move frames across links, repeaters, hubs, switches, and routers on the way from source host to destination host.
   - They take care of channel coding and error correction.
   - They regulate flow intensity between adjacent intermediate systems.


Transport Layer vs. Network Layer
• network layer: logical communication between hosts and routers (host-router, router-router, router-host).
• transport layer: logical communication between processes.
   - relies on, and enhances, network-layer services
   - extends "host-to-host" communication to "process-to-process" communication

Computer network / university analogy

IUST students send letters to TU students:
   - processes = students
   - port number = student ID number
   - application messages = letters in envelopes
   - hosts = universities
   - IP address = university's address
   - transport protocol = post office of each university
   - network-layer protocol = postal service of the state




Transport Services and Protocols
• provide logical communication between application processes running on different hosts
• transport protocols run in end systems
   - sending side: breaks app messages into segments, passes them to the network layer
   - receiving side: reassembles segments into messages, passes them to the application layer
• more than one transport protocol available to applications
   - Internet: TCP and UDP

[Figure: logical end-to-end transport between two hosts, each running the full stack (application, transport, network, data link, physical), across intermediate nodes running the network, data link, and physical layers.]


Protocol layering and data

1. The app. process decides to send a message to its counterpart app. process.
2. The application layer adds its header (Ha) and sends the message to the transport layer.
3. The transport layer breaks the message into several parts and adds its header (Ht) to each part to make segments.
4. It sends the segments one by one to the network layer.


Internet Transport-Layer Protocols
• reliable, in-order delivery (TCP)
   - congestion control
   - flow control
   - connection setup
• unreliable, unordered delivery (UDP)
   - no-frills extension of "best-effort" IP
• services not available:
   - delay guarantees
   - bandwidth guarantees

[Figure: logical end-to-end transport between two hosts running the full stack, across intermediate nodes running the network, data link, and physical layers.]




Multiplexing/Demultiplexing

Multiplexing at sending host: gathering data from multiple sockets, enveloping data with a header (later used for demultiplexing).

Demultiplexing at receiving host: delivering received segments to the correct socket.

[Figure: three hosts (host 1, host 2, host 3), each with the full stack (application, transport, network, link, physical). Processes P1 to P4 attach to the transport layer through sockets.]

How Demultiplexing Works
• host receives IP datagrams
   - each datagram has a source IP address and a destination IP address
   - each datagram carries 1 transport-layer segment
   - each segment has source and destination port numbers (recall: well-known port numbers for specific applications)
• host uses IP addresses & port numbers to direct the segment to the appropriate socket

TCP/UDP segment format (32 bits wide):

   | source port #        | dest port #          |
   | other header fields                         |
   | application data (message)                  |


Connectionless Demultiplexing (1)
• Apps create sockets in the destination host:
      sd1 = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
      bind(sd1, socket address, socket address length);
• A UDP socket is identified by a 2-tuple:
   - dest. IP address
   - dest. port number
  That is: socket = (dest IP address, dest port number)
• When a host receives a UDP segment, it:
   - checks the destination port number in the segment
   - directs the UDP segment to the socket (process) with that port number
• IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket (process).


Connectionless Demultiplexing (2)

Client (running on IP address A) and server (running on IP address C), in time order:

1. Server: create socket, port=x, for incoming requests: serverSocket = DatagramSocket()
2. Client: create socket: clientSocket = DatagramSocket()
3. Client: create datagram with address (hostid, port=x); send datagram request using clientSocket
4. Server: read request from serverSocket
5. Server: write reply to serverSocket, specifying client host address and port number
6. Client: read reply from clientSocket
7. Client: close clientSocket
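
The client/server exchange above can be tried on a single machine. The following is a minimal sketch, not the slides' own code: it uses Python's standard socket module instead of DatagramSocket, runs both sides in one process over the loopback address, and the function name udp_demo is made up for illustration. It assumes loopback UDP is available.

```python
import socket

def udp_demo():
    # Server: one datagram socket bound to (loopback, OS-chosen port).
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2.0)
    server_addr = server.getsockname()

    # Two independent clients; the OS gives each its own source port.
    clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
    for i, c in enumerate(clients):
        c.settimeout(2.0)
        c.sendto(f"request {i}".encode(), server_addr)

    # Both requests arrive on the SAME server socket. The source address
    # returned by recvfrom() is what lets the server address its reply.
    seen = []
    for _ in clients:
        data, src = server.recvfrom(1024)
        seen.append((data.decode(), src))
        server.sendto(data.upper(), src)

    replies = [c.recvfrom(1024)[0].decode() for c in clients]
    for s in clients + [server]:
        s.close()
    return seen, replies
```

Note that the server never creates a per-client socket: one socket, identified only by its own (IP address, port), serves every client.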


Connectionless Demultiplexing (3)

Two arriving UDP segments with different source IP addresses or source port numbers will be directed to the same socket.

[Figure: client A (IP: A, client socket (A, 4012)) and client B (IP: B, client socket (B, 801)) both talk to the server (IP: C, server datagram socket (C, 5193)).
   A to C: SP 4012, DP 5193        C to A: SP 5193, DP 4012
   B to C: SP 801,  DP 5193        C to B: SP 5193, DP 801]


Connection-Oriented Demultiplexing (1)
• TCP socket identified by 4-tuple:
   - source IP address
   - source port number
   - dest IP address
   - dest port number
• receiving host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
   - each socket identified by its own 4-tuple
• Example: Web servers have different sockets for each connecting client
   - non-persistent HTTP will have a different socket for each request


Client/Server Socket Interaction: TCP

Client (running on IP address A) and server (running on IP address C), in time order:

1. Server: create socket, port=x, for incoming requests: welcomeSocket = ServerSocket()
2. Server: wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3. Client: create socket, connect to hostid, port=x: clientSocket = Socket() (TCP connection setup)
4. Client: send request using clientSocket
5. Server: read request from connectionSocket
6. Server: write reply to connectionSocket
7. Client: read reply from clientSocket
8. Server: close connectionSocket
9. Client: close clientSocket


Sockets in Connection-Oriented

[Figure: client process (running on IP address A) and server process (running on IP address C). The server's welcoming socket is identified by the server IP address and port number 1. Bytes flow between the client socket and the server's connection socket, which is identified by a 4-tuple: server IP address & port number 2 + client IP address & port number.]

Connection-Oriented Demultiplexing (2)

[Figure: client (IP: A) and server (IP: C).
   A to C: SP 1324, DP 2549
   C to A: SP 2549, DP 1324
   connection socket: (A, C, 1324, 2549)
   client socket: (C, A, 2549, 1324)]


Connection-Oriented Demultiplexing (3)
• Server host may support many simultaneous TCP sockets, with each socket attached to a process.
• Each socket is identified by its own 4-tuple.
• All 4 fields are used to direct (demultiplex) the segment to the appropriate socket.

[Figure: client A (IP: A) and client B (IP: B) connect to the server (IP: C), which runs one process per connection.
   A to C: SP 1807, DP 2053
   A to C: SP 9157, DP 2053
   B to C: SP 5775, DP 2053]

In contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will be directed to two different sockets.
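
The contrast between the UDP 2-tuple and the TCP 4-tuple can be made concrete with a small lookup sketch. This is an illustration only: the dictionary layout and the names udp_key and tcp_key are assumptions, and the three segments reuse the port numbers shown in the figure.

```python
def udp_key(seg):
    # UDP demultiplexing looks only at the destination side.
    return (seg["dst_ip"], seg["dst_port"])

def tcp_key(seg):
    # TCP demultiplexing uses all four values.
    return (seg["src_ip"], seg["src_port"], seg["dst_ip"], seg["dst_port"])

# Three arriving segments, all addressed to port 2053 on host C.
segments = [
    {"src_ip": "A", "src_port": 1807, "dst_ip": "C", "dst_port": 2053},
    {"src_ip": "A", "src_port": 9157, "dst_ip": "C", "dst_port": 2053},
    {"src_ip": "B", "src_port": 5775, "dst_ip": "C", "dst_port": 2053},
]

# Distinct keys correspond to distinct sockets on the receiving host.
udp_sockets = {udp_key(s) for s in segments}
tcp_sockets = {tcp_key(s) for s in segments}
```

With these inputs udp_sockets holds a single key, (C, 2053), while tcp_sockets holds three, one per connection.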


Connection-Oriented Demultiplexing (4)

Threaded Server

[Figure: the same three connections as on the previous slide (A to C with SP 1807, A to C with SP 9157, B to C with SP 5775, all with DP 2053), handled by a threaded server.]




UDP: User Datagram Protocol [RFC 768]
• simple Internet transport protocol
• "best effort" service; UDP segments may be:
   - lost
   - delivered out of order to app
• connectionless:
   - no handshaking between UDP sender, receiver
   - each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired


UDP Header
• often used for streaming multimedia apps
   - loss tolerant
   - rate sensitive
• other UDP uses
   - DNS
   - SNMP
• reliable transfer over UDP: add reliability at application layer
   - application-specific error recovery!

UDP segment format (32 bits wide):

   | source port #        | dest port #          |
   | length               | checksum             |
   | application data (message)                  |

length: length, in bytes, of the UDP segment, including header


UDP Checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as a sequence of 16-bit integers
• checksum: 1's complement of the 1's complement sum of the segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
   - NO: error detected
   - YES: no error detected


Checksum Example

   Source Port #:     0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
   Destin. Port #:    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
                      -------------------------------
   Partial sum:       1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
   Length:            0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
                      -------------------------------
   Sum:               1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 0
   Checksum:          0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1   (1's complement of Sum)

Note that: Source Port # + Dest. Port # + Length + Checksum = 1111111111111111
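
The checksum arithmetic can be checked with a few lines of code. The sketch below is an illustration under simplifying assumptions: it works on a list of 16-bit words rather than on a real UDP segment, the function names are made up, and it leaves out the pseudo-header that a real UDP checksum also covers (RFC 768).

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of the MSB back into the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wraparound carry
    return total

def checksum(words):
    """Sender side: 1's complement of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF

def looks_intact(words, received_checksum):
    """Receiver side: all words plus the checksum must sum to sixteen 1 bits."""
    return ones_complement_sum(list(words) + [received_checksum]) == 0xFFFF

# The three header words from the example: source port, dest port, length.
header = [0b0110011001100110, 0b0101010101010101, 0b0000111100001111]
c = checksum(header)
```

Here c comes out as 0011010100110101, matching the Checksum row of the example. Flipping any single bit in a header word makes looks_intact return False.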


Checksum Example: When the Carry Out of the MSB Is Not Zero

• Note
   - When adding numbers, a carry out of the most significant bit needs to be added to the result.
• Example: add two 16-bit integers

                   1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                   1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
                   -------------------------------
   wraparound:   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
   Sum:            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
   Checksum:       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1




Principles of Reliable Data Transfer
• important in application, transport, and link layers

[Figure: (a) service model: the application layer sees a reliable channel; (b) service implementation: a reliable data transfer protocol runs over an unreliable channel.]

• characteristics of the unreliable channel will determine the complexity of the reliable data transfer (rdt) protocol


Reliable Data Transfer: Getting Started

Send side:
• rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer.
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver.

Receive side:
• rdt_rcv(): called when packet arrives on rcv-side of channel.
• deliver_data(): called by rdt to deliver data to upper layer.


Reliable Data Transfer: Getting Started

We will:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
   - but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver

FSM notation:
• state: when in this "state", the next state is uniquely determined by the next event
• each transition (state 1 -> state 2) is labeled with the event causing the state transition and the actions taken on the state transition


Reliable Data Transfer over Unreliable Channel
• rdt1.0: underlying channel perfectly reliable.
• rdt2.0: underlying channel may flip bits in packet. ACK/NAK + Stop&Wait.
• rdt2.1: What happens if ACK/NAK is corrupted?
   - Sender handles corrupted ACK/NAKs.
• rdt2.2: a NAK-free protocol.
   - Instead of NAK, receiver sends ACK for last pkt received OK.
   - Duplicate ACK at sender results in same action as NAK: retransmit current pkt.
• rdt3.0: Channels with errors and loss (Timer).
• Stop&Wait: Performance is low.
• Pipelining increases the performance: Go-Back-N, Selective Repeat.


rdt1.0: a protocol for a completely reliable channel
• underlying channel perfectly reliable
   - no bit errors
   - no loss of packets
• separate FSMs for sender, receiver:
   - sender sends data into underlying channel
   - receiver reads data from underlying channel

Sender (one state, "Wait for call from above"):
   rdt_send(data)
   -> sndpkt = make_pkt(data); udt_send(sndpkt)

Receiver (one state, "Wait for call from below"):
   rdt_rcv(rcvpkt)
   -> extract(rcvpkt,data); deliver_data(data)

On the rdt_send(data) event, the sender creates a packet containing the data via the action make_pkt(data) and sends the packet via the action udt_send(sndpkt). On the rdt_rcv(rcvpkt) event, the receiver removes the data from the packet via the action extract(rcvpkt, data) and passes the data up to the upper layer via the action deliver_data(data).


rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
   - recall: UDP checksum to detect bit errors
• the question: how to recover from errors:
   - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
   - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors ("Please repeat that.")
   - sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
   - error detection
   - receiver feedback: receiver sends control messages (ACK, NAK) to sender
• reliable data transfer based on retransmission is known as ARQ (Automatic Repeat reQuest)


rdt2.0: FSM specification (Stop&Wait)

Sender (two states):
• State "Wait for call from above":
   rdt_send(data)
   -> sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
• State "Wait for ACK or NAK":
   rdt_rcv(rcvpkt) && isNAK(rcvpkt)   (NAK packet is received)
   -> udt_send(sndpkt); stay in "Wait for ACK or NAK"
   rdt_rcv(rcvpkt) && isACK(rcvpkt)   (ACK packet is received)
   -> no action; go to "Wait for call from above"

Receiver (one state, "Wait for call from below"):
   rdt_rcv(rcvpkt) && corrupt(rcvpkt)
   -> udt_send(NAK)
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
   -> extract(rcvpkt,data); deliver_data(data); udt_send(ACK)


rdt2.0: operation with no errors

[Figure: the rdt2.0 FSMs with the error-free path traced. The sender sends sndpkt and waits for ACK or NAK; the receiver finds the packet not corrupt, delivers the data, and sends ACK; the sender receives the ACK and returns to "Wait for call from above".]


rdt2.0: error scenario

[Figure: the rdt2.0 FSMs with the error path traced. The sender sends sndpkt and waits for ACK or NAK; the receiver finds the packet corrupt and sends NAK; the sender receives the NAK, resends sndpkt, and stays in "Wait for ACK or NAK".]


rdt2.0 has a fatal defect: ACK/NAK Corruption

What happens if ACK/NAK is corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate

What to do?
• sender ACKs/NAKs receiver's ACK/NAK? What if sender ACK/NAK is lost?
• retransmit, but this might cause retransmission of a correctly received pkt!

Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK is corrupted
• receiver discards (doesn't deliver up) duplicate pkt

Stop and wait: sender sends one packet, then waits for receiver response.


rdt2.1: Sender handles corrupted ACK/NAKs.

• State "Wait for call 0 from above":
   rdt_send(data)
   -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 0" (seq. no = 0)
• State "Wait for ACK or NAK 0":
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
   -> udt_send(sndpkt); stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
   -> no action; go to "Wait for call 1 from above"
• State "Wait for call 1 from above":
   rdt_send(data)
   -> sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 1" (seq. no = 1)
• State "Wait for ACK or NAK 1":
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
   -> udt_send(sndpkt); stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
   -> no action; go to "Wait for call 0 from above"

Note: the event ( corrupt(rcvpkt) || isNAK(rcvpkt) ) means that the packet received by the SENDER is corrupted, or the packet received by the SENDER is a NAK packet. (Dr. Analoui, 11/22/2002)

rdt2.1: Receiver handles corrupted ACK/NAKs.

• State "Wait for 0 from below":
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
   -> extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 1 from below"
   rdt_rcv(rcvpkt) && corrupt(rcvpkt)
   -> sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
   -> sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay
• State "Wait for 1 from below":
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
   -> extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 0 from below"
   rdt_rcv(rcvpkt) && corrupt(rcvpkt)
   -> sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
   -> sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay


rdt2.1: discussion

Sender:
• seq # added to pkt
• two seq. #'s (0,1) will suffice. Why?
• must check if received ACK/NAK is corrupted
• twice as many states
   - state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
• must check if received packet is duplicate
   - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK was received OK at sender


rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
   - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt


rdt2.2: sender, receiver fragments

Sender FSM fragment:
• State "Wait for call 0 from above":
   rdt_send(data)
   -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK 0"
• State "Wait for ACK 0":
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
   -> udt_send(sndpkt); stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
   -> no action; leave this state (rest of the sender FSM omitted: …)

Receiver FSM fragment:
• State "Wait for 0 from below":
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) )
   -> udt_send(sndpkt); stay
• Transition into "Wait for 0 from below" (from the omitted state …):
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
   -> extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)

rdt3.0: Channels with errors and loss (Timer)

New assumption: underlying channel can also lose packets (data or ACKs)
• checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Q: how to deal with loss?
• sender waits until certain data or ACK lost, then retransmits
• Timer drawbacks?

Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
   - retransmission will be duplicate, but use of seq. #'s already handles this
   - receiver must specify seq # of pkt being ACKed
• requires countdown timer


rdt3.0: Sender

• State "Wait for call 0 from above":
   rdt_send(data)
   -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK0"
   rdt_rcv(rcvpkt)
   -> no action; stay
• State "Wait for ACK0":
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
   -> no action; stay
   timeout
   -> udt_send(sndpkt); start_timer; stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
   -> stop_timer; go to "Wait for call 1 from above"
• State "Wait for call 1 from above":
   rdt_send(data)
   -> sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK1"
   rdt_rcv(rcvpkt)
   -> no action; stay
• State "Wait for ACK1":
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
   -> no action; stay
   timeout
   -> udt_send(sndpkt); start_timer; stay
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
   -> stop_timer; go to "Wait for call 0 from above"
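
The timer-driven retransmission of the rdt3.0 sender can be exercised with a small simulation. This is a simplified sketch, not the FSM itself: it models loss only (no bit corruption), treats every loss as a timeout followed by a retransmission, and the function name run_rdt3 and its drop_data / drop_ack parameters are made up for illustration.

```python
def run_rdt3(messages, drop_data=(), drop_ack=()):
    """Alternating-bit stop-and-wait over a lossy channel.

    drop_data / drop_ack hold 0-based indices of the data transmissions
    and ACK transmissions that the channel loses.
    Returns (data delivered to the upper layer, number of data transmissions).
    """
    delivered = []
    expected = 0      # receiver: sequence bit it is waiting for
    data_sent = 0     # counts every udt_send of a data packet
    acks_sent = 0     # counts every ACK the receiver sends
    for i, msg in enumerate(messages):
        seq = i % 2
        while True:
            # Sender: udt_send(sndpkt); start_timer
            lost = data_sent in drop_data
            data_sent += 1
            if lost:
                continue              # timeout fires, retransmit
            # Receiver: deliver only if this is the expected sequence bit.
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            # Receiver ACKs the seq # of the packet it got (new or duplicate).
            ack_lost = acks_sent in drop_ack
            acks_sent += 1
            if ack_lost:
                continue              # timeout fires, retransmit (a duplicate)
            break                     # ACK arrived: stop_timer, next message
    return delivered, data_sent
```

For example, run_rdt3(["a", "b", "c"], drop_data={1}, drop_ack={0}) loses the first ACK and then the first retransmission; the receiver still delivers each message exactly once, at the cost of two extra data transmissions.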


rdt3.0 in action

[Figure: sender/receiver timelines for (a) operation with no loss, (b) lost packet, (c) lost ACK, (d) premature timeout.]


Performance of Stop & Wait (rdt3.0)
• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1 KB packet:

$$T_{transmit} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8\ \text{kb/pkt}}{10^{9}\ \text{b/sec}} = 8\ \mu\text{sec}$$

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008\ \text{ms}}{15\ \text{ms} + 15\ \text{ms} + 0.008\ \text{ms}} = 0.00027$$

• U_sender: utilization, the fraction of time the sender is busy sending
• 1 KB pkt every 30 msec = 33 kB/sec throughput over 1 Gbps link
• network protocol limits use of physical resources!


rdt3.0: stop-and-wait operation

[Figure: sender/receiver timeline.
   sender: first packet bit transmitted, t = 0
   sender: last packet bit transmitted, t = L / R
   receiver: first packet bit arrives
   receiver: last packet bit arrives, send ACK
   sender: ACK arrives, send next packet, t = RTT + L / R]

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$


Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
• range of sequence numbers must be increased
• buffering at sender and/or receiver

[Figure: data packets and ACK packets in flight for (a) a stop-and-wait protocol in operation and (b) a pipelined protocol in operation.]

• Two generic forms of pipelined protocols: Go-Back-N, selective repeat


Pipelining: Increasing Utilization

[Figure: sender/receiver timeline.
   sender: first packet bit transmitted, t = 0
   sender: last bit transmitted, t = L / R
   receiver: first packet bit arrives
   receiver: last packet bit arrives, send ACK
   receiver: last bit of 2nd packet arrives, send ACK
   receiver: last bit of 3rd packet arrives, send ACK
   sender: ACK arrives, send next packet, t = RTT + L / R]

Increase utilization by a factor of 3!

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$
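
Both utilization figures (stop-and-wait and three packets in flight) follow from the same formula, which a short function can reproduce. This is a sketch for checking the arithmetic; the function name and the window parameter are illustrative assumptions, and the result is capped at 1 because a sender cannot be busy more than all of the time.

```python
def sender_utilization(L_bits, R_bps, rtt_s, window=1):
    """Fraction of time the sender is busy sending.

    window is the number of packets sent back to back per RTT + L/R cycle
    (1 for stop-and-wait).
    """
    t_transmit = L_bits / R_bps
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))

# Example from the slides: 1 KB packet (8000 bits), 1 Gbps link, RTT = 30 ms.
u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_pipelined = sender_utilization(8000, 1e9, 0.030, window=3)
```

With these inputs u_stop_and_wait is about 0.00027 and u_pipelined about 0.0008.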


Go-Back-N

Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unACKed pkts allowed

[Figure: sequence-number line with send_base and nextseqnum marked and a window of size N. Seq #s are classified as: already ACK'd; sent, not yet ACK'd; usable, not yet sent; not usable.]

• ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
   - may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window


GBN: Sender extended FSM

Initial transition (start):
   base = 1
   nextseqnum = 1

Single state "Wait", with these transitions:

rdt_send(data)
   if (nextseqnum < base+N) {
      sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
      udt_send(sndpkt[nextseqnum])
      if (base == nextseqnum)
         start_timer
      nextseqnum++
   }
   else
      refuse_data(data)

timeout
   start_timer
   udt_send(sndpkt[base])
   udt_send(sndpkt[base+1])
   …
   udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && corrupt(rcvpkt)
   (no action)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
   base = getacknum(rcvpkt)+1
   if (base == nextseqnum)
      stop_timer
   else
      start_timer

Notes:
• The timer can be thought of as a timer for the oldest transmitted-but-not-yet-acknowledged packet.
• If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged.
• If an ACK is received but there are still additional transmitted-but-yet-to-be-acknowledged packets, the timer is restarted.

GBN: Receiver extended FSM

Initial transition (start):
   expectedseqnum = 1
   sndpkt = make_pkt(expectedseqnum,ACK,chksum)

Single state "Wait", with these transitions:

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
   extract(rcvpkt,data)
   deliver_data(data)
   sndpkt = make_pkt(expectedseqnum,ACK,chksum)
   udt_send(sndpkt)
   expectedseqnum++

default
   udt_send(sndpkt)

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
• may generate duplicate ACKs
• need only remember expectedseqnum
• out-of-order pkt:
   - discard (don't buffer) -> no receiver buffering!
   - re-ACK pkt with highest in-order seq #
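
How the GBN sender and receiver interact after a loss can be seen in a small round-based simulation. This is a sketch under strong simplifying assumptions, not the extended FSMs themselves: sequence numbers do not wrap, ACKs are never lost, a packet is lost at most once, and a timeout is modeled as "a round in which base did not advance". The names GBNReceiver, run_gbn, and lost_first_try are made up for illustration.

```python
class GBNReceiver:
    def __init__(self):
        self.expected = 1        # expectedseqnum
        self.delivered = []

    def receive(self, seq, data):
        """Return the cumulative ACK number sent back for this packet."""
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1
        # In order or not, ACK the highest in-order seq # received so far.
        return self.expected - 1


def run_gbn(messages, N, lost_first_try=()):
    """Send messages with seq #s 1..len(messages) using window size N.

    Packets whose seq # is in lost_first_try are dropped on their first
    transmission only. Returns (delivered data, seq #s in send order).
    """
    rx = GBNReceiver()
    base, nextseqnum = 1, 1
    total = len(messages)
    pending_loss = set(lost_first_try)
    sent_log = []
    while base <= total:
        # Send every packet the window currently allows.
        batch = []
        while nextseqnum < base + N and nextseqnum <= total:
            batch.append(nextseqnum)
            nextseqnum += 1
        old_base = base
        for seq in batch:
            sent_log.append(seq)
            if seq in pending_loss:
                pending_loss.discard(seq)   # channel drops this copy
                continue
            ack = rx.receive(seq, messages[seq - 1])
            base = max(base, ack + 1)       # cumulative ACK slides the window
        if base == old_base:
            nextseqnum = base               # timeout: go back and resend from base
    return rx.delivered, sent_log
```

With five messages, N = 4, and packet 2 lost once, the send order is 1, 2, 3, 4, 5, 2, 3, 4, 5: packets 3, 4, and 5 arrive out of order, are discarded, and are sent again after the timeout.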


GBN in action

[Figure: sender/receiver timeline of Go-Back-N in operation.]


Selective Repeat
• receiver individually acknowledges all correctly received pkts
   - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
   - sender timer for each unACKed pkt
• sender window
   - N consecutive seq #'s
   - again limits seq #s of sent, unACKed pkts


Selective repeat: sender, receiver windows



Selective repeat

Sender:
• data from above:
   - if next available seq # in window, send pkt
• timeout(n):
   - resend pkt n, restart timer
• ACK(n) in [sendbase,sendbase+N-1]:
   - mark pkt n as received
   - if n smallest unACKed pkt, advance window base to next unACKed seq #

Receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
   - send ACK(n)
   - out-of-order: buffer
   - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N,rcvbase-1]:
   - ACK(n)
• otherwise:
   - ignore
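
The receiver rules above translate almost line by line into code. The sketch below is illustrative only: the class name SRReceiver and its fields are assumptions, and sequence numbers are plain integers that never wrap around, which sidesteps the finite-sequence-space issue.

```python
class SRReceiver:
    def __init__(self, N, rcvbase=0):
        self.N = N
        self.rcvbase = rcvbase
        self.buffer = {}         # out-of-order packets, keyed by seq #
        self.delivered = []      # data passed to the upper layer, in order

    def receive(self, n, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # Deliver the in-order run starting at rcvbase, if any.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n             # already delivered: re-ACK so the sender can advance
        return None              # otherwise: ignore
```

For instance, with N = 4, receiving packet 1 before packet 0 buffers it; when packet 0 arrives, both are delivered in order and rcvbase moves to 2.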


Selective repeat in action



Selective repeat: dilemma

Example:
• seq #'s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?

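
One way to explore the question is to compare the seq #s the sender may still retransmit with the seq #s the receiver is ready to accept as new. The sketch below assumes the worst case behind the dilemma: the receiver has accepted a full window of N packets, but every ACK was lost, so the sender still holds the old window. The function name is made up for illustration. The standard answer it supports is that the window size must be at most half the size of the sequence-number space.

```python
def old_and_new_overlap(seq_space, N):
    """True if a retransmitted old packet can be mistaken for a new one."""
    # Seq #s the sender may retransmit (its window never advanced).
    old = {n % seq_space for n in range(0, N)}
    # Seq #s the receiver now accepts as new (its window advanced by N).
    new = {n % seq_space for n in range(N, 2 * N)}
    return bool(old & new)
```

With the slide's numbers, old_and_new_overlap(4, 3) is True (seq #s 0 and 1 are ambiguous), while old_and_new_overlap(4, 2) is False.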


TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• point-to-point:
   - one sender, one receiver
• reliable:
   - guaranteed arrival
   - no error
   - in-order delivery
• in-order byte stream:
   - no "message boundaries"
• pipelined:
   - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
   - bi-directional data flow in same connection
   - MSS: maximum segment size
• connection-oriented:
   - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
   - sender will not overwhelm receiver
• no delay or bandwidth guarantee

[Figure: the sending process writes data through its socket into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving process reads data through its socket.]


TCP Reliable Data Transfer
• TCP provides reliable data transfer service on top of IP's unreliable service.
• Cumulative ACKs.
• Single retransmission timer.
• When the receiver receives out-of-order segments, it buffers them and re-ACKs the last in-order data.
• The sender retransmits on timeout or on receiving duplicate ACKs.
• Somewhere between Go-Back-N and Selective Repeat.




TCP Segment Structure

Segment layout (each row is 32 bits wide):
 source port # | dest port #
 sequence number
 acknowledgement number
 head len | not used | U A P R S F | receive window
 checksum | urgent data pointer
 options (variable length)
 application data (variable length)

Field notes:
 sequence number: byte-stream number of the first data byte in the segment
 head len: header length, counted in 4-byte units
 URG: urgent data (generally not used)
 ACK: ACK # valid
 PSH: push data now (generally not used)
 RST, SYN, FIN: connection establishment (setup, teardown commands)
 receive window: # bytes rcvr willing to accept
 checksum: TCP checksum (as in UDP)
 options: maximum segment size, window scaling factor, time-stamping (RFCs: 854, 1323)


TCP Segment Structure (cont.)

[Figure: (a) The application passes 6000 bytes of data (bytes 1 to 6000) to TCP via rdt_send(data). (b) TCP breaks the data into six 1000-byte segments, each with its own TCP header, carrying Seq=1, 1001, 2001, 3001, 4001 and 5001.]
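The segmentation in the figure can be reproduced with a few lines of Python. This is only a sketch: the function name `segment_sequence_numbers` and the choice of 1 as the first byte number are assumptions made to match the slide, not part of TCP itself (real connections start from a negotiated initial sequence number).

```python
def segment_sequence_numbers(total_bytes, mss, first_byte=1):
    """Return (seq, length) pairs for a byte stream cut into MSS-sized segments.

    seq is the byte-stream number of the first data byte in each segment.
    first_byte=1 mirrors the slide; it is an illustrative assumption.
    """
    segments = []
    offset = 0
    while offset < total_bytes:
        length = min(mss, total_bytes - offset)
        segments.append((first_byte + offset, length))
        offset += length
    return segments


segments = segment_sequence_numbers(6000, 1000)
print([seq for seq, _ in segments])
```

For 6000 bytes and 1000-byte segments this yields the six sequence numbers shown on the slide.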
TCP seq #s and ACK #s

Seq. #s:
 byte-stream “number” of the first byte in the segment’s data
ACKs:
 seq # of the next byte expected from the other side
 cumulative ACK
Q: How does the receiver handle out-of-order segments?
 A: The TCP spec doesn’t say; it is up to the implementor.

[Figure: simple telnet scenario over time. The user at Host A types ‘C’; Host B ACKs receipt of ‘C’ and echoes back ‘C’; Host A ACKs receipt of the echoed ‘C’.]


TCP Round Trip Time and Timeout

TCP uses a timeout/retransmit mechanism to recover from lost segments.

Q: How to set the TCP timeout value?
 longer than RTT
 but RTT varies
 too short: premature timeout
 unnecessary retransmissions
 too long: slow reaction to segment loss

Q: How to estimate RTT?
 SampleRTT: measured time from segment transmission until ACK receipt
 ignore retransmissions
 SampleRTT will vary; we want the estimated RTT to be “smoother”
 average several recent measurements, not just the current SampleRTT


How to estimate max RTT?

 SampleRTT = propagation delay + queuing delay.
 Queuing delay is highly variable,
 so different RTT samples contain different random values of queuing delay.
 Chebyshev’s Theorem:
 MaxRTT = AverageRTT + k × DevRTT
 The probability that a sample lies more than k deviations from the average is at most 1/k².
 The result holds for ANY distribution of samples.
 In TCP:
 RetransmissionTimeOut = AverageRTT + 4 × DevRTT


Request for Comments: 2988, Nov 2000

 Until a round-trip time (RTT) measurement has been made for a segment sent between the sender and receiver, the sender should set
 RTO ← 3 secs.
 When the first RTT measurement SampleRTT is made, the host must set
1. EstimatedRTT ← SampleRTT
2. DevRTT ← SampleRTT/2
3. RTO ← EstimatedRTT + max(G, 4 × DevRTT), where G is the clock granularity.
 Experience has shown that finer clock granularities (G ≤ 100 msec) perform somewhat better than coarser granularities.
 When a subsequent RTT measurement SampleRTT' is made, a host must set
4. DevRTT ← (1 - β) × DevRTT + β × |EstimatedRTT - SampleRTT'|
5. EstimatedRTT ← (1 - α) × EstimatedRTT + α × SampleRTT'
6. RTO ← EstimatedRTT + max(G, 4 × DevRTT)
 The value of EstimatedRTT used in the update to DevRTT (step 4) is its value before EstimatedRTT itself is updated in step 5.
 Whenever RTO is computed, if it is less than 1 second then the RTO should be rounded up to 1 second.
 The above should be computed using α = 1/8 and β = 1/4.
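The RFC 2988 update rules translate almost line for line into code. The sketch below is illustrative only: the class name `RtoEstimator`, its attribute names, and the default granularity of 0.1 s are assumptions chosen for this example; times are in seconds.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the RFC 2988 rules above.

    Names and the default granularity are illustrative assumptions.
    """

    def __init__(self, granularity=0.1, alpha=1 / 8, beta=1 / 4, min_rto=1.0):
        self.granularity = granularity
        self.alpha = alpha
        self.beta = beta
        self.min_rto = min_rto
        self.estimated_rtt = None   # no measurement yet
        self.dev_rtt = None
        self.rto = 3.0              # initial RTO before any measurement

    def update(self, sample_rtt):
        if self.estimated_rtt is None:
            # First measurement (steps 1 and 2).
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Step 4 uses EstimatedRTT *before* step 5 updates it.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(self.estimated_rtt - sample_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        rto = self.estimated_rtt + max(self.granularity, 4 * self.dev_rtt)
        self.rto = max(rto, self.min_rto)   # round up to at least 1 second
        return self.rto


estimator = RtoEstimator()
for sample in (0.5, 0.5, 0.1):
    print(round(estimator.update(sample), 4))
```

Samples taken from retransmitted segments should not be fed to `update`, in line with the "ignore retransmissions" rule for SampleRTT.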


Example RTT estimation

[Figure: SampleRTT and EstimatedRTT in milliseconds (axis from 100 to 350) plotted against time in seconds (1 to 106).]




TCP reliable data transfer

 TCP creates an rdt service on top of IP’s unreliable service
 Pipelined segments
 Cumulative ACKs
 TCP uses a single retransmission timer
 TCP is a GBN-style protocol; RFC 2018 proposes a Selective Repeat style for TCP.
 Retransmissions are triggered by:
 timeout events
 duplicate ACKs
 Initially consider a simplified TCP sender:
 ignore duplicate ACKs
 ignore flow control, congestion control


TCP sender events:

Data rcvd from app:
 Create segment with seq #
 seq # is byte-stream number of first data byte in segment
 Start timer if not already running (think of timer as for oldest unacked segment)
 Expiration interval: RTO [TimeOutInterval]
Timeout:
 Retransmit segment that caused timeout
 Restart timer
ACK rcvd:
 If it acknowledges previously unacked segments
 update what is known to be acked
 start timer if there are outstanding segments


TCP Sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed
TCP: retransmission schemes

[Figure: Host A sends data segments of 1000 bytes each (numbered 1 to 26, starting at Seq=1) to Host B. B returns cumulative ACKs (Ack=4001, 12001, 18001, 20001, 25001), each advertising a receive window (Win = RcvWindow, with the values 4000, 10000, 5000, 7000 and 0 bytes appearing in the figure). A timeout at A triggers a retransmission. The ACKs from B are not shown in detail.]


TCP ACK generation [RFC 1122, RFC 2581]*

Event at receiver, followed by the TCP receiver action:

 Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
 Delayed ACK: wait up to 500 ms for the next segment. If no next segment arrives, send ACK.
 Arrival of in-order segment with expected seq #; one other segment has ACK pending.
 Immediately send a single cumulative ACK, ACKing both in-order segments.
 Arrival of out-of-order segment with higher-than-expected seq #; gap detected.
 Immediately send a duplicate ACK, indicating the seq # of the next expected byte.
 Arrival of segment that partially or completely fills the gap.
 Immediately send ACK, provided that the segment starts at the lower end of the gap.
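The cumulative-ACK side of these rules can be sketched in a few lines. This is a simplified model, not an implementation of RFC 1122: the class name `CumulativeAckReceiver` is an assumption, the delayed-ACK timer is left out entirely, and segments are assumed not to overlap partially.

```python
class CumulativeAckReceiver:
    """Tracks the next expected byte and buffers out-of-order segments.

    Simplifications (assumptions): no delayed-ACK timer, and segments never
    overlap partially.
    """

    def __init__(self, first_byte=1):
        self.expected = first_byte
        self.buffered = {}   # seq -> length of out-of-order segments

    def on_segment(self, seq, length):
        """Process one segment and return the ACK number to send."""
        if seq == self.expected:
            self.expected += length
            # A segment that fills the gap may release buffered data.
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
        elif seq > self.expected:
            # Gap detected: keep the data; the ACK below is a duplicate.
            self.buffered[seq] = length
        return self.expected


receiver = CumulativeAckReceiver()
for seq in (1, 2001, 3001, 1001):
    print(seq, "->", receiver.on_segment(seq, 1000))
```

In this run the two out-of-order arrivals both produce ACK 1001 (duplicate ACKs), and the segment that fills the gap produces a cumulative ACK covering everything buffered.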


TCP retransmission (Normal ACK)*

[Figure: Host A/Host B timeline. With SendBase=1, A sends a window of segments (Win1) starting at Seq=1 and runs the RTO timer; B’s ACKs are generated within 500 ms (T<500ms); SendBase advances to 4001 and A sends the next window (Win2).]

TCP retransmission (Lost ACK)*

[Figure: Host A/Host B timeline. A sends Win1 with SendBase=1; the RTO for Seq=1 expires and A resends Win1 with SendBase still 1 and a fresh RTO for Seq=1; once an ACK arrives, SendBase becomes 4001 and A continues with Win3.]

TCP retransmission (Premature ACK)*

[Figure: Host A/Host B timeline. A sends Win1; the RTO for Seq=1 expires and A resends Win1 with SendBase=1; SendBase then becomes 4001 and A continues with Win3.]


TCP ACK Generation (Illustrated)*

[Figure: Host A/Host B timeline.
 A sends Seq=2001; B replies Ack=3001: all data up to seq=3001 is ACKed.
 A sends Seq=3001 and Seq=4001 less than 500 ms apart; B replies with Ack=5001: all data up to seq=5001 is ACKed.
 Seq=8001 is expected, but Seq=12001 and then Seq=9001 arrive: each time a gap is detected and B sends a duplicate Ack=8001. Meanwhile the sender keeps transmitting based on Win and SendBase.
 Seq=8001 arrives: the received segment starts at the lower end of the gap, so B immediately sends an ACK (Ack=9001 in the figure).]


Duplicate ACKs

 The TCP receiver sends an immediate ACK if it receives an out-of-order segment.
 This is a duplicate ACK.
 The duplicate ACK tells the sender which sequence number the receiver expected.
 It is unclear whether duplicate ACKs indicate loss or simply packet reordering in the network.
 But multiple duplicate ACKs probably indicate loss.


Fast Retransmit

 The time-out period is often relatively long:
 long delay before resending a lost packet
 Detect lost segments via duplicate ACKs.
 The sender often sends many segments back-to-back.
 If a segment is lost, there will likely be many duplicate ACKs.
 If the sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
 fast retransmit: resend the segment before the timer expires


Fast Retransmit (Illustrated)

[Figure: Host A/Host B timeline. The RTO timer for Seq=1001 is running at Host A when three ACKs for 1001 (labelled 1, 2, 3) arrive from Host B. Three “ACKs=1001” mean that the segment “Seq=1001” is lost, so Host A resends the segment before the timer expires.]


Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {   /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y is 3) {
            resend segment with sequence number y   /* fast retransmit */
        }
    }
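A minimal Python sketch of the ACK-handling logic in the fast retransmit pseudocode follows. The class name `FastRetransmitSender` and the convention of returning the sequence number to resend (or `None`) are assumptions for illustration; timers and the actual transmission are left out.

```python
class FastRetransmitSender:
    """ACK processing with a duplicate-ACK counter (timers omitted)."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the sequence number to retransmit, or None."""
        if y > self.send_base:
            # New data is acknowledged; reset the duplicate counter.
            self.send_base = y
            self.dup_acks = 0
            return None
        if y == self.send_base:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == 3:
                return y   # fast retransmit: resend before the timer expires
        return None


sender = FastRetransmitSender(send_base=1001)
print([sender.on_ack(1001) for _ in range(3)])
```

The third duplicate ACK for 1001 triggers the retransmission of the segment starting at byte 1001; a later ACK with a larger value advances SendBase and clears the counter.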




TCP Flow Control

 The receive side of a TCP connection has a receive buffer: RcvBuffer.
 The app process may be slow at reading from the buffer.
 The sender won’t overflow the receiver’s buffer by transmitting too much, too fast.
 Speed-matching service: matching the send rate to the receiving app’s drain rate.

[Figure: data from IP enters RcvBuffer; the buffer holds TCP data on its way to the application process, and the spare room in the buffer is RcvWindow.]


TCP Flow Control: how it works

(Suppose the TCP receiver discards out-of-order segments.)

 Spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
 The rcvr advertises the spare room by including the value of RcvWindow in segments.
 The sender limits unACKed data to RcvWindow,
 which guarantees that the receive buffer doesn’t overflow.

[Figure: data from IP enters RcvBuffer; the buffer holds TCP data on its way to the application process, and the spare room in the buffer is RcvWindow.]
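The RcvWindow bookkeeping is simple arithmetic, sketched below. The receiver-side formula is the one given on the slide; the sender-side names `last_byte_sent` and `last_byte_acked` are assumptions introduced here to express "unACKed data", and the byte counts in the example are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises (formula from the slide)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_budget(window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still send while keeping unACKed data <= window.

    last_byte_sent and last_byte_acked are illustrative names.
    """
    unacked = last_byte_sent - last_byte_acked
    return max(0, window - unacked)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_budget(window, last_byte_sent=5000, last_byte_acked=4000))
```

When the application stops reading, LastByteRead stays fixed, the advertised window shrinks toward zero, and the sender's budget goes to zero with it.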




TCP Connection Management

Recall: TCP sender and receiver establish a “connection” before exchanging data segments.
 Initialize TCP variables:
 seq. #s
 buffers, flow control info (e.g. RcvWindow)
 Client: connection initiator
  Socket clientSocket = new Socket("hostname", portNumber);
 Server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
 specifies initial seq #
 no data
Step 2: server host receives SYN, replies with SYNACK segment
 server allocates buffers
 specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data


TCP Connection Management (cont.)

Three-way handshake:

[Figure: client/server timeline. The client sends the connection request, the server answers with connection accepted, and the client returns the connection ack. Win = RcvWindow, ISN = Initial Sequence Number.]
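The sequence and acknowledgement numbers exchanged in the three-way handshake can be traced with a small sketch. The dictionary layout and the function name `three_way_handshake` are illustrative assumptions; the one protocol fact relied on is that a SYN consumes one sequence number, so each side acknowledges the peer's ISN + 1.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries.

    The dictionary format is an illustrative assumption, not a wire format.
    """
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": syn["seq"] + 1}
    ack = {"flags": {"ACK"}, "seq": syn["seq"] + 1,
           "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(sorted(segment["flags"]), segment["seq"], segment.get("ack"))
```

The ISN values 100 and 300 are arbitrary; real stacks choose them so that old segments from earlier connections are unlikely to be mistaken for new ones.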


TCP Connection Management (cont.)

Closing a connection:
Either of the two processes participating in a TCP connection can end the connection.
FIN=1 means: no more data from sender.

Example: client closes socket:
 ClientSocket.close();

Step 1: client end system sends TCP FIN control segment to server.

[Figure: client/server timeline. Each side closes in turn; the client then stays in timed wait for 30, 60 or 120 sec. before the connection is closed.]


Final ACK loss

 If the client’s final ACK is lost, the server resends its ACK and FIN.
 If the resent ACK and FIN reach the client before the timed wait expires, the client resends its final ACK and waits again.
 After the timed wait expires, all resources on the client side are released (including port numbers).
 The timed wait lasts 30, 60 or 120 sec.; the time is implementation-dependent.

[Figure: client/server timeline showing both closes and the client’s timed wait.]


TCP Connection Management (cont.)

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK.
 Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.


TCP Connection Management (cont.)

[Figures: TCP server lifecycle; TCP client lifecycle.]


State Transition Diagram

Transitions are labelled EVENT/ACTION. The diagram marks the normal path for a client, the normal path for a server, and unusual events.

 CLOSED → LISTEN: Open/- (passive open)
 LISTEN → CLOSED: Close/-
 CLOSED → SYN_SENT: Connect/SYN (active open; step 1 of the 3-way handshake)
 SYN_SENT → CLOSED: Close/-
 LISTEN → SYN_RCVD: SYN/SYN + ACK (step 2 of the 3-way handshake)
 LISTEN → SYN_SENT: Send/SYN
 SYN_RCVD → LISTEN: RST/-
 SYN_SENT → SYN_RCVD: SYN/SYN + ACK
 SYN_SENT → ESTABLISHED: SYN + ACK/ACK (step 3 of the 3-way handshake)
 SYN_RCVD → ESTABLISHED: ACK/-
 SYN_RCVD → FIN_WAIT_1: Close/FIN
 ESTABLISHED → FIN_WAIT_1: Close/FIN (active close)
 ESTABLISHED → CLOSE_WAIT: FIN/ACK (passive close)
 FIN_WAIT_1 → FIN_WAIT_2: ACK/-
 FIN_WAIT_1 → CLOSING: FIN/ACK
 FIN_WAIT_2 → TIME_WAIT: FIN/ACK
 CLOSING → TIME_WAIT: ACK/-
 CLOSE_WAIT → LAST_ACK: Close/FIN
 LAST_ACK → CLOSED: ACK/-
 TIME_WAIT → CLOSED: Timeout/- (timed wait of 30, 60 or 120 sec; back to start)
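A state diagram like this one maps naturally onto a lookup table. The sketch below encodes only the normal client and server paths; the event strings ("connect", "recv SYN", and so on) are hypothetical labels chosen for this example rather than names taken from any TCP implementation.

```python
# (state, event) -> (next_state, action); event names are illustrative.
TRANSITIONS = {
    ("CLOSED", "connect"): ("SYN_SENT", "send SYN"),
    ("CLOSED", "open"): ("LISTEN", None),
    ("LISTEN", "recv SYN"): ("SYN_RCVD", "send SYN+ACK"),
    ("SYN_SENT", "recv SYN+ACK"): ("ESTABLISHED", "send ACK"),
    ("SYN_RCVD", "recv ACK"): ("ESTABLISHED", None),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "send FIN"),
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "timeout"): ("CLOSED", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(events, state="CLOSED"):
    """Follow a list of events and return the sequence of states visited."""
    path = [state]
    for event in events:
        state, _action = TRANSITIONS[(state, event)]
        path.append(state)
    return path


client = run(["connect", "recv SYN+ACK", "close", "recv ACK", "recv FIN", "timeout"])
server = run(["open", "recv SYN", "recv ACK", "recv FIN", "close", "recv ACK"])
print(" -> ".join(client))
print(" -> ".join(server))
```

An event that is not valid in the current state raises `KeyError` here; a real stack would instead send RST or ignore the segment, which this sketch does not model.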


TCP State Diagram

There are several things that must be remembered about a connection. To store this information we imagine that there is a data structure called a Transmission Control Block (TCB).

[Figure: TCP state diagram with the states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, CLOSE_WAIT, LAST_ACK and TIME_WAIT. Labelled transitions include: passive open, create TCB; applic. close or timeout, delete TCB; receive SYN, send ACK; applic. close, send FIN; 2MSL timeout, delete TCB.]


Principles of Congestion Control

[Figure: three sources send to three sinks over 100 Mbps and 10 Mbps links that share a 1.5 Mbps bottleneck link.]

 Congestion: “too many sources sending too much data too fast for the network to handle and competing for bottleneck bandwidth”
 Two common approaches:
 rate-based: control rate of traffic (e.g., token bucket)
 window-based: limit number of unacknowledged packets
 window size controls rate
 Flow control = prevents end-system buffer overflow
 window-based control can be used for both.


Congestion: A Close-up View

 knee: point after which
 throughput increases very slowly
 delay increases fast
 cliff: point after which
 throughput starts to decrease very fast to zero (congestion collapse)
 delay approaches infinity
 Note (in an M/M/1 queue)
 delay = 1/(1 - utilization)

[Figures: throughput versus offered load, marking the knee, the cliff, packet loss and congestion collapse; delay versus offered load.]


Congestion Control vs. Congestion Avoidance

 Congestion control goal
 stay left of the cliff
 keep the network operating at full capacity, but minimize packet loss
 maximize “goodput”
 Congestion avoidance goal
 stay left of the knee
 Right of the cliff:
 congestion collapse

[Figure: throughput versus offered load, marking the knee, the cliff, packet loss and congestion collapse.]


Congestion Collapse
 Definition: Increase in network load results in decrease
of useful work done
 Many possible causes
 Spurious retransmissions of packets still in flight
 Undelivered packets
 Packets consume resources and are dropped elsewhere in network
 Fragments
 Mismatch of transmission and retransmission units
 Control traffic
 Large percentage of traffic is for control
 Stale or unwanted packets
 Packets that are delayed on long queues



Causes/Costs of Congestion: scenario 1

[Figure: Host A and Host B each send original data at rate λin [B/s] through a router with unlimited shared output link buffers onto a shared link of capacity R [B/s]; λout is the delivered rate.]

 Applications in A and B send into the connection at an average rate of λin bytes/sec.
 “Original” data: sent into the socket only once.
 Simple transport protocol: no error recovery (retransmission), flow control, or congestion control.


Causes/Costs of Congestion: scenario 1 (cont.)

[Figure: (a) Per-connection throughput: λout [Byte/s] versus offered load λin [Byte/s]; λout = λin up to R/2, after which λout = R/2. (b) Per-connection delay: delay (ms) versus offered load λin [Byte/s], with the load axis marked at R/2.]


Scenario 2: Two Senders, a Router with Finite Buffers
 one router, finite buffers
 sender retransmission of lost packets

[Figure: Host A and Host B send through a router with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data (offered load to network); λout: delivered data.]


Scenario 2: (cont.)

 Each connection is reliable. If a packet containing a transport-level segment is dropped at the router, it will eventually be retransmitted by the sender.
 λin [Bytes/sec] = rate at which the application sends original data into the socket.
 λ'in [Bytes/sec] = offered load to the network (containing original data or retransmitted data).


Scenario 2: retransmission due to lost packet (perfect retransmission)

[Figure: throughput λout [Byte/s] versus offered load λ'in [Byte/s]; the axes are marked at R/2, R/3 and R/4.]

Example: at λ'in = R/2, λout = R/3.

 λ'in = R/2 = 0.333R Bytes/sec (on average) of original data + 0.167R Bytes/sec (on average) of retransmitted data.
 Cost of a congested network: the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.


Scenario 2: retransmission due to delayed (not lost) packet

[Figure: throughput λout [Byte/s] versus offered load λ'in [Byte/s]; the axes are marked at R/2, R/3 and R/4. Retransmission is due to delayed (not lost) packets; each packet is assumed to be forwarded (on average) twice by the router.]


Scenario 2: retransmission due to delayed (not lost) packet

 The sender times out and retransmits a packet that has been delayed in the queue, but not yet lost.
 Both the original data packet and the retransmission may reach the receiver. The receiver will discard the retransmission.
 The "work" done by the router in forwarding the retransmitted copy of the original packet was "wasted", as the receiver will have already received the original copy of this packet.
 Cost of a congested network: unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.


Causes/Costs of Congestion: scenario 3
Suppose:
 Each host uses a timeout/retransmission mechanism,
 All hosts have the same value of λin, and
 All router links have capacity R Bytes/sec.

Paths:
 A-C: R1, R2
 C-A: R3, R4
 D-B: R3, R2
 B-D: R1, R4

[Figure: Hosts A, B, C and D are connected through routers R1, R2, R3 and R4, each with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered data.]


Causes/Costs of Congestion: scenario 3 (cont.)

 Extremely small λin: buffer overflows are rare, and λout = λ'in.
 Larger λin: overflows are still rare. Thus, an increase in λin results in an increase in λout.


Causes/Costs of Congestion: scenario 3 (cont.)
 Extremely large λin (and hence λ'in): A-C traffic arriving at R2 is at most R.
 If λ'in is extremely large for all connections, then the arrival rate of B-D traffic at R2 can be much larger than that of the A-C traffic.
 As the offered load approaches infinity, an empty buffer at R2 is immediately filled by a B-D packet, and the throughput of the A-C connection at R2 goes to zero.
 When a packet is dropped (in R2), any “upstream” (R1) transmission capacity used for that packet was wasted!
 As the offered load approaches infinity, the throughput goes to zero.


Scenario 3 (cont.)

Congestive collapse: although the network links are being heavily utilized, very little useful work is being done.

[Figure: throughput λout [Byte/s] versus offered load λ'in [Byte/s], with both axes marked at R/2, shown next to the topology of Hosts A, B, C, D and routers R1, R2, R4.]

A-C and B-D traffic compete at router R2 for the buffer. The A-C traffic that successfully gets through R2 becomes smaller and smaller as the offered load from B-D gets larger and larger.


Approaches towards congestion control

Two broad approaches towards congestion control:

End-to-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system observed loss, delay
 approach taken by TCP

Network-assisted congestion control:
 routers provide feedback to end systems
 single bit indicating congestion (SNA, DECnet, TCP/IP ECN, ATM)
 explicit rate sender should send at


Congestion Control

[Figure: throughput versus offered load for a controlled and an uncontrolled network, illustrating the effect of a lack of congestion control.]


Load, delay and power

Typical behavior of queueing systems:
[Figure: average packet delay versus offered load.]

A simple metric of how well the network is performing:
 Power = Load / Delay
[Figure: power versus load, peaking at the “optimal load”.]
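To see where the “optimal load” falls, the power metric can be evaluated under the M/M/1 delay formula quoted earlier in these slides, delay = 1/(1 - utilization). The sketch below makes two assumptions that are labelled in the code: load is expressed as utilization between 0 and 1, and delay is normalized to the service time.

```python
def mm1_delay(utilization):
    """Normalized M/M/1 delay: 1 / (1 - utilization). Assumes 0 <= u < 1."""
    return 1.0 / (1.0 - utilization)


def power(utilization):
    """Power = Load / Delay, with load expressed as utilization (assumption)."""
    return utilization / mm1_delay(utilization)


# Scan utilizations 0.01 .. 0.99 and pick the one with the highest power.
candidates = [i / 100 for i in range(1, 100)]
optimal_load = max(candidates, key=power)
print(optimal_load, power(optimal_load))
```

Under these assumptions power equals u × (1 - u), which peaks at a utilization of one half; other queueing models would move the peak.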


Congestion Avoidance
 Drops are the only widely used indicator of congestion
 TCP - drops and retransmissions
» Congestion collapse
» TCP’s congestion avoidance (Jacobson)

[Figures: “Congestion Collapse” and “Congestion Avoidance”, each plotting load and goodput in Kbytes/sec against time.]




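TCP in action

[Figure: TCP sources send over an inbound link into a router queue managed by AQM and out over the outbound link to the sinks. The router drops a packet (“Drop!!!”); ACKs flow back from the sinks to the TCP sources and carry the congestion notification.]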


FLAVORS OF TCP

 As far as congestion control is concerned, there are several TCP flavors:
 TCP – Tahoe
 TCP – Reno
 TCP – New Reno
 TCP – SACK
 TCP – Vegas
 Also: Transaction TCP (T/TCP), RFC 1644


T/TCP

 Problem: the TCP 3-way handshake is expensive for very short connections
 (like RPC or web requests)
 Approach: Transaction TCP
 send SYN+ACK+data in first packet
 reply with SYN+ACK+FIN+data
 then ACK+FIN
 Limitations
 have to cache ISN (Initial Sequence Number) information, and may have to fall back to the 3-way handshake sometimes
 experimental only, not deployed, not clearly bug-free


TCP Congestion Control
 end-end control (no network assistance)
 CongWin is dynamic, a function of perceived network congestion
 manifestations:
 lost packets (buffer overflow at routers)
 long delays (queuing in router buffers)

How does the sender perceive congestion?
 loss event = timeout or 3 duplicate ACKs
 TCP sender reduces rate (CongWin) after a loss event
Three mechanisms:
 slow start
 AIMD
 conservative after timeout events


TCP Congestion Control and Slow Start

Congestion window size: TCP Tahoe and TCP Reno (RFC 2581)

 Initial congestion window threshold = 8 MSS
 Slow start (exponential increase):
 hits the threshold at the fourth transmission
 Retransmission timeout (Tahoe and Reno):
 new CongWin = 1
 new threshold = 12/2
 window then grows exponentially
 3 ACKs after the eighth transmission, TCP Tahoe:
 new CongWin = 1
 new threshold = 12/2
 window then grows exponentially
 3 ACKs after the eighth transmission, TCP Reno:
 new CongWin = 12/2
 window then grows linearly

• 3 ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 ACKs is “more alarming”

[Figure: congestion window size [MSS] (0 to 14) versus number of transmission (1 to 15) for TCP Tahoe and TCP Reno, marking the threshold and the 3 ACKs event.]
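The Tahoe and Reno curves in this example can be regenerated with a short simulation. It is a sketch under stated assumptions: one window value per transmission round, windows counted in whole MSS, a single loss event detected by 3 duplicate ACKs, and slow start capped at the threshold. The function name `cwnd_trace` and its parameters are made up for this illustration.

```python
def cwnd_trace(flavor, rounds=15, loss_round=8, threshold=8):
    """Congestion window (in MSS) per transmission round.

    flavor is "tahoe" or "reno"; the loss at loss_round is assumed to be
    detected via 3 duplicate ACKs. Capping slow start at the threshold is
    a modelling assumption.
    """
    cwnd = 1
    trace = []
    for round_number in range(1, rounds + 1):
        trace.append(cwnd)
        if round_number == loss_round:
            threshold = cwnd // 2
            # Reno halves the window; Tahoe restarts from 1 MSS.
            cwnd = threshold if flavor == "reno" else 1
        elif cwnd < threshold:
            cwnd = min(cwnd * 2, threshold)   # slow start
        else:
            cwnd += 1                         # congestion avoidance
    return trace


print("Tahoe:", cwnd_trace("tahoe"))
print("Reno: ", cwnd_trace("reno"))
```

Both traces reach the threshold of 8 MSS at the fourth transmission and 12 MSS at the eighth; after the loss Tahoe restarts from 1 MSS while Reno continues linearly from 6 MSS.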


Slow Start

[Figure: rate versus time t. Exponential “slow start” phases are interrupted by timeouts, after which the rate is halved.]

Slow start is in operation until the window reaches half of the previous cwnd.

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window’s worth of data.
(Nick McKeown)
TCP Congestion Control - 2

 Slow-start and Congestion Avoidance

[Figure: cwnd versus time. During slow start cwnd goes 1, 2, 4 in successive RTTs; during congestion avoidance it goes from W to W+1 in one RTT.]

TCP-Reno Behavior

[Figure: cwnd (1 to 8) versus time, with packet losses marked X, O and Y, through the phases slow start, congestion avoidance, timeout and slow start again.]

Slow Start Sequence Plot: Window Doubles every Round

[Figure: cwnd versus time.]

Congestion Avoidance Sequence Plot: Window grows by 1 every round

[Figure: cwnd versus time.]

Fast Retransmit

[Figure: cwnd versus time, marking a loss (X), the duplicate ACKs and the retransmission.]

Cwnd of TCP

[Figure: cwnd over time, marking slow start, fast recovery and congestion avoidance.]

Queue Size

[Figure: queue size over time, marking queue empty, queue full and queue not full.]


TCP-Reno AIMD
(Additive Increase, Multiplicative Decrease)

 TCP sources change the sending rate by modifying the window size:
 Window = min {Advertised window, Congestion window}
 (the advertised window is the receive window; the congestion window is cwnd)
 In other words, send at the rate of the slowest component: network or receiver.
 “cwnd” follows additive increase/multiplicative decrease (AIMD)


TCP-Reno AIMD

 multiplicative decrease: cut CongWin in half after a loss event
 additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: CongWin (8, 16, 24 Kbytes) versus time for a long-lived TCP connection: a sawtooth bounded by the limitation of the network and the receive window, around the optimal (average) window size.]
TCP-Reno Slow Start (more)

 Rate(t) = MSS × CongWin(t) / RTT [Bytes/sec]
 When the connection begins, CongWin = 1 MSS
 Example: MSS = 500 B (4000 b), RTT = 200 msec
 initial rate = 4000 b / 200 msec = 20 kbps
 When the connection begins, increase the rate exponentially until the first loss event:
 double CongWin every RTT
 done by incrementing CongWin for every ACK received

[Figure: Host A/Host B timeline. The rate doubles each RTT: 20 kbps, 40 kbps, 80 kbps.]
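The rates in this example follow directly from Rate = MSS × CongWin / RTT. The helper below is a small sketch (the name `rate_bps` is an assumption) that works in bits per second so the numbers match the figure.

```python
def rate_bps(mss_bytes, cong_win_segments, rtt_seconds):
    """Sending rate in bits per second for a window of cong_win_segments."""
    return mss_bytes * 8 * cong_win_segments / rtt_seconds


# CongWin doubles every RTT during slow start: 1, 2, 4 segments.
rates = [rate_bps(500, 2 ** n, 0.2) for n in range(3)]
print([round(r / 1000) for r in rates], "kbps")
```

With MSS = 500 B and RTT = 200 msec this gives 20, 40 and 80 kbps for the first three round trips.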


TCP sender congestion control

 Event: ACK receipt for previously unacked data. State: Slow Start (SS).
 Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance”.
 Commentary: results in a doubling of CongWin every RTT.
 Event: ACK receipt for previously unacked data. State: Congestion Avoidance (CA).
 Action: CongWin = CongWin + MSS × (MSS/CongWin).
 Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
 Event: loss event detected by triple duplicate ACK. State: SS or CA.
 Action: Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance”.
 Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
 Event: timeout. State: SS or CA.
 Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”.
 Commentary: enter slow start.
 Event: duplicate ACK. State: SS or CA.
 Action: increment duplicate ACK count for the segment being acked.
 Commentary: CongWin and Threshold not changed.


TCP throughput

 What is the average throughput of TCP as a function of window size and RTT?
 Ignore slow start.
 Let W = MSS × CongWin be the window size when loss occurs.
 When the window is W, throughput is W/RTT.
 Just after loss, the window drops to W/2 and throughput to W/(2 × RTT).
 Average throughput: 0.75 W/RTT
 Example: MSS = 1500-byte segments, RTT = 100 ms, want 7.5 Gbps average throughput
 Requires window size CongWin = 83,333 in-flight segments
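The 83,333-segment figure can be checked by inverting the average-throughput formula: an average of 0.75 × W/RTT means the window at loss time must carry average/0.75 bits per second. The function name below is an assumption made for this sketch.

```python
def required_window_segments(avg_throughput_bps, rtt_seconds, mss_bytes):
    """Window at loss time (in segments) for a target average throughput.

    Uses average throughput = 0.75 * W / RTT, ignoring slow start.
    """
    peak_bps = avg_throughput_bps / 0.75   # W / RTT
    window_bits = peak_bps * rtt_seconds   # W
    return window_bits / (mss_bytes * 8)


segments = required_window_segments(7.5e9, 0.1, 1500)
print(int(segments))
```

An average of 7.5 Gbps corresponds to a peak of 10 Gbps, which over a 100 ms RTT is 10^9 bits in flight, or about 83,333 segments of 1500 bytes.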


TCP Futures

 Throughput in terms of loss rate e:
 Throughput = (1.22 × MSS) / (RTT × √e)
 For:
 1 Gbps throughput,
 RTT = 100 ms and
 MSS = 1500 Byte
 e = 2.14 × 10^-8
 New versions of TCP for high-speed networks are needed!
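Solving the throughput formula for the loss rate confirms the number on this slide. The sketch below assumes the formula Throughput = 1.22 × MSS / (RTT × √e) with MSS converted to bits; the function name is illustrative.

```python
def required_loss_rate(throughput_bps, rtt_seconds, mss_bytes):
    """Loss rate e that sustains the given throughput.

    Inverts Throughput = 1.22 * MSS / (RTT * sqrt(e)), with MSS in bits.
    """
    sqrt_e = 1.22 * mss_bytes * 8 / (rtt_seconds * throughput_bps)
    return sqrt_e ** 2


loss_rate = required_loss_rate(1e9, 0.1, 1500)
print(f"{loss_rate:.3g}")
```

For 1 Gbps, RTT = 100 ms and MSS = 1500 bytes the required loss rate is roughly 2.14 × 10^-8, that is, about one loss in every 47 million segments.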


Problems with TCP-Reno

 TCP Reno uses two mechanisms to detect packet losses:
 triple duplicate ACKs,
 timeout.
 Triple duplicate ACKs often fail to be triggered due to either
 losses in bursts,
 a small window.
 A timeout incurs an unnecessarily long delay.
 Congestion control in Reno:
 needs to create packet losses to find the available bandwidth of the connection,
 continually congests the network,
 creates losses for other connections sharing the link,
 oscillates.


Performance Evaluation of Vegas

1 MByte Transfer Over the Internet

                        Reno     Vegas-1,3   Vegas-2,4
Throughput (KB/s)       53.00    72.50       75.30
Throughput Ratio        1.00     1.37        1.42
Retransmission (KB)     47.80    24.50       29.30
Retransmission Ratio    1.00     0.51        0.61
Coarse Timeouts         3.30     0.80        0.90

Effect of Transfer Size Over the Internet

                        1024KB            512KB             128KB
                        Reno     Vegas    Reno     Vegas    Reno     Vegas
Throughput (KB/s)       53.00    72.50    52.00    72.00    31.10    53.10
Throughput Ratio        1.00     1.37     1.00     1.38     1.00     1.71
Retransmission (KB)     47.80    24.50    27.90    10.50    22.90    4.00
Retransmission Ratio    1.00     0.51     1.00     0.38     1.00     0.17
Coarse Timeouts         3.30     0.80     1.70     0.20     1.10     0.20


New Retransmission Mechanism: Vegas

 Upon receiving a duplicate ACK or an ACK for a retransmitted packet, Vegas checks the time elapsed since the next unacknowledged packet (the one following the just-ACKed packet) was sent.
 If the time interval is greater than the timeout value, the packet is retransmitted without waiting for triple duplicate ACKs.
 CWND is decreased only if the retransmitted packet was sent after the last decrease.

Example timeline (the figure marks two intervals of one RTT each):
 Received ACK for packet 10 (packets 11 and 12 are in transit)
 Send packet 13 (which is lost)
 Received ACK for packet 11
 Send packet 14
 Received ACK for packet 12
 Send packet 15 (which is also lost)
 Should have gotten ACK for packet 13
 Received dup ACK for packet 12 (due to packet 14)
 Vegas checks timestamp of packet 13 and decides to retransmit it (Reno would need to wait for the 3rd duplicate ACK)
 Received ACK for packets 13 and 14
 Since it is the 1st or 2nd ACK after retransmission, Vegas checks timestamp of packet 15 and decides to retransmit it (Reno would need to wait for 3 new duplicate ACKs)
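The timestamp check described on this slide can be sketched as a single decision function. This is a simplified illustration, not the Vegas implementation: the function name, its parameters, and the use of plain floating-point seconds are all assumptions made for the example.

```python
def vegas_on_ack(now, next_unacked_sent_at, timeout, last_cwnd_decrease_at):
    """Decide what to do on a duplicate ACK, or on an early ACK after a retransmission.

    Returns (retransmit, decrease_cwnd):
      retransmit    - the next unacknowledged packet has been outstanding longer
                      than the timeout, so resend it without waiting for 3 dup ACKs
      decrease_cwnd - shrink the window only if that packet was sent after the
                      last window decrease (otherwise the loss was already paid for)
    """
    retransmit = (now - next_unacked_sent_at) > timeout
    decrease_cwnd = retransmit and next_unacked_sent_at > last_cwnd_decrease_at
    return retransmit, decrease_cwnd


# Packet sent at t=0.2 s, dup ACK arrives at t=1.0 s, timeout 0.5 s
print(vegas_on_ack(1.0, 0.2, 0.5, last_cwnd_decrease_at=0.0))  # (True, True)
print(vegas_on_ack(1.0, 0.2, 0.5, last_cwnd_decrease_at=0.3))  # (True, False)
```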




Multimedia Stream & TCP

 Window control is not really appropriate for multimedia applications:
 time-scale too short (~ RTT) → constantly switching codecs
 visible or audible transitions
 TCP may start at or drop below the minimum codec rate.
 Flow control is not needed, since the receiver will need to process data at the nominal (codec) rate.
 TCP's reliability mechanism may impose additional delay (> 500 ms) on packet loss.
 Thus, only want to maintain the same long-term rate as TCP:
 no encouragement to mask file transfer as video
 react to congestion and bandwidth bottlenecks.


TCP Friendliness
 The Internet will soon begin to require applications either to use TCP or to perform congestion control.
 Applications that do not perform congestion control will be penalized,
 probably in the form of preferentially dropping their packets during times of congestion.
 Congestion-controlled applications can run over a much wider range of bandwidths and are hence more useful in the Internet.
 Any new congestion control must compete with TCP flows:
 It should not clobber TCP flows and grab the bulk of the link.
 It should also be able to hold its own, i.e. grab its fair share, or it will never become popular.


TCP Friendly Rate Control (TFRC)
 Non-TCP applications mimic AIMD behavior, possibly with longer timescales
 can also change the A and D parameters (GAIMD)
 They send at rate B (TCP throughput equation, TCP Reno):
 Round-trip delay RTT
 Packet size MSS [bytes]
 Loss event rate e (receiver feedback every RTT)
 Retransmission timeout RTO, b = 2

$B = \dfrac{MSS}{RTT\sqrt{\dfrac{2be}{3}} + RTO \times 3\sqrt{\dfrac{3be}{8}} \times e \times (1 + 32e^{2})}$ [bytes/sec]

[Figure: non-TCP flows and TCP flows sharing the Internet]


Internet Measurements
 3 TCP connections and 1 TFRC connection
 London (UCL) to Berkeley (ACIRI).
 Throughput measured over 1 sec intervals

TFRC much more stable than TCP


Datagram Congestion Control Protocol
 Delay-sensitive applications, such as streaming media, typically prefer timeliness to reliability.
 These applications use UDP for transport and either implement their own congestion control mechanisms (a difficult task) or use no congestion control at all.
 DCCP is a transport protocol, standardized by the IETF (RFC 4340), that provides a congestion-controlled flow of unreliable datagrams.
 DCCP is an unreliable transport protocol like UDP, but it has congestion control like TCP.
 The protocol can be extended by adding new congestion control algorithms, such as TCP-Friendly Rate Control (TFRC), as profiles, in order to customize the congestion control for applications with different characteristics.




TCP Fairness

 If N TCP sessions share a bottleneck link, then each session should get 1/N of the link capacity.

[Figure: source1 to source4 send to sink1 to sink4 through a shared 1.5 Mbps bottleneck link; the link delays shown are 5 ms, 10 ms, 30 ms, 100 ms and 200 ms]


Why is TCP fair?

Two competing sessions:
 Additive increase gives a slope of 1, as throughput increases
 Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]


Fairness (more)

Fairness and UDP
 Multimedia apps often do not use TCP
 do not want rate throttled by congestion control
 Instead use UDP:
 pump audio/video at constant rate, tolerate packet loss
 Research area: TCP friendly

Fairness and parallel TCP connections
 Nothing prevents an app from opening parallel connections between 2 hosts.
 Web browsers do this
 Example: link of rate R supporting 9 connections;
 new app asks for 1 TCP, gets rate R/10
 new app asks for 11 TCPs, gets about R/2!
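The arithmetic behind the parallel-connection example can be written out explicitly. The sketch assumes, as the slide does, that every TCP connection on the link gets an equal share; the function name `app_share` is made up for the illustration.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by an app that opens `new_connections`
    connections on a link already carrying `existing_connections`,
    assuming the link is split equally per connection."""
    return Fraction(new_connections, existing_connections + new_connections)


print(app_share(9, 1))   # 1/10
print(app_share(9, 11))  # 11/20, i.e. slightly more than R/2
```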




TCP modeling

Q: How long does it take to receive an object from a Web server after sending a request?

Ignoring congestion, delay is influenced by:
 TCP connection establishment
 data transmission delay
 slow start

Notation, assumptions:
 Assume one link between client and server of rate R
 S: segment size (bits)
 O: object (file) size (bits)
 no retransmissions (no loss, no corruption)

Window size:
 First assume: fixed congestion window, W segments
 Then dynamic window, modeling slow start


Fixed Congestion Window (1)

First case: WS/R > RTT + S/R
The ACK for the first segment in the window returns before a window's worth of data has been sent.

Delay = 2RTT + O/R

[Figure: client/server timing diagram]


Fixed Congestion Window (2)

Second case: WS/R < RTT + S/R
The server waits for an ACK after sending a window's worth of data.

Delay = 2RTT + O/R + (K-1) [S/R + RTT - WS/R]

where K = ⌈O/(WS)⌉ is the number of windows that cover the object.

[Figure: client/server timing diagram]
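The two fixed-window cases can be combined into one function. This is a minimal sketch of the model on these slides (one link, no loss); the function name and argument order are illustrative choices, with O and S in bits, R in bits per second and RTT in seconds.

```python
import math


def fixed_window_delay(O, S, R, RTT, W):
    """Delay to fetch an object of O bits with a fixed congestion window of W segments."""
    base = 2 * RTT + O / R                 # connection setup + request + transmission
    stall = S / R + RTT - W * S / R        # idle time after each window, if positive
    if stall <= 0:
        return base                        # first case: ACKs return before the window is exhausted
    K = math.ceil(O / (W * S))             # number of windows that cover the object
    return base + (K - 1) * stall          # second case: server stalls after all but the last window


# Toy numbers: 8 segments of 1 bit on a 1 bit/s link, RTT = 2 s
print(fixed_window_delay(O=8, S=1, R=1, RTT=2, W=4))  # window large enough, no stalls
print(fixed_window_delay(O=8, S=1, R=1, RTT=2, W=2))  # window too small, server stalls
```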


TCP Delay Modeling: Slow Start

Suppose the window grows according to slow start.

It will be shown that the delay for one object is:

$\text{Delay} = 2RTT + \dfrac{O}{R} + P\left[RTT + \dfrac{S}{R}\right] - (2^{P} - 1)\dfrac{S}{R}$

where P is the number of times TCP idles at the server:

$P = \min\{Q, K - 1\}$

- where Q is the number of times the server would idle if the object were of infinite size,
- and K is the number of windows that cover the object.


TCP Delay Modeling: Slow Start (cont)

Delay components:
• 2 RTT for connection establishment and request
• O/R to transmit the object
• time the server idles due to slow start

Server idles: P = min{K-1,Q} times

Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1,Q} = 2
Server idles P = 2 times

[Figure: client/server timeline. The client initiates the TCP connection and requests the object; the server sends the first window (S/R), second window (2S/R), third window (4S/R) and fourth window (8S/R), idling after the first and second windows, until transmission is complete and the object is delivered]


TCP Delay Modeling: Slow Start (cont)

$\dfrac{S}{R} + RTT$ = time from when the server begins to transmit the 1st segment until it receives an acknowledgment for that segment

$2^{k-1}\dfrac{S}{R}$ = total transmission time for the kth window

$\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$ = idle time after the kth window, where $[x]^{+} = \max(x, 0)$

$\text{delay} = \dfrac{O}{R} + 2RTT + \sum_{p=1}^{P} \text{idleTime}_p$

$= \dfrac{O}{R} + 2RTT + \sum_{k=1}^{P}\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]$

$= \dfrac{O}{R} + 2RTT + P\left[RTT + \dfrac{S}{R}\right] - (2^{P} - 1)\dfrac{S}{R}$

[Figure: timing of the kth window, which contains m = 2^(k-1) segments: the server starts to send the kth window, the 1st segment is sent after S/R, the mth segment is sent after 2^(k-1) S/R, and the server receives the 1st ACK after S/R + RTT]


TCP Delay Modeling: Slow Start (cont)

Recall K = number of windows that cover the object.

How is K calculated?

$K = \min\{k : 2^{0}S + 2^{1}S + \cdots + 2^{k-1}S \ge O\}$

$= \min\{k : 2^{0} + 2^{1} + \cdots + 2^{k-1} \ge O/S\}$

$= \min\{k : 2^{k} - 1 \ge \dfrac{O}{S}\}$

$= \min\{k : k \ge \log_2(\dfrac{O}{S} + 1)\}$

$= \left\lceil \log_2\left(\dfrac{O}{S} + 1\right) \right\rceil$


TCP Delay Modeling: Slow Start (cont)

Calculation of Q, the number of idles for an infinite-size object.

How is Q calculated?

$\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$ = idle time after the kth window

$Q = \max\left\{k : \dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R} \ge 0\right\}$

$= \max\left\{k : 2^{k-1} \le 1 + \dfrac{RTT}{S/R}\right\}$

$= \max\left\{k : k \le \log_2\left(1 + \dfrac{RTT}{S/R}\right) + 1\right\}$

$= \left\lfloor \log_2\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

[Figure: timing of the kth window of m segments: the server starts to send the kth window, the 1st segment is sent after S/R, the mth segment is sent after 2^(k-1) S/R, and the server receives the 1st ACK after S/R + RTT]
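The slow-start delay formula, together with K, Q and P, can be evaluated with a few lines of code. The sketch below follows the definitions on these slides but finds K and Q with simple loops instead of logarithms, which avoids floating-point rounding at exact powers of two; the function name and the returned tuple are illustrative choices.

```python
def slow_start_delay(O, S, R, RTT):
    """Return (delay, K, Q, P) for an object of O bits under the slow-start model.

    O, S in bits; R in bits/sec; RTT in seconds.
    """
    # K: smallest k such that windows of 1, 2, ..., 2^(k-1) segments cover the object
    K = 1
    while (2 ** K - 1) * S < O:
        K += 1
    # Q: largest k such that the idle time after the kth window is still >= 0
    Q = 0
    while S / R + RTT - (2 ** Q) * S / R >= 0:
        Q += 1
    P = min(Q, K - 1)  # number of times the server actually idles
    delay = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return delay, K, Q, P


# The example from the slides: O/S = 15 segments, with RTT chosen so that Q = 2
print(slow_start_delay(O=15, S=1, R=1, RTT=2))
```

With these toy values the server idles for 2 s after the first window and 1 s after the second, which adds 3 s to the 19 s of setup and transmission.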


Examples (1)

Assumptions: S = 536 B, RTT = 100 msec, O = 100 kB, K = 8

R          O/R        P   Fixed Window   Slow Start
28 kbps    28.6 sec   1   28.8 sec       28.9 sec
100 kbps   8 sec      2   8.2 sec        8.4 sec
1 Mbps     800 msec   5   1 sec          1.5 sec
10 Mbps    80 msec    7   0.28 sec       0.98 sec


Examples (2)

Assumptions: S = 536 B, RTT = 100 msec, O = 5 kB, K = 4

R          O/R        P   Fixed Window   Slow Start
28 kbps    1.43 sec   1   1.63 sec       1.73 sec
100 kbps   0.4 sec    2   0.6 sec        0.76 sec
1 Mbps     40 ms      3   0.24 sec       0.52 sec
10 Mbps    4 ms       3   0.2 sec        0.5 sec


Examples (3)

Assumptions: S = 536 B, RTT = 1000 msec, O = 5 kB, K = 4

R          O/R        P   Fixed Window   Slow Start
28 kbps    1.43 sec   3   3.4 sec        5.8 sec
100 kbps   0.4 sec    3   2.4 sec        5.2 sec
1 Mbps     40 ms      3   2.0 sec        5.0 sec
10 Mbps    4 ms       3   2.0 sec        5.0 sec


Delay: Examples (1, 2, 3)

[Figure: two bar charts of delay in seconds for R = 28 kbps, 100 kbps, 1 Mbps and 10 Mbps, with bars F1, S1, F2, S2, F3, S3. F: Fixed Window, S: Slow Start; the digit identifies the example]

Example 1: S = 536 B, RTT = 100 msec, O = 100 kB, K = 8
Example 2: S = 536 B, RTT = 100 msec, O = 5 kB, K = 4
Example 3: S = 536 B, RTT = 1000 msec, O = 5 kB, K = 4


TCP Send Rate (Throughput) 1

 We want to characterize the send rate of a bulk-transfer TCP flow as a function of packet loss and round-trip delay (RTT).
 Bulk transfer means a flow with a large amount of data to send, such as an FTP transfer.
 If we have a TCP send-rate model, we can define a "TCP-friendly" send rate for a non-TCP flow, such as multimedia, that interacts with TCP connections.


TCP Send Rate (Throughput) 2

 The model captures not only the behavior of the fast retransmit mechanism but also the effect of the time-out mechanism.
 The model is based on the Reno flavor of TCP, as it is one of the more popular implementations in the Internet today.
 We model the congestion avoidance behavior of TCP in terms of "rounds."


Steady-State Model of TCP Throughput
 Send rate of a bulk-transfer TCP Reno flow:

$B(e) \cong \min\left(\dfrac{W_{max}}{RTT},\ \dfrac{1}{RTT\sqrt{\dfrac{2be}{3}} + RTO \times \min\left(1,\ 3\sqrt{\dfrac{3be}{8}}\right) \times e \times (1 + 32e^{2})}\right)$ [segments/sec]

 b is the number of packets that are acknowledged by a received ACK. Many TCP receiver implementations send one cumulative ACK for two consecutive packets received, so b is typically 2.
 W_max is the maximum window allowed by receiver and sender (typically 8 KB, 16 KB, or 32 KB).
 e is the probability that a packet is lost. Assumption: e > 0.
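The steady-state formula is easier to explore numerically than by hand. The sketch below is a direct transcription of B(e) as written on this slide; the function name and keyword defaults are illustrative, and W_max is taken in segments so that the result is in segments per second.

```python
import math


def tcp_send_rate(e, rtt, rto, w_max, b=2):
    """Steady-state TCP Reno send rate B(e) in segments/sec.

    e: loss probability (> 0), rtt and rto in seconds,
    w_max: maximum window in segments, b: packets acknowledged per ACK.
    """
    if e <= 0:
        raise ValueError("the model assumes a loss probability e > 0")
    denom = (rtt * math.sqrt(2 * b * e / 3)
             + rto * min(1.0, 3 * math.sqrt(3 * b * e / 8)) * e * (1 + 32 * e ** 2))
    return min(w_max / rtt, 1.0 / denom)


# Send rate for a few loss rates, with RTT = 0.47 s, RTO = 3.2 s, W_max = 12 segments
for e in (0.001, 0.01, 0.1):
    print(e, round(tcp_send_rate(e, rtt=0.47, rto=3.2, w_max=12), 2))
```

At very low loss rates the receiver window term W_max/RTT caps the rate; as e grows, the timeout term increasingly dominates the denominator.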


Approximations

$B(e) \cong \min\left(\dfrac{W_{max}}{RTT},\ \dfrac{1}{RTT\sqrt{\dfrac{2be}{3}} + RTO \times \min\left(1,\ 3\sqrt{\dfrac{3be}{8}}\right) \times e \times (1 + 32e^{2})}\right)$

For e ≤ 0.148 and large W_max:

$B(e) \cong \dfrac{1}{RTT\sqrt{\dfrac{2be}{3}} + RTO \times 3\sqrt{\dfrac{3be}{8}} \times e \times (1 + 32e^{2})}$

For e ≤ 0.05:

$B(e) \cong \dfrac{1}{RTT}\sqrt{\dfrac{3}{2be}} + o\left(\dfrac{1}{\sqrt{e}}\right) \approx \dfrac{1.22}{RTT\sqrt{e}}$ (for b = 1)

 The notation f = o(g) means that (g > 0 and) (f/g) -> 0. The notation o(g) indicates that the term is of smaller order of magnitude than g.


Total Time To Transfer

 To transfer a file of O bytes, the time is calculated by:

$\text{delay} = 2RTT + \dfrac{O}{MSS \times B(e)}$

 In which
 RTT is the round-trip time (2RTT covers connection setup and the request)
 MSS [bytes] is the segment size
 The throughput (achieved bandwidth) between source and destination:

R = Throughput = MSS × B(e)


Comparison of network throughput and TCP throughput (send rate)

[Figure: log-log plot of network throughput and TCP throughput (send rate), in segments per 100 secs (1 to 10000), versus loss rate (0.001 to 1); RTT = 0.470, RTO = 3.2, Wmax = 12]




HTTP Modeling

Non-persistent HTTP issues:
 requires 2 RTTs per object for the TCP connection.
 Response time per object = O/R + 2RTT + sum of idle times

Persistent HTTP:
 server leaves connection open after sending response
 subsequent HTTP requests/responses between same client/server are sent over the connection.

Persistent without pipelining:
 client issues new request only when previous response has been received
 one RTT for each referenced object

Persistent with pipelining:
 default in HTTP/1.1
 client sends requests as soon as it encounters a referenced object
 one RTT for all the referenced objects


HTTP Modeling

 Assume Web page consists of:
 1 base HTML page (of size O bits)
 M images (each of size O bits)
 What is the Response Time (delay)?
 Non-persistent HTTP:
 M+1 TCP connections in series

$\text{delay} = (M + 1) \times 2RTT + (M + 1)\dfrac{O}{MSS \times B(e)}$


HTTP Modeling: Persistent

 Persistent HTTP with pipelining:
 2 RTT to request and receive the base HTML file
 1 RTT to request all images (if all objects reside on the same server)

$\text{delay} = 3RTT + (M + 1)\dfrac{O}{MSS \times B(e)}$


HTTP Modeling: Non-persistent

 Non-persistent HTTP with X parallel connections
 Suppose M/X is an integer.
 1 TCP connection for the base file (2RTT).
 M/X sets of parallel connections for images: (M/X)(2RTT).

$\text{delay} = \left(\dfrac{M}{X} + 1\right) \times 2RTT + (M + 1)\dfrac{O}{MSS \times B(e)}$


Summary

 Assume Web page consists of:
 1 base HTML page (of size O bits)
 M images (each of size O bits)
 What is the Response Time (delay)?

Non-persistent HTTP:
$\text{delay} = (M + 1) \times 2RTT + (M + 1)\dfrac{O}{MSS \times B(e)}$

Persistent HTTP with pipelining:
$\text{delay} = 3RTT + (M + 1)\dfrac{O}{MSS \times B(e)}$

Non-persistent HTTP with X parallel connections:
$\text{delay} = \left(\dfrac{M}{X} + 1\right) \times 2RTT + (M + 1)\dfrac{O}{MSS \times B(e)}$
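The three response-time formulas in the summary differ only in the RTT term, which a short comparison makes visible. In this sketch `rate` stands for the achieved throughput MSS × B(e), and it is assumed to be expressed in the same unit as O per second (bits/sec here); the function names are hypothetical.

```python
def transfer_time(M, O, rate):
    """Time to push the base page plus M images, each of size O, at the achieved rate."""
    return (M + 1) * O / rate


def non_persistent(M, O, rate, rtt):
    return (M + 1) * 2 * rtt + transfer_time(M, O, rate)


def persistent_pipelined(M, O, rate, rtt):
    return 3 * rtt + transfer_time(M, O, rate)


def parallel_non_persistent(M, O, rate, rtt, X):
    # Assumes M/X is an integer, as on the slide
    return (M / X + 1) * 2 * rtt + transfer_time(M, O, rate)


# O = 5 Kbytes = 40,000 bits, M = 10 images, X = 5, RTT = 100 msec, 1 Mbps achieved rate
args = dict(M=10, O=40_000, rate=1e6, rtt=0.1)
print(non_persistent(**args))
print(persistent_pipelined(**args))
print(parallel_non_persistent(**args, X=5))
```

The transmission term is identical in all three cases, so the gap between the schemes grows with RTT rather than with object size.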


Example: HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5, fixed congestion window.

[Figure: bar chart of response time (0 to 20 seconds) for non-persistent, persistent-pipeline and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps]

For low bandwidth, connection and response time are dominated by transmission time.
Persistent connections give only a minor improvement over parallel connections.


HTTP Response time (in seconds)
RTT = 1000 msec, O = 5 Kbytes, M = 10 and X = 5, fixed congestion window.

[Figure: bar chart of response time (0 to 70 seconds) for non-persistent, persistent-pipeline and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps]

For larger RTT, response time is dominated by TCP establishment and slow start delays.
Persistent connections now give an important improvement, particularly in high delay•bandwidth networks.


Chapter 3: Summary

 principles behind transport layer services:
 multiplexing, demultiplexing
 reliable data transfer
 flow control
 congestion control
 instantiation and implementation in the Internet
 UDP
 TCP

Next:
 leaving the network "edge" (application, transport layers)
 into the network "core"


Evolution of TCP

1974  TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
1975  Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
1982  TCP & IP: RFC 793 & 791
1983  BSD Unix 4.2 supports TCP/IP
1984  Nagle's algorithm to reduce overhead of small packets; predicts congestion collapse
1986  Congestion collapse observed
1987  Karn's algorithm to better estimate round-trip time
1988  Van Jacobson's algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
1990  4.3BSD Reno: fast retransmit, delayed ACKs

[Figure labels: Berkeley Software Distribution; BSD Socket Layer]


TCP Through the 1990s

1993  TCP Vegas (Brakmo et al): real congestion avoidance
1994  T/TCP, RFC 1644 (Braden): Transaction TCP
1994  ECN (Floyd): Explicit Congestion Notification
1996  SACK TCP (Floyd et al): Selective Acknowledgement
1996  Hoe: Improving TCP startup
1996  FACK TCP (Mathis et al): extension to SACK


Recommended Links

 Sally Floyd's Homepage http://www.icir.org/floyd


 Sally Floyd maintains several excellent "pointers to
literature" pages, including pointers to papers in the
following research areas:
– Changes proposed to TCP
– Global Optimization with End-to-End Congestion Control
– Measurement Studies of End-to-End Congestion Control in the
Internet
– Research Questions for the Internet
– The Evolvability of the Internet Infrastructure
– Layering and the Internet Architecture



Recommended Links

 TCP Friendly Page http://www.psc.edu/networking/tcp_friendly.html
 This Web site summarizes some of the recent work on congestion control algorithms for non-TCP based applications. It focuses on congestion control schemes that use the "TCP-friendly" equation (that is, maintaining the arrival rate to at most some constant over the square root of the packet loss rate).
 Research on TCP over Wireless Links http://bbcr.uwaterloo.ca/~jpan/tcpair/
 Wireless links, which often have high bit error rates, can wreak havoc when carrying TCP traffic. This page points to the numerous research papers that have tried to address the problems.

Request for Comments: 1122; R. Braden, Editor October 1989

TCP REFERENCES:
 [TCP:1] "Transmission Control Protocol," J. Postel, RFC-793,
September,1981.
 [TCP:2] "Transmission Control Protocol," MIL-STD-1778, US
Department of Defense, August 1984.
This specification as amended by RFC-964 is intended to
describe the same protocol as RFC-793 [TCP:1]. If there is a
conflict, RFC-793 takes precedence, and the present document
is authoritative over both.
 [TCP:3] "Some Problems with the Specification of the Military
Standard Transmission Control Protocol," D. Sidhu and T.
Blumer, RFC-964, November 1985.
 [TCP:4] "The TCP Maximum Segment Size and Related Topics,"
J. Postel, RFC-879, November 1983.
 [TCP:5] "Window and Acknowledgment Strategy in TCP," D.
Clark, RFC-813, July 1982.
 [TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge,
ACM SIGCOMM-87, August 1987.
 [TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM
SIGCOMM 88, August 1988.



References1

 A note on Internet Request for Comments (RFCs): Copies of Internet


RFCs are maintained at multiple sites. The RFC URLs below all point into the
RFC archive at the Information Sciences Institute (ISI), maintained by
the RFC Editor of the Internet Society (the body that oversees the RFCs).
Other RFC sites include http://www.faqs.org/rfc,
http://www.pasteur.fr/other/computer/RFC (located in France), and
http://www.csl.sony.co.jp/rfc/ (located in Japan). Internet RFCs can be
updated or obsoleted by later RFCs. We encourage you to check the sites
listed above for the most up-to-date information. The RFC search facility
at ISI, http://www.rfc-editor.org/rfcsearch.html, will allow you to search
for an RFC and show updates to that RFC.
 [Ahn 1995] J. S. Ahn, P. B. Danzig, Z. Liu, and Y. Yan, "Experience with TCP
Vegas: Emulation and Experiment", Proceedings of ACM SIGCOMM '95
(Boston, MA, Aug. 1995), pp. 185-195.
http://www.acm.org/sigcomm/sigcomm95/papers/ahn.html
 [Bertsekas 1991] D. Bertsekas and R. Gallagher, Data Networks, 2nd Ed. ,
Prentice Hall, Englewood Cliffs, NJ, 1991.



References2

 [Bochman 84] G. V. Bochmann and C. A. Sunshine, "Formal methods


in communication protocol design," IEEE Transactions on
Communications, Vol. COM-28, No. 4 (Apr. 1980), pp. 624-631.
 [Brakmo 1995] L. Brakmo and L. Peterson, "TCP Vegas: End to End
Congestion Avoidance on a Global Internet," IEEE Journal of
Selected Areas in Communications, Vol. 13, No. 8, pp. 1465-1480,
Oct. 1995. ftp://ftp.cs.arizona.edu/xkernel/Papers/jsac.ps
 [Cela 2000] F. Cela, "A quick Tour around TCP,"
http://www.ce.chalmers.se/%7Efcela/tcp-tour.html
 [Chiu 1989] D. Chiu and R. Jain, "Analysis of the Increase and
Decrease Algorithms for Congestion Avoidance in Computer
Networks," Computer Networks and ISDN Systems, Vol. 17, No. 1,
pp. 1-14. ftp://netlab.ohio-state.edu/pub/jain/papers/cong_av.pdf



References3

 [Fall 1996] K. Fall, S. Floyd, "Simulation-based Comparisons


of Tahoe, Reno and SACK TCP," ACM Computer
Communication Review, Vol. 26, No. 3, pp. 5- 21, July 1996.
ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z
 [Floyd TCP 1994] S. Floyd, "TCP and Explicit Congestion
Notification," ACM Computer Communication Review, Vol. 24,
No. 5, pp. 10-23, Oct. 1994.
http://www.aciri.org/floyd/papers/tcp_ecn.4.ps.Z
 [Floyd 1999] S. floyd and K. Fall, "Promoting the Use of
End-to-End Congestion Control in the Internet," IEEE/ACM
Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp.
458-472.
 [Floyd 2000] S. Floyd, M. Handley, J. Padhye, J. Widmer,
"Equation-Based Congestion Control for Unicast Applications,"
Proceedings of ACM SIGCOMM '00, (Stockholm, Sweden,
Aug. 2000).



References4

 [Heidemann 1997] J. Heidemann, K. Obraczka, and J. Touch,


"Modeling the Performance of HTTP over Several Transport
Protocols," IEEE/ACM Transactions on Networking, Vol. 5, No. 5
(Oct. 1997), pp. 616-630.
 [Jacobson 1988] V. Jacobson, "Congestion Avoidance and Control,"
Proceedings of ACM SIGCOMM '88, pp. (Stanford, CA, Aug. 1988),
314-329, ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
 [Jain 1989] R. Jain, "A Delay-Based Approach for Congestion
Avoidance in Interconnected Heterogeneous Computer Networks,"
ACM Computer Communications Review, Vol. 19, No. 5 (1989), pp.
56-71.
 [Jain 1996] R. Jain. S. Kalyanaraman, S. Fahmy, R. Goyal, and S.
Kim, "Tutorial Paper on ABR Source Behavior," ATM Forum/96-
1270, Oct. 1996. http://www.cis.ohio-state.edu/~jain/atmf/a96-
1270.htm



References5

 [Mathis 1996] M. Mathis, J. Mahdavi, "Forward Acknowledgment:


Refining TCP Congestion Control", Proceedings of ACM SIGCOMM
'96, (Stanford, CA, Aug. 1996),
http://www.acm.org/sigcomm/sigcomm96/papers/mathis.html
 [Mahdavi 1997] J. Mahdavi and S. Floyd, "TCP-Friendly Unicast
Rate-Based Flow Control," unpublished note, Jan. 1997.
http://www.psc.edu/networking/papers/tcp_friendly.html
 [Ramakrishnan 1990] K. K. Ramakrishnan and Raj Jain, "A Binary
Feedback Scheme for Congestion Avoidance in Computer
Networks," ACM Transactions on Computer Systems, Vol. 8, No. 2
(May 1990), pp. 158-181.
 [RFC 793] J. Postel, "Transmission Control Protocol," RFC 793,
Sept. 1981. http://www.rfc-editor.org/rfc/rfc793.txt



References6

 [RFC 1122] R. Braden, "Requirements for Internet Hosts--


Communication Layers," RFC 1122, Oct. 1989. http://www.rfc-
editor.org/rfc/rfc1122.txt
 [RFC 1323] V. Jacobson, S. Braden, and D. Borman, "TCP
Extensions for High Performance," RFC 1323, May 1992.
http://www.rfc-editor.org/rfc/rfc1323.txt
 [RFC 2018] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, "TCP
Selective Acknowledgment Options," RFC 2018, Oct. 1996.
http://www.rfc-editor.org/rfc/rfc2018.txt
 [RFC 2481] K. K. Ramakrishnan and S. Floyd, "A Proposal to Add
Explicit Congestion Notification (ECN) to IP," RFC 2481, Jan. 1999.
http://www.rfc-editor.org/rfc/rfc2481.txt



References7

 [RFC 2581] M. Allman, V. Paxson, W. Stevens, " TCP Congestion


Control," RFC 2581, Apr. 1999. http://www.rfc-
editor.org/rfc/rfc2581.txt
 [Rhee 1998] I. Rhee, "Error Control Techniques for Interactive
Low-bit Rate Video Transmission over the Internet," Proceedings
ACM SIGCOMM'98, Vancouver BC, (Aug. 31 - Sept. 4, 1998).
http://www.acm.org/sigcomm/sigcomm98/tp/abs_24.html
 [Schwartz 1982] M. Schwartz, "Performance Analysis of the SNA
Virtual Route Pacing Control," IEEE Transactions on
Communications, Vol. COM-30, No. 1, (Jan. 1982), pp. 172-184.
 [Stevens 1994] W. R. Stevens, TCP/IP Illustrated, Vol. 1: The
Protocols, Addison-Wesley, Reading, MA, 1994.



References8

 [Stone 1998] J. Stone, M. Greenwald, C. Partridge, and J. Hughes,


"Performance of checksums and CRC's over real data," IEEE/ACM
Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp 529 -
543
 [Stone 2000] J. Stone, C. Partridge, "When Reality and the
Checksum Disagree," Proceedings of ACM SIGCOMM '00,
(Stockholm, Sweden, Aug. 2000).
 [Sunshine 1978] C. Sunshine and Y. K. Dalal, "Connection
Management in Transport Protocols," Computer Networks, North-
Holland, Amsterdam, 1978.
 [Varghese 1997] G. Varghese and A. Lauck, "Hashed and
Hierarchical Timing Wheels: Efficient Data Structures for
Implementing a Timer Facility, " IEEE/ACM Transactions on
Networking, Vol. 5, No. 6, (Dec. 1997), pp.824 - 834



Home Work 3
 Solve the following problems from Computer Networking, 3rd edition: 1, 7, 19, 20, 21, 34, 36, 39.
 Send your answers to jamali@iust.ac.ir, Subject: HW3 "Student ID Number".

.. ‫ &داد  &اه‬2 $,-. $ /‫ر‬0 ‫  در‬


. $(0 power point % ‫ *)(' &د را‬



Control System Model [CJ89]
D. Chiu and R. Jain, "Analysis of the increase and decrease algorithms for congestion avoidance in computer networks", Computer Networks and ISDN Systems, Volume 17, Issue 1 (June 1989), Pages 1-14.

D. M. Chiu and R. Jain, "Analysis of Increase and Decrease Algorithms, Part III of Congestion Avoidance in Computer Networks with a Connectionless Network Layer", DEC Technical Report 509, August 1987.

[Figure: users 1 to n offer loads x1, x2, ..., xn, which are summed (Σ); the network feeds back a binary signal y indicating whether Σxi > Xgoal]

 Simple, yet powerful model
 Explicit binary signal of congestion

Possible Choices

$x_i(t+1) = \begin{cases} a_I + b_I\, x_i(t) & \text{increase} \\ a_D + b_D\, x_i(t) & \text{decrease} \end{cases}$

 Multiplicative increase, additive decrease
 aI=0, bI>1, aD<0, bD=1
 Additive increase, additive decrease
 aI>0, bI=1, aD<0, bD=1
 Multiplicative increase, multiplicative decrease
 aI=0, bI>1, aD=0, 0<bD<1
 Additive increase, multiplicative decrease
 aI>0, bI=1, aD=0, 0<bD<1
 Which one?
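The four parameter choices can be compared with a small two-user simulation of the linear control above. This is an illustrative sketch, not the analysis in [CJ89]: the starting loads, the goal of 10, the clamping of loads at zero and the specific parameter values are all assumptions, and fairness is measured with Jain's fairness index (1.0 means equal shares).

```python
def simulate(a_i, b_i, a_d, b_d, x=(1.0, 6.0), goal=10.0, steps=500):
    """Run two users under x(t+1) = a + b*x(t), driven by synchronized binary feedback."""
    x1, x2 = x
    for _ in range(steps):
        if x1 + x2 > goal:   # congestion signal: both users decrease
            x1, x2 = a_d + b_d * x1, a_d + b_d * x2
        else:                # no congestion: both users increase
            x1, x2 = a_i + b_i * x1, a_i + b_i * x2
        x1, x2 = max(x1, 0.0), max(x2, 0.0)  # a load cannot be negative
    return x1, x2


def fairness(x1, x2):
    """Jain's fairness index for two users."""
    return (x1 + x2) ** 2 / (2 * (x1 ** 2 + x2 ** 2))


policies = {
    "MIAD": (0, 2, -1, 1),
    "AIAD": (1, 1, -1, 1),
    "MIMD": (0, 2, 0, 0.5),
    "AIMD": (1, 1, 0, 0.5),
}
for name, params in policies.items():
    print(name, round(fairness(*simulate(*params)), 3))
```

In this setup only AIMD drives the index toward 1: additive steps preserve the gap between the two users, while each multiplicative decrease halves it.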


Multiplicative Increase, Additive Decrease

 Fixed point at x1h = x2h = bI aD / (1 − bI)
 Fixed point is unstable!

[Figure: phase plot of User 2 (x2) versus User 1 (x1) with the fairness line and the efficiency line; the trajectory goes from (x1h, x2h) to (x1h+aD, x2h+aD) to (bI(x1h+aD), bI(x2h+aD))]


Additive Increase, Additive Decrease

 Reaches a stable cycle, but does not converge to fairness

[Figure: phase plot of User 2 (x2) versus User 1 (x1) with the fairness line and the efficiency line; the trajectory goes from (x1h, x2h) to (x1h+aD, x2h+aD) to (x1h+aD+aI, x2h+aD+aI)]


Multiplicative Increase, Multiplicative Decrease

 Converges to a stable cycle, but is not fair

[Figure: phase plot of User 2 (x2) versus User 1 (x1) with the fairness line and the efficiency line; the trajectory goes from (x1h, x2h) to (bDx1h, bDx2h) to (bIbDx1h, bIbDx2h)]


Additive Increase, Multiplicative Decrease

 Converges to a stable and fair cycle

[Figure: phase plot of User 2 (x2) versus User 1 (x1) with the fairness line and the efficiency line; the trajectory goes from (x1h, x2h) to (bDx1h, bDx2h) to (bDx1h+aI, bDx2h+aI)]


Modeling

 Critical to understanding complex systems
 [CJ89] model still relevant after 15 years, a 10^6× increase in bandwidth and a 1000× increase in number of users
 Criteria for good models
 Two conflicting goals: reality and simplicity
 Realistic, complex model → too hard to understand, too limited in applicability
 Unrealistic, simple model → can be misleading

TCP Congestion Control

 [CJ89] provides the theoretical basis for the basic congestion avoidance mechanism
 Must turn this into a real protocol


TCP Congestion Control

 Maintains three variables:
 cwnd: congestion window
 flow_win: flow window; receiver advertised window
 ssthresh: threshold size (used to update cwnd)
 For sending, use: win = min(flow_win, cwnd)

TCP: Slow Start

 Goal: reach knee quickly
 Upon starting (or restarting):
 Set cwnd = 1
 Each time a segment is acknowledged, increment cwnd by one (cwnd++).
 Slow Start is not actually slow
 cwnd increases exponentially

Slow Start Example

 The congestion window size grows very rapidly
 TCP slows down the increase of cwnd when cwnd >= ssthresh

[Figure: segment/ACK exchange in which cwnd grows 1, 2, 4, 8 over successive round trips]


Congestion Avoidance

 Slow down "Slow Start"
 ssthresh is a lower-bound guess about the location of the knee
 If cwnd >= ssthresh then each time a segment is acknowledged, increment cwnd by 1/cwnd (cwnd += 1/cwnd).
 So cwnd is increased by one only after all segments in the window have been acknowledged.


Slow Start/Congestion Avoidance Example

 Assume that ssthresh = 8

[Figure: plot of cwnd (in segments, 0 to 14) versus round-trip times, with ssthresh marked at 8; beside it a segment/ACK exchange in which cwnd grows 1, 2, 4, 8, then 9, 10]


Putting Everything Together: TCP Pseudocode

Initially:
    cwnd = 1;
    ssthresh = infinite;
New ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Congestion Avoidance */
        cwnd = cwnd + 1/cwnd;
Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;

Sending:
    while (next < unack + win)
        transmit next packet;
    where win = min(cwnd, flow_win);

[Figure: sequence-number line showing unack, next and the window win]

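The pseudocode can be exercised with a round-by-round simulation. The sketch below is a simplification made for illustration: cwnd is counted in segments, every segment sent in a round is assumed to be acknowledged, the flow window is ignored, and the function names and the `timeout_round` parameter are hypothetical.

```python
def on_ack(cwnd, ssthresh):
    """Window update for one new ACK, as in the pseudocode."""
    if cwnd < ssthresh:
        return cwnd + 1.0        # slow start: +1 per ACK, doubling per RTT
    return cwnd + 1.0 / cwnd     # congestion avoidance: about +1 per RTT


def on_timeout(cwnd):
    """Multiplicative decrease: returns (new cwnd, new ssthresh)."""
    return 1.0, cwnd / 2.0


def simulate_rounds(rounds, ssthresh=8.0, timeout_round=None):
    """Return the value of cwnd at the start of each round trip."""
    cwnd = 1.0
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r == timeout_round:            # the whole round is lost to a timeout
            cwnd, ssthresh = on_timeout(cwnd)
            continue
        for _ in range(int(cwnd)):        # one ACK per segment sent this round
            cwnd = on_ack(cwnd, ssthresh)
    return history


print(simulate_rounds(6))                   # slow start up to ssthresh, then slower growth
print(simulate_rounds(6, timeout_round=3))  # a timeout resets cwnd to 1
```

Because each ACK adds 1/cwnd and cwnd grows within the round, a full round of congestion avoidance raises cwnd by slightly less than one segment.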
The big picture

[Figure: cwnd versus time, showing Slow Start, Congestion Avoidance and the drop at a Timeout]

Fast Retransmit

 Don't wait for the window to drain
 Resend a segment after 3 duplicate ACKs

[Figure: segment/ACK exchange in which cwnd grows 1, 2, 4 and a lost segment triggers 3 duplicate ACKs]


Fast Recovery

 After a fast retransmit, halve the window: set ssthresh = cwnd/2 and cwnd = ssthresh
 i.e., don't reset cwnd to 1
 But when the RTO expires, still do cwnd = 1
 Fast Retransmit and Fast Recovery
 Implemented by TCP Reno
 Most widely used version of TCP today
 Lesson: avoid RTOs at all costs!

Fast Retransmit and Fast Recovery

[Figure: cwnd versus time, showing Slow Start followed by repeated Congestion Avoidance phases]

 Retransmit after 3 duplicate ACKs
 prevents expensive timeouts
 No need to slow start again
 At steady state, cwnd oscillates around the optimal window size.

Engineering vs Science in CC

 Great engineering built useful protocol:
 TCP Reno, etc.
 Good science by CJ and others
 Basis for understanding why it works so well


Behavior of TCP

 Are packets smoothly paced?
 NO! Ack-compression
 Are long-lived flows nicely interleaved?
 NO!
 How does throughput depend on drop rate d?
 Tput ~ 1/sqrt(d)


Extensions to TCP

 Selective acknowledgements: TCP SACK
 Explicit congestion notification: ECN
 Delay-based congestion avoidance: TCP Vegas
 Discriminating between congestion losses and other losses: cross-layer signaling and guesses

Issues with TCP

 Fairness:
 Throughput depends on RTT
 High speeds:
 to reach 10 Gbps, packet losses must occur no more often than once every 90 minutes!
 Short flows:
 How to set the initial cwnd properly
 What about flows that want congestion

TCP: Cooperation and Compatibility

 TCP assumes all flows employ TCP-like congestion control
 TCP-friendly or TCP-compatible
 Selfish flows: can get all the bandwidth they like
 If new congestion control algorithms are developed, they must be TCP-friendly