
TCP Congestion Control
TCP Tahoe
TCP Reno
TCP Vegas
Internet Congestion Collapse
• In the late 80s, the Internet suffered a congestion collapse
[Figure: several hosts share the network. Congestion! leads to packet drops!!, the load becomes data + retransmissions, which causes more drops!!! and finally collapse]
Why is Today's Topic Important?
The algorithm for TCP congestion control is the main reason we can use the Internet successfully today despite largely unpredictable user access patterns and despite resource bottlenecks and limitations. Without TCP congestion control, the Internet could have become history a long time ago.
Resource Management Solutions
• Handling congestion
  - pre-allocate resources so as to avoid congestion, or
  - control congestion if (and when) it occurs
• Two points of implementation
  - routers inside the network (queuing discipline), and
  - hosts at the edges of the network (transport protocol)
[Figure: Source 1 and Source 2 connect over a 10-Mbps Ethernet and a 100-Mbps FDDI to a router, which reaches the destination over a 1.5-Mbps T1 link]
Problem with 80s Transport Protocol (Early TCP)
• The connection flow rate does not depend on the level of congestion in the network
• The 80s question was: How to re-design TCP to send less traffic when the network gets congested?
  - How to detect congestion?
  - How to make flow rate sensitive to congestion level?
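Detecting Congestion
• Packet drops indicate congestion
  - Is that really true?
  - Why does it work?
[Figure: Src sends a packet to Dst and receives an ACK. A later packet is dropped, so no ACK returns and the sender times out. Timeout! No ACK = Congestion!]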
Controlling Congestion - The Effect of Window Size
• Note that the sender's window is equal to the number of the sender's packets in flight (in the network). Why?
[Figure: Source sends a window's worth of packets to Destination. X acks return, and the window lets X more packets enter the network]
Controlling Congestion
• Reduce window → fewer packets in the network
• Increase window → more packets in the network
• Idea: Concept of a congestion window - the window is smaller when congestion is larger and vice versa
Van Jacobson's Congestion Control
• Van Jacobson (formerly CISCO Chief Technical Officer - now a startup founder) introduced congestion control into TCP in 1988-1989.
• The original version of TCP that implements Van Jacobson's congestion control is known as TCP Tahoe.
Elements of Congestion Control in TCP Tahoe
• Additive increase, multiplicative decrease
• Slow start
• Fast retransmit
Additive Increase, Multiplicative Decrease
• Each time a packet drop occurs, slash window size in half (multiplicative decrease)
  - Multiplicative decrease is necessary to avoid congestion
• When no losses are observed, gradually increase window size (additive increase)
AIMD (cont)
• In practice: increment a little for each ACK
Increment = (MSS*MSS)/CongestionWindow
CongestionWindow += Increment

• Algorithm
  - increment CongestionWindow by one packet per RTT (linear increase)
  - divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
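The per-ACK increment and the halving rule above can be sketched in a few lines of Python. This is a minimal illustration, not a TCP implementation: the segment size `MSS = 1460` bytes, the function names, and the one-segment floor on the window are assumptions that do not come from the slides.

```python
MSS = 1460  # assumed segment size in bytes (not given in the slides)


def on_ack(cwnd: float, mss: int = MSS) -> float:
    """Additive increase: grow cwnd by MSS*MSS/cwnd bytes for one ACK."""
    return cwnd + (mss * mss) / cwnd


def on_timeout(cwnd: float, mss: int = MSS) -> float:
    """Multiplicative decrease: halve cwnd (assumed floor of one segment)."""
    return max(cwnd / 2, mss)


def one_rtt_of_acks(cwnd: float, mss: int = MSS) -> float:
    """Apply on_ack once for every full segment in the current window."""
    acks = int(cwnd // mss)
    for _ in range(acks):
        cwnd = on_ack(cwnd, mss)
    return cwnd


cwnd = 10 * MSS
after_rtt = one_rtt_of_acks(cwnd)
print(after_rtt / MSS)              # close to 11 segments: about +1 per RTT
print(on_timeout(after_rtt) / MSS)  # half of that after a timeout
```

Because each ACK adds only MSS*MSS/CongestionWindow bytes, a full window of ACKs adds roughly one segment, which is the "one packet per RTT" rule expressed per ACK.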
AIMD (cont)
• Trace: sawtooth behavior
[Figure: congestion window in KB (10 to 70) versus time in seconds (1.0 to 10.0)]
Problems
• What should the window size be
  - Initially?
  - Upon packet loss and timeout?
• Pessimistic window size? (e.g., 1)
  - Additive increase is too slow in ramping up window size - short connections will not fully utilize available bandwidth
• Optimistic window size?
  - Large initial burst may cause router queue overflow
Slow Start
• Objective: determine the available capacity quickly
• Idea:
  - Use CongestionThreshold as an optimistic CongestionWindow estimate
  - begin with CongestionWindow = 1 packet
  - double CongestionWindow each RTT (increment by 1 packet for each ACK)
  - When CongestionThreshold is crossed, use additive increase

Slow Start Illustration
Assume that CongestionThreshold = 8
cwnd = 1
cwnd = 2
cwnd = 4
cwnd = 8
cwnd = 9
cwnd = 10
[Figure: congestion window (in segments, 0 to 14) versus roundtrip times (t = 0, 2, 4, 6), with the threshold marked]
Figures by Jorg Liebeherr
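The cwnd sequence in the illustration can be reproduced with a short Python sketch. The function name `window_per_rtt` is hypothetical, and clamping the doubled window at the threshold is an assumption made so the window never overshoots it.

```python
def window_per_rtt(threshold: int, rounds: int, start: int = 1) -> list[int]:
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < threshold:
            # Slow start: +1 segment per ACK doubles cwnd every RTT.
            # Assumption: do not overshoot the threshold.
            cwnd = min(2 * cwnd, threshold)
        else:
            # Threshold reached: additive increase, +1 segment per RTT.
            cwnd += 1
    return trace


print(window_per_rtt(threshold=8, rounds=6))  # [1, 2, 4, 8, 9, 10]
```

With a threshold of 8, three round trips are enough to reach the threshold, whereas additive increase alone would need seven.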
Slow Start (cont)
• Used
  - when first starting connection
  - when connection goes dead waiting for timeout
• Real TCP Trace
[Figure: congestion window in KB (10 to 70) versus time in seconds (1.0 to 9.0)]
Fast Retransmit
• Problem: coarse-grain TCP timeouts lead to idle periods
• Fast retransmit:
  - Send an ACK on every packet reception
  - Send a duplicate of the last ACK when a packet is received out of order
  - Use duplicate ACKs to trigger retransmission
[Figure: Sender transmits packets 1 to 6 and packet 3 is lost. Receiver returns ACK 1, ACK 2, and then a duplicate ACK 2 for each later packet. Sender retransmits packet 3 and Receiver returns ACK 6]
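The exchange described above can be simulated with a small Python sketch. It assumes that ACK n acknowledges every packet up to n, as in the figure labels, and that the sender retransmits after three duplicate ACKs. The slides do not state the threshold, so `DUP_ACK_THRESHOLD` and both function names are illustrative.

```python
DUP_ACK_THRESHOLD = 3  # assumed value; the slides do not state it


def receiver_acks(arrivals):
    """One cumulative ACK per arriving packet: highest in-order packet so far."""
    received = set()
    next_expected = 1
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected - 1)
    return acks


def fast_retransmits(acks, threshold=DUP_ACK_THRESHOLD):
    """Packets the sender retransmits once `threshold` duplicate ACKs arrive."""
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmitted.append(ack + 1)  # first packet still missing
        else:
            last_ack, dup_count = ack, 0
    return retransmitted


# Packet 3 is lost at first and arrives only after the retransmission.
acks = receiver_acks([1, 2, 4, 5, 6, 3])
print(acks)                    # [1, 2, 2, 2, 2, 6]
print(fast_retransmits(acks))  # [3]
```

The sender learns about the loss from the ACK stream itself, so it does not have to sit idle until a coarse-grained timer expires.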
TCP Reno and Fast Recovery
• Skip the slow start phase
• Go directly to half the last successful CongestionWindow
[Figure: congestion window in KB (10 to 70) versus time in seconds (1.0 to 7.0)]
Recap of Congestion Control
• TCP Reno and its derivatives are most widely used today
• Performance issue: try to avoid forcing the source to go to slow-start (Why?)
• Source goes to slow-start initially and upon timeouts (what's the rationale for that?)
• Source cuts congestion window in half upon a fast retransmit (why not go to slow start?)
• Single packet drops can be caught by the fast retransmit/fast recovery algorithm - source does not go to slow start (Why?)
• Multiple consecutive packet drops will most likely force the source into slow start (why?)
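The two loss reactions in this recap can be put side by side in a short Python sketch. The function name, the string labels, and setting the threshold to half the window on every loss are assumptions for illustration. The Tahoe branch also assumes that Tahoe restarts slow start after a fast retransmit, which the slides imply but do not spell out.

```python
def react_to_loss(cwnd: int, signal: str, variant: str) -> tuple[int, int]:
    """Return (new_cwnd, new_threshold) in segments after a loss signal.

    signal  -- "timeout" or "dup_acks" (loss found by fast retransmit)
    variant -- "tahoe" or "reno"
    """
    threshold = max(cwnd // 2, 1)  # assumed: half the window, at least 1
    if signal == "timeout" or variant == "tahoe":
        # Timeout (either variant) or any loss in Tahoe:
        # restart slow start from one segment.
        return 1, threshold
    # Reno fast recovery: skip slow start and continue from half the window.
    return threshold, threshold


print(react_to_loss(32, "dup_acks", "reno"))   # (16, 16)
print(react_to_loss(32, "timeout", "reno"))    # (1, 16)
print(react_to_loss(32, "dup_acks", "tahoe"))  # (1, 16)
```

Only the Reno fast-recovery case keeps the window at half its old size, which is why single drops are cheap in Reno and timeouts are expensive in both variants.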
Self Clocking and Slow Start
• Each packet's transmission is "clocked" by an ACK - no bursts develop
[Figure: slow start, with the window growing through W=1, W=2, W=4, W=5, W=6, W=7]
Self Clocking in Operation
• Each packet's transmission is "clocked" by an ACK - no bursts develop
[Figure: steady state with W=32]
Self Clocking Interrupted
• During timeouts, ACKs are drained from the network. Self clocking is interrupted. Next transmission causes a burst. Hence, slow start!
[Figure: W=32. A packet is lost and the ACKs are drained!! After the timeout and retransmission, Ack32 arrives, the window is cut in 1/2, and a 16 packet burst follows]
Self Clocking and Fast Retransmit / Fast Recovery
• When fast retransmit is used, the packet is retransmitted before all ACKs are drained. Slow start is not needed
[Figure: W=32. A packet is lost, a fast retransmission follows, and the window is cut in 1/2]
Tahoe/Reno Retransmission Timers
• The retransmission timers are set based on round-trip time (RTT) measurements that TCP performs
[Figure, © Jorg Liebeherr: pkt1 is answered by ack2; pkt2, pkt3, pkt4 by ack5; pkt5, pkt6, pkt7 by ack8]
Round-Trip Time Measurements
• TCP uses RTT mean and variance estimators, called srtt and rttvar:
  $srtt_{n+1} = \alpha \, RTT + (1 - \alpha) \, srtt_n$
  $rttvar_{n+1} = \beta \, |RTT - srtt_{n+1}| + (1 - \beta) \, rttvar_n$
• RTO is set to the mean RTT plus four times the estimated RTT variance.
  $RTO_{n+1} = srtt_{n+1} + 4 \, rttvar_{n+1}$
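A small Python sketch shows how the two estimators and the RTO react to RTT samples. The gains `ALPHA = 0.125` and `BETA = 0.25` are assumed values (the slides do not give them), the function name is hypothetical, and the update order follows the estimators on this slide: srtt first, then rttvar using the new srtt.

```python
ALPHA = 0.125  # assumed gain for srtt (not given in the slides)
BETA = 0.25    # assumed gain for rttvar (not given in the slides)


def update_rto(srtt: float, rttvar: float, rtt: float,
               alpha: float = ALPHA, beta: float = BETA):
    """Apply one RTT sample and return (srtt, rttvar, rto), all in seconds."""
    srtt = alpha * rtt + (1 - alpha) * srtt
    rttvar = beta * abs(rtt - srtt) + (1 - beta) * rttvar
    rto = srtt + 4 * rttvar
    return srtt, rttvar, rto


srtt, rttvar = 0.100, 0.010
for sample in (0.100, 0.100, 0.300):  # the last sample is a delay spike
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
    print(f"rtt={sample:.3f} srtt={srtt:.4f} rttvar={rttvar:.4f} rto={rto:.4f}")
```

Steady samples shrink rttvar and pull the RTO toward srtt, while a single spike raises rttvar and therefore the RTO much more than it raises the mean.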
Problem
• If a segment is retransmitted, TCP can't compute the RTT.
  - Why?
[Figure: pkt1 is sent, pkt1 is sent again, and then ack2 arrives]
Karn's Algorithm
• Don't update RTT on any segments that have been retransmitted.
• When TCP retransmits, set:
  $RTO_{n+1} = \min(2 \, RTO_n, 64)$
  so the timeout doubles on each retransmission, up to a cap of 64 seconds.
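Both rules can be sketched in Python. The sketch assumes the value 64 is an upper bound on the timeout in seconds, so repeated retransmissions back off exponentially up to that cap. The function names and the gain `alpha = 0.125` are illustrative and not taken from the slides.

```python
MAX_RTO = 64.0  # assumed to be an upper bound, in seconds


def backoff(rto: float) -> float:
    """RTO to use after a retransmission: doubled, but capped at MAX_RTO."""
    return min(2 * rto, MAX_RTO)


def on_ack(srtt: float, rtt_sample: float, was_retransmitted: bool,
           alpha: float = 0.125) -> float:
    """Karn's rule: ignore RTT samples taken from retransmitted segments."""
    if was_retransmitted:
        return srtt  # ambiguous sample, keep the old estimate
    return alpha * rtt_sample + (1 - alpha) * srtt


rto = 1.5
schedule = []
for _ in range(7):
    rto = backoff(rto)
    schedule.append(rto)
print(schedule)  # [3.0, 6.0, 12.0, 24.0, 48.0, 64.0, 64.0]
```

The sample is skipped because the sender cannot tell whether the ACK answers the original transmission or the retransmission, so either choice of start time could be badly wrong.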
Congestion Avoidance
• TCP's strategy
  - control congestion once it happens
  - repeatedly increase load in an effort to find the point at which congestion occurs, and then back off
• Alternative strategy
  - predict when congestion is about to happen
  - reduce rate before packets start being discarded
  - called congestion avoidance
• Two possibilities
  - router-centric: DECbit and RED Gateways
  - host-centric: TCP Vegas
TCP Vegas
• Uses congestion avoidance instead of congestion control
  - Vegas: Congestion avoidance: Predict and avoid congestion before it occurs
  - Tahoe, Reno: Congestion control: React to congestion after it occurs
• Question: How to predict congestion?
TCP Vegas
• Idea: source watches for some sign that the router's queue is building up and congestion is about to happen; e.g.,
  - RTT grows
  - sending rate flattens
[Figure: three traces over time in seconds (0.5 to 8.5): congestion window in KB (10 to 70), sending rate in KBps (100 to 1100), and queue size in router (5 to 10)]
Observation
• Packet accumulation in the network can be inferred by monitoring RTT and sending rate
[Figure: sending rate plotted against cwnd, with labels for the bottleneck link and the overloaded router]
Sending Rate = cwnd / RTT
Algorithm
• Let BaseRTT be the minimum of all measured RTTs (commonly the RTT of the first packet)
• If not overflowing the connection, then
    ExpectedRate = CongestionWindow / BaseRTT
• Source calculates ActualRate once per RTT
• Source compares ActualRate with ExpectedRate
    Diff = ExpectedRate - ActualRate
    if Diff < α
        increase CongestionWindow linearly
    else if Diff > β
        decrease CongestionWindow linearly
    else
        leave CongestionWindow unchanged
Algorithm (cont)
• Parameters
  - α = 1 packet
  - β = 3 packets
[Figure: top panel, congestion window in KB (10 to 70) versus time in seconds (0.5 to 8.0); bottom panel, labelled "CAM KBps" (40 to 240), versus time in seconds (0.5 to 8.0)]
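The Vegas comparison can be sketched in Python as a single window update per RTT. Since α and β are given in packets while Diff is a rate, this sketch assumes the rate gap is multiplied by BaseRTT to get an estimate of extra packets queued in the network. That conversion, the step of one packet per RTT, and the function name `vegas_adjust` are assumptions, not something the slides specify.

```python
ALPHA = 1.0  # packets, from the slide
BETA = 3.0   # packets, from the slide


def vegas_adjust(cwnd: float, base_rtt: float, actual_rate: float,
                 alpha: float = ALPHA, beta: float = BETA) -> float:
    """Return the new congestion window (packets) after one RTT.

    actual_rate is in packets per second and base_rtt in seconds.
    """
    expected_rate = cwnd / base_rtt
    diff = expected_rate - actual_rate
    # Assumption: convert the rate gap into packets queued in the network
    # so that it can be compared with thresholds given in packets.
    extra_packets = diff * base_rtt
    if extra_packets < alpha:
        return cwnd + 1          # too little queued: linear increase
    if extra_packets > beta:
        return max(cwnd - 1, 1)  # too much queued: linear decrease
    return cwnd                  # between alpha and beta: leave unchanged


print(vegas_adjust(cwnd=20, base_rtt=0.1, actual_rate=200))  # grows
print(vegas_adjust(cwnd=20, base_rtt=0.1, actual_rate=180))  # unchanged
print(vegas_adjust(cwnd=20, base_rtt=0.1, actual_rate=150))  # shrinks
```

Unlike Tahoe and Reno, this update never waits for a drop: the window moves whenever the measured rate drifts away from what the window should achieve on an empty path.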
Congestion Avoidance in the Network Layer: DECbit
• In the network:
  - Router computes average queue length
  - If average > 1, set congestion bit in packet header
  - Destination copies congestion bit onto the ACK
• On the source:
  - Source maintains a sliding window (as in TCP).
  - If less than 50% of the ACKs in last window have congestion bit set, increase window by 1.
  - If more than 50% of the ACKs have congestion bit set, shrink window to 0.875 of its current size.
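The source-side rule can be written as a small Python function. The function name and the list-of-booleans input are illustrative. The slides do not say what happens at exactly 50%, so the sketch assumes that case is treated as congestion.

```python
def decbit_adjust(window: float, congestion_bits: list[bool]) -> float:
    """New window after one window's worth of ACKs.

    congestion_bits holds the congestion bit copied into each ACK.
    """
    if not congestion_bits:
        return window  # no ACKs seen, nothing to decide on
    fraction_set = sum(congestion_bits) / len(congestion_bits)
    if fraction_set < 0.5:
        return window + 1      # additive increase
    # Assumption: exactly 50% is handled like "more than 50%".
    return window * 0.875      # multiplicative decrease


print(decbit_adjust(8, [False] * 6 + [True] * 2))  # 9
print(decbit_adjust(8, [True] * 5 + [False] * 3))  # 7.0
```

This is the same additive-increase, multiplicative-decrease shape as TCP's rule, but driven by an explicit signal from routers rather than by packet loss.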
DECbit
• Computing the average queue length
  - Average carried out over last busy+idle intervals plus the current busy interval
[Figure: queue length versus time; the averaging interval covers the previous cycle and the current cycle up to the current time]
Congestion Avoidance in RED
• Source:
Sally Floyd and Van Jacobson, "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, Vol. 1, No. 4, pp. 397-413, August 1993.
Random Early Detection (RED)
• Notification is implicit
  - just drop the packet (TCP will time out)
  - could be made explicit by marking the packet
• Early random drop
  - rather than wait for queue to become full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level
RED Details
• Compute average queue length
AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
0 < Weight < 1 (usually 0.002)
SampleLen is queue length each time a packet arrives
[Figure: router queue with MinThreshold, MaxThreshold, and AvgLen marked]
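A few lines of Python show why such a small Weight matters. The function name `update_avg_len` and the sample queue length of 50 packets are illustrative choices, not values from the slides.

```python
WEIGHT = 0.002  # typical value from the slide


def update_avg_len(avg_len: float, sample_len: int,
                   weight: float = WEIGHT) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_len + weight * sample_len


# A short burst of 10 arrivals that each see 50 queued packets
# barely moves the average.
avg = 0.0
for _ in range(10):
    avg = update_avg_len(avg, sample_len=50)
print(round(avg, 3))

# Sustained queuing over 2000 more arrivals pulls the average
# toward the sample value.
for _ in range(2000):
    avg = update_avg_len(avg, sample_len=50)
print(round(avg, 3))
```

The average therefore acts as a low-pass filter: RED reacts to persistent queues rather than to the short bursts that normal TCP traffic produces.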
RED Details (cont)
• Two queue length thresholds
if AvgLen <= MinThreshold then
    enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
    calculate probability P
    drop arriving packet with probability P
if MaxThreshold <= AvgLen then
    drop arriving packet
RED Details (cont)
• Computing probability P (first pass):
P = MaxP * (AvgLen - MinThreshold)/
(MaxThreshold - MinThreshold)
• Drop Probability Curve
[Figure: P(drop) versus AvgLen; the axes mark MinThresh and MaxThresh on AvgLen, and MaxP and 1.0 on P(drop)]
RED Details (cont)
• Problem with:
P = MaxP * (AvgLen - MinThreshold)/
(MaxThreshold - MinThreshold)
• Drop probability is insensitive to time since last drop - two or more drops may occur back to back, forcing the source into slow start
• Improvement:
TempP = MaxP * (AvgLen - MinThreshold)/
(MaxThreshold - MinThreshold)
P = TempP/(1 - count * TempP)
where count is the number of packets enqueued since the last drop while AvgLen has stayed between the two thresholds
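The threshold test and the improved probability can be combined in one Python sketch. Here `count` is taken to be the number of packets enqueued since the last drop while AvgLen stayed between the thresholds. The function names, the injectable `rng` parameter, the guard for `count * TempP >= 1`, and the example thresholds (5, 15, MaxP = 0.02) are assumptions for illustration.

```python
import random


def red_drop_probability(avg_len, min_th, max_th, max_p, count):
    """Drop probability for one arriving packet (1.0 means always drop)."""
    if avg_len <= min_th:
        return 0.0  # below MinThreshold: always enqueue
    if avg_len >= max_th:
        return 1.0  # at or above MaxThreshold: always drop
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    if count * temp_p >= 1:
        return 1.0  # assumed guard against a zero or negative denominator
    return min(1.0, temp_p / (1 - count * temp_p))


def red_should_drop(avg_len, min_th, max_th, max_p, count,
                    rng=random.random):
    """Random decision; `rng` is injectable so the sketch can be tested."""
    return rng() < red_drop_probability(avg_len, min_th, max_th, max_p, count)


# With AvgLen halfway between the thresholds, P grows as count grows.
for count in (0, 50, 90):
    print(count, red_drop_probability(10, 5, 15, 0.02, count))
```

Right after a drop (count = 0) the probability is only TempP, so back-to-back drops are unlikely, and it rises as more packets pass undropped, which spreads drops more evenly over time.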
RIO: RED with In and Out
• Packets fall into two categories
  - In profile (high priority)
  - Out of profile (low priority).
• Out packets are dropped first