
Congestion Avoidance and Control

V. Jacobson

(Originally Published in: Proc. SIGCOMM '88, Vol. 18 No. 4, August 1988)



Congestion Avoidance and Control

Van Jacobson*

University of California
Lawrence Berkeley Laboratory
Berkeley, CA 94720
van@helios.ee.lbl.gov

In October of '86, the Internet had the first of what became a series of 'congestion collapses'. During this period, the data throughput from LBL to UC Berkeley (sites separated by 400 yards and three IMP hops) dropped from 32 Kbps to 40 bps. Mike Karels[1] and I were fascinated by this sudden factor-of-thousand drop in bandwidth and embarked on an investigation of why things had gotten so bad. We wondered, in particular, if the 4.3BSD (Berkeley UNIX) TCP was mis-behaving or if it could be tuned to work better under abysmal network conditions. The answer to both of these questions was "yes".

Since that time, we have put seven new algorithms into the 4BSD TCP:

(i) round-trip-time variance estimation

(ii) exponential retransmit timer backoff

(iii) slow-start

(iv) more aggressive receiver ack policy

(v) dynamic window sizing on congestion

(vi) Karn's clamped retransmit backoff

(vii) fast retransmit

Our measurements and the reports of beta testers suggest that the final product is fairly good at dealing with congested conditions on the Internet.

This paper is a brief description of (i) - (v) and the rationale behind them. (vi) is an algorithm recently developed by Phil Karn of Bell Communications Research, described in [KP87]. (vii) is described in a soon-to-be-published RFC.

[*] This work was supported in part by the U.S. Department of Energy under Contract Number DE-AC03-76SF00098.
[1] The algorithms and ideas described in this paper were developed in collaboration with Mike Karels of the UC Berkeley Computer System Research Group. The reader should assume that anything clever is due to Mike. Opinions and mistakes are the property of the author.

Algorithms (i) - (v) spring from one observation: The flow on a TCP connection (or ISO TP-4 or Xerox NS SPP connection) should obey a 'conservation of packets' principle. And, if this principle were obeyed, congestion collapse would become the exception rather than the rule. Thus congestion control involves finding places that violate conservation and fixing them.

By 'conservation of packets' I mean that for a connection 'in equilibrium', i.e., running stably with a full window of data in transit, the packet flow is what a physicist would call 'conservative': A new packet isn't put into the network until an old packet leaves. The physics of flow predicts that systems with this property should be robust in the face of congestion. Observation of the Internet suggests that it was not particularly robust. Why the discrepancy?

There are only three ways for packet conservation to fail:

1. The connection doesn't get to equilibrium, or

2. A sender injects a new packet before an old packet has exited, or

3. The equilibrium can't be reached because of resource limits along the path.

In the following sections, we treat each of these in turn.

1 Getting to Equilibrium: Slow-start

Failure (1) has to be from a connection that is either starting or restarting after a packet loss. Another way to look at the conservation property is to say that the sender uses acks as a 'clock' to strobe new packets into the network. Since the receiver can generate acks no faster than data packets can get through the network,

This is a schematic representation of a sender and receiver on high bandwidth networks connected by a slower, long-haul net. The sender is just starting and has shipped a window's worth of packets, back-to-back. The ack for the first of those packets is about to arrive back at the sender (the vertical line at the mouth of the lower left funnel).

The vertical direction is bandwidth, the horizontal direction is time. Each of the shaded boxes is a packet. Bandwidth x Time = Bits so the area of each box is the packet size. The number of bits doesn't change as a packet goes through the network so a packet squeezed into the smaller long-haul bandwidth must spread out in time. The time Pb represents the minimum packet spacing on the slowest link in the path (the bottleneck). As the packets leave the bottleneck for the destination net, nothing changes the inter-packet interval so on the receiver's net packet spacing Pr = Pb. If the receiver processing time is the same for all packets, the spacing between acks on the receiver's net Ar = Pr = Pb. If the time slot Pb was big enough for a packet, it's big enough for an ack so the ack spacing is preserved along the return path. Thus the ack spacing on the sender's net As = Pb.

So, if packets after the first burst are sent only in response to an ack, the sender's packet spacing will exactly match the packet time on the slowest link in the path.

Figure 1: Window Flow Control 'Self-clocking'

the protocol is 'self clocking' (fig. 1). Self clocking systems automatically adjust to bandwidth and delay variations and have a wide dynamic range (important considering that TCP spans a range from 800 Mbps Cray channels to 1200 bps packet radio links). But the same thing that makes a self-clocked system stable when it's running makes it hard to start: to get data flowing there must be acks to clock out packets but to get acks there must be data flowing.

To start the 'clock', we developed a slow-start algorithm to gradually increase the amount of data in-transit.[2] Although we flatter ourselves that the design of this algorithm is rather subtle, the implementation is trivial: one new state variable and three lines of code in the sender:

• Add a congestion window, cwnd, to the per-connection state.

• When starting or restarting after a loss, set cwnd to one packet.

• On each ack for new data, increase cwnd by one packet.

• When sending, send the minimum of the receiver's advertised window and cwnd.
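A minimal sketch of how those four rules might look in code (the struct and function names here are illustrative, not 4.3BSD's; appendix B shows how this is actually combined with congestion avoidance):

    /* Illustrative per-connection state; cwnd counted in packets. */
    struct conn {
            int cwnd;       /* congestion window                   */
            int rcv_wnd;    /* window advertised by the receiver   */
    };

    /* When starting, or restarting after a loss: one packet in flight. */
    void
    slowstart_restart(struct conn *c)
    {
            c->cwnd = 1;
    }

    /* On each ack for new data: open the window by one packet. */
    void
    slowstart_ack(struct conn *c)
    {
            c->cwnd += 1;
    }

    /* When sending: never exceed min(cwnd, receiver's advertised window). */
    int
    usable_window(struct conn *c)
    {
            return c->cwnd < c->rcv_wnd ? c->cwnd : c->rcv_wnd;
    }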
[2] Slow-start is quite similar to the CUTE algorithm described in [Jai86b]. We didn't know about CUTE at the time we were developing slow-start but we should have: CUTE preceded our work by several months.
When describing our algorithm at the Feb., 1987, Internet Engineering Task Force (IETF) meeting, we called it soft-start, a reference to an electronics engineer's technique to limit in-rush current. The name slow-start was coined by John Nagle in a message to the IETF mailing list in March, '87. This name was clearly superior to ours and we promptly adopted it.


The horizontal direction is time. The continuous time line has been chopped into one-
round-trip-time pieces stacked vertically with increasing time going down the page. The
grey, numbered boxes are packets. The white numbered boxes are the corresponding
acks.
As each ack arrives, two packets are generated: one for the ack (the ack says a packet
has left the system so a new packet is added to take its place) and one because an ack
opens the congestion window by one packet. It may be clear from the figure why an
add-one-packet-to-window policy opens the window exponentially in time.
If the local net is much faster than the long haul net, the ack's two packets arrive at the
bottleneck at essentially the same time. These two packets are shown stacked on top of
one another (indicating that one of them would have to occupy space in the gateway's
outbound queue). Thus the short-term queue demand on the gateway is increasing
exponentially and opening a window of size W packets will require W/2 packets of
buffer capacity at the bottleneck.

Figure 2: The Chronology of a Slow-start

Actually, the slow-start window increase isn't that slow: it takes time R log2 W where R is the round-trip-time and W is the window size in packets (fig. 2). This means the window opens quickly enough to have a negligible effect on performance, even on links with a large bandwidth-delay product. And the algorithm guarantees that a connection will source data at a rate at most twice the maximum possible on the path. Without slow-start, by contrast, when 10 Mbps Ethernet hosts talk over the 56 Kbps Arpanet via IP gateways, the first-hop gateway sees a burst of eight packets delivered at 200 times the path bandwidth. This burst of packets often puts the connection into a persistent failure mode of continuous retransmissions (figures 3 and 4).

2 Conservation at equilibrium: round-trip timing

Once data is flowing reliably, problems (2) and (3) should be addressed. Assuming that the protocol implementation is correct, (2) must represent a failure of the sender's retransmit timer. A good round trip time estimator, the core of the retransmit timer, is the single most important feature of any protocol implementation that expects to survive heavy load. And it is frequently botched ([Zha86] and [Jai86a] describe typical problems).

One mistake is not estimating the variation, σR, of the round trip time, R. From queuing theory we know


Trace data of the start of a TCP conversation between two Sun 3/50s running Sun OS 3.5 (the 4.3BSD TCP). The two Suns were on different Ethernets connected by IP gateways driving a 230.4 Kbps point-to-point link (essentially the setup shown in fig. 7).
Each dot is a 512 data-byte packet. The x-axis is the time the packet was sent. The y-axis is the sequence number in the packet header. Thus a vertical array of dots indicates back-to-back packets and two dots with the same y but different x indicate a retransmit.
'Desirable' behavior on this graph would be a relatively smooth line of dots extending diagonally from the lower left to the upper right. The slope of this line would equal the available bandwidth. Nothing in this trace resembles desirable behavior.
The dashed line shows the 20 KBps bandwidth available for this connection. Only 35% of this bandwidth was used; the rest was wasted on retransmits. Almost everything is retransmitted at least once and data from 54 to 58 KB is sent five times.

Figure 3: Startup behavior of TCP without Slow-start

that R and the variation in R increase quickly with load. If the load is ρ (the ratio of average arrival rate to average departure rate), R and σR scale like (1 - ρ)^-1. To make this concrete, if the network is running at 75% of capacity, as the Arpanet was in last April's collapse, one should expect round-trip-time to vary by a factor of sixteen (±2σ).

The TCP protocol specification, [RFC793], suggests estimating mean round trip time via the low-pass filter

    R <- αR + (1 - α)M

where R is the average RTT estimate, M is a round trip time measurement from the most recently acked data packet, and α is a filter gain constant with a suggested value of 0.9. Once the R estimate is updated, the retransmit timeout interval, rto, for the next packet sent is set to βR.

The parameter β accounts for RTT variation (see [Cla82], section 5). The suggested β = 2 can adapt to loads of at most 30%. Above this point, a connection will respond to load increases by retransmitting packets that have only been delayed in transit. This forces the network to do useless work, wasting bandwidth on duplicates of packets that will be delivered, at a time when it's known to be having trouble with useful work. I.e., this is the network equivalent of pouring gasoline on a fire.
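For concreteness, the RFC793 rule just quoted (with its suggested α = 0.9 and β = 2) amounts to something like the following; this is a sketch of the estimator being criticized here, not of the improved one developed in appendix A, and initialization of the average from the first measurement is omitted:

    /* RFC793-style retransmit timer: low-pass filter on the mean RTT only. */
    #define ALPHA   0.9     /* filter gain suggested by RFC793 */
    #define BETA    2.0     /* variation 'fudge factor'        */

    static double srtt;     /* smoothed round-trip-time estimate, seconds */

    /* m is the RTT measurement from the most recently acked data packet. */
    double
    rfc793_rto(double m)
    {
            srtt = ALPHA * srtt + (1.0 - ALPHA) * m;
            return BETA * srtt;     /* timeout for the next packet sent */
    }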
We developed a cheap method for estimating variation (see appendix A)[3] and the resulting retransmit timer essentially eliminates spurious retransmissions.

[3] We are far from the first to recognize that transport needs to estimate both mean and variation. See, for example, [Edg83]. But we do think our estimator is simpler than most.

Same conditions as the previous figure (same time of day, same Suns, same network path, same buffer and window sizes), except the machines were running the 4.3+ TCP with slow-start.
No bandwidth is wasted on retransmits but two seconds is spent on the slow-start so the effective bandwidth of this part of the trace is 16 KBps, two times better than figure 3. (This is slightly misleading: Unlike the previous figure, the slope of the trace is 20 KBps and the effect of the 2 second offset decreases as the trace lengthens. E.g., if this trace had run a minute, the effective bandwidth would have been 19 KBps. The effective bandwidth without slow-start stays at 7 KBps no matter how long the trace.)

Figure 4: Startup behavior of TCP with Slow-start

A pleasant side effect of estimating β rather than using a fixed value is that low load as well as high load performance improves, particularly over high delay paths such as satellite links (figures 5 and 6).

Another timer mistake is in the backoff after a retransmit: If a packet has to be retransmitted more than once, how should the retransmits be spaced? Only one scheme will work, exponential backoff, but proving this is a bit involved.[4] To finesse a proof, note that a network is, to a very good approximation, a linear system. That is, it is composed of elements that behave like linear operators (integrators, delays, gain stages, etc.). Linear system theory says that if a system is stable, the stability is exponential. This suggests that an unstable system (a network subject to random load shocks and prone to congestive collapse[5]) can be stabilized by adding some exponential damping (exponential timer backoff) to its primary excitation (senders, traffic sources).
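A minimal sketch of the exponential damping being argued for: each successive retransmission of the same packet doubles the timeout (the cap and the names are illustrative, not from any particular implementation):

    #define RTO_MAX 64.0    /* illustrative upper bound on the timeout, seconds */

    /* Timeout to use for the nth retransmission of a packet (n = 0 for the
     * original send): the base value backed off exponentially. */
    double
    backoff_rto(double base_rto, int nretrans)
    {
            double rto = base_rto;

            while (nretrans-- > 0 && rto < RTO_MAX)
                    rto *= 2.0;
            return rto < RTO_MAX ? rto : RTO_MAX;
    }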
[4] An in-progress paper attempts a proof. If an IP gateway is viewed as a 'shared resource with fixed capacity', it bears a remarkable resemblance to the 'ether' in an Ethernet. The retransmit backoff problem is essentially the same as showing that no backoff 'slower' than an exponential will guarantee stability on an Ethernet. Unfortunately, in theory even exponential backoff won't guarantee stability (see [Ald87]). Fortunately, in practise we don't have to deal with the theorist's infinite user population and exponential is "good enough".
[5] The phrase congestion collapse (describing a positive feedback instability due to poor retransmit timers) is again the coinage of John Nagle, this time from [Nag84].

3 Adapting to the path: congestion avoidance

If the timers are in good shape, it is possible to state with some confidence that a timeout indicates a lost packet and not a broken timer. At this point, something can be done about (3). Packets get lost for two reasons: they


Trace data showing per-packet round trip time on a well-behaved Arpanet connection.
The x-axis is the packet number (packets were numbered sequentially, starting with one)
and the y-axis is the elapsed time from the send of the packet to the sender's receipt of
its ack. During this portion of the trace, no packets were dropped or retransmitted.
The packets are indicated by a dot. A dashed line connects them to make the sequence
easier to follow. The solid line shows the behavior of a retransmit timer computed
according to the rules of RFC793.

Figure 5: Performance of an RFC793 retransmit timer

are damaged in transit, or the network is congested and somewhere on the path there was insufficient buffer capacity. On most network paths, loss due to damage is rare (<< 1%) so it is probable that a packet loss is due to congestion in the network.[6]

A 'congestion avoidance' strategy, such as the one proposed in [JRC87], will have two components: The network must be able to signal the transport endpoints that congestion is occurring (or about to occur). And the endpoints must have a policy that decreases utilization if this signal is received and increases utilization if the signal isn't received.

If packet loss is (almost) always due to congestion and if a timeout is (almost) always due to a lost packet, we have a good candidate for the 'network is congested' signal. Particularly since this signal is delivered automatically by all existing networks, without special modification (as opposed to [JRC87] which requires a new bit in the packet headers and a modification to all existing gateways to set this bit).

The other part of a congestion avoidance strategy, the endnode action, is almost identical in the DEC/ISO scheme and our TCP[7] and follows directly from a first-order time-series model of the network: Say network load is measured by average queue length over fixed intervals of some appropriate length (something near the round trip time). If Li is the load at interval i, an uncongested network can be modeled by saying Li changes slowly compared to the sampling time. I.e.,

    Li = N

(N constant). If the network is subject to congestion, this zeroth order model breaks down. The average queue length becomes the sum of two terms, the N above that accounts for the average arrival rate of new traffic and intrinsic delay, and a new term that accounts for the fraction of traffic left over from the last time interval and the effect of this left-over traffic (e.g., induced

[6] The congestion control scheme we propose is insensitive to damage loss until the loss rate is on the order of one packet per window (e.g., 12-15% for an 8 packet window). At this high loss rate, any window flow control scheme will perform badly: a 12% loss rate degrades TCP throughput by 60%. The additional degradation from the congestion avoidance window shrinking is the least of one's problems. A presentation in [IETF88] and an in-progress paper address this subject in more detail.
[7] This is not an accident: We copied Jain's scheme after hearing his presentation at [IETF87] and realizing that the scheme was, in a sense, universal.


Same data as above but the solid line shows a retransmit timer computed according to
the algorithm in appendix A.

Figure 6: Performance of a Mean+Variance retransmit timer

retransmits):

    Li = N + γ Li-1

(These are the first two terms in a Taylor series expansion of L(t). There is reason to believe one might eventually need a three term, second order model, but not until the Internet has grown substantially.)

When the network is congested, γ must be large and the queue lengths will start increasing exponentially.[8] The system will stabilize only if the traffic sources throttle back at least as quickly as the queues are growing. Since a source controls load in a window-based protocol by adjusting the size of the window, W, we end up with the sender policy

    On congestion:
        Wi = d Wi-1    (d < 1)

I.e., a multiplicative decrease of the window size (which becomes an exponential decrease over time if the congestion persists).

If there's no congestion, γ must be near zero and the load approximately constant. The network announces, via a dropped packet, when demand is excessive but says nothing if a connection is using less than its fair share (since the network is stateless, it cannot know this). Thus a connection has to increase its bandwidth utilization to find out the current limit. E.g., you could have been sharing the path with someone else and converged to a window that gives you each half the available bandwidth. If she shuts down, 50% of the bandwidth will be wasted unless your window size is increased. What should the increase policy be?

The first thought is to use a symmetric, multiplicative increase, possibly with a longer time constant, Wi = b Wi-1, 1 < b <= 1/d. This is a mistake. The result will oscillate wildly and, on the average, deliver poor throughput. There is an analytic reason for this but it's tedious to derive. It has to do with the fact that it is easy to drive the net into saturation but hard for the net to recover (what [Kle76], chap. 2.1, calls the rush-hour effect).[9] Thus overestimating the available bandwidth

[8] I.e., the system behaves like Li ~ γ Li-1, a difference equation with the solution

    Ln = γ^n L0

which goes exponentially to infinity for any γ > 1.

[9] In fig. 1, note that the 'pipesize' is 16 packets, 8 in each path, but the sender is using a window of 22 packets. The six excess packets will form a queue at the entry to the bottleneck and that queue cannot shrink, even though the sender carefully clocks out packets at the bottleneck link rate. This stable queue is another, unfortunate, aspect of conservation: The queue would shrink only if the gateway could move packets into the skinny pipe faster than the sender dumped packets into the fat pipe. But the system tunes itself so each time the gateway pulls a packet off the front of its queue, the sender lays a new packet on the end.
A gateway needs excess output capacity (i.e., ρ < 1) to dissipate a queue and the clearing time will scale like (1 - ρ)^-2 ([Kle76], chap. 2 is an excellent discussion of this). Since at equilibrium our transport connection 'wants' to run the bottleneck link at 100% (ρ = 1), we have to be sure that during the non-equilibrium window adjustment, our control policy allows the gateway enough free bandwidth to dissipate queues that inevitably form due to path testing and traffic fluctuations. By an argument similar to the one used to show exponential timer backoff is necessary, it's possible to show that an exponential (multiplicative) window increase policy will be 'faster' than the dissipation time for some traffic mix and, thus, leads to an unbounded growth of the bottleneck queue.



is costly. But an exponential, almost regardless of its time constant, increases so quickly that overestimates are inevitable.

Without justification, I'll state that the best increase policy is to make small, constant changes to the window size:

    On no congestion:
        Wi = Wi-1 + u    (u << Wmax)

where Wmax is the pipesize (the delay-bandwidth product of the path minus protocol overhead, i.e., the largest sensible window for the unloaded path). This is the additive increase / multiplicative decrease policy suggested in [JRC87] and the policy we've implemented in TCP. The only difference between the two implementations is the choice of constants for d and u. We used 0.5 and 1 for reasons partially explained in appendix C. A more complete analysis is in yet another in-progress paper.

The preceding has probably made the congestion control algorithm sound hairy but it's not. Like slow-start, it's three lines of code:

• On any timeout, set cwnd to half the current window size (this is the multiplicative decrease).

• On each ack for new data, increase cwnd by 1/cwnd (this is the additive increase).[10]

• When sending, send the minimum of the receiver's advertised window and cwnd.

Note that this algorithm is only congestion avoidance, it doesn't include the previously described slow-start. Since the packet loss that signals congestion will result in a re-start, it will almost certainly be necessary to slow-start in addition to the above. But, because both congestion avoidance and slow-start are triggered by a timeout and both manipulate the congestion window, they are frequently confused. They are actually independent algorithms with completely different objectives. To emphasize the difference, the two algorithms have been presented separately even though in practise they should be implemented together. Appendix B describes a combined slow-start/congestion avoidance algorithm.[11]

Figures 7 through 12 show the behavior of TCP connections with and without congestion avoidance. Although the test conditions (e.g., 16 KB windows) were deliberately chosen to stimulate congestion, the test scenario isn't far from common practice: The Arpanet IMP end-to-end protocol allows at most eight packets in transit between any pair of gateways. The default 4.3BSD window size is eight packets (4 KB). Thus simultaneous conversations between, say, any two hosts at Berkeley and any two hosts at MIT would exceed the buffer capacity of the UCB-MIT IMP path and would lead[12] to the behavior shown in the following figures.

[10] This increment rule may be less than obvious. We want to increase the window by at most one packet over a time interval of length R (the round trip time). To make the algorithm 'self-clocked', it's better to increment by a small amount on each ack rather than by a large amount at the end of the interval. (Assuming, of course, that the sender has effective silly window avoidance (see [Cla82], section 3) and doesn't attempt to send packet fragments because of the fractionally sized window.) A window of size cwnd packets will generate at most cwnd acks in one R. Thus an increment of 1/cwnd per ack will increase the window by at most one packet in one R. In TCP, windows and packet sizes are in bytes so the increment translates to maxseg*maxseg/cwnd where maxseg is the maximum segment size and cwnd is expressed in bytes, not packets.

[11] We have also developed a rate-based variant of the congestion avoidance algorithm to apply to connectionless traffic (e.g., domain server queries, RPC requests). Remembering that the goal of the increase and decrease policies is bandwidth adjustment, and that 'time' (the controlled parameter in a rate-based scheme) appears in the denominator of bandwidth, the algorithm follows immediately: The multiplicative decrease remains a multiplicative decrease (e.g., double the interval between packets). But subtracting a constant amount from the interval does not result in an additive increase in bandwidth. This approach has been tried, e.g., [Kli87] and [PP87], and appears to oscillate badly. To see why, note that for an inter-packet interval I and decrement c, the bandwidth change of a decrease-interval-by-constant policy is

    1/I -> 1/(I - c)

a non-linear, and destabilizing, increase.
An update policy that does result in a linear increase of bandwidth over time is

    Ii = (α Ii-1) / (α + Ii-1)

where Ii is the interval between sends when the i-th packet is sent and α is the desired rate of increase in packets per packet/sec. (Inverting, 1/Ii = 1/Ii-1 + 1/α, so the bandwidth 1/I grows by a constant amount on each send.)
We have simulated the above algorithm and it appears to perform well. To test the predictions of that simulation against reality, we have a cooperative project with Sun Microsystems to prototype RPC dynamic congestion control algorithms using NFS as a test-bed (since NFS is known to have congestion problems yet it would be desirable to have it work over the same range of networks as TCP).

[12] did lead.

4 Future work: the gateway side of congestion control

While algorithms at the transport endpoints can insure the network capacity isn't exceeded, they cannot insure


Test setup to examine the interaction of multiple, simultaneous TCP conversations shar-
ing a bottleneck link. 1 MByte transfers (2048 512-data-byte packets) were initiated 3
seconds apart from four machines at LBL to four machines at UCB, one conversation
per machine pair (the dotted lines above show the pairing). All traffic went via a 230.4
Kbps link connecting IP router csam at LBL to IP router cartan at UCB.
The microwave link queue can hold up to 50 packets. Each connection was given a
window of 16 KB (32 512-byte packets). Thus any two connections could overflow the
available buffering and the four connections exceeded the queue capacity by 160%.

Figure 7: Multiple conversation test setup

fair sharing of that capacity. Only in gateways, at the convergence of flows, is there enough information to control sharing and fair allocation. Thus, we view the gateway 'congestion detection' algorithm as the next big step.

The goal of this algorithm is to send a signal to the endnodes as early as possible, but not so early that the gateway becomes starved for traffic. Since we plan to continue using packet drops as a congestion signal, gateway 'self protection' from a mis-behaving host should fall out for free: That host will simply have most of its packets dropped as the gateway tries to tell it that it's using more than its fair share. Thus, like the endnode algorithm, the gateway algorithm should reduce congestion even if no endnode is modified to do congestion avoidance. And nodes that do implement congestion avoidance will get their fair share of bandwidth and a minimum number of packet drops.

Since congestion grows exponentially, detecting it early is important. If detected early, small adjustments to the senders' windows will cure it. Otherwise massive adjustments are necessary to give the net enough spare capacity to pump out the backlog. But, given the bursty nature of traffic, reliable detection is a non-trivial problem. [JRC87] proposes a scheme based on averaging between queue regeneration points. This should yield good burst filtering but we think it might have convergence problems under high load or significant second-order dynamics in the traffic.[13] We plan to use some of our earlier work on ARMAX models for round-trip-time / queue length prediction as the basis of detection. Preliminary results suggest that this approach works well at high load, is immune to second-order effects in the traffic and is computationally cheap enough to not slow down kilopacket-per-second gateways.

[13] The time between regeneration points scales like (1 - ρ)^-1 and the variance of that time like (1 - ρ)^-3 (see [Fel71], chap. VI.9). Thus the congestion detector becomes sluggish as congestion increases and its signal-to-noise ratio decreases dramatically.

Acknowledgements

I am very grateful to the members of the Internet Activity Board's End-to-End and Internet-Engineering task forces for this past year's abundant supply of interest, encouragement, cogent questions and network insights.


Trace data from four simultaneous TCP conversations without congestion avoidance
over the paths shown in figure 7.
4,000 of 11,000 packets sent were retransmissions (i.e., half the data packets were re-
transmitted).
Since the link data bandwidth is 25 KBps, each of the four conversations should have
received 6 KBps. Instead, one conversation got 8 KBps, two got 5 KBps, one got 0.5
KBps and 6 KBps has vanished.

Figure 8: Multiple, simultaneous TCPs with no congestion avoidance

I am also deeply in debt to Jeff Mogul of DEC. Without Jeff's patient prodding and way-beyond-the-call-of-duty efforts to help me get a draft submitted before deadline, this paper would never have existed.

A  A fast algorithm for rtt mean and variation

A.1 Theory

The RFC793 algorithm for estimating the mean round trip time is one of the simplest examples of a class of estimators called recursive prediction error or stochastic gradient algorithms. In the past 20 years these algorithms have revolutionized estimation and control theory[14] and it's probably worth looking at the RFC793 estimator in some detail.

Given a new measurement M of the RTT (round trip time), TCP updates an estimate of the average RTT A by

    A <- (1 - g)A + gM

where g is a 'gain' (0 < g < 1) that should be related to the signal-to-noise ratio (or, equivalently, variance) of M. This makes more sense, and computes faster, if we rearrange and collect terms multiplied by g to get

    A <- A + g(M - A)

Think of A as a prediction of the next measurement. M - A is the error in that prediction and the expression above says we make a new prediction based on the old prediction plus some fraction of the prediction error. The prediction error is the sum of two components: (1) error due to 'noise' in the measurement (random, unpredictable effects like fluctuations in competing traffic) and (2) error due to a bad choice of A. Calling the random error Er and the estimation error Ee,

    A <- A + g Er + g Ee

[14] See, for example, [LS83].


Trace data from four simultaneous TCP conversations using congestion avoidance over the paths shown in figure 7.
89 of 8281 packets sent were retransmissions (i.e., 1% of the data packets had to be retransmitted).
Two of the conversations got 8 KBps and two got 4.5 KBps (i.e., all the link bandwidth is accounted for; see fig. 11). The difference between the high and low bandwidth senders was due to the receivers. The 4.5 KBps senders were talking to 4.3BSD receivers which would delay an ack until 35% of the window was filled or 200 ms had passed (i.e., an ack was delayed for 5-7 packets on the average). This meant the sender would deliver bursts of 5-7 packets on each ack.
The 8 KBps senders were talking to 4.3+BSD receivers which would delay an ack for at most one packet (the author doesn't believe that delayed acks are a particularly good idea). I.e., the sender would deliver bursts of at most two packets.
The probability of loss increases rapidly with burst size so senders talking to old-style receivers saw three times the loss rate (1.8% vs. 0.5%). The higher loss rate meant more time spent in retransmit wait and, because of the congestion avoidance, smaller average window sizes.

Figure 9: Multiple, simultaneous TCPs with congestion avoidance

The gEe term gives A a kick in the right direction while the gEr term gives it a kick in a random direction. Over a number of samples, the random kicks cancel each other out so this algorithm tends to converge to the correct average. But g represents a compromise: We want a large g to get mileage out of Ee but a small g to minimize the damage from Er. Since the Ee terms move A toward the real average no matter what value we use for g, it's almost always better to use a gain that's too small rather than one that's too large. Typical gain choices are 0.1-0.2 (though it's a good idea to take a long look at your raw data before picking a gain).

It's probably obvious that A will oscillate randomly around the true average and the standard deviation of A will be g sdev(M). Also that A converges to the true average exponentially with time constant 1/g. So a smaller g gives a stabler A at the expense of taking a much longer time to get to the true average.

If we want some measure of the variation in M, say to compute a good value for the TCP retransmit timer, there are several alternatives. Variance, σ², is the conventional choice because it has some nice mathematical properties. But computing variance requires squaring (M - A) so an estimator for it will contain a multiply


The thin line shows the total bandwidth used by the four senders without congestion avoidance (fig. 8), averaged over 5 second intervals and normalized to the 25 KBps link bandwidth. Note that the senders send, on the average, 25% more than will fit in the wire.
The thick line is the same data for the senders with congestion avoidance (fig. 9). The first 5 second interval is low (because of the slow-start), then there is about 20 seconds of damped oscillation as the congestion control 'regulator' for each TCP finds the correct window size. The remaining time the senders run at the wire bandwidth. (The activity around 110 seconds is a bandwidth 're-negotiation' due to connection one shutting down. The activity around 80 seconds is a reflection of the 'flat spot' in fig. 9 where most of conversation two's bandwidth is suddenly shifted to conversations three and four; a colleague and I find this 'punctuated equilibrium' behavior fascinating and hope to investigate its dynamics in a future paper.)

Figure 10: Total bandwidth used by old and new TCPs

with a danger of integer overflow. Also, most applications will want variation in the same units as A and M, so we'll be forced to take the square root of the variance to use it (i.e., at least a divide, multiply and two adds).

A variation measure that's easy to compute is the mean prediction error or mean deviation, the average of |M - A|. Also, since

    mdev² = (Σ |M - A|)² ≥ Σ |M - A|² = σ²

mean deviation is a more conservative (i.e., larger) estimate of variation than standard deviation.[15]

There's often a simple relation between mdev and sdev. E.g., if the prediction errors are normally distributed, mdev = √(π/2) sdev. For most common distributions the factor to go from sdev to mdev is near one (√(π/2) ≈ 1.25). I.e., mdev is a good approximation of sdev and is much easier to compute.

[15] Mathematical purists may note that I elided a factor of n, the number of samples, from the previous inequality. It makes no difference in the result.

A.2 Practice

Fast estimators for average A and mean deviation D given measurement M follow directly from the above. Both estimators compute means so there are two instances of the RFC793 algorithm:

    Err = M - A
    A <- A + g Err
    D <- D + g(|Err| - D)

To compute quickly, the above should be done in integer arithmetic. But the expressions contain fractions (g < 1) so some scaling is needed to keep everything integer. A reciprocal power of 2 (i.e., g = 1/2^n for some


Figure 10 showed the old TCPs were using 25% more than the bottleneck link bandwidth. Thus, once the bottleneck queue filled, 25% of the senders' packets were being discarded. If the discards, and only the discards, were retransmitted, the senders would have received the full 25 KBps link bandwidth (i.e., their behavior would have been anti-social but not self-destructive). But fig. 8 noted that around 25% of the link bandwidth was unaccounted for.
Here we average the total amount of data acked per five second interval. (This gives the effective or delivered bandwidth of the link.) The thin line is once again the old TCPs. Note that only 75% of the link bandwidth is being used for data (the remainder must have been used by retransmissions of packets that didn't need to be retransmitted).
The thick line shows delivered bandwidth for the new TCPs. There is the same slow-start and turn-on transient followed by a long period of operation right at the link bandwidth.

Figure 11: Effective bandwidth of old and new TCPs

n) is a particularly good choice for g since the scaling can be implemented with shifts. Multiplying through by 1/g gives

    2^n A <- 2^n A + Err
    2^n D <- 2^n D + (|Err| - D)

To minimize round-off error, the scaled versions of A and D, SA and SD, should be kept rather than the unscaled versions. Picking g = .125 = 1/8 (close to the .1 suggested in RFC793) and expressing the above in C:

    M -= (SA >> 3);     /* = Err      */
    SA += M;
    if (M < 0)
        M = -M;         /* = abs(Err) */
    M -= (SD >> 3);
    SD += M;

It's not necessary to use the same gain for A and D. To force the timer to go up quickly in response to changes in the RTT, it's a good idea to give D a larger gain. In particular, because of window-delay mismatch, there are often RTT artifacts at integer multiples of the window size.[16] To filter these, one would like 1/g in the A estimator to be at least as large as the window size (in packets) and 1/g in the D estimator to be less than the window size.[17]

Using a gain of .25 on the deviation and computing the retransmit timer, rto, as A + 2D, the final timer code

[16] E.g., see packets 10-50 of figure 5. Note that these window effects are due to characteristics of the Arpa/Milnet subnet. In general, window effects on the timer are at most a second-order consideration and depend a great deal on the underlying network. E.g., if one were using the Wideband with a 256 packet window, 1/256 would not be a good gain for A (1/16 might be).
[17] Although it may not be obvious, the absolute value in the calculation of D introduces an asymmetry in the timer: Because D has the same sign as an increase and the opposite sign of a decrease, more gain in D makes the timer go up quickly and come down slowly, 'automatically' giving the behavior suggested in [Mil83]. E.g., see the region between packets 50 and 80 in figure 6.


Because of the five second averaging time (needed to smooth out the spikes in the old TCP data), the congestion avoidance window policy is difficult to make out in figures 10 and 11. Here we show effective throughput (data acked) for TCPs with congestion control, averaged over a three second interval.
When a packet is dropped, the sender sends until it fills the window, then stops until the retransmission timeout. Since the receiver cannot ack data beyond the dropped packet, on this plot we'd expect to see a negative-going spike whose amplitude equals the sender's window size (minus one packet). If the retransmit happens in the next interval (the intervals were chosen to match the retransmit timeout), we'd expect to see a positive-going spike of the same amplitude when the receiver acks its cached data. Thus the height of these spikes is a direct measure of the sender's window size.
The data clearly shows three of these events (at 15, 33 and 57 seconds) and the window size appears to be decreasing exponentially. The dotted line is a least squares fit to the six window size measurements obtained from these events. The fit time constant was 28 seconds. (The long time constant is due to lack of a congestion avoidance algorithm in the gateway. With a 'drop' algorithm running in the gateway, the time constant should be around 4 seconds.)

Figure 12: Window adjustment detail

looks like:

    M -= (SA >> 3);
    SA += M;
    if (M < 0)
        M = -M;
    M -= (SD >> 2);
    SD += M;
    rto = ((SA >> 2) + SD) >> 1;

Note that SA and SD are added before the final shift. In general, this will correctly round rto: Because of the SA truncation when computing M - A, SA will converge to the true mean rounded up to the next tick. Likewise with SD. Thus, on the average, there is half a tick of bias in each. The rto computation should be rounded by half a tick and one tick needs to be added to account for sends being phased randomly with respect to the clock. So, the 1.5 tick bias contribution from A + 2D equals the desired half tick rounding plus one tick phase correction.
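Read as one self-contained function (a consolidation of mine, not from the paper; it assumes times are measured in clock ticks and that the caller keeps the scaled state SA = 8A and SD = 4D between calls), the code above becomes:

    /* Update the scaled RTT average (sa = 8*A, gain 1/8) and scaled mean
     * deviation (sd = 4*D, gain 1/4) with measurement m (clock ticks),
     * and return the new retransmit timeout rto = A + 2D. */
    int
    rtt_update(int *sa, int *sd, int m)
    {
            m -= (*sa >> 3);                /* m = Err = M - A        */
            *sa += m;                       /* A <- A + Err/8         */
            if (m < 0)
                    m = -m;                 /* m = |Err|              */
            m -= (*sd >> 2);                /* m = |Err| - D          */
            *sd += m;                       /* D <- D + (|Err| - D)/4 */
            return ((*sa >> 2) + *sd) >> 1; /* rto = A + 2D           */
    }

A caller would seed SA and SD from the first measurement (one possible convention, not the paper's: SA = M << 3, SD = M << 1) and then call this on every ack that times a packet.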
B  The combined slow-start with congestion avoidance algorithm

The sender keeps two state variables for congestion control: a congestion window, cwnd, and a threshold size, ssthresh, to switch between the two algorithms. The sender's output routine always sends the minimum of cwnd and the window advertised by the receiver. On
ACM SIGCOMM -171- Computer Communication Review


a timeout, half the current window size is recorded in ssthresh (this is the multiplicative decrease part of the congestion avoidance algorithm), then cwnd is set to 1 packet (this initiates slow-start). When new data is acked, the sender does

    if (cwnd < ssthresh)
        /* if we're still doing slow-start
         * open window exponentially */
        cwnd += 1;
    else
        /* otherwise do Congestion
         * Avoidance increment-by-1 */
        cwnd += 1/cwnd;

Thus slow-start opens the window quickly to what congestion avoidance thinks is a safe operating point (half the window that got us into trouble), then congestion avoidance takes over and slowly increases the window size to probe for more bandwidth becoming available on the path.
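Filling in the timeout and output halves that the prose above describes, the combined algorithm can be sketched as follows (the names are illustrative and windows are counted in packets here; 4.3BSD actually keeps cwnd and ssthresh in bytes):

    /* Illustrative congestion-control state, counted in packets. */
    struct cc {
            double cwnd;        /* congestion window                 */
            double ssthresh;    /* slow-start / avoidance threshold  */
            double rcv_wnd;     /* window advertised by the receiver */
    };

    /* Retransmit timeout: record half the window, then slow-start. */
    void
    cc_timeout(struct cc *s)
    {
            s->ssthresh = s->cwnd / 2;  /* multiplicative decrease */
            s->cwnd = 1;                /* initiates slow-start    */
    }

    /* Ack for new data: slow-start below ssthresh, else increment-by-1/cwnd. */
    void
    cc_newack(struct cc *s)
    {
            if (s->cwnd < s->ssthresh)
                    s->cwnd += 1;               /* open window exponentially */
            else
                    s->cwnd += 1 / s->cwnd;     /* congestion avoidance      */
    }

    /* Output routine: send at most min(cwnd, receiver's advertised window). */
    double
    cc_sendable(struct cc *s)
    {
            return s->cwnd < s->rcv_wnd ? s->cwnd : s->rcv_wnd;
    }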
C  Window Adjustment Policy

A reason for using 1/2 as the decrease term, as opposed to the 7/8 in [JRC87], was the following handwaving: When a packet is dropped, you're either starting (or restarting after a drop) or steady-state sending. If you're starting, you know that half the current window size 'worked', i.e., that a window's worth of packets were exchanged with no drops (slow-start guarantees this). Thus on congestion you set the window to the largest size that you know works then slowly increase the size. If the connection is steady-state running and a packet is dropped, it's probably because a new connection started up and took some of your bandwidth. We usually run our nets with ρ < 0.5 so it's probable that there are now exactly two conversations sharing the bandwidth. I.e., you should reduce your window by half because the bandwidth available to you has been reduced by half. And, if there are more than two conversations sharing the bandwidth, halving your window is conservative, and being conservative at high traffic intensities is probably wise.

Although a factor of two change in window size seems a large performance penalty, in system terms the cost is negligible: Currently, packets are dropped only when a large queue has formed. Even with an [ISO86] 'congestion experienced' bit to force senders to reduce their windows, we're stuck with the queue because the bottleneck is running at 100% utilization with no excess bandwidth available to dissipate the queue. If a packet is tossed, some sender shuts up for two RTT, exactly the time needed to empty the queue. If that sender restarts with the correct window size, the queue won't reform. Thus the delay has been reduced to minimum without the system losing any bottleneck bandwidth.

The 1-packet increase has less justification than the 0.5 decrease. In fact, it's almost certainly too large. If the algorithm converges to a window size of w, there are O(w²) packets between drops with an additive increase policy. We were shooting for an average drop rate of < 1% and found that on the Arpanet (the worst case of the four networks we tested), windows converged to 8-12 packets. This yields 1 packet increments for a 1% average drop rate.

But, since we've done nothing in the gateways, the window we converge to is the maximum the gateway can accept without dropping packets. I.e., in the terms of [JRC87], we are just to the left of the cliff rather than just to the right of the knee. If the gateways are fixed so they start dropping packets when the queue gets pushed past the knee, our increment will be much too aggressive and should be dropped by about a factor of four (since our measurements on an unloaded Arpanet place its 'pipe size' at 4-5 packets). It appears trivial to implement a second order control loop to adaptively determine the appropriate increment to use for a path. But second order problems are on hold until we've spent some time on the first order part of the algorithm for the gateways.



References

[Ald87] David J. Aldous. Ultimate instability of exponential back-off protocol for acknowledgment based transmission control of random access communication channels. IEEE Transactions on Information Theory, IT-33(2), March 1987.

[Cla82] David Clark. Window and Acknowledgement Strategy in TCP. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, July 1982. RFC-813.

[Edg83] Stephen William Edge. An adaptive timeout algorithm for retransmission across a packet switching network. In Proceedings of SIGCOMM '83. ACM, March 1983.

[Fel71] William Feller. Probability Theory and its Applications, volume II. John Wiley & Sons, second edition, 1971.

[IETF87] Proceedings of the Sixth Internet Engineering Task Force, Boston, MA, April 1987. Proceedings available as NIC document IETF-87/2P from DDN Network Information Center, SRI International, Menlo Park, CA.

[IETF88] Proceedings of the Ninth Internet Engineering Task Force, San Diego, CA, March 1988. Proceedings available as NIC document IETF-88/?P from DDN Network Information Center, SRI International, Menlo Park, CA.

[ISO86] International Organization for Standardization. ISO International Standard 8473, Information Processing Systems - Open Systems Interconnection - Connectionless-mode Network Service Protocol Specification, March 1986.

[Jai86a] Raj Jain. Divergence of timeout algorithms for packet retransmissions. In Proceedings Fifth Annual International Phoenix Conference on Computers and Communications, Scottsdale, AZ, March 1986.

[Jai86b] Raj Jain. A timeout-based congestion control scheme for window flow-controlled networks. IEEE Journal on Selected Areas in Communications, SAC-4(7), October 1986.

[JRC87] Raj Jain, K.K. Ramakrishnan, and Dah-Ming Chiu. Congestion avoidance in computer networks with a connectionless network layer. Technical Report DEC-TR-506, Digital Equipment Corporation, August 1987.

[Kle76] Leonard Kleinrock. Queueing Systems, volume II. John Wiley & Sons, 1976.

[Kli87] Charley Kline. Supercomputers on the Internet: A case study. In Proceedings of SIGCOMM '87. ACM, August 1987.

[KP87] Phil Karn and Craig Partridge. Estimating round-trip times in reliable transport protocols. In Proceedings of SIGCOMM '87. ACM, August 1987.

[LS83] Lennart Ljung and Torsten Söderström. Theory and Practice of Recursive Identification. MIT Press, 1983.

[Mil83] David Mills. Internet Delay Experiments. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, December 1983. RFC-889.

[Nag84] John Nagle. Congestion Control in IP/TCP Internetworks. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, January 1984. RFC-896.

[PP87] W. Prue and J. Postel. Something A Host Could Do with Source Quench. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, July 1987. RFC-1016.

[RFC793] Jon Postel, editor. Transmission Control Protocol Specification. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, September 1981. RFC-793.

[Zha86] Lixia Zhang. Why TCP timers don't work well. In Proceedings of SIGCOMM '86. ACM, August 1986.
