IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION 1

Host-to-Host Congestion Control for TCP
Alexander Afanasyev, Neil Tilley, Peter Reiher, and Leonard Kleinrock
Abstract—The Transmission Control Protocol (TCP) carries
most Internet traffic, so performance of the Internet depends to a
great extent on how well TCP works. Performance characteristics
of a particular version of TCP are defined by the congestion
control algorithm it employs. This paper presents a survey of
various congestion control proposals that preserve the original
host-to-host idea of TCP—namely, that neither sender nor
receiver relies on any explicit notification from the network. The
proposed solutions focus on a variety of problems, starting with
the basic problem of eliminating the phenomenon of congestion
collapse, and also include the problems of effectively using the
available network resources in different types of environments
(wired, wireless, high-speed, long-delay, etc.). In a shared, highly
distributed, and heterogeneous environment such as the Internet,
effective network use depends not only on how well a single TCP-
based application can utilize the network capacity, but also on
how well it cooperates with other applications transmitting data
through the same network. Our survey shows that over the last
20 years many host-to-host techniques have been developed that
address several problems with different levels of reliability and
precision. There have been enhancements allowing senders to
detect fast packet losses and route changes. Other techniques
have the ability to estimate the loss rate, the bottleneck buffer
size, and level of congestion. The survey describes each congestion
control alternative, its strengths and its weaknesses. Additionally,
techniques that are in common use or available for testing are
described.
Index Terms—TCP, congestion control, congestion collapse,
packet reordering in TCP, wireless TCP, high-speed TCP
I. INTRODUCTION
M
OST CURRENT Internet applications rely on the
Transmission Control Protocol (TCP) [1] to deliver
data reliably across the network. Although it was not part of its
initial design, the most essential element of TCP is congestion
control; it defines TCP’s performance characteristics. In this
paper we present a survey of the congestion control proposals
for TCP that preserve its fundamental host-to-host principle,
meaning they do not rely on any kind of explicit signaling
from the network.
1
The proposed algorithms introduce a wide
variety of techniques that allow senders to detect loss events,
congestion state, and route changes, as well as measure the
loss rate, the RTT, the RTT variation, bottleneck buffer sizes,
and congestion level with different levels of reliability and
precision.
The key feature of TCP is its ability to provide a reliable,
bi-directional, virtual channel between any two hosts on the
Internet. Since the protocol works over the IP network [3],
which provides only best-effort service for delivering packets
Manuscript received 15 December 2009; revised 15 March 2010.
The authors are with the University of California, Los Angeles (e-mail:
{afanasev,tilleyns,reiher, lk}@cs.ucla.edu).
Digital Object Identifier 10.1109/SURV.2010.042710.00114
1
Lochert et al. [2] have presented a thorough survey on the congestion
control approaches which rely on explicit network signaling.
Transmitted packets
(the window)
Packets prepared for
transmission
Acknowledged packets
Inital transmission
After receipt of
acknowledgements
Sender’s output buffer
Fig. 1. Sliding window concept: the window “slides” along the sender’s
output buffer as soon as the receiver acknowledges delivery of at least one
packet
across the network, the TCP standard [1] specifies a sliding
window based flow control. This flow control has several
mechanisms. First, the sender buffers all data before the
transmission, assigning a sequence number to each buffered
byte. Continuous blocks of the buffered data are packetized
into TCP packets that include a sequence number of the
first data byte in the packet. Second, a portion (window)
of the prepared packets is transmitted to the receiver using
the IP protocol. As soon as the sender receives delivery
confirmation for at least one data packet, it transmits a new
portion of packets (the window “slides” along the sender’s
buffer, Figure 1). Finally, the sender holds responsibility for
a data block until the receiver explicitly confirms delivery of
the block. As a result, the sender may eventually decide that a
particular unacknowledged data block has been lost and start
recovery procedures (e.g., retransmit one or several packets).
To acknowledge data delivery, the receiver forms an ACK
packet that carries one sequence number and (optionally)
several pairs of sequence numbers. The former, a cumulative
ACK, indicates that all data blocks having smaller sequence
numbers have already been delivered. The latter, a selective
ACK (Section II-E—a TCP extension standardized 15 years
after the introduction of TCP itself), explicitly indicates the
ranges of sequence numbers of delivered data packets. To be
more precise, TCP does not have a separate ACK packet, but
rather uses flags and option fields in the common TCP header
for acknowledgment purposes. (A TCP packet can be both a
data packet and an ACK packet at the same time.) However,
without loss of generality, we will discuss a notion of ACK
packets as a separate entity.
Although a sliding window based flow control is relatively
simple, it has several conflicting objectives. For example, on
the one hand, throughput of a TCP flow should be maximized.
This essentially requires that the size of a sliding window
1553-877X/10/$25.00 c 2010 IEEE
2 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Sender’s output buffer Receiver’s input buffer
8 free spaces (rwnd)
5 new data packets (wnd)
rwnd=3
wnd=3
rwnd=1
Receiver processed packet and slot
in the input buffer become available
Fig. 2. Receiver’s window concept: receiver reports a size of the available
input buffer (receiver’s window, rwnd) and sender sends a portion (window,
wnd) of data packets that does not exceed rwnd
also be maximized. (It can be shown that the maximum
throughput of a TCP flow depends directly on the sliding
window size and inversely on the round-trip time of the
network path.) On the other hand, if the sliding window is
too large, there is a high probability of packet loss because
the network and the receiver have resource limitations. Thus,
minimization of packet losses requires minimizing the sliding
window. Therefore, the problem is finding an optimal value
for the sliding window (which is usually referred to as the
congestion window) that provides good throughput, yet does
not overwhelm the network and the receiver.
Additionally, TCP should be able to recover from packet
losses in a timely fashion. This means that the shorter the
interval between packet transmission and loss detection, the
faster TCP can recover. However, this interval cannot be too
short, or otherwise the sender may detect a loss prematurely
and retransmit the corresponding packet unnecessarily. This
overreaction simply wastes network resources and may induce
high congestion in the network. In other words, when and how
a sender detects packet losses is another hard problem for TCP.
The initial TCP specification [1] is designed to guard only
against overflowing the input buffers at the receiver end. The
incorporated mechanism is based on the receiver’s window
concept, which is essentially a way for the receiver to share the
information about the available input buffer with the sender.
Figure 2 illustrates this concept in schematic fashion. When
establishing a connection, the receiver informs the sender
about the available buffer size for incoming packets (in the
example shown, the receiver’s window reported initially is
8). The sender transmits a portion (window) of prepared data
packets. This portion must not exceed the receiver’s window
and may be smaller if the sender is not willing (or ready)
to send a larger portion. In the case where the receiver is
unable to process data as fast as the sender generates it, the
receiver reports decreasing values of the window (3 and 1 in
the example). This induces the sender to shrink the sliding
window. As a result, the whole transmission will eventually
synchronize with the receiver’s processing rate.
Unfortunately, protocol standards that remain unaware of
the network resources have created various unexpected ef-
fects on the Internet, including the appearance of congestion
collapse (see Section II). The problem of congestion control,
meaning intelligent (i.e., network resource-aware) and yet ef-
fective use of resources available in packet-switched networks,
is not a trivial problem, but the efficient solution to it is
highly desirable. As a result, congestion control is one of the
extensively studied areas in the Internet research conducted
over the last 20 years, and a number of proposals aimed at
improving various aspects of the congestion-responsive data
flows is very large. Several groups of these proposals have
been studied by Hanbali et al. [4] (congestion control in
ad hoc networks), Lochert et al. [2] (congestion control for
mobile ad hoc networks), Widmer et al. [5] (congestion control
for non-TCP protocols), Balakrishnan et al. [6] (congestion
control for wireless networks), Leung at al. [7] (congestion
control for networks with high levels of packet reordering),
Low et al. [8] (current up to 2002 TCP variants and their
analytical models), Hasegawa and Murata [9] (fairness issues
in congestion control), and others researchers. Unlike previous
studies, in this survey we tried to collect, classify, and analyze
major congestion control algorithms that optimize various pa-
rameters of TCP data transfer without relying on any explicit
notifications from the network. In other words, they preserve
the host-to-host principle of TCP, whereby the network is seen
as a black box.
Section II is devoted to congestion control proposals that
build a foundation for all currently known host-to-host al-
gorithms. This foundation includes 1) the basic principle
of probing the available network resources, 2) loss-based
and delay-based techniques to estimate the congestion state
in the network, and 3) techniques to detect packet losses
quickly. However, the techniques that are developed are not
universal. For example, Tahoe’s initial assumption that pack-
ets are not generally reordered during transmission may be
wrong in some environments. As a result, the performance
of Tahoe flows in these environments will prove inadequate
(Section II-A). In Section III we discuss congestion control
proposals that modify previously developed algorithms to
tolerate various levels of packet reordering.
As data transfer technologies and the Internet itself have
evolved, the research focus for congestion control algorithms
has been changing from basic congestion to more sophisticated
problems. In Section IV we review the network resource
optimization problem. In particular, we discuss two algorithms
which discover the ability of a TCP congestion control to
provide traffic prioritization in a pure host-to-host fashion.
In Section V we discuss congestion control algorithm
proposals which try to improve the performance of TCP
flows running in wireless networks, where it is common to
have high packet losses (e.g., random losses due to wireless
interference).
In Section VI we review several proposed solutions that
have attracted the most research interest over the recent past.
These proposals aim to solve the problem of poor utilization of
high-speed and long-delay network channels by standard TCP
flows. They introduce several direct and indirect approaches
to more aggressive network probing. The indirect approaches
combine various loss-based and delay-based ideas to create
congestion control approaches that try to be aggressive enough
when there are enough network resources, yet remain gentle
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 3
Offered load
E
f
f
e
c
t
i
v
e

l
o
a
d
Capacity
Fig. 3. Effective TCP load versus offered load from TCP senders
Router
75% of packets rejected on both
input and output paths
400% A B
400%
Fig. 4. Congestion collapse rationale. 75% of data packets dropped on
forward path and 75% of ACKs dropped on reverse: only 6.25% of packets
are acknowledged
when all resources are utilized.
Finally, we present opportunities for the future research in
Section VII and conclude our survey in Section VIII.
II. CONGESTION COLLAPSE
The initial TCP standard has a serious drawback: it lacks
any means to adjust the transmission rate to the state of
the network. When there are many users and user demands
for shared network resources, the aggregate rate of all TCP
senders sharing the same network can easily exceed (and
in practice do exceed) the capacity of the network. It is
commonly known in the flow-control world that if the offered
load in an uncontrolled distributed sharing system (e.g., road
traffic) exceeds the total system capacity, the effective load
will go to zero (collapses) as load increases [10] (Figure 3).
With regard to TCP, the origins of this effect, known as
a congestion collapse [11]–[13], can be illustrated using a
simple example. Let us consider a router placed somewhere
between networks A and B which generate excessive amounts
of TCP traffic (Figure 4). Clearly, if the path from A to B is
congested by 400% (4 times more than the router can deliver),
at least 75% of all packets from network A will be dropped
and at most 25% of data packets may result in ACKs. If the
reverse path from B to A is also congested (also by 400%,
for example), the chance that ACK packets get through is
also 25%. In other words, only 25% of 25% (i.e., 6.25%)
of the data packets sent from A to B will be acknowledged
successfully. If we assume that each data packet requires its
own acknowledgement (not a requirement for TCP, but serves
to illustrate the point), then a 75% loss in each direction
causes a 93.75% drop in throughput (goodput) of the TCP-
like flow. Implementing cumulative ACKs help shift the bend
of the curve in Figure 3, but cumulative ACK are not able to
eliminate the sharp downward bend.
RFC 793
Tahoe
Reno
DUAL
Vegas FACK NewReno
Vegas+
Veno
Vegas A
Proactive
(delay-based)
Reactive
(loss-based)
Fig. 5. Evolutionary graph of TCP variants that solve the congestion collapse
problem
To resolve the congestion collapse problem, a number of
solutions have been proposed. All of them share the same
idea, namely of introducing a network-aware rate limiting
mechanism alongside the receiver-driven flow control. For
this purpose the congestion window concept was introduced:
a TCP sender’s estimate of the number of data packets the
network can accept for delivery without becoming congested.
In the special case where the flow control limit (the so-
called receiver window) is less than the congestion control
limit (i.e., the congestion window), the former is considered
a real bound for outstanding data packets. Although this is a
formal definition of the real TCP rate bound, we will only
consider the congestion window as a rate limiting factor,
assuming that in most cases the processing rate of end-hosts
is several orders of magnitude higher than the data transfer
rate that the network can potentially offer. Additionally, we
will compare different algorithms, focusing on the congestion
window dynamics as a measure of the particular congestion
control algorithm effectiveness.
In the next section we will discuss basic congestion control
algorithms that have been proposed to extend the TCP spec-
ification. As we shall see, these algorithms not only preserve
the idea of treating the network as a black box but also
provide a good precision level to detect congestion and prevent
collapse. Table I gives a summary of features of the various
algorithms. Additionally, Figure 5 shows the evolutionary
graph of these algorithms. However, solving the congestion
problem introduces new problems that lead to network channel
underutilization. Here we focus primarily on the congestion
problem itself and basic approaches to improve data transfer
effectiveness. In the following sections other problems and
solutions will be discussed.
A. TCP Tahoe
One of the earliest host-to-host solutions to solve the con-
gestion problem in TCP flows has been proposed by Jacobson
[14]. The solution is based on the original TCP specification
(RFC 793 [1]) and includes a number of algorithms that can be
divided into three groups. The first group tackles the problem
4 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
TABLE I
FEATURES OF TCP VARIANTS THAT SOLVE THE CONGESTION COLLAPSE PROBLEM
TCP Variant Section Year Base Added/Changed Modes or Features Mod
1
Status
Implementation
BSD
2
Linux Win Mac
TCP Tahoe
[14]
II-A 1988 RFC793 Slow Start, Congestion Avoidance, Fast
Retransmit
S Obsolete
Standard
>4.3 1.0
TCP-DUAL
[15]
II-B 1992 Tahoe Queuing delay as a supplemental congestion
prediction parameter for Congestion Avoidance
S Experimental
TCP Reno
[16], [17]
II-C 1990 Tahoe Fast Recovery S Standard >4.3
>F2.2
>
1.3.90
>95/NT
TCP NewReno
[18], [19]
II-D 1999 Reno Fast Recovery resistant to multiple losses S Standard >F4 >
2.1.36
>10.4.6
(opt)
TCP SACK
[20]
II-E 1996 RFC793 Extended information in feedback messages P+S+R Standard >S2.6,
>N1.1,
>F2.1R
>
2.1.90
> 98 >
10.4.6
TCP FACK
[21]
II-F 1996 Reno,
SACK
SACK-based loss recovery algorithm S Experimental >N1.1 >2.1.92
TCP-Vegas
[22]
II-G 1995 Reno Bottleneck buffer utilization as a primary
feedback for the Congestion Avoidance and
secondary for the Slow Start
S Experimental >
2.2.10
TCP-Vegas+
[23]
II-H 2000 NewReno,
Vegas
Reno/Vegas Congestion Avoidance mode
switching based of RTT dynamics
S Experimental
TCP-Veno
[24]
II-I 2002 NewReno,
Vegas
Reno-type Congestion Avoidance and Fast
Recovery increase/decrease coefficient
adaptation based on bottleneck buffer state
estimation
S Experimental >
2.6.18
TCP-Vegas A
[25]
II-J 2005 Vegas Adaptive bottleneck buffer state aware
Congestion Avoidance
S Experimental
1
TCP specification modification: S = the sender reactions, R = the receiver reactions, P = the protocol specification
2
S for Sun, F for FreeBSD, N for NetBSD
of an erroneous retransmission timeout estimate (RTO). If
this value is overestimated, the TCP packet loss detection
mechanism becomes very conservative, and performance of
individual flows may severely degrade. In the opposite case,
when the value of the RTO is underestimated, the error de-
tection mechanism may perform unnecessary retransmissions,
wasting shared network resources and worsening the overall
congestion in the network. Since it is practically impossible
to distinguish between an ACK for an original and a retrans-
mitted packet, RTO calculation is further complicated.
The round-trip variance estimation (rttvar) algorithm tries
to mitigate the overestimation problem. Instead of a linear
relationship between the RTO and estimated round-trip time
(RTT) value (β · SRTT, in which β is a constant in range
from 1.3 to 2 [1] and SRTT is an exponentially averaged RTT
value), the algorithm calculates an RTT variation estimate to
establish a fine-grained upper bound for the RTO (SRTT +
4 · rttvar).
The exponential retransmit timer backoff algorithm solves
the underestimation problem by doubling the RTO value
on each retransmission event. In other words, during severe
congestion, detection of subsequent packet losses results in ex-
ponential RTO growth, significantly reducing the total number
of retransmissions and helping stabilize the network state.
The ACK ambiguity problem is resolved by Karn’s clamped
retransmit backoff algorithm [26]. Importantly, the RTT of
a data packet that has been retransmitted is not used in
calculation for the average RTT and RTT variance, and thus
it has no impact on the RTO estimate.
The second group of algorithms enhances the detection of
packet losses. The original TCP specification defines the RTO
as the only loss detection mechanism. Although it is sufficient
to reliably detect all losses, this detection is not fast enough.
Clearly, the minimum time when loss can be detected is the
RTT—i.e., if the receiver is able to instantly detect and report a
loss to the sender, the report will reach the sender exactly one
RTT after sending the lost packet. The RTO, by definition, is
greater than RTT. If we require that TCP receivers immediately
reply to all out-of-order data packets with reports of the last in-
order packet (a duplicate ACK) [27], the loss can be detected
by the Fast Retransmit algorithm [28], almost within the RTT
interval. In other words, assuming the probability of packet
reordering and duplication in the network is negligible, the
duplicate ACKs can be considered a reliable loss indicator.
Having this new indicator, the sender can retransmit lost data
without waiting for the corresponding RTO event.
The third and most important group includes the Slow Start
and Congestion Avoidance algorithms. These provide two
slightly different distributed host-to-host mechanisms which
allow a TCP sender to detect available network resources and
adjust the transmission rate of the TCP flow to the detected
limits. Assuming the probability of random packet corruption
during transmission is negligible ( 1%), the sender can
treat all detected packet losses as congestion indicators. In
addition, the reception of any ACK packet is an indication
that the network can accept and deliver at least one new packet
(i.e., the ACKed packet has left and a new one can enter the
network). Thus the sender, reasonably sure it will not cause
congestion, can send at least the amount of data that has just
been acknowledged. This in-out packet balancing is called the
packet conservation principle and is a core element, both of
Slow Start and of Congestion Avoidance.
In the Slow Start algorithm, reception of an ACK packet
is considered an invitation to send double the amount of data
that has been acknowledged by the ACK packet (multiplicative
increase policy). In other words, instead of a step-like growth
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 5
Receiver limit (window)
M
a
x

#

o
f

o
u
t
s
t
a
n
d
i
n
g

p
a
c
k
e
t
s
Time
Transmission
start up
Maximum data transfer
(same for similar graphs
throughout paper)
Fig. 6. Outstanding data packets allowance dynamics as defined in RFC793
(network limits are not considered)
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Receiver limit
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Receiver limit
Network limit
Detected packet loss
Network limit
Fig. 7. Congestion window dynamics and effectiveness of Slow-Start if limit
is imposed by legacy flow control (left) and network (right)
(Figure 6) in the number of outstanding packets (as given
in the original specification [1]), this growth follows an
exponential function on an RTT-defined scale (Figure 7). The
word “slow” in the algorithm name makes reference to this
difference. If a packet loss is detected (i.e., the network is ex-
periencing congestion because all network resources have been
utilized), the congestion window is reset to the initial value
(e.g., one) to ensure release of network resources. Graphs on
Figure 7 show two cases of the congestion window dynamics:
the left graph represents the case when the receiver cannot
process at the receiving rate (i.e., the original assumption
of TCP), and the right graph shows the congestion window
dynamics when the network cannot deliver everything at the
transmitted rate.
We can define algorithm effectiveness as the ratio of the area
below the congestion window graph (e.g., Figure 7, hatched
area) to the area below the limit line (Figure 7, under “Network
Limit” line). It is clear (observing the right graph in Figure 7)
that where the available network resources are lower than
limits imposed by the receiver, the effectiveness, in the long
term, of the Slow Start algorithm is very low.
The other algorithm of the third group is Congestion
Avoidance. It is aimed at improving TCP effectiveness in
networks with limited resources, i.e., where the network is
a real transmission bottleneck. In comparison to the Slow
Start, this algorithm is much more conservative in response
to received ACK packets and to detection of packet losses.
As opposed to doubling, the congestion window increases by
one only if all data packets have been successfully delivered
during the last RTT (additive increase policy). And in contrast
to restarting at one after a loss, the congestion window is
Network limit
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Fig. 8. Congestion window dynamics and effectiveness of Congestion
Avoidance
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
ssthresh
SS CA SS SS
Loss detection
Network limit
Fig. 9. Congestion window dynamics of combined Slow-Start (SS) Conges-
tion Avoidance (CA)
merely halved (multiplicative decrease policy). Jacobson’s
analysis [14] has shown that to achieve network decongestion,
exponentially reducing network resource utilization by each
individual flow is sufficient. The multiplicative decrease policy
mimics such exponential behavior when several packets in
succession are determined as lost (e.g., during the persistent
congestion state). As can be seen in Figure 8, the Congestion
Avoidance algorithm is quite effective in the long term. The
tradeoff is a slow discovery of available network resources
due to the conservative rate of the additive phase.
The implementation of TCP Tahoe includes both Slow
Start and Congestion Avoidance algorithms as distinct opera-
tional phases. This combines fast network resource discovery
and long-term efficiency. For phase-switching purposes, a
threshold parameter (ssthresh) is introduced. This threshold
determines the maximum size of the congestion window in
the Slow Start phase, and any detected packet loss adjusts
the threshold to half of the current congestion window. The
congestion window itself, as in the Slow Start algorithm, is
always reset to a minimum value upon loss detection. As
long as the value of the congestion window is lower than the
threshold parameter, the Slow Start phase is used. Once the
window is greater than the threshold, Congestion Avoidance is
used. This is known as the Slow Start-Congestion Avoidance
phase cycle (Figure 9).
Effectiveness is not the only important parameter of con-
gestion control algorithms. Due to the resource-sharing nature
of IP networks, TCP algorithms should enforce fair resource
sharing. Chiu and Jain [29] developed a fairness measure F
(the so-called Jain’s fairness index) as a function of network
6 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Network Share (flow1)
E
q
u
a
l

(
f
a
i
r
)

s
h
a
r
e
N
e
t
w
o
r
k

S
h
a
r
e

(
f
l
o
w
2
)
Network Limit
Packet
losses
x
0
x
1
x
3
x
2
Fig. 10. Slow-Start Algorithm
x0 −x1, . . . , xn −xn+1 multiplicative increase (both flows have the
same increase rate of their congestion windows)
x1 −x2 equalization of the congestion window sizes
resources consumed by each user sharing the same path:
F =
(
n

i
f
i
)
2
n ·
n

i
f
2
i
where n is the number of users sharing the path and f
i
is the
network share of i
th
user. If we assume that each user has
only one TCP connection per particular network path, then
Jain’s index can be considered a fairness measure for TCP
flows. This index ranges from 0 to 1, where 1 is achieved if
and only if each flow has an equal (fair) share (f
i
= f
j
∀i, j)
and tends to zero if a single flow usurps all network resources
(lim
n→∞
F = 0).
Slow Start and Congestion Avoidance exhibit good fairness
(F → 1) under certain network conditions as follows. Let us
consider two flows competing with each other on the same
network path and with no other flows present. If we assume
that (a) the network share for each flow is directly proportional
to its congestion window size, (b) both flows have equal RTT
values, and (c) we can simultaneously detect packet losses (a
so-called synchronized loss environment), then the network
share dynamics for each algorithm can be represented by
the convergence diagrams in Figures 10 and 11. The equal
share line represents states when network resources are fairly
distributed between flows and the network limit line when
all network resources are consumed (either by one or both
flows). These diagrams show how network resource propor-
tions would change (paths x
0
−x
1
, x
1
−x
2
, . . . , x
n−1
−x
n
)
if two TCP flows started competing with each other from an
initial state x
0
under the ideal network conditions.
In Figure 10 the aggressive (multiplicative) congestion
window increase in Slow Start favors the flow having a larger
network share. More precisely, the slope of x
0
−x
1
segment
is proportional to the ratio between each flow’s share in state
x
0
. After detection of a packet loss, both flows reset their
congestion windows (x
1
− x
2
). Obviously, reseting of the
congestion window equalizes the network share of the flows
that provides fairness of the network resource distribution in
the future (the flows become locked-in between the states x
2
and x
3
).
Network Share (flow1)
E
q
u
a
l

(
f
a
i
r
)

s
h
a
r
e
N
e
t
w
o
r
k

S
h
a
r
e

(
f
l
o
w
2
)
Network Limit
Packet
losses
x
0
x
1
x
2
x
n
x
n+1
Fig. 11. Congestion avoidance (AIMD)
x0 −x1, . . . , xn −xn+1 additive increase (both flows have the same
increase rate of their congestion windows)
x1−x2, . . . , xn−1−xn multiplicative decrease (a flow with the larger
congestion window decreases more than a flow with the smaller)
Congestion Avoidance ensures a uniform congestion win-
dow increase by each flow from any initial state (45

slope of
x
1
−x
2
and x
n
−x
n+1
segments in Figure 11). This property
eliminates the necessity of the congestion window equaliza-
tion. Instead, to provide fair network usage between flows, it is
enough that the flow having a larger network share decreases
by a greater amount. In fact, the multiplicative decrease (i.e.,
congestion window halving) as a reaction to a packet loss in
Congestion Avoidance guarantees share equalization (fairness)
in a finite number of steps.
A convergence diagram of TCP Tahoe can be represented as
a combination of the Slow Start and Congestion Avoidance di-
agrams. Depending on the values of the Slow Start thresholds,
the initial dynamics can follow the Slow Start path (x
0
− x
1
in Figure 10) or the Congestion Avoidance path (x
0
− x
1
in
Figure 11), or be a combination of both algorithms. Because
the reaction to a packet loss in TCP Tahoe is the same as in
Slow Start (i.e., congestion window reset), exactly one loss
event is enough to equalize shares (similar to path x
1
−x
2
in
Figure 10).
B. TCP DUAL
TCP Tahoe (Section II-A) has rendered a great service to
the Internet community by solving the congestion collapse
problem. However, this solution has an unpleasant drawback
of straining the network with high-amplitude periodic phases.
This behavior induces significant periodic changes in sending
rate, round-trip time, and network buffer utilization, leading
to variability in packet losses.
Wang and Crowcroft [15] presented TCP DUAL, which
refines the Congestion Avoidance algorithm. DUAL tries to
mitigate the oscillatory patterns in network dynamics by
using a proactive congestion detection mechanism coupled
with softer reactions to detected events. More specifically, it
introduces the queuing delay as a prediction parameter of the
network congestion state.
Let us assume that routes do not change during the trans-
mission and that the receiver acknowledges each data packet
immediately. Then we can consider the minimal RTT value
observed by the sender (RTT
min
) as a good indication that the
path is in a congestion-free state (left diagram in Figure 12). If
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 7
D
a
t
a
A
C
K
Time
D
a
t
a
A
C
K
TCP Sender
TCP Receiver
RTTmin RTTmin + Q
Queuing due to
congestion (Q)
Congestion -free Congested
Fig. 12. Correlation between RTT dynamics and congestion situation
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
sshthresh
Reaching queuing delay
threshold
Network limit
Loss detection
...
SS CA SS
Fig. 13. Congestion window dynamics of TCP DUAL
(SS: the Slow Start phase, CA: the Congestion Avoidance phase)
we make one more assumption that an increase of the RTT can
only occur due to increasing buffer utilization, the difference
between the measured and the minimal RTT value (queuing
delay Q = RTT −RTT
min
) can be viewed as an indicator of
the congestion level in the path (right diagram in Figure 12).
To quantify the congestion level, DUAL additionally main-
tains a maximum RTT value observed during the transmission
(RTT
max
). The difference between maximum and minimum
RTT values is considered a measure of the maximum con-
gestion level (i.e., the maximum queuing delay Q
max
=
RTT
max
− RTT
min
). Finally, a fraction of the maximum
queuing delay (Q
thresh
= α· Q
max
, where 0 < α < 1) serves
as a threshold, which, when exceeded, indicates the congested
network state.
In the proposal [15], the delay threshold in DUAL is
selected as half the maximum queuing delay (Q
thresh
=
Q
max
/2), and the congestion estimation is performed once per
RTT period based on the average RTT value (Q = RTT
avg

RTT
min
). If this threshold is exceeded (Q > Q
thresh
), the
congestion window decreases by 1/8
th
(i.e., applied multi-
plicative decrease policy). As we can see from theoretical
congestion window dynamics of TCP DUAL (Figure 13),
the effectiveness is greatly improved compared to Tahoe
(i.e., graphically, the hatched area is proportionately larger).
However, there are a number of trade-offs.
If the network saturation point is estimated incorrectly, the
flow cannot utilize the available network resources fairly and
effectively. On the one side, in the case where the threshold
is underestimated (e.g., observed RTT
max
is not the real
maximum) network resources will be underutilized. On the
other side, threshold overestimation can potentially cause an
unfair resource distribution between different TCP DUAL
flows. For example, if a DUAL flow is already transmitting
data when a new DUAL flow appears, the new flow will
observe a higher RTT
min
value and overestimate the queuing
delay threshold. The flow with the lower queuing threshold
(the old flow) has a higher probability of predicting the
congestion state and trigger congestion window reduction,
while the other flow will continue congestion window growth
without noticing anything abnormal. Thus, the new flow can
potentially capture a larger share of the network resources.
C. TCP Reno
Reducing the congestion window to one packet as a reaction
to packet loss, as occurs with TCP Tahoe (Section II-A), is
rather draconian and can, in some cases, lead to significant
throughput degradation. For example, a 1% packet loss rate
can cause up to a 75% throughput degradation of a TCP flow
running the Tahoe algorithm [16]. To resolve this problem,
Jacobson [16] revised the original composition of Slow Start
and Congestion Avoidance by introducing the concept of
differentiating between major and minor congestion events.
A loss detection through the retransmission timeout indi-
cates that for a certain time interval (as an example, RTO
minus RTT) some major congestion event has prevented
delivery of any data packets on the network. Therefore, the
sender should apply the conservative policy of resetting the
congestion window to a minimal value.
Quite a different state can be inferred from a loss detected
by duplicate ACKs. Suppose the sender has received four
ACKs, where the first one acknowledges some new data and
the rest are the exact copies of the first one (usually referred
to as three duplicate ACKs). The duplicate ACKs indicate
that the some packets have failed to arrive. Nonetheless,
presence of each ACK—including the duplicates—indicates
the successful delivery of a data packet. The sender, in addition
to detecting the lost packet, is also observing the ability of the
network to deliver some data. Thus, the network state can be
considered to be lightly congested, and the reaction to the loss
event can be more optimistic. In TCP Reno, the optimistic
reaction is to use the Fast Recovery algorithm [17].
The intention of Fast Recovery is to halve a flow’s network
share (i.e., to halve the congestion window) and to taper back
network resource probing (holding all growth in the congestion
window) until the error is recovered. In other words, the
sender stays in Fast Recovery until it receives a non-duplicate
acknowledgment. The algorithm phases are illustrated in Fig-
ure 14, where congestion window sizes (cwnd) in various
states are denoted as the line segments above the State lines,
and the arrows indicate the effective congestion window size—
the amount of packets in transit.
The transition from State 1 to State 2 shows the core concept
of optimistic network share reduction, using the multiplica-
8 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Sent data,
waiting for ACK
cwnd
Buffered
data
ACKed
data
State 1
State 2
cwnd/2
cwnd/2+#dup
cwnd/2
cwnd/2+#dup
State 3
State 4
State 5
Just before the loss
detection
Just after the loss
detection
“Inflating” cwnd by the
number of dup ACKs
Additional dup ACKs lead to
additional cwnd “inflation”
After the successful
recovery (cwnd “deflation”)
Outstanding data which is not allowed to be retransmitted
Amount of new data allowed to be sent by “deflated” congestion window
Amount of successful delivered
data inferred from dup ACKs
Amount of packets in transit
The congestion window size is a
sum of these two elements
Fig. 14. Characteristic states of TCP Reno’s Fast Recovery
tive decrease policy. After the reduction (i.e., from cwnd
to cwnd/2), the algorithm not only retransmits the oldest
unacknowledged data packet (i.e., applies the Fast Retransmit
algorithm), but also inflates the congestion window by the
number of duplicate packets (see transition from State 2 to
State 3 in Figure 14). As we already know, an ACK indicates
delivery of at least one data packet. Thus, if we want to
maintain a constant number of packets in transit, we have
to inflate our congestion window to open a slot for sending
new data (State 4 in Figure 14). Without this increase, new
packets cannot be sent before the error is recovered, and the
amount of packets in transit can decrease more than expected.
In the final stage (State 5), when a non-duplicate ACK
is received, we want to resume Congestion Avoidance with
half of the original congestion window. With high proba-
bility, the non-duplicate ACK will acknowledge delivery of
all data packets previously inferred by the duplicate ACKs
previously received. At this point, congestion window deflation
to cwnd/2 (to the value just after entering recovery, State 2
in Figure 14) is a simple and reliable way to ensure the target
exit state from Fast Recovery.
The resulting theoretical congestion window dynamics in
TCP Reno are presented in Figure 15. Compared to the
dynamics of TCP Tahoe (Figure 9), the overall effectiveness
in the steady state is considerably improved by replacing Slow
Start phases after each loss detection by typically shorter Fast
Retransmit phases.
In fact, recovering from a single loss would usually occur
within one RTT. However, efficiency is improved not only
by shortening the recovery period, but also by allowing data
transfers during the recovery. Having substantial performance
improvement compared to Tahoe, TCP Reno remains fair to
other TCP Reno flows (in terms defined in Section II-A). If
we try to build a convergence diagram, it would match the
diagram for the Congestion Avoidance algorithm in Figure 11
exactly. However, a slightly worse situation can be observed
when a TCP Reno flow competes with a Tahoe flow. Un-
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
ssthresh
Loss detection
Network limit
SS CA FR
Fig. 15. Congestion window dynamics of TCP Reno
(SS: the Slow Start phase, CA: the Congestion Avoidance phase,
FR: the Fast Recovery phase)
equal reactions to packet loss detection lead to shifting the
distribution of network resources to the Reno side. This can
be demonstrated using the convergence diagram in Figure 16.
With a finite number of steps, the system reaches a steady
state in which the Reno flow has a larger share of network
resources. To quantify fairness in this case, one can easily
calculate the Jain’s fairness index (see Section II-A). In
Figure 16, this value equals 0.9 (after the convergence—state
x
n+1
—network shares are distributed as 2:1 in favor of a
Reno flow). This can be considered an acceptable level for
the transition period when the congestion control algorithm is
changed from Tahoe to Reno at all network hosts.
A comparison to TCP DUAL shows that, in an ideal
network environment with only one TCP flow present, the
DUAL algorithm will normally outperform Reno. But DUAL
has several important drawbacks. First, the delay characteristic
is not always a true congestion indicator, which can lead
to network resource underutilization or unfair distribution of
network resources. Second, there is an open question about
how well the DUAL algorithm performs in less ideal environ-
ments and when DUAL flows compete with other DUAL or
Tahoe flows. Clearly, in a situation of higher packet losses, the
performance of DUAL and Tahoe would be about the same,
and as a result, Reno could outperform the both of them.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 9
Network Share (Tahoe)
E
q
u
a
l

(
f
a
i
r
)

s
h
a
r
e
N
e
t
w
o
r
k

S
h
a
r
e

(
R
e
n
o
)
Network Limit
Packet
losses
x
0
x
1
x
n
x
n+1
Fig. 16. Convergence diagram when Reno flow is competing with Tahoe
flow
x0 −x1, . . . , xn −xn+1 additive increase (both flows have the same
increase rate of their congestion windows)
x1 −x2, . . . , xn−1 −xn Tahoe flow reset its congestion window but
Reno flow only halves it
Additionally, doubts about DUAL’s fairness to other DUAL
flows, as discussed in Section II-B, tend to give further favor
to TCP Reno.
Because of its simplicity and performance characteristics,
Reno is generally the congestion control standard for TCP. At
the same time, there are a wide range of network environments
where Reno has inadequate performance. For example, it has
severe performance degradation in the presence of consecutive
packet losses, random packet losses, and reordering of packets.
It is also unfair if competing flows have different RTTs, and
it does not utilize high-speed/long-delay network channels
efficiently.
In the remainder of this paper we will discuss a num-
ber of the most important TCP proposals which ad-
dress these issues without deviating from the original
host-to-host principle of TCP.
D. TCP NewReno
One of the vulnerabilities of TCP Reno’s Fast Recovery
algorithm manifests itself when multiple packet losses occur as
part of a single congestion event. This significantly decreases
Reno’s performance in heavy load environments. This problem
is demonstrated in Figure 17, where a single congestion event
(e.g., a short burst of cross traffic) causes the loss of several
data packets (indicated by x). As we can see, the desired opti-
mistic reaction of Fast Recovery (i.e., the congestion window
halving) suddenly transforms into a conservative exponential
congestion window decrease. That is, the first loss causes entry
into the recovery phase and the halving of the congestion
window. The reception of any non-duplicate ACK would finish
the recovery. However, the subsequent loss detections cause
the congestion window to decrease further, using the same
mechanisms of entering and exiting the recovery state.
In one sense, this exponential reaction to multiple losses is
expected from the congestion control algorithm, the purpose
of which is to reduce consumption of network resources in
complex congestion situations. But this expectation rests on
the assumption that congestion states, as deduced from each
detected loss, are independent, and in the example above
this does not hold true. All packet losses from the original
data bundle (i.e., from those data packets outstanding at the
moment of loss detection) have a high probability of being
caused by a single congestion event. Thus, the second and
third losses from the example above should be treated only as
requests to retransmit data and not as congestion indicators.
Moreover, reducing the congestion window does not guarantee
the instant release of network resources. All packets sent
before the congestion window reduction are still in transit.
Before the new congestion window size becomes effective,
we should not apply any additional rate reduction policies.
This can be interpreted as reducing the congestion window
no more often than once per one-way propagation delay or
approximately RTT/2.
Floyd et al. [18], [19] introduce a simple refinement of
Reno’s Fast Recovery. It solves the ambiguity of congestion
events by restricting the exit from the recovery phase until all
data packets from the initial congestion window are acknowl-
edged. More formally, the NewReno algorithm adds a special
state variable to remember the sequence number of the last
data packet sent before entering the Fast Recovery state. This
value helps to distinguish between partial and new data ACKs.
The reception of a new data ACK means that all packets sent
before the error detection were successfully delivered and any
new loss would reflect a new congestion event. A partial ACK
confirms the recovery from only the first error and indicates
more losses in the original bundle of packets.
Figure 18 illustrates the differences between Reno and
NewReno. Similar to the original Reno algorithm, reception
of any duplicate ACKs triggers only the inflation of the
congestion window (States 3, 4, 6). A partial ACK provides
the exact information about some part of the delivered data.
Therefore, reaction to partial ACK is only a deflation of the
congestion window (State 4) and a retransmission of the next
unacknowledged data packet (State 5). Finally, exit from the
NewReno’s Fast Recovery can proceed only when the sender
receives a new data ACK, which is accompanied by the full
congestion window deflation (State 7 in Figure 18).
Notice that during the recovery phase, duplicate acknowl-
edgments transfer their role as error indicators to the partial
ACKs. Retransmission of the first lost packet and reception
of the corresponding ACK will take exactly one RTT, and
the sender therefore can be absolutely sure that during this
interval all previously sent data will be either delivered or lost.
That is, this data no longer consumes the network resources.
Partial ACKs can be sent by the receiver only if more than one
packet is lost from the original packet bundle. Thus there is
no reason for the sender to wait for additional signals before
retransmitting the lost packet inferred from the partial ACK.
Remembering the sequence number of the last data packet
sent before entering the Fast Recovery phase and using this to
distinguish between partial and new data ACKs, is the solution
to most of the cases of unnecessary congestion window
reduction. However, in some cases, particularly when the
retransmission timeout is triggered during the Fast Retransmit
phase, unnecessary congestion window reduction still may
occur [30]. The solution (“bugfix” in terms of [30]) is to
remember the highest sequence number transmitted so far
after each triggering of the retransmission timeout, and to
ignore all duplicate ACKs that acknowledge lower sequence
10 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Sent data,
waiting for ACK
cwnd
Buffered
data
ACKed
data
State 1
Just before 1
st
loss detection
Packet retransmission
Packet losses due to the minor congestion event
cwnd/2
State 2
cwnd/2
2
State 3
Detected packet loss (e.g., 3 dup ACKs)
cwnd/2
3
State 4
After exit from the recovery and
just before 2
nd
loss detection
After exit from the recovery and
just before 3
rd
loss detection
After exit from the last recovery
phase
Fig. 17. The performance problem in Reno’s Fast Recovery
Sent data,
waiting for ACK
cwnd
Buffered
data
ACKed
data
State 1
Initial congestion window before
loss detection
Packet retransmission
Packet losses due to the minor congestion event
Detected packet loss (e.g., 3 dup ACKs)
#dup+cwnd/2
State 3
State 5
State 6
cwnd/2
State 7
Retransmission of the lost packet.
Each dup ACK “inflates” the cwnd
Retransmission of the lost packet ( ).
The cwnd remains unchanged
#dup+cwnd/2-ACK
State 4
Partial ACK “deflates” the cwnd, packets
before are received
Non-duplicate ACK
#dup+cwnd/2-ACK
#dup+cwnd/2-ACK
All dup ACKs only “inflates”
the cwnd
Exit recovery and deflate cwnd when
non-duplicate ACK is received
Amount of successful delivered
data inferred from dup ACKs
State 2
cwnd/2 Just after the loss
detection
Amount of packets in transit
The congestion window size is a
sum of these two elements
Fig. 18. Characteristic states of TCP NewReno’s Fast Recovery
numbers. This solution optimistically resolves the ambiguity
of duplicate ACKs that can indicate either lost or duplicate
packets.
NewReno modifies only the Fast Recovery algorithm by
improving its response in the event of multiple losses. Mean-
while, in the steady state performance and fairness character-
istics are similar to the ones shown in Section II-C. A slightly
more aggressive recovery procedure would allow a NewReno
flow in some cases to obtain more network resources than
a competing Reno flow. But generally, this imbalance only
happens due to the inability of Reno itself to utilize the
network resources under those network conditions effectively.
For this reason, we consider NewReno to have the same
fairness characteristics as Reno.
E. TCP SACK
The problem with Reno’s Fast Recovery algorithm dis-
cussed in the Section II-D arises solely because the receiver
can report limited information to the sender. The TCP speci-
fication [1] defines that the only feedback message be in the
form of cumulative ACKs, i.e., acknowledgments of only the
last in-order delivered data packet. This property limits the
ability of the sender to detect more than one packet loss per
RTT. For example, if a second and a third data packet from
some continuous TCP stream are lost, the receiver, according
to the cumulative ACK policy, would reply to fourth and
consecutive packets with duplicate acknowledgments of the
first packet (Figure 19). Clearly, the loss can be detected no
sooner than after one RTT. Moreover, because Fast Recovery
assumes loss of only one data packet—i.e., only one packet
will be retransmitted after a loss detection—loss of the third
data packet will be detected after another RTT at best. Thus,
a duration of the recovery in Reno is directly proportional to
the number of packet losses and RTT.
NewReno resolves Reno’s problem of excessive rate reduc-
ing in the presence of multiple losses, but it does not solve
the fundamental problem of prolonged recovery. The recovery
process can be sped up if the sender retransmits several packets
instead of a single one upon error detection. However, this
technique assumes certain patterns of packet losses and may
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 11
RTT
1 2 3 4 5
Data packets
1 1 1
ACK packets
...
...
duplicates
TCP Sender TCP Receiver
Fig. 19. Duplicate ACKs allow loss detection no sooner than after one RTT
1 s
t

d
a
t
a

p
a
c
k
e
t
1
s
t

A
C
K
Time
TCP Sender
TCP Receiver
RTT
l
a
s
t

d
a
t
a

p
a
c
k
e
t
l
a
s
t

A
C
K
2 RTTs
Fig. 20. The interval between the first and last data packets sent before
reception of any ACK is 2×RTT in the worst case
just waste network resources if actual losses deviate from these
patterns.
If a receiver can provide information about several packet
losses within a single feedback message, the sender would
be able to implement a simple algorithm to resolve the long
recovery problem. Moreover, Reno’s problem discussed in
Section II-D can be solved by restricting the congestion win-
dow reduction to no more than once per RTT period, instead of
implementing the NewReno algorithm. The rationale behind
this solution is that in the worst case, the interval between
the first and last data packets sent before reception of any
ACK is exactly one RTT (Figure 20). All loses, if any, will be
reported to the sender within the next RTT. Thus, if we apply
the mechanism of limiting the congestion window reduction to
no more than once per RTT period to the problem illustrated in
Figure 17, the first error detection should cause retransmission
and shrink the congestion window. The rest of the losses in
the original packet bundle would be reported within one RTT,
and thus will cause only retransmission of the lost packets,
preserving the value of the congestion window.
Mathis et al. [20] address the problem of limited infor-
mation available in a cumulative ACK. As a solution, they
propose extending the TCP protocol by standardizing the se-
lective acknowledgment (SACK) option. This option provides
the ability for the receiver to report blocks of successfully
delivered data packets. Using this information, TCP senders
can easily calculate blocks of lost packets (gaps in sequence
numbers) and quickly retransmit them (Figure 21).
Unfortunately the SACK mechanism has serious limitations
in its current form. The TCP specification restricts the length
of the option field to 40 bytes. A simple calculation reveals
that the SACK option can contain at most four blocks of
data packets received in order (2 bytes to identify option
and specify option length, and up to four pairs of 4-byte
sequence numbers [20]). The situation becomes aggravated
if we want to use additional TCP options, which decrease the
Left edge of 1
st
block
Right edge of 1
st
block
Left edge of n
th
block
Right edge of n
th
block
...
Fig. 21. SACK option
Left edge – the first sequence number of the block,
Right edge – the sequence number immediately following the last
sequence number of the block
space for the sequence number pairs being included in SACK.
For example, the Timestamp option [31] reduces the available
space in the TCP header by 8 bytes, which decreases SACK
space to only 3 gaps of lost packets. In some environments,
the pattern of packet losses may easily exceed this SACK
limit. In the worst case, when every other packet is lost, this
limit is exceeded just after the first 4 packets are received
(thus, 4 packets being lost). The inability of the receiver to
quickly indicate all detected losses returns us to the original
problem. Although this worst-case situation is unlikely to
happen in wired networks (since during congestion events
consecutive packets are usually dropped), random losses in
wireless networks can show patterns approximating the worst
case. This observation shows that SACK is not a universal
solution to the multiple loss problem.
F. TCP FACK
Although SACK (Section II-E) provides the receiver with
extended reporting capabilities, it does not define any partic-
ular congestion control algorithms. We have informally dis-
cussed one possible extension of the Reno algorithm utilizing
SACK information, whereby the congestion window is not
multiplicatively reduced more than once per RTT. Another ap-
proach is the FACK (Forward Acknowledgments) congestion
control algorithm [21]. It defines recovery procedures which,
unlike the Fast Recovery algorithm of standard TCP (TCP
Reno), use additional information available in SACK to handle
error recovery (flow control) and the number of outstanding
packets (rate control) in two separate mechanisms.
The flow control part of the FACK algorithm uses se-
lective ACKs to indicate losses. It provides a means for
timely retransmission of lost data packets, as well. Because
retransmitted data packets are reported as lost for at least
one RTT and a loss cannot be instantly recovered, the FACK
sender is required to retain information about retransmitted
data. This information should at least include the time of the
last retransmission in order to detect a loss using the legacy
timeout method (RTO).
The rate control part, unlike Reno’s and NewReno’s Fast
Recovery algorithms, has a direct means to calculate the num-
ber of outstanding data packets using information extracted
from SACKs. Instead of the congestion window inflation
technique, the FACK maintains three special state variables
(Figure 22): (1) H, the highest sequence number of all sent
data packets—all data packets with sequence number less than
H have been sent at least once; (2) F, the forward-most
sequence number of all acknowledged data packets—no data
packets with sequence number above F have been delivered
12 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Send
buffer
Retransmitted (R)
Highest sent (H)
Forward-most received (F)
Acknowledged data packets
Lost data packets
Sent and not yet ACKed data packets
Fig. 22. Special state variables in the FACK algorithm
(acknowledged); and (3) R, the number of retransmitted
packets.
The simple relation H−F +R provides a reliable estimate
(in the sense of robustness to ACK losses) of outstanding data
packets in the network. This estimate can be utilized by the
sender to decide whether or not to send a new portion of
data. More formally, data can be sent when the calculated
number of outstanding data packets is under an allowed limit
(the congestion window).
Simulation results [21] confirm that FACK has a much faster
recovery time from errors than Reno or NewReno. In fact,
advantages of FACK have long been widely recognized and
FACK has been an embedded part of the Linux kernel since
the 2.1.92 version. Because FACK modifies only reactions in
the recovery phase, the steady-state characteristics of effec-
tiveness and fairness are exactly the same as for Reno (see
Section II-C).
G. TCP Vegas
The approaches discussed in Sections II-C, II-D, and II-F
improve various aspects of Tahoe, Reno and NewReno con-
gestion controls. However, all of them share the same reactive
method of rate adaptation. That is, each of them detects
that the network is congested only if some packets are lost.
Moreover, these variants of TCP bring about packet losses
because their algorithms can grow packet transmission rates
to the point of network congestion. Therefore, the problem
discussed in Section II-B (i.e., induced periodic changes in
sending rate, round-trip time, network buffer utilization, packet
losses, etc.), also applies to Reno, NewReno, and FACK
congestion control algorithms.
TCP DUAL (Section II-B) makes an attempt to provide a
proactive method of quantifying the congestion level before an
actual congestion event occurs using an estimate of queuing
delay. But the solution only mitigates the oscillatory patterns
of network parameters (RTT, buffer utilization, etc.) and never
fully eliminates them. Moreover, as mentioned in Section II-B,
fairness of the DUAL algorithm is questionable.
Brakmo and Peterson [22] proposed the Vegas algorithm
as another proactive method to replace the reactive Con-
gestion Avoidance algorithm. The key component is making
an estimate of the used buffer size at the bottleneck router.
Similar to the DUAL algorithm, this estimate is based on
RTT measurements. The minimal RTT value observed during
the connection lifetime is considered a baseline measurement
indicating a congestion-free network state (analogous to Fig-
ure 12). In other words, a larger RTT is due to increased
queuing in the transmission path. Unlike DUAL, Vegas tries
Congestion window
T
r
a
n
s
m
i
s
s
i
o
n

r
a
t
e
Rate when RTT is min
cwnd
0
cwnd
1/RTT
min
cwnd/RTT
min
– expected rate
cwnd/RTT

– actual rate
Network limit
(RTT grows as cwnd grows)
/ RTT
min
Fig. 23. TCP Vegas—the utilized buffer size Δ as function of expected and
actual rate
to quantify, not a relative, but an absolute number of packets
enqueued at the bottleneck router as a function of the expected
and actual transmission rate (Figure 23).
The expected rate (dashed line in Figure 23) is a theoretical
rate of a TCP flow in a congestion-free network state. This
rate can occur if all transmitted data packets are successfully
acknowledged within the minimal RTT (i.e., no loss, no
congestion). Assuming that RTT
min
is constant, the expected
rate is directly proportional to the size of the congestion
window with a proportionality coefficient of 1/RTT
min
.
The actual rate (bold solid line in Figure 23) can be
expressed as the ratio between the current congestion win-
dow and the current RTT value. However, due to the finite
capacity of the path, we can always find a point cwnd
0
on
the graph when the actual rate is numerically equal to the
expected rate, and all attempts to send at a faster rates (i.e.,
> cwnd
0
/RTT
min
) will fail. Clearly, the number of packets
enqueued during the last RTT is the difference Δ between
the current congestion window and the inflection point in our
graph; thus we have Δ = cwnd − cwnd
0
. According to our
assumptions, this excess of data packets is the only cause of
a corresponding RTT increase. Thus, Δ can be expressed as a
function of the congestion window size, RTT and RTT
min
:
Δ = cwnd ×
RTT −RTT
min
RTT
Vegas incorporates this Δ measure into the Congestion
Avoidance phase to control the sender’s window of allowed
outstanding data packets (see beneath “Congestion Avoid-
ance,” Figure 24). In other words, once every RTT Vegas
checks the difference Δ between the expected rate (small cir-
cles in Figure 24) and the actual rate (solid line in Figure 24).
If Δ is more than the predefined threshold β (e.g., according to
Linux implementation, more than 4), the congestion window is
decreased by one; otherwise, it is increased by one. However,
to mitigate the effects of network parameter fluctuations and to
provide system stabilization, the proposed algorithm defines a
control dead-zone (hatched area in Figure 24) using additional
threshold α. That is, the congestion window increase is
allowed only if Δ is strictly less than α (e.g., less than 2).
If Δ is between α and β, the system is considered to be in
a steady state and no modifications to the congestion window
are applied.
If no packets are dropped in the network, Vegas controls
the congestion window using an additive increase and additive
decrease (AIAD) policy. Reactions to packet losses are defined
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 13
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Zone of zero
buffering
Slow-Start Congestion Avoidance
Expected rate calculations
(in Slow-Start every other RTT, in Congestion Avoidance every RTT)
Control dead-zone
(lower and upper
thresholds)
Fig. 24. TCP Vegas – congestion window dynamics and corresponding
estimates of bottleneck buffer size Δ
by any of the standard congestion control algorithms (either
Reno, NewReno, or FACK).
Additionally, Vegas revises the Slow Start algorithm by
slowing-down the opportunistic network resource probing.
In particular, the updated algorithm restricts the congestion
window to increase every other RTT (see beneath “Slow-
Start,” Figure 24). This period is required in order to employ
the bottleneck buffer estimation technique. As soon as Vegas
detects increasing queues in bottleneck routers (i.e., Δ be-
comes larger than α), the Slow Start algorithm terminates and
transfers control to the Vegas Congestion Avoidance algorithm.
Although the Slow Start modification was designed to
reduce network stress, experimental results [22] show al-
most no measurable impact. The main reason for this is the
negligible working time of the Slow Start phase compared
to the Congestion Avoidance phase. In practice, available
Linux implementations do not perform any changes to the
original Slow Start algorithm and implement only the modified
Congestion Avoidance phase.
As we can see from Figure 24, TCP Vegas has the amazing
property of rate stabilization in a steady state, which can
significantly improve the overall throughput of a TCP flow.
Unfortunately, despite this and other advantages, later research
[23], [25], [32] discovered a number of issues, including
the inability of Vegas to get a fair share when competing
with aggressive TCP Reno-style flows (a reactive approach
is always more aggressive). It also underestimates available
network resources in some environments (e.g., in the case
of multipath routing) and has a bias to new streams (i.e.,
newcomers get a bigger share) due to inaccurate RTT
min
estimates.
H. TCP Vegas+
Hasegawa et al. [23] have recognized a serious problem in
TCP Vegas which prevents any attempts to deploy it. The
Vegas proactive congestion-prevention mechanism (limiting
buffering in the path) cannot effectively compete with the
highly deployed Reno reactive mechanism (inducing network
buffering and buffer overflowing). This point can be illus-
trated using an idealized convergence diagram for competition
between Reno and Vegas flows (Figure 25). While there is
no buffering on the path, both flows slowly increase their
share of network resources (x
0
–x
1
). Excessive buffering forces
Vegas to decrease its congestion window, but the Reno flow,
Network Share (Vegas)
E
q
u
a
l

(
f
a
i
r
)

s
h
a
r
e
N
e
t
w
o
r
k

S
h
a
r
e

(
R
e
n
o
)
Network Limit
Packet
losses
x
0
Zero Buffering
x
2
x
3
x
4
x
5
x
1
Fig. 25. Convergence diagram when an ideal Vegas flow is competing with
a Reno flow
unaware of this buffering, continues acquiring more network
resources (x
1
–x
2
). That is, opposite reactions of the two
different congestion control algorithms (one growing, one
diminishing) maintain the fixed buffering level in the network,
leading to the proactive algorithm being completely pinched
off (x
2
). Buffers are purged only when the Reno flow detects
a packet loss (x
3
). After that, the convergence dynamics will
start looping along the path x
4
–x
5
–x
2
–x
3
–x
4
.
TCP Vegas+ was proposed as a way to provide a way of
incremental Vegas deployment. For this purpose, Vegas+ bor-
rows from both the reactive (Reno-like aggressive) and proac-
tive (Vegas-like moderate) congestion avoidance approaches.
More specifically, the Congestion Avoidance phase of Vegas+
initially assumes a Vegas-friendly network environment and
employs bottleneck buffer estimation to control the congestion
window (i.e., Vegas rules). At the moment when an internal
heuristic detects a Vegas-unfriendly environment, Congestion
Avoidance falls back to the Reno algorithm.
The Vegas-friendliness/unfriendliness detection heuristic is
based on a trend estimate of the RTT. The special state variable
C is increased if the sender estimates an increase in the
RTT and concurrently the size of the congestion window
is unchanged or even reduced. In the opposite case, if the
estimated RTT grows smaller, C is decreased. Clearly, large
values of C indicate a Vegas-unfriendly network state (i.e.,
if the congestion window is stable, the RTT also should be
stable). Transition to the aggressive mode is triggered when
C exceeds a predefined threshold. Return to the moderate
mode occurs only when C becomes zero. Vegas+ additionally
defines two special cases for modifying the state variable: (1)
entering Fast Recovery, C is divided in half, and (2) a packet
loss detected by the retransmission timer reduces C to zero.
In the example of Figure 25, the unfriendliness will be easily
detected during the transition from x
1
to x
2
and Reno’s rules
will be enforced, allowing the Vegas+ flow to obtain its fair
share of network resources.
The Vegas+ solution does not try to solve the fundamental
problems of Vegas discussed in Section II-G. Moreover, the
reactive congestion control elements of Vegas+ practically
nullify the inherited advantages of Vegas. Additionally, un-
der some network conditions (like presence of a fluctuated
congestion-unresponsive traffic or rerouting in the path) Ve-
gas+ can unnecessarily transition to the aggressive mode and
14 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
A. Normal
increase rate
B. Reduced
increase rate
Loss detection
High buffering
zone ( > )
Fig. 26. Congestion window dynamics of TCP Veno
stay there indefinitely (assuming that probability of the loss
detected by the retransmission timer is very low).
I. TCP Veno
Fu and Liew [24] propose a modification to the Reno con-
gestion control algorithm (Section II-C) aimed at improving
the throughput utilization of TCP. The key idea is to use
the Vegas bottleneck buffer estimation technique to perform
early detection of the congestion state. Unlike Vegas, this
buffer estimation is used only to adjust the increase/decrease
coefficient of the Reno congestion control algorithm, and thus
does not inherit Vegas’ problems.
The Veno (VEgas and reNO) algorithm defines two modifi-
cations. First, it limits the increase of the congestion window
during the congestion avoidance phase if the Vegas buffer esti-
mate shows excessive buffer utilization (i.e., Δ > β). In other
words, if the Vegas estimate indicates a congestion state, the
sender starts probing network resources very conservatively
(increasing by one for every two RTT, “B” in Figure 26).
Second, reducing the congestion window upon entering Fast
Recovery is modified to halve the cwnd value only if the buffer
estimate also indicates congestion. That is, in the event of
detecting a loss and Δ > β, the congestion window will be
halved. Otherwise, if only a loss is detected, it will be reduced
to 80% of its current size.
To summarize, the effectiveness of the Veno algorithm is
slightly improved in comparison to Reno. Veno flow tends
to stay longer in the congestion avoidance state with larger
congestion window values. However, the price for this is
additional latency to discover network resources. The Veno
modification has practically no effect on fairness. Therefore
we can consider it to have the same characteristics as the base
Reno algorithm.
J. TCP Vegas A
Besides the inability to compete with Reno flows effectively
(see Section II-H), TCP Vegas has a number of other internal
problems [25]. For instance, under certain circumstances,
Vegas can inappropriately choke off the flow rate to nearly
zero. This happens because the assumption that the RTT will
change only due to buffering is not entirely true. In fact, if
the RTT increases due to a routing change, the algorithm
will make a wrong decision, leading to the reduced flow rate.
To illustrate, Figure 27 presents two curves, one for a low-
RTT/low-rate (1, for example a DSL link) and another for a
high-RTT/high-rate (2, for example a satellite link) path. If a
Congestion window
T
r
a
n
s
m
i
s
s
i
o
n

r
a
t
e
cwnd
Rate
max
1
Rate
max
2
1
2
RTT
min
1
RTT
min
2
Fig. 27. TCP Vegas—estimation error if the path has been rerouted
route changes from 1 to 2 when the congestion window size is
equal to cwnd, the algorithm will wrongly calculate buffering
Δ
2
and may exceed a threshold. The minimal RTT for the low-
RTT/low-rate link will be erroneously used as a baseline in
calculating the expected rate for the high-RTT/high-rate link.
That is, having a congestion-free state, the estimate indicates
congestion.
Another assumption that surfaces occasionally and is in-
correct is that all flows competing along the same path will
observe the same RTT
min
. Let us consider a situation with
two Vegas flows, one that has been transmitting data for a
long time and the other which has just started transmitting.
Naturally, the long-lived flow has more chances to observe the
true minimal RTT, compared to the new flow. The difference
between the minimum RTTs that the two flows observe causes
a difference in congestion state estimates (Figure 28): the old
flow thinks that the network is congested while the new one
estimated a congestion-free network state. As a consequence,
the distribution of network resources favors the new flow.
Srithith et al. [25] have presented the VegasA (Vegas
with Adaptation) algorithm, which extends the original Ve-
gas congestion control with an adaptable mechanism. The
threshold coefficients α and β from the Vegas algorithm are
adjusted depending on the steady state dynamics of the actual
transmission rate. That is, if VegasA detects an increase in
the actual bandwidth while the system is in a stable state
(i.e., α < Δ < β), it assumes a path change and shifts the
boundaries of the control dead-zone upward (α = α + 1 and
β = β + 1). The boundaries are shifted downward if some
network anomaly is detected; for example, if the estimate is
showing congestion-free state Δ < α, yet the rate in actuality
has decreased. Additionally, boundaries are shifted downward
every time the estimate shows the actual congestion.
Besides the threshold adaptability, VegasA adds additional
conditions to the congestion window management algorithm.
An increase is allowed in three cases: (1) if the estimate shows
no congestion and a lower threshold α has a minimal value
(the original Vegas rule); (2) if the actual rate has increased
and the estimate is showing no congestion (Δ < α); or (3) the
actual rate has decreased while the flow is in a steady state
(α < Δ < β). A decrease should occur if either the network
has been determined to be in a congestion state (Δ > β) or if
the actual flow rate has decreased and the network has been
determined to be congestion-free.
According to simulation results [25], VegasA has substan-
tial improvements in various aspects when compared to the
original Vegas design. It preserves the Vegas properties of
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 15
Congestion window
T
r
a
n
s
m
i
s
s
i
o
n

r
a
t
e
cwnd
RTT
min
1
RTT
min
2
Rate
max
Congestion
No congestion
O
l
d

f
l
o
w
N
e
w

f
l
o
w
Fig. 28. TCP Vegas—estimation error if a new flow is observing a higher
RTT
min
stabilizing throughput in a steady state and does not suffer
significantly in the long term from changes in path RTT. To
some extent, a VegasA flow can compete with Reno flows
and acquire its resource share. However, these were only
testing environments; the algorithm has not been evaluated
in real networks. Moreover, VegasA do not address many
other problem that are discussed in the next sections of this
survey (such as the scalability issue in high-speed networks,
resistance to random losses, etc.).
III. PACKET REORDERING
All the congestion control algorithms discussed in the
previous section share the same assumption that the net-
work generally does not reorder packets. This assumption
has allowed the algorithms to create a simple loss detection
mechanism without any need to modify the existing TCP
specification [1]. The standard already requires receivers to
report the sequence number of the last in-order delivered data
packet each time a packet is received, even if received out of
order [27]. For example, in response to a data packet sequence
5,6,7,10,11,12, the receiver will ACK the packet sequence
5,6,7,7,7,7). In the idealized case, the absence of reordering
guarantees that an out-of-order delivery occurs only if some
packet has been lost. Thus, if the sender sees several ACKs
carrying the same sequence numbers (duplicate ACKs), it can
be sure that the network has failed to deliver some data and can
act accordingly (e.g., retransmit lost packet, infer a congestion
state, and reduce the sending rate).
Of course in reality, packets are reordered [33], [34]. This
means that we cannot consider a single duplicate ACK (i.e.,
ACK for an already ACKed data packet) as a loss detection
mechanism with high reliability. To solve this problem of a
false loss detection, a solution employed as a rule of thumb
establishes a threshold value for the minimal number of du-
plicate ACKs required to trigger a packet loss detection (e.g.,
three) [17], [18], [28]. However, there is a clear conflict with
this approach. Loss detection will be unnecessarily delayed
if the network does not reorder packets. At the same time,
the sender will overreact (e.g., retransmit data or reduce
transmission rate needlessly) if the network does in fact
reorder packets.
Packet reordering can stem from various causes. For ex-
ample, it can be erroneous software or hardware behavior,
such as bugs, misconfigurations, or malfunctions. But packets
can also be reordered in some networks as a side effect
of a normal delivery process. For example, packets can be
TD-FR
RR
Eifel
DOOR PR
Reactive
(loss-based)
Fig. 29. Evolutionary graph of TCP variants that solve the packet reordering
problem
reordered if a router enforces diverse packets handling services
(differentiated services [35], [36]) and internally reschedules
packets in its queue (active queue management [37], [38]).
Also if the network provides some level of delivery guarantees
(e.g., wireless networks), the underlying layer (physical or link
layer) can retransmit some portion of the data without TCP’s
prompting and cause a shuffling of the upper layer packets.
Finally, channel bundling and packet processing parallelism
will likely contribute a good portion of the future Internet
[39]–[41].
In this section we present a number of proposed TCP mod-
ifications that try to eliminate or mitigate reordering effects
on TCP flow performance (Table II). All of these solutions
share the following ideas: (a) they allow nonzero probability of
packet reordering, and (b) they can detect out-of-order events
and respond with an increase in flow rate (optimistic reaction).
Nonetheless, these proposals have fundamental differences due
to a range of acceptable degrees of packet reordering, from
moderate in TD-FR to extreme in TCP PR, and different
baseline congestion control approaches. The development of
these proposals is highlighted in Figure 29.
A. TD-FR
A number of measurements conducted in the mid 1990s
[33] proved the presence of out-of-order packet delivery in
the Internet. This highlights the problem of potentially over-
penalizing a TCP flow if its congestion control mechanism
employs loss detection using duplicate ACKs. That is, in the
absence of the congestion or packet losses, each event of
packet reordering triggers at least one (and probably more)
duplicate ACKs, which can be considered an indication of
congested pathways and a guide for reducing the transmission
rate.
At the same time, there are two observations about out-
of-order packet delivery. According to Paxson’s work [33],
this effect is not uniformly distributed across network sites.
Measurements identified a low level of reorderings (0.1%–
2% on average), with peaks in some traces as high as 36%.
Moreover, Paxson made the most interesting observation: the
data transfers having the highest degrees of reordering also
experienced almost no packet losses.
Paxson [33] proposed a simple way to eliminate the penal-
ties of reordering through TD-FR, time delayed Fast Recovery.
If a receiver does not respond immediately to out-of-order
data packets with duplicate ACKs, but postpones the action
(e.g., by 8–20 msec, depending on the reordering pattern),
a majority of the reordering events will be hidden from the
sender. However, the advantage of this solution is, at the same
16 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
TABLE II
FEATURES OF TCP VARIANTS THAT SOLVE THE PACKET REORDERING PROBLEM
TCP Variant Section Year Base Added/Changed Modes or Features Mod
1
Status
Implementation
BSD
2
Linux Sim
TD-FR
[33]
III-A 1997 Reno Time delayed fast recovery R Experimental
Eifel
[42], [43]
III-B 2000 NewReno Differentiation between transmitted and
retransmitted data packets
S or
(S+R+P)
Standard i3.0, F

2.2.10

ns2
TCP DOOR
[44]
III-C 2002 NewReno Out-of-order detection and feedback, temporary
congestion control disabling and instant
recovery
S+R+P Experimental ns2

TCP PR
[45]
III-D 2003 NewReno Fine-grained retransmission timeouts, no
reaction to DUP ACKs
S Experimental ns2
DSACK
[46]
III-E 2000 SACK Reporting duplicate segments R Standard >2.4.0
RR-TCP
[47], [48]
III-F 2002 DSACK Duplicate ACK threshold adaptation S Experimental ns2
1
TCP specification modification: S = the sender reactions, R = the receiver reactions, P = the protocol specification
2
i for BSDi, F for FreeBSD

optional or available in patch form
time, a disadvantage as well. The artificial delay, aimed at
preventing overreaction, adds to the time required to detect
actual losses. If the delay grows too big, the “fast” loss-
detection mechanism becomes slower than a conventional
loss detection based on RTO. Clearly, the nondeterministic
nature of the reordering effect demands some path adaptation
mechanisms, which unfortunately are not implemented in
Paxson’s solution.
B. Eifel Algorithm
Ludwig and Katz [42] introduced the Eifel
2
algorithm as
an alternative method to alleviate the negative effects of
packet reordering in TCP throughput. Instead of the TD-FR
approach of introducing additional delay to the loss detection
process based on duplicate ACKs (Section III-A), Eifel tries
to distinguish reordering and real loss events. It does not try
to guess the event type upon reception of the first duplicate,
but rather postpones the decision until the first non-duplicate
ACK is received. In other words, if the TCP sender receives
a number (e.g., 3) of duplicate ACKs, as in NewReno, it
enters Fast Recovery. When a non-duplicate ACK is received,
Eifel checks its content and makes a decision whether to
continue Fast Recovery or abort recovery and restore the
original congestion window value. The advantage of the Eifel
algorithm is clearly visible in Figure 30. On the one hand,
the defined actions of Eifel do not affect normal operations
of the base congestion control algorithm when there is no
packet reordering. On the other hand, when some packets
are reordered, the original sending rate will be restored very
quickly.
To reach the right decision, Eifel must resolve the ambiguity
of a retransmission [26]. To clarify, let us consider a situation
where a TCP sender decides to retransmit a data packet (e.g.,
due to receiving several duplicate ACKs). After receiving the
first non-duplicate ACK, the sender does not know whether
the retransmission helped resolve the problem or whether the
problem resolved itself (as in case of a long burst of reordered
packets). If either ACKs carry additional information to indi-
cate not only a sequence number, but also some identification
of the actual transmission, or if the ACKs can indicate that
2
Authors choice of spelling, after a mountain range in western Germany
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Reordering detection
With Eifel (improved
robustness)
Original Reno w/o Eifel
Fig. 30. Comparison of congestion window dynamics between Reno
(NewReno) and Eifel
the ACK itself has been triggered by a retransmitted data
packet, the ambiguity problem is easily resolved. The latter
case is the easiest and most “cost-effective” way. For example,
we can assign two bits from the unused space in the TCP
header, where one bit is used to indicate retransmission of a
data packet and the other one to echo this information back
to the sender in an ACK. Although theoretically possible, a
change in the TCP protocol is highly undesirable, as it makes
deployment practically impossible.
We could instead use a standardized and highly deployed
TCP Timestamp option [31]. In this case, the sender maintains
an additional state variable (a time of the first retransmission)
for each retransmitted data packet. Having the Timestamp
option, each ACK packet will explicitly indicate what we need.
If a received non-duplicate ACK has a timestamp less than
a corresponding state variable, the sender can be sure that
no actual losses have occurred on the path and transmission
should be returned to the original state. This ability to protect
TCP transfer from the packet reorderings in Internet paths,
achieved with relative simplicity, allowed Eifel to become an
RFC standard in 2005 [43].
C. TCP DOOR
Wang and Zhang [44] were concerned with TCP perfor-
mance in mobile ad hoc networks (MANETs), which feature
route changes with high probability and thus are highly
penalized by the conventional congestion control algorithms.
During route changes many packets can be lost, causing
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 17
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
T2 T
1
Reordering
detection and
instant recovery
Loss detection
Original Reno w/o DOOR
With DOOR
(improved
robustness)
Fig. 31. Comparison of congestion window dynamics between Reno (or
SACK) and DOOR
congestion control to make the wrong decision for reducing
the rate of flow. If we can identify a time interval during
which the network route has changed, then we can eliminate
the penalty in TCP throughput by temporarily disabling the
congestion control actions during this interval. This idea
underlies the proposed TCP DOOR (Detection of Out-of-
Order and Response).
During a route change event it is very probable that the
order of IP packets will be changed. Thus, the problem to
identify a route change can be replaced by identifying an out-
of-order packet delivery. Similar to Eifel (Section III-B), in
order to detect packet reordering reliably, each data and ACK
packet should carry some additional information. For example,
this information can be included in a new TCP option in the
form of a special counter, which is increased every time a
new packet is sent. In this case, the receiver can easily detect
reordering and report it to the sender using some bit, either in
the TCP header or in a new TCP option field. Another variant
considered in paper [44] is utilizing a well-known Timestamp
option [31] in a manner similar to the standard Eifel algorithm.
A reaction to detecting packet reordering (which can be
considered an equivalent to route change in MANETs) entails
two components (Figure 31). First, congestion control should
be temporarily disabled to mitigate transition effects (time
period T
1
in Figure 31). Second, if congestion control has
recently reduced the sending rate due to loss detection (during
time interval T
2
in Figure 31), the original state (the conges-
tion window and retransmission timeout values) should revert
(so called instant recovery). This action alleviates previous
penalties from the detected rerouting event. The interval for
the temporary congestion control disabling and the preceding
time period for the instant recovery are not known a priori and
depend on the actual network. Wang and Zhang conducted a
number of simulations where they varied underlying routing
protocols, timing values, and network conditions. Although
some of the results show more than a 50% throughput im-
provement compared to TCP with the SACK option, there are
cases with minimal to zero improvement.
D. TCP PR
Bohacek et al. [45] noticed that since packet reordering
is a common event in the network (e.g., in mobile ad hoc
networks), duplicate ACKs cannot be considered reliable
indications of either loss in the path or of congestion. In TCP
PR (Persistent Reordering), the authors no longer assume the
validity of inferring something from duplicate ACKs. Instead,
they focus on making the retransmission timeout a robust
and reliable loss and congestion indicator in a wide range
of network environments.
In contrast to previously developed congestion controls,
TCP PR maintains a timestamp for each transmitted data
packet. A loss is detected whenever the timestamp of a data
packet becomes older than the estimated RTT maximum (M).
The concept of RTT maximum is similar to the RTO, but
differs in implementation. Instead of the RTO recalculation
once per RTT, the maximum estimate M is readjusted on each
ACK arrival according to the formula:
M = β · max

α
1
cwnd
· M, RTT

where α and β are constants (0 < α < 1, β > 1). Taking
into account that a recalculation is made with each ACK, α
represents a maximal decrease rate of the M in RTT timescale
(i.e., cwnd ×α
1
cwnd
is α).
As long as timeouts are treated optimistically (i.e., after a
loss detection, flow is allowed to transmit at a multiplicatively
reduced rate), TCP PR faces the problem of overreaction to
multiple losses from the same congestion event, similar to
Reno. To resolve this, each transmitted packet is also tagged
with the current value of the congestion window. When a
packet is lost, the congestion window is reduced by no more
than half of the stored value for the lost packet.
Because of different loss detection mechanisms, we cannot
directly compare TCP PR with the previous congestion control
algorithms. If we assume that an algorithm based on fine-
grained timeouts is as robust as one based on duplicate
ACKs, the fairness and effectiveness characteristics will be
exactly the same as presented in Section II-C. Though this
is not entirely true in all network environments, there are
networks (e.g., MANETs), where duplicate ACKs are highly
unreliable feedback. Thus, TCP PR can greatly help improve
TCP efficacy in those cases, e.g., where the network normally
reorders packets.
E. DSACK
The specification of the selective ACK extension for TCP
[20] does not define particular actions to take if a receiver
encounters a data packet which has already been delivered.
This can happen, for example, if the network reorders or
replicates data packets, or if the sender wrongly estimates
the retransmission timeout. The DSACK (Duplicate Selective
ACKnowledgements) specification [46] complements the stan-
dard and provides a backward-compatible way to report such
duplicates.
DSACK requires the receiver to report each receipt of
a duplicate packet to the sender. However, there are two
possibilities of duplication, which should be treated in slightly
different ways. First, the duplicated data can be some part of
the acknowledged continuous data stream. Second, it can be
a part of some isolated block. In the former case, a DSACK-
compliant receiver should include a range of sequence num-
bers in the first block of the SACK option (Figure 32a). In
the latter case, besides including a duplicate range in the first
block, the receiver should attach the isolated block at the
second position in the SACK option (Figure 32b). In that way,
18 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
ACK
dup_start
dup_end
...
...
dup_end<ACK
(a) DSACK report
ACK
dup_start
dup_end
iso_start
iso_end
...
...
iso_start <= dup_start
dup_end <= iso_end
(b) DSACK+SACK report
Fig. 32. DSACK reporting
DSACK, without violating the SACK standard [20], provides
a way to report packet duplication.
Similar to SACK, the DSACK specification does not specify
any particular actions for the sender. Instead, its authors
merely discuss several issues for future research. One such
issue is the detection of packet reordering events. If a sender
can assume that duplication is caused primarily by packet
reordering, it can undo some previous congestion control
actions upon receipt of a DSACK packet (similar to Eifel and
DOOR). Other discussed issues include a differential treatment
of the normal SACK and DSACK packets, an implementation
of some form of an ACK congestion control, and resolving the
issue of the RTO underestimation. However, DSACK-based
solutions should not blindly trust the DSACK information, as
the receiver can send faulty information, either intentionally
or unintentionally.
F. RR TCP
The SACK option by itself can provide a lot of information
about patterns of packet delivery. For example, the occurrence
of a reordering event can be detected if the sender receives
a selective ACK packet followed by a cumulative ACK.
Moreover, in this case we can also calculate a reordering
length, i.e., how long a packet was delayed, in terms of
packets. However, this would work only if no packets were
retransmitted. Otherwise, it is unknown which event (original
or repeated transmission) might have helped to recover a
packet loss previously reported by the SACK. An approach
presented in RR TCP (Reordering Robust) [47], [48] uses a
DSACK to resolve the retransmission ambiguity. In short, after
a sender retransmits a data packet that has been detected as
lost, a succession of an ACK (or a SACK) and a DSACK,
both covering the retransmitted packet, indicates that both
the original transmission and the retransmission were actually
successful. Because the sender knows the exact transmission
sequence (the order packets were transmitted and retransmit-
ted), reordering length can be easily calculated. The downside
of this approach is that we cannot infer anything if either of
the first or second ACK is lost.
RR TCP defines a way to use the calculated reordering
length. If we know how long packets are usually delayed, we
can adjust a threshold of duplicate ACKs (dupthresh, which
usually is 3), which triggers the Fast Recovery phase. This,
in contrast to Eifel (Section III-B) or DOOR (Section III-C),
will proactively protect the sender from overreacting if packets
LP Nice
Proactive
(delay-based)
Reactive
(loss-based)
Fig. 33. Evolutionary graph of TCP variants that implement a low-priority
data transfer service
have been reordered, not lost. Unfortunately, if dupthresh
is set too high, all advantages of the robust loss detection
will be eliminated. RR TCP includes a concept of a con-
trolling loop for finding the optimal dupthresh value for a
given path using a combined cost function, which integrates
several costs including false timeouts and fast retransmits.
Experimental evaluation shows consistent improvements with
RR TCP, compared to TCP with the SACK option, in a
wide range of network environments (i.e., varying delays, loss
ration, reordering lengths). However, these improvements are
effective only in long-lived TCP connections.
IV. DIFFERENTIAL SERVICES
Different application types have different data transfer re-
quirements. Some applications, composing one group, have
strict requirements for request-response delay and throughput
(e.g., WEB browsing, FTP transfer, etc). Other applications do
not have any particular requirements and are highly tolerant of
the network conditions (e.g., automatic updates). In general,
if traffic of the first application group can be prioritized, the
overall user-perceived quality of service in the network (QoS)
can be increased [49]. Unfortunately, due to a high level of
Internet heterogeneity, even though there have been a number
of attempts to provide a QoS functionality on the network
(IP) level [50], [51], this feature is not yet globally available
[52]. To overcome the deployment problem and yet provide
some level of QoS, two host-to-host TCP-based prioritization
techniques have been proposed (Table III).
The central idea among the proposed solutions is to enforce
and guarantee, through special congestion control policies,
an “unfair” network share distribution between high- and
low-priority flows. This idea may seem to contradict the
basic fairness requirement for TCP congestion control: a new
congestion control should not be more aggressive than the
standard TCP congestion control algorithms (Reno, NewReno,
and SACK). However, if we restrict the TCP-based QoS scope
only to a low-priority service (i.e., to a problem of finding
congestion control policies that would guarantee the network
resource release if there are high-priority—standard TCP—
flows present), then we definitely will comply with the fairness
requirement.
In the remaining part of this section we will provide
an overview of the two existing TCP-based QoS proposals
(Table III), which share the idea of providing a one-level,
low-priority data transfer service. The key differences between
proposals are: (a) different baseline congestion control algo-
rithms (Vegas for Nice and Reno for LP, see Figure 33), and
(b) different mechanisms to detect the presence of a high-
priority data transfer.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 19
TABLE III
FEATURES OF TCP VARIANTS THAT IMPLEMENT A LOW-PRIORITY DATA TRANSFER SERVICE
TCP Variant Section Year Base Added/Changed Modes or Features Mod
1
Status
Implementation
Linux Sim
Nice
[53]
IV-A 2002 Vegas Delay threshold as a secondary congestion
indicator
S Experimental 2.3.15

LP
[49], [54]
IV-B 2002 NewReno Early congestion detection S Experimental >2.6.18 ns2
1
TCP specification modification: S = the sender reactions, R = the receiver reactions, P = the protocol specification

optional or available in patch form
A. TCP Nice
Venkataramani et al. [53] identified the need to optimize
the network resources in the presence of a large number
of background transfers—automatic updates, data backups,
peer-to-peer file sharing, etc. As a solution, they proposed a
new congestion control algorithm, TCP Nice, that enables a
simple distributed host-to-host mechanism to minimize the in-
terference between high-priority (foreground) and low-priority
(background) flows. More particularly, Nice’s congestion con-
trol policies are adjusted to react highly conservatively to all
detected network state changes. In one sense, Nice considers
all standard TCP flows as carrying high-priority data and tries
to consume the network resources only if nobody else uses
them.
The design of Nice is based on the Vegas algorithm (see
Section II-G). There are two main reasons for this choice: (1)
Vegas incorporates a proactive congestion detection mecha-
nism which allows redistributing network resources between
competing TCP flows without inducing any packet losses (i.e.,
interference between Vegas flows is lower than that for stan-
dard TCP flows); (2) due to its proactive nature, a TCP flow
running the Vegas congestion control algorithm has problems
capturing its network resource share while competing with
a reactive Reno-like TCP flow (i.e., standard TCP flow), i.e.,
Vegas itself provides some level of a low-priority data transfer
service.
To provide a guarantee of the transmission rate reduction
in the presence of standard TCP flows, Nice defines a con-
cept similar to the queueing delay threshold defined in the
DUAL algorithm (Section II-B). However, there are several
major differences. First, the queuing delay is compared to the
threshold upon the arrival of each non-duplicate ACK packet.
Second, instead of the averaged RTT, a current RTT sample is
used in queuing delay calculations. Finally, the occurrence of
a current queuing delay estimate exceeding the threshold does
not automatically trigger changes in the congestion windows.
Instead, Nice counts the number of times (X) that the queuing
delay exceeds the threshold during each RTT period:
∀t ∈ (t
0
, t
0
+RTT) Q
ACK
(t) > Q
thresh
⇒ X = X + 1
The counted value X estimates the number of ACK packets
which have been delayed due to interference with cross traffic
(e.g., high-priority flows). If we assume the idealized case
when no ACKs are lost or delayed by the receiver, then
the ratio between X and the congestion window (measured
in packets) would estimate a percent of enqueued (delayed)
packets during the latest RTT. In Nice, if this estimate exceeds
a predefined threshold, the congestion window is halved
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
High priority flow is present
( - interference triggering)
Fig. 34. Congestion window dynamics of TCP Nice
(Figure 34). The right choice of threshold value can make Nice
much more sensible than both the original DUAL and Vegas
algorithms. In addition, Nice allows the congestion window
size to be a fraction (the minimum is 1/48), meaning that
only one packet is allowed to be sent in several RTT periods
(48 RTTs in the worst case). This makes Nice even more
conservative in network resource utilization in the presence of
cross traffic.
B. TCP LP
Almost concurrently with the Nice algorithm proposal (see
Section IV-A), Kuzmanovic and Knightly [49], [54] presented
a similar algorithm, TCP LP (Low Priority). It aimed to
provide a low-priority data transfer service for background
applications (e.g., software updates, data backup, etc.). How-
ever, for the baseline congestion control algorithm, its authors
have chosen NewReno instead of Vegas. Other differences are
in the way the presence of cross traffic is detected and what
preventive measures are applied to minimize interference.
In TCP LP the DUAL’s calculation of a queuing delay
(see Section II-B) is refined progressively by using more
accurate delay estimates. For this purpose LP makes use of the
Timestamp option [31] and applies heuristics to estimate the
one-way propagation delay (e.g., similar to Choi and Yoo’s
proposal [55]). Although this can complicate queueing delay
calculations, the resulting values are much more resistant to
congestion in the reverse channel, thus the level of false
congestion detections is substantially decreased. The actual
process of congestion detection (in terms of LP it is early
congestion detection) with minor modifications repeats the one
defined in DUAL: (1) LP maintains minimum and maximum
one-way delays during the connection lifetime, and (2) once
every RTT, TCP LP compares the current one-way delay
estimate with a predefined threshold (a fraction of queuing
delay plus a minimum of one-way delay).
The unique feature of the TCP LP algorithm is its reaction
to early congestion detection. Upon detection of a first such
event, LP reduces the congestion window to half the current
20 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
High priority flow is present
inference timeout
early congestion detection
Fig. 35. Congestion window dynamics of TCP LP
value and starts the inference timer. If the sender triggers
another early congestion detection event before the timer
elapses, LP infers presence of the high-priority flow and
the congestion window is reduced to the minimal value. In
other cases, LP resumes the normal (Reno-like) congestion
avoidance actions (Figure 35).
NS2 simulations and real-world experiments using a Linux
implementation of the LP algorithm have shown that it indeed
has the desired property of yielding network resources to the
standard TCP (high-priority) flows and, at the same time,
successfully utilizing the network bandwidth if no such flows
are present. Moreover, LP is able to fairly distribute the
network resources among low-priority flows (inter-fairness).
There is no definitive answer to whether the TCP LP
algorithm or Nice algorithm is better. On one hand, both
of them are extremely sensitive to activity in the network,
and thus fulfill a necessary condition for the low-priority
service implementation. But on the other hand, there is a big
question as to how well both algorithms are able to utilize
the network capacity if only low-priority flows are present.
Although Nice and LP should have the same characteristics
as the baseline algorithms (Vegas and NewReno respectively),
this has not been proved. Moreover, widespread use of wireless
and high-speed networks limit the applicability of either Nice
or LP, due to ineffectiveness of the baseline algorithms in
those environments. Although Kuzmanovic et al. [56] made
an attempt to create a high-speed modification of LP, HSTCP-
LP, additional research is required to investigate a real-world
applicability of the designed solution.
V. WIRELESS NETWORKS
The growing spread of wireless networks has highlighted
the need for TCP protocol modification. Originally designed
for wired networks where congestion is the primary cause
of packet losses, TCP is unable to react adequately to packet
losses not related to congestion. Indeed, if a data packet is lost
due to short-term radio frequency interference, then there are
no router buffer overflows and TCP’s decision to reduce the
congestion window is wrong. Instead, it should just recover
from the loss and continue the transmission as if nothing had
happened.
Several solutions have been proposed to resolve this prob-
lem. One group gives up the idea of a pure host-to-host
data transfer either by (a) requiring routers to disclose the
network state (e.g., using explicit congestion notification [57]),
by (b) relying on network channels to recover from the non-
congestion-related losses (e.g., link-layer retransmission [58]
Westwood
CRB ABSE
Westwood+ BR
BBE
Reactive
(loss-based)
Reactive
(loss-based with bandwidth estimation)
Fig. 36. Evolutionary graph of TCP variants that enable resistance to random
losses
or TCP packet snooping and loss recovery by intermediate
routers [59]), or by (c) isolating the wireless error-prone and
wired error-safe transmission paths using an intermediate host
[60], [61]. These approaches are beyond the scope of this
survey and have been thoroughly discussed by Lochert et al.
[2].
In this section we focus on solutions that keep the host-
to-host idea and at the same time provide some level of
resistance to non-congestion related packet losses. The band-
width estimation technique proposed by Mascolo et al. as
a part of TCP Westwood [62] laid the foundation for the
sender-side distinguishing between a congestion-related and an
unrelated (random) loss without any support from the network.
Research that follows (Figure 36) identified several of West-
wood’s weaknesses, for example, bandwidth overestimation,
insufficient robustness in networks with extreme levels of
transmission errors, etc. Table IV shows characteristic features
of refinements in Westwood that try to mitigate discovered
problems.
A. TCP Westwood/Westwood+
TCP Westwood proposed by Mascolo et al. [62] keeps the
distributed network-independent ideology of TCP and is a
modification of the NewReno TCP congestion control algo-
rithm. At the same time, it can significantly improve the data
transfer efficiency in error-prone networks (e.g., wireless). To
do so Westwood replaces the blind Reno’s congestion control
actions that are triggered by loss detection (i.e., halving if three
duplicate ACKs are received) with a heuristic-based procedure
of setting the congestion windoww to an optimal value (Faster
Recovery). As an optimum, the heuristic considers a value
which corresponds to a data transfer rate observed in the
recent past (w ≈ rate × RTT). Indeed if there is a random
error due to wireless interference, the optimum would reflect
the best choice for the sender: transmission without any rate
reduction. In another case, if a packet is lost due to congestion
in the network, the data reception rate recently observed by the
receiver is exactly the rate at which the network is capable of
delivering data from the sender (“achieved data rate”). If the
sender continues transmission at a rate equal to that observed
by the receiver, the number of newly transmitted packets will
be equal to the number of delivered packets (router queues
would not be growing), and additional congestion will be
prevented.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 21
TABLE IV
FEATURES OF TCP VARIANTS THAT ENABLE RESISTANCE TO RANDOM LOSSES
TCP Variant Section Year Base Added/Changed Modes or Features Mod
1
Status
Implementation
Linux Sim
TCP Westwood
[62], [63]
V-A 2001 NewReno Estimate of available bandwidth (ACK
granularity), Faster Recovery
S Experimental ns2

TCP Westwood+
[64]
V-A 2004 Westwood Estimate of available bandwidth (RTT
granularity)
S Experimental >2.6.12
TCPW CRB
[65]
V-B 2002 Westwood Available bandwidth estimate (combination of
ACK and long-term granularity), identifying
predominant cause of packet loss
S Experimental ns2

TCPW ABSE
[66]
V-C 2002 CRB Available bandwidth estimate (continuously
varied sampling interval), varied exponential
smoothing coefficient
S Experimental ns2

TCPW BR
[67]
V-D 2003 Westwood Loss type estimation technique (queuing delay
estimation threshold, rate gap threshold),
retransmission of all outstanding data packets,
limiting retransmission timer backoff
S Experimental
TCPW BBE
[68]
V-E 2003 Westwood Effective bottleneck buffer capacity estimation,
reduction coefficient adaptation, congestion
window boosting
S Experimental
1
TCP specification modification: S = the sender reactions, R = the receiver reactions, P = the protocol specification

optional or available in patch form
Forward path
Return path
A
C
K
A
C
K
A
C
K
Data Data Data BW=Data/
BWACKed data/
Fig. 37. Rationale for the available bandwidth estimation technique
Having this win-win situation for all packet loss cases,
the only question is how the sender can discover the rate
observed by the receiver. As a direct solution we can ask the
receiver to send special rate notifications. However, from the
deployment point of view, this is extremely hard. The proposed
[62] and later patented [63] solution is to perform a sender-
side estimation of the actual delivery rate based on an existing
notification mechanism (i.e., using ACK packets).
To illustrate the rationale behind this estimate, let us con-
sider the following example (Figure 37). If we assume that an
ACK packet is generated right after a data packet is received
and that ACKs are evenly delayed in the return path, the ACK
rate observed by the sender will be equal to the data delivery
rate observed by the receiver. To calculate the forward-path
bandwidth actually utilized, we just need to multiply the ACK
rate by the amount of acknowledged data. The bandwidth
calculation holds in the long term even if some ACKs are
lost or delayed by the receiver; i.e., a decrease in the ACK
rate will be compensated by an increase in the acknowledged
data amount.
To mitigate fluctuations, Westwood has a two-level band-
width estimate processing capability. On the first level, the
instantaneous estimate is calculated upon reception of an ACK
packet (b = d/Δ, where d is the amount of acknowledged data
by the ACK and Δ is the time elapsed since the last ACK
received). On the second level, the calculated instantaneous
values are averaged with a special discrete time filter [62]:
B = α(Δ) · B
−1
+ [1 −α(Δ)] ·

b +b
−1
2

where α(Δ) is the averaging coefficient, as a function of Δ;
b and b
−1
are current and previous samples of the bandwidth
estimate; and B
−1
is the previously calculated average value
of the estimate.
Although set-up experiments have shown a good level of
precision for Westwood’s estimate, practice has discovered
that the calculation may be substantially wrong in certain
network conditions [64], [66]. For example, in the presence of
the ACK compression effect [69], when ACKs are differently
delayed and grouped due to congestion over the reverse
path, discrete averaging of instantaneous bandwidth estimation
samples leads to substantial overestimation. For that reason,
in the revised Westwood+ algorithm [64] the estimate has
been changed so that it is calculated with RTT granularity;
i.e., in the formula b = d/Δ, d is now the amount of
acknowledged data during the last RTT and Δ is the RTT
itself. This estimate of average bandwidth during the last RTT
is defined to be further averaged in long-term using the well-
known exponential smoothing technique, with a smoothing
factor α = 0.9:
B = α · B
−1
+ (1 −α) · b
Although it has been asserted the Westwood algorithm
shows good fairness properties, this is not entirely straight-
forward from a theoretical point of view. Presence of intra-
fairness property (i.e., fairness between TCP flows running
the Westwood algorithm) can be shown using the diagram in
Figure 38. After two flows start competing from any state
x
0
, they increase their congestion window (i.e., share of
network resources) evenly, until a network limit is reached.
It can be shown that if two Westwood flows simultaneously
detect a congestion event and reduce their congestion windows
w based on the achieved rate estimate (w = B × RTT),
the ratio between flows’ congestion windows would remain
intact. During the consecutive Congestion Avoidance phase,
22 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Network Share
E
q
u
a
l

(
f
a
i
r
)

s
h
a
r
e
N
e
t
w
o
r
k

S
h
a
r
e
Network Limit
Packet losses
x
0
Zero Buffering
x
1
x
n
x
n+1
Fig. 38. Convergence diagram when two Westwood flows are competing
with each other
the ratio would slowly increase (e.g., if upon loss detection
the congestion window sizes of the two flows have the ratio
1:10, then after ten steps of linear increase—in ten RTTs—a
new ratio would be 11:20). Clearly, in a finite number of steps,
the ratio between congestion window sizes will become very
close to one.
Unfortunately, inter-fairness or fairness between Westwood
and legacy Reno-type flows is not quite definite. In an
idealized case, when a Westwood flow knows the exact
amount of utilized network resources, a Reno flow will be
suppressed. This will happen, in theory, because Reno always
halves its network share while Westwood sets it depending
on the estimate value. However, in practice, due to various
random processes in the network and an imprecise band-
width estimation technique (ACKs may be delayed or lost),
Westwood/Westwood+ flows can compete successfully and
relatively fairly with Reno-type flows.
B. TCPW CRB
Wang et al. [65] acknowledged the critical vulnerability of
Westwood: under certain network conditions the bandwidth
estimation (BE) technique gives highly inaccurate results.
As a solution to this problem they proposed TCPW CRB
(Westwood with Combined Rate and Bandwidth estimation),
which refines the estimation algorithm by complementing it
with a conservative long-term bandwidth calculation (“rate
estimation” RE) technique. It is similar to one from the West-
wood+ proposal, but the sampling period is some predefined
constant T, instead of a measured RTT.
Experimental results show that the long-term estimate pre-
vents overestimation if a network is experiencing congestion.
At the same time, it is likely to underestimate bandwidth in
the presence of random errors. To tackle both underestimation
and overestimation problems simultaneously, CRB maintains
two estimates, an old and a new. Upon detecting a packet
loss, CRB chooses one of the estimates depending on the
assumed predominant loss type: the old estimate for random
loss (i.e., new value of the congestion window is calculated
as BE ×RTT
min
) and the new one for congestion loss (i.e.,
the congestion window is set to RE ×RTT
min
).
A primary cause for loss is assumed to be congestion
when the long-term bandwidth estimate RE shows a high
level of imprecision, which is determined by comparing the
ratio between the current congestion window size and relation
RE × RTT
min
to a predefined threshold θ. If this ratio is
lower than the threshold θ (e.g., θ = 1.4), a congestion event
is assumed; otherwise, CRB thinks that loss is not related to
congestion.
As long as CRB does not conceptually change Westwood
policies upon detecting a loss (i.e., Faster Recovery), intra-
fairness characteristics remain unchanged. CRB authors claim
that the dual bandwidth estimate (BE and RE) improves
Westwood fairness to legacy Reno/NewReno flows. However,
this has been confirmed only through a number of NS2
simulations and the authors agree that future investigation is
required to evaluate CRB in wide-range network scenarios that
include real Internet experiments.
C. TCPW ABSE
As an extension of CRB (Section V-B), Wang et al. [66]
proposed TCPW ABSE (Westwood with Adaptive Band-
width Share Estimation). ABSE leverages the idea of dual
bandwidth estimation by introducing a bandwidth sampling
interval adaptation mechanism. In other words, instead of two
predetermined sample intervals for CRB (ACK inter-arrival
and a long predefined constant period), ABSE continuously
changes the interval depending on an estimated network state.
The network state estimation heuristic is adapted to directly
control the length of a sampling interval Δ in slightly changed
form compared to CRB:
Δ = max

Δ
min
,
RTT · (V E −RE)
V E

where Δ
min
is a predefined minimal sampling interval,
V E is a Vegas-type estimation of expected rate (V E =
cwnd/RTT
min
, see Section II-G), and RE is an exponentially
averaged bandwidth estimate with a sampling interval equal to
the RTT, similar to Westwood+ (Section V-A). If the current
value of RE is significantly smaller than predicted by the
Vegas-like estimation V E, the network is likely to be in severe
congestion. Thus, similar to CRB, a long sampling interval
will be calculated (i.e., Δ is close to RTT when RE → 0).
In the opposite case, when V E and RE are close (i.e., when
a number of lost packets are close to or equal zero and when
RTT is close to the minimal value), the minimal sampling
interval will be used. Clearly, these observations of the border
cases comply with the definition of the CRB heuristic. It
is claimed that smooth adaptation of the sampling interval
improves estimation precision in transition periods.
In addition to the adaptive calculation of a sampling interval,
ABSE also defines a varied exponential smoothing coefficient
for averaging bandwidth estimation samples. The basic idea
is to make the averaging sharper if availability of network re-
sources is changing very dynamically (the new sample should
have a bigger impact on the averaged value), and smoother
otherwise. The level of dynamics is calculated through a
bandwidth estimate jitter. Through NS2 simulations, ABSE’s
authors have confirmed that the varied smoothing coefficient
is able to help achieve a fast response to changes, and at the
same time provide resistance to the noise.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 23
Similar to CRB, ABSE does not change Westwood’s con-
cept of Faster Recovery, and thus has similar inter-fairness
properties. In addition, NS2 simulations showed very good
characteristics of ABSE fairness to legacy NewReno flows.
However, real-world experiments are required to confirm sim-
ulation results.
D. TCPW BR
Though the Westwood approach (Sections V-A through
V-C) can significantly improve the effective TCP throughput
in the presence of non-congestion related packet losses, Yang
et al. [67] discovered that it cannot effectively handle volumes
of random errors (> 2%). A newly proposed TCPW BR
(Westwood with Bulk Repeat) algorithm is also based on
Westwood, but additionally integrates a special loss-type de-
tection mechanism. Upon each loss detection, if it is estimated
to be non-congestion related, BR applies very aggressive
(highly optimistic) recovery policies instead of the original
ones. The proposed loss type estimation mechanism in BR is
a compound of two loss-detection algorithms [70]: the queuing
delay estimation threshold and rate gap threshold algorithms.
The queuing delay estimation threshold (QDET) algorithm
is similar to Spike [71] and is based on the DUAL concept
of measuring the queuing delay (Section II-B). The main
difference is that QDET maintains two thresholds T
start
and
T
end
. T
start
represents a condition for entering the state
when all losses are assumed to be caused by congestion
(T
start
= α · Q
max
). T
end
is a condition for returning to
the default state when non-congestion loss type is assumed
(T
start
= β · Q
max
). The threshold coefficients α and β can
be, for example, 0.4 and 0.05 respectively.
The rate gap threshold algorithm is based on comparing a
Westwood’s bandwidth estimate (BE) to a fraction α of the
expected throughput (V E). The latter is calculated in a manner
similar to Vegas (Section II-G): V E = cwnd/RTT
min
. If
Westwood’s estimate is less than the predefined fraction of
expected throughput, loss is assumed to be due to a congestion
event; otherwise, non-congestion related loss type is assumed.
The rationale behind this comparison is that when there is
no congestion, even in the face of a substantial number of
packet losses, the data throughput is still relatively close to
the expected (i.e., RTT is close to RTT
min
and the amount
of delivered data packets during last the RTT is close to cwnd).
Yand et al. [67] claimed that utilizing two independent loss-
type estimation mechanisms increases estimation precision
and reduces the number of false positives. This is especially
crucial because of BR’s policies when detecting a non-
congestion loss. If this is the case, BR immediately retransmits
all data packets that have been transmitted and have not yet
been acknowledged (outstanding data packets), and does not
modify the congestion window size. In some environments
these policies can be extremely helpful in the case of real
non-congestion losses, and much more effective than TCP
SACK/FACK (Section II-E, II-F) policies due to their internal
limitations (i.e., one SACK can indicate no more than four
blocks of lost packets).
In addition, BR changes the retransmission timer backoff
algorithm (see Section II-A) by limiting a maximum timer
value during non-congestion packet losses with a predefined
constant. This decision improves the recovery time in envi-
ronments where the probability of loss is extremely high.
E. TCPW BBE
Shimonishi et al. [68] showed that flows running the West-
wood, Westwood+ (Section V-A), or ABSE (Section V-C)
congestion control algorithms can be highly unfair to standard
TCP flows if the network has limited buffering capabilities. To
resolve this problem, they introduced BBE, a Bottleneck and
Buffer Estimation algorithm that refines the Westwood policy
of reducing the congestion window size w upon detecting
a loss. More specifically, BBE complements the congestion
window reduction policy with an additional variable coef-
ficient μ ≤ 1: w = RTT
min
× B × μ, where B is the
Westwood estimate of a recent achievable rate. This coefficient
borrows DUAL’s concept (Section II-B) of sensing the current
network state: when the current queuing delay Q is close to a
maximum Q
max
, the network is considered to be experiencing
congestion and μ should be 1/2 (i.e., same as the standard
TCP during congestion); otherwise, the network is congestion-
free, and μ can be 1. The actual proposed calculation of
the coefficient μ is defined as μ = Q
max
/(Q + Q
max
)
where Q
max
is not just the maximum queuing delay observed
during the connection lifetime, but the exponentially smoothed
queuing delay samples obtained just before each loss detection
event. This technique allows BBE to adapt easily to network
changes.
Additionally, BBE recognizes the overestimation problem in
the original Westwood algorithm and proposes a hybrid esti-
mation technique. In particular, BBE calculates the achievable
rate as a weighted sum of two estimates: the variable sampling
rate B
v
(e.g., ACK rate, as in original Westwood) and a
constant sampling rate B
c
(e.g., 1/RTT as in Westwood+).
The weighting is as follows: B = γ · B
v
+ (1 − γ) · B
c
.
The coefficient γ, similar to the above-mentioned additional
reduction coefficient μ, relies on the queuing delay concept:
γ = 1/e
α×RTT/RTTmax
where α is some large positive
constant. As we can see, the estimate calculation follows a
general observation: the constant rate sampling is more precise
when the network is experiencing congestion (i.e., B
c
has
more weight when RTT is close to RTT
max
), and the variable
rate is more precise when the network is congestion-free (i.e.,
B
v
has more weight when RTT is far from RTT
max
).
The simulation results provided by BBE’s authors have
shown quite a good fairness to the legacy NewReno flows,
along with a good data transfer performance comparable to
the original Westwood algorithm. However, without further
theoretical and practical investigations it cannot be claimed
that BBE provides a universal solution for congestion control
in wired-wireless networks. Moreover, appearance of the high-
speed networks (both wired and wireless) has opened a
number of other problems (see Section VI) which outweigh
the issue of fairness to NewReno, under a wide variety of
network conditions.
VI. HIGH-SPEED/LONG-DELAY NETWORKS
The emergence of high-speed networks uncovered the in-
ability of deployed TCP variants (Reno, NewReno, SACK,
24 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
HS-TCP
STCP
FAST
BIC
H-TCP
Hybla
Africa
Compound
Libra
NewVegas
Illinois
YeAH
CUBIC
ARENO
Fusion
Reactive
(loss-based)
Proactive
(delay-based)
Reactive
(loss-based with
bandwidth estimation)
TCPW-A
LogWestwood+
Fig. 39. Evolutionary graph of TCP variants aimed at improving efficiency
in high-speed or long-delay networks
etc.) to use the resources of these networks effectively. All
of the congestion control algorithms discussed in Sections II
through V improve different aspects of efficiency for data
transfer, without questioning the basic principle on which
it rests, which was defined as far back as 1988 as part of
Tahoe (Section II-A): network resource discovery during the
congestion avoidance phase should be highly conservative. In
TCP implementations, this principle was generally realized
with a congestion window (cwnd) increase by one packet for
each RTT if no errors were detected. This works quite well if
network capacity or round-trip delays are relatively small, but
does not work well otherwise.
To illustrate the problem—sometimes referred to as the
bandwidth-delay product problem (BDP problem)—let us con-
sider a TCP flow trying to discover all the resources of
some network channel. The minimum time required for this,
assuming there are no packet losses, is on the order of the
channel bandwidth delay product (BDP). More precisely, to
get to a theoretical upper bound of a TCP data transfer rate
(D × cwnd/RTT, where D is a maximum data packet),
Reno/NewReno flow needs about D × cwnd RTTs, because
cwnd increases by one every RTT. In a network having
10 GiBit/s capacity, 100 ms round-trip delay, and a maximum
data packet size of 1500 bytes, it would take almost two hours
[72], [73]. Moreover, all the packets must be delivered during
these two hours, and that is equivalent to an unrealistic packet
loss probability.
In the remaining part of this section we discuss various
solutions (Table V) that address several congestion control
problems. Although these solutions rest on different assump-
tions and approaches (see Figure 39), they have the same
objective: to create an ideal algorithm for high-speed (e.g.,
optical) or large delay (e.g., satellite) links. The algorithm
should simultaneously (a) provide for efficient use of network
resources, (b) respond quickly to network changes, and (c) be
fair to other flows present in the network. The latter is
divided generally into the three categories: (1) intra-fairness—
≈83k (10Gbps if RTT=100ms)
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
,

p
k
t
s
10
10
2
10
3
10
4
10
-10
10
-7
10
-3
HS-TCP
Reno
Fig. 40. Objective of HS-TCP
characteristic of resource distribution between flows running
the same congestion control algorithm in the same network
environment; (2) inter-fairness—characteristic of distribution
between flows running different algorithms in the same en-
vironments; and (3) RTT-fairness—characteristic of resource
distribution between flows sharing the same bottleneck link
but having different RTTs.
A. HS TCP
After recognizing TCP’s efficiency problem in high-speed
networks, Floyd [72], [74] proposed the HS-TCP (High-
Speed TCP) algorithm. This is an experimental congestion
control method that has several objectives. Among them
are (a) efficiency in high bandwidth-delay product (BDP)
networks, without relying on unrealistically low loss rates; and
(b) fairness to standard TCP in high loss rate environments.
For this purpose HS-TCP replaces the standard NewReno
increase coefficient α in Congestion Avoidance and decrease
factor β after a minor loss detection (during the Fast Recovery
phase) by functions of the congestion window size (α(w) and
β(w), respectively).
These functions α(w) and β(w) are obtained based on the
above-mentioned objectives defined in terms of the achievable
congestion window size and the required loss rate (bold curve
in Figure 40). That is, on the one hand, HS-TCP should be
able to utilize the 10 Gbps link for the network with a loss rate
not exceeding 10
−7
(NewReno is unable to utilize this link if
the loss rate exceeds 10
−10
). On the other hand, it should
act like a standard NewReno [19] in environments with loss
probability higher than 10
−3
.
The resulting functions α(w) and β(w) vary from 1 and
0.5, respectively, when the congestion window is less than or
equal to 38 packets (i.e., it has same behavior as NewReno
when the congestion window is small) to (and beyond) 70 and
0.1 when the congestion window is more than 84k packets.
Figure 41 shows the schematic comparison between HS-TCP
and NewReno behavior during Congestion Avoidance/Fast
Recovery phases. As we can see, at high congestion window
sizes (or low loss rates) HS-TCP probes the network resources
more aggressively than Reno and, at the same time, reacts
more conservatively to loss detection events. This behavior
considerably increases the efficiency of high-speed/long-delay
networks. However, the tradeoff is an increased level of packet
losses: more packets are lost during congestion events, which
occur more frequently.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 25
TABLE V
FEATURES OF TCP VARIANTS AIMED AT IMPROVING EFFICIENCY IN HIGH-SPEED OR LONG-DELAY NETWORKS
TCP Variant Section Year Base Added/Changed Modes or Features Mod
1
Status
Implementation
Win
2
Linux Sim
3
HS-TCP
[72], [74]
VI-A 2003 NewReno Additive increase steps and multiplicative
decrease factors as functions of the congestion
window size, Limited Slow-Start
S Experimental >2.6.13 ns2
STCP
[73]
VI-B 2003 NewReno Multiplicative Increase Multiplicative Decrease
congestion avoidance policy
S Experimental >2.6.13
H-TCP
[75], [76]
VI-C 2004 NewReno Congestion window increase steps as a
function of time elapsed since the last packet
loss detection, scaling increase step to a
reference RTT, multiplicative decrease
coefficient adaptation
S Experimental >2.6.13
TCP Hybla
[77]
VI-D 2004 NewReno Scaling the increase steps in Slow-Start and
Congestion Avoidance to the reference RTT,
data packet pacing, initial slow-start threshold
estimation
S Experimental >2.6.13
BIC TCP
[78]
VI-E 2004 HS-TCP Binary congestion window search, Limited
Slow-Start
S Experimental >2.6.12 ns2.6

TCPW-A
[79]
VI-F 2005 Westwood Agile probing, persistent non-congestion
detection
S Experimental ns2

LogWestwood+
[80]
VI-G 2008 Westwood+ Logarithmic congestion window increase S Experimental ns2

TCP Cubic
[81]
VI-H 2008 BIC The congestion window control as a cubic
function of time elapsed since a last congestion
event
S Experimental >2.6.16 ns2.6

FAST TCP
[82]–[84]
VI-I 2003 Vegas Constant-rate congestion window
equation-based update
S Experimental ns2.29

TCP Libra
[85]
VI-J 2005 NewReno Adaptation of the packet pairs to estimate the
bottleneck link capacity, scale the congestion
window increase step by the bottleneck link
capacity and queuing delay
S Experimental ns2
TCP NewVegas
[32]
VI-K 2005 Vegas Rapid window convergence, packet pacing,
packet pairing
S Experimental
TCP AR
[86]
VI-L 2005 Westwood,
Vegas
Congestion window increase steps as a
function of the achievable rate and queuing
delay estimates
S Experimental
TCP Fusion
[87]
VI-M 2007 Westwood,
Vegas
Congestion window increase steps as a
function of the achievable rate and queuing
delay estimates
S Experimental
TCP Africa
[88]
VI-N 2005 HS-TCP,
Vegas
Switching between fast (HS-TCP) and slow
(NewReno) mode depending on the Vegas-type
network state estimation
S Experimental ns2
Compound TCP
[89]
VI-O 2005 HS-TCP,
Vegas
Two components (slow and scalable) in the
congestion window calculation
S Experimental Vista,
S’08,
XP

,
S’03

2.6.14–
2.6.25

TCP Illinois
[90]
VI-P 2006 NewReno,
DUAL
Additive increase steps and multiplicative
decrease factors as functions of the queuing
delay
S Experimental >2.6.22
YeAH TCP
[91]
VI-Q 2007 STCP, Ve-
gas
Switching between fast (STCP) and slow
(NewReno) mode depending on a combined
Vegas-type and DUAL-type estimate,
precautionary decongestion
S Experimental >2.6.22
1
TCP specification modification: S = the sender reactions, R = the receiver reactions, P = the protocol specification
2
Microsoft
TM
operating systems: S for server versions
3
Network simulators

optional or available in patch form
The high-speed/long-delay networks create one more prob-
lem for TCP. During the initial Slow Start phase when an
approximate network limit is still unknown, the unbounded
exponential probing (see Tahoe in Section II-A) can lead to a
loss of extremely large numbers of packets. For example, in
a 10 Gbps link with 100 ms RTT, Slow Start (in the worst
case) can cause a loss of about 83,000 packets, which is
approximately 120 MBytes of wasted network resources. To
resolve this problem, Floyd [92] proposed a complementary
algorithm that bounds the maximum increase step during Slow
Start to 100 packets (Limited Slow Start). It is expected
that this limitation will not have a significant impact on
performance. There are two reasons why this is so. First, Slow
Start operates only during initialization, or re-initialization
after a timeout. In other words, it is insignificant for long-
lived flow performance. Second, it takes about 8 seconds to
fully utilize a 1 Gbps link with 100 ms RTT, which is assumed
to be a reasonable payoff for a significant reduction of induced
packet losses.
Another important question for a new congestion control
algorithm is how flows that utilize it interact with each
other (intra-fairness) and with other flows (inter-fairness),
including standard TCP flows. By definition, during severe
congestion situations (when the loss probability is high) HS-
TCP is equivalent to standard Reno and thus inherits all its
characteristics. In high-speed/long-delay networks, HS-TCP
26 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
Standard Reno/NewReno
HS-TCP
improvement
Fig. 41. Congestion window dynamics of HS-TCP
explicitly does not consider fairness to standard TCP flows
to be significantly important, because standard flows cannot
effectively utilize the available network resources. However,
intra-fairness property is highly important. Fortunately, it can
be shown that because HS-TCP does not change the core
additive increase multiplicative decrease (AIMD) concept of
NewReno—namely, that during Congestion Avoidance the
window is increased by a constant number of packets each
RTT and decreased during Fast Recovery by a fraction of
itself (Figure 11)—NewReno’s intra-fairness properties are
preserved. However, HS-TCP has substantial problems with
fairness if flows have different RTTs. Although this problem
is inherited from Reno [78], subsequent research discovered
that AIMD coefficient scaling (functions instead of constants)
significantly intensifies this problem. A number of congestion
control algorithms discussed later in this survey (Hybla in
Section VI-D, H-TCP in Section VI-C, FAST in Section VI-I,
CUBIC in Section VI-H) address this problem and, at the same
time, preserve data transfer effectiveness in high-speed/long-
delay networks.
B. STCP
Kelly [73] proposed STCP (Scalable TCP) as an alternative
to HS-TCP (Section VI-A) to solve the data transfer effective-
ness problem in high-speed/long-delay networks. Instead of
complicated AIMD coefficient calculations, STCP rejects the
core AIMD concept and introduces a multiplicative increase
multiplicative decrease idea (MIMD). In other words, during
Congestion Avoidance an STCP flow increases its congestion
window w by a fraction α of the window size with each RTT
(i.e., w = w+α×w, where α = 0.01). During Fast Recovery,
it reduces the congestion window by a different fraction β
upon detecting a loss (i.e., w = w−β×w, where β = 0.125).
A hypothetical congestion window dynamic of STCP looks
similar to that of HS-TCP, but with increased frequency and
sharpness of increase/decrease phases (Figure 42).
Clearly, the proposed modifications resolve the target prob-
lem by making the increase/decrease dynamics follow expo-
nential functions, which scale quite well in many environ-
ments. However, the solution creates a number of critical
problems. First, from Figure 42 we can easily recognize that
even one STCP flow moves the network to a state of nearly
constant congestion. This is generally undesirable for most
networks. Second, inter-fairness characteristics (i.e., fairness
to standard flows) are similar to HS-TCP: in the low-loss
rate zone (<10
−3
), STCP does not even try to be inter-fair,
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
Standard Reno/NewReno
STCP
improvement
Fig. 42. Congestion window dynamics of STCP
Network Share (STCP)
E
q
u
a
l

(
f
a
i
r
)

s
h
a
r
e
N
e
t
w
o
r
k

S
h
a
r
e

(
S
T
C
P
)
Network Limit
x
0
x
1
x
2
Packet
losses
Fig. 43. Convergence diagram of the two STCP flow competition
x0 − x1, x2 − x1 multiplicative increase (a flow with the larger
congestion window increases more than a flow with the smaller)
x1 − x2 multiplicative decrease (a flow with the larger congestion
window decreases more than a flow with the smaller)
assuming that standard TCP flows cannot effectively utilize
network resources; in the high-loss zone, STCP behaves like
standard TCP. Third, the MIMD approach does not con-
ceptually provide intra-fairness (i.e., fairness between STCP
flows). That is, under the assumption that two STCP flows are
experiencing the same RTTs and are able to detect a packet
loss simultaneously, the flow with the larger initial share will
always have an advantage (Figure 43). This happens because
multiplicative increase and multiplicative decrease policies
essentially preserve a ratio between congestion window sizes
of the flows. Finally, it can be shown that due to MIMD
policies, an STCP flow is extremely unfair, both to STCP and
to standard TCP flows that have higher RTT values [78].
C. H-TCP
Leith and Shorten [75], [76] presented one more alternative
to congestion control for TCP, called H-TCP (Hamilton TCP),
which is intended to have good fairness (inter-, intra-, RTT-)
and effectiveness properties. The key idea of their proposal
is that the congestion window increase step α in Congestion
Avoidance should be a non-decreasing function of the time
elapsed since the last congestion event (Δ). In one sense,
this is similar to HS-TCP (Section VI-A) where the network
resource probing steps (i.e., the congestion window increase
per RTT) grow as the congestion window itself is growing.
However, the functional dependence of the elapsed time Δ
has one significant advantage over the dependence of the
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 27
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time elapsed from the last congestion event
Flow with RTT1
Flow with
RTT
2
=2·RTT
1
- calculation of the target congestion window value
T
1 T
2
Fig. 44. Rationale of H-TCP’s RTT-unfairness
congestion window: namely, that no matter how large the
initial congestion window sizes have been, flows experiencing
the same network conditions will exhibit the same congestion
window increase dynamics. In other words, an H-TCP flow is
fair to other H-TCP flows present in the same network path.
To demonstrate that this holds, one can build a convergence
diagram of the two competing H-TCP flows and observe that it
looks similar to Figure 11, assuming that after a loss detection,
both flows decrease their congestion windows by half.
More specifically, H-TCP defines the increase in the conges-
tion window w as α(Δ) for each RTT (equivalent to increase
as a fraction α(Δ)/w for each reception of non-duplicate
ACK, where w is the current congestion window size. α(Δ)
is the polynomial function over time Δ elapsed since the last
congestion event, as follows:
α(Δ) = 1 + 10(Δ−Δ
low
) + 0.5 · (Δ−Δ
low
)
2
where Δ
low
is a predefined threshold of H-TCP’s compatibil-
ity mode—i.e., whenever Δ < Δ
low
, α(Δ) = 1.
It can be noted that this definition of α(Δ) still leads to
some degree of RTT-unfairness. For example, let us consider
two H-TCP flows competing with each other and having
different RTT values (Figure 44). If we assume that α(Δ)
is calculated once per RTT at time 0, T
1
, and T
2
(note, with
this assumption we do not change the H-TCP principle, but it
allows us to highlight the problem), we see that a flow having
a longer RTT always loses to a flow with a shorter RTT. To
mitigate this effect, H-TCP defines an optional mechanism of
scaling the α(Δ) to a reference RTT (RTT
ref
) which, as an
example, can be 100 ms: α

(Δ) = α(Δ) ×RTT/RTT
ref
.
In addition to these changes in the Congestion Avoidance
phase, the H-TCP proposal includes a small modification of
the congestion window reduction policy in Fast Recovery.
More specifically, upon detecting a packet loss, H-TCP es-
timates the achieved flow’s throughput B(k) and compares it
with the estimate of the preceding loss event B(k −1). If the
absolute value of the relation [B(k) − B(k − 1)]/B(k − 1)
is less than 0.2, the congestion window is reduced by the
ratio RTT
min
/RTT
max
; otherwise, the coefficient 0.5 is
used. However, later in the Internet-Draft proposal, Leith [76]
removed the Fast Recovery modification from H-TCP.
D. TCP Hybla
Caini and Firrincieli [77] emphasized the problem of degra-
dation of TCP throughput with standard congestion control—
NewReno—in long-delay networks. It can be shown that in
0
32
96
160
186
96
48
224
100 0 0 5 0 0 4 0 0 3 0 0 2 0
Time, ms
C
o
n
g
e
s
t
i
o
n

W
i
n
d
o
w
,

K
i
B
y
t
e
s
RTT = 100 ms
RTT = 50 ms
RTT = 25 ms (reference RTT)
Fig. 45. Congestion window evolution in Hybla
NewReno’s Congestion Avoidance, the congestion window
size w is inversely dependent on RTT, and the the TCP
throughput B has an upper bound that is inversely depen-
dant on RTT
2
(the throughput can be approximated by the
expression B

= w/RTT). Clearly, a flow with a shorter
RTT will always have an advantage compared to a flow with
a longer RTT. In heterogeneous networks, especially with
satellite segments, the RTTs may be different by several orders
of magnitude, potentially resulting in catastrophic unfairness
in the network resource distribution.
To resolve this RTT-unfairness problem, a Hybla algo-
rithm has been proposed [77]. This algorithm introduces
modifications to the NewReno’s Slow Start and Congestion
Avoidance phases that make them semi-independent of RTT.
In particular, to obtain the normalized increase steps in both
phases, the scaling factor ρ is calculated according to the
equation ρ = RTT/RTT
ref
, where RTT
ref
is a reference
RTT (e.g., 25 ms). Formally, the increase steps upon receiving
ACK packet are defined as follows:
w = w + 2
ρ
−1, in Slow Start
w = w +ρ
2
/w, in Congestion Avoidance
This definition is illustrated in Figure 45, where three flows
having different RTT values are presented. The higher the
RTT value, the higher the ratio ρ becomes and the congestion
window is increased more rapidly with each ACK packet re-
ception. As a consequence, during the same time period, flows
will result in different congestion window values. However, if
we calculate the upper bound of the TCP throughput (i.e.,
the ratio between the congestion window and the RTT), we
will see that all three flows can transmit data at similar rates
(≈ 2 MByte/s after 500 ms).
In addition, Hybla introduces two more techniques that
complement the congestion control: pacing the transmission of
data packets [93] and estimating the initial slow-start threshold
using the packet pair algorithm [85]. The pacing is essentially
setting up a minimal delay between transmission of any two
consecutive packets. It is meant to smooth the burst-nature
of TCP transmissions. The packet pair algorithm provides an
ability to estimate the network path capacity. Knowledge of
the network capacity may help us improve the convergence
speed and, to some degree, provides a scalability in high-BDP
networks.
28 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Network limit
w
max
w
min
Fig. 46. Binary search for the optimal congestion window in TCP BIC
A number of experimental evaluations [77] have confirmed
remarkable RTT-friendliness of the Hybla algorithm. However,
the cost of this friendliness is an increased aggressiveness of
the flows with larger RTT values. At the same time, these flows
have a slower feedback rate—a packet loss can be detected
no earlier than delivery of a packet can be confirmed (i.e.,
feedback rate is proportional to 1/RTT). Thus, more aggressive
flows can easily congest the network before they detect any
packet loss. To some extent, pacing technique soften, but
cannot eliminate this problem. Additionally, Hybla is designed
to fall back to the standard mode (to Reno-like congestion
control rules) if a flow’s RTT is less than a predefined
reference value. This property limits applicability of Hybla to
satellite-like channels: Hybla, similar to the standard Reno,
is unable to work effectively in high-speed networks with
relatively small delays.
E. BIC TCP
Xu et al. [78] pointed out the RTT unfairness problem
of HS-TCP (Section VI-A) and STCP (Section VI-B). For
example, if we assume that two competing flows can detect
a loss simultaneously (a synchronized loss detection), the
analytical calculations reveal that an HS-TCP flow having
an RTT x times smaller will get a network share which is
x
4.56
times larger. Similar calculations for STCP show that,
in theory, the STCP flow with the smaller RTT will always
get all of the network resources, and a flow with the higher
RTT will get nothing (i.e., absolute unfairness). The problem
in both cases lies in the way these algorithms discover network
resources: a flow with a larger congestion window will try to
increase its share more than a flow with a smaller window.
In an attempt to create a congestion control that can scale
well in any high-BDP (high bandwidth-delay product) network
and yet to remain relatively RTT-fair, Xu et al. [78] proposed
a BIC (Binary Increase Congestion control) algorithm. This
algorithm extends NewReno with an additional operational
phase, Rapid Convergence. This phase rapidly discovers,
in a binary search manner, the optimal congestion window
size (i.e., the value corresponding to the available network
resources) by relying on detection of a packet loss as an
indication of congestion window overshooting. Schematically,
the congestion control concept of BIC as a search problem
is illustrated in Figure 46. While the network successfully
delivers data packets (i.e., the sender receives all ACKs during
the last RTT), the congestion window is updated to the median
of the search range between minimum w
min
and maximum
w
max
congestion window sizes (initially, w
min
is set to one
and w
max
to some arbitrary high value). Besides updating the
Time
Network limit
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Loss detection Binary Search
Limited Slow Start
Fig. 47. Congestion window dynamics in TCP BIC
congestion window, an indication of successful data delivery
raises the lower boundary w
min
to the previous congestion
window size—the value when the network is expected to
be congestion-free. As soon as packet loss is detected (e.g.,
three duplicate ACKs are received), BIC sets the upper search
boundary w
max
to the current congestion window size—
the value when the network is experiencing congestion—
and enters the well-known Fast Recovery phase, similar to
NewReno (see Section II-D). Additionally, to increase the
convergence rate in the low-loss network environments, BIC
reduces the multiplicative decrease coefficient from 0.5 to
0.125 (i.e., w = w −0.125 · w) when the congestion window
size is more than 38. This number is borrowed from HS-
TCP and is aimed at providing compatibility to Reno in
environments with loss rates exceeding 10
−3
.
Though a true binary search algorithm features a very fast
(logarithmic) convergence time, in a high-BDP network it
may create the same problem that was discovered in Slow-
Start: if the congestion window is increased too fast, a large
number of packets can be lost (see Section VI-A). For this
reason, BIC not only adopts HS-TCP’s Limited Slow Start,
but it also limits the increase in Rapid Convergence when the
search range is too wide. In other words, during the RTT,
Rapid Convergence is not allowed to increase the congestion
window by more than some predefined value S
max
. To address
the opposite case when the search range is too narrow—near
the estimated optimum—BIC defines the congestion window
increase by at least some constant S
min
number of packets.
Finally, when the current congestion window value during
Rapid Convergence becomes very close to, or exceeds, the
target congestion window value, BIC enables the Limited Slow
Start phase with an unlimited slow-start threshold value. This
action is to discover a new upper bound and restart the binary
search (Figure 47).
The BIC approach for optimal congestion window discov-
ery has a unique feature for loss-based congestion control
approaches—the congestion window probing steps decrease
as the window approaches a target value. Xu et al. [78]
showed that in a synchronized loss model, the congestion
window ratio of two flows with different RTTs (RTT-fairness)
changes from e
(1/RTT1−1/RTT2)t·ln 2
for small window sizes
to
RTT1
RTT2
for large ones. In other words, in theory BIC is no
less RTT-fair than the standard Reno algorithm. However, a
number of later experimental evaluations [81], [94] showed
that in certain environments BIC may have low RTT-fairness
and inter-fairness values (fairness to other deployed TCP
congestion controls). A revised version of BIC, called CUBIC
[81], is meant to improve these properties and is discussed in
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 29
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
Westwood estimate
(ERE*RTTmin)
Agile probing
Fig. 48. Congestion window dynamics of TCPW-A
Section VI-H.
F. TCPW-A
Wang et al. [79] proposed TCPW-A, a modification to
TCP Westwood (Section V-A) that improves its properties
in high-speed or long-delay networks. More specifically, they
introduced a concept of agile probing, which includes steps of
calculation of the Westwood-like eligible rate estimate (ERE)
and update to the slow start threshold based on this estimate.
These steps simultaneously provide two benefits: reduced net-
work stressing during the Slow Start phase (dynamic adjusting
of the threshold) and rapid discovery of available network
resources during the Congestion Avoidance phase (temporary
return to the Slow Start when the threshold becomes larger
than the congestion window; see Figure 48).
To prevent unnecessary switches to the Slow Start when
network resources are almost fully consumed, TCPW-A intro-
duces an additional technique called persistent non-congestion
detection (PNCD). The main idea of this technique is based
on the assumption that when the network is not congested, a
Vegas-like rate estimate (RE = cwnd/RTT
min
) increases as
the congestion window increases.
In the TCPW-A proposal authors presented several exper-
imental evaluations. Although their results showed consider-
able improvements compared to standard NewReno, they did
not answer the question of how well TCPW-A behaves in
comparison to other high-speed TCP variants discussed in this
section.
G. LogWestwood+
Kliazovich et al. [80] proposed another high-speed ex-
tension of TCP Westwood (Section V-A). Besides the main
objective of being able to effectively utilize resources of
high-speed or long-delay networks, the proposed algorithm,
TCP LogWestwood+, features behavior similar to BIC (Sec-
tion VI-E): congestion window is increased rapidly when the
current value is small, and gently increased when approaching
an estimated maximum. Similar to BIC, for this “maximum”
LogWestwood+ takes a value of the congestion window ob-
served just before the last detection of a packet loss (i.e.,
just before the last reduction). The key difference between
LogWestwood+ and BIC, other than that the first is based on
Westwood+ and the second is based on NewReno, is that the
standard Congestion Avoidance phase (linear increase) is used
instead of the Slow Start phase after reaching the maximum
(Figure 49).
Analytical modeling and ns-2 simulations of the algorithm
behavior allowed the authors to claim that LogWestwood+ is
Time
Network limit
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Loss detection
Westwood estimate
(ERE*RTTmin)
Linear increase Log increase
Fig. 49. Congestion window dynamics of TCP LogWestwood+
much more efficient and RTT-fair than the standard NewReno
algorithm. However, in some scenarios LogWestwood+ may
lose its scalability characteristics. For example, the initial Slow
Start phase may be prematurely terminated (e.g., due to a
burst of temporal congestion or even some random losses),
and later LogWestwood+ will not be able to rapidly discover
a new maximum in high-speed or long-delay networks.
H. CUBIC TCP
Rhee and Xu [81] noted the highly challenging problem
of creating a simple congestion control algorithm that scales
well in high-BDP networks, and at the same time has good
intra-, inter-, and RTT-fairness properties. Some of the previ-
ous congestion control proposals (e.g., HS-TCP, STCP, BIC)
enforce inter-fairness (i.e., fairness to the standard TCP flows)
by switching to standard congestion window update rules in
high-loss environments. The switching criteria in most cases is
a pre-calculated congestion window size w which corresponds
to a certain loss rate p (w = 1.2/

p). For example, the
threshold in HS-TCP is set to 38 packets, which corresponds to
a loss of one out of every 1000 consecutive packets (p = 10
−3
pkt/s). However, this definition of loss rate is not the ideal
guideline, especially in heterogeneous networks where RTT
can vary significantly. For example, in a network with 10 ms
RTT, the loss rate 10
−3
allows one loss every 380 ms, while
in a network with 100 ms RTT, the same rate allows only one
loss every 3.8 seconds. Thus if two flows competing in the
same bottleneck link have different RTTs, the flow with the
larger RTT is likely to remain in a compatible mode all the
time, while the flow with the smaller RTT quickly switches
to a scalable mode and acquires all available resources. This
observation allowed Rhee and Xu [81] to propose CUBIC
congestion control, which enhances the previously introduced
BIC algorithm (Section VI-E) with RTT-independent conges-
tion window growth functions. To accomplish this, CUBIC
borrows the H-TCP approach (Section VI-C) of defining the
congestion window w as a cubic function of elapsed time Δ
since the last congestion event, as follows:
w = C

Δ−
3

β · w
max
/C

3
+w
max
where C is a predefined constant, β is a coefficient of
multiplicative decrease in Fast Recovery, and w
max
is the
congestion window size just before the last registered loss
detection. This function preserves not only RTT-fairness, since
the window growth does not depend much on RTT, but also
scalability and intra-fairness properties of BIC’s Limited Slow
Start and Rapid Convergence phases. The function has a
30 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
Time
Network limit
w
max
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Loss detection
target window
1
2 2
3
1 – right branch of cubic function
2 – left branch of cubic function
3 – left and right branches of cubic function
2
Fig. 50. Congestion window dynamics in CUBIC
very fast growth when the current window w is far from the
estimated target w
max
, and it is very conservative when w
is close to w
max
. Figure 50 shows the theoretical dynamics
of the growth of the congestion window in CUBIC. In the
initial step labeled 1, the target window w
max
is unknown
and is discovered using the right branch of the cubic function.
This discovery is much more conservative than the exponential
discovery used in conventional Slow Start, but it is still
scalable to high-BDP networks. At later stages, following a
reduction upon detecting a loss, the congestion window gently
approaches the target (phase 2). If a loss is detected before
w
max
is reached, the target is updated. If it was a temporary
congestion event, we will see a congestion window growth
according to both left and right branches of the cubic function
(phase 3).
Additionally, CUBIC provides a mechanism to ensure that
its performance is no worse than the performance of the
standard (Reno) congestion control. This mechanism includes
calculating a supplementary congestion window size w
reno
that approximates the performance of a corresponding stan-
dard Reno flow. Because the congestion window in CUBIC
can be reduced by a fraction β different from 0.5 (i.e.,
generally β
cubic
= β
reno
), the appropriate performance (an
average sending rate) can be achieved only if the supple-
mentary congestion window increase steps are scaled with
s = 3 ×(β −1)/(β +1) [81]. Formally, this can be written as
an increase of the window w
reno
by s every RTT. If CUBIC
detects that the supplementary window w
reno
exceeds the
main window, the latter is reset to be equal to the former.
The good performance and fairness properties of CUBIC
were confirmed by various experimental studies [81], [91] and
by real-world measurements. CUBIC is currently the second
most-used congestion control algorithm for TCP, due to the
fact that it has been the default for the Linux TCP suite
since 2006 (i.e., Linux kernel version 2.6.16). Nevertheless,
CUBIC does not have 100% network resource utilization and
can induce a large number of packet losses in the network (as
long as a loss is the only signaling mechanism).
I. FAST TCP
Jin et al. [82]–[84], inspired by the Vegas idea of congestion
control with the queuing delay as a primary congestion indica-
tor (see Section II-G), introduced a FAST algorithm. In some
sense, FAST may be considered a scalable variant of Vegas
that defines a periodic congestion window update based on the
internal delay-based estimate of the network state. However,
there are two fundamental differences between Vegas and
FAST: FAST defines a periodic fixed-rate congestion window
update (e.g., each 20 ms) and, to calculate the new target
congestion window size, FAST uses a specially designed
equation which incorporates a simple delay-based congestion
estimation feature:
w = w ·
RTT
min
RTT

where w is a current congestion window size, RTT and
RTT
min
are current and minimum RTT, and α is an important
protocol parameter, as described below.
According to this equation, if the network is experiencing
congestion (RTT > RTT
min
), FAST will decrease the con-
gestion window (use of the network resources) proportionally
to the congestion level estimated using RTT measurements
(RTT
min
/RTT); otherwise, the window will be increased
based solely on the predefined parameter α. Selection of α has
conflicting effects on two important protocol parameters: scal-
ability and stability. In other words, if α is too large, the proto-
col will scale easily to any high-BDP (high bandwidth-delay
product) networks, but it will have substantial convergence
problems (the stable state when w = w×RTT
min
/RTT +α
will be barely reachable). In the opposite case, when α is too
small, FAST will easily stabilize but will have scalability prob-
lems (e.g., if α = 1, FAST behavior is practically equivalent
to Vegas). Although the problem of accurate selection of α
is still an open issue, FAST’s authors have concluded that α
should be a constant. Attempts at making α vary depending
on the congestion window size, and RTT measurements are
reported to lead to substantial intra- and inter-unfairness [84].
To make the equation-based algorithm tolerant of short-
term fluctuations in network parameters, FAST uses a well-
known technique of exponential smoothing of the calculated
congestion window value. In addition, FAST limits a potential
increase in the congestion window (when α w) to be no
more than a current value, which is roughly equivalent to the
increase in the standard Slow-Start mode. The only difference
is that FAST increases the congestion window based on the
internal timer expiration (e.g., each 20 ms), but Slow-Start is
clocked by ACK packets reception.
Although simulation-based and real-world experiments
show remarkable intra-fairness, RTT fairness, stability, and
scalability, ongoing research [84], [95] recognizes a number
of serious issues with the design. First, FAST’s characteristics
depend highly on the true minimal RTT value, which is hard
to calculate in some environments (e.g., when routes tend to
be dynamic) without relying on additional messaging from the
network. Second, the RTT is not always a good substitute for
the queuing delay, especially when there is congestion along
the reverse path or when there are route changes. Finally, the
proposed congestion window update rule is not friendly to
standard TCP (Reno, NewReno, or SACK), even in small-
BDP networks.
J. TCP Libra
Marfia et al. [85] proposed Libra as another variant of
congestion control to resolve the scalability issues in stan-
dard TCP, while preserving and improving the RTT-fairness
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 31
properties. Libra’s design is based on NewReno (Section II-D)
and modifies the Congestion Avoidance congestion window
increase steps to follow a specially designed function of both
the RTT and the bottleneck link capacity. The latter value in
Libra is estimated using a well-known packet pair technique
[96]. Formally, in Libra’s Congestion Avoidance, if no loss has
occurred, the congestion window is increased by α packets
every RTT, according to the equation:
α = k
1
· C · P · RTT
2
/(RTT +γ)
where RTT is the current RTT estimate, γ and k
1
are
predefined constants (e.g., 1 and 2, respectively), C is a
value responsible for Libra’s scalability and represents the
capacity of the bottleneck link estimated using the packet pair
technique, and P is a penalizing factor which reduces the
increase step if the network experiences congestion (e.g., the
congestion window size is close to the convergence point).
In particular, penalizing factor P can be represented with the
expression based on queuing delay measurements:
P = e
−k2×Q/Qmax
where k
2
is some constant (e.g., 2), and Q and Q
max
are cur-
rent and maximum queuing delay estimates (see Section II-B).
The rationale of the congestion window increase steps α
is as follows: the first part of the functional dependence
k
1
· C makes the increase steps scalable to the bottleneck
link capacity. The penalizing part P forces Libra to decrease
the network resource probing intensity (the congestion win-
dow increase steps) exponentially, if the estimated level of
buffering in the network (Q/Q
max
) increases. The last part,
RTT
2
/(RTT + γ), is responsible for Libra’s RTT-fairness.
If RTT is significantly less than the constant γ, the increase
steps are scaled to RTT
2
—the essential requirement for RTT-
fariness (see Section VI-D). The constant γ is selected in a
way that considers all links with an RTT close to or more than
γ to have some pathological problems when RTT-fairness is
not an issue.
In addition, Libra also defines a change in the multiplicative
decrease policy of the Fast Recovery phase (w = w − β ×
w): the decrease coefficient β scales with the expression
θ/(RTT + γ), where θ and γ are constants. Although this
scaling factor is derived analytically [85], the recommended
values for these constants (θ and γ are equal to 1 second)
make the scaling factor close to 1 in most cases. Thus,
it is not very significant. Moreover, when the current RTT
value is large (e.g., potential congestion in the network), the
scaling factor will reduce β further. Yet this is the opposite
of what congestion control should do: the decrease should be
maximized in the presence of congestion and minimized when
the network is in a congestion-free state.
A number of experimental evaluations (ns2 simulations)
show that Libra can help improve the high-BDP link utilization
and fairness properties of TCP. However, the same results
show that Libra does not always outperform other congestion
control approaches, including the non-scalable Reno with se-
lective ACKs (Section II-E). Additionally, due to high reliance
on the queuing delay estimation (i.e., RTT
min
, RTT
max
, and
RTT measurement consistency), Libra’s properties, similar to
FAST (Section VI-I) and C-TCP (Section VI-O), will further
worsen because of estimation biases.
K. TCP New Vegas
Sing and Soh [32] recognized the advantages of the delay-
based congestion control approach presented in Vegas (Sec-
tion II-G). However, they also found that it has three seri-
ous problems (two of them have been inherited from Reno,
Section II-C): (a) it cannot effectively utilize high-BDP links,
(b) during the (re-)initialization phases (Slow Start and Fast
Recovery) the Vegas congestion control can generate very
bursty traffic, and (c) Vegas’ estimation of network buffering
can be significantly biased if receivers use the standardized
delayed ACK technique [17], [27].
To reduce the convergence time and improve to some degree
the high-BDP link utilization, the proposed New Vegas algo-
rithm defines a new phase called Rapid Window Convergence.
The key idea of this phase is not to immediately terminate
the Slow Start phase when the estimate of network buffering
exceeds the threshold (Δ > α), but to continue the opportunis-
tic exponential-like resource probing with reduced intensity. In
detail, when New Vegas’s Slow Start detects that the threshold
has been exceeded (early congestion detection), it remembers
the current congestion window value in a special state variable
w
r
and switches to the Rapid Window Convergence. In this
state, for every RTT, the congestion window is allowed to be
increased by x packets:
x = (w
r
)
−2
3+n
where n is a number of times the early congestion indicator
triggers in the Rapid Window Convergence phase. According
to the proposal [32], when n becomes more than 3, Rapid
Window Convergence terminates and normal Vegas-like Con-
gestion Avoidance takes its place. At any point, if a packet loss
is detected, NewVegas reacts exactly the same as the original
Vegas algorithms. In other words, if the loss is detected using
three duplicate ACKs, NewVegas switches to Fast Recovery
followed by Congestion Avoidance; if the loss is detected using
the RTO, NewVegas resets the congestion window size and
moves to Slow Start.
In order to solve the second problem of generation of bursty
traffic during initialization and re-initialization, NewVegas
applies the well-known packet pacing technique; it sets-up a
minimal delay between transmission of any two consecutive
packets [69], [97]. Although it is reported to have a negative
impact on TCP Reno performance [93], NewVegas authors
believe and experimentally confirm that with delay-based
congestion controls, packet pacing has only a positive effect.
The last problem of the estimation bias is solved by re-
quiring the sender to transmit data packets in pairs. In terms
of TCP, this means that if there are data to be sent and the
current value of the congestion window allows sending only
one data packet, NewVegas will hold any transmission until
the window increases by at least one packet. Clearly, this
technique can help overcome the RTT estimation problem,
when the TCP receiver does not immediately respond to
each data packet, but waits for a timeout or another data
packet. However, this is based on the assumption that two data
32 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Network limit
Wprobe
W
base
Loss detection
Standard Reno/NewReno
TCP-AR
improvement
Fig. 51. Congestion window dynamics in TCP-AR
packets will not be separated much during delivery (e.g., due
to congestion) and that the TCP receiver sends ACK packets
for every other data packet, as a minimum. Neither of these
assumptions is entirely true in real networks. Thus, packet
pairing has questionable benefits on the precision of RTT-
based estimations (e.g., queuing delay). Moreover, the RTT
measurement can be improved significantly just by employing
the Timestamp option [31].
Although NewVegas authors have recognized the problem
of high-BDP link utilization, the proposed Rapid Window
Convergence resolves only one part of the problem: this phase
of scalable congestion window increase steps intends only
to improve the early termination of Slow Start. No scalable
features are designed for Congestion Avoidance and Fast
Recovery, which limits the potential applicability of NewVegas
in the future.
L. TCP-AR
Shimonishi and Murase [86] presented TCP-AR (Adaptive
Reno) as another approach to improve TCP performance and
preserve friendliness to standard TCP in high-speed networks.
It extends TCPW-BBE (Section V-E) with a scalable con-
gestion window probing in the Congestion Avoidance phase.
More specifically, the congestion window increase function
is defined to have two components: a slow constant increase
component W
base
(increased by one for every RTT) and a
scalable increase component W
probe
(increased by a function
of the Westwood-like achievable rate estimate and the queuing
delay, see Sections II-B and V-A respectively). The scalable
component is a continuous function that has two important
properties. First, when the network is congestion-free (i.e.,
when queuing delay is close to zero), the function gives a value
close to the Westwood-like achievable rate estimate. Second,
when the network is experiencing congestion (i.e., when
queuing delay is near maximum), the value of the scalable
component W
probe
is zero. Figure 51 shows conceptually the
congestion window dynamics of the TCP-AR algorithm.
Experimental results showed that TCP-AR can improve
the network utilization successfully and at the same time
preserve a good level of intra-fairness. However, relying on
the queuing delay and achievable rate metrics makes this
algorithm vulnerable if RTT measurements should become
noisy. In the worst case, when the queuing delay is wrongly
estimated to be close to the maximum, TCP-AR totally loses
its ability to scale in high-BDP networks.
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
1
3
2
1 – scalable increase (Q<threshold)
2 – constant congestion window (threshold < Q <3·threshold)
3 – reducing to the lower bound
Standard Reno/
NewReno
Fusion
improvement
Fig. 52. Congestion window dynamics in Fusion
M. TCP Fusion
Kaneko et al. [87] presented a Fusion algorithm which, in a
fashion similar to TCP-AR (Section VI-L), combines the ideas
of Westwood’s achievable rate (Section V-A), DUAL’s queu-
ing delay (Section II-G), and Vegas’ used network buffering
(Section II-G) estimations. Instead of TCP-AR’s congestion
window increase in Congestion Avoidance as a continuous
function over the queuing delay, Fusion defines three separate
linear functions which are switched, depending on an absolute
(i.e., expressed in seconds) queuing delay threshold value. If
the current queuing delay is less than the predefined threshold
(zone 1 in Figure 52), the congestion window is increased at
a fast rate each RTT by a predefined fraction of Westwood’s
achievable rate estimate (scalable increase). If the queuing
delay grows more than three times the threshold (zone 3 in
Figure 52), the congestion window decreases by the number
of packets buffered in the network (i.e., the Vegas estimate).
In the case where the queuing delay lies somewhere in the
range between one and three times the threshold (zone 2 in
Figure 52), the congestion window remains unchanged. To
make Fusion behave at least as well as the standard Reno
congestion control, a conventional Reno-like window w
r
is
maintained along with the Fusion congestion window w
f
. If
w
f
becomes smaller than w
r
, the w
f
is reset equal to w
r
.
In addition, Fusion changes the constant congestion win-
dow reduction ratio β in Fast Recovery to the value β =
max(0.5, RTT
min
/RTT). This ratio is essentially a simpli-
fied form of Westwood’s, where the sampling interval equals
the RTT [87].
Although experimental results of evaluating Fusion [87]
have shown some improvements in terms of utilization and
fairness characteristics in comparison to other scalable algo-
rithms (e.g., C-TCP, HS-TCP, BIC, FAST), Fusion not only
has the same vulnerabilities as TCP-AR, but also introduces
a new, more serious problem. Defining the threshold in abso-
lute terms requires manually adapting Fusion to a particular
environment. This manual configuration is highly undesirable
and usually impossible to perform. Another problem of Fusion
is the way it quickly defaults to standard congestion window
control rules. As one can see in Figure 52, in certain cases
Fusion may stay in the compatible, slow, non-scale mode most
of the time.
N. TCP Africa
King et al. [88] were concerned with the problems of
several previously proposed congestion control algorithms
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 33
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
Fast mode
(HS-TCP)
Slow mode
(NewReno)
High buffering
zone ( > )
Fig. 53. Congestion window dynamics of TCP Africa
for high-BDP networks, including HS-TCP (Section VI-A)
and STCP (Section VI-B). In response to these concerns
they developed the Africa (Adaptive and Fair Rapid Increase
Congestion Avoidance) algorithm. This algorithm combines
the aggressiveness (scalability) of HS-TCP when the network
is determined to be congestion-free and the conservative char-
acter of standard NewReno (Section II-D) when the network is
experiencing congestion. The congestion/non-congestion crite-
ria was borrowed from the Vegas algorithm (see Section II-G):
the estimate of network buffering Δ is compared to some
predefined constant α. More formally, if Africa sees that
there is little buffering (Δ < α), it moves to fast mode and
directly applies the HS-TCP rules of the Congestion Avoidance
and Fast Recovery phases. These dictate congestion window
increase and decrease steps as functions of the congestion
window itself (see Figure 53). Otherwise, it moves to slow
mode and applies the Reno rules: increase by one, decrease
by half.
In a number of simulations conducted by its authors, Africa
showed good network utilization in high-BDP networks, lower
induced loss rate compared to HS-TCP and STCP, and fairness
properties (intra-, inter-, RTT-) comparable to those exhibited
by NewReno flows. Unfortunately, Africa has not been im-
plemented and evaluated in real networks. However, the idea
of multiple-mode congestion control for high-BDP networks
with delay-based mode-switching has been widely adopted
by several proposals discussed later. For example, the dual-
mode C-TCP algorithm (Section VI-O) is currently the most
deployed TCP congestion control in the world, since it is
embedded into the Microsoft Windows operating system (see
Table V).
O. C-TCP
Tan et al. [89] presented C-TCP (Compound TCP), a
congestion control approach similar in spirit to Africa (Sec-
tion VI-N). It also tries to use a delay-based estimate of
the network state to combine the conventional Reno-type
congestion control (Section II-C) with a congestion control
that is scalable in high-BDP networks. However, instead of
explicitly defining the fast and slow modes, C-TCP defines an
additional scalable component w
fast
to be added to the final
congestion window calculations (w = w
reno
+ w
fast
). This
component is updated according to the slightly modified HS-
TCP rules (Section VI-A) but only when the Vegas estimate
Δ (Section II-G) shows a small level of network buffering
(Δ < α, where α is some small predefined constant). When
the estimate exceeds the threshold α, the scalable component
w
fast
is gently reduced by a value proportional to the estimate
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
HS-TCP
rules
Transition from
HS-TCP to Reno
rules
High buffering
zone ( > )
Fig. 54. Congestion window dynamics of C-TCP
itself (w
fast
= w
fast
− ζ · Δ, where ζ is a predefined
constant). This reduction can be understood as a smooth
transition between scalable HS-TCP and slow Reno modes,
as opposed to the instant transitions between the fast and
slow modes of Africa. As a result, the theoretical congestion
window dynamics of C-TCP (Figure 54) are very similar to
Africa’s with the exception that after the threshold has been
exceeded (Δ > α), we will see a convex curve (shown in
hatched bar) in the transition from the scalable HS-TCP to
the slow Reno mode.
Both simulation results and real-world performance evalua-
tion show substantial advantages of the C-TCP scheme: a good
utilization of the high-BDP links and good intra-, inter-, and
RTT-fairness properties. As a result, C-TCP has replaced the
conventional congestion control for TCP in Microsoft Win-
dows operating systems and is currently the most deployed
congestion control worldwide. However, on the weaker side,
because C-TCP relies on the Vegas estimate, it has inherited
the Vegas sensitivity to the correctness of RTT measurements.
For example, if flows competing with each other in the same
network observe different minimal RTT values (e.g., one flow
already sending data when a second one appears), the flow
seeing a higher RTT (which is equivalent to having a higher
threshold α value) will be much more aggressive and unfair
to the other flow.
P. TCP Illinois
Liu et al. [90] noted that congestion control algorithms
that interpret a delay as a primary signal for inferring the
network state (e.g., Vegas and Fast) are able to achieve a better
efficiency and do not stress the network excessively, compared
to congestion controls that rely only on packet losses (e.g.,
Reno, HS-TCP, STCP, etc.). However, the performance of
delay-based algorithms may suffer greatly when the delay
(RTT) measurements are very noisy, for example, due to a
high volume of cross traffic, route dynamics, etc. To resolve
this contradiction, the Illinois algorithm has been proposed.
This algorithm, similar to Africa (Section VI-N) and C-TCP
(Section VI-O), is based on NewReno (Section II-D) and is
designed on the one hand to be very aggressive when the
network is determined to be in a congestion-free state and on
the other hand be very gentle when the network is experienc-
ing congestion. However, Illinois has several implementation
differences. It defines both the congestion window w increase
steps α in Congestion Avoidance (i.e., w = w+α every RTT)
and the decrease ratio β in Fast Recovery (i.e., w = w−β · w,
upon detecting loss using three duplicate ACKs) to be special
functions of the queuing delay. The queuing delay calculation
34 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
β
max
(0.5)
β
min
(0.125)
α
max
(10)
α
min
(0.1)
Q
1 Q
max
Q
2
Q
3
Fig. 55. Additive increase α and multiplicative decrease δ coefficients as a
function of queuing delay Q
follows the definition introduced in DUAL (Section II-B).
The increase coefficient α depends inversely on the queuing
delay, while the decrease coefficient is directly proportional
(Figure 55). The minimum and maximum values of α and
β, and the queuing delay thresholds Q
1
, Q
2
, and Q
3
, can
be varied to achieve desired performance characteristics. In
the Linux implementation, Illinois sets the default values of
α
max
= 10, α
min
= 0.3, β
min
= 0.125, b
max
= 0.5,
Q
1
= 0.01 · Q
max
, Q
2
= 0.1 · Q
max
, and Q
3
= 0.8 · Q
max
,
where Q
max
is a maximum queuing delay observed over the
lifetime of the connection.
According to the Illinois specification, the α and β coeffi-
cients are updated once every RTT. However, to mitigate the
effects of queuing delay measurement noise, the α coefficient
is allowed to be set to the maximum, only if during several
consecutive RTTs (e.g., 5) the value of the queuing delay is
less than the first threshold Q
1
. Additionally, Illinois switches
to the compatibility mode (α = 1 and β = 0.5) when
the congestion window size is less than a predefined thresh-
old w
t
(e.g., ten packets). This switch, similar to HS-TCP
(Section VI-A) and STCP (Section VI-B), improves fairness
properties of Illinois to some extent, making it behave like
NewReno during severe congestion events. Figure 56 shows
the key cases of the Illinois theoretical congestion window
dynamics.
The theoretical and experimental evaluation of Illinois
showed that it is able to use the available resources in the
high-BDP networks better than the standard Reno congestion
control. At the same time, it preserves and improves the intra-,
inter-, and RTT-fairness properties. However, although the
queuing delay is a secondary parameter by which to infer the
network state—i.e., it controls only the amount of the conges-
tion window increase and cannot enforce its reduction—the
advantages of Illinois can easily be nullified. It can fall back
to the Reno mode (α = α
min
and β = β
max
) whenever either
the minimum or the maximum RTT values are incorrectly
estimated or the RTT includes large random components
(e.g., processing delay, different propagation delays when path
frequently changes, etc).
Q. YeAH TCP
Baiocchi et al. [91] introduced one more alternative for
congestion control that combines packet loss detection and
measurement of RTT as mechanisms to estimate the net-
work state. Similar to Africa (Section VI-N), the proposed
YeAH (Yet Another High-speed) algorithm defines the slow
NewReno (Section II-C) and the fast STCP (Section VI-B)
C
o
n
g
e
s
t
i
o
n

w
i
n
d
o
w
Time
Loss detection
Network limit
Transition from
α
max
(10) to α
min
(0.1)
Transition from
β
min
(0.125) to
β
max
(0.5)
Buffering zone (Q>0)
Fig. 56. Congestion window dynamics of TCP Illinois
modes in Congestion Avoidance and Fast Recovery explicitly.
For the former, the congestion window increases by at most
one packet every RTT and decreases by half upon detecting a
loss from three duplicate ACKs. In the latter, the congestion
window is updated aggressively—increased by a fraction of
the congestion window itself each RTT and decreased by
another fraction which is much smaller than that in slow mode
(i.e., less than half). To provide a reliable mechanism for
mode switching, YeAH defines simultaneous use of two delay-
based metrics: the Vegas-type estimate of a number of packets
buffered in the network (see Section II-G) and the DUAL-type
network congestion level estimate (see Section II-B). However,
there are two differences in the definition of the latter metric
from what was introduced in DUAL. First, in queuing delay
calculations YeAH uses the minimum of recently measured
RTTs (e.g., during the last RTT) instead of an averaged RTT.
Second, the congestion level is measured, not as a fraction of
the maximum queuing delay (e.g., Q/Q
max
), but as a fraction
of the minimum RTT observed during the connection lifetime.
To summarize, if YeAH estimates a low level of packet buffer-
ing in the network (Δ < α, where α is a predefined threshold)
and the queuing delay estimate shows a low congestion level
(Q/RTT
min
< ϕ, where ϕ is another predefined threshold),
then it behaves exactly as STCP; otherwise, the slow Reno-
like mode is enforced.
In addition to mode switching, YeAH includes two more
mechanisms for improving robustness during congestion
events and enhancing intra-fairness properties. The first mech-
anism, precautionary decongestion, is close in spirit to C-TCP
(Section VI-O), whereby it reduces the congestion window w
by the number of packets Δ estimated to be buffered in the
network if this number exceeds a predefined threshold ε (i.e.,
w = w −ε · Δ). The second mechanism repeats another idea
presented in C-TCP: the congestion window is restricted to
be a value that is greater than if only Reno rules are applied.
For this purpose, YeAH maintains a reference congestion
window size w
reno
that varies according to Reno rules. YeAH
furthermore disables the precautionary decongestion if the
reference window is more than the actual congestion window
size.
Experimental evaluation showed that YeAH maintains high
efficiency in high-BDP networks that maintain a network
buffering at a very low level. Additionally, the results con-
firmed that approaches which combine delay-based and loss-
based metrics (e.g., Africa, C-TCP, YeAH) can improve inter-,
intra-, and RTT-fairness properties substantially compared to
pure loss-based approaches (Reno, HS-TCP, STCP). How-
ever, the performance of YeAH—similar to all delay-based
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 35
approaches—can degrade when RTT measurements have sig-
nificant noise.
VII. OPPORTUNITIES FOR FUTURE RESEARCH
Currently we have as situation where there is no single
congestion control approach for TCP that can universally
be applied to all network environments. One of the primary
causes is a wide variety of network environments and different
(and sometimes opposing) network owners’ views regarding
which parameters should be optimized. A number of the
congestion control algorithms from Section VI (HS-TCP, S-
TCP, Africa, TCP-AR, C-TCP, etc.) address this problem
by incorporating at least two sets of rules to control the
transmission rate of a flow (i.e., conventional Reno-like rules
when the network seems to be congested and scalable rules
otherwise).
Some algorithms switch modes based on a currently
achieved transmission rate (e.g., “reactive” HS-TCP and S-
TCP behave as standard Reno while the congestion window
is less then a predefined threshold). In some network environ-
ments, especially when the probability of loss is high, such
switching rules makes algorithms behave in a non-high-speed,
therefore inefficient, mode. Other algorithms (e.g., “proactive”
Vegas, C-TCP, Illinois, YeAH, etc.) use patterns in delay
measurements for switching purposes. If the delay patterns
change because of non-congestion-related factors, for example
because of a re-routing path, these algorithms may suffer from
efficiency and fairness degradation. This happens because such
algorithms do not have the ability to invalidate various internal
parameters during the transmission.
Moreover, the current version of Linux kernel provides
an API for software developers to choose any one of the
supported algorithms for a particular connection. However,
there are not yet the well-defined and broadly-accepted criteria
to serve as a good baseline for appropriately selecting a con-
gestion control algorithm. Additionally, objective guidelines
to select a proper congestion control for a concrete network
environment are yet to be defined.
Another aspect of congestion control not yet fully in-
vestigated is the problem of short-lived flows (e.g., DNS
requests/responses via TCP). The congestion control tech-
niques developed so far do not really work if the connection
lifetime is only one or two RTTs. The only congestion control
parameter useful during such connections is the initial value of
the congestion window. Clearly this value has a direct impact
on the performance of short-lived flows. However, if the initial
value is large enough and the number of short-lived flows in
the network is substantial, the available capacity can easily
be exceeded. Because all TCP flows behave independently, a
new short-lived flow has no idea of the present network state
and can only exacerbate any congestion present in the system.
Thus, we need a mechanism to make new flows aware of the
current network state (e.g., an ability to estimate the available
network path capacity before the actual data transfer). One
potential direction to solving this problem is to maintain global
estimates of network states. The challenge is distinguishing
the independent network states without knowing the exact
topology of the Internet.
There are also questions about the fundamental assumptions
of TCP congestion control. First of all, it was initially assumed
that each TCP flow should be fair to each other. Often,
Jain’s index (see Section II-A) is used as a fairness measure.
However, Jain’s metric was based on user shares [29], but
everything in TCP is based on individual flows. Essentially,
this allows users to game even the “ideally” fair TCP conges-
tion control system and acquire an advantage in the network
resources distribution. For example, if one user opens only
one TCP connection and another opens five, the network
resources will be distributed as one to five. Thus, “ideally”
fair congestion control under current definitions is not really
“ideal.” To summarize, a fundamental research question is how
to enforce fairness on a user-level basis without sacrificing
throughput of individual flows.
Several problems have arisen as user mobility significantly
increased. More and more users now have multiple physical
access channels to the Internet. However, TCP is fundamen-
tally unable to use them simultaneously, for example to speed-
up data transfer (since a TCP connection is identified by the
tuple {srcIP, srcPort, dstIP, dstPort}). A new generation of
the reliable data transfer protocol, SCTP [98], provides basic
support for multi-homing, but problems of efficient channel
utilization, reliable detection of congestion events in separate
and common network paths, and user fairness are still to be
solved.
Recently, a new congestion control-related problem has
appeared on the Internet. For many years, there has been a
belief that the more data packets routers can buffer, the more
effectively network channels are utilized. The best known
recommendation is to set the buffer size equal to a bandwidth-
delay product (BDP) of the connection served [99]. In practice,
router manufacturers and network administrators often choose
maximum values for the bandwidth and delay (or choose
some large buffer size). As a result, instead of providing
fast feedback to a TCP sender by dropping a number of
packets, routers extensively buffer packets making a TCP
sender unaware of an abnormal situation in the network.
Recently there were several reports of the excessive buffering
syndrome (also known as a congestive queueing or buffer
madness event) on the “End-to-end” mailing list [100], where
round-trip delays grew in excess of 5–10 seconds.
VIII. CONCLUSION
In this work we have presented a survey of various ap-
proaches to TCP congestion control that do not rely on any
explicit signaling from the network. The survey highlighted
the fact that the research focus has changed with the devel-
opment of the Internet, from the basic problem of eliminating
the congestion collapse phenomenon to problems of using
available network resources effectively in different types of
environments (wired, wireless, high-speed, long-delay, etc.).
In the first part of this survey, we classified and discussed
proposals that build a foundation for host-to-host congestion
control principles. The first proposal, Tahoe, introduces the
basic technique of gradually probing network resources and
relying on packet loss to detect that the network limit has
been reached. Unfortunately, although this technique solves
36 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
the congestion problem, it creates a great deal of inefficient
use of the network. As we showed, solutions to the efficiency
problem include algorithms that (1) refine the core congestion
control principle by making more optimistic assumptions
about the network (Reno, NewReno); or (2) refine the TCP
protocol to include extended reporting abilities of the receiver
(SACK, DSACK), which allows the sender to estimate the
network state more precisely (FACK, RR-TCP); or (3) intro-
duce alternative concepts for network state estimation through
delay measurements (DUAL, Vegas, Veno).
The second part of the survey is devoted to a group of
congestion control proposals that are focused on environments
where packets are frequently reordered. These proposals show
that in such environments, efficiency can be improved signifi-
cantly by (1) delaying the control actions (TD-FR), or (2) by
undoing previously applied actions if reordering is detected
(Eifel, DOOR), or (3) by refining the network state estimation
heuristic (PR, RR).
In the third part of our survey, we showed that basic host-
to-host congestion control principles can solve not only the
direct congestion problem but also provide a simple traffic
prioritizing feature. Two algorithms examined (Nice and LP),
applying slightly different techniques to achieve the same
goal, have the same aim: to provide an opportunity to send
non-critical data reliably without interfering with other data
transfers.
In the last two sections of the survey, we showed that
technology advances have introduced new challenges for TCP
congestion control. First, we discussed several solutions (the
Westwood-family algorithms) which apply similar techniques
for estimating the last “good” flow rate and using this rate
as a baseline to distinguish between congestion or random
packet loss. Second, we reviewed a group of solutions with
the most research interest over the recent past. These proposals
aim to solve the problem of poor utilization of high-speed
or long-delay network channels by TCP flows. The first
proposals addressing this problem (HS-TCP, STCP, H-TCP)
introduced simple but highly optimistic (aggressive) policies
to probe networks for the available resources. Unfortunately,
such techniques led to the appearance of a number of other
problems, including the intra-, inter-, and RTT-unfairness.
Later proposals employed more intelligent techniques to
make congestion control aggressive only when the network
is considered congestion-free and conservative during a con-
gestion state. Two proposals, BIC and CUBIC, use packet
loss to establish an approximated network resource limit,
which is used as a secondary criterion to estimate the current
network state. Another group of proposals (FAST, Africa,
TCP-AR, C-TCP, Libra, Illinois, Fusion, YeAH) perform this
by relying on secondary delay-based network state estimation
techniques. Unfortunately, there are disadvantages to both of
these approaches, and there is no current consensus in the
research community regarding which approach is superior. Not
surprisingly, they co-exist in the current Internet: C-TCP is
deployed in the Windows-world, and the Linux-world uses
CUBIC.
IX. ACKNOWLEDGMENT
The authors are very much obliged to Erik Kline and Janice
Wheeler for the valuable input on the survey organization and
sleepless nights spent in reading and correcting errors in this
text.
APPENDIX
See Figure 57 for an evolutionary graph of variants of TCP
congestion control.
REFERENCES
[1] J. Postel, “RFC793—transmission control protocol,” RFC, 1981.
[2] C. Lochert, B. Scheuermann, and M. Mauve, “A survey on congestion
control for mobile ad hoc networks,” Wireless Communications and
Mobile Computing, vol. 7, no. 5, p. 655, 2007.
[3] J. Postel, “RFC791—Internet Protocol,” RFC, 1981.
[4] A. Al Hanbali, E. Altman, and P. Nain, “A survey of TCP over ad hoc
networks,” IEEE Commun. Surveys Tutorials, vol. 7, no. 3, pp. 22–36,
3rd quarter 2005.
[5] J. Widmer, R. Denda, and M. Mauve, “A survey on TCP-friendly
congestion control,” IEEE Network, vol. 15, no. 3, pp. 28–37, May/June
2001.
[6] H. Balakrishnan, V. N. Padmanabhan, S. Seshan, and R. H. Katz,
“A comparison of mechanisms for improving TCP performance over
wireless links,” IEEE/ACM Trans. Netw., vol. 5, no. 6, pp. 756–769,
December 1997.
[7] K.-C. Leung, V. Li, and D. Yang, “An overview of packet reordering
in transmission control protocol (TCP): problems, solutions, and chal-
lenges,” IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 4, pp. 522–535,
April 2007.
[8] S. Low, F. Paganini, and J. Doyle, “Internet congestion control,” IEEE
Control Syst. Mag., vol. 22, no. 1, pp. 28–43, February 2002.
[9] G. Hasegawa and M. Murata, “Survey on fairness issues in TCP
congestion control mechanisms,” IEICE Trans. Commun. (Special Issue
on New Developments on QoS Technologies for Information Networks),
vol. E84-B, no. 6, pp. 1461–1472, June 2001.
[10] M. Gerla and L. Kleinrock, “Flow control: a comparative survey,” IEEE
Trans. Commun., vol. 28, no. 4, pp. 553–574, April 1980.
[11] J. Nagle, “RFC896—Congestion control in IP/TCP internetworks,”
RFC, 1984.
[12] C. A. Kent and J. C. Mogul, “Fragmentation considered harmful,” in
Proceedings of the ACM workshop on Frontiers in computer commu-
nications technology (SIGCOMM), Stowe, Vermont, August 1987, pp.
390–401.
[13] S. Floyd and K. Fall, “Promoting the use of end-to-end congestion
control in the Internet,” IEEE/ACM Trans. Netw., vol. 7, no. 4, pp.
458–472, August 1999.
[14] V. Jacobson, “Congestion avoidance and control,” ACM SIGCOMM,
pp. 314–329, 1988.
[15] Z. Wang and J. Crowcroft, “Eliminating periodic packet losses in 4.3–
Tahoe BSD TCP congestion control,” ACM Computer Communication
Review, vol. 22, no. 2, pp. 9–16, 1992.
[16] V. Jacobson, “Modified TCP congestion avoidance algorithm,” email
to the end2end list, April 1990.
[17] M. Allman, V. Paxson, and W. Stevens, “RFC2581—TCP congestion
control,” RFC, 1999.
[18] S. Floyd and T. Henderson, “RFC2582—the NewReno modification to
TCP’s fast recovery algorithm,” RFC, 1999.
[19] S. Floyd, T. Henderson, and A. Gurtov, “RFC3782—the NewReno
modification to TCP’s fast recovery algorithm,” RFC, 2004.
[20] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanov, “RFC2018—TCP
selective acknowledgment options,” RFC, 1996.
[21] M. Mathis and J. Mahdavi, “Forward acknowledgement: refining
TCP congestion control,” in Proc. conference on applications, tech-
nologies, architectures, and protocols for computer communications
(SIGCOMM), New York, NY, USA, 1996, pp. 281–291.
[22] L. Brakmo and L. Peterson, “TCP Vegas: end to end congestion
avoidance on a global Internet,” IEEE J. Sel. Areas Commun., vol. 13,
no. 8, pp. 1465–1480, October 1995.
[23] G. Hasegawa, K. Kurata, and M. Murata, “Analysis and improvement
of fairness between TCP Reno and Vegas for deployment of TCP Vegas
to the Internet,” in Proc. IEEE ICNP, 2000, pp. 177–186.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 37
RFC 793
Tahoe
Reno
DUAL
Vegas FACK NewReno
Vegas+
Veno
Vegas A
Reactive
(loss-based)
TD-FR
RR
Eifel
DOOR
PR
LP
Nice
Westwood
CRB ABSE
Westwood+
BR
BBE
Reactive
(loss-based with
bandwidth estimation)
HS-TCP
STCP
FAST
BIC
H-TCP
Hybla
Africa
Compound
Libra
NewVegas
Illinois
YeAH
CUBIC
ARENO
Fusion
Proactive
(delay-based)
Congestion
collapse
Reordering
Low-priority
Wireless
High-speed
TCPW-A
LogWestwood+
Fig. 57. Evolutionary graph of variants of TCP congestion control.
[24] C. P. Fu and S. C. Liew, “TCP Veno: TCP enhancement for trans-
mission over wireless access networks,” IEEE J. Sel. Areas Commun.,
vol. 21, no. 2, February 2003.
[25] K. Srijith, L. Jacob, and A. Ananda, “TCP Vegas-A: Improving the
performance of TCP Vegas,” Computer Communications, vol. 28, no. 4,
pp. 429–440, 2005.
[26] P. Karn and C. Partridge, “Improving round-trip time estimates in
reliable transport protocols,” in Proc. SIGCOMM, 1987.
[27] R. Braden, “RFC1122—Requirements for Internet Hosts - Communi-
cation Layers,” RFC, 1989.
[28] W. Stevens, “RFC2001—TCP Slow Start, Congestion Avoidance, Fast
Retransmit,” RFC, 1997.
[29] D.-M. Chiu and R. Jain, “Analysis of the increase and decrease
algorithms for congestion avoidance in computer networks,” Computer
Networs and ISDN Systems, vol. 17, no. 1, pp. 1–14, 1989.
[30] S. Floyd, “Revisions to RFC 2001,” Presentation to the TCPIMPL
Working Group, August 1998. [Online]. Available: ftp://ftp.ee.lbl.gov/
talks/sf-tcpimpl-aug98.pdf
[31] V. Jacobson, R. Braden, and D. Borman, “RFC1323—TCP Extensions
for High Performance,” RFC, 1992.
[32] J. Sing and B. Soh, “TCP New Vegas: Improving the Performance of
TCP Vegas Over High Latency Links,” in Proc. 4th IEEE International
Symposium on Network Computing and Applications (IEEE NCA05),
2005, pp. 73–80.
[33] V. Paxson, “End-to-end Internet packet dynamics,” SIGCOMM Com-
puter Communication Review, vol. 27, no. 4, pp. 139–152, 1997.
[34] M. Przybylski, B. Belter, and A. Binczewski, “Shall we worry about
packet reordering,” Computational Methods in Science and Technology,
vol. 11, no. 2, pp. 141–146, 2005.
[35] K. Nichols, S. Blake, F. Baker, and D. Black, “RFC2574—definition
of the differentiated services field (DS field) in the IPv4 and IPv6
headers,” RFC, 1998.
[36] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss,
“RFC2575—an architecture for differentiated services,” RFC, 1998.
[37] T. Bu and D. Towsley, “Fixed point approximations for TCP behavior
in an AQM network,” in Proc. SIGMETRICS, New York, NY, USA,
2001, pp. 216–225.
[38] C. Hollot, V. Misra, D. Towsley, and W.-B. Gong, “On designing
improved controllers for AQM routers supporting TCP flows,” in Proc.
IEEE INFOCOM, vol. 3, 2001, pp. 1726–1734.
[39] J. Bennett, C. Partridge, and N. Shectman, “Packet reordering is not
pathological network behavior,” IEEE/ACM Trans. Netw., vol. 7, no. 6,
pp. 789–798, December 1999.
[40] C. M. Arthur, A. Lehane, and D. Harle, “Keeping order: Determining
the effect of TCP packet reordering,” in Proc. Third International
Conference on Networking and Services (ICNS), June 2007.
[41] J. Arkko, B. Briscoe, L. Eggert, A. Feldmann, and M. Handley,
“Dagstuhl perspectives workshop on end-to-end protocols for the future
38 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION
internet,” SIGCOMM Computer Communication Review, vol. 39, no. 2,
pp. 42–47, 2009.
[42] R. Ludwig and R. H. Katz, “The Eifel algorithm: making TCP robust
against spurious retransmissions,” SIGCOMM Computer Communica-
tion Review, vol. 30, no. 1, pp. 30–36, 2000.
[43] R. Ludwig and A. Gurtov, “RFC4015—the Eifel response algorithm
for TCP,” RFC, 2005.
[44] F. Wang and Y. Zhang, “Improving TCP performance over mobile ad-
hoc networks with out-of-order detection and response,” in Proceedings
of the 3rd ACM international symposium on mobile ad hoc networking
& computing, New York, NY, 2002, pp. 217–225.
[45] S. Bohacek, J. Hespanha, J. Lee, C. Lim, and K. Obraczka, “TCP-
PR: TCP for Persistent Packet Reordering,” in Proc. International
Conference on Distributed Computing Systems, vol. 23, 2003, pp. 222–
233.
[46] S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky, “RFC2883—An
Extension to the Selective Acknowledgement (SACK),” RFC, 2000.
[47] M. Zhang, B. Karp, S. Floyd, and L. Peterson, “RR-TCP: a reordering-
robust TCP with DSACK,” International Computer Science Institute,
Tech. Rep. TR-02-006, July 2002.
[48] ——, “RR-TCP: a reordering-robust TCP with DSACK,” in Proc. 11th
IEEE International Conference on Network Protocols (ICNP), 2003,
pp. 95–106.
[49] A. Kuzmanovic and E. Knightly, “TCP-LP: low-priority service via
end-point congestion control,” IEEE/ACM Trans. Netw., vol. 14, no. 4,
pp. 739–752, 2006.
[50] D. Clark and W. Fang, “Explicit allocation of best-effort packet delivery
service,” IEEE/ACM Trans. Netw., vol. 6, no. 4, pp. 362–373, 1998.
[51] X. Xiao and L. Ni, “Internet QoS: A big picture,” IEEE Network,
vol. 13, no. 2, pp. 8–18, 1999.
[52] B. Davie, “Deployment experience with differentiated services,” in
Proceedings of the ACM SIGCOMM workshop on Revisiting IP QoS:
What have we learned, why do we care?, New York, NY, 2003, pp.
131–136.
[53] A. Venkataramani, R. Kokku, and M. Dahlin, “TCP Nice: A Mecha-
nism for Background Transfers,” Operating Systems Review, vol. 36,
pp. 329–344, 2002.
[54] A. Kuzmanovic and E. W. Knightly, “TCP-LP: a distributed algorith
for low priority data transfer,” in Proc. IEEE INFOCOM, April 2003.
[55] J.-H. Choi and C. Yoo, “One-way delay estimation and its application,”
Computer Communications, vol. 28, no. 7, pp. 819–828, 2005.
[56] A. Kuzmanovic, E. Knightly, and R. Les Cottrell, “HSTCP-LP: A
protocol for low-priority bulk data transfer in high-speed high-RTT
networks.”
[57] K. Ramakrishnan, S. Floyd, and D. Black, “RFC3168—the addition of
explicit congestion notification (ECN),” RFC, 2001.
[58] M. Gast and M. Loukides, 802.11 wireless networks: the definitive
guide. O’Reilly & Associates, Inc. Sebastopol, CA, USA, 2002,
chapter 2.
[59] H. Balakrishnan, S. Seshan, and R. Katz, “Improving reliable transport
and handoff performance in cellular wireless networks,” Wireless
Networks, vol. 1, no. 4, pp. 469–481, 1995.
[60] A. Bakre and B. Badrinath, “I-TCP: Indirect TCP for Mobile Hosts,”
Department of Computer Science, Rutgers University, Tech. Rep. DCS-
TR-314, 1994.
[61] K. Brown and S. Singh, “M-TCP: TCP for mobile cellular networks,”
SIGCOMM Computer Communication Review, vol. 27, no. 5, pp. 19–
43, 1997.
[62] S. Mascolo, C. Casetti, M. Gerla, M. Y. Sanadidi, and R. Wang, “TCP
Westwood: Bandwidth estimation for enhanced transport over wireless
links,” in Proc. ACM MOBICOM, 2001, pp. 287–297.
[63] M. Gerla, M. Y. Sanadidi, and C. E., “Method and apparatus for TCP
with faster recovery,” U.S. Patent 7 299 280, November 20, 2007.
[64] L. A. Grieco and S. Mascolo, “Performance evaluation and comparison
of Westwood+, New Reno and Vegas TCP congestion control,” ACM
Computer Communication Review, vol. 342, April 2004.
[65] R. Wang, M. Valla, M. Sanadidi, B. Ng, and M. Gerla, “Effi-
ciency/friendliness tradeoffs in TCP Westwood,” Proc. Seventh Inter-
national Symposium on Computers and Communications, pp. 304–311,
2002.
[66] R. Wang, M. Valla, M. Sanadidi, and M. Gerla, “Adaptive bandwidth
share estimation in TCP Westwood,” in Proc. IEEE GLOBECOM,
vol. 3, November 2002, pp. 2604–2608.
[67] G. Yang, R. Wang, M. Sanadidi, and M. Gerla, “TCPW with bulk
repeat in next generation wireless networks,” IEEE International
Conference on Communications 2003, vol. 1, pp. 674–678, May 2003.
[68] H. Shimonishi, M. Sanadidi, and M. Gerla, “Improving efficiency-
friendliness tradeoffs of TCP in wired-wireless combined networks,”
in Proc. IEEE ICC, vol. 5, May 2005, pp. 3548–3552.
[69] L. Zhang, S. Shenker, and D. Clark, “Observations on the dynamics of
a congestion control algorithm: The effects of two-way traffic,” ACM
SIGCOMM Computer Communication Review, vol. 21, no. 4, pp. 133–
147, 1991.
[70] N. Samaraweera, “Non-congestion packet loss detection for TCP error
recovery using wireless links,” IEEE Proc. Communications, vol. 146,
no. 4, pp. 222–230, August 1999.
[71] S. Cen, P. Cosman, and G. Voelker, “End-to-end differentiation of
congestion and wireless losses,” IEEE/ACM Trans. Netw., vol. 11,
no. 5, pp. 703–717, 2003.
[72] S. Floyd, “RFC3649—HighSpeed TCP for large congestion windows,”
RFC, 2003.
[73] T. Kelly, “Scalable TCP: improving performance in highspeed wide
area networks,” Computer Communications Review, vol. 32, no. 2,
April 2003.
[74] S. Floyd, HighSpeed TCP and Quick-Start for Fast Long-Distance
HighSpeed TCP and Quick-Start for fast longdistance networks
(slides), TSVWG, IETF, March 2003.
[75] D. Leith and R. Shorten, “H-TCP: TCP for high-speed and long-
distance networks,” in Proceedings of PFLDnet, 2004.
[76] D. Leith, “H-TCP: TCP congestion control for high bandwidth-delay
product paths,” IETF Internet Draft, http://tools.ietf.org/html/draft-
leith-tcp-htcp-06, 2008.
[77] C. Caini and R. Firrincieli, “TCP Hybla: a TCP enhancement for
heterogeneous networks,” International J. Satellite Communications
and Networking, vol. 22, pp. 547–566, 2004.
[78] L. Xu, K. Harfoush, and I. Rhee, “Binary increase congestion control
for fast, long distance networks,” in Proc. IEEE INFOCOM, vol. 4,
March 2004, pp. 2514–2524.
[79] R. Wang, K. Yamada, M. Sanadidi, and M. Gerla, “TCP with sender-
side intelligence to handle dynamic, large, leaky pipes,” IEEE J. Sel.
Areas Commun., vol. 23, no. 2, pp. 235–248, February 2005.
[80] D. Kliazovich, F. Granelli, and D. Miorandi, “Logarithmic window
increase for TCP Westwood+ for improvement in high speed, long
distance networks,” Computer Networks, vol. 52, no. 12, pp. 2395–
2410, August 2008.
[81] I. Rhee and L. Xu, “CUBIC: a new TCP-friendly high-speed TCP
variant,” SIGOPS Operating Systems Review, vol. 42, no. 5, pp. 64–
74, July 2008.
[82] C. Jin, D. Wei, S. Low, G. Buhrmaster, J. Bunn, D. Choe, R. Cottrel,
J. Doyle, W. Feng, O. Martin, H. Newman, F. Paganini, S. Ravot, and
S. Singh, “FAST TCP: from theory to experiments,” December 2003.
[83] C. Jin, D. Wei, S. Low, J. Bunn, H. Choe, J. Doyle, H. Newman,
S. Ravot, S. Singh, F. Paganini et al., “FAST TCP: From Theory to
Experiments,” IEEE Network, vol. 19, no. 1, pp. 4–11, 2005.
[84] D. X. Wei, C. Jin, S. H. Low, and S. Hegde, “FAST TCP: motiva-
tion, architecture, algorithms, performance,” IEEE/ACM Trans. Netw.,
vol. 14, no. 6, pp. 1246–1259, 2006.
[85] G. Marfia, C. Palazzi, G. Pau, M. Gerla, M. Sanadidi, and M. Roccetti,
“TCP Libra: Exploring RTT-Fairness for TCP,” UCLA Computer
Science Department, Tech. Rep. UCLA-CSD TR-050037, 2005.
[86] H. Shimonishi and T. Murase, “Improving efficiency-friendliness trade-
offs of TCP congestion control algorithm,” in Proc. IEEE GLOBE-
COM, 2005.
[87] K. Kaneko, T. Fujikawa, Z. Su, and J. Katto, “TCP-Fusion: a hybrid
congestion control algorithm for high-speed networks,” in Proc. PFLD-
net, ISI, Marina Del Rey (Los Angeles), California, February 2007.
[88] R. King, R. Baraniuk, and R. Riedi, “TCP-Africa: an adaptive and
fair rapid increase rule for scalable TCP,” in Proc. IEEE INFOCOM,
vol. 3, March 2005, pp. 1838–1848.
[89] K. Tan, J. Song, Q. Zhang, and M. Sridharan, “A compound TCP
approach for high-speed and long distance networks,” July 2005.
[90] S. Liu, T. Basar, and R. Srikant, “TCP-Illinois: A loss and delay-based
congestion control algorithm for high-speed networks,” in Proc. First
International Conference on Performance Evaluation Methodologies
and Tools (VALUETOOLS), 2006.
[91] A. Baiocchi, A. P. Castellani, and F. Vacirca, “YeAH-TCP: yet an-
other highspeed TCP,” in Proc. PFLDnet, ISI, Marina Del Rey (Los
Angeles), California, February 2007.
[92] S. Floyd, “RFC3742—Limited slow-start for TCP with large conges-
tion windows,” RFC, 2004.
[93] A. Aggarwal, S. Savage, and T. Anderson, “Understanding the perfor-
mance of TCP pacing,” in Proc. IEEE INFOCOM, vol. 3, March 2000,
pp. 1157–1165.
AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 39
[94] S. Ha, Y. Kim, L. Le, I. Rhee, and L. Xu, “A step toward realistic
performance evaluation of high-speed TCP variants,” in Fourth Interna-
tional Workshop on Protocols for Fast Long-Distance Networks, Nara,
Japan, March 2006.
[95] S. Belhaj, “VFAST TCP: an improvement of FAST TCP,” in Proc.
Tenth International Conference on Computer Modeling and Simulation,
2008.
[96] R. Kapoor, L.-J. Chen, L. Lao, M. Gerla, and M. Y. Sanadidi,
“CapProbe: A simple and accurate capacity estimation technique,” in
Proceedings of SIGCOMM, Portland, Oregon, USA, August/September
2004.
[97] D. Wei, P. Cao, and S. Low, “TCP Pacing Revisited,” in Proceedings
of IEEE INFOCOM, 2006.
[98] A. Caro Jr, J. Iyengar, P. Amer, S. Ladha, and K. Shah, “SCTP:
a proposed standard for robust internet data transport,” Computer,
vol. 36, no. 11, pp. 56–63, 2003.
[99] A. Dhamdhere and C. Dovrolis, “Open issues in router buffer sizing,”
ACM SIGCOMM Computer Communication Review, vol. 36, no. 1, pp.
87–92, 2006.
[100] “End-to-end mailing list,” http://www.postel.org/e2e.htm.
Alexander Afanasyev received his B.Tech. and
M.Tech. degrees in Computer Science from Bauman
Moscow State Technical University, Moscow, Russia
in 2005 and 2007, respectively. In 2006 he received
the medal for the best student scientific project in
Russian universities.
He is currently working towards his Ph.D. degree
in computer science at the University of California,
Los Angeles in the Laboratory for Advanced System
Research. His research interests include network
systems, network security, mobile systems, multi-
media systems, and peer-to-peer environments.
Neil Tilley received his bachelor’s degree from the
University of California, Davis. His current research
interests include parallel and networked systems. He
has been pursuing a Ph.D. in Computer Science at
the University of California, Los Angels since 2009.
Peter Reiher received his B.S. in Electrical Engi-
neering and Computer Science from the University
of Notre Dame in 1979. He received his M.S. and
Ph.D. in Computer Science from UCLA in 1984
and 1987, respectively. He has done research in the
fields of distributed operating systems, network and
distributed systems security, file systems, ubiquitous
computing, mobile computing, and optimistic paral-
lel discrete event simulation. Dr. Reiher is an Ad-
junct Professor in the Computer Science Department
at UCLA.
Leonard Kleinrock received his B.E.E. degree from
City College of New York (CCNY) in 1957 and
received his Ph.D. from Massachusetts Institute of
Technology in 1963. He is a Distinguished Professor
of Computer Science at UCLA and served as chair-
man of the department from 1991 to 1995. He re-
ceived honorary doctorates from CCNY (1997), the
University of Massachusetts, Amherst (2000), the
University of Bologna (2005), Politecnico di Torino
(2005), and the University of Judaism (2007). He
has published more than 250 papers and authored six
books on a wide array of subjects including queuing theory, packet switching
networks, packet radio networks, local area networks, broadband networks,
gigabit networks, nomadic computing, peer-to-peer networks and intelligent
agents. He is a member of the American Academy of Arts and Sciences, the
National Academy of Engineering, an IEEE Fellow, an ACM Fellow, and a
founding member of the Computer Science and Telecommunications Board of
the National Research Council. Among his many honors, he is the recipient of
the CCNY Townsend Harris Medal, the CCNY Electrical Engineering Award,
the Marconi Award, the L.M. Ericsson Prize, the NAE Charles Stark Draper
Prize, the Okawa Prize, the Communications and Computer Prize, NEC C&C,
the IEEE Internet Millennium Award, the UCLA Outstanding Teacher Award,
the Lanchester Prize, the ACM SIGCOMM Award, the Sigma Xi Monie Ferst
Award, the INFORMS Presidents Award, and the IEEE Harry Goode Award.
He was listed by the Los Angeles Times in 1999 as among the “50 People
Who Most Influenced Business This Century.” He was also listed as among the
33 most influential living Americans in the December 2006 Atlantic Monthly.
Kleinrock’s work was further recognized when he received the 2007 National
Medal of Science, the highest honor for achievement in science bestowed by
the President of the United States. This Medal was awarded “for fundamental
contributions to the mathematical theory of modern data networks, for the
functional specification of packet switching which is the foundation of the
Internet Technology, for mentoring generations of students and for leading
the commercialization of technologies that have transformed the world.”

2

IEEE COMMUNICATIONS SURVEYS & TUTORIALS, ACCEPTED FOR PUBLICATION

Sender’s output buffer

Receiver’s input buffer

8 free spaces (rwnd) 5 new data packets (wnd) rwnd=3 wnd=3

rwnd=1 Receiver processed packet and slot in the input buffer become available

Fig. 2. Receiver’s window concept: receiver reports a size of the available input buffer (receiver’s window, rwnd) and sender sends a portion (window, wnd) of data packets that does not exceed rwnd

also be maximized. (It can be shown that the maximum throughput of a TCP flow depends directly on the sliding window size and inversely on the round-trip time of the network path.) On the other hand, if the sliding window is too large, there is a high probability of packet loss because the network and the receiver have resource limitations. Thus, minimization of packet losses requires minimizing the sliding window. Therefore, the problem is finding an optimal value for the sliding window (which is usually referred to as the congestion window) that provides good throughput, yet does not overwhelm the network and the receiver. Additionally, TCP should be able to recover from packet losses in a timely fashion. This means that the shorter the interval between packet transmission and loss detection, the faster TCP can recover. However, this interval cannot be too short, or otherwise the sender may detect a loss prematurely and retransmit the corresponding packet unnecessarily. This overreaction simply wastes network resources and may induce high congestion in the network. In other words, when and how a sender detects packet losses is another hard problem for TCP. The initial TCP specification [1] is designed to guard only against overflowing the input buffers at the receiver end. The incorporated mechanism is based on the receiver’s window concept, which is essentially a way for the receiver to share the information about the available input buffer with the sender. Figure 2 illustrates this concept in schematic fashion. When establishing a connection, the receiver informs the sender about the available buffer size for incoming packets (in the example shown, the receiver’s window reported initially is 8). The sender transmits a portion (window) of prepared data packets. This portion must not exceed the receiver’s window and may be smaller if the sender is not willing (or ready) to send a larger portion. In the case where the receiver is unable to process data as fast as the sender generates it, the receiver reports decreasing values of the window (3 and 1 in the example). This induces the sender to shrink the sliding window. As a result, the whole transmission will eventually synchronize with the receiver’s processing rate. Unfortunately, protocol standards that remain unaware of the network resources have created various unexpected effects on the Internet, including the appearance of congestion

collapse (see Section II). The problem of congestion control, meaning intelligent (i.e., network resource-aware) and yet effective use of resources available in packet-switched networks, is not a trivial problem, but the efficient solution to it is highly desirable. As a result, congestion control is one of the extensively studied areas in the Internet research conducted over the last 20 years, and a number of proposals aimed at improving various aspects of the congestion-responsive data flows is very large. Several groups of these proposals have been studied by Hanbali et al. [4] (congestion control in ad hoc networks), Lochert et al. [2] (congestion control for mobile ad hoc networks), Widmer et al. [5] (congestion control for non-TCP protocols), Balakrishnan et al. [6] (congestion control for wireless networks), Leung at al. [7] (congestion control for networks with high levels of packet reordering), Low et al. [8] (current up to 2002 TCP variants and their analytical models), Hasegawa and Murata [9] (fairness issues in congestion control), and others researchers. Unlike previous studies, in this survey we tried to collect, classify, and analyze major congestion control algorithms that optimize various parameters of TCP data transfer without relying on any explicit notifications from the network. In other words, they preserve the host-to-host principle of TCP, whereby the network is seen as a black box. Section II is devoted to congestion control proposals that build a foundation for all currently known host-to-host algorithms. This foundation includes 1) the basic principle of probing the available network resources, 2) loss-based and delay-based techniques to estimate the congestion state in the network, and 3) techniques to detect packet losses quickly. However, the techniques that are developed are not universal. For example, Tahoe’s initial assumption that packets are not generally reordered during transmission may be wrong in some environments. As a result, the performance of Tahoe flows in these environments will prove inadequate (Section II-A). In Section III we discuss congestion control proposals that modify previously developed algorithms to tolerate various levels of packet reordering. As data transfer technologies and the Internet itself have evolved, the research focus for congestion control algorithms has been changing from basic congestion to more sophisticated problems. In Section IV we review the network resource optimization problem. In particular, we discuss two algorithms which discover the ability of a TCP congestion control to provide traffic prioritization in a pure host-to-host fashion. In Section V we discuss congestion control algorithm proposals which try to improve the performance of TCP flows running in wireless networks, where it is common to have high packet losses (e.g., random losses due to wireless interference). In Section VI we review several proposed solutions that have attracted the most research interest over the recent past. These proposals aim to solve the problem of poor utilization of high-speed and long-delay network channels by standard TCP flows. They introduce several direct and indirect approaches to more aggressive network probing. The indirect approaches combine various loss-based and delay-based ideas to create congestion control approaches that try to be aggressive enough when there are enough network resources, yet remain gentle

AFANASYEV et al.: HOST-TO-HOST CONGESTION CONTROL FOR TCP

3

Effective load

RFC 793

Capacity

Tahoe DUAL Reno FACK NewReno Vegas+ Veno Vegas A Vegas

Offered load
Fig. 3. Effective TCP load versus offered load from TCP senders
400%

A

Router

400%

B

Reactive (loss-based)

Proactive (delay-based)

Fig. 5. Evolutionary graph of TCP variants that solve the congestion collapse problem
75% of packets rejected on both input and output paths

Fig. 4. Congestion collapse rationale. 75% of data packets dropped on forward path and 75% of ACKs dropped on reverse: only 6.25% of packets are acknowledged

when all resources are utilized. Finally, we present opportunities for the future research in Section VII and conclude our survey in Section VIII. II. C ONGESTION C OLLAPSE The initial TCP standard has a serious drawback: it lacks any means to adjust the transmission rate to the state of the network. When there are many users and user demands for shared network resources, the aggregate rate of all TCP senders sharing the same network can easily exceed (and in practice do exceed) the capacity of the network. It is commonly known in the flow-control world that if the offered load in an uncontrolled distributed sharing system (e.g., road traffic) exceeds the total system capacity, the effective load will go to zero (collapses) as load increases [10] (Figure 3). With regard to TCP, the origins of this effect, known as a congestion collapse [11]–[13], can be illustrated using a simple example. Let us consider a router placed somewhere between networks A and B which generate excessive amounts of TCP traffic (Figure 4). Clearly, if the path from A to B is congested by 400% (4 times more than the router can deliver), at least 75% of all packets from network A will be dropped and at most 25% of data packets may result in ACKs. If the reverse path from B to A is also congested (also by 400%, for example), the chance that ACK packets get through is also 25%. In other words, only 25% of 25% (i.e., 6.25%) of the data packets sent from A to B will be acknowledged successfully. If we assume that each data packet requires its own acknowledgement (not a requirement for TCP, but serves to illustrate the point), then a 75% loss in each direction causes a 93.75% drop in throughput (goodput) of the TCPlike flow. Implementing cumulative ACKs help shift the bend of the curve in Figure 3, but cumulative ACK are not able to eliminate the sharp downward bend.

To resolve the congestion collapse problem, a number of solutions have been proposed. All of them share the same idea, namely of introducing a network-aware rate limiting mechanism alongside the receiver-driven flow control. For this purpose the congestion window concept was introduced: a TCP sender’s estimate of the number of data packets the network can accept for delivery without becoming congested. In the special case where the flow control limit (the socalled receiver window) is less than the congestion control limit (i.e., the congestion window), the former is considered a real bound for outstanding data packets. Although this is a formal definition of the real TCP rate bound, we will only consider the congestion window as a rate limiting factor, assuming that in most cases the processing rate of end-hosts is several orders of magnitude higher than the data transfer rate that the network can potentially offer. Additionally, we will compare different algorithms, focusing on the congestion window dynamics as a measure of the particular congestion control algorithm effectiveness. In the next section we will discuss basic congestion control algorithms that have been proposed to extend the TCP specification. As we shall see, these algorithms not only preserve the idea of treating the network as a black box but also provide a good precision level to detect congestion and prevent collapse. Table I gives a summary of features of the various algorithms. Additionally, Figure 5 shows the evolutionary graph of these algorithms. However, solving the congestion problem introduces new problems that lead to network channel underutilization. Here we focus primarily on the congestion problem itself and basic approaches to improve data transfer effectiveness. In the following sections other problems and solutions will be discussed. A. TCP Tahoe One of the earliest host-to-host solutions to solve the congestion problem in TCP flows has been proposed by Jacobson [14]. The solution is based on the original TCP specification (RFC 793 [1]) and includes a number of algorithms that can be divided into three groups. The first group tackles the problem

In other words.e. In other words. Fast Retransmit Queuing delay as a supplemental congestion prediction parameter for Congestion Avoidance Fast Recovery Fast Recovery resistant to multiple losses Extended information in feedback messages SACK-based loss recovery algorithm Mod1 S S S S P+S+R S Status Obsolete Standard Experimental Standard BSD2 >4. N for NetBSD of an erroneous retransmission timeout estimate (RTO).1. In addition. P = the protocol specification 2 S for Sun.6. These provide two slightly different distributed host-to-host mechanisms which allow a TCP sender to detect available network resources and adjust the transmission rate of the TCP flow to the detected limits. this detection is not fast enough. 2. The ACK ambiguity problem is resolved by Karn’s clamped retransmit backoff algorithm [26].4. the sender can retransmit lost data without waiting for the corresponding RTO event. Since it is practically impossible to distinguish between an ACK for an original and a retransmitted packet.4. If we require that TCP receivers immediately reply to all out-of-order data packets with reports of the last inorder packet (a duplicate ACK) [27]. Assuming the probability of random packet corruption during transmission is negligible ( 1%).6 >F2. This in-out packet balancing is called the packet conservation principle and is a core element. If this value is overestimated. The RTO. and performance of individual flows may severely degrade. The second group of algorithms enhances the detection of packet losses. in which β is a constant in range from 1.10 secondary for the Slow Start II-H 2000 NewReno. Clearly. instead of a step-like growth .4 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. Congestion Avoidance.3 to 2 [1] and SRT T is an exponentially averaged RTT value). F for FreeBSD.2 >F4 Bottleneck buffer utilization as a primary S Experimental > feedback for the Congestion Avoidance and 2. Importantly. detection of subsequent packet losses results in exponential RTO growth. Reno-type Congestion Avoidance and Fast S Experimental > Vegas Recovery increase/decrease coefficient 2..0 Mac 1988 RFC793 1992 Tahoe 1990 Tahoe 1999 Reno 1996 RFC793 1996 Reno.1. wasting shared network resources and worsening the overall congestion in the network.6 2.90 Standard > >10. when the value of the RTO is underestimated. can send at least the amount of data that has just been acknowledged. Thus the sender. SACK 1995 Reno > >95/NT 1. In the Slow Start algorithm. significantly reducing the total number of retransmissions and helping stabilize the network state. both of Slow Start and of Congestion Avoidance. the error detection mechanism may perform unnecessary retransmissions. the ACKed packet has left and a new one can enter the network). the minimum time when loss can be detected is the RTT—i. by definition. the algorithm calculates an RTT variation estimate to establish a fine-grained upper bound for the RTO (SRT T + 4 · rttvar).2. reception of an ACK packet is considered an invitation to send double the amount of data that has been acknowledged by the ACK packet (multiplicative increase policy).3 >F2. [17] TCP NewReno [18]. during severe congestion..e. Reno/Vegas Congestion Avoidance mode S Experimental Vegas switching based of RTT dynamics II-I 2002 NewReno. the sender can treat all detected packet losses as congestion indicators. The round-trip variance estimation (rttvar) algorithm tries to mitigate the overestimation problem. In the opposite case. R = the receiver reactions. the loss can be detected by the Fast Retransmit algorithm [28]. The original TCP specification defines the RTO as the only loss detection mechanism. [19] TCP SACK [20] TCP FACK [21] TCP-Vegas [22] TCP-Vegas+ [23] TCP-Veno [24] TCP-Vegas A [25] II-A II-B II-C II-D II-E II-F II-G Base Added/Changed Modes or Features Slow Start.6. the duplicate ACKs can be considered a reliable loss indicator. reasonably sure it will not cause congestion. The third and most important group includes the Slow Start and Congestion Avoidance algorithms. is greater than RTT.18 adaptation based on bottleneck buffer state estimation II-J 2005 Vegas Adaptive bottleneck buffer state aware S Experimental Congestion Avoidance 1 TCP specification modification: S = the sender reactions. the TCP packet loss detection mechanism becomes very conservative.1. Although it is sufficient to reliably detect all losses.1R Experimental >N1.3. The exponential retransmit timer backoff algorithm solves the underestimation problem by doubling the RTO value on each retransmission event.1 >2.36 (opt) Standard >S2. and thus it has no impact on the RTO estimate. assuming the probability of packet reordering and duplication in the network is negligible. if the receiver is able to instantly detect and report a loss to the sender. RTO calculation is further complicated. the reception of any ACK packet is an indication that the network can accept and deliver at least one new packet (i.90 10.1. the RTT of a data packet that has been retransmitted is not used in calculation for the average RTT and RTT variance. In other words. the report will reach the sender exactly one RTT after sending the lost packet. almost within the RTT interval.92 >4. Having this new indicator. ACCEPTED FOR PUBLICATION TABLE I F EATURES OF TCP VARIANTS THAT SOLVE THE CONGESTION COLLAPSE PROBLEM TCP Variant Section Year TCP Tahoe [14] TCP-DUAL [15] TCP Reno [16]. > > 98 > >N1.3 Implementation Linux Win 1. Instead of a linear relationship between the RTO and estimated round-trip time (RTT) value (β · SRT T .

Graphs on Figure 7 show two cases of the congestion window dynamics: the left graph represents the case when the receiver cannot process at the receiving rate (i.AFANASYEV et al. a threshold parameter (ssthresh) is introduced. the congestion window is merely halved (multiplicative decrease policy). one) to ensure release of network resources. The word “slow” in the algorithm name makes reference to this difference. Outstanding data packets allowance dynamics as defined in RFC793 (network limits are not considered) Receiver limit Fig. the Slow Start phase is used. This is known as the Slow Start-Congestion Avoidance phase cycle (Figure 9). TCP algorithms should enforce fair resource sharing. And in contrast to restarting at one after a loss.g. For phase-switching purposes.. As long as the value of the congestion window is lower than the threshold parameter. The other algorithm of the third group is Congestion Avoidance. It is aimed at improving TCP effectiveness in networks with limited resources. 6.e. Figure 7. the network is experiencing congestion because all network resources have been utilized). the congestion window is reset to the initial value (e. We can define algorithm effectiveness as the ratio of the area below the congestion window graph (e. As can be seen in Figure 8. the congestion window increases by one only if all data packets have been successfully delivered during the last RTT (additive increase policy). in the long term. hatched area) to the area below the limit line (Figure 7. the effectiveness.. 9.g. during the persistent congestion state). The congestion window itself. This combines fast network resource discovery and long-term efficiency. The multiplicative decrease policy mimics such exponential behavior when several packets in succession are determined as lost (e. The implementation of TCP Tahoe includes both Slow Start and Congestion Avoidance algorithms as distinct operational phases.. this growth follows an exponential function on an RTT-defined scale (Figure 7). 7. where the network is a real transmission bottleneck. If a packet loss is detected (i. Congestion window dynamics of combined Slow-Start (SS) Congestion Avoidance (CA) Fig. Jacobson’s analysis [14] has shown that to achieve network decongestion.. 8. Due to the resource-sharing nature of IP networks. In comparison to the Slow Start.g. under “Network Limit” line).: HOST-TO-HOST CONGESTION CONTROL FOR TCP 5 Max # of outstanding packets Loss detection Congestion window Receiver limit (window) Maximum data transfer (same for similar graphs throughout paper) Network limit Transmission start up Time Time Fig. the Congestion Avoidance algorithm is quite effective in the long term. Congestion window dynamics and effectiveness of Slow-Start if limit is imposed by legacy flow control (left) and network (right) (Figure 6) in the number of outstanding packets (as given in the original specification [1]). of the Slow Start algorithm is very low. Chiu and Jain [29] developed a fairness measure F (the so-called Jain’s fairness index) as a function of network . This threshold determines the maximum size of the congestion window in the Slow Start phase.. and any detected packet loss adjusts the threshold to half of the current congestion window.e. i.. The tradeoff is a slow discovery of available network resources due to the conservative rate of the additive phase. and the right graph shows the congestion window dynamics when the network cannot deliver everything at the transmitted rate. Effectiveness is not the only important parameter of congestion control algorithms. Congestion window dynamics and effectiveness of Congestion Avoidance Congestion window Network limit Congestion window Receiver limit Congestion window Network limit Loss detection Network limit ssthresh Time SS SS CA SS Time Time Detected packet loss Fig. It is clear (observing the right graph in Figure 7) that where the available network resources are lower than limits imposed by the receiver.e. this algorithm is much more conservative in response to received ACK packets and to detection of packet losses. as in the Slow Start algorithm. the original assumption of TCP). is always reset to a minimum value upon loss detection. As opposed to doubling. Once the window is greater than the threshold. Congestion Avoidance is used. exponentially reducing network resource utilization by each individual flow is sufficient.

ACCEPTED FOR PUBLICATION Network Limit Network Limit e ar Network Share (flow2) sh Network Share (flow2) ir ) (fa al Eq u x3 Packet losses xn+1 xn x2 Eq u al (fa ir ) x1 x0 Network Share (flow1) x2 x0 Network Share (flow1) Fig. exactly one loss event is enough to equalize shares (similar to path x1 − x2 in Figure 10). If we assume that (a) the network share for each flow is directly proportional to its congestion window size. x1 − x2 . Then we can consider the minimal RTT value observed by the sender (RT Tmin ) as a good indication that the path is in a congestion-free state (left diagram in Figure 12). xn − xn+1 additive increase (both flows have the same increase rate of their congestion windows) x1 −x2 . . . leading to variability in packet losses. Wang and Crowcroft [15] presented TCP DUAL. which refines the Congestion Avoidance algorithm. . and (c) we can simultaneously detect packet losses (a so-called synchronized loss environment). Congestion Avoidance ensures a uniform congestion window increase by each flow from any initial state (45◦ slope of x1 − x2 and xn − xn+1 segments in Figure 11). If sh x1 ar Packet losses e . Because the reaction to a packet loss in TCP Tahoe is the same as in Slow Start (i. B. . The equal share line represents states when network resources are fairly distributed between flows and the network limit line when all network resources are consumed (either by one or both flows).e. and network buffer utilization. Slow Start and Congestion Avoidance exhibit good fairness (F → 1) under certain network conditions as follows. However. the initial dynamics can follow the Slow Start path (x0 − x1 in Figure 10) or the Congestion Avoidance path (x0 − x1 in Figure 11). . j) and tends to zero if a single flow usurps all network resources (limn→∞ F = 0).. . . This behavior induces significant periodic changes in sending rate. congestion window halving) as a reaction to a packet loss in Congestion Avoidance guarantees share equalization (fairness) in a finite number of steps. xn−1 −xn multiplicative decrease (a flow with the larger congestion window decreases more than a flow with the smaller) Fig. . 10. xn−1 − xn ) if two TCP flows started competing with each other from an initial state x0 under the ideal network conditions. . then the network share dynamics for each algorithm can be represented by the convergence diagrams in Figures 10 and 11. . this solution has an unpleasant drawback of straining the network with high-amplitude periodic phases. Depending on the values of the Slow Start thresholds. In Figure 10 the aggressive (multiplicative) congestion window increase in Slow Start favors the flow having a larger network share. DUAL tries to mitigate the oscillatory patterns in network dynamics by using a proactive congestion detection mechanism coupled with softer reactions to detected events. Obviously. the multiplicative decrease (i. it is enough that the flow having a larger network share decreases by a greater amount. A convergence diagram of TCP Tahoe can be represented as a combination of the Slow Start and Congestion Avoidance diagrams. Instead. round-trip time. . Congestion avoidance (AIMD) resources consumed by each user sharing the same path: n ( F = n· i fi ) n 2 fi2 i where n is the number of users sharing the path and fi is the network share of ith user. More precisely. reseting of the congestion window equalizes the network share of the flows that provides fairness of the network resource distribution in the future (the flows become locked-in between the states x2 and x3 ). the slope of x0 − x1 segment is proportional to the ratio between each flow’s share in state x0 . In fact. congestion window reset). This property eliminates the necessity of the congestion window equalization. or be a combination of both algorithms.6 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. . Let us assume that routes do not change during the transmission and that the receiver acknowledges each data packet immediately. . After detection of a packet loss. 11. This index ranges from 0 to 1. (b) both flows have equal RTT values. to provide fair network usage between flows. . These diagrams show how network resource proportions would change (paths x0 − x1 . it introduces the queuing delay as a prediction parameter of the network congestion state. TCP DUAL TCP Tahoe (Section II-A) has rendered a great service to the Internet community by solving the congestion collapse problem. More specifically. .. xn − xn+1 multiplicative increase (both flows have the same increase rate of their congestion windows) x1 − x2 equalization of the congestion window sizes Slow-Start Algorithm x0 − x1 . x0 − x1 . then Jain’s index can be considered a fairness measure for TCP flows. both flows reset their congestion windows (x1 − x2 ). If we assume that each user has only one TCP connection per particular network path.e. where 1 is achieved if and only if each flow has an equal (fair) share (fi = fj ∀i. . Let us consider two flows competing with each other on the same network path and with no other flows present.

Nonetheless.. a fraction of the maximum queuing delay (Qthresh = α · Qmax . which. The transition from State 1 to State 2 shows the core concept of optimistic network share reduction.. Therefore. and the reaction to the loss event can be more optimistic. if a DUAL flow is already transmitting data when a new DUAL flow appears. the delay threshold in DUAL is selected as half the maximum queuing delay (Qthresh = Qmax /2). For example. the optimistic reaction is to use the Fast Recovery algorithm [17]. where the first one acknowledges some new data and the rest are the exact copies of the first one (usually referred to as three duplicate ACKs). As we can see from theoretical congestion window dynamics of TCP DUAL (Figure 13).AFANASYEV et al. using the multiplica- Fig. The sender. If the network saturation point is estimated incorrectly. RT O minus RT T ) some major congestion event has prevented delivery of any data packets on the network. Correlation between RTT dynamics and congestion situation Loss detection Congestion window Reaching queuing delay threshold .. If this threshold is exceeded (Q > Qthresh ).g. when exceeded.. the effectiveness is greatly improved compared to Tahoe (i. a 1% packet loss rate can cause up to a 75% throughput degradation of a TCP flow running the Tahoe algorithm [16].e. observed RT Tmax is not the real maximum) network resources will be underutilized. The algorithm phases are illustrated in Figure 14. graphically. Jacobson [16] revised the original composition of Slow Start and Congestion Avoidance by introducing the concept of differentiating between major and minor congestion events. DUAL additionally maintains a maximum RTT value observed during the transmission (RT Tmax). there are a number of trade-offs. the flow cannot utilize the available network resources fairly and effectively. the sender stays in Fast Recovery until it receives a non-duplicate acknowledgment. In other words. the new flow can potentially capture a larger share of the network resources. A loss detection through the retransmission timeout indicates that for a certain time interval (as an example. where congestion window sizes (cwnd) in various states are denoted as the line segments above the State lines. To resolve this problem. and the arrows indicate the effective congestion window size— the amount of packets in transit. Thus. as occurs with TCP Tahoe (Section II-A). Quite a different state can be inferred from a loss detected by duplicate ACKs. the hatched area is proportionately larger). C.e. the network state can be considered to be lightly congested. CA: the Congestion Avoidance phase) we make one more assumption that an increase of the RTT can only occur due to increasing buffer utilization.e. To quantify the congestion level. the difference between the measured and the minimal RTT value (queuing delay Q = RT T − RT Tmin) can be viewed as an indicator of the congestion level in the path (right diagram in Figure 12). while the other flow will continue congestion window growth without noticing anything abnormal. threshold overestimation can potentially cause an unfair resource distribution between different TCP DUAL flows.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 7 Congestion -free TCP Sender Congested Queuing due to congestion (Q) Time K AC RTTmin TCP Receiver RTTmin + Q Fig. the congestion window decreases by 1/8th (i. to halve the congestion window) and to taper back network resource probing (holding all growth in the congestion window) until the error is recovered. the new flow will observe a higher RT Tmin value and overestimate the queuing AC K D a at D a at . The flow with the lower queuing threshold (the old flow) has a higher probability of predicting the congestion state and trigger congestion window reduction.. Network limit sshthresh Time SS SS CA delay threshold. The difference between maximum and minimum RTT values is considered a measure of the maximum congestion level (i. TCP Reno Reducing the congestion window to one packet as a reaction to packet loss. where 0 < α < 1) serves as a threshold. Suppose the sender has received four ACKs. The duplicate ACKs indicate that the some packets have failed to arrive. Finally. and the congestion estimation is performed once per RTT period based on the average RTT value (Q = RT Tavg − RT Tmin ). in some cases... lead to significant throughput degradation. For example. However. On the other side. On the one side. Congestion window dynamics of TCP DUAL (SS: the Slow Start phase. 13. indicates the congested network state. the sender should apply the conservative policy of resetting the congestion window to a minimal value. the maximum queuing delay Qmax = RT Tmax − RT Tmin). in addition to detecting the lost packet. is rather draconian and can.e. is also observing the ability of the network to deliver some data. In TCP Reno. presence of each ACK—including the duplicates—indicates the successful delivery of a data packet. in the case where the threshold is underestimated (e. Thus. The intention of Fast Recovery is to halve a flow’s network share (i. applied multiplicative decrease policy). In the proposal [15]. 12.

there is an open question about how well the DUAL algorithm performs in less ideal environments and when DUAL flows compete with other DUAL or Tahoe flows. First. but also by allowing data transfers during the recovery. Thus. the DUAL algorithm will normally outperform Reno. the overall effectiveness in the steady state is considerably improved by replacing Slow Start phases after each loss detection by typically shorter Fast Retransmit phases. At this point. Compared to the dynamics of TCP Tahoe (Figure 9). efficiency is improved not only by shortening the recovery period.. the non-duplicate ACK will acknowledge delivery of all data packets previously inferred by the duplicate ACKs previously received.9 (after the convergence—state xn+1 —network shares are distributed as 2:1 in favor of a Reno flow). With a finite number of steps. in a situation of higher packet losses. In the final stage (State 5). With high probability. Characteristic states of TCP Reno’s Fast Recovery Loss detection Congestion window Network limit tive decrease policy. if we want to maintain a constant number of packets in transit. But DUAL has several important drawbacks.. and as a result. Having substantial performance improvement compared to Tahoe.e. but also inflates the congestion window by the number of duplicate packets (see transition from State 2 to State 3 in Figure 14). FR: the Fast Recovery phase) equal reactions to packet loss detection lead to shifting the distribution of network resources to the Reno side. the performance of DUAL and Tahoe would be about the same. it would match the diagram for the Congestion Avoidance algorithm in Figure 11 exactly. Reno could outperform the both of them. one can easily calculate the Jain’s fairness index (see Section II-A). CA: the Congestion Avoidance phase. Without this increase. Clearly. the algorithm not only retransmits the oldest unacknowledged data packet (i.e. However. Second. TCP Reno remains fair to other TCP Reno flows (in terms defined in Section II-A). As we already know. in an ideal network environment with only one TCP flow present. The resulting theoretical congestion window dynamics in TCP Reno are presented in Figure 15.8 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. In fact. . which can lead to network resource underutilization or unfair distribution of network resources. To quantify fairness in this case. the delay characteristic is not always a true congestion indicator. After the reduction (i. Congestion window dynamics of TCP Reno (SS: the Slow Start phase. this value equals 0. an ACK indicates delivery of at least one data packet. from cwnd to cwnd/2). 15. However. 14. This can be considered an acceptable level for the transition period when the congestion control algorithm is changed from Tahoe to Reno at all network hosts. congestion window deflation to cwnd/2 (to the value just after entering recovery. a slightly worse situation can be observed when a TCP Reno flow competes with a Tahoe flow. This can be demonstrated using the convergence diagram in Figure 16. when a non-duplicate ACK is received. we have to inflate our congestion window to open a slot for sending new data (State 4 in Figure 14). and the amount of packets in transit can decrease more than expected. waiting for ACK cwnd cwnd/2 Buffered data Just before the loss detection Just after the loss detection “Inflating” cwnd by the number of dup ACKs Additional dup ACKs lead to additional cwnd “inflation” cwnd/2 After the successful recovery (cwnd “deflation”) State 2 cwnd/2+#dup State 3 cwnd/2+#dup State 4 State 5 Outstanding data which is not allowed to be retransmitted Amount of new data allowed to be sent by “deflated” congestion window Amount of successful delivered data inferred from dup ACKs Amount of packets in transit The congestion window size is a sum of these two elements Fig. In Figure 16. ACCEPTED FOR PUBLICATION ACKed data State 1 Sent data. new packets cannot be sent before the error is recovered. applies the Fast Retransmit algorithm). recovering from a single loss would usually occur within one RTT. If we try to build a convergence diagram. we want to resume Congestion Avoidance with half of the original congestion window. State 2 in Figure 14) is a simple and reliable way to ensure the target exit state from Fast Recovery. A comparison to TCP DUAL shows that. the system reaches a steady state in which the Reno flow has a larger share of network resources. Un- ssthresh SS FR CA Time Fig.

tend to give further favor to TCP Reno. where a single congestion event (e. However. That is.e. from those data packets outstanding at the moment of loss detection) have a high probability of being caused by a single congestion event. the desired optimistic reaction of Fast Recovery (i. In the remainder of this paper we will discuss a number of the most important TCP proposals which address these issues without deviating from the original host-to-host principle of TCP. . the second and third losses from the example above should be treated only as requests to retransmit data and not as congestion indicators. there are a wide range of network environments where Reno has inadequate performance. It solves the ambiguity of congestion events by restricting the exit from the recovery phase until all data packets from the initial congestion window are acknowledged. D. particularly when the retransmission timeout is triggered during the Fast Retransmit phase. Because of its simplicity and performance characteristics. and it does not utilize high-speed/long-delay network channels efficiently. and the sender therefore can be absolutely sure that during this interval all previously sent data will be either delivered or lost. and in the example above this does not hold true. Retransmission of the first lost packet and reception of the corresponding ACK will take exactly one RTT. the purpose of which is to reduce consumption of network resources in complex congestion situations. the first loss causes entry into the recovery phase and the halving of the congestion window. this exponential reaction to multiple losses is expected from the congestion control algorithm. . All packet losses from the original data bundle (i. . However. TCP NewReno One of the vulnerabilities of TCP Reno’s Fast Recovery algorithm manifests itself when multiple packet losses occur as part of a single congestion event. Before the new congestion window size becomes effective. and to ignore all duplicate ACKs that acknowledge lower sequence ir ) sh ar e . . flow Convergence diagram when Reno flow is competing with Tahoe Additionally. [19] introduce a simple refinement of Reno’s Fast Recovery.. But this expectation rests on the assumption that congestion states. At the same time. exit from the NewReno’s Fast Recovery can proceed only when the sender receives a new data ACK. The solution (“bugfix” in terms of [30]) is to remember the highest sequence number transmitted so far after each triggering of the retransmission timeout. doubts about DUAL’s fairness to other DUAL flows. 16. Therefore.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 9 Network Limit Network Share (Reno) Eq u al (fa xn+1 xn Packet losses x1 x0 Network Share (Tahoe) x0 − x1 . random packet losses. is the solution to most of the cases of unnecessary congestion window reduction. the NewReno algorithm adds a special state variable to remember the sequence number of the last data packet sent before entering the Fast Recovery state. A partial ACK confirms the recovery from only the first error and indicates more losses in the original bundle of packets. 4. This value helps to distinguish between partial and new data ACKs. the subsequent loss detections cause the congestion window to decrease further. . xn − xn+1 additive increase (both flows have the same increase rate of their congestion windows) x1 − x2 . This significantly decreases Reno’s performance in heavy load environments. we should not apply any additional rate reduction policies. All packets sent before the congestion window reduction are still in transit. Thus there is no reason for the sender to wait for additional signals before retransmitting the lost packet inferred from the partial ACK. Finally. reaction to partial ACK is only a deflation of the congestion window (State 4) and a retransmission of the next unacknowledged data packet (State 5). the congestion window halving) suddenly transforms into a conservative exponential congestion window decrease. xn−1 − xn Tahoe flow reset its congestion window but Reno flow only halves it Fig. Floyd et al.. it has severe performance degradation in the presence of consecutive packet losses. 6). Moreover. . The reception of a new data ACK means that all packets sent before the error detection were successfully delivered and any new loss would reflect a new congestion event. It is also unfair if competing flows have different RTTs. This can be interpreted as reducing the congestion window no more often than once per one-way propagation delay or approximately RT T /2. in some cases. a short burst of cross traffic) causes the loss of several data packets (indicated by x). [18]. using the same mechanisms of entering and exiting the recovery state. this data no longer consumes the network resources.e. Similar to the original Reno algorithm.g. which is accompanied by the full congestion window deflation (State 7 in Figure 18). as discussed in Section II-B. As we can see. Reno is generally the congestion control standard for TCP. Thus. More formally. In one sense. and reordering of packets.AFANASYEV et al. Partial ACKs can be sent by the receiver only if more than one packet is lost from the original packet bundle. The reception of any non-duplicate ACK would finish the recovery. reducing the congestion window does not guarantee the instant release of network resources. This problem is demonstrated in Figure 17. duplicate acknowledgments transfer their role as error indicators to the partial ACKs. are independent. . For example. That is. Notice that during the recovery phase. A partial ACK provides the exact information about some part of the delivered data. as deduced from each detected loss. Remembering the sequence number of the last data packet sent before entering the Fast Recovery phase and using this to distinguish between partial and new data ACKs. unnecessary congestion window reduction still may occur [30]. reception of any duplicate ACKs triggers only the inflation of the congestion window (States 3. Figure 18 illustrates the differences between Reno and NewReno. ..

The recovery process can be sped up if the sender retransmits several packets instead of a single one upon error detection. This solution optimistically resolves the ambiguity of duplicate ACKs that can indicate either lost or duplicate packets. However. if a second and a third data packet from some continuous TCP stream are lost. packets before are received Retransmission of the lost packet ( ). The performance problem in Reno’s Fast Recovery ACKed data State 1 cwnd/2 State 2 State 3 State 4 State 5 State 6 State 7 #dup+cwnd/2-ACK #dup+cwnd/2-ACK cwnd/2 #dup+cwnd/2 #dup+cwnd/2-ACK Sent data. The cwnd remains unchanged All dup ACKs only “inflates” the cwnd Exit recovery and deflate cwnd when non-duplicate ACK is received Packet losses due to the minor congestion event Non-duplicate ACK Detected packet loss (e. For this reason. 18. TCP SACK The problem with Reno’s Fast Recovery algorithm discussed in the Section II-D arises solely because the receiver can report limited information to the sender. Characteristic states of TCP NewReno’s Fast Recovery numbers. ACCEPTED FOR PUBLICATION ACKed data State 1 State 2 State 3 State 4 Sent data.e. waiting for ACK cwnd Buffered data Initial congestion window before loss detection Just after the loss detection Retransmission of the lost packet. NewReno modifies only the Fast Recovery algorithm by improving its response in the event of multiple losses.e. Meanwhile. A slightly more aggressive recovery procedure would allow a NewReno flow in some cases to obtain more network resources than a competing Reno flow.g. in the steady state performance and fairness characteristics are similar to the ones shown in Section II-C. this technique assumes certain patterns of packet losses and may . only one packet will be retransmitted after a loss detection—loss of the third data packet will be detected after another RTT at best.. a duration of the recovery in Reno is directly proportional to the number of packet losses and RTT. Moreover. according to the cumulative ACK policy.. waiting for ACK cwnd cwnd/2 cwnd/22 cwnd/23 Buffered data Just before 1st loss detection After exit from the recovery and just before 2nd loss detection After exit from the recovery and just before 3rd loss detection After exit from the last recovery phase Packet losses due to the minor congestion event Detected packet loss (e. But generally. The TCP specification [1] defines that the only feedback message be in the form of cumulative ACKs. the receiver. we consider NewReno to have the same fairness characteristics as Reno. Each dup ACK “inflates” the cwnd Partial ACK “deflates” the cwnd.. but it does not solve the fundamental problem of prolonged recovery. For example. i. 3 dup ACKs) Packet retransmission Fig. because Fast Recovery assumes loss of only one data packet—i. Thus. acknowledgments of only the last in-order delivered data packet.g. would reply to fourth and consecutive packets with duplicate acknowledgments of the first packet (Figure 19). E. 17. 3 dup ACKs) Packet retransmission Amount of successful delivered The congestion window size is a data inferred from dup ACKs sum of these two elements Amount of packets in transit Fig.10 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. the loss can be detected no sooner than after one RTT. Clearly. NewReno resolves Reno’s problem of excessive rate reducing in the presence of multiple losses. this imbalance only happens due to the inability of Reno itself to utilize the network resources under those network conditions effectively. This property limits the ability of the sender to detect more than one packet loss per RTT..

as well. the forward-most sequence number of all acknowledged data packets—no data packets with sequence number above F have been delivered AC K 1 st ta da t st ke la ac p da ta pa ck et . Another approach is the FACK (Forward Acknowledgments) congestion control algorithm [21]. if we apply the mechanism of limiting the congestion window reduction to no more than once per RTT period to the problem illustrated in Figure 17. Unfortunately the SACK mechanism has serious limitations in its current form. The inability of the receiver to quickly indicate all detected losses returns us to the original problem. [20] address the problem of limited information available in a cumulative ACK. As a solution. This information should at least include the time of the last retransmission in order to detect a loss using the legacy timeout method (RTO). which decrease the la st Time space for the sequence number pairs being included in SACK. 19. TCP FACK Although SACK (Section II-E) provides the receiver with extended reporting capabilities. Fig. which decreases SACK space to only 3 gaps of lost packets. The rate control part. whereby the congestion window is not multiplicatively reduced more than once per RTT. In some environments. if any. the Timestamp option [31] reduces the available space in the TCP header by 8 bytes. 4 packets being lost). they propose extending the TCP protocol by standardizing the selective acknowledgment (SACK) option. This option provides the ability for the receiver to report blocks of successfully delivered data packets. The interval between the first and last data packets sent before reception of any ACK is 2×RTT in the worst case just waste network resources if actual losses deviate from these patterns. the sender would be able to implement a simple algorithm to resolve the long recovery problem. and up to four pairs of 4-byte sequence numbers [20]).. In the worst case. preserving the value of the congestion window. the interval between the first and last data packets sent before reception of any ACK is exactly one RTT (Figure 20). this limit is exceeded just after the first 4 packets are received (thus. The rest of the losses in the original packet bundle would be reported within one RTT. It defines recovery procedures which. instead of implementing the NewReno algorithm. Using this information. SACK option duplicates Left edge – the first sequence number of the block. Left edge of nth block Right edge of nth block 11 RTT ACK packets 1 1 1 . We have informally discussed one possible extension of the Reno algorithm utilizing SACK information.. and thus will cause only retransmission of the lost packets. Although this worst-case situation is unlikely to happen in wired networks (since during congestion events consecutive packets are usually dropped). the pattern of packet losses may easily exceed this SACK limit. Mathis et al. the FACK sender is required to retain information about retransmitted data. when every other packet is lost. Thus. unlike the Fast Recovery algorithm of standard TCP (TCP Reno). The rationale behind this solution is that in the worst case.AFANASYEV et al. A simple calculation reveals that the SACK option can contain at most four blocks of data packets received in order (2 bytes to identify option and specify option length. will be reported to the sender within the next RTT. It provides a means for timely retransmission of lost data packets. TCP Receiver Left edge of 1st block Right edge of 1st block .. the FACK maintains three special state variables (Figure 22): (1) H. 20. has a direct means to calculate the number of outstanding data packets using information extracted from SACKs. If a receiver can provide information about several packet losses within a single feedback message. the highest sequence number of all sent data packets—all data packets with sequence number less than H have been sent at least once. 21. This observation shows that SACK is not a universal solution to the multiple loss problem.. Right edge – the sequence number immediately following the last sequence number of the block Fig. Moreover. The situation becomes aggravated if we want to use additional TCP options. Duplicate ACKs allow loss detection no sooner than after one RTT 2 RTTs TCP Sender AC K TCP Receiver 1 st RTT Fig. Because retransmitted data packets are reported as lost for at least one RTT and a loss cannot be instantly recovered. use additional information available in SACK to handle error recovery (flow control) and the number of outstanding packets (rate control) in two separate mechanisms. the first error detection should cause retransmission and shrink the congestion window. All loses. The TCP specification restricts the length of the option field to 40 bytes. F. The flow control part of the FACK algorithm uses selective ACKs to indicate losses. random losses in wireless networks can show patterns approximating the worst case.: HOST-TO-HOST CONGESTION CONTROL FOR TCP TCP Sender Data packets 1 2 3 4 5 . For example.. Instead of the congestion window inflation technique. Reno’s problem discussed in Section II-D can be solved by restricting the congestion window reduction to no more than once per RTT period. (2) F . TCP senders can easily calculate blocks of lost packets (gaps in sequence numbers) and quickly retransmit them (Figure 21).. unlike Reno’s and NewReno’s Fast Recovery algorithms. it does not define any particular congestion control algorithms.

this estimate is based on RTT measurements. once every RTT Vegas checks the difference Δ between the expected rate (small circles in Figure 24) and the actual rate (solid line in Figure 24). the congestion window is decreased by one. less than 2). the problem discussed in Section II-B (i. The actual rate (bold solid line in Figure 23) can be expressed as the ratio between the current congestion window and the current RTT value. This estimate can be utilized by the sender to decide whether or not to send a new portion of data. NewReno. In other words. fairness of the DUAL algorithm is questionable. but an absolute number of packets enqueued at the bottleneck router as a function of the expected and actual transmission rate (Figure 23). the steady-state characteristics of effectiveness and fairness are exactly the same as for Reno (see Section II-C).1. This rate can occur if all transmitted data packets are successfully acknowledged within the minimal RTT (i. and II-F improve various aspects of Tahoe. More formally. Unlike DUAL. TCP DUAL (Section II-B) makes an attempt to provide a proactive method of quantifying the congestion level before an actual congestion event occurs using an estimate of queuing delay.. Reno and NewReno congestion controls. > cwnd0 /RT Tmin ) will fail.e. G. That is.. induced periodic changes in sending rate. etc. If no packets are dropped in the network. In fact. But the solution only mitigates the oscillatory patterns of network parameters (RTT. data can be sent when the calculated number of outstanding data packets is under an allowed limit (the congestion window).e. The key component is making an estimate of the used buffer size at the bottleneck router. according to Linux implementation. no loss. Thus. RT T and RT Tmin : Δ = cwnd × RT T − RT Tmin RT T Vegas incorporates this Δ measure into the Congestion Avoidance phase to control the sender’s window of allowed outstanding data packets (see beneath “Congestion Avoidance. it is increased by one.. However. However. However. 22. as mentioned in Section II-B. the system is considered to be in a steady state and no modifications to the congestion window are applied. also applies to Reno. Special state variables in the FACK algorithm (acknowledged).g. due to the finite capacity of the path.). a larger RTT is due to increased queuing in the transmission path. otherwise. Reactions to packet losses are defined . The minimal RTT value observed during the connection lifetime is considered a baseline measurement indicating a congestion-free network state (analogous to Figure 12). Because FACK modifies only reactions in the recovery phase. Δ can be expressed as a function of the congestion window size. no congestion). Vegas controls the congestion window using an additive increase and additive decrease (AIAD) policy. If Δ is more than the predefined threshold β (e. the number of packets enqueued during the last RTT is the difference Δ between the current congestion window and the inflection point in our graph. Simulation results [21] confirm that FACK has a much faster recovery time from errors than Reno or NewReno. Vegas tries Fig. round-trip time. Assuming that RT Tmin is constant. to mitigate the effects of network parameter fluctuations and to provide system stabilization.e. these variants of TCP bring about packet losses because their algorithms can grow packet transmission rates to the point of network congestion. and FACK congestion control algorithms. If Δ is between α and β. In other words. and all attempts to send at a faster rates (i.12 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. Similar to the DUAL algorithm.92 version. the number of retransmitted packets. The expected rate (dashed line in Figure 23) is a theoretical rate of a TCP flow in a congestion-free network state. etc. more than 4).. and (3) R. thus we have Δ = cwnd − cwnd0 . each of them detects that the network is congested only if some packets are lost. 23. we can always find a point cwnd0 on the graph when the actual rate is numerically equal to the expected rate. II-D. TCP Vegas The approaches discussed in Sections II-C. packet losses. Moreover. That is. According to our assumptions.. network buffer utilization. the proposed algorithm defines a control dead-zone (hatched area in Figure 24) using additional threshold α. all of them share the same reactive method of rate adaptation. the congestion window increase is allowed only if Δ is strictly less than α (e.g. this excess of data packets is the only cause of a corresponding RTT increase. Therefore. TCP Vegas—the utilized buffer size Δ as function of expected and actual rate to quantify. Moreover. Clearly. buffer utilization.) and never fully eliminates them. Brakmo and Peterson [22] proposed the Vegas algorithm as another proactive method to replace the reactive Congestion Avoidance algorithm. the expected rate is directly proportional to the size of the congestion window with a proportionality coefficient of 1/RT Tmin. ACCEPTED FOR PUBLICATION Retransmitted (R) Forward-most received (F) Send buffer Highest sent (H) Transmission rate cwnd/RTTmin – expected rate cwnd/RTT – actual rate Rate when RTT is min / RTTmin Network limit (RTT grows as cwnd grows) Acknowledged data packets Sent and not yet ACKed data packets Lost data packets cwnd0 1/RTTmin cwnd Congestion window Fig. not a relative. advantages of FACK have long been widely recognized and FACK has been an embedded part of the Linux kernel since the 2.” Figure 24). The simple relation H − F + R provides a reliable estimate (in the sense of robustness to ACK losses) of outstanding data packets in the network.

: HOST-TO-HOST CONGESTION CONTROL FOR TCP Slow-Start Congestion Avoidance 13 Congestion window Network Share (Reno) Zone of zero buffering x4 x1 x0 Eq u al Network Limit Zero Buffering x5 Time Expected rate calculations (in Slow-Start every other RTT. NewReno. Return to the moderate mode occurs only when C becomes zero.. the RTT also should be stable). large values of C indicate a Vegas-unfriendly network state (i. That is. newcomers get a bigger share) due to inaccurate RT Tmin estimates.e. The Vegas-friendliness/unfriendliness detection heuristic is based on a trend estimate of the RTT. [25]. including the inability of Vegas to get a fair share when competing with aggressive TCP Reno-style flows (a reactive approach is always more aggressive). later research [23].g. under some network conditions (like presence of a fluctuated congestion-unresponsive traffic or rerouting in the path) Vegas+ can unnecessarily transition to the aggressive mode and (fa ir ) Control dead-zone (lower and upper thresholds) x3 x2 Packet losses sh a re . While there is no buffering on the path. or FACK). This period is required in order to employ the bottleneck buffer estimation technique. In the opposite case. available Linux implementations do not perform any changes to the original Slow Start algorithm and implement only the modified Congestion Avoidance phase. both flows slowly increase their share of network resources (x0 –x1 ). C is divided in half. in the case of multipath routing) and has a bias to new streams (i. Moreover.AFANASYEV et al. Vegas rules). Transition to the aggressive mode is triggered when C exceeds a predefined threshold. More specifically. the Congestion Avoidance phase of Vegas+ initially assumes a Vegas-friendly network environment and employs bottleneck buffer estimation to control the congestion window (i. TCP Vegas – congestion window dynamics and corresponding estimates of bottleneck buffer size Δ Network Share (Vegas) Fig. Additionally. the updated algorithm restricts the congestion window to increase every other RTT (see beneath “SlowStart. H. opposite reactions of the two different congestion control algorithms (one growing. despite this and other advantages.. Vegas revises the Slow Start algorithm by slowing-down the opportunistic network resource probing. if the estimated RTT grows smaller. Buffers are purged only when the Reno flow detects a packet loss (x3 ).” Figure 24). Δ becomes larger than α). It also underestimates available network resources in some environments (e..e. the Slow Start algorithm terminates and transfers control to the Vegas Congestion Avoidance algorithm. the convergence dynamics will start looping along the path x4 –x5 –x2 –x3 –x4 . Although the Slow Start modification was designed to reduce network stress. experimental results [22] show almost no measurable impact. The main reason for this is the negligible working time of the Slow Start phase compared to the Congestion Avoidance phase. Clearly. Unfortunately. leading to the proactive algorithm being completely pinched off (x2 ). In the example of Figure 25. which can significantly improve the overall throughput of a TCP flow. TCP Vegas has the amazing property of rate stabilization in a steady state. if the congestion window is stable. Congestion Avoidance falls back to the Reno algorithm. the reactive congestion control elements of Vegas+ practically nullify the inherited advantages of Vegas. In particular. [32] discovered a number of issues. one diminishing) maintain the fixed buffering level in the network. allowing the Vegas+ flow to obtain its fair share of network resources. The special state variable C is increased if the sender estimates an increase in the RTT and concurrently the size of the congestion window is unchanged or even reduced.. The Vegas+ solution does not try to solve the fundamental problems of Vegas discussed in Section II-G. For this purpose.. This point can be illustrated using an idealized convergence diagram for competition between Reno and Vegas flows (Figure 25). TCP Vegas+ Hasegawa et al. Excessive buffering forces Vegas to decrease its congestion window. As soon as Vegas detects increasing queues in bottleneck routers (i. Vegas+ borrows from both the reactive (Reno-like aggressive) and proactive (Vegas-like moderate) congestion avoidance approaches.e. The Vegas proactive congestion-prevention mechanism (limiting buffering in the path) cannot effectively compete with the highly deployed Reno reactive mechanism (inducing network buffering and buffer overflowing). and (2) a packet loss detected by the retransmission timer reduces C to zero. continues acquiring more network resources (x1 –x2 ). [23] have recognized a serious problem in TCP Vegas which prevents any attempts to deploy it. 25.e. TCP Vegas+ was proposed as a way to provide a way of incremental Vegas deployment. At the moment when an internal heuristic detects a Vegas-unfriendly environment. 24. After that. As we can see from Figure 24. Vegas+ additionally defines two special cases for modifying the state variable: (1) entering Fast Recovery. Additionally. but the Reno flow. unaware of this buffering. C is decreased. In practice. in Congestion Avoidance every RTT) Fig. Convergence diagram when an ideal Vegas flow is competing with a Reno flow by any of the standard congestion control algorithms (either Reno. the unfriendliness will be easily detected during the transition from x1 to x2 and Reno’s rules will be enforced.

under certain circumstances. for example a DSL link) and another for a high-RTT/high-rate (2. reducing the congestion window upon entering Fast Recovery is modified to halve the cwnd value only if the buffer estimate also indicates congestion. The key idea is to use the Vegas bottleneck buffer estimation technique to perform early detection of the congestion state. for example. this buffer estimation is used only to adjust the increase/decrease coefficient of the Reno congestion control algorithm. the price for this is additional latency to discover network resources. According to simulation results [25]. TCP Vegas—estimation error if the path has been rerouted Fig. if the estimate is showing congestion-free state Δ < α. the effectiveness of the Veno algorithm is slightly improved in comparison to Reno. boundaries are shifted downward every time the estimate shows the actual congestion. Therefore we can consider it to have the same characteristics as the base Reno algorithm. the long-lived flow has more chances to observe the true minimal RTT. TCP Veno Fu and Liew [24] propose a modification to the Reno congestion control algorithm (Section II-C) aimed at improving the throughput utilization of TCP. 27. it will be reduced to 80% of its current size. However. having a congestion-free state. it assumes a path change and shifts the boundaries of the control dead-zone upward (α = α + 1 and β = β + 1). which extends the original Vegas congestion control with an adaptable mechanism. TCP Vegas A Besides the inability to compete with Reno flows effectively (see Section II-H).14 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. the algorithm will wrongly calculate buffering Δ2 and may exceed a threshold. Reduced increase rate Time cwnd Congestion window Fig. compared to the new flow. ACCEPTED FOR PUBLICATION Transmission rate Loss detection Congestion window High buffering zone ( > ) RTTmin1 Ratemax 2 RTTmin2 Ratemax1 1 2 A. 26. It preserves the Vegas properties of . (2) if the actual rate has increased and the estimate is showing no congestion (Δ < α). Otherwise.e. the distribution of network resources favors the new flow. Δ > β). one that has been transmitting data for a long time and the other which has just started transmitting. First.. The minimal RTT for the lowRTT/low-rate link will be erroneously used as a baseline in calculating the expected rate for the high-RTT/high-rate link. or (3) the actual rate has decreased while the flow is in a steady state (α < Δ < β). Srithith et al. Unlike Vegas. An increase is allowed in three cases: (1) if the estimate shows no congestion and a lower threshold α has a minimal value (the original Vegas rule). The boundaries are shifted downward if some network anomaly is detected.e. for example a satellite link) path. To summarize. Figure 27 presents two curves. it limits the increase of the congestion window during the congestion avoidance phase if the Vegas buffer estimate shows excessive buffer utilization (i. VegasA adds additional conditions to the congestion window management algorithm. The Veno (VEgas and reNO) algorithm defines two modifications. The Veno modification has practically no effect on fairness. Congestion window dynamics of TCP Veno stay there indefinitely (assuming that probability of the loss detected by the retransmission timer is very low). if only a loss is detected. To illustrate. That is. Vegas can inappropriately choke off the flow rate to nearly zero. leading to the reduced flow rate. Second. if the Vegas estimate indicates a congestion state. J. TCP Vegas has a number of other internal problems [25]. That is. one for a lowRTT/low-rate (1. Let us consider a situation with two Vegas flows. if the RTT increases due to a routing change. Additionally. Naturally. In other words. “B” in Figure 26). yet the rate in actuality has decreased. the congestion window will be halved. As a consequence. The threshold coefficients α and β from the Vegas algorithm are adjusted depending on the steady state dynamics of the actual transmission rate. Besides the threshold adaptability. [25] have presented the VegasA (Vegas with Adaptation) algorithm. α < Δ < β). This happens because the assumption that the RTT will change only due to buffering is not entirely true. and thus does not inherit Vegas’ problems.. Veno flow tends to stay longer in the congestion avoidance state with larger congestion window values. if VegasA detects an increase in the actual bandwidth while the system is in a stable state (i. I. the estimate indicates congestion. The difference between the minimum RTTs that the two flows observe causes a difference in congestion state estimates (Figure 28): the old flow thinks that the network is congested while the new one estimated a congestion-free network state. In fact. in the event of detecting a loss and Δ > β. VegasA has substantial improvements in various aspects when compared to the original Vegas design. Normal increase rate B. the algorithm will make a wrong decision. A decrease should occur if either the network has been determined to be in a congestion state (Δ > β) or if the actual flow rate has decreased and the network has been determined to be congestion-free. For instance. That is. Another assumption that surfaces occasionally and is incorrect is that all flows competing along the same path will observe the same RT Tmin . the sender starts probing network resources very conservatively (increasing by one for every two RTT. If a route changes from 1 to 2 when the congestion window size is equal to cwnd.

6.g. in the absence of the congestion or packet losses. if the sender sees several ACKs carrying the same sequence numbers (duplicate ACKs). This assumption has allowed the algorithms to create a simple loss detection mechanism without any need to modify the existing TCP specification [1]. 28. [38]). III. a majority of the reordering events will be hidden from the sender. three) [17]. the absence of reordering guarantees that an out-of-order delivery occurs only if some packet has been lost. But packets can also be reordered in some networks as a side effect of a normal delivery process. At the same time. these proposals have fundamental differences due to a range of acceptable degrees of packet reordering. [28]. This means that we cannot consider a single duplicate ACK (i. infer a congestion state. Moreover. ACK for an already ACKed data packet) as a loss detection mechanism with high reliability. a VegasA flow can compete with Reno flows and acquire its resource share. each event of packet reordering triggers at least one (and probably more) duplicate ACKs. This highlights the problem of potentially overpenalizing a TCP flow if its congestion control mechanism employs loss detection using duplicate ACKs. with peaks in some traces as high as 36%. there are two observations about outof-order packet delivery.1%– 2% on average). retransmit data or reduce transmission rate needlessly) if the network does in fact reorder packets.g. To some extent. resistance to random losses. However. the receiver will ACK the packet sequence 5. Of course in reality. For example. The development of these proposals is highlighted in Figure 29. According to Paxson’s work [33]. Measurements identified a low level of reorderings (0. A. which can be considered an indication of congested pathways and a guide for reducing the transmission rate. the advantage of this solution is.g. Evolutionary graph of TCP variants that solve the packet reordering problem Fig..6. VegasA do not address many other problem that are discussed in the next sections of this survey (such as the scalability issue in high-speed networks. such as bugs. [18].: HOST-TO-HOST CONGESTION CONTROL FOR TCP 15 Transmission rate RTTmin1 RTTmin2 Reactive (loss-based) TD-FR DOOR RR Eifel PR w Ol d Ratemax Ne wf low flo cwnd Congestion No congestion Congestion window Fig. these were only testing environments. If a receiver does not respond immediately to out-of-order data packets with duplicate ACKs. All of these solutions share the following ideas: (a) they allow nonzero probability of packet reordering.. PACKET R EORDERING All the congestion control algorithms discussed in the previous section share the same assumption that the network generally does not reorder packets.. Packet reordering can stem from various causes. In this section we present a number of proposed TCP modifications that try to eliminate or mitigate reordering effects on TCP flow performance (Table II). packets can be reordered if a router enforces diverse packets handling services (differentiated services [35].g. misconfigurations. and reduce the sending rate). Paxson [33] proposed a simple way to eliminate the penalties of reordering through TD-FR. To solve this problem of a false loss detection. etc. For example. at the same . [34]. by 8–20 msec.7. However. Finally. In the idealized case. the algorithm has not been evaluated in real networks. the sender will overreact (e. depending on the reordering pattern). Also if the network provides some level of delivery guarantees (e. Thus. but postpones the action (e.7. However.12. Paxson made the most interesting observation: the data transfers having the highest degrees of reordering also experienced almost no packet losses. a solution employed as a rule of thumb establishes a threshold value for the minimal number of duplicate ACKs required to trigger a packet loss detection (e. packets are reordered [33]. the underlying layer (physical or link layer) can retransmit some portion of the data without TCP’s prompting and cause a shuffling of the upper layer packets. there is a clear conflict with this approach. or malfunctions.10. time delayed Fast Recovery.g..7). even if received out of order [27].7. and different baseline congestion control approaches. [36]) and internally reschedules packets in its queue (active queue management [37].e. channel bundling and packet processing parallelism will likely contribute a good portion of the future Internet [39]–[41]. it can be sure that the network has failed to deliver some data and can act accordingly (e. wireless networks). Loss detection will be unnecessarily delayed if the network does not reorder packets.. it can be erroneous software or hardware behavior. TD-FR A number of measurements conducted in the mid 1990s [33] proved the presence of out-of-order packet delivery in the Internet. from moderate in TD-FR to extreme in TCP PR. The standard already requires receivers to report the sequence number of the last in-order delivered data packet each time a packet is received..11. 29. and (b) they can detect out-of-order events and respond with an increase in flow rate (optimistic reaction). TCP Vegas—estimation error if a new flow is observing a higher RT Tmin stabilizing throughput in a steady state and does not suffer significantly in the long term from changes in path RTT.AFANASYEV et al. this effect is not uniformly distributed across network sites. At the same time. That is. Moreover. For example. in response to a data packet sequence 5. Nonetheless.7. retransmit lost packet.).

This ability to protect TCP transfer from the packet reorderings in Internet paths. F for FreeBSD ∗ optional or available in patch form With Eifel (improved robustness) time. In other words. [48] III-A III-B III-C III-D III-E III-F 1 Base Added/Changed Modes or Features Time delayed fast recovery Mod1 R Status Experimental Implementation BSD2 Linux Sim ns2 ns2∗ ns2 1997 Reno 2000 NewReno 2002 NewReno 2003 NewReno 2000 SACK 2002 DSACK Differentiation between transmitted and S or Standard i3. To reach the right decision. 3) of duplicate ACKs. no S Experimental reaction to DUP ACKs Reporting duplicate segments R Standard >2. which feature route changes with high probability and thus are highly penalized by the conventional congestion control algorithms. Instead of the TD-FR approach of introducing additional delay to the loss detection process based on duplicate ACKs (Section III-A).10∗ retransmitted data packets (S+R+P) Out-of-order detection and feedback. when some packets are reordered.g. if the TCP sender receives a number (e.0 Duplicate ACK threshold adaptation S Experimental ns2 TCP specification modification: S = the sender reactions. It does not try to guess the event type upon reception of the first duplicate. The advantage of the Eifel algorithm is clearly visible in Figure 30. due to receiving several duplicate ACKs). If either ACKs carry additional information to indicate not only a sequence number. The latter case is the easiest and most “cost-effective” way. which unfortunately are not implemented in Paxson’s solution. B. Eifel tries to distinguish reordering and real loss events.4. During route changes many packets can be lost. If the delay grows too big. Comparison of congestion window dynamics between Reno (NewReno) and Eifel the ACK itself has been triggered by a retransmitted data packet. the original sending rate will be restored very quickly.16 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. it enters Fast Recovery. Although theoretically possible. causing choice of spelling. the sender maintains an additional state variable (a time of the first retransmission) for each retransmitted data packet. temporary S+R+P Experimental congestion control disabling and instant recovery Fine-grained retransmission timeouts. P = the protocol specification 2 i for BSDi. but rather postpones the decision until the first non-duplicate ACK is received. or if the ACKs can indicate that 2 Authors Loss detection Congestion window Reordering detection Time Original Reno w/o Eifel Fig. ACCEPTED FOR PUBLICATION TABLE II F EATURES OF TCP VARIANTS THAT SOLVE THE PACKET REORDERING PROBLEM TCP Variant Section Year TD-FR [33] Eifel [42]. achieved with relative simplicity. a disadvantage as well. Eifel checks its content and makes a decision whether to continue Fast Recovery or abort recovery and restore the original congestion window value. After receiving the first non-duplicate ACK.2. as in NewReno. 30. TCP DOOR Wang and Zhang [44] were concerned with TCP performance in mobile ad hoc networks (MANETs).. Eifel must resolve the ambiguity of a retransmission [26]. the sender does not know whether the retransmission helped resolve the problem or whether the problem resolved itself (as in case of a long burst of reordered packets). In this case. the nondeterministic nature of the reordering effect demands some path adaptation mechanisms.0. To clarify. On the other hand. each ACK packet will explicitly indicate what we need. If a received non-duplicate ACK has a timestamp less than a corresponding state variable. On the one hand. after a mountain range in western Germany . R = the receiver reactions. Clearly. We could instead use a standardized and highly deployed TCP Timestamp option [31]. The artificial delay. the defined actions of Eifel do not affect normal operations of the base congestion control algorithm when there is no packet reordering. aimed at preventing overreaction. where one bit is used to indicate retransmission of a data packet and the other one to echo this information back to the sender in an ACK. When a non-duplicate ACK is received. the “fast” lossdetection mechanism becomes slower than a conventional loss detection based on RTO. F∗ 2.. the ambiguity problem is easily resolved. Having the Timestamp option. [43] TCP DOOR [44] TCP PR [45] DSACK [46] RR-TCP [47]. we can assign two bits from the unused space in the TCP header. a change in the TCP protocol is highly undesirable. the sender can be sure that no actual losses have occurred on the path and transmission should be returned to the original state. let us consider a situation where a TCP sender decides to retransmit a data packet (e. Eifel Algorithm Ludwig and Katz [42] introduced the Eifel2 algorithm as an alternative method to alleviate the negative effects of packet reordering in TCP throughput. allowed Eifel to become an RFC standard in 2005 [43].g. but also some identification of the actual transmission. C. as it makes deployment practically impossible. adds to the time required to detect actual losses. For example.

During a route change event it is very probable that the order of IP packets will be changed. This idea underlies the proposed TCP DOOR (Detection of Out-ofOrder and Response). cwnd × α cwnd is α). For example. Though this is not entirely true in all network environments.g. To resolve this. if the network reorders or replicates data packets. Comparison of congestion window dynamics between Reno (or SACK) and DOOR they focus on making the retransmission timeout a robust and reliable loss and congestion indicator in a wide range of network environments. there are two possibilities of duplication. either in the TCP header or in a new TCP option field. a DSACKcompliant receiver should include a range of sequence numbers in the first block of the SACK option (Figure 32a). When a packet is lost. In that way. 31. This action alleviates previous penalties from the detected rerouting event. If we assume that an algorithm based on finegrained timeouts is as robust as one based on duplicate ACKs. the problem to identify a route change can be replaced by identifying an outof-order packet delivery. the receiver can easily detect reordering and report it to the sender using some bit. [45] noticed that since packet reordering is a common event in the network (e. TCP PR Bohacek et al. for example. This can happen. The DSACK (Duplicate Selective ACKnowledgements) specification [46] complements the standard and provides a backward-compatible way to report such duplicates. the duplicated data can be some part of the acknowledged continuous data stream. DSACK The specification of the selective ACK extension for TCP [20] does not define particular actions to take if a receiver encounters a data packet which has already been delivered. the maximum estimate M is readjusted on each ACK arrival according to the formula: M = β · max α cwnd · M. TCP PR can greatly help improve TCP efficacy in those cases. Wang and Zhang conducted a number of simulations where they varied underlying routing protocols. DSACK requires the receiver to report each receipt of a duplicate packet to the sender. this information can be included in a new TCP option in the form of a special counter. the fairness and effectiveness characteristics will be exactly the same as presented in Section II-C. E. Instead of the RTO recalculation once per RTT. we cannot directly compare TCP PR with the previous congestion control algorithms. β > 1). similar to Reno. Another variant considered in paper [44] is utilizing a well-known Timestamp option [31] in a manner similar to the standard Eifel algorithm. Although some of the results show more than a 50% throughput improvement compared to TCP with the SACK option.g. then we can eliminate the penalty in TCP throughput by temporarily disabling the congestion control actions during this interval. and network conditions.e. Thus. The interval for the temporary congestion control disabling and the preceding time period for the instant recovery are not known a priori and depend on the actual network. 1 congestion control to make the wrong decision for reducing the rate of flow. RT T where α and β are constants (0 < α < 1. there are networks (e. In this case. the receiver should attach the isolated block at the second position in the SACK option (Figure 32b). where the network normally reorders packets. the original state (the congestion window and retransmission timeout values) should revert (so called instant recovery). As long as timeouts are treated optimistically (i. Because of different loss detection mechanisms.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 17 Loss detection Congestion window Reordering detection and instant recovery With DOOR (improved robustness) Original Reno w/o DOOR T2 T1 Time Fig. Similar to Eifel (Section III-B).. Taking into account that a recalculation is made with each ACK. Thus.AFANASYEV et al. First.e. the authors no longer assume the validity of inferring something from duplicate ACKs. D. The concept of RTT maximum is similar to the RTO. In contrast to previously developed congestion controls.. in mobile ad hoc networks). TCP PR maintains a timestamp for each transmitted data packet. α represents a maximal decrease rate of the M in RTT timescale 1 (i. . after a loss detection. or if the sender wrongly estimates the retransmission timeout. it can be a part of some isolated block. A loss is detected whenever the timestamp of a data packet becomes older than the estimated RTT maximum (M ). First. duplicate ACKs cannot be considered reliable indications of either loss in the path or of congestion. where duplicate ACKs are highly unreliable feedback. MANETs). Instead. However.. if congestion control has recently reduced the sending rate due to loss detection (during time interval T2 in Figure 31)..g. In the former case. in order to detect packet reordering reliably. Second. each data and ACK packet should carry some additional information. In TCP PR (Persistent Reordering). timing values. the congestion window is reduced by no more than half of the stored value for the lost packet. which should be treated in slightly different ways. In the latter case. Second. If we can identify a time interval during which the network route has changed. e. but differs in implementation.. each transmitted packet is also tagged with the current value of the congestion window. besides including a duplicate range in the first block. A reaction to detecting packet reordering (which can be considered an equivalent to route change in MANETs) entails two components (Figure 31). congestion control should be temporarily disabled to mitigate transition effects (time period T1 in Figure 31). flow is allowed to transmit at a multiplicatively reduced rate). which is increased every time a new packet is sent. TCP PR faces the problem of overreaction to multiple losses from the same congestion event. there are cases with minimal to zero improvement.

we can adjust a threshold of duplicate ACKs (dupthresh. not lost. dup_start dup_end . which share the idea of providing a one-level. WEB browsing. However. An approach presented in RR TCP (Reordering Robust) [47]. the overall user-perceived quality of service in the network (QoS) can be increased [49]. this would work only if no packets were retransmitted.. However. how long a packet was delayed. Otherwise.. reordering lengths). will proactively protect the sender from overreacting if packets have been reordered. without violating the SACK standard [20].. have strict requirements for request-response delay and throughput (e..e. automatic updates). it is unknown which event (original or repeated transmission) might have helped to recover a packet loss previously reported by the SACK. Some applications. a succession of an ACK (or a SACK) and a DSACK. and (b) different mechanisms to detect the presence of a highpriority data transfer.g. RR TCP includes a concept of a controlling loop for finding the optimal dupthresh value for a given path using a combined cost function. Unfortunately.. DSACK reporting Reactive (loss-based) LP Proactive (delay-based) Nice ACK . 32. Instead. in a wide range of network environments (i. (b) DSACK+SACK report Fig.. To overcome the deployment problem and yet provide some level of QoS. if we restrict the TCP-based QoS scope only to a low-priority service (i.e. the DSACK specification does not specify any particular actions for the sender. One such issue is the detection of packet reordering events. 33. Unfortunately. etc). to a problem of finding congestion control policies that would guarantee the network resource release if there are high-priority—standard TCP— flows present).. ACCEPTED FOR PUBLICATION iso_start <= dup_start dup_end <= iso_end dup_end<ACK ACK . this feature is not yet globally available [52]. IV. [51]. The central idea among the proposed solutions is to enforce and guarantee. as the receiver can send faulty information. indicates that both the original transmission and the retransmission were actually successful. which usually is 3). which triggers the Fast Recovery phase. This. The key differences between proposals are: (a) different baseline congestion control algorithms (Vegas for Nice and Reno for LP. [48] uses a DSACK to resolve the retransmission ambiguity. low-priority data transfer service.. Other applications do not have any particular requirements and are highly tolerant of the network conditions (e. In general. dup_start dup_end iso_start iso_end . varying delays. and resolving the issue of the RTO underestimation. and SACK). it can undo some previous congestion control actions upon receipt of a DSACK packet (similar to Eifel and DOOR). the occurrence of a reordering event can be detected if the sender receives a selective ACK packet followed by a cumulative ACK. in contrast to Eifel (Section III-B) or DOOR (Section III-C). two host-to-host TCP-based prioritization techniques have been proposed (Table III).g. an “unfair” network share distribution between high.. reordering length can be easily calculated. i. these improvements are effective only in long-lived TCP connections. . F.. its authors merely discuss several issues for future research. composing one group. However. RR TCP The SACK option by itself can provide a lot of information about patterns of packet delivery. if traffic of the first application group can be prioritized. Similar to SACK. Other discussed issues include a differential treatment of the normal SACK and DSACK packets. This idea may seem to contradict the basic fairness requirement for TCP congestion control: a new congestion control should not be more aggressive than the standard TCP congestion control algorithms (Reno. through special congestion control policies. then we definitely will comply with the fairness requirement. either intentionally or unintentionally.e.. in terms of packets. RR TCP defines a way to use the calculated reordering length. after a sender retransmits a data packet that has been detected as lost. Because the sender knows the exact transmission sequence (the order packets were transmitted and retransmitted). Experimental evaluation shows consistent improvements with RR TCP. For example. D IFFERENTIAL S ERVICES Different application types have different data transfer requirements. see Figure 33). compared to TCP with the SACK option. provides a way to report packet duplication. even though there have been a number of attempts to provide a QoS functionality on the network (IP) level [50]. NewReno. in this case we can also calculate a reordering length.. Evolutionary graph of TCP variants that implement a low-priority data transfer service DSACK.. If we know how long packets are usually delayed. an implementation of some form of an ACK congestion control. DSACK-based solutions should not blindly trust the DSACK information. Moreover. all advantages of the robust loss detection will be eliminated. if dupthresh is set too high. both covering the retransmitted packet. In short. In the remaining part of this section we will provide an overview of the two existing TCP-based QoS proposals (Table III). However. due to a high level of Internet heterogeneity.and low-priority flows.18 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. loss ration. FTP transfer. If a sender can assume that duplication is caused primarily by packet reordering. The downside of this approach is that we cannot infer anything if either of the first or second ACK is lost. which integrates several costs including false timeouts and fast retransmits. (a) DSACK report Fig.

. Kuzmanovic and Knightly [49]. a current RTT sample is used in queuing delay calculations. The design of Nice is based on the Vegas algorithm (see Section II-G). TCP LP Almost concurrently with the Nice algorithm proposal (see Section IV-A). Instead. etc. LP reduces the congestion window to half the current . B. Finally. meaning that only one packet is allowed to be sent in several RTT periods (48 RTTs in the worst case). For this purpose LP makes use of the Timestamp option [31] and applies heuristics to estimate the one-way propagation delay (e. there are several major differences. P = the protocol specification ∗ optional or available in patch form High priority flow is present ( . a TCP flow running the Vegas congestion control algorithm has problems capturing its network resource share while competing with a reactive Reno-like TCP flow (i. To provide a guarantee of the transmission rate reduction in the presence of standard TCP flows.. and (2) once every RTT.3. Nice allows the congestion window size to be a fraction (the minimum is 1/48). i. the congestion window is halved Congestion window A. As a solution. similar to Choi and Yoo’s proposal [55]). There are two main reasons for this choice: (1) Vegas incorporates a proactive congestion detection mechanism which allows redistributing network resources between competing TCP flows without inducing any packet losses (i. the resulting values are much more resistant to congestion in the reverse channel. R = the receiver reactions. [54] IV-A IV-B 1 Base Added/Changed Modes or Features Delay threshold as a secondary congestion indicator Early congestion detection Mod1 S S Status Experimental Experimental 2002 Vegas 2002 NewReno Implementation Linux Sim 2.6. the queuing delay is compared to the threshold upon the arrival of each non-duplicate ACK packet.. In TCP LP the DUAL’s calculation of a queuing delay (see Section II-B) is refined progressively by using more accurate delay estimates. Nice defines a concept similar to the queueing delay threshold defined in the DUAL algorithm (Section II-B). the occurrence of a current queuing delay estimate exceeding the threshold does not automatically trigger changes in the congestion windows. etc. However.interference triggering) Venkataramani et al. Nice’s congestion control policies are adjusted to react highly conservatively to all detected network state changes. that enables a simple distributed host-to-host mechanism to minimize the interference between high-priority (foreground) and low-priority (background) flows. Other differences are in the way the presence of cross traffic is detected and what preventive measures are applied to minimize interference. interference between Vegas flows is lower than that for standard TCP flows). then the ratio between X and the congestion window (measured in packets) would estimate a percent of enqueued (delayed) packets during the latest RTT.e. thus the level of false congestion detections is substantially decreased. In Nice. It aimed to provide a low-priority data transfer service for background applications (e. The unique feature of the TCP LP algorithm is its reaction to early congestion detection. Upon detection of a first such event.e. for the baseline congestion control algorithm.g. if this estimate exceeds a predefined threshold.. [53] identified the need to optimize the network resources in the presence of a large number of background transfers—automatic updates. TCP LP compares the current one-way delay estimate with a predefined threshold (a fraction of queuing delay plus a minimum of one-way delay). TCP Nice.g. standard TCP flow). they proposed a new congestion control algorithm. Nice considers all standard TCP flows as carrying high-priority data and tries to consume the network resources only if nobody else uses them.PRIORITY DATA TRANSFER SERVICE TCP Variant Section Year Nice [53] LP [49]. Nice counts the number of times (X) that the queuing delay exceeds the threshold during each RTT period: ∀t ∈ (t0 .)...18 ns2 TCP specification modification: S = the sender reactions. (2) due to its proactive nature. t0 + RT T ) QACK (t) > Qthresh ⇒ X = X + 1 The counted value X estimates the number of ACK packets which have been delayed due to interference with cross traffic (e. The right choice of threshold value can make Nice much more sensible than both the original DUAL and Vegas algorithms. The actual process of congestion detection (in terms of LP it is early congestion detection) with minor modifications repeats the one defined in DUAL: (1) LP maintains minimum and maximum one-way delays during the connection lifetime. 34. However.g. its authors have chosen NewReno instead of Vegas. software updates. peer-to-peer file sharing. In one sense. Although this can complicate queueing delay calculations.15∗ >2. data backups.AFANASYEV et al. high-priority flows). [54] presented a similar algorithm. TCP Nice Time Fig. In addition. More particularly. Second. Congestion window dynamics of TCP Nice (Figure 34).e. If we assume the idealized case when no ACKs are lost or delayed by the receiver. This makes Nice even more conservative in network resource utilization in the presence of cross traffic. Vegas itself provides some level of a low-priority data transfer service. data backup. TCP LP (Low Priority). First.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 19 TABLE III F EATURES OF TCP VARIANTS THAT IMPLEMENT A LOW. instead of the averaged RTT.

. Originally designed for wired networks where congestion is the primary cause of packet losses. ACCEPTED FOR PUBLICATION Congestion window High priority flow is present Reactive (loss-based) Westwood ABSE BR CRB Westwood+ BBE early congestion detection inference timeout Time Fig. Moreover. it should just recover from the loss and continue the transmission as if nothing had happened. If the sender continues transmission at a rate equal to that observed by the receiver. [61]. if a packet is lost due to congestion in the network. As an optimum.. [62] keeps the distributed network-independent ideology of TCP and is a modification of the NewReno TCP congestion control algorithm. Several solutions have been proposed to resolve this problem. as a part of TCP Westwood [62] laid the foundation for the sender-side distinguishing between a congestion-related and an unrelated (random) loss without any support from the network. then there are no router buffer overflows and TCP’s decision to reduce the congestion window is wrong. LP infers presence of the high-priority flow and the congestion window is reduced to the minimal value. [2]. 35. Instead. In other cases. V. etc.g. successfully utilizing the network bandwidth if no such flows are present. the heuristic considers a value which corresponds to a data transfer rate observed in the recent past (w ≈ rate × RT T ). the optimum would reflect the best choice for the sender: transmission without any rate reduction. W IRELESS N ETWORKS The growing spread of wireless networks has highlighted the need for TCP protocol modification. TCP Westwood/Westwood+ TCP Westwood proposed by Mascolo et al. A. both of them are extremely sensitive to activity in the network.20 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. Indeed. halving if three duplicate ACKs are received) with a heuristic-based procedure of setting the congestion window w to an optimal value (Faster Recovery). On one hand. LP is able to fairly distribute the network resources among low-priority flows (inter-fairness).g. The bandwidth estimation technique proposed by Mascolo et al. 36.. at the same time. widespread use of wireless and high-speed networks limit the applicability of either Nice or LP.g. and additional congestion will be prevented.. These approaches are beyond the scope of this survey and have been thoroughly discussed by Lochert et al. additional research is required to investigate a real-world applicability of the designed solution. bandwidth overestimation. Congestion window dynamics of TCP LP Reactive (loss-based with bandwidth estimation) value and starts the inference timer. Indeed if there is a random error due to wireless interference. this has not been proved. if a data packet is lost due to short-term radio frequency interference. insufficient robustness in networks with extreme levels of transmission errors. At the same time. it can significantly improve the data transfer efficiency in error-prone networks (e. Table IV shows characteristic features of refinements in Westwood that try to mitigate discovered problems. Although Kuzmanovic et al. To do so Westwood replaces the blind Reno’s congestion control actions that are triggered by loss detection (i. But on the other hand. In another case. Although Nice and LP should have the same characteristics as the baseline algorithms (Vegas and NewReno respectively). Evolutionary graph of TCP variants that enable resistance to random losses or TCP packet snooping and loss recovery by intermediate routers [59]). there is a big question as to how well both algorithms are able to utilize the network capacity if only low-priority flows are present. Research that follows (Figure 36) identified several of Westwood’s weaknesses. by (b) relying on network channels to recover from the noncongestion-related losses (e. HSTCPLP. using explicit congestion notification [57]). and thus fulfill a necessary condition for the low-priority service implementation. the data reception rate recently observed by the receiver is exactly the rate at which the network is capable of delivering data from the sender (“achieved data rate”). LP resumes the normal (Reno-like) congestion avoidance actions (Figure 35). for example. In this section we focus on solutions that keep the hostto-host idea and at the same time provide some level of resistance to non-congestion related packet losses. If the sender triggers another early congestion detection event before the timer elapses. [56] made an attempt to create a high-speed modification of LP. wireless). Moreover. NS2 simulations and real-world experiments using a Linux implementation of the LP algorithm have shown that it indeed has the desired property of yielding network resources to the standard TCP (high-priority) flows and. There is no definitive answer to whether the TCP LP algorithm or Nice algorithm is better. TCP is unable to react adequately to packet losses not related to congestion.. the number of newly transmitted packets will be equal to the number of delivered packets (router queues would not be growing).e. link-layer retransmission [58] Fig. due to ineffectiveness of the baseline algorithms in those environments. or by (c) isolating the wireless error-prone and wired error-safe transmission paths using an intermediate host [60]. One group gives up the idea of a pure host-to-host data transfer either by (a) requiring routers to disclose the network state (e.

the ACK rate observed by the sender will be equal to the data delivery rate observed by the receiver.. For that reason. share of network resources) evenly.. the instantaneous estimate is calculated upon reception of an ACK packet (b = d/Δ. Faster Recovery V-A 2004 Westwood Estimate of available bandwidth (RTT S granularity) V-B 2002 Westwood Available bandwidth estimate (combination of S ACK and long-term granularity).12 ns2∗ ns2∗ Experimental the protocol specification Forward path Data Data Data BW=Data/ values are averaged with a special discrete time filter [62]: B = α(Δ) · B −1 + [1 − α(Δ)] · b + b−1 2 BW ACKed data/ Return path Fig. Rationale for the available bandwidth estimation technique Having this win-win situation for all packet loss cases. varied exponential smoothing coefficient V-D 2003 Westwood Loss type estimation technique (queuing delay S estimation threshold. Westwood has a two-level bandwidth estimate processing capability. 37. b and b−1 are current and previous samples of the bandwidth estimate. in the revised Westwood+ algorithm [64] the estimate has been changed so that it is calculated with RTT granularity. rate gap threshold). from the deployment point of view. To illustrate the rationale behind this estimate. On the second level. this is extremely hard. i.e. with a smoothing factor α = 0. For example. R = the receiver reactions. ACK ACK ACK . practice has discovered that the calculation may be substantially wrong in certain network conditions [64]. fairness between TCP flows running the Westwood algorithm) can be shown using the diagram in Figure 38. a decrease in the ACK rate will be compensated by an increase in the acknowledged data amount.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 21 TABLE IV F EATURES OF TCP VARIANTS THAT ENABLE RESISTANCE TO RANDOM LOSSES TCP Variant TCP Westwood [62]. as a function of Δ. let us consider the following example (Figure 37). limiting retransmission timer backoff V-E 2003 Westwood Effective bottleneck buffer capacity estimation.9: B = α · B −1 + (1 − α) · b Although it has been asserted the Westwood algorithm shows good fairness properties. d is now the amount of acknowledged data during the last RTT and Δ is the RTT itself. If we assume that an ACK packet is generated right after a data packet is received and that ACKs are evenly delayed in the return path.. they increase their congestion window (i. until a network limit is reached. [63] TCP Westwood+ [64] TCPW CRB [65] TCPW ABSE [66] TCPW BR [67] TCPW BBE [68] 1 Section Year V-A Base Added/Changed Modes or Features Mod1 Status Experimental Experimental Experimental Experimental Experimental Estimate of available bandwidth (ACK S granularity). when ACKs are differently delayed and grouped due to congestion over the reverse path. To calculate the forward-path bandwidth actually utilized. After two flows start competing from any state x0 . discrete averaging of instantaneous bandwidth estimation samples leads to substantial overestimation. the only question is how the sender can discover the rate observed by the receiver. P = ∗ optional or available in patch form 2001 NewReno Implementation Linux Sim ns2∗ >2. retransmission of all outstanding data packets. [66]. we just need to multiply the ACK rate by the amount of acknowledged data. this is not entirely straightforward from a theoretical point of view. and B −1 is the previously calculated average value of the estimate. the ratio between flows’ congestion windows would remain intact. S reduction coefficient adaptation. As a direct solution we can ask the receiver to send special rate notifications. in the presence of the ACK compression effect [69].e.AFANASYEV et al.e. The bandwidth calculation holds in the long term even if some ACKs are lost or delayed by the receiver.6. It can be shown that if two Westwood flows simultaneously detect a congestion event and reduce their congestion windows w based on the achieved rate estimate (w = B × RT T ). i. in the formula b = d/Δ. where d is the amount of acknowledged data by the ACK and Δ is the time elapsed since the last ACK received). The proposed [62] and later patented [63] solution is to perform a senderside estimation of the actual delivery rate based on an existing notification mechanism (i. the calculated instantaneous where α(Δ) is the averaging coefficient. To mitigate fluctuations..e. Presence of intrafairness property (i. identifying predominant cause of packet loss V-C 2002 CRB Available bandwidth estimate (continuously S varied sampling interval). This estimate of average bandwidth during the last RTT is defined to be further averaged in long-term using the wellknown exponential smoothing technique. During the consecutive Congestion Avoidance phase. Although set-up experiments have shown a good level of precision for Westwood’s estimate.e. using ACK packets).. On the first level. However. congestion window boosting TCP specification modification: S = the sender reactions.

[66] proposed TCPW ABSE (Westwood with Adaptive Bandwidth Share Estimation). Wang et al. . Faster Recovery). If this ratio is lower than the threshold θ (e. the congestion window is set to RE × RT Tmin ). It is claimed that smooth adaptation of the sampling interval improves estimation precision in transition periods. The basic idea is to make the averaging sharper if availability of network resources is changing very dynamically (the new sample should have a bigger impact on the averaged value). RT T · (V E − RE) VE Network Share the ratio would slowly increase (e. instead of a measured RTT. these observations of the border cases comply with the definition of the CRB heuristic.. ABSE continuously changes the interval depending on an estimated network state. new value of the congestion window is calculated as BE × RT Tmin ) and the new one for congestion loss (i.g. The level of dynamics is calculated through a bandwidth estimate jitter. As a solution to this problem they proposed TCPW CRB (Westwood with Combined Rate and Bandwidth estimation). This will happen. [65] acknowledged the critical vulnerability of Westwood: under certain network conditions the bandwidth estimation (BE) technique gives highly inaccurate results. However. C.. ACCEPTED FOR PUBLICATION Network Limit ar e xn+1 xn Packet losses x1 x0 Zero Buffering Network Share Fig. TCPW CRB Wang et al. which refines the estimation algorithm by complementing it with a conservative long-term bandwidth calculation (“rate estimation” RE) technique. If the current value of RE is significantly smaller than predicted by the Vegas-like estimation V E. when a Westwood flow knows the exact amount of utilized network resources. The network state estimation heuristic is adapted to directly control the length of a sampling interval Δ in slightly changed form compared to CRB: Δ = max Δmin . a congestion event is assumed. Clearly. otherwise. due to various random processes in the network and an imprecise bandwidth estimation technique (ACKs may be delayed or lost).. In addition to the adaptive calculation of a sampling interval. in practice. Convergence diagram when two Westwood flows are competing with each other level of imprecision. it is likely to underestimate bandwidth in the presence of random errors. this has been confirmed only through a number of NS2 simulations and the authors agree that future investigation is required to evaluate CRB in wide-range network scenarios that include real Internet experiments.e. TCPW ABSE As an extension of CRB (Section V-B). ABSE’s authors have confirmed that the varied smoothing coefficient is able to help achieve a fast response to changes.e. when a number of lost packets are close to or equal zero and when RTT is close to the minimal value). and RE is an exponentially averaged bandwidth estimate with a sampling interval equal to the RTT. a long sampling interval will be calculated (i. when V E and RE are close (i. At the same time.. 38.. θ = 1. Unfortunately. ABSE leverages the idea of dual bandwidth estimation by introducing a bandwidth sampling interval adaptation mechanism.22 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. Upon detecting a packet loss. similar to CRB. which is determined by comparing the ratio between the current congestion window size and relation RE × RT Tmin to a predefined threshold θ. Thus. and at the same time provide resistance to the noise. A primary cause for loss is assumed to be congestion when the long-term bandwidth estimate RE shows a high Eq ua l( fa ir) sh where Δmin is a predefined minimal sampling interval. In an idealized case. Δ is close to RT T when RE → 0). As long as CRB does not conceptually change Westwood policies upon detecting a loss (i. It is similar to one from the Westwood+ proposal.. in a finite number of steps. instead of two predetermined sample intervals for CRB (ACK inter-arrival and a long predefined constant period). see Section II-G). in theory. Clearly. In other words. CRB authors claim that the dual bandwidth estimate (BE and RE) improves Westwood fairness to legacy Reno/NewReno flows. but the sampling period is some predefined constant T . the ratio between congestion window sizes will become very close to one.e. intrafairness characteristics remain unchanged. Through NS2 simulations. ABSE also defines a varied exponential smoothing coefficient for averaging bandwidth estimation samples. To tackle both underestimation and overestimation problems simultaneously. the network is likely to be in severe congestion.e. V E is a Vegas-type estimation of expected rate (V E = cwnd/RT Tmin . an old and a new.. CRB chooses one of the estimates depending on the assumed predominant loss type: the old estimate for random loss (i.4). Experimental results show that the long-term estimate prevents overestimation if a network is experiencing congestion. CRB thinks that loss is not related to congestion.g. then after ten steps of linear increase—in ten RTTs—a new ratio would be 11:20). because Reno always halves its network share while Westwood sets it depending on the estimate value. inter-fairness or fairness between Westwood and legacy Reno-type flows is not quite definite. a Reno flow will be suppressed. if upon loss detection the congestion window sizes of the two flows have the ratio 1:10. CRB maintains two estimates. similar to Westwood+ (Section V-A). In the opposite case.e. Westwood/Westwood+ flows can compete successfully and relatively fairly with Reno-type flows. B. However. the minimal sampling interval will be used. and smoother otherwise.

and μ can be 1. under a wide variety of network conditions... and thus has similar inter-fairness properties. VI. without further theoretical and practical investigations it cannot be claimed that BBE provides a universal solution for congestion control in wired-wireless networks. BBE complements the congestion window reduction policy with an additional variable coefficient μ ≤ 1: w = RT Tmin × B × μ. Tend is a condition for returning to the default state when non-congestion loss type is assumed (Tstart = β · Qmax ). or ABSE (Section V-C) congestion control algorithms can be highly unfair to standard TCP flows if the network has limited buffering capabilities. This coefficient borrows DUAL’s concept (Section II-B) of sensing the current network state: when the current queuing delay Q is close to a maximum Qmax . In some environments these policies can be extremely helpful in the case of real non-congestion losses. otherwise. This is especially crucial because of BR’s policies when detecting a noncongestion loss. and the variable rate is more precise when the network is congestion-free (i. More specifically. same as the standard TCP during congestion).e.e. Westwood+ (Section V-A).e. BBE calculates the achievable rate as a weighted sum of two estimates: the variable sampling rate Bv (e. E. [68] showed that flows running the Westwood.g. This technique allows BBE to adapt easily to network changes. the network is congestionfree. along with a good data transfer performance comparable to the original Westwood algorithm. ABSE does not change Westwood’s concept of Faster Recovery... The rationale behind this comparison is that when there is no congestion. 0. NewReno. The weighting is as follows: B = γ · Bv + (1 − γ) · Bc . 1/RTT as in Westwood+). RT T is close to RT Tmin and the amount of delivered data packets during last the RTT is close to cwnd). [67] claimed that utilizing two independent losstype estimation mechanisms increases estimation precision and reduces the number of false positives.g. NS2 simulations showed very good characteristics of ABSE fairness to legacy NewReno flows. TCPW BR Though the Westwood approach (Sections V-A through V-C) can significantly improve the effective TCP throughput in the presence of non-congestion related packet losses. However. A newly proposed TCPW BR (Westwood with Bulk Repeat) algorithm is also based on Westwood.AFANASYEV et al. the network is considered to be experiencing congestion and μ should be 1/2 (i. as in original Westwood) and a constant sampling rate Bc (e.05 respectively. Moreover. if it is estimated to be non-congestion related. Additionally. In particular.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 23 Similar to CRB. II-F) policies due to their internal limitations (i. Upon each loss detection. they introduced BBE. real-world experiments are required to confirm simulation results. The coefficient γ. The threshold coefficients α and β can be. [67] discovered that it cannot effectively handle volumes of random errors (> 2%). Bv has more weight when RTT is far from RT Tmax). for example. ACK rate. However. one SACK can indicate no more than four blocks of lost packets).e.. Bc has more weight when RT T is close to RT Tmax). similar to the above-mentioned additional reduction coefficient μ. The rate gap threshold algorithm is based on comparing a Westwood’s bandwidth estimate (BE) to a fraction α of the expected throughput (V E). BR changes the retransmission timer backoff algorithm (see Section II-A) by limiting a maximum timer value during non-congestion packet losses with a predefined constant. BBE recognizes the overestimation problem in the original Westwood algorithm and proposes a hybrid estimation technique. As we can see. The actual proposed calculation of the coefficient μ is defined as μ = Qmax /(Q + Qmax ) where Qmax is not just the maximum queuing delay observed during the connection lifetime. In addition..4 and 0. The main difference is that QDET maintains two thresholds Tstart and Tend . . D. BR applies very aggressive (highly optimistic) recovery policies instead of the original ones. H IGH -S PEED /L ONG -D ELAY N ETWORKS The emergence of high-speed networks uncovered the inability of deployed TCP variants (Reno. If Westwood’s estimate is less than the predefined fraction of expected throughput. relies on the queuing delay concept: γ = 1/eα×RT T /RT Tmax where α is some large positive constant. The queuing delay estimation threshold (QDET) algorithm is similar to Spike [71] and is based on the DUAL concept of measuring the queuing delay (Section II-B). the estimate calculation follows a general observation: the constant rate sampling is more precise when the network is experiencing congestion (i. but the exponentially smoothed queuing delay samples obtained just before each loss detection event. Tstart represents a condition for entering the state when all losses are assumed to be caused by congestion (Tstart = α · Qmax ). a Bottleneck and Buffer Estimation algorithm that refines the Westwood policy of reducing the congestion window size w upon detecting a loss. where B is the Westwood estimate of a recent achievable rate. If this is the case. The proposed loss type estimation mechanism in BR is a compound of two loss-detection algorithms [70]: the queuing delay estimation threshold and rate gap threshold algorithms. This decision improves the recovery time in environments where the probability of loss is extremely high. loss is assumed to be due to a congestion event. the data throughput is still relatively close to the expected (i. In addition. The latter is calculated in a manner similar to Vegas (Section II-G): V E = cwnd/RT Tmin . The simulation results provided by BBE’s authors have shown quite a good fairness to the legacy NewReno flows. but additionally integrates a special loss-type detection mechanism. appearance of the highspeed networks (both wired and wireless) has opened a number of other problems (see Section VI) which outweigh the issue of fairness to NewReno. otherwise. Yand et al. and does not modify the congestion window size. non-congestion related loss type is assumed. even in the face of a substantial number of packet losses.e. and much more effective than TCP SACK/FACK (Section II-E. TCPW BBE Shimonishi et al.. To resolve this problem. SACK. Yang et al. BR immediately retransmits all data packets that have been transmitted and have not yet been acknowledged (outstanding data packets).

On the other hand. this principle was generally realized with a congestion window (cwnd) increase by one packet for each RTT if no errors were detected.. and that is equivalent to an unrealistic packet loss probability. This is an experimental congestion control method that has several objectives. Reno/NewReno flow needs about D × cwnd RTTs. it would take almost two hours [72]. Moreover. all the packets must be delivered during these two hours. when the congestion window is less than or equal to 38 packets (i. because cwnd increases by one every RTT. This behavior considerably increases the efficiency of high-speed/long-delay networks. HS TCP After recognizing TCP’s efficiency problem in high-speed networks. at high congestion window sizes (or low loss rates) HS-TCP probes the network resources more aggressively than Reno and. and (b) fairness to standard TCP in high loss rate environments. and (c) be fair to other flows present in the network. The latter is divided generally into the three categories: (1) intra-fairness— . (b) respond quickly to network changes. satellite) links. Among them are (a) efficiency in high bandwidth-delay product (BDP) networks.24 IEEE COMMUNICATIONS SURVEYS & TUTORIALS.1 when the congestion window is more than 84k packets. A. 39. 40. [74] proposed the HS-TCP (HighSpeed TCP) algorithm. To illustrate the problem—sometimes referred to as the bandwidth-delay product problem (BDP problem)—let us consider a TCP flow trying to discover all the resources of some network channel. reacts more conservatively to loss detection events. That is.g. assuming there are no packet losses. Evolutionary graph of TCP variants aimed at improving efficiency in high-speed or long-delay networks characteristic of resource distribution between flows running the same congestion control algorithm in the same network environment. In TCP implementations. it should act like a standard NewReno [19] in environments with loss probability higher than 10−3 . they have the same objective: to create an ideal algorithm for high-speed (e. (2) inter-fairness—characteristic of distribution between flows running different algorithms in the same environments.5. it has same behavior as NewReno when the congestion window is small) to (and beyond) 70 and 0. Objective of HS-TCP CUBIC Illinois Libra Hybla TCPW-A LogWestwood+ Fig. This works quite well if network capacity or round-trip delays are relatively small. The resulting functions α(w) and β(w) vary from 1 and 0. the tradeoff is an increased level of packet losses: more packets are lost during congestion events. etc. However.e. is on the order of the channel bandwidth delay product (BDP). which occur more frequently. 100 ms round-trip delay.g.. on the one hand. Floyd [72]. at the same time. Although these solutions rest on different assumptions and approaches (see Figure 39).) to use the resources of these networks effectively. All of the congestion control algorithms discussed in Sections II through V improve different aspects of efficiency for data transfer. More precisely. pkts Reactive (loss-based) Proactive (delay-based) Reactive (loss-based with bandwidth estimation) ≈83k (10Gbps if RTT=100ms) 104 103 102 10 10-10 10-7 10-3 Reno HS-TCP STCP YeAH HS-TCP Africa Compound BIC ARENO H-TCP Fusion FAST NewVegas Fig. As we can see. The algorithm should simultaneously (a) provide for efficient use of network resources. In a network having 10 GiBit/s capacity. optical) or large delay (e. Figure 41 shows the schematic comparison between HS-TCP and NewReno behavior during Congestion Avoidance/Fast Recovery phases. respectively). ACCEPTED FOR PUBLICATION Congestion window. to get to a theoretical upper bound of a TCP data transfer rate (D × cwnd/RT T .. and (3) RTT-fairness—characteristic of resource distribution between flows sharing the same bottleneck link but having different RTTs. HS-TCP should be able to utilize the 10 Gbps link for the network with a loss rate not exceeding 10−7 (NewReno is unable to utilize this link if the loss rate exceeds 10−10 ). without relying on unrealistically low loss rates. In the remaining part of this section we discuss various solutions (Table V) that address several congestion control problems. and a maximum data packet size of 1500 bytes. without questioning the basic principle on which it rests. respectively. These functions α(w) and β(w) are obtained based on the above-mentioned objectives defined in terms of the achievable congestion window size and the required loss rate (bold curve in Figure 40). which was defined as far back as 1988 as part of Tahoe (Section II-A): network resource discovery during the congestion avoidance phase should be highly conservative. but does not work well otherwise. For this purpose HS-TCP replaces the standard NewReno increase coefficient α in Congestion Avoidance and decrease factor β after a minor loss detection (during the Fast Recovery phase) by functions of the congestion window size (α(w) and β(w). The minimum time required for this. where D is a maximum data packet). [73].

16 ns2. R = the receiver reactions. In high-speed/long-delay networks. scaling increase step to a reference RTT.6∗ ns2∗ ns2∗ >2.6. precautionary decongestion 1 TCP specification modification: S = the sender reactions. including standard TCP flows.22 [91] gas (NewReno) mode depending on a combined Vegas-type and DUAL-type estimate. For example. By definition. There are two reasons why this is so. It is expected that this limitation will not have a significant impact on performance.25∗ XP∗ .6. persistent non-congestion detection Westwood+ Logarithmic congestion window increase S Experimental >2. First.Switching between fast (STCP) and slow S Experimental >2. during severe congestion situations (when the loss probability is high) HSTCP is equivalent to standard Reno and thus inherits all its characteristics. Limited Slow-Start Westwood Agile probing. [74] STCP [73] H-TCP [75]. Congestion window increase steps as a Vegas function of the achievable rate and queuing delay estimates 2005 HS-TCP. Floyd [92] proposed a complementary algorithm that bounds the maximum increase step during Slow Start to 100 packets (Limited Slow Start).13 >2.6.DELAY NETWORKS TCP Variant HS-TCP [72].13 ns2 >2. Slow Start (in the worst case) can cause a loss of about 83. Additive increase steps and multiplicative S Experimental DUAL decrease factors as functions of the queuing delay YeAH TCP VI-Q 2007 STCP.22 ns2 2006 NewReno. [76] Section Year VI-A VI-B VI-C Base Added/Changed Modes or Features Mod1 S S S Status Experimental Experimental Experimental Implementation Win2 Linux Sim3 >2.6. or re-initialization after a timeout. Switching between fast (HS-TCP) and slow Vegas (NewReno) mode depending on the Vegas-type network state estimation 2005 HS-TCP. S’03∗ >2. scale the congestion window increase step by the bottleneck link capacity and queuing delay 2005 Vegas Rapid window convergence. Slow Start operates only during initialization.14– S’08. initial slow-start threshold estimation HS-TCP Binary congestion window search.13 S S S S S S Experimental Experimental Experimental Experimental Experimental Experimental >2.6.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 25 TABLE V F EATURES OF TCP VARIANTS AIMED AT IMPROVING EFFICIENCY IN HIGH . Another important question for a new congestion control algorithm is how flows that utilize it interact with each other (intra-fairness) and with other flows (inter-fairness). Second. packet pacing. To resolve this problem. multiplicative decrease coefficient adaptation NewReno Scaling the increase steps in Slow-Start and Congestion Avoidance to the reference RTT. Limited Slow-Start NewReno Multiplicative Increase Multiplicative Decrease congestion avoidance policy NewReno Congestion window increase steps as a function of time elapsed since the last packet loss detection. which is assumed to be a reasonable payoff for a significant reduction of induced packet losses.000 packets. Congestion window increase steps as a Vegas function of the achievable rate and queuing delay estimates 2007 Westwood.SPEED OR LONG .6.6.6. 2.12 ns2.6∗ ns2. P = the protocol specification 2 MicrosoftTM operating systems: S for server versions 3 Network simulators ∗ optional or available in patch form VI-P The high-speed/long-delay networks create one more problem for TCP.6. Ve. the unbounded exponential probing (see Tahoe in Section II-A) can lead to a loss of extremely large numbers of packets. Two components (slow and scalable) in the Vegas congestion window calculation S S S S S Experimental Experimental Experimental Experimental Experimental Vista. During the initial Slow Start phase when an approximate network limit is still unknown. HS-TCP . which is approximately 120 MBytes of wasted network resources. In other words. 2. in a 10 Gbps link with 100 ms RTT.AFANASYEV et al.29∗ ns2 2008 BIC VI-K VI-L VI-M VI-N VI-O The congestion window control as a cubic function of time elapsed since a last congestion event 2003 Vegas Constant-rate congestion window equation-based update 2005 NewReno Adaptation of the packet pairs to estimate the bottleneck link capacity. it is insignificant for longlived flow performance.6. it takes about 8 seconds to fully utilize a 1 Gbps link with 100 ms RTT. data packet pacing.13 2003 NewReno 2003 2004 TCP Hybla [77] BIC TCP [78] TCPW-A [79] LogWestwood+ [80] TCP Cubic [81] FAST TCP [82]–[84] TCP Libra [85] TCP NewVegas [32] TCP AR [86] TCP Fusion [87] TCP Africa [88] Compound TCP [89] TCP Illinois [90] VI-D 2004 VI-E VI-F VI-G VI-H VI-I VI-J 2004 2005 2008 Additive increase steps and multiplicative decrease factors as functions of the congestion window size. packet pairing 2005 Westwood.

However.e.e. fairness between STCP flows).01). 43. A number of congestion control algorithms discussed later in this survey (Hybla in Section VI-D. CUBIC in Section VI-H) address this problem and. because standard flows cannot effectively utilize the available network resources. Second. intra-. w = w + α× w. it can be shown that due to MIMD policies.e. subsequent research discovered that AIMD coefficient scaling (functions instead of constants) significantly intensifies this problem. Congestion window dynamics of HS-TCP Fig. However. However.. which scale quite well in many environments. where α = 0. this is similar to HS-TCP (Section VI-A) where the network resource probing steps (i. an STCP flow is extremely unfair.. [76] presented one more alternative to congestion control for TCP. Finally. Standard Reno/NewReno Congestion window dynamics of STCP explicitly does not consider fairness to standard TCP flows to be significantly important. Fortunately. where β = 0. RTT-) and effectiveness properties..26 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. FAST in Section VI-I. but with increased frequency and sharpness of increase/decrease phases (Figure 42).. B. the congestion window increase per RTT) grow as the congestion window itself is growing. Convergence diagram of the two STCP flow competition assuming that standard TCP flows cannot effectively utilize network resources. at the same time. inter-fairness characteristics (i. that during Congestion Avoidance the window is increased by a constant number of packets each RTT and decreased during Fast Recovery by a fraction of itself (Figure 11)—NewReno’s intra-fairness properties are preserved. intra-fairness property is highly important. In one sense. Clearly. This happens because multiplicative increase and multiplicative decrease policies essentially preserve a ratio between congestion window sizes of the flows.e. under the assumption that two STCP flows are experiencing the same RTTs and are able to detect a packet loss simultaneously. 41. A hypothetical congestion window dynamic of STCP looks similar to that of HS-TCP. called H-TCP (Hamilton TCP). during Congestion Avoidance an STCP flow increases its congestion window w by a fraction α of the window size with each RTT (i. This is generally undesirable for most networks. x2 − x1 multiplicative increase (a flow with the larger congestion window increases more than a flow with the smaller) x1 − x2 multiplicative decrease (a flow with the larger congestion window decreases more than a flow with the smaller) Fig. However. During Fast Recovery. the solution creates a number of critical problems. HS-TCP has substantial problems with fairness if flows have different RTTs. STCP does not even try to be inter-fair. the functional dependence of the elapsed time Δ has one significant advantage over the dependence of the e . Although this problem is inherited from Reno [78]. H-TCP in Section VI-C. STCP Kelly [73] proposed STCP (Scalable TCP) as an alternative to HS-TCP (Section VI-A) to solve the data transfer effectiveness problem in high-speed/long-delay networks. the flow with the larger initial share will always have an advantage (Figure 43). from Figure 42 we can easily recognize that even one STCP flow moves the network to a state of nearly constant congestion. the proposed modifications resolve the target problem by making the increase/decrease dynamics follow exponential functions. w = w − β × w. ACCEPTED FOR PUBLICATION Loss detection Network limit Congestion window Loss detection Network limit Congestion window HS-TCP improvement Time STCP improvement Time Standard Reno/NewReno Fig. The key idea of their proposal is that the congestion window increase step α in Congestion Avoidance should be a non-decreasing function of the time elapsed since the last congestion event (Δ).. That is. in the high-loss zone. H-TCP Leith and Shorten [75]. STCP behaves like standard TCP. STCP rejects the core AIMD concept and introduces a multiplicative increase multiplicative decrease idea (MIMD). the MIMD approach does not conceptually provide intra-fairness (i. Third. 42. which is intended to have good fairness (inter-.125). Instead of complicated AIMD coefficient calculations. In other words.e. Network Limit Network Share (STCP) Eq u al (fa ir) sh x0 x2 x1 ar Packet losses Network Share (STCP) x0 − x1 . both to STCP and to standard TCP flows that have higher RTT values [78]. C. it reduces the congestion window by a different fraction β upon detecting a loss (i. First. fairness to standard flows) are similar to HS-TCP: in the low-loss rate zone (<10−3). preserve data transfer effectiveness in high-speed/longdelay networks. it can be shown that because HS-TCP does not change the core additive increase multiplicative decrease (AIMD) concept of NewReno—namely.

Leith [76] removed the Fast Recovery modification from H-TCP. can be 100 ms: α (Δ) = α(Δ) × RT T /RT Tref . assuming that after a loss detection. H-TCP defines the increase in the congestion window w as α(Δ) for each RTT (equivalent to increase as a fraction α(Δ)/w for each reception of non-duplicate ACK. potentially resulting in catastrophic unfairness in the network resource distribution. It is meant to smooth the burst-nature of TCP transmissions. the H-TCP proposal includes a small modification of the congestion window reduction policy in Fast Recovery. To demonstrate that this holds. .2. More specifically. TCP Hybla Caini and Firrincieli [77] emphasized the problem of degradation of TCP throughput with standard congestion control— NewReno—in long-delay networks. Hybla introduces two more techniques that complement the congestion control: pacing the transmission of data packets [93] and estimating the initial slow-start threshold using the packet pair algorithm [85]. It can be noted that this definition of α(Δ) still leads to some degree of RTT-unfairness. in Slow Start w = w + ρ2 /w. an H-TCP flow is fair to other H-TCP flows present in the same network path. otherwise. whenever Δ < Δlow .: HOST-TO-HOST CONGESTION CONTROL FOR TCP 27 Congestion window Flow with RTT1 Flow with RTT 2=2·RTT 1 T1 T2 Time elapsed from the last congestion event . α(Δ) = 1. the scaling factor ρ is calculated according to the equation ρ = RT T /RT Tref . As a consequence. the increase steps upon receiving ACK packet are defined as follows: w = w + 2ρ − 1.g. In addition to these changes in the Congestion Avoidance phase.e. The pacing is essentially setting up a minimal delay between transmission of any two consecutive packets. and the the TCP throughput B has an upper bound that is inversely dependant on RT T 2 (the throughput can be approximated by the expression B ∼ w/RT T ). It can be shown that in Fig. during the same time period. Rationale of H-TCP’s RTT-unfairness congestion window: namely. to obtain the normalized increase steps in both phases.5 is used. However. To mitigate this effect. if we calculate the upper bound of the TCP throughput (i. the RTTs may be different by several orders of magnitude. If the absolute value of the relation [B(k) − B(k − 1)]/B(k − 1) is less than 0. to some degree. as an example.. This algorithm introduces modifications to the NewReno’s Slow Start and Congestion Avoidance phases that make them semi-independent of RTT. that no matter how large the initial congestion window sizes have been. later in the Internet-Draft proposal. especially with satellite segments. However. the congestion window is reduced by the ratio RT Tmin /RT Tmax. as follows: α(Δ) = 1 + 10(Δ − Δlow ) + 0. a flow with a shorter = RTT will always have an advantage compared to a flow with a longer RTT. In addition. both flows decrease their congestion windows by half. with this assumption we do not change the H-TCP principle. we see that a flow having a longer RTT always loses to a flow with a shorter RTT. If we assume that α(Δ) is calculated once per RTT at time 0. H-TCP defines an optional mechanism of scaling the α(Δ) to a reference RTT (RT Tref ) which. α(Δ) is the polynomial function over time Δ elapsed since the last congestion event. Clearly. Formally. Congestion window evolution in Hybla NewReno’s Congestion Avoidance.e. D. More specifically. the ratio between the congestion window and the RTT). For example. where three flows having different RTT values are presented.. the coefficient 0. we will see that all three flows can transmit data at similar rates (≈ 2 MByte/s after 500 ms). The packet pair algorithm provides an ability to estimate the network path capacity. flows experiencing the same network conditions will exhibit the same congestion window increase dynamics.calculation of the target congestion window value Fig.AFANASYEV et al. In other words.5 · (Δ − Δlow )2 where Δlow is a predefined threshold of H-TCP’s compatibility mode—i. where w is the current congestion window size. T1 . 25 ms). upon detecting a packet loss. where RT Tref is a reference RTT (e. To resolve this RTT-unfairness problem. 45. in Congestion Avoidance This definition is illustrated in Figure 45. let us consider two H-TCP flows competing with each other and having different RTT values (Figure 44). Knowledge of the network capacity may help us improve the convergence speed and. In particular. a Hybla algorithm has been proposed [77]. provides a scalability in high-BDP networks. flows will result in different congestion window values. one can build a convergence diagram of the two competing H-TCP flows and observe that it looks similar to Figure 11. The higher the RTT value. In heterogeneous networks. 44.. and T2 (note. H-TCP estimates the achieved flow’s throughput B(k) and compares it with the estimate of the preceding loss event B(k − 1). the congestion window size w is inversely dependent on RTT. the higher the ratio ρ becomes and the congestion window is increased more rapidly with each ACK packet reception. but it allows us to highlight the problem).

e. In other words. but cannot eliminate this problem. absolute unfairness).. At the same time. the target congestion window value. in a high-BDP network it may create the same problem that was discovered in SlowStart: if the congestion window is increased too fast. For example. To some extent. Though a true binary search algorithm features a very fast (logarithmic) convergence time.. To address the opposite case when the search range is too narrow—near the estimated optimum—BIC defines the congestion window increase by at least some constant Smin number of packets. the cost of this friendliness is an increased aggressiveness of the flows with larger RTT values. an indication of successful data delivery raises the lower boundary wmin to the previous congestion window size—the value when the network is expected to be congestion-free. but it also limits the increase in Rapid Convergence when the search range is too wide. BIC enables the Limited Slow Start phase with an unlimited slow-start threshold value. This property limits applicability of Hybla to satellite-like channels: Hybla. Schematically. if we assume that two competing flows can detect a loss simultaneously (a synchronized loss detection). or exceeds. BIC not only adopts HS-TCP’s Limited Slow Start.e. is unable to work effectively in high-speed networks with relatively small delays. Thus. Similar calculations for STCP show that. [94] showed that in certain environments BIC may have low RTT-fairness and inter-fairness values (fairness to other deployed TCP congestion controls). However.. Additionally.125 · w) when the congestion window size is more than 38. Hybla is designed to fall back to the standard mode (to Reno-like congestion control rules) if a flow’s RTT is less than a predefined reference value.. similar to NewReno (see Section II-D). While the network successfully delivers data packets (i. BIC TCP Xu et al. Rapid Convergence. In an attempt to create a congestion control that can scale well in any high-BDP (high bandwidth-delay product) network and yet to remain relatively RTT-fair. the optimal congestion window size (i. A revised version of BIC. This action is to discover a new upper bound and restart the binary search (Figure 47).e. Xu et al. [78] showed that in a synchronized loss model. This number is borrowed from HSTCP and is aimed at providing compatibility to Reno in environments with loss rates exceeding 10−3 . wmin is set to one and wmax to some arbitrary high value). BIC reduces the multiplicative decrease coefficient from 0.56 times larger. is meant to improve these properties and is discussed in .g. However. BIC sets the upper search boundary wmax to the current congestion window size— the value when the network is experiencing congestion— and enters the well-known Fast Recovery phase. This phase rapidly discovers.. and a flow with the higher RTT will get nothing (i.e. Additionally. the sender receives all ACKs during the last RTT). similar to the standard Reno. For this reason. the analytical calculations reveal that an HS-TCP flow having an RTT x times smaller will get a network share which is x4.125 (i. Congestion window dynamics in TCP BIC A number of experimental evaluations [77] have confirmed remarkable RTT-friendliness of the Hybla algorithm. 47. The problem in both cases lies in the way these algorithms discover network resources: a flow with a larger congestion window will try to increase its share more than a flow with a smaller window. pacing technique soften. during the RTT. In other words. Besides updating the congestion window.e. the value corresponding to the available network resources) by relying on detection of a packet loss as an indication of congestion window overshooting. feedback rate is proportional to 1/RTT). the STCP flow with the smaller RTT will always get all of the network resources. [78] proposed a BIC (Binary Increase Congestion control) algorithm. more aggressive flows can easily congest the network before they detect any packet loss. Rapid Convergence is not allowed to increase the congestion window by more than some predefined value Smax . The BIC approach for optimal congestion window discovery has a unique feature for loss-based congestion control approaches—the congestion window probing steps decrease as the window approaches a target value. in theory. Finally. in a binary search manner. three duplicate ACKs are received). [78] pointed out the RTT unfairness problem of HS-TCP (Section VI-A) and STCP (Section VI-B).. the congestion window is updated to the median of the search range between minimum wmin and maximum wmax congestion window sizes (initially. to increase the convergence rate in the low-loss network environments. in theory BIC is no RT T2 less RTT-fair than the standard Reno algorithm. w = w − 0. a large number of packets can be lost (see Section VI-A). 46. ACCEPTED FOR PUBLICATION Congestion window Congestion window wmax Network limit Loss detection Binary Search Network limit wmin Time Limited Slow Start Time Fig. E. called CUBIC [81].28 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. This algorithm extends NewReno with an additional operational phase. Binary search for the optimal congestion window in TCP BIC Fig. As soon as packet loss is detected (e. these flows have a slower feedback rate—a packet loss can be detected no earlier than delivery of a packet can be confirmed (i. the congestion window ratio of two flows with different RTTs (RTT-fairness) changes from e(1/RT T1 −1/RT T2 )t·ln 2 for small window sizes to RT T1 for large ones. Xu et al. the congestion control concept of BIC as a search problem is illustrated in Figure 46. a number of later experimental evaluations [81].5 to 0. when the current congestion window value during Rapid Convergence becomes very close to.

since the window growth does not depend much on RTT. TCPW-A introduces an additional technique called persistent non-congestion detection (PNCD). in a network with 10 ms RTT. LogWestwood+ Kliazovich et al. for this “maximum” LogWestwood+ takes a value of the congestion window observed just before the last detection of a packet loss (i. β is a coefficient of multiplicative decrease in Fast Recovery.g. in some scenarios LogWestwood+ may lose its scalability characteristics. which enhances the previously introduced BIC algorithm (Section VI-E) with RTT-independent congestion window growth functions. the initial Slow Start phase may be prematurely terminated (e. For example.AFANASYEV et al. while in a network with 100 ms RTT. TCPW-A Wang et al. In the TCPW-A proposal authors presented several experimental evaluations. the flow with the larger RTT is likely to remain in a compatible mode all the time.. Congestion window dynamics of TCPW-A Fig. a Vegas-like rate estimate (RE = cwnd/RT Tmin ) increases as the congestion window increases. This function preserves not only RTT-fairness. a modification to TCP Westwood (Section V-A) that improves its properties in high-speed or long-delay networks. while the flow with the smaller RTT quickly switches to a scalable mode and acquires all available resources. which includes steps of calculation of the Westwood-like eligible rate estimate (ERE) and update to the slow start threshold based on this estimate. and at the same time has good intra-. the loss rate 10−3 allows one loss every 380 ms. especially in heterogeneous networks where RTT can vary significantly. The switching criteria in most cases is a pre-calculated congestion window size w which corresponds √ to a certain loss rate p (w = 1. the proposed algorithm. Similar to BIC. they introduced a concept of agile probing. More specifically. CUBIC TCP Rhee and Xu [81] noted the highly challenging problem of creating a simple congestion control algorithm that scales well in high-BDP networks. [80] proposed another high-speed extension of TCP Westwood (Section V-A).2/ p). H. as follows: w =C Δ− 3 β · wmax /C 3 + wmax where C is a predefined constant. Some of the previous congestion control proposals (e. Thus if two flows competing in the same bottleneck link have different RTTs. However. and RTT-fairness properties. BIC) enforce inter-fairness (i. the same rate allows only one loss every 3. HS-TCP. For example. and wmax is the congestion window size just before the last registered loss detection. To prevent unnecessary switches to the Slow Start when network resources are almost fully consumed.e. the threshold in HS-TCP is set to 38 packets.. and gently increased when approaching an estimated maximum. due to a burst of temporal congestion or even some random losses). This observation allowed Rhee and Xu [81] to propose CUBIC congestion control.g.. To accomplish this. fairness to the standard TCP flows) by switching to standard congestion window update rules in high-loss environments. For example. and later LogWestwood+ will not be able to rapidly discover a new maximum in high-speed or long-delay networks. other than that the first is based on Westwood+ and the second is based on NewReno. TCP LogWestwood+. G. is that the standard Congestion Avoidance phase (linear increase) is used instead of the Slow Start phase after reaching the maximum (Figure 49). Besides the main objective of being able to effectively utilize resources of high-speed or long-delay networks. Congestion window dynamics of TCP LogWestwood+ Section VI-H. These steps simultaneously provide two benefits: reduced network stressing during the Slow Start phase (dynamic adjusting of the threshold) and rapid discovery of available network resources during the Congestion Avoidance phase (temporary return to the Slow Start when the threshold becomes larger than the congestion window. STCP. features behavior similar to BIC (Section VI-E): congestion window is increased rapidly when the current value is small. F. inter-. they did not answer the question of how well TCPW-A behaves in comparison to other high-speed TCP variants discussed in this section. 48. Although their results showed considerable improvements compared to standard NewReno. this definition of loss rate is not the ideal guideline. Analytical modeling and ns-2 simulations of the algorithm behavior allowed the authors to claim that LogWestwood+ is much more efficient and RTT-fair than the standard NewReno algorithm. but also scalability and intra-fairness properties of BIC’s Limited Slow Start and Rapid Convergence phases.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 29 Congestion window Network limit Westwood estimate (ERE*RTTmin) Agile probing Time Congestion window Loss detection Loss detection Network limit Westwood estimate (ERE*RTTmin) Log increase Linear increase Time Fig. which corresponds to a loss of one out of every 1000 consecutive packets (p = 10−3 pkt/s).8 seconds. However. see Figure 48). The main idea of this technique is based on the assumption that when the network is not congested. [79] proposed TCPW-A. just before the last reduction). CUBIC borrows the H-TCP approach (Section VI-C) of defining the congestion window w as a cubic function of elapsed time Δ since the last congestion event. 49. The function has a . The key difference between LogWestwood+ and BIC.e..

CUBIC does not have 100% network resource utilization and can induce a large number of packet losses in the network (as long as a loss is the only signaling mechanism). In some sense. which is hard to calculate in some environments (e. even in smallBDP networks. introduced a FAST algorithm. FAST’s authors have concluded that α should be a constant. ACCEPTED FOR PUBLICATION Loss detection Network limit 2 1 3 2 2 target window wmax Time 1 – right branch of cubic function 2 – left branch of cubic function 3 – left and right branches of cubic function Fig.30 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. Second. FAST limits a potential increase in the congestion window (when α w) to be no more than a current value.16).. CUBIC provides a mechanism to ensure that its performance is no worse than the performance of the standard (Reno) congestion control. to calculate the new target congestion window size. Formally. there are two fundamental differences between Vegas and FAST: FAST defines a periodic fixed-rate congestion window update (e.and inter-unfairness [84]. Nevertheless. but it is still scalable to high-BDP networks. Attempts at making α vary depending on the congestion window size.. [91] and by real-world measurements. otherwise. the target is updated.. and scalability. Finally. In the initial step labeled 1. Congestion window dynamics in CUBIC very fast growth when the current window w is far from the estimated target wmax . when α is too small. I. inspired by the Vegas idea of congestion control with the queuing delay as a primary congestion indicator (see Section II-G). the protocol will scale easily to any high-BDP (high bandwidth-delay product) networks. the RTT is not always a good substitute for the queuing delay. RTT fairness. This mechanism includes calculating a supplementary congestion window size wreno that approximates the performance of a corresponding standard Reno flow. J. If CUBIC detects that the supplementary window wreno exceeds the main window.e. the congestion window gently approaches the target (phase 2). the window will be increased based solely on the predefined parameter α. or SACK). At later stages. RT T and RT Tmin are current and minimum RTT.g.g. but it will have substantial convergence problems (the stable state when w = w × RT Tmin /RT T + α will be barely reachable). this can be written as an increase of the window wreno by s every RTT. [82]–[84]. [85] proposed Libra as another variant of congestion control to resolve the scalability issues in standard TCP. FAST’s characteristics depend highly on the true minimal RTT value. In addition. However.6. if the network is experiencing congestion (RT T > RT Tmin ).g. If a loss is detected before wmax is reached. the latter is reset to be equal to the former. the proposed congestion window update rule is not friendly to standard TCP (Reno.. due to the fact that it has been the default for the Linux TCP suite since 2006 (i. The only difference is that FAST increases the congestion window based on the internal timer expiration (e. ongoing research [84]. FAST will decrease the congestion window (use of the network resources) proportionally to the congestion level estimated using RTT measurements (RT Tmin /RT T ).e. Because the congestion window in CUBIC can be reduced by a fraction β different from 0. while preserving and improving the RTT-fairness Congestion window . stability. FAST TCP Jin et al. which is roughly equivalent to the increase in the standard Slow-Start mode. 50. Figure 50 shows the theoretical dynamics of the growth of the congestion window in CUBIC. [95] recognizes a number of serious issues with the design. Linux kernel version 2. each 20 ms) and.g. and it is very conservative when w is close to wmax . NewReno. CUBIC is currently the second most-used congestion control algorithm for TCP. According to this equation. Selection of α has conflicting effects on two important protocol parameters: scalability and stability. but Slow-Start is clocked by ACK packets reception. we will see a congestion window growth according to both left and right branches of the cubic function (phase 3). First. FAST behavior is practically equivalent to Vegas). FAST uses a specially designed equation which incorporates a simple delay-based congestion estimation feature: RT Tmin +α w=w· RT T where w is a current congestion window size. as described below. generally βcubic = βreno ). Although the problem of accurate selection of α is still an open issue. following a reduction upon detecting a loss. each 20 ms). the target window wmax is unknown and is discovered using the right branch of the cubic function. FAST uses a wellknown technique of exponential smoothing of the calculated congestion window value. FAST may be considered a scalable variant of Vegas that defines a periodic congestion window update based on the internal delay-based estimate of the network state.. If it was a temporary congestion event. if α is too large. This discovery is much more conservative than the exponential discovery used in conventional Slow Start. FAST will easily stabilize but will have scalability problems (e. Additionally. To make the equation-based algorithm tolerant of shortterm fluctuations in network parameters. TCP Libra Marfia et al. and α is an important protocol parameter. Although simulation-based and real-world experiments show remarkable intra-fairness. especially when there is congestion along the reverse path or when there are route changes.5 (i. In other words. if α = 1.. The good performance and fairness properties of CUBIC were confirmed by various experimental studies [81]. and RTT measurements are reported to lead to substantial intra. In the opposite case. the appropriate performance (an average sending rate) can be achieved only if the supplementary congestion window increase steps are scaled with s = 3 × (β − 1)/(β + 1) [81]. when routes tend to be dynamic) without relying on additional messaging from the network.

. NewVegas switches to Fast Recovery followed by Congestion Avoidance. they also found that it has three serious problems (two of them have been inherited from Reno. the congestion window is increased by α packets every RTT. NewVegas authors believe and experimentally confirm that with delay-based congestion controls. The constant γ is selected in a way that considers all links with an RTT close to or more than γ to have some pathological problems when RTT-fairness is not an issue. according to the equation: α = k1 · C · P · RT T 2/(RT T + γ) where RT T is the current RTT estimate. Thus. RT Tmax.AFANASYEV et al. RT Tmin . At any point. but to continue the opportunistic exponential-like resource probing with reduced intensity. it remembers the current congestion window value in a special state variable wr and switches to the Rapid Window Convergence. In order to solve the second problem of generation of bursty traffic during initialization and re-initialization. In this state. According to the proposal [32]. The key idea of this phase is not to immediately terminate the Slow Start phase when the estimate of network buffering exceeds the threshold (Δ > α). if the loss is detected using the RTO. this is based on the assumption that two data . and (c) Vegas’ estimation of network buffering can be significantly biased if receivers use the standardized delayed ACK technique [17]. if a packet loss is detected.. and P is a penalizing factor which reduces the increase step if the network experiences congestion (e. Formally. 2). 1 and 2. NewVegas resets the congestion window size and moves to Slow Start. will further worsen because of estimation biases.g.. NewVegas reacts exactly the same as the original Vegas algorithms. this technique can help overcome the RTT estimation problem. The rationale of the congestion window increase steps α is as follows: the first part of the functional dependence k1 · C makes the increase steps scalable to the bottleneck link capacity.g. when the current RTT value is large (e. the congestion window is allowed to be increased by x packets: x = (wr )−2 3+n where n is a number of times the early congestion indicator triggers in the Rapid Window Convergence phase. The penalizing part P forces Libra to decrease the network resource probing intensity (the congestion window increase steps) exponentially. respectively). the scaling factor will reduce β further. A number of experimental evaluations (ns2 simulations) show that Libra can help improve the high-BDP link utilization and fairness properties of TCP.. However. Clearly. where θ and γ are constants. including the non-scalable Reno with selective ACKs (Section II-E). The last part. In particular. in Libra’s Congestion Avoidance. but waits for a timeout or another data packet. In detail. the increase steps are scaled to RTT2 —the essential requirement for RTTfariness (see Section VI-D). if the loss is detected using three duplicate ACKs. penalizing factor P can be represented with the expression based on queuing delay measurements: P = e−k2 ×Q/Qmax where k2 is some constant (e. The latter value in Libra is estimated using a well-known packet pair technique [96]. If RTT is significantly less than the constant γ. the recommended values for these constants (θ and γ are equal to 1 second) make the scaling factor close to 1 in most cases. it is not very significant. Libra’s design is based on NewReno (Section II-D) and modifies the Congestion Avoidance congestion window increase steps to follow a specially designed function of both the RTT and the bottleneck link capacity. Libra’s properties. the congestion window size is close to the convergence point). (b) during the (re-)initialization phases (Slow Start and Fast Recovery) the Vegas congestion control can generate very bursty traffic. Moreover.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 31 properties. Yet this is the opposite of what congestion control should do: the decrease should be maximized in the presence of congestion and minimized when the network is in a congestion-free state. the same results show that Libra does not always outperform other congestion control approaches. In terms of TCP. when New Vegas’s Slow Start detects that the threshold has been exceeded (early congestion detection). In other words. In addition. this means that if there are data to be sent and the current value of the congestion window allows sending only one data packet. is responsible for Libra’s RTT-fairness. it sets-up a minimal delay between transmission of any two consecutive packets [69]. if the estimated level of buffering in the network (Q/Qmax) increases. Additionally. γ and k1 are predefined constants (e.e.g. and Q and Qmax are current and maximum queuing delay estimates (see Section II-B). if no loss has occurred. RT T 2/(RT T + γ). [97]. due to high reliance on the queuing delay estimation (i. To reduce the convergence time and improve to some degree the high-BDP link utilization. Although this scaling factor is derived analytically [85]. Although it is reported to have a negative impact on TCP Reno performance [93]. similar to FAST (Section VI-I) and C-TCP (Section VI-O). packet pacing has only a positive effect. and RT T measurement consistency). [27]. NewVegas applies the well-known packet pacing technique. for every RTT. However. TCP New Vegas Sing and Soh [32] recognized the advantages of the delaybased congestion control approach presented in Vegas (Section II-G). NewVegas will hold any transmission until the window increases by at least one packet. C is a value responsible for Libra’s scalability and represents the capacity of the bottleneck link estimated using the packet pair technique. when n becomes more than 3.g. when the TCP receiver does not immediately respond to each data packet. Section II-C): (a) it cannot effectively utilize high-BDP links. K.. However. potential congestion in the network). Libra also defines a change in the multiplicative decrease policy of the Fast Recovery phase (w = w − β × w): the decrease coefficient β scales with the expression θ/(RT T + γ). Rapid Window Convergence terminates and normal Vegas-like Congestion Avoidance takes its place. the proposed New Vegas algorithm defines a new phase called Rapid Window Convergence. The last problem of the estimation bias is solved by requiring the sender to transmit data packets in pairs.

51. M. It extends TCPW-BBE (Section V-E) with a scalable congestion window probing in the Congestion Avoidance phase. ACCEPTED FOR PUBLICATION Congestion window Wprobe Loss detection Network limit Loss detection Congestion window 2 1 3 Fusion improvement Standard Reno/ NewReno Network limit Wbase TCP-AR improvement Time Standard Reno/NewReno Fig. queuing delay). In the case where the queuing delay lies somewhere in the range between one and three times the threshold (zone 2 in Figure 52).32 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. No scalable features are designed for Congestion Avoidance and Fast Recovery. Fusion defines three separate linear functions which are switched... and Vegas’ used network buffering (Section II-G) estimations. This ratio is essentially a simplified form of Westwood’s. the function gives a value close to the Westwood-like achievable rate estimate. where the sampling interval equals the RTT [87]. As one can see in Figure 52. a conventional Reno-like window wr is maintained along with the Fusion congestion window wf . [88] were concerned with the problems of several previously proposed congestion control algorithms . Defining the threshold in absolute terms requires manually adapting Fusion to a particular environment. the Vegas estimate). In the worst case.g. the congestion window increase function is defined to have two components: a slow constant increase component Wbase (increased by one for every RTT) and a scalable increase component Wprobe (increased by a function of the Westwood-like achievable rate estimate and the queuing delay.. Neither of these assumptions is entirely true in real networks. N. More specifically.5. more serious problem. but also introduces a new. Time 1 – scalable increase (Q<threshold) 2 – constant congestion window (threshold < Q <3·threshold) 3 – reducing to the lower bound Congestion window dynamics in Fusion packets will not be separated much during delivery (e. 52. C-TCP. packet pairing has questionable benefits on the precision of RTTbased estimations (e. DUAL’s queuing delay (Section II-G). the congestion window remains unchanged. Instead of TCP-AR’s congestion window increase in Congestion Avoidance as a continuous function over the queuing delay. Another problem of Fusion is the way it quickly defaults to standard congestion window control rules.e. TCP-AR totally loses its ability to scale in high-BDP networks. If the queuing delay grows more than three times the threshold (zone 3 in Figure 52). Moreover. when the network is experiencing congestion (i. due to congestion) and that the TCP receiver sends ACK packets for every other data packet. FAST). slow. non-scale mode most of the time. see Sections II-B and V-A respectively). RT Tmin /RT T ). when queuing delay is near maximum). Congestion window dynamics in TCP-AR Fig.. expressed in seconds) queuing delay threshold value. Although NewVegas authors have recognized the problem of high-BDP link utilization. the RTT measurement can be improved significantly just by employing the Timestamp option [31]. Fusion changes the constant congestion window reduction ratio β in Fast Recovery to the value β = max(0. If wf becomes smaller than wr . when queuing delay is close to zero).e. Fusion not only has the same vulnerabilities as TCP-AR. combines the ideas of Westwood’s achievable rate (Section V-A).g. relying on the queuing delay and achievable rate metrics makes this algorithm vulnerable if RTT measurements should become noisy. Second.. which limits the potential applicability of NewVegas in the future. This manual configuration is highly undesirable and usually impossible to perform. when the queuing delay is wrongly estimated to be close to the maximum. Figure 51 shows conceptually the congestion window dynamics of the TCP-AR algorithm.e.. when the network is congestion-free (i. Experimental results showed that TCP-AR can improve the network utilization successfully and at the same time preserve a good level of intra-fairness. the congestion window decreases by the number of packets buffered in the network (i.g. TCP Fusion Kaneko et al.. First. depending on an absolute (i. as a minimum. L. TCP Africa King et al. in a fashion similar to TCP-AR (Section VI-L). However. TCP-AR Shimonishi and Murase [86] presented TCP-AR (Adaptive Reno) as another approach to improve TCP performance and preserve friendliness to standard TCP in high-speed networks. the congestion window is increased at a fast rate each RTT by a predefined fraction of Westwood’s achievable rate estimate (scalable increase). BIC. the proposed Rapid Window Convergence resolves only one part of the problem: this phase of scalable congestion window increase steps intends only to improve the early termination of Slow Start. [87] presented a Fusion algorithm which. Thus.e. Although experimental results of evaluating Fusion [87] have shown some improvements in terms of utilization and fairness characteristics in comparison to other scalable algorithms (e. To make Fusion behave at least as well as the standard Reno congestion control. in certain cases Fusion may stay in the compatible. The scalable component is a continuous function that has two important properties. HS-TCP. If the current queuing delay is less than the predefined threshold (zone 1 in Figure 52). In addition. the value of the scalable component Wprobe is zero. the wf is reset equal to wr .

Illinois has several implementation differences. Both simulation results and real-world performance evaluation show substantial advantages of the C-TCP scheme: a good utilization of the high-BDP links and good intra-. This algorithm combines the aggressiveness (scalability) of HS-TCP when the network is determined to be congestion-free and the conservative character of standard NewReno (Section II-D) when the network is experiencing congestion. This reduction can be understood as a smooth transition between scalable HS-TCP and slow Reno modes. we will see a convex curve (shown in hatched bar) in the transition from the scalable HS-TCP to the slow Reno mode.g. it has inherited the Vegas sensitivity to the correctness of RTT measurements. For example. is based on NewReno (Section II-D) and is designed on the one hand to be very aggressive when the network is determined to be in a congestion-free state and on the other hand be very gentle when the network is experiencing congestion. [89] presented C-TCP (Compound TCP). The queuing delay calculation . However. To resolve this contradiction. inter-. due to a high volume of cross traffic. STCP. the theoretical congestion window dynamics of C-TCP (Figure 54) are very similar to Africa’s with the exception that after the threshold has been exceeded (Δ > α). However. The congestion/non-congestion criteria was borrowed from the Vegas algorithm (see Section II-G): the estimate of network buffering Δ is compared to some predefined constant α. Otherwise.g. O. inter-.e. For example. In a number of simulations conducted by its authors.g.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 33 Loss detection Congestion window High buffering zone ( > ) Network limit Congestion window Loss detection High buffering zone ( > ) Network limit Fast mode (HS-TCP) Slow mode (NewReno) Time HS-TCP rules Transition from HS-TCP to Reno rules Time Fig. These dictate congestion window increase and decrease steps as functions of the congestion window itself (see Figure 53).. Africa has not been implemented and evaluated in real networks. instead of explicitly defining the fast and slow modes. the flow seeing a higher RTT (which is equivalent to having a higher threshold α value) will be much more aggressive and unfair to the other flow. Congestion window dynamics of C-TCP for high-BDP networks. Africa showed good network utilization in high-BDP networks. Congestion window dynamics of TCP Africa Fig. as opposed to the instant transitions between the fast and slow modes of Africa.AFANASYEV et al. it moves to slow mode and applies the Reno rules: increase by one. compared to congestion controls that rely only on packet losses (e. the dualmode C-TCP algorithm (Section VI-O) is currently the most deployed TCP congestion control in the world. route dynamics. a congestion control approach similar in spirit to Africa (Section VI-N). and RTT-fairness properties. 53. When the estimate exceeds the threshold α. w = w + α every RTT) and the decrease ratio β in Fast Recovery (i. It also tries to use a delay-based estimate of the network state to combine the conventional Reno-type congestion control (Section II-C) with a congestion control that is scalable in high-BDP networks. C-TCP defines an additional scalable component wf ast to be added to the final congestion window calculations (w = wreno + wf ast ). the performance of delay-based algorithms may suffer greatly when the delay (RTT) measurements are very noisy. decrease by half.. 54. upon detecting loss using three duplicate ACKs) to be special functions of the queuing delay... RTT-) comparable to those exhibited by NewReno flows. HS-TCP. [90] noted that congestion control algorithms that interpret a delay as a primary signal for inferring the network state (e.). This algorithm. C-TCP has replaced the conventional congestion control for TCP in Microsoft Windows operating systems and is currently the most deployed congestion control worldwide. w = w − β · w. etc. it moves to fast mode and directly applies the HS-TCP rules of the Congestion Avoidance and Fast Recovery phases. where ζ is a predefined constant). and fairness properties (intra-. It defines both the congestion window w increase steps α in Congestion Avoidance (i. one flow already sending data when a second one appears). the scalable component wf ast is gently reduced by a value proportional to the estimate itself (wf ast = wf ast − ζ · Δ. etc. similar to Africa (Section VI-N) and C-TCP (Section VI-O). However.e. P. TCP Illinois Liu et al. More formally. since it is embedded into the Microsoft Windows operating system (see Table V). because C-TCP relies on the Vegas estimate. including HS-TCP (Section VI-A) and STCP (Section VI-B).. the Illinois algorithm has been proposed. C-TCP Tan et al. However. Unfortunately. where α is some small predefined constant). for example. lower induced loss rate compared to HS-TCP and STCP. As a result. However. As a result. Vegas and Fast) are able to achieve a better efficiency and do not stress the network excessively. This component is updated according to the slightly modified HSTCP rules (Section VI-A) but only when the Vegas estimate Δ (Section II-G) shows a small level of network buffering (Δ < α. In response to these concerns they developed the Africa (Adaptive and Fair Rapid Increase Congestion Avoidance) algorithm. on the weaker side. Reno. if Africa sees that there is little buffering (Δ < α). if flows competing with each other in the same network observe different minimal RTT values (e. the idea of multiple-mode congestion control for high-BDP networks with delay-based mode-switching has been widely adopted by several proposals discussed later.

Illinois switches to the compatibility mode (α = 1 and β = 0. it controls only the amount of the congestion window increase and cannot enforce its reduction—the advantages of Illinois can easily be nullified. where Qmax is a maximum queuing delay observed over the lifetime of the connection.125) Transition from αmax (10) to αmin (0. whereby it reduces the congestion window w by the number of packets Δ estimated to be buffered in the network if this number exceeds a predefined threshold ε (i.5) Time Fig. processing delay. etc).e. YeAH includes two more mechanisms for improving robustness during congestion events and enhancing intra-fairness properties. Illinois sets the default values of αmax = 10. the congestion level is measured. otherwise. the α and β coefficients are updated once every RTT.g.. The theoretical and experimental evaluation of Illinois showed that it is able to use the available resources in the high-BDP networks better than the standard Reno congestion control. the congestion window increases by at most one packet every RTT and decreases by half upon detecting a loss from three duplicate ACKs.e. YeAH furthermore disables the precautionary decongestion if the reference window is more than the actual congestion window size. is close in spirit to C-TCP (Section VI-O). but as a fraction of the minimum RTT observed during the connection lifetime. ACCEPTED FOR PUBLICATION αmax (10) βmax (0.1) Transition from βmin (0. to mitigate the effects of queuing delay measurement noise. YeAH TCP Baiocchi et al. similar to HS-TCP (Section VI-A) and STCP (Section VI-B). in queuing delay calculations YeAH uses the minimum of recently measured RTTs (e. The first mechanism. It can fall back to the Reno mode (α = αmin and β = βmax ) whenever either the minimum or the maximum RTT values are incorrectly estimated or the RTT includes large random components (e. 55. the slow Renolike mode is enforced.. it preserves and improves the intra-.. Q/Qmax ). YeAH) can improve inter-.. the α coefficient is allowed to be set to the maximum. making it behave like NewReno during severe congestion events. According to the Illinois specification. there are two differences in the definition of the latter metric from what was introduced in DUAL. αmin = 0. HS-TCP. while the decrease coefficient is directly proportional (Figure 55). For the former. Q2 = 0. However. the proposed YeAH (Yet Another High-speed) algorithm defines the slow NewReno (Section II-C) and the fast STCP (Section VI-B) modes in Congestion Avoidance and Fast Recovery explicitly.g. and RTT-fairness properties. YeAH defines simultaneous use of two delaybased metrics: the Vegas-type estimate of a number of packets buffered in the network (see Section II-G) and the DUAL-type network congestion level estimate (see Section II-B). To summarize.g. For this purpose. The second mechanism repeats another idea presented in C-TCP: the congestion window is restricted to be a value that is greater than if only Reno rules are applied.g.5) when the congestion window size is less than a predefined threshold wt (e. ten packets). Congestion window dynamics of TCP Illinois follows the definition introduced in DUAL (Section II-B). precautionary decongestion. 5) the value of the queuing delay is less than the first threshold Q1 .3. However. Q2 .. In the Linux implementation. In the latter. not as a fraction of the maximum queuing delay (e.1 · Qmax . This switch. Additive increase α and multiplicative decrease δ coefficients as a function of queuing delay Q Fig. βmin = 0. the results confirmed that approaches which combine delay-based and lossbased metrics (e.. the performance of YeAH—similar to all delay-based . Experimental evaluation showed that YeAH maintains high efficiency in high-BDP networks that maintain a network buffering at a very low level. intra-. although the queuing delay is a secondary parameter by which to infer the network state—i. different propagation delays when path frequently changes. C-TCP. However. during the last RTT) instead of an averaged RTT. At the same time. Additionally. can be varied to achieve desired performance characteristics.e.g. and RTT-fairness properties substantially compared to pure loss-based approaches (Reno. 56.125) to βmax (0. To provide a reliable mechanism for mode switching. Q1 = 0. then it behaves exactly as STCP. [91] introduced one more alternative for congestion control that combines packet loss detection and measurement of RTT as mechanisms to estimate the network state. First.1) Q1 Qmax Q2 Q3 βmin (0. STCP).8 · Qmax .g. improves fairness properties of Illinois to some extent. The increase coefficient α depends inversely on the queuing delay.01 · Qmax . only if during several consecutive RTTs (e.5. However. In addition to mode switching. if YeAH estimates a low level of packet buffering in the network (Δ < α. Additionally. and Q3 .125.34 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. w = w − ε · Δ). Figure 56 shows the key cases of the Illinois theoretical congestion window dynamics.. Second.. bmax = 0.5) Congestion window Loss detection Buffering zone (Q>0) Network limit αmin (0. Q. less than half). The minimum and maximum values of α and β. Similar to Africa (Section VI-N). where α is a predefined threshold) and the queuing delay estimate shows a low congestion level (Q/RT Tmin < ϕ.. and the queuing delay thresholds Q1 . YeAH maintains a reference congestion window size wreno that varies according to Reno rules. inter-. the congestion window is updated aggressively—increased by a fraction of the congestion window itself each RTT and decreased by another fraction which is much smaller than that in slow mode (i. where ϕ is another predefined threshold). Africa. and Q3 = 0.

In the first part of this survey. srcPort. Essentially. Additionally. this allows users to game even the “ideally” fair TCP congestion control system and acquire an advantage in the network resources distribution.g. “proactive” Vegas. we classified and discussed proposals that build a foundation for host-to-host congestion control principles. There are also questions about the fundamental assumptions of TCP congestion control. Africa. there has been a belief that the more data packets routers can buffer. Often.e. provides basic support for multi-homing. etc. instead of providing fast feedback to a TCP sender by dropping a number of packets. there are not yet the well-defined and broadly-accepted criteria to serve as a good baseline for appropriately selecting a congestion control algorithm. we need a mechanism to make new flows aware of the current network state (e. First of all. For example. especially when the probability of loss is high. VIII. If the delay patterns change because of non-congestion-related factors. a new congestion control-related problem has appeared on the Internet.. Because all TCP flows behave independently. “ideally” fair congestion control under current definitions is not really “ideal. TCP-AR. Thus.). but problems of efficient channel utilization.. introduces the basic technique of gradually probing network resources and relying on packet loss to detect that the network limit has been reached. high-speed. However. such switching rules makes algorithms behave in a non-high-speed. the available capacity can easily be exceeded. Several problems have arisen as user mobility significantly increased. This happens because such algorithms do not have the ability to invalidate various internal parameters during the transmission. Recently. The congestion control techniques developed so far do not really work if the connection lifetime is only one or two RTTs..) address this problem by incorporating at least two sets of rules to control the transmission rate of a flow (i. but everything in TCP is based on individual flows.AFANASYEV et al.g. C-TCP. For many years. DNS requests/responses via TCP). As a result. Jain’s metric was based on user shares [29]. reliable detection of congestion events in separate and common network paths. “reactive” HS-TCP and STCP behave as standard Reno while the congestion window is less then a predefined threshold). O PPORTUNITIES FOR FUTURE RESEARCH Currently we have as situation where there is no single congestion control approach for TCP that can universally be applied to all network environments. Recently there were several reports of the excessive buffering syndrome (also known as a congestive queueing or buffer madness event) on the “End-to-end” mailing list [100]. However. router manufacturers and network administrators often choose maximum values for the bandwidth and delay (or choose some large buffer size). the more effectively network channels are utilized. Thus. C ONCLUSION In this work we have presented a survey of various approaches to TCP congestion control that do not rely on any explicit signaling from the network. SCTP [98]. The best known recommendation is to set the buffer size equal to a bandwidthdelay product (BDP) of the connection served [99]. C-TCP.) use patterns in delay measurements for switching purposes. Clearly this value has a direct impact on the performance of short-lived flows. One of the primary causes is a wide variety of network environments and different (and sometimes opposing) network owners’ views regarding which parameters should be optimized. YeAH. a new short-lived flow has no idea of the present network state and can only exacerbate any congestion present in the system. mode. long-delay. wireless.” To summarize. the network resources will be distributed as one to five..: HOST-TO-HOST CONGESTION CONTROL FOR TCP 35 approaches—can degrade when RTT measurements have significant noise. where round-trip delays grew in excess of 5–10 seconds. dstIP.. the current version of Linux kernel provides an API for software developers to choose any one of the supported algorithms for a particular connection. although this technique solves . The challenge is distinguishing the independent network states without knowing the exact topology of the Internet.g. A new generation of the reliable data transfer protocol. etc. if one user opens only one TCP connection and another opens five. these algorithms may suffer from efficiency and fairness degradation. A number of the congestion control algorithms from Section VI (HS-TCP. therefore inefficient. More and more users now have multiple physical access channels to the Internet. Jain’s index (see Section II-A) is used as a fairness measure. a fundamental research question is how to enforce fairness on a user-level basis without sacrificing throughput of individual flows. Illinois. routers extensively buffer packets making a TCP sender unaware of an abnormal situation in the network. from the basic problem of eliminating the congestion collapse phenomenon to problems of using available network resources effectively in different types of environments (wired. for example to speedup data transfer (since a TCP connection is identified by the tuple {srcIP. STCP. TCP is fundamentally unable to use them simultaneously. objective guidelines to select a proper congestion control for a concrete network environment are yet to be defined. In practice. VII. Other algorithms (e. if the initial value is large enough and the number of short-lived flows in the network is substantial. dstPort}). Some algorithms switch modes based on a currently achieved transmission rate (e. However. for example because of a re-routing path. The survey highlighted the fact that the research focus has changed with the development of the Internet. etc. Moreover.g. Tahoe. it was initially assumed that each TCP flow should be fair to each other. and user fairness are still to be solved. conventional Reno-like rules when the network seems to be congested and scalable rules otherwise). One potential direction to solving this problem is to maintain global estimates of network states. However. In some network environments. Unfortunately. an ability to estimate the available network path capacity before the actual data transfer). The only congestion control parameter useful during such connections is the initial value of the congestion window. Another aspect of congestion control not yet fully investigated is the problem of short-lived flows (e. The first proposal.

2000. 1461–1472. conference on applications.” IEEE Commun. R EFERENCES [1] J. Seshan. [12] C. Postel.” RFC. 4. December 1997. pp. 1465–1480. [17] M. and M. no. and challenges. Li. p. 314–329. Allman. 281–291. Commun. Doyle. and the Linux-world uses CUBIC. 8. [5] J. “RFC3782—the NewReno modification to TCP’s fast recovery algorithm. Murata. 4. As we showed. 177–186. and J. Floyd and T. “RFC791—Internet Protocol. “RFC896—Congestion control in IP/TCP internetworks. applying slightly different techniques to achieve the same goal. 1999. Balakrishnan. These proposals show that in such environments. Libra. 28. 1981. vol. vol. “RFC2018—TCP selective acknowledgment options.. E. NY. [13] S. Al Hanbali.” IEEE/ACM Trans. N. “Modified TCP congestion avoidance algorithm. TCP-AR. [10] M. “Promoting the use of end-to-end congestion control in the Internet. we showed that technology advances have introduced new challenges for TCP congestion control. “A survey on TCP-friendly congestion control. In the third part of our survey. Widmer. Hasegawa. H-TCP) introduced simple but highly optimistic (aggressive) policies to probe networks for the available resources. [16] V. Padmanabhan. [2] C. Fusion. DOOR). 3rd quarter 2005. The second part of the survey is devoted to a group of congestion control proposals that are focused on environments where packets are frequently reordered. “A survey of TCP over ad hoc networks. vol. Scheuermann. vol. no. and R. F. “RFC793—transmission control protocol. Netw. 1988. August 1999.” IEICE Trans. inter-. we showed that basic hostto-host congestion control principles can solve not only the direct congestion problem but also provide a simple traffic prioritizing feature. [3] J. “A survey on congestion control for mobile ad hoc networks. In the last two sections of the survey. Mahdavi. [22] L. Sel. and there is no current consensus in the research community regarding which approach is superior. it creates a great deal of inefficient use of the network. 6. 1996. 5. we reviewed a group of solutions with the most research interest over the recent past. STCP. 553–574. 22. there are disadvantages to both of these approaches. no. 18. Unfortunately. pp. such techniques led to the appearance of a number of other problems. Parallel Distrib. Peterson. Fall.” IEEE Control Syst. 22. Nain. they co-exist in the current Internet: C-TCP is deployed in the Windows-world. August 1987. YeAH) perform this by relying on secondary delay-based network state estimation techniques. [19] S. vol. V. and W. Jacobson. Two proposals. Low. V.” ACM SIGCOMM. or (2) by undoing previously applied actions if reordering is detected (Eifel. Mahdavi. “Internet congestion control. vol. 1999. we discussed several solutions (the Westwood-family algorithms) which apply similar techniques for estimating the last “good” flow rate and using this rate as a baseline to distinguish between congestion or random packet loss. no. 3. Romanov. 28–43. Henderson. pp.” RFC. Mag. or (3) introduce alternative concepts for network state estimation through delay measurements (DUAL. H. [23] G. DSACK). 7. Vermont. June 2001. Gerla and L. Illinois. and protocols for computer communications (SIGCOMM). Mauve. [4] A. 458–472. vol. vol. pp. pp. 7. T. Mogul. 7. Stowe. no. Not surprisingly. Floyd. [21] M. [11] J. Murata. 1984. (Special Issue on New Developments on QoS Technologies for Information Networks). Brakmo and L. [18] S. Two algorithms examined (Nice and LP). Mathis and J. Second. and M. Paganini. pp. solutions to the efficiency problem include algorithms that (1) refine the core congestion control principle by making more optimistic assumptions about the network (Reno. Hasegawa and M. Katz. 2004. pp.” in Proc. and P.. and A. Unfortunately. vol. and M. ACKNOWLEDGMENT The authors are very much obliged to Erik Kline and Janice Wheeler for the valuable input on the survey organization and sleepless nights spent in reading and correcting errors in this text.” ACM Computer Communication Review. [14] V. IX. pp. First. 3. 655. no. “A comparison of mechanisms for improving TCP performance over wireless links.” in Proceedings of the ACM workshop on Frontiers in computer communications technology (SIGCOMM). October 1995.” IEEE Trans. BIC and CUBIC. Lochert. vol. ACCEPTED FOR PUBLICATION the congestion problem.” email to the end2end list. technologies. pp. V. . 390–401. Mauve. Crowcroft. Altman. RR-TCP). no. 15. April 2007. 2. use packet loss to establish an approximated network resource limit. [6] H. “TCP Vegas: end to end congestion avoidance on a global Internet. 1992. [9] G.” RFC..” RFC.” RFC. “Survey on fairness issues in TCP congestion control mechanisms. [15] Z. S. 6. C. April 1980.” Wireless Communications and Mobile Computing. C-TCP. “Eliminating periodic packet losses in 4. Kleinrock. A PPENDIX See Figure 57 for an evolutionary graph of variants of TCP congestion control. 5. K. 22–36. no.” RFC. IEEE ICNP. Floyd. New York. 2007. Later proposals employed more intelligent techniques to make congestion control aggressive only when the network is considered congestion-free and conservative during a congestion state. Surveys Tutorials. Netw.. and RTT-unfairness. April 1990.” IEEE J. Another group of proposals (FAST. Denda. Commun. Stevens.” RFC. pp. [20] M. May/June 2001. no. “Flow control: a comparative survey. Yang. These proposals aim to solve the problem of poor utilization of high-speed or long-delay network channels by TCP flows. Leung. 13. E84-B. February 2002. Areas Commun. Veno). pp. architectures. “RFC2582—the NewReno modification to TCP’s fast recovery algorithm. “Forward acknowledgement: refining TCP congestion control.” in Proc. “Analysis and improvement of fairness between TCP Reno and Vegas for deployment of TCP Vegas to the Internet. or (3) by refining the network state estimation heuristic (PR. 4.3– Tahoe BSD TCP congestion control. “Congestion avoidance and control. vol. Africa. 1981. “Fragmentation considered harmful. which is used as a secondary criterion to estimate the current network state. Gurtov. [7] K. 522–535. USA. Mathis. 9–16. 1996. B.-C. 756–769. efficiency can be improved significantly by (1) delaying the control actions (TD-FR). RR). Henderson. pp.” IEEE Network. solutions. or (2) refine the TCP protocol to include extended reporting abilities of the receiver (SACK. R. Syst.” IEEE/ACM Trans. Floyd and K. J. Kurata. NewReno). 28–37.36 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. Postel. which allows the sender to estimate the network state more precisely (FACK. “An overview of packet reordering in transmission control protocol (TCP): problems. have the same aim: to provide an opportunity to send non-critical data reliably without interfering with other data transfers. 1. and D. Kent and J. A. S. Vegas. pp. Wang and J.. Paxson. Nagle.” IEEE Trans. Jacobson.. pp. [8] S. “RFC2581—TCP congestion control. no. The first proposals addressing this problem (HS-TCP. no. and A. including the intra-.

” IEEE J. 1997. vol. D. no. 11. pp. “TCP Veno: TCP enhancement for transmission over wireless access networks. Towsley. R. and A. Feldmann. 4. 7.AFANASYEV et al. pp.” SIGCOMM Computer Communication Review. [39] J. “RFC2001—TCP Slow Start. L. and W. Braden. SIGCOMM. Bennett. 2005. Karn and C. [37] T. Congestion Avoidance. Shectman. USA. “RFC2574—definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers. Arkko.. Liew. 429–440. and D. Fast Retransmit. Eggert.” in Proc. M.” RFC. no.” RFC. no. Briscoe. M. “RFC1122—Requirements for Internet Hosts . [33] V. Third International Conference on Networking and Services (ICNS). Nichols. no. Hollot.. vol. Fu and S. 1998. Blake. Misra. Binczewski. SIGMETRICS.” RFC. 1997.-B. Ananda. Braden. Available: ftp://ftp.” Computational Methods in Science and Technology. 141–146. Weiss. Blake. and D. “On designing improved controllers for AQM routers supporting TCP flows. [32] J. Jacob. no. P. Partridge. Belter. [40] C. 4. B. Arthur. [27] R. 2. December 1999.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 37 RFC 793 Tahoe DUAL Reno FACK NewReno Vegas+ Vegas Congestion collapse Reordering Eifel DOOR TD-FR Veno Vegas A Low-priority PR RR Nice LP Westwood ABSE Wireless CRB BR Westwood+ BBE High-speed STCP YeAH HS-TCP Africa Compound BIC H-TCP ARENO CUBIC Illinois Libra Hybla Fusion FAST NewVegas TCPW-A LogWestwood+ Reactive (loss-based) Proactive (delay-based) Reactive (loss-based with bandwidth estimation) Fig. D. pp. 2001. Sel. 139–152. pp. and D. pp. “Packet reordering is not pathological network behavior. “End-to-end Internet packet dynamics. [Online]. vol. August 1998. Wang. no. Black. [29] D. Areas Commun. and N. 216–225. Przybylski. [24] C. 21. vol. vol.gov/ talks/sf-tcpimpl-aug98. 27. S. 1–14. Jacobson.Communication Layers. Floyd.” Computer Networs and ISDN Systems. L. 2001. 1. A. C. [28] W. and M. “TCP New Vegas: Improving the Performance of TCP Vegas Over High Latency Links.” RFC. Harle. Davies. 3. [34] M. and A. [25] K. Evolutionary graph of variants of TCP congestion control. B.” in Proc. 1998. Gong. 789–798. Z. [36] S. Paxson. Sing and B.” Presentation to the TCPIMPL Working Group. Partridge. Lehane. Chiu and R. [30] S. [41] J. Bu and D. pp. 4th IEEE International Symposium on Network Computing and Applications (IEEE NCA05).” IEEE/ACM Trans.” Computer Communications. “RFC1323—TCP Extensions for High Performance. Towsley. pp. June 2007. “Improving round-trip time estimates in reliable transport protocols. “RFC2575—an architecture for differentiated services. “Shall we worry about packet reordering. Borman. 1989. 6. 1726–1734. C. [26] P. vol. NY. Netw. “TCP Vegas-A: Improving the performance of TCP Vegas. Stevens. [38] C. 17. [35] K. Black. “Dagstuhl perspectives workshop on end-to-end protocols for the future . vol.lbl. “Analysis of the increase and decrease algorithms for congestion avoidance in computer networks. Baker. Srijith. 1987. 73–80. Soh. pp. 1992. IEEE INFOCOM. V. February 2003. New York. 2005. “Keeping order: Determining the effect of TCP packet reordering. F. 2005. E.” in Proc. Carlson. Jain. 28. 57.” in Proc. A. 1989. Handley.” in Proc. “Revisions to RFC 2001.-M.pdf [31] V.” RFC.ee. 2. and W. “Fixed point approximations for TCP behavior in an AQM network.

. “TCP Nice: A Mechanism for Background Transfers.” IEEE Proc. M. Seshan. Rhee. 2005. vol. H. New York. Floyd. “One-way delay estimation and its application. 2008.S. Bunn. ACCEPTED FOR PUBLICATION [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] internet. Wang.” SIGOPS Operating Systems Review. “M-TCP: TCP for mobile cellular networks. 1991. S.11 wireless networks: the definitive guide. X. vol. 1. 2003. Kaneko. pp. no. http://tools. 42.” UCLA Computer Science Department. M. 28. July 2008. and D. and I. ——. Paganini et al. F. pp. Caini and R. “Improving TCP performance over mobile adhoc networks with out-of-order detection and response. Brown and S. Clark and W. “RR-TCP: a reordering-robust TCP with DSACK. California.” in Proc. vol. Obraczka. pp. 2514–2524. PFLDnet. [90] S. IEEE GLOBECOM. Netw. Floyd. May 2003. no. 2004. November 20. W. International Conference on Distributed Computing Systems. 2005. USA. “RFC3168—the addition of explicit congestion notification (ECN). 42–47. [68] H. M. “I-TCP: Indirect TCP for Mobile Hosts. 1. Balakrishnan. 819–828. vol. 2002. Kelly. S.” in Proc. M. no.” Computer Communications. pp. pp.-H. vol. vol. J. Kuzmanovic and E. “Non-congestion packet loss detection for TCP error recovery using wireless links. K. Murase. pp. 802. Wei. pp. J. “Explicit allocation of best-effort packet delivery service. 2. and M. R. IEEE GLOBECOM. “HSTCP-LP: A protocol for low-priority bulk data transfer in high-speed high-RTT networks. and M. and M.” Operating Systems Review. M. Bohacek. “End-to-end differentiation of congestion and wireless losses. 2002.” IEEE Network. Sanadidi. S. 5. Areas Commun. M. Les Cottrell. “TCP Westwood: Bandwidth estimation for enhanced transport over wireless links. vol. “Method and apparatus for TCP with faster recovery.ietf. April 2003. Peterson. Ludwig and A. PFLDnet. E. vol. Rutgers University. 2001. pp. Knightly. 222– 233. O’Reilly & Associates. 4. A. 4. Low. Tan.” IEEE J. P. Mathis. Floyd. Jin. Seventh International Symposium on Computers and Communications. 2002. Granelli. Rhee and L. Kliazovich. A. Davie. [88] R. 23. Dahlin.” RFC.38 IEEE COMMUNICATIONS SURVEYS & TUTORIALS. [75] D. 2001. March 2005. 1157–1165. November 2002. Cottrel. Badrinath. IEEE INFOCOM. S. no.” U. and S.” ACM SIGCOMM Computer Communication Review. 217–225. Venkataramani. Katto. “TCP Hybla: a TCP enhancement for heterogeneous networks. 21. 8–18. vol. and M. S. Singh.” Computer Communications Review. pp. vol. Y. pp. TSVWG. July 2002. “TCP with senderside intelligence to handle dynamic. Newman. New Reno and Vegas TCP congestion control.” in Proc. no.” in Proc. Kuzmanovic and E. Valla. and L. K. “Understanding the performance of TCP pacing. vol. 739–752. 235–248. “RFC3742—Limited slow-start for TCP with large congestion windows. 131–136. vol. Marina Del Rey (Los Angeles). 4. T. [93] A. Valla. Cen. Grieco and S. 23. G. architecture. Lim. “Logarithmic window increase for TCP Westwood+ for improvement in high speed. “TCP-LP: low-priority service via end-point congestion control. Ravot. Doyle. Choi and C. M. 13. King. J. pp. Gerla. 4. D. vol. 2003. no. 2000. Vacirca. 19. vol. ISI. IEEE INFOCOM. Leith and R. E. Netw. Ng. [73] T. vol. 3548–3552. Cosman. [71] S. M. Choe. Sanadidi. and M. 1994. 27. “CUBIC: a new TCP-friendly high-speed TCP variant. Baiocchi. and F. Su. 14. Baraniuk. Karp. April 2004. Gerla. pp. K. S. 30–36. and J. R. performance. Sanadidi. no. 1999. B. “Deployment experience with differentiated services. Knightly. 1997. “Improving efficiency-friendliness tradeoffs of TCP congestion control algorithm. and D. Leith. IETF. no. Ramakrishnan. Samaraweera. 547–566. J. G. Ludwig and R. “TCPPR: TCP for Persistent Packet Reordering. 1995.” IEEE/ACM Trans. pp.” in Proc. Sel. “YeAH-TCP: yet another highspeed TCP. 95–106. 5. Gerla. Sanadidi.” IEEE/ACM Trans. W. Wei. no. NY. Singh. Kokku.” Department of Computer Science. Mascolo. Zhang. February 2005. Floyd. no. no.” in Proc.. April 2003. Harfoush. M. NY. [78] L. 2004. S. 5. Palazzi. T. Jin. Y. R. D. New York. Wei. R. 5. Roccetti. pp. and S.” RFC. 2009.. 14. H. “RFC3649—HighSpeed TCP for large congestion windows. J. Shimonishi. pp. Z. “FAST TCP: From Theory to Experiments. 52. 1838–1848. and R. Ravot. 2395– 2410. 4. and R. 2006. why do we care?. H. March 2000. D. 11th IEEE International Conference on Network Protocols (ICNP). “FAST TCP: motivation. pp. pp. O. Gurtov. Liu. IEEE INFOCOM. Clark. 362–373.” Computer Networks. Savage.” IEEE International Conference on Communications 2003. “A compound TCP approach for high-speed and long distance networks. 2005.” RFC. Shenker. pp. M. “TCP-Fusion: a hybrid congestion control algorithm for high-speed networks. Basar.” Proc. Doyle.” SIGCOMM Computer Communication Review. “RFC4015—the Eifel response algorithm for TCP. no. Gerla. Katz. “H-TCP: TCP congestion control for high bandwidth-delay product paths. 2006. [69] L. Wang and Y. Bunn. IEEE INFOCOM. S. J.” in Proc. Voelker. 4–11. Rep. Wang.” in Proc. 3. March 2004. [87] K. 287–297. no. [74] S. S. “TCP-Illinois: A loss and delay-based congestion control algorithm for high-speed networks. Xu.” in Proceedings of PFLDnet.. “TCP-Africa: an adaptive and fair rapid increase rule for scalable TCP. Yamada. First International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS). [92] S. Mascolo. Gerla. pp. no. “Binary increase congestion control for fast. [89] K. August 2008. 36. [85] G. Sridharan. “Efficiency/friendliness tradeoffs in TCP Westwood. A. S.. Black.” July 2005. 1998. UCLA-CSD TR-050037. Martin.” IEEE/ACM Trans. Netw. Sanadidi. 133– 147. 469–481. chapter 2. 11. 2003. 2005. and M. “RR-TCP: a reorderingrobust TCP with DSACK. 2. Sanadidi. “TCP-LP: a distributed algorith for low priority data transfer. 2604–2608. F. Podolsky. algorithms.” Wireless Networks. Bakre and B. Hespanha. “Improving reliable transport and handoff performance in cellular wireless networks. G.” in Proc. no. and M. M. 329–344. M. and C.” SIGCOMM Computer Communication Review. 2004. 1. [83] C. Q. ISI. pp. X. 2.” IEEE/ACM Trans. “RFC2883—An Extension to the Selective Acknowledgement (SACK). “TCPW with bulk repeat in next generation wireless networks. leaky pipes. and T. 342. 2006. Marina Del Rey (Los Angeles). 6.” International J. and M. 39. Mahdavi. Xu. Sebastopol. Buhrmaster. 12. R. Anderson. Yang. 64– 74. S. Inc. 674–678. Wang. Pau. and R. “FAST TCP: from theory to experiments. “Scalable TCP: improving performance in highspeed wide area networks. [70] N. May 2005. TR-02-006. pp. Knightly.” SIGCOMM Computer Communication Review. vol. pp. “Internet QoS: A big picture. 22. Gerla. C. Netw. S. Rep. vol. Kuzmanovic. Gerla. vol. H. vol.” December 2003. Patent 7 299 280. “TCP Libra: Exploring RTT-Fairness for TCP.. Tech. M. H. 2007. 1.” International Computer Science Institute. 222–230. Jin. Communications. and D. pp. [91] A.” RFC. Rep. “Improving efficiencyfriendliness tradeoffs of TCP in wired-wireless combined networks. CA. 6. Shorten. no. A. Gast and M. “H-TCP: TCP for high-speed and longdistance networks. Choe. 4. Aggarwal. 703–717. March 2003. Yoo. R. A. 3. [77] C. Satellite Communications and Networking. J. California. February 2007. C. vol. 19– 43. C. 2005. Shimonishi and T. Tech. R.” IEEE Network. vol. [81] I. [82] C.” K. “The Eifel algorithm: making TCP robust against spurious retransmissions. vol. 2. Fujikawa. 304–311. Floyd. 1246–1259. 2002. Riedi. “Observations on the dynamics of a congestion control algorithm: The effects of two-way traffic. M. J.. Ni. [76] D. Low. Hegde. Sanadidi. Katz. Feng. 2003. Low. and M. F. HighSpeed TCP and Quick-Start for Fast Long-Distance HighSpeed TCP and Quick-Start for fast longdistance networks (slides). Lee. IEEE ICC. August 1999. [72] S. vol.” in Proc. Xiao and L. D. Wang. Firrincieli. M. Newman. P.” RFC.” in Proceedings of the 3rd ACM international symposium on mobile ad hoc networking & computing. vol. [84] D. [80] D. Paganini. . long distance networks. vol. Loukides. Singh. pp. L. J. 3. Floyd. Miorandi. Zhang. Zhang. pp. Castellani.” in Proc. Srikant. pp. “Performance evaluation and comparison of Westwood+. pp.” in Proceedings of the ACM SIGCOMM workshop on Revisiting IP QoS: What have we learned. S. R.” in Proc. H. Gerla. 32.” ACM Computer Communication Review. pp. 2003. Song. 2000. and R. Zhang. Sanadidi. 7. S. and R. “Adaptive bandwidth share estimation in TCP Westwood. and K. 30.” in Proc. pp. long distance networks.org/html/draftleith-tcp-htcp-06. Wang. C. pp. A. Tech. Marfia. [86] H. large. ACM MOBICOM. no. F. Casetti. [79] R. and G. vol.” IETF Internet Draft. 146. Fang. February 2007. DCSTR-314. A. vol. B. M. B.

S. He received his M. He received honorary doctorates from CCNY (1997). pp. file systems. Moscow. He is a Distinguished Professor of Computer Science at UCLA and served as chairman of the department from 1991 to 1995. for mentoring generations of students and for leading the commercialization of technologies that have transformed the world. from Massachusetts Institute of Technology in 1963. in Computer Science at the University of California. the CCNY Electrical Engineering Award. Neil Tilley received his bachelor’s degree from the University of California. Amherst (2000). 2003. no. Dovrolis. He has been pursuing a Ph.” in Proceedings of IEEE INFOCOM. “SCTP: a proposed standard for robust internet data transport. ubiquitous computing. the L. and L. Amer. Portland. Tenth International Conference on Computer Modeling and Simulation. Chen. Belhaj. 2008. [100] “End-to-end mailing list. for the functional specification of packet switching which is the foundation of the Internet Technology. I. the highest honor for achievement in science bestowed by the President of the United States. USA. Davis. the Marconi Award. the ACM SIGCOMM Award. “Open issues in router buffer sizing. pp. “TCP Pacing Revisited. the Communications and Computer Prize. P. [96] R. and M.postel. Alexander Afanasyev received his B. multimedia systems. Dhamdhere and C. the IEEE Internet Millennium Award. Gerla. He was listed by the Los Angeles Times in 1999 as among the “50 People Who Most Influenced Business This Century. He has done research in the fields of distributed operating systems. In 2006 he received the medal for the best student scientific project in Russian universities.E. J. P. His current research interests include parallel and networked systems. 2006. Cao.” in Proc.D. [99] A. Los Angels since 2009. NEC C&C. Ladha. S. the National Academy of Engineering. Low. [98] A.org/e2e. and S.Tech. vol. Russia in 2005 and 2007. Politecnico di Torino (2005). Le. Iyengar. 56–63. the NAE Charles Stark Draper Prize. Nara. in Electrical Engineering and Computer Science from the University of Notre Dame in 1979. and the IEEE Harry Goode Award. the Lanchester Prize.” He was also listed as among the 33 most influential living Americans in the December 2006 Atlantic Monthly. He is currently working towards his Ph. Ha. Kleinrock’s work was further recognized when he received the 2007 National Medal of Science. Peter Reiher received his B. Lao.” . 11. This Medal was awarded “for fundamental contributions to the mathematical theory of modern data networks. and peer-to-peer environments. Y.” in Fourth International Workshop on Protocols for Fast Long-Distance Networks. Ericsson Prize.D.: HOST-TO-HOST CONGESTION CONTROL FOR TCP 39 [94] S.D. an ACM Fellow. Oregon.” in Proceedings of SIGCOMM. the University of Bologna (2005). no. “A step toward realistic performance evaluation of high-speed TCP variants. peer-to-peer networks and intelligent agents.” ACM SIGCOMM Computer Communication Review. network and distributed systems security. [95] S. and Ph. “VFAST TCP: an improvement of FAST TCP. an IEEE Fellow. and the University of Judaism (2007). degree from City College of New York (CCNY) in 1957 and received his Ph. nomadic computing. August/September 2004. the University of Massachusetts. Y. 87–92. Kapoor. Dr. and optimistic parallel discrete event simulation. L. the INFORMS Presidents Award. gigabit networks. “CapProbe: A simple and accurate capacity estimation technique. He is a member of the American Academy of Arts and Sciences. Among his many honors. Leonard Kleinrock received his B. Japan. packet radio networks. mobile computing. Reiher is an Adjunct Professor in the Computer Science Department at UCLA. 1. and M.D. and a founding member of the Computer Science and Telecommunications Board of the National Research Council. respectively. in Computer Science from UCLA in 1984 and 1987. the UCLA Outstanding Teacher Award. local area networks.” http://www. he is the recipient of the CCNY Townsend Harris Medal. network security. His research interests include network systems. Sanadidi.-J. Kim. He has published more than 250 papers and authored six books on a wide array of subjects including queuing theory.S. packet switching networks.AFANASYEV et al. the Okawa Prize. Rhee. respectively. 2006. Wei.M. vol. degree in computer science at the University of California. and K. L. M. the Sigma Xi Monie Ferst Award. Xu. degrees in Computer Science from Bauman Moscow State Technical University. mobile systems.” Computer. L. [97] D. 36.htm. Caro Jr.Tech. 36. Los Angeles in the Laboratory for Advanced System Research. March 2006.E. broadband networks. Shah.

Sign up to vote on this title
UsefulNot useful