INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS
Int. J. Commun. Syst. (2012)
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/dac.2402
A survey on TCP Incast in data center networks
Yongmao Ren1,*, Yu Zhao1,2, Pei Liu1,2, Ke Dou1,2 and Jun Li1
1 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
2 Graduate University of Chinese Academy of Sciences, Beijing 100049, China
SUMMARY
In high-bandwidth, low-latency data center networks, when multiple data senders simultaneously communicate with a single receiver (a many-to-one communication pattern), the bursty data overloads the switch buffer on the path to the receiver, which leads to Transmission Control Protocol (TCP) throughput collapse. This is the so-called TCP Incast problem, and it has become a hot research topic in recent years. Many proposals have been put forward at multiple layers, including the link layer, transport layer, and application layer, to mitigate TCP Incast. This paper gives an in-depth survey of these proposals and analyzes and summarizes the principles, merits, and drawbacks of the main approaches to solving the TCP Incast issue. Copyright © 2012 John Wiley & Sons, Ltd.
Received 1 December 2011; Revised 9 May 2012; Accepted 13 June 2012
KEY WORDS: TCP Incast; data center networks; congestion control; cloud computing
1. INTRODUCTION
Cloud computing services and applications need big data centers for support. Companies like
Google, Microsoft, Amazon, and IBM use data centers for Web search, storage, e-commerce, and
large-scale general computations. The main characteristics of a data center network are high-speed links, low propagation delays, and limited-size switch buffers. Transmission Control Protocol (TCP) is currently the most popular transport protocol on the Internet, and it is also widely used in data center networks. However, the unique workloads, scale, and environment of data center networks violate the WAN assumptions under which TCP was originally designed. For example, in contemporary operating systems such as Linux, the default retransmission timeout (RTO) timer value is set to 200 ms, a reasonable value for the WAN but two to three orders of magnitude greater than the average round-trip time in a data center network [1]. There are also performance bottlenecks in Linux TCP that have negative effects in the data center network [2].
One communication pattern, termed Incast [3] by researchers, elicits a pathological response from popular implementations of TCP. Figure 1 shows a typical TCP Incast scenario used widely in the literature. In this pattern, a client connects to the data center via a switch, which in turn is connected to many servers. The client requests data from one or more servers, and the data are transferred from the servers to the client in a many-to-one fashion over the bottleneck link from the switch to the client. The client requests data using a large logical block size (e.g., 1 MB), while the actual data blocks are striped over many servers using a much smaller block size (e.g., 32 KB) called the server request unit (SRU). A client issuing a request for a data block sends a request packet to each server that stores data for that block. The request, which is served through TCP, is completed only after all SRUs of the requested data block have been successfully received by the client.
*Correspondence to: Yongmao Ren, Computer Network Information Center, Chinese Academy of Sciences, Beijing
100190, China.

E-mail: renyongmao@cstnet.cn
Copyright 2012 John Wiley & Sons, Ltd.
Figure 1. Typical TCP Incast scenario.
As the number of concurrent senders increases, the data transfer workload can overflow the buffer at the bottleneck switch, leading to packet losses and subsequent TCP retransmissions. The result can be a significant degradation of application goodput, called TCP Incast throughput collapse.
The TCP Incast issue potentially arises in many typical data center applications. For example, in
cluster storage, when storage nodes respond to requests for data; in Web search, when many workers respond near-simultaneously to search queries; and in batch-processing jobs like MapReduce [4], in which intermediate key-value pairs from many Mappers are transferred to the appropriate Reducers during the shuffle stage.
The TCP Incast issue has attracted many researchers' interest because of the development of cloud computing. Many possible solutions have been proposed at multiple layers, mainly the link layer, transport layer, and application layer. However, some of them are highly effective but also costly, while others are cheap but only weakly effective. In this paper, the principles, merits, and drawbacks of the main proposals for solving the TCP Incast issue are analyzed and summarized.
2. METHODS OF SOLVING THE TCP INCAST ISSUE
Generally speaking, there are two main methods of solving the TCP Incast issue. One is to avoid or reduce packet loss in the first place; the other is to recover quickly after packet loss occurs, so as to reduce its effect.
2.1. Reduce packet loss
A key reason for TCP Incast is the many-to-one transport pattern: many senders transmit data simultaneously, the bottleneck switch buffers are overloaded, and packets are lost. To reduce packet loss, several specific measures are possible, for example, increasing the switch buffer [3], globally scheduling data transfers [5], and explicitly informing the senders of the congestion state so that they can adjust their sending rates accordingly [6, 7].
2.2. Quick recovery
Once packet loss occurs, the TCP sender waits for duplicate ACKs to arrive or for the RTO timer to expire, then reduces its congestion window and enters the congestion recovery process. Possible solutions include reducing the RTO value so that the TCP sender enters retransmission more quickly [1, 3], and designing a quicker congestion recovery algorithm [8].
3. EXISTING PROPOSALS AND ANALYSIS
So far, many possible solutions have been proposed. According to the layer at which they operate, they can be divided into link layer proposals, transport layer proposals, application layer proposals, and other proposals.
3.1. Link layer proposals
At the link layer, there is much research on congestion control and flow control in wireless networks [9]. In data center networks, congestion control and flow control are the two main link-layer methods to mitigate the TCP Incast issue. The IEEE 802.1Qau Congestion Notification project [10] specifies a Layer 2 congestion control mechanism in which a congested switch can control the rate of Layer 2 sources whose packets are passing through the switch, similar to TCP/RED [11]. The IEEE 802.1Qbb Priority-based Flow Control project [12] introduces a link-level, per-priority flow control (PAUSE) function.
3.1.1. Congestion control. The QCN (Quantized Congestion Notification) algorithm [6] is a well-known algorithm developed for inclusion in the IEEE 802.1Qau standard to provide congestion control at Layer 2 in data center networks. It is composed of two parts.

The Congestion Point (CP) algorithm: the switch samples incoming packets and generates a feedback message to the source containing information about the extent of congestion at the CP. The CP buffer is shown in Figure 2. The congestion measure F_b is calculated by the formula

    F_b = −(Q_off + w·Q_δ),    (1)
where Q_off = Q − Q_eq, Q_δ = Q − Q_old = Q_a − Q_d, Q denotes the instantaneous queue size, Q_eq is the desired operating point, Q_old is the queue size when the last feedback message was generated, Q_a and Q_d denote the numbers of packets arriving and departing between two consecutive sampling times, respectively, and w is a non-negative constant. F_b captures a combination of queue-size excess (Q_off) and rate excess (Q_δ).
The Reaction Point (RP) algorithm: the rate limiter associated with a source decreases its sending rate based on feedback received from the CP, and increases its rate voluntarily to recover lost bandwidth and probe for extra available bandwidth. Figure 3 shows the basic RP behavior.
Figure 2. Congestion detection in QCN CP [6].
Figure 3. QCN RP operation [6].
Rate decreases: when a feedback message is received, the current rate (CR) and target rate (TR) are updated as follows:

    TR ← CR    (2)
    CR ← CR·(1 − G_d·|F_b|),    (3)

where the constant G_d is chosen to ensure that the sending rate cannot decrease by more than 50%, and thus G_d·|F_bmax| = 1/2, where F_bmax denotes the maximum of F_b.
Rate increases: this occurs in two phases, Fast Recovery (FR) and Active Increase (AI).

Fast Recovery: the CR is updated as follows:

    CR ← (CR + TR)/2    (4)

Active Increase: the rate limiter probes for extra bandwidth by updating TR and CR as follows:

    TR ← TR + R_AI    (5)
    CR ← (CR + TR)/2,    (6)

where R_AI is a constant chosen to be 5 Mb/s in the baseline implementation.
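To make the CP and RP rules concrete, here is a minimal Python sketch of Equations (1) to (6). The constants (w, Q_eq, |F_bmax|) are illustrative assumptions, not values fixed by the standard, and the sampling probability and byte-counter/timer machinery of real QCN hardware are omitted.

```python
# Minimal sketch of the QCN congestion point (CP) and reaction point (RP)
# rules. Constants are illustrative; real QCN also quantizes F_b and runs
# a sampling/timer state machine that is omitted here.

W = 2.0                       # weight w in Eq. (1), assumed
Q_EQ = 20                     # desired operating point (packets), assumed
F_B_MAX = 64.0                # assumed maximum |F_b|
G_D = 1.0 / (2 * F_B_MAX)     # ensures the rate cut never exceeds 50% (Eq. 3)
R_AI = 5.0                    # active-increase constant, 5 Mb/s (baseline)

def cp_feedback(q, q_old):
    """Congestion measure F_b of Eq. (1); negative values mean congestion."""
    q_off = q - Q_EQ          # queue-size excess
    q_delta = q - q_old       # rate excess (arrivals minus departures)
    return -(q_off + W * q_delta)

def rp_decrease(cr, fb):
    """Eqs. (2)-(3): on congestion feedback, remember TR = CR, then cut CR."""
    tr = cr
    cr = cr * (1 - G_D * abs(fb))
    return cr, tr

def rp_fast_recovery(cr, tr):
    """Eq. (4): move halfway back toward the target rate."""
    return (cr + tr) / 2.0

def rp_active_increase(cr, tr):
    """Eqs. (5)-(6): probe for extra bandwidth beyond the old target."""
    tr = tr + R_AI
    return (cr + tr) / 2.0, tr
```

For example, a source sending at 10 000 Mb/s that receives maximal negative feedback is cut to 5 000 Mb/s, and one Fast Recovery step brings it back to 7 500 Mb/s.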
The authors of [13] proposed modifications to the CP and RP algorithms. They found that if every packet is sampled, performance is much better and collapse does not occur even in simulations with a 32 KB buffer. However, sampling every packet might not be necessary: they proposed two strategies that offer almost the same level of performance while requiring fewer packets to be sampled during periods without congestion. They also proposed reducing the amount by which an RP increases its rate during congestion by making the self-increase rate (R_AI) congestion aware. This is done by setting R_AI to 5 Mb/s when there is no congestion, but to a fraction of this amount when negative feedback is received.
Approximately Fair QCN [14] modifies QCN to ensure faster convergence to fairness. Whereas QCN sends the same congestion feedback value to all flows, Approximately Fair QCN distinguishes flows by their (estimated) sending rates and adjusts the feedback to each flow accordingly.
Fair QCN [15] was proposed to improve the fairness of multiple flows sharing one bottleneck link. It feeds QCN messages back to all flow sources whose sending rates exceed their share of the bottleneck link capacity. The congestion parameter is calculated as follows:

    F_b(i) = (A_i / Σ_{k=1}^{N′} A_k) · F_b,    (7)
where N′ is the total number of overrate flows, and A_k is the total number of packets received from the k-th overrate source flow. Thus, F_b(i) is proportional to F_b.
3.1.2. Flow control. Phanishayee et al. propose using Ethernet flow control (EFC) to solve the TCP Incast problem [3]. An overloaded switch that supports EFC can send a pause frame on the interface feeding the congested buffer, informing all devices connected to that interface to stop forwarding data for a designated time. During this period, the overloaded switch can reduce the pressure on its queues.

Simulation results show that EFC is effective in the configuration where all clients and servers connect to a single switch. However, it does not work well with multiple switches because of head-of-line blocking: when the pause frame from a congested buffer is intended to stop the flow causing the congestion, it also stops other flows because of the switch's FIFO mechanism. Another problem with EFC is that its implementation is inconsistent across switch vendors.
To address these problems, a number of recent Ethernet initiatives [16] add congestion management with rate-limiting behavior and improve the pause functionality with a more granular per-channel capability. These initiatives contribute to creating a lossless, flow-controlled version of Ethernet, referred to as Data Center Ethernet. IEEE 802.1Qbb priority flow control, one of the functions of IEEE 802.1 Data Center Bridging, extends the basic IEEE 802.3x PAUSE semantics to multiple classes of service (CoS), enabling applications that require flow control to coexist on the same wire with applications that perform better without it [17].

Ethernet with lossless behavior mitigates the Incast problem effectively, but implementing new standards in switches takes time and money.
3.2. Transport layer
At the transport layer, the proposals can be divided into two types: one modifies TCP parameters while keeping the TCP protocol unchanged, and the other designs enhanced TCP protocols.
3.2.1. Modification of Transmission Control Protocol parameters.

3.2.1.1. Reducing the minimum retransmission timeout timer. The default value of the TCP minimum RTO timer is 200 ms, which was originally chosen for WAN environments. Unfortunately, this value is orders of magnitude greater than the round-trip time in typical data center networks, which is around 100 μs. Reducing RTOmin to avoid TCP Incast is reasonable because such a large RTOmin imposes a huge throughput penalty: the transfer time for each data block is significantly smaller than RTOmin. The simulation results in [3] indicate that reducing the minimum RTO timer value from the default 200 ms to 200 μs improves goodput by an order of magnitude. However, as the authors point out, reducing RTOmin to 200 μs requires a TCP clock granularity of 100 μs according to the standard RTO estimation algorithm, and the BSD and Linux TCP implementations are currently unable to provide such a fine-grained timer, so it is hard to implement. Moreover, reducing RTOmin might be harmful, especially in situations where the servers also communicate with clients in the wide area network.
The practical experiments in [1] also verify this idea. RTOmin values ranging from 1 to 200 ms were tested and compared, and the basic result is that smaller minimum RTO timer values yield larger goodput.
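The granularity argument can be illustrated with the standard RTO estimator (RFC 6298), in which the retransmission timeout is floored by both RTOmin and a clock-granularity term, RTO = max(RTOmin, SRTT + max(G, 4·RTTVAR)). The numbers below are illustrative data-center values, not measurements from the cited papers.

```python
# Why a 200 us RTOmin needs a ~100 us clock: with a coarse clock the
# granularity term max(G, 4*rttvar) dominates and the timer never gets
# anywhere near the round-trip time.

def rto(srtt, rttvar, clock_granularity, rto_min):
    """Standard RTO computation per RFC 6298 (values in seconds)."""
    return max(rto_min, srtt + max(clock_granularity, 4 * rttvar))

# Assumed data-center RTT statistics: SRTT = 100 us, RTTVAR = 25 us.
coarse = rto(srtt=100e-6, rttvar=25e-6, clock_granularity=1e-3, rto_min=200e-6)
fine = rto(srtt=100e-6, rttvar=25e-6, clock_granularity=100e-6, rto_min=200e-6)
```

With a 1 ms clock the timer is stuck above 1 ms; with a 100 μs clock it can actually reach the 200 μs floor.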
3.2.1.2. Disabling delayed ACKs. The TCP delayed ACK mechanism attempts to reduce the amount of ACK traffic by having the receiver acknowledge only every other packet. If a single packet is received with none following, the receiver waits up to the delayed ACK timeout before sending an ACK. The default minimum delayed ACK timeout in Linux is 40 ms. In a smaller-RTO environment, such as a data center where the RTO is below 40 ms, the sender may incorrectly assume that a loss has occurred before the delayed ACK arrives. The practical experiment results in [18]
show that the goodput with delayed ACKs disabled is slightly higher than with delayed ACKs enabled when the number of servers exceeds 8.
3.2.1.3. Removing binary exponential backoff. Reference [19] points out that removing TCP's binary exponential backoff (BEB) can benefit throughput. Under severe congestion, more than 50% of timeouts can invoke BEB and stall the server; the link is underutilized while waiting for the stalled server to recover, which drastically drops the throughput. Moreover, the data center network differs from classic Ethernet: packets are transferred by store-and-forward rather than broadcast, and applications perform synchronized reads, which means the nodes (client or server) have only limited packets to send. Thus, BEB is not suitable in the data center.

Removing BEB can mitigate the Incast problem, resulting in smaller dispersion of the response times for all servers to return their portions of data to the client [19]. The simulation results in [19] show that it does not advance the onset of Incast collapse; instead, throughput can even benefit when using larger SRU sizes and lower RTOmin.
However, Ref. [1] proposes similar solutions, a smaller multiplier and a randomized multiplier for the RTO exponential backoff, and, in contrast, finds that both are unhelpful in preventing TCP Incast, because only a small number of exponential backoffs occur over the entire transfer.

Although Ref. [1] finds that altering the exponential backoff behavior has little impact on mitigating Incast, it did not describe its NS simulation environment in enough detail to make the result convincing. In [19], removing BEB works well under severe congestion (more than 256 servers) but has little effect under middle and slight congestion. Therefore, how often severe congestion occurs over the entire transfer needs further measurement.
3.2.2. Enhanced Transmission Control Protocols. In recent years, there has been much research on TCP congestion control in different environments, especially wireless networks [20, 21] and fast long-distance networks [22]. There are also several TCP variants designed for data center networks, among which two noted enhanced TCP protocols are DCTCP (Data Center TCP) [7] and ICTCP (Incast Congestion Control for TCP) [8].
3.2.2.1. Data Center TCP. The goal of DCTCP is to achieve high burst tolerance, low latency, and high throughput with commodity shallow-buffered switches in common data center networks. To this end, DCTCP is designed to operate with small queue occupancies without loss of throughput, using Explicit Congestion Notification (ECN) in the network to provide explicit feedback to the end hosts.

The DCTCP algorithm has three main components:

(1) Simple marking at the switch side. There is a single parameter in the switch, the marking threshold K. An arriving packet is marked with the CE codepoint if the queue occupancy is greater than K upon its arrival; otherwise, it is not marked.
(2) ECN-Echo at the receiver side. The receiver ACKs every packet, setting the ECN-Echo flag if and only if the packet has the CE codepoint marked.
(3) Controller at the sender side. The sender maintains an estimate of the fraction of packets that are marked, called α, which is updated once for every window of data (roughly one RTT) as follows:

    α ← (1 − g)·α + g·F,    (8)

where F is the fraction of packets that were marked in the last window of data, and 0 < g < 1 is the weight given to new samples against the past in the estimation of α. An α close to 0 indicates low congestion; an α close to 1 indicates a high level of congestion. The reaction of the DCTCP sender on receiving an ACK with the ECN-Echo flag set is as follows:

    cwnd ← cwnd·(1 − α/2).    (9)
Thus, when α is near 0 (low congestion), the window is only slightly reduced; when congestion is high (α = 1), DCTCP cuts the window in half, just like TCP.
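As a rough sketch, the sender-side rules of Equations (8) and (9) can be written as follows. The weight g = 1/16 is an assumed value for illustration, and the per-window counting of ECN-Echo ACKs is replaced by passing the marked fraction F in directly.

```python
# Minimal sketch of the DCTCP sender update (Eqs. 8-9). In a real stack,
# marked_fraction would be computed by counting ECN-Echo ACKs over one
# window of data (roughly one RTT).

G = 1.0 / 16  # EWMA weight g, assumed value

def update_alpha(alpha, marked_fraction):
    """Eq. (8): running estimate of the fraction of marked packets."""
    return (1 - G) * alpha + G * marked_fraction

def react_to_ecn(cwnd, alpha):
    """Eq. (9): shrink cwnd in proportion to the congestion level alpha."""
    return cwnd * (1 - alpha / 2)
```

With alpha = 0 the window is untouched; with alpha = 1 (every packet marked) the reduction degenerates to TCP's multiplicative halving.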
The authors of [23] developed a fluid model to provide a mathematical analysis of the throughput and delay performance of DCTCP. Their results show that DCTCP's throughput remains above 94% even as the threshold K goes to zero, much higher than the limiting throughput of 75% for a TCP source as the buffer size goes to zero. They also point out that DCTCP's convergence is no more than 1.4 times slower than TCP's, and that in NS2 simulations the RTT-fairness of DCTCP is better than TCP with drop-tail but worse than TCP with RED. Because their analysis covers only the single-bottleneck case, the authors note that understanding the behavior of DCTCP in general networks remains necessary future work.

Although DCTCP performs well in the authors' practical measurements, another study of data center congestion [24] found that finding a switch that supports ECN proved surprisingly difficult, so the hardware needed to deploy DCTCP in common data centers appears to be lacking.
3.2.2.2. Incast Congestion Control for TCP. Different from the previous approaches, which reduce the impact of Incast congestion with a fine-grained timeout value, ICTCP works at the receiver side, adjusting the TCP receive window proactively before packet losses occur.

The ICTCP algorithm has two main components:

(1) Control trigger by evaluating available bandwidth. Assume the link capacity of the interface on the receiver is C, and define the bandwidth of the total incoming traffic observed on that interface as BW_T. The available bandwidth BW_A on that interface is then defined as

    BW_A = max(0, α·C − BW_T),    (10)

where α ∈ [0, 1] is a parameter to absorb potential oversubscribed bandwidth during window adjustment.

(2) Window adjustment on a single connection. The expected throughput of connection i is obtained as

    b_i^e = max(b_i^m, rwnd_i / RTT_i)    (11)
    d_i^b = (b_i^e − b_i^m) / b_i^e,    (12)

where b_i^m is the measured throughput, b_i^e is the expected throughput, and rwnd_i and RTT_i are the receive window and round-trip time of connection i, respectively. Thus d_i^b is the ratio of the difference between the expected and measured throughput to the expected throughput for connection i. The window adjustment is shown in Table I.
The practical experiments in [8] demonstrate that ICTCP is effective at avoiding congestion, achieving almost zero timeouts for TCP Incast while providing high performance and fairness among competing flows.

The authors chose a different approach to implementing ICTCP in the TCP stack, developing it as a Network Driver Interface Specification (NDIS) driver on Windows. This approach can directly
Table I. The ICTCP window adjustment [8].

d_i^b ≤ γ_1:        Increase the receive window if there is enough quota of
                    available bandwidth on the network interface; decrease the
                    quota correspondingly if the receive window is increased.
d_i^b > γ_2:        Decrease the receive window by one MSS if this condition
                    holds for three continuous RTTs. The minimal receive
                    window is 2·MSS.
γ_1 < d_i^b ≤ γ_2:  Keep the current receive window.
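The receiver-side logic can be sketched as follows. The threshold values gamma1 = 0.1 and gamma2 = 0.5 and the oversubscription factor are assumptions taken as illustrative defaults, and the per-RTT measurement machinery of the real driver is omitted.

```python
# Minimal sketch of the ICTCP receiver-side logic (Eqs. 10-12 and Table I).
# Threshold and alpha values are illustrative assumptions.

ALPHA = 0.9               # oversubscription factor in Eq. (10), assumed
GAMMA1, GAMMA2 = 0.1, 0.5 # window-adjustment thresholds, assumed

def available_bw(link_capacity, total_incoming_bw):
    """Eq. (10): headroom left on the receiver's interface."""
    return max(0.0, ALPHA * link_capacity - total_incoming_bw)

def throughput_ratio(measured_bw, rwnd, rtt):
    """Eqs. (11)-(12): gap between expected and measured throughput."""
    expected = max(measured_bw, rwnd / rtt)
    return (expected - measured_bw) / expected

def adjust_rwnd(rwnd, d_b, mss, quota, over_for_3_rtts):
    """Table I: grow rwnd when measured ~ expected, shrink when far below."""
    if d_b <= GAMMA1 and quota >= mss:
        return rwnd + mss, quota - mss          # increase, spend quota
    if d_b > GAMMA2 and over_for_3_rtts:
        return max(rwnd - mss, 2 * mss), quota  # decrease, floor at 2*MSS
    return rwnd, quota                          # keep the current window
```

Keeping all the decisions at the receiver is what lets ICTCP throttle every Incast flow before the switch queue overflows, without any switch or sender changes.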
support virtual machines, which are prevalent in data centers, but it leads to poor portability to other operating systems; this limitation also makes the experiments harder to reproduce.
3.3. Application layer
At the application layer, the main idea is to reduce packet loss. A common approach is to restrict the number of participating servers in a synchronized data transfer. Specifically, the following ways have been proposed.
3.3.1. Increasing the server request unit size. The probability of Incast events increases with the number of servers engaged in a synchronized data transfer to any client. To counter this, the clients should request more data from fewer servers. The NS2 simulation results in [3] illustrate that increasing the SRU size can improve overall goodput. For a fixed data block size, an increased SRU size leads to a data block being striped across fewer servers, so clients can request the same amount of data from a smaller number of servers and avoid Incast. Moreover, with a larger SRU size, servers use the spare link capacity made available by any stalled flow waiting for a timeout event.

However, striping data across fewer servers is contrary to the design of cluster-based storage systems, where data are stored across many storage servers to improve both reliability and performance. The storage system is also required to allocate pinned space in the client kernel memory to handle larger SRUs, which increases memory pressure and may lead to client kernel failures in file system implementations.
3.3.2. Staggering data transfers. Staggering data transfers is another way to limit the number of synchronously communicating servers; the staggering effect can be produced either by the clients or at the servers [5].

A client can stagger a data transfer by requesting only a subset of the total block fragments at a time, or by requesting from a subset of servers, so that only a limited number of servers respond synchronously.

Servers can randomly or deterministically delay their responses to a request. A server can wait an explicitly pre-assigned time or a random period before beginning its data transfer, thus limiting the number of servers participating in a parallel transfer. Servers can also respond to other requests and prefetch data into their caches during the delay.

Although staggering data transfers is theoretically an ideal way to prevent Incast, there are so far no sufficient experimental data to prove its efficiency; its performance needs to be tested and verified in further research and experiments.
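A minimal sketch of the client-side variant: the client asks at most a fixed number of servers at a time, so the switch queue only ever absorbs that many synchronized responses. The fetch callback and window size are hypothetical, and a real client would issue each batch asynchronously rather than serially.

```python
# Hypothetical sketch of client-side staggering: request from at most
# `window` servers at a time instead of all n servers at once.

def staggered_fetch(servers, fetch, window=4):
    """fetch(server) -> data; at most `window` servers answer per round."""
    results = {}
    for i in range(0, len(servers), window):
        batch = servers[i:i + window]   # only this batch responds in sync
        for s in batch:
            results[s] = fetch(s)
    return results
```

The trade-off is latency: with n servers and a window of k, the block now completes in ceil(n/k) rounds instead of one.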
3.3.3. Global scheduling of data transfers. Global scheduling of data transfers is required to handle the situation in which a client is simultaneously running several workloads and making multiple requests to different subsets of servers. Global scheduling can be designed to restrict the total number of servers simultaneously responding to any client, thus avoiding Incast. One instantiation of this idea is to use SRU tokens: a server cannot transfer data to a client unless it holds that client's SRU token, and while a server waits for the right token, it can prefetch data into its cache. Reference [5] gives this idea but no experiments.

The storage system may learn that, to avoid Incast, it is safe to send data to a given client from only k servers; the system then creates k SRU tokens for each client in a global token pool. Each client can send requests to all servers containing the data it needs, but only the servers that have been allocated that client's tokens can transfer the data. This restriction achieves the goal that only k servers simultaneously send data to any given client.
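The token-pool idea can be sketched as follows. Since [5] describes the idea without an implementation, all names and structure here are hypothetical.

```python
# Hypothetical sketch of the SRU-token pool from [5]: at most k servers
# may stream to any one client at a time.

from collections import defaultdict

class TokenPool:
    def __init__(self, k):
        self.k = k                       # tokens (concurrent senders) per client
        self.in_use = defaultdict(int)   # client -> tokens currently held

    def acquire(self, client):
        """A server calls this before transferring data to `client`."""
        if self.in_use[client] < self.k:
            self.in_use[client] += 1
            return True
        return False                     # no token: prefetch into cache, retry later

    def release(self, client):
        """Called when a server finishes its SRU transfer to `client`."""
        if self.in_use[client] > 0:
            self.in_use[client] -= 1
```

A server that fails to acquire a token simply defers its transfer (ideally prefetching the SRU into its cache meanwhile), which is exactly the restriction that caps the fan-in at k.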
However, it is not easy to find the optimal value of k. The storage system might obtain k either through manual configuration or through real-time system measurements. The former may require many experiments; furthermore, when the environment changes, a new configuration
TCP INCAST IN DATA CENTER NETWORKS 9
need to be carried out manually again; for the latter, an efcient measurement model is required to
ensure that the storage system can obtain the real-time and optimal value of k.
3.4. Other proposals
In addition to the proposals above at each layer, there are also proposals that consider the physical layer, the transfer pattern, or the data handling method.
3.4.1. Larger switch buffers. To address the primary cause of Incast timeouts, the root cause, packet losses, should be mitigated by increasing the buffer allocated at the Ethernet switch. With a large enough buffer, Incast can be avoided for a limited number of servers. The simulation results in [3] show that doubling the switch's output buffer doubles the number of servers that can be supported before the system experiences Incast.

Unfortunately, as the number of servers increases, a switch with an ever larger buffer is needed, and that costs much more (the Force10 E1200 switch [23], which has very large buffers, costs over $500,000). Thus, the system designer faces a difficult choice between cost and over-provisioning.
3.4.2. Probabilistic retransmission. Another practical technique for TCP Incast is to reduce the time spent detecting congestion instead of avoiding timeouts. These techniques rely on probabilistic retransmissions, kernel threads, and duplicate ACKs.

The algorithm in [25] is as follows. First, a kernel thread retransmits the highest unacknowledged segment in the sender's transmission window with a probability P; the retransmission is marked in one of six reserved bits in the segment header. Second, depending on whether it received the original segment and the retransmitted segment, the receiver returns a normal ACK followed by dupackthresh (the duplicate ACK threshold) duplicate ACKs as congestion feedback to the sender. Third, the sender automatically enters Fast Retransmit without waiting for a retransmission timeout when it receives dupackthresh duplicate ACKs in a row.
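These three steps can be sketched as follows. The function names and the value of P are illustrative (choosing P is exactly the open issue in [25]), and the real mechanism lives in the kernel's TCP stack rather than in application code.

```python
# Illustrative sketch of the probabilistic-retransmission idea in [25]:
# a kernel thread occasionally re-sends the highest unACKed segment; if
# the receiver has a hole, its burst of duplicate ACKs triggers Fast
# Retransmit long before the RTO would fire.

import random

P = 0.05            # probe probability; too low = no benefit, too high = extra load
DUPACK_THRESH = 3   # standard Fast Retransmit duplicate-ACK threshold

def maybe_probe(highest_unacked_seg, send):
    """Kernel-thread step: probabilistically re-send the probe segment,
    marked via a reserved header bit so the receiver can recognize it."""
    if random.random() < P:
        send(highest_unacked_seg, probe=True)

def receiver_feedback(got_original, send_ack):
    """On receiving a probe: a normal ACK, then dup-ACKs if data is missing."""
    send_ack()                        # cumulative ACK for in-order data
    if not got_original:              # a hole exists -> signal congestion
        for _ in range(DUPACK_THRESH):
            send_ack()                # dup-ACKs push the sender into Fast Retransmit
```

The effect is that loss detection takes roughly one RTT of duplicate ACKs instead of a full 200 ms RTO.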
The simulation results in [25] show that the above algorithm performs well, showing advantages over both the default TCP (RTO = 200 ms) and even the modified TCP (RTO = 200 μs). However, how to choose P remains an open question: if P is set too low, the technique provides no significant benefit; if it is set too high, it causes unnecessary retransmissions, which contribute further to the congestion at the switch [25]. In addition, the paper does not report performance for more than 64 servers, which also needs to be considered.
3.4.3. Changing the file striping pattern. In addition to excessive servers participating in a synchronized data transfer, disk head contention among clients caused by data accesses to popular files on I/O servers can also lead to throughput collapse. The authors of [26] proposed a new file striping strategy called storage server grouping (SSG), which changes the file striping pattern across the storage servers, based on an analysis of file popularity and of the impact of the number of storage servers on the client-perceived performance (I/O speedup), to reduce the negative effects of Incast.

SSG is a framework that automatically changes file striping parameters, such as the striping unit size, the striping factor (the number of storage servers used to store a file), and the striping index (the first storage server of the stripe), in an online manner. It uses the proposed I/O speedup model to find the optimal number of storage servers before a file is striped across them. The I/O speedup model is trained using a machine learning technique [27] to correlate the number of storage servers with the I/O performance of a workload. SSG keeps track of file popularity and intelligently separates files into different server groups by setting the striping index, reducing data access interference within each group. SSG also periodically tunes the file striping parameters based on the I/O workload characteristics profiled online. The authors implemented the SSG scheme on top of a parallel file system,
called Lustre [28]. Their experimental results show that SSG can improve system-wide I/O
throughput by up to 38.6%, and by 22.1% on average.
However, there might be some adaptability and usability problems with the SSG scheme. It is
effective if the popularity of files rarely changes. Unfortunately, the cost may be high if the
popularity of files changes substantially and/or frequently, because every time this situation
arises the striping pattern must be reconfigured accordingly.

Table II. Comparative analysis of various proposals for TCP Incast.

Link layer:
- Congestion control [6, 10, 13-15]. Efficiency: QCN can effectively control the link rate, but performs poorly in Incast [13]. Cost: high; special switches are needed.
- Flow control [3, 16, 17]. Efficiency: EFC is effective in simple configurations, but does not work well with multiple switches. Cost: high; a lossless Ethernet must be created, and implementing new standards at switches takes time and money.

Transport layer:
- Modification of TCP parameters [1, 18, 19]. Efficiency: mitigates the Incast problem slightly. Cost: low; only TCP parameters are modified.
- Enhanced TCP protocols [7, 8, 24]. Efficiency: DCTCP performs much better than TCP in Incast when there are fewer than 35 concurrent senders [7]; ICTCP effectively avoids congestion, achieving almost zero timeouts for TCP Incast [8]. Cost: high; DCTCP needs switches supporting ECN plus changes to the TCP source code, and ICTCP needs changes to the TCP source code.

Application layer:
- Increasing the SRU size [3]. Efficiency: the overall goodput can be improved. Cost: may cause memory pressure, latency, and fairness issues; further experiments and research are needed.
- Staggering data transfers [5]. Efficiency: can theoretically avoid Incast, but lacks experimental or practical data to prove its efficiency. Cost: further experiments and research are needed.
- Global scheduling of data [5]. Efficiency: can effectively ensure throughput if the optimal value k can be obtained in real time. Cost: high if configured manually; low if a proper measurement model is applied.

Others:
- Larger switch buffers [3]. Efficiency: Incast can be avoided for a limited number of servers. Cost: high; the larger the switch buffer, the higher the cost.
- Probabilistic retransmission [25]. Efficiency: effective in avoiding Incast with a proper probability P. Cost: low; a kernel thread with three algorithms must be implemented.
- Changing the file striping pattern [26-28]. Efficiency: SSG can improve system-wide I/O throughput by up to 38.6%, and by 22.1% on average. Cost: high if the popularity of files changes substantially and/or frequently; low if it rarely changes.
4. COMPARISON
In the sections above, the main proposals for solving the TCP Incast issue in data center
networks have been described and analyzed. Although there appear to be many solutions, their
implementation efficiency and cost vary considerably. Table II gives a comparison of these
proposals.
5. CONCLUSION
In this paper, the methods proposed to solve the TCP Incast issue in data center networks have
been summarized. Following the two approaches, reducing packet loss and recovering quickly, the
existing proposals have been described and analyzed layer by layer in the TCP/IP stack. Although
many solutions have been proposed so far, there is no perfect one yet; many proposals are still
immature and need further research and experimentation. Besides these proposals, designing a
novel network architecture suited to data centers to improve their performance is also a hot
research topic [29-31].
ACKNOWLEDGEMENTS
This work was supported in part by the Knowledge Innovation Program of the Chinese Academy of
Sciences under Grant No. CNIC_QN_1203. The authors thank the editors and reviewers for their
earnest and helpful comments and suggestions.
REFERENCES
1. Chen Y, Griffith R, Liu J, Katz RH, Joseph AD. Understanding TCP Incast throughput collapse in datacenter networks. In Proceedings of the 1st ACM Workshop on Research on Enterprise Networking, NY, USA, 2009; 73-82.
2. Wu W, Crawford M. Potential performance bottleneck in Linux TCP. International Journal of Communication Systems November 2007; 20(11):1263-1283.
3. Phanishayee A, Krevat E, Vasudevan V, Andersen DG, Ganger GR, Gibson GA, Seshan S. Measurement and analysis of TCP throughput collapse in cluster-based storage systems. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08), San Jose, CA, February 2008; 26-29.
4. Ding Z, Guo D, Liu X, Luo X, Chen G. A MapReduce-supported network structure for data centers. Concurrency and Computation: Practice and Experience 2011. DOI: 10.1002/cpe.1791.
5. Krevat E, Vasudevan V, Phanishayee A, Andersen DG, Ganger GR, Gibson GA, Seshan S. On application-level approaches to avoiding TCP throughput collapse in cluster-based storage systems. In Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07), NY, USA, November 2007; 1-4.
6. Alizadeh M, Atikoglu B, Kabbani A, Lakshmikantha A, Pan R, Prabhakar B, Seaman M. Data center transport mechanisms: congestion control theory and IEEE standardization. In Proceedings of the 46th Annual Allerton Conference, Illinois, USA, September 2008; 1270-1277.
7. Alizadeh M, Greenberg A, Maltz D, Padhye J, Patel P, Prabhakar B, Sengupta S, Sridharan M. Data Center TCP (DCTCP). In Proceedings of ACM SIGCOMM, NY, USA, September 2010.
8. Wu H, Feng Z, Guo C, Zhang Y. ICTCP: Incast congestion control for TCP in data center networks. In Proceedings of ACM CoNEXT, NY, USA, 2010.
9. Valarmathi K, Malmurugan N. Distributed multichannel assignment with congestion control in wireless mesh networks. International Journal of Communication Systems 2011; 24:1584-1594. DOI: 10.1002/dac.1234.
10. IEEE 802.1Qau - Congestion Notification. (Available from: http://www.ieee802.org/1/pages/802.1au.html). [Last accessed: Oct. 13, 2011].
11. Zhang C, Mamatas L, Tsaoussidis V. A study of deploying smooth- and responsive-TCPs with different queue management schemes. International Journal of Communication Systems May 2009; 22(5):513-530.
12. IEEE 802.1Qbb - Priority-based Flow Control. (Available from: http://www.ieee802.org/1/pages/802.1bb.html). [Last accessed: Oct. 16, 2011].
13. Devkota P, Reddy ALN. Performance of quantized congestion notification in TCP Incast scenarios of data centers. In Proceedings of the 2010 18th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Miami Beach, FL, 2010.
14. Kabbani A, Alizadeh M, Yasuda M, Pan R, Prabhakar B. AF-QCN: approximate fairness with quantized congestion notification for multi-tenanted data centers. In Proceedings of the 18th IEEE Annual Symposium on High Performance Interconnects (HOTI), Mountain View, CA, August 2010.
15. Zhang Y, Ansari N. On mitigating TCP Incast in data center networks. In Proceedings of INFOCOM, Shanghai, China, 2011.
16. Wadekar M. Enhanced Ethernet for data center: reliable, channelized and robust. In Proceedings of the 15th IEEE Workshop on Local and Metropolitan Area Networks, NY, USA, June 2007; 65-71.
17. CISCO white paper. Priority Flow Control: Build Reliable Layer 2 Infrastructure, June 2009.
18. Vasudevan V, Phanishayee A, Shah H, Krevat E, Andersen DG, Ganger GR, Gibson GA, Mueller B. Safe and effective fine-grained TCP retransmissions for datacenter communication. In Proceedings of ACM SIGCOMM '09, Barcelona, Spain, August 2009.
19. Zheng H, Chen C, Chunming C, Qiao AC. Understanding the impact of removing TCP binary exponential backoff in data centers. In Proceedings of the Third International Conference on Communications and Mobile Computing, Qingdao, China, April 2011; 174-177.
20. Ko E, An D, Yeom I, Yoon H. Congestion control for sudden bandwidth changes in TCP. International Journal of Communication Systems 2011. DOI: 10.1002/dac.1322.
21. Hou T-C, Hsu C-W, Wu C-S. A delay-based transport layer mechanism for fair TCP throughput over 802.11 multihop wireless mesh networks. International Journal of Communication Systems 2011; 24:1015-1032. DOI: 10.1002/dac.1207.
22. Ren YM, Tang HN, Li J, Qian HL. Transport protocols for fast long distance networks. Journal of Software 2010; 21(7):1576-1588.
23. Force10 E1200 switch. (Available from: http://www.nasi.com/force10_e1200.php).
24. Stewart R, Tuxen M, Neville-Neil GV. An investigation into data center congestion with ECN. In Proceedings of the 2011 Technical BSD Conference (BSDCan 2011), Ottawa, CA, May 2011.
25. Kulkarni S, Agrawal P. A probabilistic approach to address TCP Incast in data center networks. In Proceedings of the ICDCS Workshops, MN, USA, June 2011; 26-33.
26. Zhang X, Liu G, Jiang S. Improve throughput of storage cluster interconnected with a TCP/IP network using intelligent server grouping. In Proceedings of the 2010 IFIP International Conference on Network and Parallel Computing, September 2010.
27. Mesnier MP, Wachs M, Sambasivan RR, Zheng AX, Ganger GR. Modeling the relative fitness of storage. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, NY, USA, June 2007.
28. Sun Microsystems, Inc. Lustre: a scalable, high performance file system. (Available from: http://www.lustre.org, 2009).
29. Guo C, Wu H, Tan K, Shi L, Zhang Y, Lu S. DCell: a scalable and fault-tolerant network structure for data centers. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, NY, USA, 2008; 75-86.
30. Guo C, Lu G, Li D, Wu H, Zhang X, Shi Y, Tian C, Zhang Y, Lu S. BCube: a high performance, server-centric network architecture for modular data centers. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, NY, USA, 2009; 63-74.
31. Greenberg A, Jain N, Kandula S, Kim C, Lahiri P, Maltz D, Patel P, Sengupta S. VL2: a scalable and flexible data center network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, NY, USA, October 2009.