Professional Documents
Culture Documents
Abstract— We study the fundamental trade-off between stor- efficient codes that also minimize the repair bandwidth for
age and content download time. We show that the download exact data reconstruction was established in [5].
time can be significantly reduced by dividing the content into Content accessibility is another main property of interest.
arXiv:1210.3012v1 [cs.IT] 9 Oct 2012
III. R EDUCING D ELAY BY C ODING Note that E[Xk,n ] decreases as k decreases for a given
n. This fact will provide us some intuition to understand
In both of our models, the download completion time is a the analysis of download time for the fountain and queueing
random variable. One natural way to reduce this time is to models presented in Section IV and Section V respectively.
replicate the content across n independent servers (or disks).
Then if the user issues n requests, one to each of the n IV. M ULTIPLE F OUNTAINS
servers, it only needs to wait for the one of the requests to In this section we investigate the redundancy storage in the
be served. This strategy gives a sharp reduction in download context of fountain model. Content requests ( e.g., request for
time, but at the cost of n times more storage space and the videos, news or other information) from customers are sent
cost of processing multiple requests. to a network of servers. We focus in particular on multiple
We thus argue that it is more economical to divide the fountain content retrieval systems defined as follows.
content into k blocks, encode them into n > k coded blocks, Definition 1 ((n, k) multiple fountain): An (n, k) multi-
and store them on n different servers (one block per server). ple fountain content retrieval system contains n servers.
√
−Dµ+ D 2 µ2 +4nDµ
Every content request entering the system is forked to n which has root k ∗ = 2 as in (6).
servers. Requests are served as soon as the content becomes Fig. 1 shows the mean response time T10,k versus k for the
available, which happens at a random time independently of (10, k) multiple fountain system with parameters delivery
request arrivals. A request is satisfied when any k out of n time D = 5 and various values of mean waiting time 1/µ.
servers have responded and delivered their messages. We observe that the optimal value of k ∗ increases with 1/µ.
Recall that the content is divided into k blocks and en-
coded into n > k coded blocks which are stored on n servers
(one block per server). Content is said to be downloaded 14
when any k out of n blocks are successfully delivered to the 1/u = 0
user. The fountain model described in Section II-A assumes 12 1/u = 2
that multiple users can access the server simultaneously (i.e, 1/u = 4
1.0
We first address the scenario where the number of disks
disks n is kept constant, but the storage expansion changes
from 1 to n as we choose k from n to 1. We then study
0.8
fraction of completed downloads
the scenario where the storage expansion factor n/k is kept
constant, but the number of disks varies.
1) Flexible Storage Expansion & Fixed Number of Disks:
0.6
In Fig. 3 we plot the mean response time T(n,k) versus k
for fixed number of disks n = 10, arrival rate λ = 1 request
per second and service rate µ = 3 units of data per second.
0.4
Each disk stores 1/k units of data and thus the service rate
of each individual queue is µ0 = kµ. The simulation plot
0.2
single disk
k=1
0.11 k=2
k=5
0.0
Tn,k
0.1
Upper bound
0.0 0.2 0.4 0.6 0.8 1.0
0.09 Lower bound
Mean Response Time
response time
0.08
0.07
← unit storage requirement per file
0.06 n/k = 1
0.05 n/k = 2
0.04 n/k = 10
0.03
2 4 6 8 10 Fig. 4. CDFs of the response time of (10, k) fork-join systems, and the
k required storage
Fig. 3. Mean response time T(n,k) increases with k for fixed n because
the redundancy of coding reduces. The plot also demonstrates the tightness 2) Flexible Number of Disks & Fixed Storage Expansion
of the bounds derived in Section V-B
: Now we take a different viewpoint and analyze the benefit
shows that as k increases with n fixed, the code rate k/n of spreading the content across more disks while using the
increases thus reducing the amount of redundancy. Hence, same total storage space. Fig. 5 shows a simulation plot
T(n,k) increases with k. We also observe that the bounds (7) of the mean response time T(n,k) versus k while keeping
and (10) derived in Section V-B are very tight. constant code rate k/n = 1/2. The response time T(n,k)
In addition to low mean response time, ensuring quality- reduces with increase in k because we get the diversity
of-service to the user may also require that the probability of advantage of having more disks. With a very large n, as
exceeding some maximum tolerable response time is small. k increases, the theoretical bounds (7) and (10) suggest that
Thus, we study the CDF of the response time for different T(n,k) approaches zero. This is because we assumed that
values of k for a fixed n. service rate of a single disk µ0 = kµ since the 1/k units of
In Fig. 4 we plot the CDF of the response time with k = the content F is stored on one disk. However, in practice the
1, 2, 5, 10 for fixed n = 10. The arrival rate and service rate mean service time 1/µ0 will not go zero because reading each
are λ = 1 and µ = 3 as defined earlier. For k = 1, the PDF is disk will need some non-zero processing delay in completing
represents the minimum of n exponential random variables, each task irrespective of the amount of data stored on it.
which is also exponentially distributed. In order to understand the response time better, we plot
The CDF plot can be used to design a storage system its CDF in Fig. 6 for different values of k for fixed ratio
that gives probabilistic bounds on the response time. For k/n = 1/2. Again we observe that the diversity of increasing
example, if we wish to keep the response time below 0.1 number of disks n helps reduce the response time.
seconds with probability at least 0.75, then the CDF plot
VI. C ONCLUSION AND F UTURE W ORK
shows that k = 5, 10 satisfy this requirement but k = 1
does not. The plot also shows that at 0.4 seconds, 100% of We analyzed the download time of a content file from a
requests are complete in all fork-join systems, but only 50% distributed storage system. We assume that content of interest
are complete in the single-disk case is available redundantly on multiple disks, or on multiple
0.2
Tn, n/2
0.18
1.0
Upper bound
0.16 Lower bound
Mean Response Time
0.14
0.8
fraction of completed downloads
0.12
0.6
0.1
0.4
k=2
0.06
k=5
0.04
2 4 6 8 10
0.2
n
0.0
0.0 0.5 1.0 1.5 2.0