Abstract—Load imbalance is a major source of overhead in parallel programs such as MapReduce. Due to the uneven distribution of
input data, tasks with more data become stragglers and delay the overall job completion. Running Hadoop in a private cloud opens up
opportunities for expediting stragglers with more resources but also introduces problems that often outweigh the performance gain:
(1) performance interference from co-running jobs may create new stragglers; (2) there exists a semantic gap between the Hadoop
task management and resource pool-based virtual cluster management preventing tasks from using resources efficiently. In this paper,
we strive to make Hadoop more resilient to data skew and more efficient in cloud environments. We present FlexSlot, a user-
transparent task slot management scheme that automatically identifies map stragglers and resizes their slots accordingly to accelerate
task execution. FlexSlot adaptively changes the number of slots on each virtual node to balance the resource usage so that the pool of
resources can be efficiently utilized. FlexSlot further improves mitigation of data skew with an adaptive speculative execution strategy.
Experimental results show that FlexSlot effectively reduces job completion time by up to 47.2 percent compared to stock Hadoop and two
recently proposed skew mitigation and speculative execution approaches.
Index Terms—MapReduce, cloud, data skew, stragglers, task slot management, speculative execution
1 INTRODUCTION
Interferences from co-located users may create new stragglers in Hadoop applications [12], [24]. Second, to reduce the resources required to run a virtual cluster, private clouds often multiplex a pool of shared resources among virtual nodes [32]. Virtual cluster management such as the VMware Distributed Resource Scheduler (DRS) [16] and OpenStack Utilization-based Scheduling [4] dynamically allocates resources to individual nodes according to their estimated demands. It eliminates the need to proactively allocate resources for slave nodes. Surprisingly, we find that there exists a semantic gap between the demand-based resource allocation and the actual needs of Hadoop tasks. The virtual nodes that receive disproportionately more resources due to high demands actually run fast tasks, leaving nodes with stragglers deprived of resources. Our study exposes that running unmodified Hadoop in such a cloud does not mitigate skew but aggravates the imbalance.

In this paper, we explore the possibility of using elastic resource allocation in the cloud to mitigate skew in Hadoop applications. We design FlexSlot, an effective yet simple extension to the slot-based Hadoop task scheduling framework. FlexSlot automatically identifies map stragglers and resizes their slots. Slot resizing not only allows stragglers to receive more resources but also alleviates the interference between co-running jobs. To expose the actual demands of tasks, FlexSlot co-locates stragglers with fast tasks on a node by dynamically changing its number of slots. As fast tasks finish quickly and the node is allocated a large amount of resources, stragglers in fact receive more resources and their executions are thus accelerated. Furthermore, FlexSlot employs adaptive speculative execution. It incorporates the flexible slot management and starts speculative tasks for stragglers with optimized configurations. We note that FlexSlot focuses only on map tasks, because the data skew mitigation is done through on-demand resource allocation in the cloud, which is more effective for mitigating data skew in map tasks.

We implemented FlexSlot on a 32-node Hadoop virtual cluster and evaluated its benefits using the Purdue MapReduce Benchmark Suite (PUMA) [5] and the Intel HiBench benchmark suite [3] with datasets collected from real applications. We compared the performance of FlexSlot running different workloads with that of stock Hadoop and a recently proposed skew mitigation approach, SkewTune [26]. We also compared FlexSlot with Maximum Cost Performance (MCP) [10], a novel speculative execution strategy for Hadoop. Experimental results show that FlexSlot reduces job completion time by up to 47.2 percent compared to stock Hadoop, SkewTune and MCP. FlexSlot also achieves better cloud resource utilization with both the flexible slot size and the flexible number of slots.

A preliminary version of this paper appeared in the Proc. of IEEE/ACM SC'2014 [17]. In this significantly extended paper, we redesign and develop FlexSlot with adaptive speculative execution. We propose an adaptive speculative execution strategy and use it in slot memory resizing. We perform new experiments to evaluate the enhanced approach. We also extend the experiments with more comprehensive benchmarks on iterative jobs.

The rest of this paper is organized as follows. Related work is presented in Section 2. Section 3 introduces the background of Hadoop, discusses existing issues, and presents a motivating example. Section 4 elaborates FlexSlot architecture and key designs. Section 5 presents the implementation details of FlexSlot. Section 6 gives the testbed setup and experimental results. We conclude this paper with future work in Section 7.

2 RELATED WORK

YARN [36] is the second generation of Hadoop. It uses containers to replace task slots and provides a finer granularity of resource management. But it only allows different jobs to have different container sizes; the tasks in one job still have containers of one fixed size. Therefore, YARN cannot mitigate data skew with flexibly sized containers.

There are a few studies on straggler mitigation. SkewReduce [25] alleviates computational skew by balancing data distribution across nodes using a user-defined cost function. SkewTune [26] repartitions the data of stragglers to take advantage of idle slots freed by short tasks. However, moving repartitioned data to idle nodes requires extra I/O operations, which could aggravate the performance interference. Dolly [8] provides speculative execution at the job level, cloning small jobs that contain straggler tasks. But duplicating the entire job increases resource usage and I/O contention on intermediate data, which can significantly affect job performance. More importantly, these approaches do not coordinate Hadoop with the cloud infrastructure.

There is great research interest in improving Hadoop from different perspectives. A rich set of research focused on the performance and efficiency of Hadoop clusters. Jiang et al. [23] conducted a comprehensive performance study of Hadoop and summarized the factors that can significantly improve Hadoop performance. Verma et al. [36], [37] proposed cluster resource allocation approaches for Hadoop. They focused on improving cluster efficiency by minimizing resource allocations to jobs while maintaining the service level objectives. They estimated the execution time of a job based on its resource allocation and input dataset, and determined its minimum resource allocation. A Hadoop cluster in the cloud can be seen as a heterogeneous Hadoop cluster because of the dynamic resource allocation. A number of studies proposed different task scheduling algorithms to improve Hadoop performance in heterogeneous environments [7], [11], [15], [40]. DynMR [34] enables interleaved MapReduce execution that overlaps reduce tasks with map tasks. By opportunistically scheduling all tasks, DynMR significantly increases both the performance and efficiency of Hadoop. PIKACHU focuses on achieving optimal workload balance for Hadoop [15]. It presents guidelines for the trade-offs between the accuracy of workload balancing and the delay of workload adjustment. But these studies focus on hardware heterogeneity in physical-machine-based clusters. They are not designed for VM-based clusters, where the heterogeneity can be changed by dynamic resource allocation. FlexSlot is able to adapt to the change of heterogeneity with flexible slot size and number.

There are recent studies that focus on improving the performance of applications in the cloud by dynamic resource allocation [22], [32], [33]. For example, Bazaar is a cloud framework that predicts the resource demand of applications based on high-level performance goals [22]. It translates the performance goal of an application into
Authorized licensed use limited to: Seoul National University. Downloaded on July 25,2023 at 08:07:54 UTC from IEEE Xplore. Restrictions apply.
800 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 28, NO. 3, MARCH 2017
TABLE 1. Task Skewness of Wordcount on Three Clusters
Fig. 5. Task completion time and iowait time.
Fig. 6. Task completion time and the cputime.
The configuration of the speculative tasks is adjusted in order to meet the resource demands of the straggler tasks.

4.2 Identifying Stragglers
There are three causes of straggler tasks in a cloud environment: (1) uneven data distribution among tasks; (2) non-uniform data processing time; and (3) performance interference from co-running jobs. We measure two performance metrics during task execution. Hadoop provides a progress score to represent the fraction of work completed. It is computed as the ratio of the finished input size to the original input size. Progress rate [40], which is defined as the change in progress score in unit time, is a good online measure of task performance. If all tasks process input data at the same speed, the tasks with smaller progress rates are likely to have large input sizes. Another important metric is the input processing speed D. It counts how many bytes are processed in unit time. Ideally, all tasks should have the same input processing speed, given that the data distribution is uniform across tasks and it takes the same amount of time to process the same amount of data. If some tasks exhibit apparently slower processing speed, it is likely that their records are more expensive to process or they are experiencing interference.

Neither of the two metrics alone can reliably determine stragglers. Thus, we create a composite metric based on these two. We normalize both metrics against their maximum values among all tasks, both in the range of 0 to 1, and simply use their sum as a heuristic to identify stragglers. The progress rate can be expressed as D/S, where S is the input data size and D is the input processing speed. Therefore, we use P = (1 + 1/S) · D to measure task performance. We consider straggler tasks as outliers compared to regular tasks. For every measurement interval, we calculate the value of P for all tasks and perform a k-means clustering algorithm on the P values. The clustering algorithm divides all tasks into two clusters. The tasks in the cluster with smaller P values are considered significantly different from the rest of the tasks.

4.3 Determining Performance Bottlenecks
Having stragglers identified by the proposed heuristic, FlexSlot then determines how to accelerate the execution of stragglers. The major causes of stragglers are low disk I/O performance or insufficient CPU resources.

Disk I/O Bottleneck Due to Data Skew. Previous studies show that the number of input records can largely affect the task completion time even with the same amount of input data [36], [37], [41]. The expensive record is a major type of data skew in map tasks. A map task processes a collection of key-value pairs, each pair as one record. Due to data skew, some records may require more resources to process than others, leading to more processing time. These expensive records can simply contain large data that fills the output buffer quickly. When the output buffer is full, intermediate results need to be written to disk. Thus, expensive records cause more frequent disk I/O operations.

The inverted-index benchmark contains non-uniform records and incurs skewed task processing time. Fig. 5 shows the breakdown of task execution time for all tasks. Since expensive records incur excessive I/O operations, we are interested in how much I/O time (e.g., I/O wait) contributes to the total task completion time. Tasks are sorted in the ascending order of their completion time, with tasks on the right tail considered as stragglers. We find that I/O wait contributed most to the prolonged completion time in stragglers. We observed as much as 8 times more I/O wait time than that of regular tasks. Thus, a large portion of time spent waiting for I/O is a good indicator of stragglers with an I/O bottleneck. A large slot memory size, e.g., a large output buffer, can effectively reduce I/O operations.

CPU Starvation Due to Inaccurate Demand Estimation. As shown in Fig. 4, demand-based resource allocation incurs severe imbalance on task distribution, with some nodes heavily loaded and some nodes hosting only a few stragglers. The culprit is that nodes running faster tasks received more resources from the cloud resource management. It enters a loop in which nodes with more resources run tasks even faster. Stragglers usually incur more I/O operations and appear to be less busy in terms of CPU usage. Thus, such tasks may become victims of CPU starvation in demand-based resource allocation.

Fig. 6 shows the breakdown of task runtime of the benchmark wordcount. It plots each task's actual cputime, I/O wait and steal time, which is the time that a task's node is ready to run but fails to get scheduled by the hypervisor. In the figure, we can see that steal time contributed most to the straggler runtime. If given sufficient CPU, these stragglers can be effectively accelerated. Thus, a large portion of time being ready but not running is a good indicator of insufficient CPU resources. Next, we show that purposely coupling stragglers with faster tasks exposes the demands of stragglers to the cluster-level resource manager.

4.4 Resizing Slot Memory
Increasing the memory size of a slot can reduce the number of I/O operations and effectively expedites stragglers
caused by data skew. However, the memory that can be used by a slot is limited by the heap size of its JVM and the output buffer size of the task running on it. Although the JVM heap size can be dynamically changed through JVM memory ballooning, the output buffer size of a task cannot be altered without restarting the task. Thus, it is difficult to increase a slot's memory size during task execution. We rely on the straggler identification to determine stragglers in a timely manner. As such, we can afford to kill stragglers and restart them on larger slots.

Another question is how to efficiently determine the slot size for straggler tasks. We propose an algorithm that is automatically invoked on all straggler tasks at every heartbeat. It determines the slot memory size for a straggler task. The pseudo-code for this algorithm is shown in Algorithm 1. The goal of this algorithm is to increase the slot memory size for straggler tasks so that the performance of these tasks becomes close to that of regular tasks. For some straggler tasks, it may require multiple iterations to reach the desired slot memory size.

Algorithm 1. Flexible Slot Size
1: Variables: heartbeat interval T; data split size S
2:
3: The killcount of task k is initialized to 0.
4: /* Only apply to stragglers */
5: function ResizeSlotMem(k)
6:   get current input processing speed D
7:   if k.killcount == 0 or S/k.Dprev − S/D > T then
8:     kill k, free slot s
9:     s.size = s.size · (1 + α)
10:    launch k on slot s
11:    k.killcount = k.killcount + 1
12:  end if
13:  k.Dprev = D
14: end function

Once a straggler is identified, the function ResizeSlotMem is called to determine the appropriate memory size for the task. If the task has not gone through a memory resizing (i.e., first-time killing and killcount == 0), we kill it immediately and increase the memory size of the slot at a step of α (empirically set to 0.2). Otherwise, we test if an additional round of killing and memory resizing is needed. Suppose a task k's input data size is S. For each round (at heartbeat intervals), we keep increasing the memory size of the slot until the resulting decrease in the input processing time (i.e., S/k.Dprev − S/D) no longer outweighs the wasted processing time due to task killing (i.e., the last heartbeat interval T). Then, the slot resizing process is stopped.

4.5 Adjusting Slot Number
The key rationale behind adjusting the slot number on Hadoop nodes is to couple stragglers with fast tasks and expose stragglers' demands to the cloud management. Once stragglers are identified and their bottlenecks are determined as insufficient CPU resources, FlexSlot tries to move slots from nodes running fast tasks to nodes running stragglers. We design a greedy algorithm for automating the slot adjustment. The pseudo-code is shown in Algorithm 2.

Algorithm 2. Flexible Slot Number
1: Variables: list of nodes L; list of steal times ST on these nodes
2:
3: while true do
4:   calculate the average steal time m
5:   calculate the standard deviation of steal times s
6:   if s ≤ θ then
7:     break
8:   end if
9:   for sti in ST do
10:    Δsti = sti − m
11:  end for
12:  sort list ΔST in ascending order
13:  find node lmax that has the maximum Δst
14:  find node lmin that has the minimum Δst
15:  lmin.removeMaxMapSlots(1)
16:  lmax.addMaxMapSlots(1)
17: end while

The objective is to minimize the steal time (defined in Section 4.3) on the Hadoop cluster. To reduce the steal time on straggler nodes, which is caused by either CPU starvation or interference, we increase the slot number on these nodes in hopes that the additional slots will bring faster tasks and finally increase the nodes' CPU allocation. We follow a simple heuristic of moving slots from the node with the smallest steal time to the node with the largest. We ensure that the total number of task slots in the cluster remains the same. The algorithm keeps calculating the average node steal time m and its standard deviation s. When s is greater than the threshold θ, the algorithm decides that the imbalance of CPU resources needs to be adjusted. Changing θ allows us to trade off between the resource imbalance and the converging speed of the algorithm. A large θ makes the algorithm converge faster and be more tolerant to resource imbalance. In practice, the ideal value of θ is affected by the CPU over-subscription ratio of the cluster. A high CPU over-subscription ratio introduces more interference between applications [2]. In our experiments, we empirically set θ to 0.2m.

Changing the slot number on nodes not only affects cloud resource allocation but also influences Hadoop task scheduling, leading to a better coordination between the two. Without slot adjustment, each Hadoop node is configured with the same number of slots but is allocated imbalanced resources. Dynamically changing the slot number on each node directs both tasks and resources to the same node. For example, adding slots to a node brings more resources and tasks. The performance interference from co-running jobs is also mitigated, as the objective of the algorithm is to minimize steal time.

One concern with this algorithm is that a newly added slot can be assigned a straggler task, aggravating the skew and worsening the starvation on straggler nodes. However, as the majority of the tasks are regular tasks, the new slot is likely to be assigned a regular task. Currently, all nodes are monitored for the automated slot memory resizing. If we find that the newly scheduled task is a straggler task, we kill it and reassign it to another node that has no stragglers while trying to preserve data locality.
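Taken together, Algorithms 1 and 2 are two small control loops: one grows a straggler's slot memory by a factor of (1 + α) on each kill-and-restart round, the other moves one map slot per round from the least to the most CPU-starved node. The following Python sketch restates both; the Task and Node classes, field names, and the caller-driven loop are illustrative assumptions, not the FlexSlot implementation.

```python
# Illustrative sketch of Algorithms 1 and 2. Task/Node and their fields
# are hypothetical stand-ins for FlexSlot's internal interfaces.
from statistics import mean, pstdev

class Task:
    def __init__(self, input_size, slot_mem_mb):
        self.S = input_size        # input data size
        self.D_prev = None         # processing speed at the last resize
        self.kill_count = 0
        self.slot_mem_mb = slot_mem_mb

def resize_slot_mem(task, D, heartbeat_T, alpha=0.2):
    """Algorithm 1: kill a straggler and restart it on a larger slot while
    the saved processing time still outweighs one lost heartbeat interval."""
    first_time = task.kill_count == 0
    improved = (not first_time and
                task.S / task.D_prev - task.S / D > heartbeat_T)
    if first_time or improved:
        task.slot_mem_mb = int(task.slot_mem_mb * (1 + alpha))  # grow by alpha
        task.kill_count += 1       # the caller kills and relaunches the task
    task.D_prev = D
    return task.slot_mem_mb

class Node:
    def __init__(self, steal_time, map_slots):
        self.steal_time = steal_time   # time ready-to-run but not scheduled
        self.map_slots = map_slots

def rebalance_step(nodes, theta_factor=0.2):
    """One round of Algorithm 2: returns False when steal times are balanced
    (s <= theta, with theta = 0.2 * m as in the paper); otherwise moves one
    map slot from the least CPU-starved node to the most starved one."""
    m = mean(n.steal_time for n in nodes)
    s = pstdev(n.steal_time for n in nodes)
    if s <= theta_factor * m:
        return False
    l_max = max(nodes, key=lambda n: n.steal_time)
    l_min = min(nodes, key=lambda n: n.steal_time)
    l_min.map_slots -= 1           # total slot count stays constant
    l_max.map_slots += 1
    return True
```

In FlexSlot itself these decisions are heartbeat-driven: the resizing rule would run on each identified straggler at every heartbeat, and a rebalancing round would re-measure node steal times before each step.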
GUO ET AL.: MOVING HADOOP INTO THE CLOUD WITH FLEXIBLE SLOT MANAGEMENT AND SPECULATIVE EXECUTION 805
TABLE 2. The Slot Management Interface
TABLE 3. PUMA Benchmark Details
due to the fact that running stock Hadoop with demand-based resource allocation created "hot spot" nodes that boost these fast tasks. FlexSlot prevented "hot spot" nodes from receiving disproportionately more resources. Note that making fast tasks run even faster does not help improve overall job completion time, which is bottlenecked primarily by stragglers.

To further study how skew is mitigated, we show a detailed breakdown of task completion time. For comparison, we also ran benchmarks in Hadoop and SkewTune on static homogeneous virtual clusters (denoted as Homo). Fig. 9 shows the breakdown of the average task execution time of inverted-index on different clusters. FlexSlot reduced the portion of steal time by 60.5 and 49.1 percent when compared to stock Hadoop and SkewTune with demand-based resource allocation. The huge steal time of stock Hadoop and SkewTune with demand-based resource allocation is a sign of imbalanced CPU resource allocation between fast tasks and straggler tasks. The reduction of steal time suggests that FlexSlot was effective in preventing CPU starvation on stragglers. Moreover, FlexSlot reduced the I/O wait time compared to stock Hadoop and SkewTune in homogeneous clusters. FlexSlot achieved 39.7 and 26.7 percent lower I/O wait time than stock Hadoop and SkewTune, respectively. This suggests that FlexSlot reduces the I/O operations for straggler tasks with a larger slot memory size.

6.4 Reducing Job Completion Time
We have shown that FlexSlot is effective in mitigating skew. In this section, we study how the mitigation helps improve overall job completion time. Similarly, we use the job completion time in stock Hadoop as the baseline and compare the normalized job completion times of FlexSlot and SkewTune. Fig. 10 shows the normalized job completion time of all benchmarks due to these three approaches.

Fig. 10. The normalized job completion time.

The results show that for benchmarks with expensive records, e.g., inverted-index, term-vector, wordcount and grep, FlexSlot outperformed stock Hadoop by 35.1, 29.3, 26.7 and 27.4 percent, respectively. FlexSlot also outperformed SkewTune by 21.6, 17.1, 16.9 and 17.1 percent in these benchmarks.

Benchmarks such as k-means, histogram-movies and histogram-ratings use data from Netflix. The input data is relatively uniform in record size. In the experiments with these benchmarks, FlexSlot outperformed stock Hadoop by 25, 13 and 12 percent, respectively. However, FlexSlot has less performance improvement on histogram-movies and histogram-ratings than on k-means, because those benchmarks have a small memory demand due to the small volume of their intermediate results. The default configuration of slot memory size already provides a sufficient output buffer. The k-means benchmark has a large volume of intermediate data. It requires a large output buffer and more memory to reduce I/O operations. It contains computation-intensive tasks that require a lot of CPU resources. FlexSlot achieved a much shorter job completion time than stock Hadoop did because FlexSlot was able to improve both the slot memory size and the CPU resource allocation for k-means.

FlexSlot achieved 17.5, 1.2 and 3.3 percent shorter job completion time than SkewTune in the k-means, histogram-movies and histogram-ratings benchmarks, respectively. FlexSlot only had marginal performance improvement over SkewTune in the histogram-movies and histogram-ratings benchmarks because their data skew is not severe and the CPU requirement is also low. But for the job with significant data skew and high CPU consumption, FlexSlot clearly showed a significant advantage compared to SkewTune.

6.5 Mitigating Performance Interference
Note that adjusting slot size and number can possibly resolve the interference between co-running jobs. To create interference, we consolidated two clusters with the same configuration (see Section 6.1 for details) onto the shared physical infrastructure. We submitted the same jobs to these two clusters and calculated their average job completion times. We used the job completion time in stock Hadoop as the baseline, and normalized FlexSlot's and SkewTune's performance against it.

Fig. 11 shows the normalized job completion time of all benchmarks due to the three approaches with performance

Fig. 11. The normalized job completion time with interferences of co-running jobs.
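All of the percentages quoted in this section follow the normalization against the stock Hadoop baseline described above. For clarity, that convention can be stated as a pair of one-line helpers; this is a restatement of the convention, not code from the paper, and the function names are ours.

```python
# Normalization convention used in the evaluation: a system's job
# completion time is divided by the stock-Hadoop baseline, and the
# reported improvement is 1 minus the normalized time, in percent.
def normalized_time(t_system, t_baseline):
    return t_system / t_baseline

def improvement_percent(t_system, t_baseline):
    return (1.0 - normalized_time(t_system, t_baseline)) * 100.0
```

For example, a job that stock Hadoop finishes in 120 s and another system finishes in 90 s has a normalized completion time of 0.75, i.e., a 25 percent improvement.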
Fig. 14. Job completion time with different speculative execution mechanisms.
Fig. 15. Performance impact of the speculative execution on FlexSlot.

utilization. It does not affect the performance of jobs that have a high resource demand, because slots are only added when there are available resources.

6.7 Speculative Execution
We have shown the effectiveness of the flexible slot management of FlexSlot in mitigating data skew and improving resource utilization. In this section, we study the performance benefit of the adaptive speculative execution in FlexSlot. We compare the adaptive speculative execution in FlexSlot with the speculative execution strategies in Hadoop and in MCP.

Fig. 14 shows the normalized job completion time of all three speculative execution mechanisms. The results show that FlexSlot outperformed stock Hadoop and MCP by as much as 39.1 and 23.8 percent, respectively. FlexSlot achieved 27.4 to 39.1 percent shorter job completion time than stock Hadoop did in the inverted-index, term-vector, wordcount and grep benchmarks. The performance improvement is mainly due to the fact that FlexSlot performs speculative execution throughout the entire lifespan of a job. FlexSlot also outperformed MCP by 12.1 to 23.8 percent in these benchmarks. Compared to MCP, FlexSlot is more accurate in identifying straggler tasks. Starting the speculative tasks with optimized configurations also significantly improves the performance of FlexSlot. For benchmarks with less data skew, i.e., histogram-movies and histogram-ratings, FlexSlot outperformed stock Hadoop by 11 and 12 percent, respectively. The performance of FlexSlot and MCP on these two benchmarks is similar.

To further study the performance impact of the speculative execution, we compare FlexSlot with and without the speculative execution. Fig. 15 shows the normalized job completion time of the two approaches. The results show that FlexSlot with speculative execution achieved 6.1 to 17.2 percent shorter job completion time than without speculative execution on these benchmarks. With speculative execution, FlexSlot is able to try multiple slot memory sizes for a straggler task at the same time, which reduces the overhead of killing the task multiple times. Moreover, running the speculative task on a different node ensures that the task runs on a fast node, which further improves the performance of the task.

implemented as iterative MapReduce jobs, which are sensitive to the workload imbalance in iterations. In this section, we study the effectiveness of FlexSlot in mitigating data skew for iterative MapReduce jobs. For this experiment, we choose the pagerank and bayes benchmarks from the Intel HiBench benchmark suite [3]. We compare FlexSlot to stock Hadoop, SkewTune, and MCP.

Fig. 16 shows the normalized job completion time of all four approaches. The results show that FlexSlot outperformed stock Hadoop by 43.7 and 47.2 percent in the pagerank and bayes benchmarks, respectively. FlexSlot also outperformed SkewTune by 30.1 and 36.2 percent in these two benchmarks. Note that the tasks in each iteration can benefit from the data skew mitigation, which significantly reduces the delay due to load imbalance in iterations. FlexSlot's ability to mitigate straggler tasks without waiting for a free slot makes it more effective than SkewTune in this case. Compared to MCP, FlexSlot achieved 34.1 and 40.4 percent shorter job completion time in pagerank and bayes. The improvement of FlexSlot over MCP is mainly due to the customized speculative task configuration and the early termination of straggler tasks, which guarantees that the speculative task will perform better than the straggler task. In contrast, MCP does not consider the impact of different configurations when starting the speculative task. Thus it misses the chance to further improve task performance.

6.9 Slot Resizing Overhead
FlexSlot uses a task-killing-based approach in the slot memory resizing and allows tasks to be killed multiple times. The task killing inevitably incurs overhead, as the killed stragglers lose already performed work. We measure the
[13] E. Coppa and I. Finocchi, "On data skewness, stragglers, and MapReduce progress indicators," in Proc. 6th ACM Symp. Cloud Comput., 2015, pp. 139–152.
[14] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in Proc. 6th Conf. Symp. Operating Syst. Des. Implementation, 2004, Art. no. 10.
[15] R. Gandhi, D. Xie, and Y. C. Hu, "PIKACHU: How to rebalance load in optimizing MapReduce on heterogeneous clusters," in Proc. USENIX Conf. Annu. Tech. Conf., 2013, pp. 61–66.
[16] A. Gulati, A. Holler, M. Ji, G. Shanmuganathan, C. Waldspurger, and X. Zhu, "VMware distributed resource management: Design, implementation and lessons learned," VMware Tech. J., vol. 1, pp. 45–64, 2012.
[17] Y. Guo, J. Rao, C. Jiang, and X. Zhou, "FlexSlot: Moving Hadoop into the cloud with flexible slot management," in Proc. ACM/IEEE Int. Conf. High Perform. Comput. Netw. Storage Anal., 2014, pp. 959–969.
[18] Y. Guo, J. Rao, and X. Zhou, "iShuffle: Improving Hadoop performance with shuffle-on-write," in Proc. 10th Int. Conf. Autonomic Comput., 2013, pp. 107–117.
[19] S. Ibrahim, H. Jin, L. Lu, B. He, G. Antoniu, and S. Wu, "Maestro: Replica-aware map scheduling for MapReduce," in Proc. 12th IEEE/ACM Int. Symp. Cluster Cloud Grid Comput., 2012, pp. 435–442.
[20] S. Ibrahim, H. Jin, L. Lu, S. Wu, B. He, and L. Qi, "LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud," in Proc. IEEE 2nd Int. Conf. Cloud Comput. Technol. Sci., 2010, pp. 17–24.
[21] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," in Proc. 2nd ACM SIGOPS/EuroSys Eur. Conf. Comput. Syst., 2007, pp. 59–72.
[22] V. Jalaparti, H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, "Bridging the tenant-provider gap in cloud services," in Proc. 3rd ACM Symp. Cloud Comput., 2012, Art. no. 10.
[23] D. Jiang, B. C. Ooi, L. Shi, and S. Wu, "The performance of MapReduce: An in-depth study," Proc. VLDB Endowment, vol. 3, pp. 472–483, 2010.
[24] H. Kang and J. L. Wong, "To hardware prefetch or not to prefetch?: A virtualized environment study and core binding approach," in Proc. 18th Int. Conf. Architectural Support Program. Lang. Operating Syst., 2013, pp. 357–368.
[25] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "Skew-resistant parallel processing of feature-extracting scientific user-defined functions," in Proc. 1st ACM Symp. Cloud Comput., 2010, pp. 75–86.
[26] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "SkewTune: Mitigating skew in MapReduce applications," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2012, pp. 25–36.
[27] B. Li, E. Mazur, Y. Diao, A. McGregor, and P. Shenoy, "A platform for scalable one-pass analytics using MapReduce," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2011, pp. 985–996.
[28] R. Nathuji, A. Kansal, and A. Ghaffarkhah, "Q-Clouds: Managing
[36] A. Verma, L. Cherkasova, and R. H. Campbell, "ARIA: Automatic resource inference and allocation for MapReduce environments," in Proc. 8th ACM Int. Conf. Autonomic Comput., 2011, pp. 235–244.
[37] A. Verma, L. Cherkasova, and R. H. Campbell, "Resource provisioning framework for MapReduce jobs with performance goals," in Proc. ACM/IFIP/USENIX 12th Int. Middleware Conf., 2011, pp. 165–186.
[38] J. Wolf, et al., "FLEX: A slot allocation scheduling optimizer for MapReduce workloads," in Proc. 11th ACM/IFIP/USENIX Int. Middleware Conf., 2010, pp. 1–20.
[39] N. J. Yadwadkar, G. Ananthanarayanan, and R. Katz, "Wrangler: Predictable and faster jobs using fewer resources," in Proc. ACM Symp. Cloud Comput., 2014, pp. 1–14.
[40] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments," in Proc. 8th USENIX Symp. Operating Syst. Des. Implementation, 2008, pp. 29–42.
[41] Z. Zhang, L. Cherkasova, and B. T. Loo, "AutoTune: Optimizing execution concurrency and resource usage in MapReduce workflows," in Proc. 10th USENIX/ACM Int. Conf. Autonomic Comput., 2013, pp. 175–181.

Yanfei Guo received the BS degree in computer science and technology from the Huazhong University of Science and Technology, China, in 2010, and the PhD degree in computer science from the University of Colorado, Colorado Springs, in 2015. He is currently a postdoc fellow at the Argonne National Lab. His research interests include cloud computing, big data processing and MapReduce, and HPC. He is a member of the IEEE.

Jia Rao received the BS and MS degrees in computer science from Wuhan University in 2004 and 2006, respectively, and the PhD degree from Wayne State University in 2011. He is currently an assistant professor at the Department of Computer Science, University of Colorado, Colorado Springs. His research interests include the areas of resource auto-configuration, machine learning and CPU scheduling on multi-core systems. He is a member of the IEEE.

Changjun Jiang received the PhD degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1995 and conducted
performance interference effects for QoS-aware clouds,” in Proc. post-doctoral research at the Institute of Comput-
5th ACM Eur. Conf. Comput. Syst., 2010, pp. 237–250. ing Technology, Chinese Academy of Sciences,
[29] X. Pu, L. Liu, Y. Mei, S. Sivathanu, Y. Koh, and C. Pu, in 1997. Currently he is a professor with the
“Understanding performance interference of I/O workload in vir- Department of Computer Science and Engineer-
tualized cloud environments,” in Proc. IEEE 3rd Int. Conf. Cloud ing, Tongji University, Shanghai. His current areas
Comput., 2010, pp. 51–58. of research are concurrency processing and intel-
[30] S. Rao, R. Ramakrishnan, A. Silberstein, M. Ovsiannikov, and ligent web systems. He is a member of the IEEE.
D. Reeves, “Sailfish: A framework for large scale data processing,”
in Proc. 3rd ACM Symp. Cloud Comput., 2012, Art. no. 4.
[31] K. Ren, Y. Kwon, M. Balazinska, and B. Howe, “Hadoop’s adoles-
cence: An analysis of Hadoop usage in scientific workloads,” Proc. Xiaobo Zhou received the BS, MS, and PhD
VLDB Endowment, vol. 6, pp. 853–864, 2013. degrees in computer science from Nanjing Uni-
[32] G. Shanmuganathan, A. Gulati, and P. Varman, “Defragmenting versity, in 1994, 1997, and 2000, respectively. He
the cloud using demand-based resource allocation,” in Proc. ACM is currently a professor and the chair at the
SIGMETRICS/Int. Conf. Meas. Modeling Comput. Syst., 2013, Department of Computer Science, University of
pp. 67–80. Colorado, Colorado Springs. His research lies
[33] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, “CloudScale: Elastic broadly in Distributed and Cloud computing,
resource scaling for multi-tenant cloud systems,” in Proc. 2nd Datacenters, BigData parallel processing, auto-
ACM Symp. Cloud Comput., 2011, Art. no. 5. nomic and sustainable computing. He was a
[34] J. Tan, et al., “DynMR: Dynamic MapReduce with reduce task recipient of NSF CAREER Award in 2009. He is a
interleaving and map task backfilling,” in Proc. 9th Eur. Conf. Com- senior member of the IEEE.
put. Syst., 2014, Art. no. 2.
[35] V. K. Vavilapalli, et al., “Apache Hadoop YARN: Yet another
" For more information on this or any other computing topic,
resource negotiator,” in Proc. 4th ACM Symp. Cloud Comput., 2013,
Art. no. 5. please visit our Digital Library at www.computer.org/publications/dlib.