
798 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 28, NO. 3, MARCH 2017

Moving Hadoop into the Cloud with Flexible Slot Management and Speculative Execution

Yanfei Guo, Member, IEEE, Jia Rao, Member, IEEE, Changjun Jiang, Member, IEEE, and Xiaobo Zhou, Senior Member, IEEE

Abstract—Load imbalance is a major source of overhead in parallel programs such as MapReduce. Due to the uneven distribution of
input data, tasks with more data become stragglers and delay the overall job completion. Running Hadoop in a private cloud opens up
opportunities for expediting stragglers with more resources but also introduces problems that often outweigh the performance gain:
(1) performance interference from co-running jobs may create new stragglers; (2) there exists a semantic gap between the Hadoop
task management and resource pool-based virtual cluster management preventing tasks from using resources efficiently. In this paper,
we strive to make Hadoop more resilient to data skew and more efficient in cloud environments. We present FlexSlot, a user-
transparent task slot management scheme that automatically identifies map stragglers and resizes their slots accordingly to accelerate
task execution. FlexSlot adaptively changes the number of slots on each virtual node to balance the resource usage so that the pool of
resources can be efficiently utilized. FlexSlot further improves mitigation of data skew with an adaptive speculative execution strategy.
Experimental results show that FlexSlot effectively reduces job completion time by up to 47.2 percent compared to stock Hadoop and two
recently proposed skew mitigation and speculative execution approaches.

Index Terms—MapReduce, cloud, data skew, stragglers, task slot management, speculative execution

1 INTRODUCTION

HADOOP, the open source implementation of the MapReduce programming model [14], is increasingly popular in Big Data analytics due to its fault tolerance, simplicity of use, and scalability to large clusters. However, studies have shown that the current use of Hadoop in enterprises and research remains ad hoc, leaving advanced features underused [31], potential performance unexploited [26], [38], [40], and resources in Hadoop clusters inefficiently utilized [30]. In particular, load imbalance (a.k.a. skew) among Hadoop tasks poses significant challenges to achieving good performance and high resource utilization [13], [20], [25], [27]. Skew, which can come from uneven data distribution or non-uniform data processing cost, creates stragglers: tasks that run significantly slower than others. Such sluggish tasks can take more than five times longer to complete than the fastest task [26], slowing the overall job completion.

To address hardware failure and misconfiguration, Hadoop speculatively runs a backup copy of a slow task on a different machine. Besides fault tolerance, speculative execution can expedite stragglers due to skew to a certain extent, as the backup copy may run on a better performing machine. However, the differences between machines are not significant enough to mitigate skew, and job performance is still bottlenecked by stragglers. Although skew can be eliminated by using customized, job-specific data partitioners that balance workload among tasks [25], this approach requires domain knowledge of the structure of the input data and imposes burdens on users. To this end, researchers have proposed to mitigate skew by dynamically re-partitioning data during job execution [26]. Nevertheless, re-distributing data at runtime introduces the overhead of moving data between machines. More importantly, re-distributing data is not very effective for map tasks since the data distribution is unknown prior to execution.

We propose a new perspective on tackling the skew problem in Hadoop applications. Rather than mitigating skew among tasks, we try to balance the processing time of tasks even in the presence of data imbalance. Specifically, tasks with more data or more expensive data records are accelerated by giving them more resources. We purposely create heterogeneous clusters and match different machine capabilities with the actual processing demands of unbalanced tasks. Cloud computing, unlocked by virtualization, allows the creation of dynamic virtual clusters with elastic resource allocation. Hadoop clusters can be easily scaled out by adding virtual nodes with a latency of several minutes. More importantly, individual nodes can also be scaled up with more resources. A recent study found that the ability to scale up leads to better Hadoop performance in a virtual cluster compared to a native cluster with the same settings [6].

However, moving Hadoop into the cloud introduces additional problems that can outweigh the benefit of flexible resource management. First, clouds are usually shared by multiple users in order to increase hardware utilization.

Y. Guo, J. Rao, and X. Zhou are with the Department of Computer Science, University of Colorado, Colorado Springs, CO 80918. E-mail: {yguo, jrao, xzhou}@uccs.edu.
C. Jiang is with the Key Lab of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China. E-mail: cjj@tongji.edu.cn.
Manuscript received 11 Jan. 2016; revised 11 May 2016; accepted 27 May 2016. Date of publication 7 July 2016; date of current version 15 Feb. 2017.
Recommended for acceptance by F. Cappello.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TPDS.2016.2587641
1045-9219 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Seoul National University. Downloaded on July 25,2023 at 08:07:54 UTC from IEEE Xplore. Restrictions apply.

Interference from co-located users may create new stragglers in Hadoop applications [12], [24]. Second, to reduce the resources required to run a virtual cluster, private clouds often multiplex a pool of shared resources among virtual nodes [32]. Virtual cluster management, such as the VMware distributed resource scheduler (DRS) [16] and OpenStack Utilization-based Scheduling [4], dynamically allocates resources to individual nodes according to their estimated demands. This eliminates the need to proactively allocate resources to slave nodes. Surprisingly, we find that there exists a semantic gap between the demand-based resource allocation and the actual needs of Hadoop tasks. The virtual nodes that receive disproportionately more resources due to high demands actually run fast tasks, leaving nodes with stragglers deprived of resources. Our study exposes that running unmodified Hadoop in such a cloud does not mitigate skew but aggravates the imbalance.

In this paper, we explore the possibility of using elastic resource allocation in the cloud to mitigate skew in Hadoop applications. We design FlexSlot, an effective yet simple extension to the slot-based Hadoop task scheduling framework. FlexSlot automatically identifies map stragglers and resizes their slots. Slot resizing not only allows stragglers to receive more resources but also alleviates the interference between co-running jobs. To expose the actual demands of tasks, FlexSlot co-locates stragglers with fast tasks on a node by dynamically changing its number of slots. As fast tasks finish quickly and the node is allocated a large amount of resources, stragglers in fact receive more resources and their executions are thus accelerated. Furthermore, FlexSlot employs adaptive speculative execution. It incorporates the flexible slot management and starts speculative tasks for stragglers with optimized configurations. We note that FlexSlot focuses only on map tasks, because data skew mitigation is done through on-demand resource allocation in the cloud, which is more effective for mitigating the data skew in map tasks.

We implemented FlexSlot on a 32-node Hadoop virtual cluster and evaluated its benefits using the Purdue MapReduce Benchmark Suite (PUMA) [5] and the Intel HiBench benchmark suite [3] with datasets collected from real applications. We compared the performance of FlexSlot running different workloads with that of stock Hadoop and a recently proposed skew mitigation approach, SkewTune [26]. We also compare FlexSlot with Maximum Cost Performance (MCP) [10], a novel speculative execution strategy for Hadoop. Experimental results show that FlexSlot reduces job completion time by up to 47.2 percent compared to stock Hadoop, SkewTune and MCP. FlexSlot also achieves better cloud resource utilization with both the flexible slot size and the flexible number of slots.

A preliminary version of this paper appeared in the Proc. of IEEE/ACM SC'2014 [17]. In this significantly extended paper, we redesign and develop FlexSlot with adaptive speculative execution. We propose an adaptive speculative execution strategy and use it in slot memory resizing. We perform new experiments to evaluate the enhanced approach. We also extend the experiments with more comprehensive benchmarks on iterative jobs.

The rest of this paper is organized as follows. Related work is presented in Section 2. Section 3 introduces the background of Hadoop, discusses existing issues, and presents a motivating example. Section 4 elaborates on the FlexSlot architecture and key designs. Section 5 presents the implementation details of FlexSlot. Section 6 gives the testbed setup and experimental results. We conclude this paper with future work in Section 7.

2 RELATED WORK

YARN [36] is the second generation of Hadoop. It uses containers to replace task slots and provides a finer granularity of resource management. But it only allows different jobs to have different container sizes; the tasks in one job still have containers of one fixed size. Therefore, YARN cannot mitigate data skew with flexible container sizes.

There are a few studies on straggler mitigation. SkewReduce [25] alleviates computational skew by balancing data distribution across nodes using a user-defined cost function. SkewTune [26] repartitions the data of stragglers to take advantage of idle slots freed by short tasks. However, moving repartitioned data to idle nodes requires extra I/O operations, which could aggravate the performance interference. Dolly [8] provides speculative execution at the job level, cloning small jobs that contain straggler tasks. But duplicating the entire job increases both the resource usage and the I/O contention on intermediate data, which can significantly affect job performance. More importantly, these approaches have no coordination between Hadoop and the cloud infrastructure.

There is great research interest in improving Hadoop from different perspectives. A rich set of research has focused on the performance and efficiency of Hadoop clusters. Jiang et al. [23] conducted a comprehensive performance study of Hadoop and summarized the factors that can significantly improve Hadoop performance. Verma et al. [36], [37] proposed a cluster resource allocation approach for Hadoop. They focused on improving cluster efficiency by minimizing resource allocations to jobs while maintaining the service level objectives. They estimated the execution time of a job based on its resource allocation and input dataset, and determined its minimum resource allocation.

A Hadoop cluster in the cloud can be seen as a heterogeneous Hadoop cluster because of the dynamic resource allocation. A number of studies proposed different task scheduling algorithms to improve Hadoop performance in heterogeneous environments [7], [11], [15], [40]. DynMR [34] enables interleaved MapReduce execution that overlaps reduce tasks with map tasks. By opportunistically scheduling all tasks, DynMR significantly increases both the performance and the efficiency of Hadoop. PIKACHU focuses on achieving optimal workload balance for Hadoop [15]. It presents guidelines for the trade-offs between the accuracy of workload balancing and the delay of workload adjustment. But these studies focus on hardware heterogeneity in clusters of physical machines. They are not designed for VM-based clusters, where the heterogeneity can be changed by dynamic resource allocation. FlexSlot is able to adapt to the change of heterogeneity with flexible slot size and number.

There are recent studies that focus on improving the performance of applications in the cloud by dynamic resource allocation [22], [32], [33]. For example, Bazaar is a cloud framework that predicts the resource demand of applications based on high-level performance goals [22]. It translates the performance goal of an application into

multiple combinations of resources and selects the combination that is most suitable for a cloud provider. One recent work focuses on demand-based resource allocation [32]. It efficiently distributes resources by dynamically allocating the overall capacity among VMs based on their demands. But these approaches focus on VM management only and are less effective than FlexSlot for Hadoop in the cloud because of the semantic gap between demand-based resource allocation and the actual needs of Hadoop tasks.

Performance interference is an important issue for applications in clouds. There are a few recent studies on mitigating performance interference for different applications [12], [16], [24], [28], [29]. TRACON provides a profiling-based interference prediction model [12]. Its cloud-aware scheduler utilizes the model and mitigates the performance interference for data-intensive workloads. Gulati et al. introduced a distributed resource scheduler for resource management and performance isolation among VMs located on multiple physical machines [16].

Several speculative execution strategies have been proposed in the literature, including the original MapReduce [14], Hadoop [1], Dryad [21], LATE [40], and Wrangler [39]. These strategies only start speculative execution when the job is nearly done and the cluster has available task slots. A few recent studies focus on improving speculative execution [9], [10], [19]. Mantri [9] focuses on starting speculative execution as soon as a straggler is detected during the job's execution. It kills the straggler task once it finds that the speculative task is faster than the straggler. Mantri estimates a task's remaining time based on the progress bandwidth. However, there is no guarantee that the speculative copy of the straggler task will perform better, and Mantri may need to kill and restart the speculative task on multiple nodes. Wrangler [39] predicts which tasks would benefit from speculative execution and starts speculative tasks accordingly. However, the prediction relies on a statistical learning model derived from historical workloads. The learning time could be long for workloads with low repetitiveness, and the performance interference in the cloud can also affect the accuracy of the prediction. The Maximum Cost Performance (MCP) approach [10] employs an exponentially weighted moving average (EWMA) to predict the remaining time of a task. MCP selects the node for speculative execution using a sophisticated cost-benefit model that considers the workload on each node and the data locality. It significantly improves job performance. All these approaches lack the ability to identify the performance bottleneck of straggler tasks, so they cannot guarantee that the speculative copies of tasks will perform better.

3 BACKGROUND AND MOTIVATIONS

We first describe the basics of the Hadoop MapReduce framework and discuss the causes of skew in Hadoop tasks. Then, we show that elastic resource allocation in a cloud and an ideal task scheduling in Hadoop together help balance task execution time and improve the overall job performance significantly. Finally, we demonstrate that a naive migration of Hadoop to the cloud does not mitigate skew but aggravates task imbalance.

3.1 Skew in Hadoop MapReduce

The data processing in MapReduce is expressed as two functions: map and reduce. In Hadoop, the map and reduce functions are implemented in MapTask and ReduceTask. Each MapReduce job is divided into multiple map tasks and reduce tasks.

Fig. 1. The slot-based task scheduling in Hadoop.

Fig. 1 shows the MapReduce execution environment in Hadoop with one master node and multiple slave nodes. The master node runs the JobTracker and manages the task scheduling on the cluster. Each slave node runs a TaskTracker and controls the execution of individual tasks. Hadoop uses a slot-based task scheduling algorithm. Each TaskTracker has a preconfigured number of map slots and reduce slots. Task trackers report their number of free slots to the job tracker through heartbeats [1]. The job tracker assigns a task to each free slot.

Ideally, each task should take approximately the same time to complete if it is assigned the same amount of data. However, there is no guarantee that data is evenly distributed among tasks or that the same amount of input data will take the same time to process. Map tasks usually process one block of input at a time using the default FileInputFormat. If the input data contains files of various sizes (e.g., files smaller or larger than the HDFS block size), taking one block of data as input inevitably leads to input skew in map tasks. Although it is possible to combine small blocks into a map input, only a small portion of Hadoop users customize input handling, while data skew in map tasks is quite common in production Hadoop clusters [31]. Similarly, reduce tasks may also have imbalanced input sizes, as the partitions generated by map tasks can contain different numbers of key/value pairs [25], [26]. Moreover, the same input size can lead to distinct task execution times, as some records are inherently more expensive to process. As a result, unbalanced task runtime is prevalent in Hadoop applications [31].

Fig. 2 shows an example of skewed task execution in the wordcount benchmark. The job was run on a homogeneous cluster with 32 nodes. We can see that nearly 20 percent of tasks run at least 20 percent longer than the others. Such an imbalance leads to prolonged job completion time and wasted cluster resources on nodes that were idle waiting for the stragglers. Next, we show how a re-organization of cluster resources, i.e., using a heterogeneous cluster, improves Hadoop performance.

3.2 Speculative Execution

Hadoop employs speculative execution to reduce the delay caused by straggler tasks. When Hadoop detects that one task is running significantly slower than its peer tasks, it
Authorized licensed use limited to: Seoul National University. Downloaded on July 25,2023 at 08:07:54 UTC from IEEE Xplore. Restrictions apply.

TABLE 1
Task Skewness of Wordcount on Three Clusters

              Homo    Heter    Heter + Skew-Sched
Skewness      2.65    2.25     0.31
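The skewness statistic reported in Table 1 is defined over the task runtimes $x_i$ of a job as $\frac{1}{n}\sum_{i \in J}\left(\frac{x_i - \mu}{\sigma}\right)^3$. It can be computed with a few lines of Python; this is a sketch for illustration, with a function name and sample runtimes of our own choosing:

```python
def task_skewness(runtimes):
    """Population skewness of task runtimes: (1/n) * sum(((x - mu) / sigma)^3).

    Near zero when runtimes are balanced; large and positive when a few
    stragglers run much longer than the rest.
    """
    n = len(runtimes)
    mu = sum(runtimes) / n
    sigma = (sum((x - mu) ** 2 for x in runtimes) / n) ** 0.5
    return sum(((x - mu) / sigma) ** 3 for x in runtimes) / n
```

For example, four tasks of 1 s and one straggler of 5 s give a skewness of 1.5, while a symmetric runtime distribution yields a value near zero, matching the trend across the three clusters in Table 1.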

launches a speculative copy of the straggler task with the same input data chunk. The speculative execution results in multiple tasks processing the same input data in parallel. As soon as one of these parallel copies completes, the other tasks with the same input are discarded. Speculative execution was originally designed to handle failures, so that stragglers due to faulty hardware or misconfiguration would not slow down the entire job. Intuitively, speculative execution can help mitigate data skew across tasks, as powerful machines that complete tasks sooner may help compute the straggler tasks.

However, there are several reasons why speculative execution is not effective in mitigating data skew in practice. First, speculative execution is only performed at the finishing stage of a job, when free task slots are available on multiple machines. It cannot address data skew in a timely manner since the skew exists in the task input. Second, Hadoop uses a simple progress score, which measures the fraction of data processed, to identify stragglers. As stragglers can be due to many reasons in a cloud environment (see discussions in Section 4.2), existing straggler identification in Hadoop does not guarantee that the stragglers with the most resource needs will be accelerated. Last, although Hadoop considers data locality when launching speculative tasks, it does not address load imbalance or consider the resource contention on Hadoop nodes. As such, speculative execution does not guarantee that stragglers finish sooner and can even aggravate imbalance.

3.3 Mitigating Skew with Heterogeneous Resource Allocation

In this section, we study how heterogeneous resource allocation to map tasks helps the overall job performance. The data skew in reduce tasks can be mitigated by repartitioning and rebalancing intermediate data [18], [25], [26], but a similar approach cannot be easily applied to map tasks. We show that purposely allocating more resources to nodes that run stragglers effectively mitigates map input skew. If not otherwise stated, tasks refer to map tasks throughout this paper.

We created three 32-node Hadoop clusters, each with a total capacity of 153.6 GHz CPU resource and 128 GB memory, in our university cloud. The first cluster (denoted as Homo) emulates a physical cluster with homogeneous configurations on each node. The resources were evenly distributed to nodes, resulting in a uniform node capacity of 4.8 GHz CPU and 4 GB memory. The second cluster (denoted as Heter) contained nodes with heterogeneous resource allocations. We profiled the resource demands of individual Hadoop tasks and created powerful nodes for stragglers. We first ran a job in a homogeneous cluster and determined the stragglers. We then scaled up the nodes running stragglers according to their runtime statistics. Specifically, we increased the CPU resource and memory size of these nodes proportionally based on their CPU time and I/O wait time in the last run. We then re-ran the job on the adjusted cluster. Although stragglers had their input data on the powerful nodes, Hadoop's task scheduler may still launch them remotely on less powerful nodes if there are no available slots on the powerful ones. In the third cluster (denoted as Heter + Skew-sched), we forced stragglers to run only on powerful nodes.

Fig. 2. The skewed task execution time of wordcount.

We calculated the statistical skewness of the task runtimes in a job $J$ as $\frac{1}{n}\sum_{i \in J}\left(\frac{x_i - \mu}{\sigma}\right)^3$, where $x_i$ is the runtime of an individual task, and $\mu$ and $\sigma$ represent the average and the standard deviation, respectively. The lower the skewness, the more balanced the task execution. Table 1 lists the task skewness of wordcount on the three clusters. As expected, cluster Homo shows significant skew in task completion with a skewness of 2.65. We can also see that creating a heterogeneous cluster alone only mitigated the skew to a certain extent (e.g., reducing the skewness to 2.25) because the Hadoop task scheduler may not run stragglers on powerful nodes. In contrast, strictly requiring that stragglers be run on powerful nodes significantly improved task balance with a skewness of 0.30. This confirms previous findings [40] that allocating more data and work to powerful nodes optimizes Hadoop performance in a heterogeneous cluster environment.

Unfortunately, it is difficult to predict which tasks will be the stragglers and to determine their resource demands before actually running these tasks. There is a need for dynamic resource allocation in Hadoop clusters in response to stragglers.

3.4 Automatically Tuning Hadoop with Demand-Based Resource Allocation

Modern cloud management middleware, such as VMware DRS, XenServer and OpenStack, supports demand-based resource allocation, where a virtual cluster shares a resource pool and individual nodes receive resources according to their estimated demands. This technique promotes efficient usage of cluster resources and avoids the potential resource wastage of manually configured resource reservations. Ideally, if nodes running stragglers show high demands, demand-based resource allocation is able to automatically tune a Hadoop cluster into a heterogeneous cluster that accelerates stragglers. However, we found that there exists a semantic gap between the demand-based resource allocation and the

actual needs of Hadoop tasks, making incorrect decisions in resource allocations.

We created a resource pool of 153.6 GHz CPU resources and 128 GB memory, which is shared by a 32-node Hadoop cluster. VMware DRS was used to automatically allocate resources on these virtual nodes. Figs. 3a and 3b show the execution time of individual map tasks on a static homogeneous cluster and on the dynamic cluster with demand-based resource allocation, respectively. From the figure, we can see that demand-based resource allocation unexpectedly aggravated the skewness in task execution time. Although the tasks on the left of Fig. 3b ran faster than those in the static cluster, the stragglers in the dynamic cluster (tasks on the right tail) appeared to be significantly slower.

Fig. 3. Task runtime of wordcount on two clusters.

Fig. 4 shows the number of finished tasks on each virtual node in the two clusters. We find that in the dynamic cluster, some nodes ran 10 more tasks than the nodes with the fewest tasks. An examination of the Hadoop execution log and the VMware resource allocation log revealed that these "hot spot" nodes ran mostly fast tasks and received disproportionately more resources than the "idle" nodes that turned out to host straggler tasks. We attribute the counter-intuitive resource allocations to the semantic gap between the VM resource demands estimated by the cloud management and the actual Hadoop task demands. VMware DRS computes a node's CPU demand based on its recent actual CPU consumption and estimates the memory demand according to the ratio of touched pages in a set of randomly selected pages.

Fig. 4. Task distribution of wordcount on two clusters.

However, the demand-based allocation does not meet the stragglers' needs. A study [31] has shown that production Hadoop clusters are still dominated by I/O intensive jobs, and stragglers with more data are likely to spend a significant amount of time waiting for disk I/O. Thus, nodes running stragglers appear to be less busy than nodes with fast tasks and are allocated less resource by the demand-based cloud management. For memory, a straggler is unable to use the additional memory allocated because its memory usage is upper bounded by the JVM heap size of its slot. The heap size of a slot is statically set in the cluster configuration file. The selection of the heap size is usually conservative, to prevent a node from memory thrashing.

Summary: We have shown that skew in Hadoop map tasks can be effectively mitigated by accelerating stragglers with more resources. However, demand-based dynamic resource allocation in a cloud does not meet stragglers' needs, and the fixed slot configuration in Hadoop prevents a flexible use of dynamic resources. These findings motivated us to make Hadoop run more efficiently in the cloud in the presence of skew and the in-place cloud management. We found that a simple extension to Hadoop's map slot management effectively directs resources to map stragglers in a cloud environment. Next, we present FlexSlot, a user-transparent flexible slot management scheme for Hadoop.

4 FLEXSLOT DESIGN

We aim to mitigate skew in Hadoop applications by leveraging dynamic resource allocation in the cloud. We focus on a private cloud platform in which dynamic resource allocation to virtual clusters is automated by estimating the demands of individual virtual nodes and adjusting their resources accordingly. In this section, we present FlexSlot, an effective yet simple extension to Hadoop's slot management scheme that automatically identifies map stragglers at run-time and adjusts both the map slot size and the number of slots on the nodes hosting the stragglers. The dynamic adjustment of slot configurations effectively exposes the actual demands of stragglers to the low-level virtual resource management and guides the demand-based allocation to efficiently allocate resources to stragglers.

4.1 Overview

FlexSlot provides three functionalities: on-the-fly straggler identification, automated slot reconfiguration, and adaptive speculative execution.

Identifying Stragglers. FlexSlot continuously monitors two task-specific metrics during task execution: progress rate and input processing speed. Based on a synthesis of these metrics, FlexSlot is able to identify tasks that are abnormally slower than their peers due to uneven data distribution, expensive records, or cloud interference. Once it has determined the stragglers, FlexSlot infers their resource bottlenecks based on the resource usage obtained on the straggler nodes.

Proactively Changing the Size of Slots. If FlexSlot determines that a straggler's performance is bottlenecked by I/O operations, it proactively terminates the straggler and restarts it with a larger slot size. Since it is hard to predict the memory requirement of a task, FlexSlot uses a trial-and-error approach that changes the slot size in incremental steps. The slot size increment continues until the overhead of task restarting outweighs the improvement in task processing speed.

Adaptively Adjusting the Number of Slots. FlexSlot bridges the semantic gap between Hadoop tasks and the demand-based resource allocation by adaptively changing the number of slots on Hadoop nodes. Contrary to intuition, FlexSlot adds more slots to straggler nodes in order to expose the actual straggler demand to the cluster-level resource management.

Adaptively Creating Speculative Tasks. FlexSlot creates speculative tasks for straggler mitigation in a timely manner.
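The trial-and-error slot sizing described above can be illustrated with a short sketch. This is a simplified illustration, not FlexSlot's implementation: the helper `estimated_saving` and the stopping rule that compares the marginal gain against the restart overhead are our assumptions.

```python
def grow_slot_size(size, step, restart_overhead, estimated_saving):
    """Enlarge a straggler's slot in incremental steps.

    estimated_saving(size) returns the task-time saving expected at a
    given slot size (e.g., from fewer intermediate spills to disk).
    Growth stops once one more restart would cost more than it saves.
    """
    while True:
        gain = estimated_saving(size + step) - estimated_saving(size)
        if gain <= restart_overhead:
            return size
        size += step
```

If the saving saturates once the slot is large enough to buffer the task's output, growth stops at the saturation point: with a saving curve of min(size, 400) and a restart overhead of 50, the slot grows 100, 200, 300, 400 and then stops.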

Fig. 5. Task completion time and iowait time. Fig. 6. Task completion time and the cputime.

The configuration of the speculative tasks is adjusted in order to meet the resource demands of the straggler tasks.

4.2 Identifying Stragglers
There are three causes of straggler tasks in a cloud environment: (1) uneven data distribution among tasks; (2) non-uniform data processing time; and (3) performance interference from co-running jobs. We measure two performance metrics during task execution. Hadoop provides a progress score to represent the fraction of work completed. It is computed as the ratio of the finished input size to the original input size. Progress rate [40], which is defined as the change in progress score per unit time, is a good online measure of task performance. If all tasks process input data at the same speed, the tasks with smaller progress rates are likely to have large input sizes. Another important metric is the input processing speed D. It counts how many bytes are processed per unit time. Ideally, all tasks should have the same input processing speed, given that the data distribution is uniform across tasks and it takes the same amount of time to process the same amount of data. If some tasks exhibit apparently slower processing speed, it is likely that their records are more expensive to process or that they are experiencing interference.

Neither of the two metrics alone can reliably determine stragglers. Thus, we create a composite metric based on these two. We normalize both metrics against their maximum values among all tasks, both in the range of 0 to 1, and simply use their sum as a heuristic to identify stragglers. Since the progress rate can be expressed as D/S, where S is the input data size and D is the input processing speed, we use P = (1 + 1/S) × D to measure task performance. We consider straggler tasks as outliers compared to regular tasks. For every measurement interval, we calculate the value of P for all tasks and perform a k-means clustering algorithm on the P values. The clustering algorithm divides all tasks into two clusters. The tasks in the cluster with smaller P values are considered significantly different from the rest of the tasks.
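To make the heuristic concrete, the composite score and the two-cluster split can be sketched as below. This is an illustrative Python sketch, not the authors' code: the task table layout and the simple one-dimensional 2-means loop are our own assumptions.

```python
# Illustrative sketch of the straggler heuristic in Section 4.2:
# score each task with P = (1 + 1/S) * D, then split the scores into
# two clusters with 1-D 2-means; the low-P cluster is the stragglers.
def straggler_ids(tasks):
    """tasks: dict task_id -> (S, D), input size S and processing speed D."""
    scores = {tid: (1.0 + 1.0 / s) * d for tid, (s, d) in tasks.items()}
    lo, hi = min(scores.values()), max(scores.values())
    if lo == hi:
        return set()                     # all tasks perform alike
    c_lo, c_hi = lo, hi                  # centroids start at the extremes
    for _ in range(100):                 # Lloyd iterations on the 1-D scores
        low = {t for t, p in scores.items() if abs(p - c_lo) <= abs(p - c_hi)}
        high = set(scores) - low
        new_lo = sum(scores[t] for t in low) / len(low)
        new_hi = sum(scores[t] for t in high) / len(high) if high else c_hi
        if (new_lo, new_hi) == (c_lo, c_hi):
            break
        c_lo, c_hi = new_lo, new_hi
    return low                           # tasks in the smaller-P cluster
```

With three tasks of equal input size but one processing data an order of magnitude slower, only the slow task falls into the low-P cluster.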
4.3 Determining Performance Bottlenecks
Having stragglers identified by the proposed heuristic, FlexSlot then determines how to accelerate the execution of stragglers. The major causes of stragglers are low disk I/O performance or insufficient CPU resources.

Disk I/O Bottleneck Due to Data Skew. Previous studies show that the number of input records can largely affect the task completion time even with the same amount of input data [36], [37], [41]. The expensive record is a major type of data skew in map tasks. A map task processes a collection of key-value pairs, each pair as one record. Due to data skew, some records may require more resources to process than others, leading to longer processing time. Such expensive records can simply contain large data that fills the output buffer quickly. When the output buffer is full, intermediate results need to be written to disk. Thus, expensive records cause more frequent disk I/O operations.

The inverted-index benchmark contains non-uniform records and incurs skewed task processing time. Fig. 5 shows the breakdown of task execution time for all tasks. Since expensive records incur excessive I/O operations, we are interested in how much I/O time (e.g., I/O wait) contributes to the total task completion time. Tasks are sorted in ascending order of their completion time, with tasks on the right tail considered as stragglers. We find that I/O wait contributed most to the prolonged completion time in stragglers. We observed as much as 8 times more I/O wait time than that of regular tasks. Thus, a large portion of time spent waiting for I/O is a good indicator of stragglers with an I/O bottleneck. A large slot memory size, e.g., a large output buffer, can effectively reduce I/O operations.

CPU Starvation Due to Inaccurate Demand Estimation. As shown in Fig. 4, demand-based resource allocation incurs severe imbalance in task distribution, with some nodes heavily loaded and some nodes hosting only a few stragglers. The culprit is that nodes running faster tasks received more resources from the cloud resource management, which creates a feedback loop in which nodes with more resources run tasks even faster. Stragglers usually incur more I/O operations and appear to be less busy in terms of CPU usage. Thus, such tasks may become victims of CPU starvation under demand-based resource allocation.

Fig. 6 shows the breakdown of task runtime of the wordcount benchmark. It plots each task's actual cputime, I/O wait and steal time, where steal time is the time that a task's node is ready to run but fails to get scheduled by the hypervisor. In the figure, we can see that steal time contributed most to the straggler runtime. If given sufficient CPU, these stragglers can be effectively accelerated. Thus, a large portion of time being ready but not running is a good indicator of insufficient CPU resources. Next, we show that purposely coupling stragglers with faster tasks exposes the demands of stragglers to the cluster-level resource manager.
4.4 Resizing Slot Memory
Increasing the memory size of a slot can reduce the number of I/O operations and effectively expedite stragglers caused by data skew. However, the memory that can be used by a slot is limited by the heap size of its JVM and the output buffer size of the task running on it. Although the JVM heap size can be dynamically changed through JVM memory ballooning, the output buffer size of a task cannot be altered without restarting the task. Thus, it is difficult to increase a slot's memory size during task execution. We rely on the straggler identification to determine stragglers in a timely manner. As such, we can afford to kill stragglers and restart them on larger slots.

Another question is how to efficiently determine the slot size for straggler tasks. We propose an algorithm that is automatically invoked on all straggler tasks at every heartbeat. It determines the slot memory size for a straggler task. The pseudo-code for this algorithm is shown in Algorithm 1. The goal of this algorithm is to increase the slot memory size for straggler tasks so that the performance of these tasks becomes close to that of regular tasks. For some straggler tasks, it may require multiple iterations to reach the desired slot memory size.

Algorithm 1. Flexible Slot Size
1: Variables: Heartbeat interval T; Data split size S;
2:
3: The killcount of task k is initialized to 0.
4: /* Only apply to stragglers */
5: function ResizeSlotMem(k)
6:   get current input processing speed D
7:   if k.killcount == 0 or S/k.D_prev − S/D > T then
8:     kill k, free slot s
9:     s.size = s.size × (1 + α)
10:    launch k on slot s
11:    k.killcount = k.killcount + 1
12:  end if
13:  k.D_prev = D
14: end function

Once a straggler is identified, the function ResizeSlotMem is called to determine the appropriate memory size of the task. If the task has not gone through a memory resizing (i.e., first-time killing and killcount = 0), we kill it immediately and increase the memory size of the slot at a step of α (empirically set to 0.2). Otherwise, we test if an additional round of killing and memory resizing is needed.

Suppose a task k's input data size is S. For each round (at heartbeat intervals), we keep increasing the memory size of the slot until the resulting decrease in the input processing time (i.e., S/k.D_prev − S/D) no longer outweighs the wasted processing time due to task killing (i.e., the last heartbeat interval T). Then, the slot resizing process is stopped.
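One heartbeat of Algorithm 1 can be sketched as below. This is an illustrative sketch, not FlexSlot's code: the task is modeled as a plain dictionary, and killing and relaunching the task is reduced to growing the recorded slot size.

```python
ALPHA = 0.2  # memory growth step alpha, empirically set to 0.2 in Section 4.4

def maybe_resize(task, T):
    """Decide whether an identified straggler should be killed and restarted
    on a larger slot. `task` carries input size S, current speed D, speed at
    the previous heartbeat D_prev, the slot size, and a kill counter."""
    S, D = task["S"], task["D"]
    saved = S / task["D_prev"] - S / D    # time saved by the last resize
    if task["killcount"] == 0 or saved > T:
        task["slot_size"] *= 1 + ALPHA    # kill task, grow its slot, relaunch
        task["killcount"] += 1
        resized = True
    else:
        resized = False                   # saving no longer outweighs a restart
    task["D_prev"] = D
    return resized
```

A first call always resizes; later calls keep resizing only while the processing time saved per round exceeds the heartbeat interval T wasted by the kill.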
4.5 Adjusting Slot Number
The key rationale behind adjusting the slot number on Hadoop nodes is to couple stragglers with fast tasks and expose stragglers' demand to the cloud management. Once stragglers are identified and their bottleneck is determined to be insufficient CPU resources, FlexSlot tries to move slots from nodes running fast tasks to nodes running stragglers. We design a greedy algorithm for automating the slot adjustment. The pseudo-code is shown in Algorithm 2.

Algorithm 2. Flexible Slot Number
1: Variables: list of nodes L; list of steal times ST on these nodes
2:
3: while true do
4:   calculate the average steal time μ
5:   calculate the standard deviation of steal times σ
6:   if σ ≤ θ then
7:     break
8:   end if
9:   for st_i in ST do
10:    Δst_i = st_i − μ
11:  end for
12:  sort list ΔST in ascending order
13:  find node l_max that has the maximum Δst
14:  find node l_min that has the minimum Δst
15:  l_min.removeMaxMapSlots(1)
16:  l_max.addMaxMapSlots(1)
17: end while

The objective is to minimize the steal time (defined in Section 4.3) on the Hadoop cluster. To reduce the steal time on straggler nodes, which is caused by either CPU starvation or interference, we increase the slot number on these nodes in the hope that the additional slots will bring faster tasks and finally increase the nodes' CPU allocation. We follow a simple heuristic of moving slots from the node with the smallest steal time to the node with the largest. We ensure that the total number of task slots in the cluster remains the same. The algorithm keeps calculating the average node steal time μ and its standard deviation σ. When σ is greater than the threshold θ, the algorithm decides that the imbalance of CPU resources needs to be adjusted. Changing θ allows us to trade off between the resource imbalance and the converging speed of the algorithm. A large θ makes the algorithm converge faster and more tolerant to resource imbalance. In practice, the ideal value of θ is affected by the CPU over-subscription ratio of the cluster. A high CPU over-subscription ratio introduces more interference between applications [2]. In our experiment, we empirically set θ to 0.2μ.

Changing the slot number on nodes not only affects cloud resource allocation but also influences Hadoop task scheduling, leading to a better coordination between the two. Without slot adjustment, each Hadoop node is configured with the same number of slots but is allocated imbalanced resources. Dynamically changing the slot number on each node directs both tasks and resources to the same node. For example, adding slots to a node brings more resources and tasks. The performance interference from co-running jobs is also mitigated, as the objective of the algorithm is to minimize steal time.

One concern with this algorithm is that a newly added slot can be assigned a straggler task, aggravating the skew and worsening the starvation on straggler nodes. However, as the majority of tasks are regular tasks, the new slot is likely to be assigned a regular task. Currently, all nodes are monitored for the automated slot memory resizing. If we find that the newly scheduled task is a straggler task, we kill it and reassign it to another node that has no stragglers while trying to preserve data locality.
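The greedy loop of Algorithm 2 can be sketched as below, assuming steal times are given in seconds per node. The in-loop averaging of steal times is only a stand-in for the fresh measurements FlexSlot would read at the next heartbeat; it is not part of the paper's algorithm.

```python
THETA_FACTOR = 0.2  # threshold theta = 0.2 * mean steal time (Section 4.5)

def rebalance_slots(nodes):
    """Move one map slot per round from the node with the smallest steal time
    to the node with the largest, until the standard deviation of steal times
    drops below theta. `nodes`: dict name -> {"steal": sec, "slots": count}."""
    moves = []
    while True:
        st = {n: v["steal"] for n, v in nodes.items()}
        mu = sum(st.values()) / len(st)
        sigma = (sum((x - mu) ** 2 for x in st.values()) / len(st)) ** 0.5
        if sigma <= THETA_FACTOR * mu:
            break
        l_max = max(st, key=st.get)       # most starved node gains a slot
        l_min = min(st, key=st.get)       # least starved node gives one up
        nodes[l_min]["slots"] -= 1
        nodes[l_max]["slots"] += 1
        moves.append((l_min, l_max))
        # Stand-in for re-measurement: assume the move narrows the gap.
        delta = (st[l_max] - st[l_min]) / 2
        nodes[l_max]["steal"] -= delta
        nodes[l_min]["steal"] += delta
    return moves
```

The total slot count is conserved by construction: every removeMaxMapSlots on one node is paired with an addMaxMapSlots on another.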
4.6 Adaptive Speculative Execution
FlexSlot employs a set of strategies to improve the effectiveness of speculative execution in dealing with performance heterogeneity. First, speculative tasks are scheduled on the task slots that are dynamically added to a slave node. This allows a speculative task to run in a timely manner during the job execution. Second, FlexSlot guarantees that speculative tasks and the corresponding straggler tasks do not share the same node, which reduces resource contention and avoids potentially faulty hardware. Third, a straggler task will be terminated as soon as the speculative task is found to be faster, which reduces the overhead of speculative execution.

In FlexSlot, each speculative task is executed with a customized configuration that matches the characteristics of the straggler task. This is to ensure that the speculative task runs faster than the straggler task. Instead of simply identifying a straggler task based on the progress score or the remaining time, FlexSlot utilizes a straggler identification method that considers both the input data size and the data processing rate (as discussed in Section 4.2) to find the task with the most resource needs for speculative execution. This helps FlexSlot effectively identify stragglers due to different causes, such as data skew and the lack of resources.

4.6.1 Speculative Execution for Slot Memory Resizing
By default, FlexSlot increases the memory allocation of a straggler task by killing and restarting the task with a larger memory size. FlexSlot may need to kill a task multiple times before reaching the optimal slot memory size. Although FlexSlot tries to minimize the overhead of killing the task, it is still possible that the overall performance gain is not significant enough to outweigh the overhead of multiple rounds of task killing. To this end, FlexSlot leverages speculative execution to perform memory resizing in a more efficient way. Instead of sequentially repeating the killing-restarting loop, FlexSlot spawns multiple simultaneous speculative tasks with different memory sizes. Once the speculative task with the optimal memory size is found, FlexSlot kills the remaining speculative tasks and the original straggler. Thus, the speculative tasks only temporarily impact the overall system performance. We determine the optimal memory size as the one that leads to the highest processing-speed-to-memory ratio, a metric measuring the efficiency of data processing with respect to the memory resources allocated.

Algorithm 3 shows the speculative execution for resizing slot memory. For each straggler task k, FlexSlot obtains the list of nodes on which replicas of the straggler's input data block are stored. FlexSlot starts a speculative task for k on each of such nodes. Note that no speculative task is launched on the node where the straggler runs. FlexSlot increases the slot memory size of the speculative tasks at a step of α. For example, if the slot memory size of k is s, then the slot memory sizes of the first and the second speculative tasks will be s(1 + α) and s(1 + α)², respectively. FlexSlot calculates the processing-speed-to-memory ratio γ_i for each speculative task k_i. It iterates over all the speculative tasks and finds the one with the largest γ. All other speculative tasks and the straggler are killed once the optimal memory size is found.

Algorithm 3. Speculative Execution for Slot Memory Resizing
1: Variables: List of speculative tasks P; Data split size S;
2:
3: /* Only apply to stragglers */
4: function SpeculativeResizeSlotMem(k)
5:   get list of nodes with the input data L
6:   get current slot memory size s_0
7:   s = s_0
8:   for node_i in L do
9:     if k is not on node_i then
10:      s = s × (1 + α)
11:      start k_i on node_i with s
12:      P.add(k_i)
13:      calculate processing speed D_i for k_i
14:      γ_i = D_i / s
15:    end if
16:  end for
17:  find k_i with the largest γ_i in P
18:  kill k and the other speculative tasks in P
19: end function

Using speculative execution allows FlexSlot to explore multiple slot memory sizes at the same time. It significantly reduces the time to search for the optimal slot memory size. Moreover, starting the speculative tasks on nodes other than the one where the straggler task is running has a side effect: the speculative task has a higher chance to be scheduled on a slave node with more CPU resources, since the task will have a higher processing-speed-to-memory ratio.
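The parallel search of Algorithm 3 can be sketched as below. `measure_speed` is a hypothetical stand-in for launching a speculative copy on a node with a given slot size and sampling its processing speed; it is not a FlexSlot API, and the dictionary layout is our own assumption.

```python
ALPHA = 0.2  # same memory growth step as the sequential resizing

def speculative_resize(straggler, replica_nodes, measure_speed):
    """Launch one speculative copy per replica node, each with a progressively
    larger slot, and keep the copy with the best speed-to-memory ratio gamma.
    Illustrative sketch, not the authors' implementation."""
    s = straggler["slot_size"]
    candidates = []                       # (gamma, node, mem) per copy
    for node in replica_nodes:
        if node == straggler["node"]:
            continue                      # never co-locate with the straggler
        s *= 1 + ALPHA                    # copies get s(1+a), s(1+a)^2, ...
        d = measure_speed(node, s)
        candidates.append((d / s, node, s))
    best = max(candidates)                # largest gamma wins
    return {"node": best[1], "slot_size": best[2]}  # kill k and the rest
```

With two replica nodes besides the straggler's own, the copies run with slot sizes s(1 + α) and s(1 + α)², and the winner is the copy whose measured speed best justifies its memory.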
4.6.2 Load Balance for Speculative Execution
FlexSlot uses a dedicated task slot on each slave for speculative tasks. However, adding additional task slots can exacerbate the unbalanced resource allocation between slave nodes and may create more stragglers. As aforementioned, FlexSlot tries to mitigate the resource imbalance by adjusting the number of task slots on slave nodes. However, it only changes one slot at each tuning cycle. Therefore, a newly added speculative slot would require additional tuning cycles. To address this issue, FlexSlot creates a speculative slot by replacing a regular task slot on the node. This maintains the same total number of slots on the slave nodes, and allows the algorithm for adjusting the number of slots to work effectively without change.
5 IMPLEMENTATION
Fig. 7 shows the architecture of FlexSlot. It consists of two main components: SlotAgent and SlotManager. The SlotAgent is implemented in the TaskTracker. It provides the slot management interface that allows online changes of both the map slot memory size and the number of slots. It also collects performance metrics on a slave node. The SlotManager is implemented in the JobTracker. It identifies straggler map tasks based on the performance metrics reported by the SlotAgent, and determines the slot memory size for each straggler task and the number of slots on each slave node. These two components are organized in a client/server model and communicate via RPC (remote procedure call). To support the functionalities of these two components, we also modified the implementations of TaskTracker, TaskLauncher, and TaskScheduler.

Fig. 7. The architecture of FlexSlot.

In Hadoop, the mapred-site.xml file contains all the configurations of the MapReduce sub-system. All cluster parameters are set according to this file during the Hadoop initialization and cannot be changed online. Therefore, we modified TaskTracker and TaskLauncher to support online parameter updates. Table 2 lists the functions that are related to the slot management interface.

TABLE 2
The Slot Management Interface

Function                                          | Class        | Functionality
setTaskConfiguration(TaskID tid, JobConf newConf) | TaskTracker  | Update the configuration table for a specific task
setSlotMemory(TaskID tid, int memSize)            | TaskTracker  | Set the memory size of the slot of a specific task
addMaxSlots(int numSlots)                         | TaskLauncher | Add slots to the task launcher
removeMaxSlots(int numSlots)                      | TaskLauncher | Remove slots from the task launcher
addMaxMapSlots(int numSlots)                      | TaskTracker  | Add map slots to the task tracker
removeMaxMapSlots(int numSlots)                   | TaskTracker  | Remove map slots from the task tracker

Each task slot in Hadoop is a JVM that is created when a task is scheduled. It is managed by the task tracker on each slave node and dedicated to the execution of one task. The memory size of a task slot is controlled by two parameters: io.sort.mb, which controls the size of the buffer for map output, and mapred.child.java.opts, which controls the maximum heap size of the task slot JVM. These parameters need to be updated before a JVM is launched.

FlexSlot maintains a hash table for tasks and their corresponding task slot configurations. These configurations can be changed with the setTaskConfiguration() function. We have implemented the setSlotMemory() function for updating the slot memory allocation of a specific task. We modified the startNewTask() function to apply the updated configuration before launching a new task. By updating the configuration table, we enable changing the configuration of a Hadoop cluster and its tasks online.

For each slave node, Hadoop starts one task tracker. The task tracker reads the number of task slots for the node from the configuration file mapred-site.xml. It then initializes the maxMapSlots and maxReduceSlots variables in the task tracker. The task tracker periodically checks the number of running tasks and the maximum slot number. At each heartbeat, the task tracker makes a new task request if there are free slots available. We added two functions to modify the map slot number: addMaxMapSlots() and removeMaxMapSlots(). They change the variable maxMapSlots to increase or decrease the number of map slots. The task launcher also keeps a local copy of maxMapSlots. We have also modified the task launcher so that its max slot variables can be correctly updated via the addMaxSlots() and removeMaxSlots() functions.

Through these interfaces, we can change the slot memory size and the number of slots without restarting the task tracker. These changes are compatible with the slot-based task scheduling in Hadoop. Any scheduler can use these interfaces to perform slot management during task scheduling.

FlexSlot measures the cputime, the I/O wait time, and the steal time by reading the corresponding metrics from the proc file system of a VM. Typical hypervisors such as Xen and VMware vSphere keep track of these metrics. For a guest VM running a Linux operating system, these metrics are recorded in /proc/stat.

6 EVALUATION

6.1 Testbed Setup
We performed evaluations of FlexSlot on our university cloud. It consists of 8 HP BL460c G6 blade servers. Each server is equipped with two-way Intel quad-core Xeon E5530 CPUs and 64 GB memory. The servers are connected with 10 Gbps Ethernet. VMware vSphere 5.1 is used to provide the server virtualization.

We used a 32-node virtual Hadoop cluster to evaluate FlexSlot. Each node was initially configured with 4 VCPUs and 4 GB memory. Depending on the experiment, the resource allocation to individual Hadoop nodes can be fixed or managed by the demand-based resource allocation in VMware DRS. We deployed Hadoop stable release version 1.1.1 and each VM ran Ubuntu Linux with kernel 2.6.24. Two nodes were configured as the JobTracker and NameNode, respectively. The remaining 30 nodes ran as slave nodes for HDFS storage and MapReduce task execution. We set the HDFS block size to its default value of 64 MB. Each slave node was initially configured with four map slots and two reduce slots, and the parameters io.sort.mb and mapred.child.java.opts were set to 100 and 200 MB, respectively. These Hadoop task settings were dynamically adjusted by FlexSlot during task execution.
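For illustration, a minimal parser for the aggregate cpu line of a guest's /proc/stat is shown below; the field order follows the Linux proc(5) man page, and the sample line is made up.

```python
def cpu_times(proc_stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into named counters
    (in USER_HZ ticks): user nice system idle iowait irq softirq steal."""
    fields = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":   # per-CPU lines are cpu0, cpu1, ...
            return dict(zip(fields, map(int, parts[1:1 + len(fields)])))
    raise ValueError("no aggregate cpu line found")

# A made-up sample of a guest VM's /proc/stat aggregate line:
SAMPLE = "cpu  10132153 290696 3084719 46828483 16683 0 25195 175628 0 0\n"
```

The iowait and steal counters are the two signals FlexSlot samples per heartbeat; differencing successive samples yields the per-interval I/O wait and steal times.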
For comparison, we use two recently proposed approaches. We implemented SkewTune [26], an approach for mitigating data skew. SkewTune parallelizes a straggler task by repartitioning and redistributing its input data. It mitigates the data skew and improves the job completion time. It assumes that all slave nodes have the same processing capacity, and evenly distributes the unprocessed data across all available nodes to mitigate the data skew. However, in a virtualized environment, the resource allocations of different slave nodes can differ, especially under demand-based resource allocation, so evenly distributing the unprocessed data may not be the best option. The existence of hotspot nodes also incurs unnecessary data movements.

We also implemented MCP [10], a recently proposed strategy for speculative execution. MCP selects candidate tasks for speculative execution using an exponentially weighted moving average to predict the task processing speed and remaining time. It uses this information to estimate the execution of a speculative task on different nodes and chooses the node that guarantees the earlier completion of the speculative task. It also tries to enforce data locality when choosing the node for speculative execution. However, MCP does not consider the performance bottleneck of straggler tasks, which limits its effectiveness in mitigating data skew. Simply restarting an I/O-intensive straggler task on a different node is unlikely to reduce the I/O wait time. More importantly, starting speculative tasks on a fast node without awareness of the imbalance in the resource allocation of VMs can aggravate the performance disparity and create more straggler tasks.

6.2 Workloads
We used the PUMA benchmark suite [5] for evaluation. It contains various MapReduce benchmarks and real-world test inputs. Table 3 shows the benchmarks and their configurations used in our experiments.

TABLE 3
PUMA Benchmark Details

Benchmark         | Input Size (GB) | Input Data
tera-sort         | 150             | TeraGen
inverted-index    | 150             | Wikipedia
term-vector       | 150             | Wikipedia
wordcount         | 150             | Wikipedia
grep              | 150             | Wikipedia
k-means           | 30              | Netflix data, k = 6
histogram-movies  | 100             | Netflix data
histogram-ratings | 100             | Netflix data

These benchmarks are divided into three categories based on the content of their input data. The tera-sort benchmark uses a data set that is generated by TeraGen. The data has a relatively uniform distribution due to the randomized generation process. The inverted-index, term-vector, wordcount and grep benchmarks use data extracted from Wikipedia. The data contains records of different sizes, some of which are significantly larger than the average. This provides a good example of data skew. The k-means, histogram-movies and histogram-ratings benchmarks use data from Netflix. They are good examples of expensive records: the content of the records can significantly affect the input processing time even with the same record size.

As MapReduce is widely used in different areas, the PUMA benchmark suite does not cover some new but popular applications such as graph processing and machine learning. Thus, we also include the pagerank and bayes benchmark jobs from the Intel HiBench benchmark suite [3] to provide a more comprehensive collection of benchmark jobs.

Fig. 8. The CDF of task completion time.

6.3 Mitigating Data Skew
In this section, we study the effectiveness of FlexSlot in mitigating data skew. If tasks take about the same time to finish even in the presence of skew, we consider that the skew has been mitigated. We measured the distribution of the task completion time of different benchmarks. We used stock Hadoop with demand-based resource allocation in the cloud as the baseline. The total number of task slots in FlexSlot is set to be the same as that in stock Hadoop and SkewTune for a fair comparison.

Fig. 8 shows the distribution of the task completion time of the inverted-index benchmark under the three different approaches. The skewness of the task execution time under stock Hadoop, SkewTune, and FlexSlot is 4.98, 1.68 and 0.96, respectively. The result shows that FlexSlot is the most effective of the three approaches in mitigating data skew. Compared to SkewTune, FlexSlot significantly reduced the task completion time of straggler tasks. FlexSlot and SkewTune have similar performance on regular tasks, but FlexSlot brings more improvement to the performance of straggler tasks than SkewTune does.

FlexSlot outperformed SkewTune for two reasons. First, FlexSlot continuously detects straggler tasks and reduces their execution time with automated slot memory resizing. In contrast, SkewTune only mitigates a straggler task when there are free slots to parallelize it. Second, SkewTune lacks the coordination between the Hadoop task scheduler and the cloud infrastructure, and does not eliminate the hotspot nodes. SkewTune tends to use the hotspot nodes to parallelize the execution of a straggler task because these nodes hold more resources. This introduces additional data movement for straggler tasks that could be finished quickly with simply more resources. FlexSlot does not have these problems, mainly because it moves slots, instead of tasks, across slave nodes. It preserves data locality and minimizes the data movement for mitigating data skew.
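The skewness values reported above for the completion-time distributions can be reproduced with the third standardized moment. The paper does not state which estimator it uses, so this sketch assumes the plain population skewness g1 = m3 / m2^1.5.

```python
def skewness(xs):
    """Population skewness of a list of task completion times: zero for a
    symmetric distribution, large and positive for a long right tail of
    stragglers."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # variance
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5
```

A drop from 4.98 (stock Hadoop) toward 0.96 (FlexSlot) under this kind of measure indicates that the right tail of straggler completion times has largely been flattened.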
Fig. 9. Breakdown of task runtime of inverted-index.

The results in Fig. 8 also show that FlexSlot has longer completion times for fast tasks than stock Hadoop does. This is due to the fact that running stock Hadoop with demand-based resource allocation created "hot spot" nodes that boost these fast tasks. FlexSlot prevented "hot spot" nodes from receiving disproportionately more resources. Note that making fast tasks run even faster does not help improve the overall job completion time, which is bottlenecked primarily by stragglers.

To further study how skew is mitigated, we show a detailed breakdown of task completion time. For comparison, we also ran the benchmarks in Hadoop and SkewTune on static homogeneous virtual clusters (denoted as Homo). Fig. 9 shows the breakdown of the average task execution time of inverted-index on different clusters. FlexSlot reduced the portion of steal time by 60.5 and 49.1 percent when compared to stock Hadoop and SkewTune with demand-based resource allocation, respectively. The huge steal time of stock Hadoop and SkewTune with demand-based resource allocation is a sign of imbalanced CPU resource allocation between fast tasks and straggler tasks. The reduction of steal time suggests that FlexSlot was effective in preventing CPU starvation on stragglers. Moreover, FlexSlot reduced the I/O wait time compared to stock Hadoop and SkewTune on homogeneous clusters. FlexSlot achieved 39.7 and 26.7 percent lower I/O wait time than stock Hadoop and SkewTune, respectively. This suggests that FlexSlot reduces the I/O operations for straggler tasks with larger slot memory sizes.

6.4 Reducing Job Completion Time
We have shown that FlexSlot is effective in mitigating skew. In this section, we study how the mitigation helps improve the overall job completion time. Similarly, we use the job completion time in stock Hadoop as the baseline and compare the normalized job completion times of FlexSlot and SkewTune. Fig. 10 shows the normalized job completion time of all benchmarks due to these three approaches.

Fig. 10. The normalized job completion time.

The results show that for benchmarks with expensive records, e.g., inverted-index, term-vector, wordcount and grep, FlexSlot outperformed stock Hadoop by 35.1, 29.3, 26.7 and 27.4 percent, respectively. FlexSlot also outperformed SkewTune by 21.6, 17.1, 16.9 and 17.1 percent in these benchmarks.

Benchmarks such as k-means, histogram-movies and histogram-ratings use data from Netflix. The input data is relatively uniform in record size. In the experiments with these benchmarks, FlexSlot outperformed stock Hadoop by 25, 13 and 12 percent, respectively. However, FlexSlot has less performance improvement on histogram-movies and histogram-ratings than on k-means, because those benchmarks have small memory demands due to the small volume of their intermediate results. The default configuration of the slot memory size already provides a sufficient output buffer.

The k-means benchmark has a large volume of intermediate data. It requires a large output buffer and more memory to reduce I/O operations. It also contains computation-intensive tasks that require a lot of CPU resources. FlexSlot achieved a much shorter job completion time than stock Hadoop because FlexSlot was able to improve both the slot memory size and the CPU resource allocation for k-means.

FlexSlot achieved 17.5, 1.2 and 3.3 percent shorter job completion times than SkewTune in the k-means, histogram-movies and histogram-ratings benchmarks, respectively. FlexSlot only had a marginal performance improvement over SkewTune in the histogram-movies and histogram-ratings benchmarks because their data skew is not severe and their CPU requirement is also low. But for jobs with significant data skew and high CPU consumption, FlexSlot clearly showed a significant advantage over SkewTune.

6.5 Mitigating Performance Interference
Note that adjusting the slot size and number can also help resolve the interference between co-running jobs. To create interference, we consolidated two clusters with the same configuration (see Section 6.1 for details) onto the shared physical infrastructure. We submitted the same jobs to these two clusters and calculated their average job completion times. We used the job completion time in stock Hadoop as the baseline, and normalized FlexSlot's and SkewTune's performance against it.

Fig. 11. The normalized job completion time with interference from co-running jobs.
Fig. 12. Job completion time with an unlimited number of slots.

interference from co-running jobs. The results show that


FlexSlot outperformed stock Hadoop and SkewTune by as
much as 43:8 and 33:2 percent, respectively. FlexSlot
achieved 34:2  43:8 percent shorter job completion time
than stock Hadoop in the inverted-index, term-vector, word-
count and grep benchmarks. FlexSlot also outperformed
SkewTune by 24:3  33:2 percent in these benchmarks. Due
to the performance among co-running jobs, the difference of
job completion time between FlexSlot, SkewTune, and stock
Hadoop is larger than the results in Fig. 10.
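The normalization used throughout these comparisons can be made concrete with a small sketch. The completion times below are illustrative placeholders, not measurements from our experiments:

```python
def normalize(times, baseline):
    """Normalize each system's average job completion time against a baseline time."""
    return {name: t / baseline for name, t in times.items()}

def improvement_pct(t_sys, t_base):
    """Percent reduction in job completion time relative to the baseline."""
    return round((1.0 - t_sys / t_base) * 100.0, 1)

# Hypothetical average completion times in seconds (illustrative only).
times = {"stock": 400.0, "skewtune": 330.0, "flexslot": 260.0}

norm = normalize(times, times["stock"])
print(norm["flexslot"])                                    # 0.65
print(improvement_pct(times["flexslot"], times["stock"]))  # 35.0
```

A normalized value below 1.0 means the system finished faster than the stock Hadoop baseline; the percentages reported in this section follow the same arithmetic.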
For benchmarks with less data skew, histogram-movies and histogram-ratings, FlexSlot outperformed stock Hadoop by 22 and 23 percent respectively, and outperformed SkewTune by 17.1 and 18 percent respectively. The difference in job completion time between SkewTune and stock Hadoop is around 5 percent, because there is not much data skew to mitigate. The results clearly show that FlexSlot can further improve the job completion time by mitigating the performance interference from co-running jobs in a cloud environment.

6.6 Improving Resource Utilization
By default, FlexSlot keeps the total number of task slots unchanged in a Hadoop cluster. Thus, any addition of task slots on one node is coupled with the removal of slots on another node. As discussed in previous experiments, FlexSlot's slot movement effectively mitigates skew and improves job performance. In this section, we extend FlexSlot to handle an unlimited number of slots. That is, FlexSlot adds or removes slots on a node only based on the performance of the tasks running on that node, without the constraint of maintaining the total number of slots in the cluster. Although it is a common practice to match the number of map slots with the number of CPUs on a node, this is not necessarily the optimal configuration. With this experiment, we study whether an even more flexible slot management could further improve cluster resource utilization and job performance.
We compare the two FlexSlot variations (i.e., with limited slots and with unlimited slots) against stock Hadoop. Fig. 12 shows the normalized job completion time of all benchmarks due to the three approaches. FlexSlot with an unlimited number of slots achieved up to 46 percent shorter job completion time than stock Hadoop. It also outperformed FlexSlot using a limited number of slots by 35 percent. The improvement varies depending on the job characteristics. For most jobs, using an unlimited number of slots brings approximately 17 percent shorter job completion time than using a limited number of slots. But for CPU-intensive jobs like k-means, removing the limitation on slots makes no major difference because the job performance is bottlenecked by the number of physical CPUs. Adding more slots does not allow more tasks to run concurrently. In contrast, jobs such as histogram-movies and histogram-ratings have a significant portion of I/O time. With a limited number of slots, the resources of the virtual cluster, especially the CPU resource, are not fully utilized. Fig. 12 shows that FlexSlot with unlimited slots achieved on average 35 percent shorter job completion time compared to FlexSlot with limited slots.

Fig. 13. Resource consumption of FlexSlot.

Next, we compare the resource utilizations of the different FlexSlot variations. Figs. 13a and 13b show the CPU and memory usage of FlexSlot with and without a limit on the total number of slots. The results show that FlexSlot without a slot limit resulted in 10 and 11 percent higher CPU and memory utilization, respectively. As discussed above, these resources were used to accelerate job execution. Note that due to the different job characteristics of the k-means, histogram-movies, and histogram-ratings benchmarks, the changes in resource utilization of these jobs were different. For the k-means benchmark, the resource utilization was high even with a limited number of total slots. For the histogram-movies and histogram-ratings benchmarks, FlexSlot with a slot limit left roughly 15 and 26 percent of the CPU and memory resources unused, respectively. Removing the limit improved the resource utilization significantly.
These results suggest that FlexSlot with unlimited slots can be a viable approach for Hadoop in the cloud. It improves the performance by maximizing the resource utilization. It does not affect the performance of jobs that have high resource demand because slots are only added when there is available resource.

Fig. 14. Job completion time with different speculative execution mechanisms.

Fig. 15. Performance impact of the speculative execution on FlexSlot.

6.7 Speculative Execution
We have shown the effectiveness of the flexible slot management of FlexSlot in mitigating data skew and improving resource utilization. In this section, we study the performance benefit of the adaptive speculative execution in FlexSlot. We compare the adaptive speculative execution in FlexSlot with the speculative execution strategies in Hadoop and in MCP.
Fig. 14 shows the normalized job completion time of the three speculative execution mechanisms. The results show that FlexSlot outperformed stock Hadoop and MCP by as much as 39.1 and 23.8 percent, respectively. FlexSlot achieved 27.4 to 39.1 percent shorter job completion time than stock Hadoop in the inverted-index, term-vector, wordcount and grep benchmarks. The performance improvement is mainly because FlexSlot performs speculative execution throughout the entire lifespan of a job. FlexSlot also outperformed MCP by 12.1 to 23.8 percent in these benchmarks. Compared to MCP, FlexSlot is more accurate in identifying straggler tasks. Starting the speculative tasks with optimized configurations also significantly improves the performance of FlexSlot. For benchmarks with less data skew, i.e., histogram-movies and histogram-ratings, FlexSlot outperformed stock Hadoop by 11 and 12 percent, respectively. The performance of FlexSlot and MCP on these two benchmarks is similar.
To further study the performance impact of speculative execution, we compare FlexSlot with and without it. Fig. 15 shows the normalized job completion time of the two approaches. The results show that FlexSlot with speculative execution achieved 6.1 to 17.2 percent shorter job completion time than FlexSlot without it on these benchmarks. With speculative execution, FlexSlot is able to try multiple slot memory sizes for a straggler task at the same time, which reduces the overhead of killing the task multiple times. Moreover, running the speculative task on a different node ensures that the task runs on a fast node, which further improves its performance.

6.8 Performance Improvement on Iterative Jobs
Graph processing and machine learning are emerging use cases for MapReduce. These applications are usually implemented as iterative MapReduce jobs, which are sensitive to workload imbalance in iterations. In this section, we study the effectiveness of FlexSlot in mitigating data skew for iterative MapReduce jobs. For this experiment, we chose the pagerank and bayes benchmarks from the Intel HiBench benchmark suite [3]. We compare FlexSlot to stock Hadoop, SkewTune, and MCP.

Fig. 16. Job completion time of iterative MapReduce jobs.

Fig. 16 shows the normalized job completion time of all four approaches. The results show that FlexSlot outperformed stock Hadoop by 43.7 and 47.2 percent in the pagerank and bayes benchmarks, respectively. FlexSlot also outperformed SkewTune by 30.1 and 36.2 percent in these two benchmarks. Note that the tasks in each iteration benefit from the data skew mitigation, which significantly reduces the delay caused by load imbalance in iterations. FlexSlot is able to mitigate straggler tasks without waiting for a free slot, which makes it more effective than SkewTune in this case. Compared to MCP, FlexSlot achieved 34.1 and 40.4 percent shorter job completion time in pagerank and bayes. The improvement of FlexSlot over MCP is mainly due to the customized speculative task configuration and the early termination of straggler tasks, which guarantees that the speculative task will perform better than the straggler task. In contrast, MCP does not consider the impact of different configurations when starting the speculative task. Thus, it misses the chance to further improve task performance.
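The early-termination rule described above can be sketched as follows. This is a simplified model for illustration, not FlexSlot's actual implementation, and the progress rates are hypothetical:

```python
def resolve_speculation(straggler_rate, speculative_rate):
    """Decide which copy of a task to terminate early.

    Rates are fractions of the task's input processed per second
    (hypothetical values). Keeping only the faster copy ensures the
    surviving task performs no worse than the original straggler.
    """
    if speculative_rate > straggler_rate:
        return "kill straggler"    # the tuned speculative copy wins
    return "kill speculative"      # the original copy is still faster

print(resolve_speculation(0.002, 0.005))  # kill straggler
print(resolve_speculation(0.004, 0.003))  # kill speculative
```

Because the speculative copy is launched with a tuned slot configuration and the slower copy is terminated as soon as the comparison is decided, the job never pays for both copies running to completion.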

Fig. 17. Average number of task kills for different benchmarks.

Fig. 18. Average number of task kills per straggler due to step a.

overhead using the number of kills per straggler. The fewer the kills, the smaller the overhead. Fig. 17 shows the average number of kills per straggler in different benchmarks. The results show that most stragglers only required one kill to run as fast as other tasks.
The slot memory resizing algorithm increases a slot's memory size by step a for each task kill. The average number of task kills is affected by the value of a. Fig. 18 shows the average task kills due to different a values. For the inverted-index and k-means benchmarks, increasing a resulted in a significant reduction in the average number of task kills. These benchmarks are very sensitive to the memory size because they have large intermediate data. Using a larger a increases the amount of memory that is allocated to them in each task kill. The histogram-movies benchmark, which does not have large intermediate data, is less sensitive to the memory size. Allocating more memory to it during task killing does not have a great impact on its task execution time. A larger a value leads to more wasted memory. According to the observations shown in Fig. 18, we empirically set a to 0.2 in FlexSlot because benchmarks of different types benefit from it while not incurring significant memory waste.

7 CONCLUSION
Hadoop provides an open-source implementation of the MapReduce framework, but its design poses challenges to performance optimization due to data skew. Moving Hadoop into the cloud offers the possibility of mitigating data skew with dynamic resource allocation, but Hadoop lacks coordination between its task scheduler and the cloud management, which brings new challenges due to performance interference and demand-based resource allocation. In this paper, we propose and design FlexSlot, an effective yet simple extension to Hadoop's slot management that provides the flexibility to change the slot memory size and the number of slots on a slave node online. We also design and develop an adaptive speculative execution that allows speculative tasks to run with the optimal slot memory size, which further helps mitigate data skew. We have implemented FlexSlot in Hadoop and evaluated its effectiveness on a 32-node virtual Hadoop cluster with various workloads. Experimental results show that FlexSlot is able to reduce job completion time by as much as 47.2 percent compared to stock Hadoop. It also outperformed two recently proposed skew mitigation and speculative execution approaches by 36.2 and 40.4 percent, respectively. Our future work will be on extending the flexible slot management and the adaptive speculative execution approaches to the resource management in Apache Hadoop YARN.

ACKNOWLEDGMENTS
This research was supported in part by US National Science Foundation grants CNS-1422119, CNS-1320122, CNS-1217979, and NSF of China research grant 61328203. The authors are grateful to the associate editor and anonymous reviewers for their valuable suggestions. Xiaobo Zhou is the corresponding author.

REFERENCES
[1] Apache Hadoop Project. (2013). [Online]. Available: http://hadoop.apache.org
[2] Best practices for oversubscription of CPU, memory and storage in vSphere virtual environments. (2013). [Online]. Available: https://communities.vmware.com/servlet/JiveServlet/previewBody/21181-102-1-28328/vsphere-oversubscription-best-practices
[3] HiBench. (2015). [Online]. Available: https://github.com/intel-hadoop/HiBench
[4] OpenStack Wiki: Utilization based scheduler spec. (2012). [Online]. Available: https://wiki.openstack.org/wiki/UtilizationBasedSchedulingSpec
[5] PUMA: Purdue MapReduce benchmark suite. (2012). [Online]. Available: http://web.ics.purdue.edu/fahmad/benchmarks.htm
[6] Virtualizing Apache Hadoop. (2012). [Online]. Available: http://www.vmware.com/files/pdf/Benefits-of-Virtualizing-Hadoop.pdf
[7] F. Ahmad, S. T. Chakradhar, A. Raghunathan, and T. N. Vijaykumar, "Tarazu: Optimizing MapReduce on heterogeneous clusters," in Proc. 17th Int. Conf. Architectural Support Program. Lang. Operating Syst., 2012, pp. 61-74.
[8] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica, "Effective straggler mitigation: Attack of the clones," in Proc. 10th USENIX Conf. Netw. Syst. Des. Implementation, 2013, pp. 185-198.
[9] G. Ananthanarayanan, et al., "Reining in the outliers in MapReduce clusters using Mantri," in Proc. 9th USENIX Conf. Operating Syst. Des. Implementation, 2010, pp. 265-278.
[10] Q. Chen, C. Liu, and Z. Xiao, "Improving MapReduce performance using smart speculative execution strategy," IEEE Trans. Comput., vol. 63, no. 4, pp. 954-967, Apr. 2014.
[11] D. Cheng, J. Rao, Y. Guo, and X. Zhou, "Improving MapReduce performance in heterogeneous environments with adaptive task tuning," in Proc. 15th ACM/IFIP/USENIX Int. Middleware Conf., 2014, pp. 97-108.
[12] R. C. Chiang and H. H. Huang, "TRACON: Interference-aware scheduling for data-intensive applications in virtualized environments," in Proc. Int. Conf. High Perform. Comput., Netw. Storage Anal., 2011, pp. 1-12.

[13] E. Coppa and I. Finocchi, "On data skewness, stragglers, and MapReduce progress indicators," in Proc. 6th ACM Symp. Cloud Comput., 2015, pp. 139-152.
[14] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in Proc. 6th Conf. Symp. Operating Syst. Des. Implementation, 2004, Art. no. 10.
[15] R. Gandhi, D. Xie, and Y. C. Hu, "PIKACHU: How to rebalance load in optimizing MapReduce on heterogeneous clusters," in Proc. USENIX Annu. Tech. Conf., 2013, pp. 61-66.
[16] A. Gulati, A. Holler, M. Ji, G. Shanmuganathan, C. Waldspurger, and X. Zhu, "VMware distributed resource management: Design, implementation and lessons learned," VMware Tech. J., vol. 1, pp. 45-64, 2012.
[17] Y. Guo, J. Rao, C. Jiang, and X. Zhou, "FlexSlot: Moving Hadoop into the cloud with flexible slot management," in Proc. ACM/IEEE Int. Conf. High Perform. Comput. Netw. Storage Anal., 2014, pp. 959-969.
[18] Y. Guo, J. Rao, and X. Zhou, "iShuffle: Improving Hadoop performance with shuffle-on-write," in Proc. 10th Int. Conf. Autonomic Comput., 2013, pp. 107-117.
[19] S. Ibrahim, H. Jin, L. Lu, B. He, G. Antoniu, and S. Wu, "Maestro: Replica-aware map scheduling for MapReduce," in Proc. 12th IEEE/ACM Int. Symp. Cluster Cloud Grid Comput., 2012, pp. 435-442.
[20] S. Ibrahim, H. Jin, L. Lu, S. Wu, B. He, and L. Qi, "LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud," in Proc. IEEE 2nd Int. Conf. Cloud Comput. Technol. Sci., 2010, pp. 17-24.
[21] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," in Proc. 2nd ACM SIGOPS/EuroSys Eur. Conf. Comput. Syst., 2007, pp. 59-72.
[22] V. Jalaparti, H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, "Bridging the tenant-provider gap in cloud services," in Proc. 3rd ACM Symp. Cloud Comput., 2012, Art. no. 10.
[23] D. Jiang, B. C. Ooi, L. Shi, and S. Wu, "The performance of MapReduce: An in-depth study," Proc. VLDB Endowment, vol. 3, pp. 472-483, 2010.
[24] H. Kang and J. L. Wong, "To hardware prefetch or not to prefetch?: A virtualized environment study and core binding approach," in Proc. 18th Int. Conf. Architectural Support Program. Lang. Operating Syst., 2013, pp. 357-368.
[25] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "Skew-resistant parallel processing of feature-extracting scientific user-defined functions," in Proc. 1st ACM Symp. Cloud Comput., 2010, pp. 75-86.
[26] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "SkewTune: Mitigating skew in MapReduce applications," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2012, pp. 25-36.
[27] B. Li, E. Mazur, Y. Diao, A. McGregor, and P. Shenoy, "A platform for scalable one-pass analytics using MapReduce," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2011, pp. 985-996.
[28] R. Nathuji, A. Kansal, and A. Ghaffarkhah, "Q-Clouds: Managing performance interference effects for QoS-aware clouds," in Proc. 5th ACM Eur. Conf. Comput. Syst., 2010, pp. 237-250.
[29] X. Pu, L. Liu, Y. Mei, S. Sivathanu, Y. Koh, and C. Pu, "Understanding performance interference of I/O workload in virtualized cloud environments," in Proc. IEEE 3rd Int. Conf. Cloud Comput., 2010, pp. 51-58.
[30] S. Rao, R. Ramakrishnan, A. Silberstein, M. Ovsiannikov, and D. Reeves, "Sailfish: A framework for large scale data processing," in Proc. 3rd ACM Symp. Cloud Comput., 2012, Art. no. 4.
[31] K. Ren, Y. Kwon, M. Balazinska, and B. Howe, "Hadoop's adolescence: An analysis of Hadoop usage in scientific workloads," Proc. VLDB Endowment, vol. 6, pp. 853-864, 2013.
[32] G. Shanmuganathan, A. Gulati, and P. Varman, "Defragmenting the cloud using demand-based resource allocation," in Proc. ACM SIGMETRICS/Int. Conf. Meas. Modeling Comput. Syst., 2013, pp. 67-80.
[33] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, "CloudScale: Elastic resource scaling for multi-tenant cloud systems," in Proc. 2nd ACM Symp. Cloud Comput., 2011, Art. no. 5.
[34] J. Tan, et al., "DynMR: Dynamic MapReduce with reduce task interleaving and map task backfilling," in Proc. 9th Eur. Conf. Comput. Syst., 2014, Art. no. 2.
[35] V. K. Vavilapalli, et al., "Apache Hadoop YARN: Yet another resource negotiator," in Proc. 4th ACM Symp. Cloud Comput., 2013, Art. no. 5.
[36] A. Verma, L. Cherkasova, and R. H. Campbell, "ARIA: Automatic resource inference and allocation for MapReduce environments," in Proc. 8th ACM Int. Conf. Autonomic Comput., 2011, pp. 235-244.
[37] A. Verma, L. Cherkasova, and R. H. Campbell, "Resource provisioning framework for MapReduce jobs with performance goals," in Proc. ACM/IFIP/USENIX 12th Int. Middleware Conf., 2011, pp. 165-186.
[38] J. Wolf, et al., "FLEX: A slot allocation scheduling optimizer for MapReduce workloads," in Proc. 11th ACM/IFIP/USENIX Int. Middleware Conf., 2010, pp. 1-20.
[39] N. J. Yadwadkar, G. Ananthanarayanan, and R. Katz, "Wrangler: Predictable and faster jobs using fewer resources," in Proc. ACM Symp. Cloud Comput., 2014, pp. 1-14.
[40] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments," in Proc. 8th USENIX Symp. Operating Syst. Des. Implementation, 2008, pp. 29-42.
[41] Z. Zhang, L. Cherkasova, and B. T. Loo, "AutoTune: Optimizing execution concurrency and resource usage in MapReduce workflows," in Proc. 10th USENIX/ACM Int. Conf. Autonomic Comput., 2013, pp. 175-181.

Yanfei Guo received the BS degree in computer science and technology from the Huazhong University of Science and Technology, China, in 2010, and the PhD degree in computer science from the University of Colorado, Colorado Springs, in 2015. He is currently a postdoc fellow at the Argonne National Lab. His research interests include cloud computing, big data processing and MapReduce, and HPC. He is a member of the IEEE.

Jia Rao received the BS and MS degrees in computer science from Wuhan University in 2004 and 2006, respectively, and the PhD degree from Wayne State University in 2011. He is currently an assistant professor at the Department of Computer Science, University of Colorado, Colorado Springs. His research interests include the areas of resource auto-configuration, machine learning and CPU scheduling on multi-core systems. He is a member of the IEEE.

Changjun Jiang received the PhD degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1995 and conducted post-doctoral research at the Institute of Computing Technology, Chinese Academy of Sciences, in 1997. Currently he is a professor with the Department of Computer Science and Engineering, Tongji University, Shanghai. His current areas of research are concurrency processing and intelligent web systems. He is a member of the IEEE.

Xiaobo Zhou received the BS, MS, and PhD degrees in computer science from Nanjing University, in 1994, 1997, and 2000, respectively. He is currently a professor and the chair at the Department of Computer Science, University of Colorado, Colorado Springs. His research lies broadly in distributed and cloud computing, datacenters, big data parallel processing, autonomic and sustainable computing. He was a recipient of the NSF CAREER Award in 2009. He is a senior member of the IEEE.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.

