Abstract—Task scheduling is a central issue in parallel processing and, more recently, in cloud computing. The directed acyclic graph (DAG) is a well-known technique used to represent the scheduling of computational tasks. Many researchers have studied task scheduling under various constraints and proposed alternative solutions. The purpose of this research is to improve task scheduling on the cloud using a recent bio-inspired optimization technique, the Grasshopper Optimization Algorithm (GOA). The proposed method is compared with the state-of-the-art techniques in this area, and a reduction of 10% in the makespan is obtained.

Index Terms—Directed acyclic graph (DAG), task scheduling, cloud computing, makespan, scientific workflow

I. INTRODUCTION

Task scheduling on clouds is a challenging yet important subject in cloud computing. The purpose of the scheduling process is to obtain the optimum schedule in terms of the overall execution time, called the makespan, at the minimum expense of the available resources, represented by the number of virtual machines used. The scheduler also aims at satisfying additional goals such as optimum load balancing, fault tolerance, and reduced energy consumption.

Recent work in the field of task scheduling on clouds has considered various techniques and approaches. We review the recent research in this area in Section III. Among these techniques, researchers have employed nature-inspired optimization techniques including genetic algorithms (GA), particle swarm optimization (PSO), honey bees, ant colony, and flower pollination. Our contribution in this paper is to apply a nature-inspired optimization technique based on the behavior of grasshoppers to the problem of task scheduling on clouds.

The rest of this paper is organized as follows: In the next section, we discuss directed acyclic graphs (DAGs) and how they can be used to schedule tasks on the cloud. Section III surveys the state-of-the-art related work in the area of DAG scheduling. The proposed methodology is presented in Section IV. Next, in Section V, the experimental results are given and discussed. The paper is concluded in Section VI.

II. DIRECTED ACYCLIC GRAPHS

The directed acyclic graph (DAG) is a graph consisting of nodes and edges (aka links). Each node in a DAG represents a computational task of a certain application, while links exist to model the cost of communication (latency) between tasks. In DAG scheduling, one seeks to optimally assign tasks to virtual machines on a cloud in a way that preserves the priorities among tasks while keeping the total execution time to a minimum [1]. This is a form of topological sorting applied to DAGs, which produces a linear ordering of the graph's nodes.

As an example of how task scheduling works, consider the workflow depicted on the left part of Fig. 1. In this workflow, there are seven tasks denoted as nodes in a directed graph. Each node has a name (task name) and a duration. For example, Task A requires 10 time units to finish. When scheduling these tasks, it is important to observe the interdependencies between the tasks in order not to violate the task priorities indicated by arrows on the graph. The DAG of Fig. 1 can be represented using a weighted adjacency matrix as:

$$
\mathrm{WAM}_{\mathrm{DAG}} =
\begin{bmatrix}
10 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 7 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 12 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2
\end{bmatrix}
\tag{1}
$$

In this representation, the diagonal entries represent the execution times of tasks. Values of off-diagonal entries indicate interdependencies between tasks, in addition to the communication cost when tasks are executed on multiple machines.

Suppose that we wish to schedule the workflow of Fig. 1 on a single machine, and let us assume that no communication overhead is needed when scheduling tasks on the same machine. Then the following would be valid orderings:
A→C→G→B→D→E→F
A→C→G→B→E→F→D
A→B→D→E→F→C→G
A→B→E→F→D→C→G
A→B→C→D→E→F→G
The authors of [11] propose a resource provisioning and scheduling algorithm for workflow applications on Infrastructure as a Service (IaaS) clouds, where virtual machines are charged on a pay-per-use basis. The proposed algorithm employs particle swarm optimization (PSO), which is a meta-heuristic optimization method suitable for such scheduling problems. From the alternative solutions, PSO chooses the solution with a makespan closer to the workflow deadline. In [12], the authors propose a technique called OrgQ for joint server provisioning, aiming at reducing the total costs of delay-sensitive jobs (SENs) and the queue delay of delay-tolerant jobs (TOLs).

The authors in [13] propose a scheme called Scientific Workflow Mining as a Service (SWMaaS) to provide mining of scientific workflows in clouds, both intra-cloud and inter-cloud. The scientific workflow comprises activities, or tasks, that are modeled using DAGs. The authors of [14] also studied scheduling of scientific workflows on clouds and proposed an algorithm called Resource Demand Aware Scheduling (RDAS) for that purpose, which partitions workflows and then fairly assigns resources.

IV. PROPOSED METHODOLOGY

The Grasshopper Optimization Algorithm (GOA) is a newly developed stochastic evolutionary optimization technique based on the behavior of grasshopper swarms in nature. GOA is a gradient-free optimization method that was proposed by Saremi et al. [15]. The interaction between grasshoppers is modeled as attractions and repulsions among grasshoppers in the swarm. The comfort zone is defined as the region where there is neither attraction nor repulsion. The equation used in GOA is given by [15]:

$$
X_i^d = c \left( \sum_{j=1,\, j \neq i}^{N} c \, \frac{ub_d - lb_d}{2} \, s\left( \left| \chi_j^d - \chi_i^d \right| \right) \frac{\chi_j - \chi_i}{d_{ij}} \right) + \hat{T}_d,
\tag{2}
$$

where:
$X_i^d$ denotes the position of the i-th grasshopper in the d-th dimension.
$c$ is a decreasing factor that is used to cause the comfort, repulsion, and attraction regions to shrink.
$i$ and $j$ are indexes of grasshoppers.
$N$ is the number of grasshoppers used in the simulations.
$ub_d$ is the upper bound in the d-th dimension.
$lb_d$ is the lower bound in the d-th dimension.

The purpose of the function $s$ is to calculate the social forces among grasshoppers, and it is defined as:

$$
s(r) = f e^{-r/l} - e^{-r}
\tag{3}
$$

The parameters $f$ and $l$ denote the intensity of attraction and the attractive length scale, respectively.
$d_{ij}$ represents the distance between grasshopper $i$ and grasshopper $j$, and is calculated using:

$$
d_{ij} = |\chi_j - \chi_i|,
\tag{4}
$$

$\hat{T}_d$ is the best solution found in the search space so far.

The scheduling of DAG tasks on the cloud is by nature an optimization problem. Therefore, the solution proposed in this paper employs GOA to obtain the best possible schedule. The proposed approach is shown in Fig. 3.

Fig. 3. Proposed methodology.

Looking at Fig. 3, we can see that GOA is the core of the solution to the DAG scheduling problem. GOA produces the optimal output schedule. Similar to [11], the scheduling problem is encoded by using the dimensionality of the search space to represent the number of tasks in the workflow, while the grasshopper's position represents the available number of resources (indicated by the available number of virtual machines in the cloud). These virtual machines actually run the scheduled tasks. The objective is to minimize the fitness function of GOA. In the proposed approach, the fitness function is the makespan, which is the total execution time of all tasks on the cloud. Moreover, the total execution time needed to run all these tasks on the available virtual machines should not exceed the workflow's deadline.

V. EXPERIMENTAL RESULTS

In the experiments conducted in this research, the five Pegasus Workflows available at [16] have been used. These are the Montage, CyberShake, Epigenomics, LIGO's Inspiral Analysis, and SIPHT workflows. The Montage workflow was created by NASA/IPAC to build mosaics of the sky from a number of input images. The Southern California Earthquake Center uses the CyberShake workflow to study earthquakes. The Epigenomics workflow helps in genome sequence processing and was developed by the USC Epigenome Center and the Pegasus Team. The LIGO's Inspiral Analysis workflow (shown in Fig. 4 [16], as an example) is used in the generation and analysis of gravitational waveforms, while the SIPHT workflow is employed to find untranslated RNAs (sRNAs) in bacterial replicons of the NCBI database [16]. With the advent of cloud computing, scheduling such complex workflows becomes a challenge: in addition to the scheduling problem itself, it is important to consider the heterogeneity and resource provisioning of virtual machines with performance differences [11].

The CloudSim [17] cloud simulating framework was used in conducting the experiments. The makespan for the five
Fig. 4. The LIGO's Inspiral Analysis workflow from the Pegasus Workflows [16].

Fig. 5. Average of normalized makespans of the five workflows Montage, CyberShake, Epigenomics, LIGO's Inspiral Analysis, and SIPHT. The proposed method is compared to Rodriguez and Buyya [11] and Almiani et al. [14].
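As an illustration of the GOA position update described by Eqs. (2) to (4) in Section IV, the following minimal sketch applies the update once to a small swarm. It is not the authors' implementation. The function names `s` and `goa_step`, the default values `f=0.5` and `l=1.5`, the small constant `eps` that guards against division by zero, and the clamping of new positions to the bounds are all assumptions made for this example; the paper does not specify them.

```python
import math


def s(r, f=0.5, l=1.5):
    """Social force of Eq. (3); f and l are illustrative values (assumed)."""
    return f * math.exp(-r / l) - math.exp(-r)


def goa_step(positions, best, c, lb, ub, eps=1e-12):
    """Apply Eq. (2) once to every grasshopper and return the new positions.

    positions: list of N position vectors
    best:      best solution found so far (T-hat in Eq. (2))
    c:         decreasing factor, supplied by the caller
    lb, ub:    per-dimension lower and upper bounds
    """
    n, dim = len(positions), len(best)
    new_positions = []
    for i in range(n):
        xi = positions[i]
        social = [0.0] * dim
        for j in range(n):
            if j == i:
                continue
            xj = positions[j]
            # Eq. (4): Euclidean distance between grasshoppers i and j.
            d_ij = math.sqrt(sum((a - b) ** 2 for a, b in zip(xj, xi)))
            for d in range(dim):
                unit = (xj[d] - xi[d]) / (d_ij + eps)  # eps is an assumption
                force = s(abs(xj[d] - xi[d]))
                social[d] += c * (ub[d] - lb[d]) / 2 * force * unit
        # The outer c scales the summed force; clamping is an assumption.
        new_positions.append(
            [min(ub[d], max(lb[d], c * social[d] + best[d])) for d in range(dim)]
        )
    return new_positions


if __name__ == "__main__":
    swarm = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]
    print(goa_step(swarm, best=[1.0, 1.0], c=0.5, lb=[0.0, 0.0], ub=[2.0, 2.0]))
```

In a full optimizer, this step would be repeated while `c` decreases and `best` is refreshed from the fitness values; both of those parts are left out here because the excerpt does not give their details.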
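The makespan fitness described in Section IV can be sketched for the seven-task workflow of Eq. (1). This is a hedged example under several assumptions that the paper does not confirm: each coordinate of a grasshopper's position is truncated to an integer virtual machine index, all virtual machines are equally fast (durations come from the diagonal of the matrix), an off-diagonal entry is charged as a communication delay only when the two tasks run on different machines, and tasks are dispatched in index order. The names `decode` and `makespan` are hypothetical.

```python
# Weighted adjacency matrix of Eq. (1); rows/columns assumed to follow tasks A..G.
WAM = [
    [10, 1, 1, 0, 0, 0, 0],
    [0, 7, 0, 1, 1, 0, 0],
    [0, 0, 12, 0, 0, 0, 1],
    [0, 0, 0, 5, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0],
    [0, 0, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 2],
]


def decode(position, num_vms):
    """Map each continuous coordinate to a VM index (assumed encoding)."""
    return [min(num_vms - 1, max(0, int(p))) for p in position]


def makespan(position, num_vms, wam=WAM):
    """Makespan of the schedule encoded by `position`.

    Tasks are visited in index order, which is a topological order here
    because the matrix of Eq. (1) is upper triangular.
    """
    assignment = decode(position, num_vms)
    n = len(wam)
    vm_free = [0.0] * num_vms  # time at which each VM becomes idle
    finish = [0.0] * n         # finish time of each task
    for j in range(n):
        ready = 0.0
        for i in range(j):
            if wam[i][j] != 0:
                # Communication cost applies only across different VMs.
                comm = wam[i][j] if assignment[i] != assignment[j] else 0
                ready = max(ready, finish[i] + comm)
        start = max(ready, vm_free[assignment[j]])
        finish[j] = start + wam[j][j]
        vm_free[assignment[j]] = finish[j]
    return max(finish)


if __name__ == "__main__":
    # All tasks on one VM versus tasks C and G moved to a second VM.
    print(makespan([0.0] * 7, num_vms=2))
    print(makespan([0.2, 0.9, 1.5, 0.1, 0.4, 0.3, 1.9], num_vms=2))
```

With these assumptions, placing every task on one machine gives 43 time units, while moving tasks C and G to a second machine gives 29. A deadline constraint, as mentioned in Section IV, could be added as a penalty term on top of this value.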