
2019 6th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud) / 2019 5th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom)

Efficient Task Scheduling for Applications on Clouds
Hussein Al-Zoubi
Computer Science Department,
School of Electrical Engineering and Information Technology,
German Jordanian University,
Amman, Jordan.
hussein.alzoubi@gju.edu.jo

Abstract—Task scheduling is a central issue in the realm of parallel processing, and more recently in cloud computing. The directed acyclic graph (DAG) is a well-known technique used to represent the scheduling of computational tasks. Many researchers have studied task scheduling under various constraints and proposed alternative solutions. The purpose of this research is to improve task scheduling on the cloud using a recent bio-inspired optimization technique, the Grasshopper Optimization Algorithm (GOA). The proposed method is compared with the state-of-the-art techniques in this area, where a reduction of 10% in the makespan is obtained.

Index Terms—Directed acyclic graph (DAG), task scheduling, cloud computing, makespan, scientific workflow

978-1-7281-1661-7/19/$31.00 ©2019 IEEE
DOI 10.1109/CSCloud/EdgeCom.2019.00012

I. INTRODUCTION

Task scheduling on clouds is a challenging yet important subject in cloud computing. The purpose of the scheduling process is to obtain the optimum schedule in terms of the overall execution time, called the makespan, at the minimum expense of the available resources, represented by the number of virtual machines used. The scheduler also aims at satisfying additional goals such as obtaining optimum load balancing, achieving fault tolerance, and reducing the amount of consumed energy.

Recent work in the field of task scheduling on clouds has considered various techniques and approaches. We review the recent research in this area in Section III. Among these techniques, researchers have employed nature-inspired optimization techniques including genetic algorithms (GA), particle swarm optimization (PSO), honey bees, ant colony, and flower pollination. Our contribution in this paper is to employ a nature-inspired optimization technique based on the behavior of grasshoppers in the problem of task scheduling on clouds.

The rest of this paper is organized as follows: In the next section, we discuss directed acyclic graphs (DAGs) and how they can be used to schedule tasks on the cloud. Section III surveys the state-of-the-art related work in the area of DAG scheduling. The proposed methodology is presented in Section IV. Next, in Section V the experimental results with discussions are given. The paper is concluded in Section VI.

II. DIRECTED ACYCLIC GRAPHS

The directed acyclic graph (DAG) is a graph consisting of nodes and edges (also known as links). Each node in a DAG represents a computational task of a certain application, while links model the cost of communication (latency) between tasks. In DAG scheduling, one seeks to optimally assign tasks to virtual machines on a cloud in a way that sustains priorities among tasks while keeping the total execution time to a minimum [1]. This is a form of topological sorting applied to DAGs that produces a linear ordering of the graph's nodes.

To take an example of how task scheduling works, let us consider the example depicted on the left part of Fig. 1. In this workflow, there are seven tasks denoted as nodes in a directed graph. Each node has a name (task name) and a duration. For example, Task A requires 10 time units to finish. When scheduling these tasks, it is important to observe the interdependencies between the tasks in order not to violate task priorities indicated by arrows on the graph. The DAG of Fig. 1 can be represented using a weighted adjacency matrix as:

\[
WAM_{DAG} =
\begin{bmatrix}
10 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 7 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 12 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2
\end{bmatrix} \tag{1}
\]

In this representation, the diagonal entries represent the execution times of tasks. Values of off-diagonal entries indicate interdependencies between tasks, in addition to the communication cost when tasks are executed on multiple machines.

Suppose that we wish to schedule the workflow of Fig. 1 on a single machine, and let us assume that no communication overhead is needed when scheduling tasks on the same machine. Then the following would be valid orderings:

A→C→G→B→D→E→F
A→C→G→B→E→F→D
A→B→D→E→F→C→G
A→B→E→F→D→C→G
A→B→C→D→E→F→G
The machine cannot, for example, execute Task B before Task A because Task B is dependent on Task A; equivalently, we say that Task A has higher priority than Task B. For any of these schedules, the machine will need 43 time units to finish all the tasks. This is so because the machine cannot start working on a task before it finishes working on the previous task, and no communication is needed since all the tasks are executed on a single machine.

Fig. 1. A workflow consisting of 7 tasks (left) and after being scheduled on 3 virtual machines (right).

The right part of Fig. 1 shows an example of scheduling the workflow on 3 virtual machines, and Fig. 2 shows the timing breakdown. We can see here that completing the seven tasks requires 26 time units (this is called the makespan), compared to 43 time units, a reduction of about 40% in the makespan. However, three virtual machines are needed, compared to one.

The critical path is the execution path comprising the sequence of tasks and communications demanding the longest execution time from entry to exit [2]. The critical path determines the makespan of a workflow. In our example, the critical path is:

A→B→E→F

Fig. 2. Scheduling the workflow of Fig. 1 on 3 virtual machines.

In light of this, DAG scheduling represents a constrained optimization problem with the following constraints:
• The workflow deadline should be met.
• Priorities among tasks should be preserved.
• The makespan should be minimized.
• The number of virtual machines should be minimized.

III. RELATED WORK

Task scheduling of DAGs is an NP-complete problem in the general case and in some special cases [1], [2]. Earlier works that considered DAG scheduling include the following techniques [2]:
• The Edge-Zeroing (EZ) method proposed in [3].
• The Modified Critical Path (MCP) and Mobility Directed (MD) techniques [4].
• Earliest Task First (ETF) [5].
• The Dynamic Level Scheduling (DLS) approach [6].
• Dominant Sequence Clustering (DSC) [7].

For example, in [1], the authors proposed two algorithms to improve the performance and reduce the cost of scheduling application graphs of a system comprised of heterogeneous processors on fully connected networks. The first scheduling algorithm is called Heterogeneous Earliest-Finish-Time (HEFT), which was successful in solving the directed acyclic graph (DAG) scheduling problem, and the second is called the Critical-Path-on-a-Processor (CPOP) algorithm.

Previously, evolutionary optimization algorithms, like the genetic algorithm (GA), were used. However, GA requires long execution times, and finding the optimal control parameters is not easy [2]. In [2], the authors compare the six previously mentioned scheduling algorithms, namely: Edge Zeroing (EZ), Modified Critical Path (MCP), Mobility Directed (MD), Earliest Task First (ETF), Dynamic Level Scheduling (DLS), and Dominant Sequence Clustering (DSC), and propose the Dynamic Critical-Path algorithm to perform static scheduling for task graph allocation on multiprocessors on fully connected networks.

In many situations, simultaneous scheduling of more than one DAG is needed, where fairness appears to be an important issue. In [8], the authors aimed at providing fairness for the problem of scheduling multiple DAGs on a heterogeneous network. The approach adopted by the authors is based on combining multiple DAGs into one in a dynamic manner, and then dealing with a single DAG.

On the other hand, it is important as well to consider additional issues like minimizing the probability of failure in order to maximize reliability. However, this seems to conflict with minimizing the makespan. In [9], the authors propose an algorithm called the Bi-objective Scheduling Algorithm (BSA) that has two conflicting objectives for scheduling parallel tasks on heterogeneous distributed computing environments: performance (shortest time) as well as reliability, and thus compromises between the two objectives.

DAG scheduling is also important in other domains such as on System on Programmable Chips (SoPCs), where applications compete for computational resources. In [10], the authors propose a technique called Schedulers-Driven for scheduling and placement of real-time tasks on SoPCs using multiple DAGs, following the approach of [8].
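Several of the heuristics surveyed here (MCP, DSC, HEFT) prioritize tasks by critical-path-based ranks. As an illustrative sketch, not code from any of the cited papers: a task's upward rank is its duration plus the maximum of (communication cost + rank) over its successors, computed here on the Fig. 1 DAG of Eq. (1). Following the highest-ranked successor from the entry task recovers the critical path A→B→E→F identified in Section II.

```python
from functools import lru_cache

# Fig. 1 DAG from Eq. (1): diagonal = durations, off-diagonal = comm. costs.
W = [
    [10, 1, 1, 0, 0, 0, 0],  # A
    [0, 7, 0, 1, 1, 0, 0],   # B
    [0, 0, 12, 0, 0, 0, 1],  # C
    [0, 0, 0, 5, 0, 0, 0],   # D
    [0, 0, 0, 0, 3, 1, 0],   # E
    [0, 0, 0, 0, 0, 4, 0],   # F
    [0, 0, 0, 0, 0, 0, 2],   # G
]
TASKS = "ABCDEFG"

def succ(i):
    return [j for j in range(len(W)) if j != i and W[i][j] > 0]

@lru_cache(maxsize=None)
def rank(i):
    """Duration of task i plus the heaviest (comm + rank) path below it."""
    return W[i][i] + max((W[i][j] + rank(j) for j in succ(i)), default=0)

def critical_path(entry=0):
    """Greedily follow the successor with the largest (comm + rank)."""
    path, i = [TASKS[entry]], entry
    while succ(i):
        i = max(succ(i), key=lambda j: W[i][j] + rank(j))
        path.append(TASKS[i])
    return path

print(critical_path())  # ['A', 'B', 'E', 'F']
```

Note that the entry task's rank counts inter-task communication costs, so it serves as a scheduling priority rather than the makespan of a particular assignment.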

The authors of [11] propose a scheduling algorithm and resource provisioning for workflow applications to provide Infrastructure as a Service (IaaS) on clouds, where virtual machines are charged on a pay-per-use basis. The proposed algorithm employs particle swarm optimization (PSO), which is a meta-heuristic optimization method suitable for such scheduling problems. From the alternative solutions, PSO chooses the solution with a makespan closest to the workflow deadline. In [12], the authors propose a technique called OrgQ for joint server provisioning aiming at reducing the total costs of delay-sensitive jobs (SENs) and the queue delay of delay-tolerant jobs (TOLs).

The authors in [13] propose a scheme called Scientific Workflow Mining as a Service (SWMaaS) to provide mining of scientific workflows in clouds, including intra-cloud as well as inter-cloud. The scientific workflow comprises activities, or tasks, that are modeled using DAGs. The authors of [14] also studied scheduling of scientific workflows on clouds and proposed an algorithm called Resource Demand Aware Scheduling (RDAS) for that purpose, which partitions workflows and then fairly assigns resources.

IV. PROPOSED METHODOLOGY

The Grasshopper Optimization Algorithm (GOA) is a newly developed stochastic evolutionary optimization technique based on the behavior of grasshopper swarms in nature. GOA is a gradient-free optimization method that was proposed by Saremi et al. [15]. The model of interaction between grasshoppers is represented as attractions and repulsions among grasshoppers in the swarm. The comfort zone is defined as the region where there is neither attraction nor repulsion. The equation used in GOA is given by [15]:

\[
X_i^d = c \left( \sum_{\substack{j=1 \\ j \neq i}}^{N} c \, \frac{ub_d - lb_d}{2} \, s\!\left(\left|x_j^d - x_i^d\right|\right) \frac{x_j - x_i}{d_{ij}} \right) + \hat{T}_d \tag{2}
\]

where:
$X_i^d$ denotes the position of the $i$-th grasshopper in the $d$-th dimension.
$c$ is a decreasing factor that is used to cause the comfort, repulsion, and attraction regions to shrink.
$i$ and $j$ are indexes of grasshoppers.
$N$ is the number of grasshoppers used in the simulations.
$ub_d$ is the upper bound in the $d$-th dimension.
$lb_d$ is the lower bound in the $d$-th dimension.

The purpose of the function $s$ is to calculate the social forces among grasshoppers, and it is defined as:

\[
s(r) = f e^{-r/l} - e^{-r} \tag{3}
\]

The parameters $f$ and $l$ denote the intensity of attraction and the attractive length scale, respectively. $d_{ij}$ represents the distance between grasshopper $i$ and grasshopper $j$, and is calculated using:

\[
d_{ij} = \left|x_j - x_i\right| \tag{4}
\]

$\hat{T}_d$ is the $d$-th dimension of the best solution found in the search space so far.

The scheduling problem of DAG tasks on the cloud is an optimization problem in nature. Therefore, the proposed solution in this paper employs GOA to obtain the optimum scheduling possible. The proposed approach is shown in Fig. 3.

Fig. 3. Proposed methodology.

Looking at Fig. 3, we can see that GOA is the core solution to the DAG scheduling problem. GOA produces the optimal output schedule. Similar to [11], the encoding of this scheduling problem uses the dimensionality of the hyper-dimensional search space to represent the number of tasks in the workflow, while the grasshopper's position represents the available resources (indicated by the available number of virtual machines in the cloud). These virtual machines actually run the scheduled tasks. The objective is to minimize the fitness function of GOA. In the proposed approach, the fitness function is the makespan, which is the total execution time of all tasks on the cloud. Moreover, the total execution time needed to run all these tasks on the available virtual machines should not exceed the workflow's deadline.

V. EXPERIMENTAL RESULTS

In the experiments conducted in this research, the five Pegasus Workflows available at [16] have been used. These are the Montage, CyberShake, Epigenomics, LIGO's Inspiral Analysis, and SIPHT workflows. The Montage workflow was created by NASA/IPAC to produce mosaics of the sky from a number of input images. The Southern California Earthquake Center uses the CyberShake workflow to study earthquakes. The Epigenomics workflow helps in genome sequence processing and was developed by the USC Epigenome Center together with the Pegasus Team. The LIGO's Inspiral Analysis workflow (shown in Fig. 4 [16] as an example) is used in the generation and analysis of gravitational waveforms, while the SIPHT workflow is employed to find untranslated RNAs (sRNAs) in bacterial replicons of the NCBI database [16]. With the advent of cloud computing, scheduling of such complex workflows manifests as a challenge, where in addition to the scheduling problem, it is important to consider the heterogeneity and resource provisioning of virtual machines with performance differences as well [11].

Fig. 4. The LIGO's Inspiral Analysis workflow from the Pegasus Workflows [16].

The CloudSim [17] cloud simulation framework was used in conducting the experiments. The makespan for the five workflows Montage, CyberShake, Epigenomics, LIGO's Inspiral Analysis, and SIPHT was measured and normalized. The proposed work is compared to the most related work of Rodriguez and Buyya [11] and Almiani et al. [14]. Fig. 5 shows the obtained results. In these experiments, the same number of virtual machines (the optimum number) was used in the three cases, to shift our focus to minimizing the makespan. We can see from Fig. 5 that the proposed approach is able to achieve about a 10% reduction in the average of normalized makespans of the five workflows.

Fig. 5. Average of normalized makespans of the five workflows Montage, CyberShake, Epigenomics, LIGO's Inspiral Analysis, and SIPHT. The proposed method is compared to Rodriguez and Buyya [11] and Almiani et al. [14].

VI. CONCLUSION

The problem of task scheduling on heterogeneous systems has been heavily studied in the literature. Yet, there is still room for further improvements. In this paper, the Grasshopper Optimization Algorithm (GOA) is used as the optimization technique to obtain the optimum scheduling scenario of tasks on the cloud. GOA is a simple algorithm with a short execution time. In the experimental results, five popular workflows were scheduled and a comparison with the most related work is given. A 10% reduction in the makespan is achieved. In the future, we plan to extend our experiments to expand the results and encompass other aspects.

REFERENCES

[1] H. Topcuoglu, S. Hariri, and M.-Y. Wu, "Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing," IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, pp. 260–274, 2002.
[2] Y.-K. Kwok and I. Ahmad, "Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 5, pp. 506–521, 1996.
[3] V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, Cambridge, Mass.: MIT Press, 1989.
[4] M.-Y. Wu and D. D. Gajski, "Hypertool: A Programming Aid for Message-Passing Systems," IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 3, pp. 330–343, 1990.
[5] J.-J. Hwang, Y.-C. Chow, F. D. Anger, and C.-Y. Lee, "Scheduling Precedence Graphs in Systems with Interprocessor Communication Times," SIAM Journal on Computing, Vol. 18, No. 2, pp. 244–257, 1989.
[6] G. C. Sih and E. A. Lee, "A Compile-Time Scheduling Heuristic for Interconnection-Constrained Heterogeneous Processor Architectures," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 2, pp. 175–187, 1993.
[7] T. Yang and A. Gerasoulis, "DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 9, 1994.
[8] H. Zhao and R. Sakellariou, "Scheduling Multiple DAGs onto Heterogeneous Systems," IEEE Parallel and Distributed Processing Symposium, Washington, pp. 159–163, 2006.
[9] M. Hakem and F. Butelle, "Reliability and Scheduling on Systems Subject to Failures," IEEE International Conference on Parallel Processing (ICPP 2007), Xi'an, China, 2007.
[10] I. Belaid, F. Muller, and M. Benjemaa, "Schedulers-Driven Approach for Dynamic Placement/Scheduling of Multiple DAGs onto SoPCs," 2011.
[11] M. A. Rodriguez and R. Buyya, "Deadline Based Resource Provisioning and Scheduling Algorithm for Scientific Workflows on Clouds," IEEE Transactions on Cloud Computing, Vol. 2, No. 2, pp. 222–235, 2014.
[12] D. Xu, X. Liu, and Z. Niu, "Joint Resource Provisioning for Internet Datacenters with Diverse and Dynamic Traffic," IEEE Transactions on Cloud Computing, Vol. 5, No. 1, pp. 71–84, 2017.
[13] W. Song, F. Chen, H.-A. Jacobsen, X. Xia, C. Ye, and X. Ma, "Scientific Workflow Mining in Clouds," IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 10, pp. 2979–2992, 2017.
[14] K. Almiani, Y. C. Lee, and B. Mans, "Resource Demand Aware Scheduling for Workflows in Clouds," IEEE, 2017.
[15] S. Saremi, S. Mirjalili, and A. Lewis, "Grasshopper Optimisation Algorithm: Theory and Application," Advances in Engineering Software, Vol. 105, pp. 30–47, 2017.
[16] Pegasus Workflows, https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator.
[17] The Cloud Computing and Distributed Systems Laboratory, http://www.cloudbus.org/

