A Novel Deep Reinforcement Learning Scheme For Task Scheduling in Cloud Computing
https://doi.org/10.1007/s10586-022-03630-2
Received: 6 October 2021 / Revised: 25 March 2022 / Accepted: 17 May 2022 / Published online: 29 June 2022
The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Abstract
Recently, the demand for cloud computing systems has increased drastically due to their significant use in various real-time online and offline applications. Moreover, cloud computing is being widely adopted by research, academia and industry as a main solution for computation and storage platforms. Due to increased workloads and big data, cloud servers receive a huge number of data storage and computation requests, which need to be processed by cloud modules that map the tasks to the available virtual machines. Cloud computing models consume a huge amount of energy and resources to complete these tasks. Thus, energy-aware and efficient task scheduling approaches need to be developed to mitigate these issues. Several techniques have been introduced for task scheduling; most of them are based on heuristic algorithms, which treat scheduling as an NP-hard problem and obtain a near-optimal solution. However, handling tasks of different sizes and achieving a near-optimal solution for a varying number of VMs according to the task configuration remains challenging. To overcome these issues, we present a machine learning based technique and adopt a deep reinforcement learning approach. In the proposed approach, we present a novel policy to maximize the reward for task scheduling actions. An extensive comparative analysis is also presented, which shows that the proposed approach achieves better performance than existing techniques in terms of makespan, throughput, resource utilization and energy consumption.
Keywords Task scheduling · Cloud computing · Machine learning · Deep reinforcement learning
4172 Cluster Computing (2022) 25:4171–4188
development tools etc., and Software as a Service (SaaS), which includes virtual desktops, emails and communication setups. Figure 1 illustrates the basic architecture of cloud computing.

Fig. 1 Cloud architecture

Cloud computing techniques are widely adopted by various organizations and real-time applications due to their numerous applications and significant computing performance. Moreover, these techniques play a significant role in information and communication technology (ICT) and academic research. Cloud computing encompasses sundry functionalities and services such as advanced security, efficient distribution of large data across geographical regions, virtualization, resilient computing, web infrastructure, and various web-related applications. Cloud computing technology offers several services such as data processing, storage, computing resources, virtual desktops, virtual services, web-based deployment platforms, web services, and databases. It uses the "pay-as-you-go" model offered by cloud service providers for specific applications. Cloud computing offers several benefits such as energy efficiency, cost savings, flexibility, faster implementation, and scalability [4-6].

In this field of computing applications, several techniques have been presented based on technological advancements, such as distributed computing, parallel computing, grid computing, cluster computing, mobile computing, fog computing, and cloud computing. Due to advances in technology such as IoT, sensor networks, and mobile applications [7], a huge amount of data is generated and stored on cloud servers. This increased data requires huge resources and consumes excessive energy during processing. Thus, cloud computing has emerged as a promising solution to deal with these issues and offers a virtualization process by creating virtual computing machines at cloud data centers. These characteristics of cloud computing (CC) are considered a big breakthrough in green computing. However, minimizing energy consumption and efficient resource allocation still remain challenging tasks [8-10].

Currently, green computing has become one of the hot research topics in high-performance computing systems. During the last decade, internet-based services have surged, and cloud computing based systems are considered the main cause of increased energy consumption in the information and communication technology (ICT) sector. Recently, a study presented in [11] reported that the energy consumption of cloud data centers is increasing by 15% annually and that energy costs make up about 42% of the total cost of operating data centers. A study presented in [12] reported that around 6000 data centers are present in the USA, whose approximate energy consumption was 61 billion kilowatt hours (kWh). In [13], the authors discussed that the energy consumption rate between 2005 and 2010 grew at about 24%. It was also reported that data centers in the US consumed 70 billion kWh of power in 2014, which was 1.8% of the total power consumption.

Thus, minimizing energy consumption has become an important research topic for green computing systems. Several techniques have been presented to minimize energy consumption, such as load balancing [14, 15], task scheduling [16, 17], virtual machine scheduling [18], and VM migration and allocation [19]. Task scheduling is considered a promising technique to minimize energy consumption, as it performs efficient resource allocation according to the incoming tasks. Efficient resource allocation for incoming tasks is a challenging issue because it has a significant impact on the entire system. Generally, task scheduling techniques distribute the tasks over the available resources based on goals such as load balancing, increasing resource utilization, reducing waiting time and improving system throughput. These tasks can be independent, where tasks can be scheduled in any sequence, or dependent, where upcoming tasks can only be scheduled after the ongoing dependent tasks are completed. The main aim of any novel scheduling algorithm is to balance different types of conflicting parameters simultaneously, such as allocating better resources while maintaining fair execution times. Several algorithms have been presented which are based on meta-heuristic optimization schemes such as particle swarm optimization (PSO), genetic algorithm (GA), ant colony optimization, and many more [20]. Similarly, machine learning based schemes have been adopted for this purpose, such as Q-learning with earliest finish time [21], neural network based approaches [22], a combination of deep learning and reinforcement learning [23], hybrid particle swarm optimization and fuzzy theory [24], and neural networks with evolutionary computation [25]. Most of the existing schemes achieve the desired performance only up to a certain number of tasks and perform well mainly with homogeneous tasks. As the number of tasks increases, convergence performance suffers, which increases the execution time. In this work, we introduce a reinforcement learning
based approach which adapts to the environment and updates its working procedure based on a reward and punishment mechanism. Moreover, reinforcement learning learns from the complex environment and makes decisions. This automated adaptive learning helps to reduce the complexity of processing the tasks when compared with traditional algorithms. Thus, the convergence and optimization-function problems can be solved. Moreover, the approach focuses on improving the performance of the overall system by considering CPU, RAM, bandwidth and resource parameters, which improves resource utilization efficiency.

1.1 Motivation

Load balancing is a promising technique which helps to distribute the workload evenly among the available virtual machines and computing resources. Its main aim is to facilitate continuous support to cloud systems through efficient utilization. Moreover, load balancing minimizes the response time for tasks and improves resource utilization performance. Load balancing also strives to offer scalability and flexibility for applications that may grow in size in the future and demand additional resources, as well as to prioritize operations that require immediate execution above others. Load balancing's other goals include lowering energy usage, minimizing bottlenecks, reducing bandwidth utilization, and meeting QoS criteria. Workload mapping and load balancing approaches that account for various metrics are therefore required. However, current research needs to be improved by developing fast and responsive scheduling processes that meet the task requirements. Due to the several challenges present in existing task scheduling techniques in cloud computing, we focus on the development of a novel scheme for energy efficient task scheduling. The main aims of the proposed approach are to allocate resources efficiently, complete tasks on time, schedule tasks, and minimize energy consumption.

1.2 Work contribution

In this work, first of all, we present a system model where we describe the architecture of cloud computing, which includes the user, cloud broker, VM allocator, VM provisioner, VM scheduler and processing elements. In the next step, we present the problem formulation for task scheduling and minimizing the energy consumption. We define the task model and the components of task scheduling. In order to achieve the objective of scheduling and minimizing the energy consumption, we present a deep reinforcement learning model. Finally, a comparative analysis is presented to describe the robustness of the proposed approach.

The rest of the article is arranged in four sections: Sect. 2 presents a brief description of existing task scheduling techniques, Sect. 3 presents the proposed deep reinforcement learning based model for task scheduling, Sect. 4 presents the comparative analysis based on the performance obtained using the proposed approach and, finally, Sect. 5 presents the concluding remarks.

2 Literature survey

This section presents a literature review of existing techniques for task scheduling and resource allocation in cloud computing. Currently, the energy consumption in cloud data centers has increased drastically; thus, several studies have been presented to diminish the energy consumption. Most of the traditional techniques are not suitable for dynamic loads, so iterative optimization schemes are also widely adopted. Pradeep et al. [1] reported that efficient task scheduling can be obtained by jointly optimizing the cost, makespan and resource allocation. Based on this, the authors presented a hybrid scheduling approach which is a combination of the cuckoo search and harmony search algorithms. However, the overall performance of these systems can be improved by optimizing a number of parameters; thus, a multi-objective optimization function is presented which considers cost, energy, memory, credit and penalty as the function parameters.

Ebadifard et al. [2] discussed that dynamic task scheduling is a nondeterministic polynomial time (NP) hard problem which can be solved by optimization schemes. Thus, the authors developed a PSO based scheduler which considers the tasks as non-preemptive and independent. This approach focuses on load balancing, where it formulates the resource vector by considering CPU, storage, RAM and bandwidth, whereas several traditional techniques only consider CPU based resources. The fitness function is designed based on makespan and average utilization. Similarly, Chhabra et al. [6] also considered the task scheduling problem as NP-hard and suggested that it can be tackled by meta-heuristic approaches. Thus, the authors adopted the cuckoo search optimization scheme, but identified that the traditional cuckoo-search approach suffers from inaccurate local search, lack of solution diversity and slow convergence. These issues cause complexity in load balancing scenarios. To overcome them, the authors developed a combined approach which uses opposition based learning, cuckoo search, and the differential evolution algorithm to optimize the energy and makespan of incoming loads. The opposition based learning (OBL) generates the optimal initial population. Then, an adaptive mechanism is used to switch between
cuckoo search (CS) and differential evolution (DE) approaches. In [24], Mansouri et al. presented a hybrid approach which uses a combination of modified particle swarm optimization and fuzzy logic theory. In the first phase, a new method is presented to update the velocity, and roulette wheel selection is used to improve the global search capability. In the next phase, crossover and mutation operators are applied to deal with the local-optima issue of PSO. Finally, a fuzzy logic based inference system is applied to compute the overall fitness of the system. In order to compute the fitness function, the authors used task length, CPU speed, RAM, and execution time. The designed fitness function minimizes the execution time and resource wastage. Sulaiman et al. [26] focused on developing an energy efficient and low complexity scheduling approach. In order to achieve this objective, the authors developed a hybrid heuristic and genetic algorithm based approach, which improves the quality of the initial population generated by the genetic algorithm.

On the other hand, machine learning based approaches are also widely adopted in this field. Tong et al. [21] presented a combined approach which uses Q-learning and the heterogeneous earliest finish time (HEFT) approach. The Q-learning approach uses a reward mechanism which is improved by incorporating the upward rank parameter from HEFT. Moreover, the self-learning process of Q-learning helps to update the Q-table efficiently. The complete process is divided into two phases: first, Q-learning is applied, which sorts the tasks to generate the optimal task order for processing; later, an earliest finish time strategy is applied to allocate the tasks to the most suitable virtual machine (VM).

Sharma et al. [22] focused on energy efficient task scheduling using machine learning techniques. In this work, the authors presented a neural network based approach to achieve reduced makespan, minimized energy consumption, lower computation overhead and a reduced number of active racks, which can improve the overall performance of datacenters. This is a supervised learning scheme which considers historical task data and predicts the most suitable available resources for numerous tasks.

Currently, deep learning techniques have gained huge attention from the research community, industry and academia because of their significant pattern learning capability. Based on this, Rjoub et al. [23] adopted a deep learning scheme for the cloud computing scenario and combined it with a reinforcement learning scheme to perform load balancing. Specifically, this work presents four different schemes: reinforcement learning (RL), deep Q networks, recurrent neural network long short-term memory (RNN-LSTM), and deep reinforcement learning combined with LSTM (DRL-LSTM).

Machine learning techniques are categorized as supervised and unsupervised learning. Supervised learning schemes classify the data based on a pre-labelled dataset, trained on historical features, whereas unsupervised schemes generate groups of similar data. Based on this concept, Negi et al. [25] presented a hybrid approach which uses an artificial neural network (ANN) as supervised learning, clustering as unsupervised learning, and type-2 fuzzy logic as a soft computing technique for load balancing. The complete approach is divided into two phases: first, ANN based load balancing clusters the virtual machines into underloaded and overloaded VMs with the help of Bayesian optimization and an improved K-means approach. In the next phase, the multi-objective TOPSIS-PSO algorithm is applied to perform the load balancing.

Ding et al. [30] developed a Q-learning based energy efficient task scheduler for cloud computing. This scheme is carried out in two stages: in the first phase, a centralized dispatcher based on an M/M/S queuing model assigns the incoming user requests to a suitable server on the cloud. In the next phase, a Q-learning based scheduler is applied to prioritize the tasks by continuously updating the policy that assigns tasks to virtual machines.

Hoseiny et al. [31] introduced internet based distributed computing, called volunteer computing, which is based on the resource sharing paradigm. Volunteers with extra resources share them to handle large-scale tasks. However, power consumption, computation cost and latency are well-known issues in distributed computing systems. To overcome these issues, the authors considered this as an NP-hard task scheduling problem and developed two mechanisms, namely Min-CCV (computation-communication-violation) and Min-V (violation).

Abualigah et al. [32] considered task scheduling a critical challenge in cloud computing. To achieve minimum makespan and maximum resource utilization, the authors developed an optimization scheme with an elite based differential evolution approach called MALO (multi-objective antlion optimizer). This technique is able to tackle the multi-objective task scheduling problem: MALO simultaneously minimizes the makespan and improves the resource utilization.

Hoseiny et al. [33] developed a novel scheduling algorithm for fog-cloud computing named PGA (priority genetic algorithm). The main aim of the PGA approach is to improve the overall performance of the system through scheduling, by considering computation time, energy requirements and task deadlines. In order to achieve this, a genetic algorithm based optimization scheme is presented to prioritize the tasks, which can then be assigned for processing according to VM capacity.
Table 1 Comparative analysis of the discussed task scheduling techniques

Pradeep et al. [1]. Approach: heuristic optimization for the NP-hard scheduling problem. Results: reduced cost and execution time. Remarks: the hybrid algorithm achieves a near-optimal solution with faster convergence.

Ebadifard et al. [2]. Approach: PSO based load balancing. Results: reduced makespan and response time. Remarks: compatibility of input user requests is checked and a new fitness function is defined.

Chhabra et al. [6]. Approach: optimization based scheme. Results: optimal population, better searching efficiency, avoiding the trap of local optima. Remarks: effective switching between cuckoo search and differential evolution.

Tong et al. [21]. Approach: Q-learning with earliest finish time scheduling. Results: reduced cost and average response time. Remarks: this scheme sorts the tasks into two categories, processor allocation and optimal order of processors.

Sharma et al. [22]. Approach: back propagation neural network trained on data generated by a genetic algorithm for scheduling. Results: performance improved in terms of throughput, energy consumption and overhead for static data. Remarks: ANN based rack and ANN based task scheduler models are presented for selection.

Rjoub et al. [23]. Approach: a combination of deep learning, LSTM, and Q-learning. Results: optimized RAM usage and CPU utilization cost. Remarks: efficient prediction of VMs for allocation of upcoming tasks; true metrics for the reward function are expected to be incorporated for selection of the best VMs based on their behavior.

Mansouri et al. [24]. Approach: hybrid scheduling combining fuzzy logic and PSO. Results: improved execution time and efficiency; a better initial population helps to minimize the convergence issue. Remarks: fuzzy rules are designed based on task length, CPU speed, bandwidth, and fitness.

Negi et al. [25]. Approach: Bayesian optimization based enhanced K-means with ANN is used to obtain the scheduling via clustering. Results: selection of an optimal number of VMs helps to improve the execution time, transmission time, and CPU utilization. Remarks: type-2 fuzzy logic clustering along with K-means is presented to solve the multi-objective problem.

Sulaiman et al. [26]. Approach: a genetic algorithm based hybrid heuristic algorithm with a heterogeneous scheduling prioritization model (HSIP) for tasks. Results: running time and makespan improved; however, dynamic scheduling is not considered. Remarks: an improved evolutionary scheme which uses two chromosomes in the initial population to increase convergence and speed.
This section described several related works that improve the energy performance of cloud computing systems with the help of energy aware task scheduling, resource allocation, VM scheduling and more. Table 1 shows the comparative analysis of these techniques. The complete study focused on optimization and machine learning processes to perform the task scheduling. The optimization techniques suffer from convergence issues, and the machine learning based methods suffer from issues of appropriate training, data labeling and clustering errors.

3 Proposed model

This section presents the proposed solution for energy aware task scheduling based on a self-learning approach for heterogeneous cloud computing environments. In these scenarios, the cloud data centers receive numerous tasks, where identifying the correct execution order and then allocating each task to the best suitable virtual machine (VM) plays an important role. This can be achieved by a decision making strategy. Moreover, the tasks are heterogeneous and arrive at the servers randomly, so a self-learning mechanism can be a promising solution for these scenarios. Based on these assumptions, we present an energy aware task scheduling approach based on reinforcement learning.

3.1 System model

We consider a cloud computing based task scheduling framework as depicted in Fig. 2. As mentioned before, cloud data centers receive computationally intensive tasks which cannot be performed on local devices; thus, the cloud centers create virtual machines with different configurations, and tasks are allocated to these machines. In this work, we mainly focus on the task scheduling and its resources. We assume that the cloud datacenter has several virtual machines and a task scheduler which has complete information about incoming tasks
and the capacity details of the virtual machines, which include the task size, computation speed in million instructions per second (MIPS), expected time to complete the tasks, and waiting time for each task in the queue. The scheduler observes this information and generates the scheduling decision, which includes where and when to schedule the task based on its start time, end time and bandwidth requirement. Fig. 2 depicts the working process of a cloud computing system. For simplicity, we consider that real-time tasks are characterized by start time s_i, deadline d_i and task length l_i.

In the first phase, the cloud user submits real-time tasks at time s_i to a cloud computing (CC) environment. After submission of the real-time tasks, the user has an assurance that the outcome of these tasks will be received by the deadline d_i. Similarly, the cloud computing environment is further divided into multiple VMs, which are characterized by their processing capacities. The VM unit contains a processing element (PE) which requires power to accomplish the assigned tasks. Further, the VM allocator searches for the host where the VM can find the resources required to complete the task. The VM allocator also considers that the total maximum MIPS of all VMs allocated to a host should not exceed the maximum power of the host. Further, these hosts use the VM provisioner to map the PEs of the host selected by the VM allocator. Finally, the VM schedulers facilitate the processing power of the PEs to each VM based on its requirement to complete the task.

The main aim of our work is to increase the task completion rate in the given time duration without wasting resources. Thus, for a known resource provisioning d and a scheduling plan b, the first objective function is defined as maximizing [34] the successful task completion rate (STCR), given as:

STCR = max_{T_i ∈ T} t_i^F(d, b)    (1)

where t_i^F denotes the task completion time of task i which is allocated to VM_j.

Further, we consider the power consumed by the PEs to complete the task. The total power consumption of a host is the sum of its static and dynamic power consumption:

P = P_static + P_dynamic    (2)
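The scheduler's bookkeeping described in this section (task length in million instructions, VM speed in MIPS, and the expected completion and waiting times it tracks for each queued task) can be sketched minimally as follows; the function and field names, and the example values, are our own illustrative assumptions rather than the authors' implementation.

```python
def schedule_on_vm(length_mi, vm_mips, arrival, vm_free_at):
    """Sketch of the scheduler's view of one task on one VM:
    execution time from task length (MI) and VM speed (MIPS),
    a start delayed until the VM is free, and the resulting
    waiting and response times."""
    exec_time = length_mi / vm_mips
    start = max(vm_free_at, arrival)   # cannot start before the VM is free
    finish = start + exec_time
    waiting = start - arrival
    response = waiting + exec_time
    return {"start": start, "finish": finish,
            "waiting": waiting, "response": response}

# Example: a 2000 MI task arriving at t=1 on a 500 MIPS VM that frees up at t=2.
print(schedule_on_vm(2000.0, 500.0, arrival=1.0, vm_free_at=2.0))
# {'start': 2.0, 'finish': 6.0, 'waiting': 1.0, 'response': 5.0}
```

In this sketch the scheduler would compare such estimates across candidate VMs before committing a task, which is the decision the following sections formalize.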
The static power denotes the power consumed by the host when it is in idle mode, and it remains unchanged while the host is turned on. Thus, the static power is denoted as:

P_static = a · P_max    (3)

where a denotes the ratio of the static power to the maximum power of the host, and P_max is the maximum power consumed by the host. Similarly, the dynamic power consumption can be expressed as [35]:

P_dynamic(u) = (P_max − P_static) · u²    (4)

where u denotes the utilization of the host at a specific time. The computing power of a host at a given time is related to its efficiency. Thus, the energy required by the host to deliver maximum efficiency is denoted as:

E_max = ∫₀^{t_max} P_max dx = P_max · t_max    (5)

where t_max is the time for which the host operates at maximum power to complete certain types of instructions. Let the power used by the system be P(u); then the energy consumption can be expressed as:

E = ∫₀^{t_max} P(u) dt    (6)

Thus, the overall energy consumption of the host to complete the tasks can be expressed as:

E = (a + (1 − a) · u²) · P_max · t_max / u    (7)

Thus, minimizing the power consumption becomes our second objective. The overall objectives of this work are:

Min E = ∑_{j=1}^{J} ∑_{n=1}^{N} ∑_{m=1}^{M} T_n^j(j)
s.t. ∑_{j=1}^{J} ∑_{n=1}^{N} T_n^j(m) ≤ m_slot
s.t. T_w − T_idle = 0    (8)

According to this equation, the energy consumption should be minimized; moreover, the waiting time T_w and idle slots T_idle should also be avoided, which improves the efficiency of the overall resource utilization.

3.2 Task and scheduling model

In this section, we present the heterogeneous incoming task model and the scheduling process. In this model, we assume that heterogeneous and computationally intensive tasks arrive at the cloud server and are classified into K categories as J = {j_1, j_2, …, j_K}. Tasks which belong to the same category have the same characteristics, such as the number of million instructions (MI), task start time, finish time and bandwidth. A task i which belongs to category k can be represented as:

j_{i,k} = ⟨a_i, z_i, l_i, d_i⟩    (9)

where a_i denotes the task arrival time, l_i is the size of the task, d_i is the expected delay of the task and z_i is the bandwidth. The cloud computing servers are configured with several heterogeneous virtual machines V = {v_1, v_2, …, v_M}. The task scheduler plays an important role by deciding the task scheduling order and selecting the most suitable VM. Generally, such mechanisms follow a queuing process: before a task is scheduled, it is placed in a waiting slot, which is vacated after the task is successfully scheduled, and another task is placed in the empty waiting slot. The response time of a task after scheduling is obtained by adding the waiting time and the execution time. Thus, the expected time of scheduling task i on the jth VM can be computed as:

e_{i,j} = l_i / v_j    (10)

Let s_{i,j} and f_{i,j} denote the start and finish times of task i on the jth VM. The starting time of an incoming task depends on the finish time of the previously scheduled task. Thus, the starting time for an incoming task can be denoted as s_{i,j} = max(f_{i,j}, a_i), and the task finish time can be expressed as f_{i,j} = s_{i,j} + e_{i,j}, where e_{i,j} represents the time required by the VM to process the task. Similarly, the response time for task i can be computed with the help of the waiting time and the execution time:

t_{i,j} = w_{i,j} + e_{i,j}    (11)

where w_{i,j} denotes the waiting time. If a queued task is processed immediately then there is no waiting period; otherwise, the gap between the starting time and the arrival time is given as w_{i,j} = s_{i,j} − a_i. Based on these parameters, the successful task completion rate (STCR) can be computed as:

g_{i,j} = ℓ_i / t_{i,j}    (12)

where ℓ_i denotes the expected latency value and g is the successful task completion rate. To accomplish this, we present a reinforcement learning based approach which helps to improve the performance of the task scheduling mechanism.

3.3 Proposed scheduling model using deep reinforcement learning

Generally, the main concept of reinforcement learning is to design an agent based system which can adapt to changes
in the environment and behave in that environment accordingly. In this work, we apply a Q-learning based approach to optimize the scheduling scheme based on the decisions and on evaluating the feedback from the cloud environment. Let us consider a total of T tasks T = {t_1, t_2, t_3, …, t_n} submitted to a cloud server which contains V virtual machines V = {v_1, v_2, …, v_m}. This learning scheme works in three phases by computing the state space, the action space and the rewards. Let s_t denote the task scheduler state in state space S at time t, and a_t the action in the action space A at time t, with transition probability P(s_1 | s, a) = P[s_{t+1} = s_1 | s_t = s, a_t = a], where ∑_{s_1 ∈ S} P(s_1 | s, a) = 1. In this model, we consider a cloud scheduler policy π(a|s), which is used for mapping the states to actions. Table 2 shows the notations used in this work.

Table 2 Main notations used in this work
T: tasks; V: virtual machines; t^F: task completion time; P_static: static power consumption; P_dynamic: dynamic power consumption; P_max: maximum power consumption; a: ratio of static and maximum power; t_max: maximum required duration to complete the task; P(u): power consumed by a system at an instance; u: utilization of host; j: cloud servers; K: total categories; a_i: task arriving time; l: size of task; d: delay; z: bandwidth; s: start time; f: finish time; e: expected time to accomplish the task; w: waiting time; S: state space; A: action space; P: probability of switching; p: policy; P_j: price for virtual machine; ψ: execution time; R: resource matrix; W: weight matrix; Q: queued tasks; w: size of waiting slot; A: action corresponding to the current state s_1; A/: invalid action; Aw: valid action set; m: virtual machine

The reward of the action a_t in state s_t is denoted by r_t. The main aim of this scheduler is to obtain the optimal scheduling policy that minimizes the cost over all available VMs and tasks. The states, actions and rewards of the reinforcement learning can be considered as follows:

• State space: In our model, at each time t the state s_t represents the present scheduling of the tasks on the VMs, where each VM is characterized by its available resources such as CPU, RAM, bandwidth and storage space.
• Action space: In the current state space, the action a_t denotes the scheduling action at time t for all the tasks scheduled on the VMs. Specifically, for each task this action is denoted as 0 or 1, a decision that represents whether the task is scheduled to a VM or not.
• Reward: The reward function denotes the efficiency of the task scheduling process. Let us assume that task t_i is assigned to the VM v_j; then we define the execution cost f_{i,j}, which is considered as an immediate reward for this action. Similarly, the overall reward for all the scheduling actions can be obtained by adding the execution cost of each task. The reward for an individual task is defined in terms of CPU, bandwidth, RAM and storage as follows:

f_{i,j} = w_{i,j} + ψ_{i,j} P_j    (13)

where w_{i,j} denotes the waiting time for task t_i to be assigned to its corresponding VM, P_j denotes the price of the virtual machine, and ψ_{i,j} is the execution time of the task on the assigned VM. In this approach, we focus on optimizing the scheduling model with the help of Q-learning, which evaluates the feedback from the cloud environment. This feedback model improves the decision making process. Once all the rewards are collected, the mean Q value of an action a in state s under policy π is denoted Q^π(s, a), and the optimal function can be expressed as:

Q*(s, a) = min_π Q^π(s, a)    (14)

This expression can be re-written in terms of the transition probability from one state to another, which is given as:

Q*(s, a) = ∑_{s_1} ω(s_1 | s, a) [ r + γ min_{a_1} Q*(s_1, a_1) ]    (15)

where ω represents the transition probability of moving from the current state s to the next state s_1 based on the action, and γ represents the discount parameter of the rewards.
Cluster Computing (2022) 25:4171–4188 4179
task is identified after the arrival of the task and before assigning the task to any VM; meanwhile, the following resource conditions need to be satisfied:

K_i^CPU ≤ CPU_j^t
K_i^RAM ≤ RAM_j^t
K_i^BW ≤ BW_j^t
K_i^Storage ≤ Storage_j^t    (16)

where K_i^CPU, K_i^RAM, K_i^BW and K_i^Storage represent the CPU, memory, bandwidth and storage required by the task, respectively, and CPU_j^t, RAM_j^t, BW_j^t and Storage_j^t represent the current resource availability status of the VM. Now, the scheduler considers the policy and evaluates Q_π(s, a) for the current policy. Thus, the policy updating process can be expressed as:

π' = argmin_a Q_π(s, a)    (17)

Further, this process focuses on obtaining the optimal policy π*, which can decrease the reward in each state s as:

min_π E[Q_π(s, a)]    (18)

In order to obtain the optimal task scheduling, we redesign the state space, action space and state transition. At this stage, we consider the cloud server as the environment, and the task scheduler is considered as the agent which interacts with the environment.

Let us consider that the system is in the current state s ∈ S, which provides the information as a resource matrix, a weight matrix and the queued tasks. Thus, the state of the system can be expressed as:

S = {S | S = (R, W, |Q|)}    (19)

where R denotes the resource matrix, W denotes the weight matrix, and Q is the queued tasks. The resource matrix shows the state of the various VMs, including their processing capacity and availability for the next task. The upcoming tasks are stored in two parts: the waiting slot and the queue. The tasks which are present in the waiting slot have higher priority for scheduling. Moreover, only these tasks can be scheduled in each time stamp. For this scenario, the waiting slot can be represented as:

W = [w_1, w_2, ..., w_O]    (20)

where w_i denotes the size of the waiting slot and O denotes the total number of tasks which can be present in the waiting slot.

In the next phase, we focus on the action space, where the task scheduling process performs two actions: determining the task execution order and finding the optimal virtual machine to allocate the task. At this phase, processing two types of action requires a large action space, which causes complexity in the learning mechanism. For this scenario, the action for VM v_m to schedule the task j_n can be presented as:

A = {A_e | A_e = (v_m, j_n)},  m ∈ {−1, 1, 2, ..., M},  n ∈ {−1, 1, 2, ..., O}    (21)

where −1 denotes that an invalid task is selected.

Finally, the reward mechanism is presented, which evaluates whether the scheduling process is achieving the goal of the optimal policy π* = π(a | s). For a valid action, the reward is assigned by taking the ratio of the response time, energy and latency requirement, whereas for an invalid action the reward is assigned as zero. This reward function can be expressed as:

r = { w_j,  a ∈ A_w
      0,    a ∈ A_φ }    (22)

where A_φ is the set of invalid actions and A_w is the set of tasks which are selected under valid actions. Similarly, this policy is updated for each incoming task until all the tasks are allocated efficiently.

The novelty of the proposed approach lies in its reinforcement learning paradigm. Unlike conventional approaches, we characterize the optimal scheduling by incorporating multiple parameters into the reward mechanism, such as the CPU, RAM, bandwidth and storage resources. This leads to a robust reinforcement learning model. Furthermore, this configuration of the RL scheme is able to handle variations in job sizes by efficiently utilizing its resources. However, deep learning based approaches are considered computationally complex algorithms. In order to minimize the complexity, we adopt reinforcement learning, which doesn't require multiple trainings as the dataset size increases. In this work, we have trained the model on a small portion of the dataset, and the same model is amalgamated during the further training process, whereas traditional machine learning algorithms require multiple trainings, which affects the performance of the system. In the worst-case scenario (where the task is not finished in a given time), the system complexity is reported as O(N_0^2), where N_0 is the total number of offloaded tasks.

4 Results and discussion

This section presents the experimental analysis of the proposed energy-efficient task scheduling. We compare the obtained performance with various existing schemes. The proposed approach is implemented using MATLAB and Python 3 with PyTorch installed on a Windows platform. The system is equipped with 16 GB RAM and a 6 GB NVIDIA RTX 2060 graphics card. In the experiments, we have considered the discount factor as
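The feasibility test of Eq. (16), the Q-update of Eq. (15), the policy step of Eq. (17) and the reward rule of Eq. (22) can be sketched as follows. This is an illustrative Python sketch, not the paper's code: the data structures, the parameter values (γ = 0.9, α = 0.1) and the scalar w_j passed to `reward` are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

GAMMA = 0.9  # discount parameter gamma of Eq. (15); value assumed
ALPHA = 0.1  # learning rate for the sampled backup; not specified in the paper

@dataclass
class Demand:
    """Resource vector: K_i^CPU, K_i^RAM, K_i^BW, K_i^Storage of Eq. (16)."""
    cpu: float
    ram: float
    bw: float
    storage: float

def is_valid(task: Demand, vm_free: Demand) -> bool:
    """Eq. (16): each demanded resource must fit the VM's free capacity."""
    return (task.cpu <= vm_free.cpu and task.ram <= vm_free.ram
            and task.bw <= vm_free.bw and task.storage <= vm_free.storage)

def reward(task: Demand, vm_free: Demand, w_j: float) -> float:
    """Eq. (22): w_j for a valid action, zero for an invalid one."""
    return w_j if is_valid(task, vm_free) else 0.0

Q = defaultdict(float)  # Q[(state, action)] -> estimated cost-to-go

def q_update(s, a, cost, s_next, actions):
    """Sampled backup of Eq. (15): pull Q(s, a) toward
    cost + gamma * min_a' Q(s', a')."""
    best_next = min(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (cost + GAMMA * best_next - Q[(s, a)])

def greedy_action(s, actions):
    """Policy improvement of Eq. (17): choose the minimal-Q action."""
    return min(actions, key=lambda a: Q[(s, a)])
```

In this cost-minimising formulation the "best" next action is the one with the smallest Q value, mirroring the min operators of Eqs. (14), (15) and (17).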
contains 20%, 40%, 30%, 4% and 6% portions of the total jobs.

In this experiment, the total number of jobs for the small workload is 15,000–55,000, for the medium workload 59,000–99,000, for the large workload 101,000–135,000, for the extra-large workload 150,000–337,500, and for the huge workload 525,000–900,000.

In this work, we divide the dataset into two groups: a regular-size dataset and a big-size dataset. For each group, we measured the performance in terms of makespan, throughput and resource utilization for a fixed and a varied number of VMs.

Table 3 Job description

Task group   Number of tasks
Regular      100, 200, 300, 400, 500, 600
Big          700, 800, 900, 1000, 1100
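The three metrics above can be computed from a completed schedule as in the minimal sketch below; the function names and the illustrative input values are assumptions, not the paper's evaluation code.

```python
def makespan(finish_times):
    """Makespan: completion time of the last finishing task."""
    return max(finish_times)

def throughput(finish_times):
    """Throughput: completed tasks per unit of schedule length."""
    return len(finish_times) / makespan(finish_times)

def resource_utilization(busy_time_per_vm, span):
    """Mean fraction of the schedule during which each VM was busy."""
    return sum(b / span for b in busy_time_per_vm) / len(busy_time_per_vm)
```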
4.2 Experimental analysis
Table 6 Resource performance for regular size dataset and fixed VMs
Jobs PSO [28] MVO [28] EMVO [28] Proposed
Fig. 8 Average throughput performance for varied virtual machines

Table 10 Resource utilization performance for regular size dataset and varied VMs

Jobs   PSO [28]   MVO [28]   EMVO [28]   Proposed
100    90         90         90          93.58
200    95         95         95          96.10
300    94.67      96.66      96.66       97.20
400    96.50      97.50      97.50       98.11
500    97.60      98         98          98.5
600    97.67      98.33      98.33       98.90

4.2.2 Performance analysis for big-size dataset

In this subsection, we present the experimental analysis for the big-size dataset and measure the performance for a fixed and a varied number of virtual machines. As in the previous experiment, we measure the performance in terms of resource utilization, makespan and throughput.

• Performance analysis for fixed VMs: In this experiment, we have considered the task group which contains 700 to 1000 tasks. These tasks are tackled by a fixed number of virtual machines, which is set to 100.
Fig. 10 Makespan performance for varied number of jobs for fixed VM (100 VMs)

Fig. 11 Throughput performance for varied number of jobs for fixed VM (100 VMs)

Table 12 Throughput performance for big size data and fixed VMs

Jobs   PSO [28]   MVO [28]   EMVO [28]   Proposed
700    72         77         85          92
800    69         77         86          94.5
900    66         78         93          95.85
1000   70         78         86          95.8

Table 13 Resource utilization performance for big size data and fixed VMs

Jobs   PSO [28]   MVO [28]   EMVO [28]   Proposed
700    98.4       98.6       99          99
800    98.6       99         99          99.2
900    98         99         98.80       99.1
1000   98.6       99         99          99
Table 15 Makespan performance for big size task and varied VMs
Tasks PSO [28] MVO [28] EMVO [28] Proposed
In this experiment, we obtained the average energy consumption as 4.48 J, 6.48 J, 5.7 J, 5.22 J, 4.58 J, 5.14 J, 5.34 J, 5.24 J, 4.34 J and 3 J by using MCT, SJFP, LJFP, Min–Min, Max–Min, PSO, SJFP-PSO, LJFP-PSO, MCT-PSO, and the proposed approach, respectively. Similarly, we measure the performance for a varied number of virtual machines, as depicted in Fig. 17.

The average energy consumption is obtained as 3.64 J, 5.48 J, 4.81 J, 4.44 J, 3.88 J, 3.56 J, 5.06 J, 4.46 J, 3.36 J, and 2.14 J by using MCT, SJFP, LJFP, Min–Min, Max–Min, PSO, SJFP-PSO, LJFP-PSO, MCT-PSO, and the proposed approach for 40, 80, 120, 160, and 200 virtual machines, respectively. The experimental results show that, as the number of tasks increases along with the VMs, the proposed approach balances the load and utilizes resources efficiently. Moreover, this efficient utilization of resources and scheduling helps to minimize the energy consumption.

Fig. 15 Resource utilization for big-size task and varied VMs
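The power notation in Table 2 (P_static, P_max, the ratio α and the utilization u) suggests the standard linear host power model used in the energy-aware cloud literature (cf. Beloglazov et al. [35]); the sketch below assumes that form, with made-up parameter values rather than the paper's experimental settings.

```python
def host_power(u, p_max=250.0, alpha=0.7):
    """Linear host power model: P(u) = alpha*P_max + (1 - alpha)*P_max*u,
    where alpha is the static fraction of peak power and u is the CPU
    utilization in [0, 1]. Parameter values here are assumptions."""
    return alpha * p_max + (1.0 - alpha) * p_max * u

def energy_joules(u, duration_s, p_max=250.0, alpha=0.7):
    """Energy in joules for a host held at utilization u for duration_s."""
    return host_power(u, p_max, alpha) * duration_s
```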
In these experiments, we obtained improved performance in makespan, throughput, resource utilization and energy consumption. The first two parameters, i.e., makespan and throughput, satisfy the faster-computing scenario, which leads to completing the task within the given deadline. The efficient task completion helps to minimize the energy consumption. Similarly, the better resource utilization reduces resource wastage, which leads towards improved Quality of Service. Thus, the proposed approach is capable of reducing the energy consumption and improves the overall performance of the cloud computing model.

5 Conclusion

In this work, we have focused on the energy consumption and task scheduling related issues of cloud computing. Task scheduling plays an important role in this field, as it helps to obtain efficient task allocation and minimizes the energy consumption. Several techniques have been presented in this field, such as soft computing, optimization and machine learning, but the existing techniques suffer from various challenges. Thus, to overcome these issues, we present a novel task scheduling scheme which adopts reinforcement learning, with a new solution for the reward mechanism and the policy design. The experimental study shows that the proposed approach achieves better performance when compared with existing task scheduling algorithms.

Funding None.

Data Availability The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Declarations

Conflict of interest None.

References

1. Pradeep, K., Jacob, T.P.: A hybrid approach for task scheduling using the cuckoo and harmony search in cloud computing environment. Wirel. Pers. Commun. 101(4), 2287–2311 (2018)
2. Ebadifard, F., Babamir, S.M.: A PSO-based task scheduling algorithm improved using a load-balancing technique for the cloud computing environment. Concurr. Comput. 30(12), e4368 (2018)
3. Singh, P., Dutta, M., Aggarwal, N.: A review of task scheduling based on meta-heuristics approach in cloud computing. Knowl. Inf. Syst. 52(1), 1–51 (2017)
4. Madni, S.H.H., Abd Latiff, M.S., Abdullahi, M., Abdulhamid, S.I.M., Usman, M.J.: Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment. PLoS ONE 12(5), e0176321 (2017)
5. Shafiq, D.A., Jhanjhi, N.Z., Abdullah, A., Alzain, M.A.: A load balancing algorithm for the data centres to optimize cloud computing applications. IEEE Access 9, 41731–41744 (2021)
6. Chhabra, A., Singh, G., Kahlon, K.S.: Multi-criteria HPC task scheduling on IaaS cloud infrastructures using meta-heuristics. Clust. Comput. 24(2), 885–918 (2021)
7. Gani, A., Nayeem, G.M., Shiraz, M., Sookhak, M., Whaiduzzaman, M., Khan, S.: A review on interworking and mobility techniques for seamless connectivity in mobile cloud computing. J. Netw. Comput. Appl. 43, 84–102 (2014)
8. Ab-Rahman, N.H., Choo, K.K.R.: A survey of information security incident handling in the cloud. Comput. Secur. 49, 45–69 (2015)
9. Khan, S., Ahmad, E., Shiraz, M., Gani, A., Wahab, A.W.A., Bagiwa, M.A.: Forensic challenges in mobile cloud computing. In: 2014 International Conference on Computer, Communications, and Control Technology (I4CT). IEEE (2014)
10. Iqbal, S., Kiah, M.L.M., Dhaghighi, B., Hussain, M., Khan, S., Khan, M.K., et al.: On cloud security attacks: a taxonomy and intrusion detection and prevention as a service. J. Netw. Comput. Appl. 74, 98–120 (2016)
11. Han, S., Min, S., Lee, H.: Energy efficient VM scheduling for big data processing in cloud computing environments. J. Amb. Intell. Hum. Comput. 14, 1–10 (2019)
12. Kurp, P.: Green computing. Commun. ACM 51(10), 11–13 (2008)
13. https://www.computerworld.com/article/3089073/cloud-computing-slows-energy-demand-us-says.html
14. Zhang, J., Yu, F.R., Wang, S., Huang, T., Liu, Z., Liu, Y.: Load balancing in data center networks: a survey. IEEE Commun. Surv. Tutor. 20(3), 2324–2352 (2018)
15. Afzal, S., Kavitha, G.: Load balancing in cloud computing: a hierarchical taxonomical classification. J. Cloud Comput. 8(1), 1–24 (2019)
16. Arunarani, A.R., Manjula, D., Sugumaran, V.: Task scheduling techniques in cloud computing: a literature survey. Futur. Gener. Comput. Syst. 91, 407–415 (2019)
17. Alworafi, M.A., Dhari, A., El-Booz, S.A., Nasr, A.A., Arpitha, A., Mallappa, S.: An enhanced task scheduling in cloud computing based on hybrid approach. In: Data Analytics and Learning, pp. 11–25. Springer, Singapore (2019)
18. Liu, L., Qiu, Z.: A survey on virtual machine scheduling in cloud computing. In: 2016 2nd IEEE International Conference on Computer and Communications (ICCC), pp. 2717–2721. IEEE (2016)
19. Zakarya, M.: An extended energy-aware cost recovery approach for virtual machine migration. IEEE Syst. J. 13(2), 1466–1477 (2018)
20. Alkayal, E.S., Jennings, N.R., Abulkhair, M.F.: Survey of task scheduling in cloud computing based on particle swarm optimization. In: 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–6. IEEE (2017)
21. Tong, Z., Deng, X., Chen, H., Mei, J., Liu, H.: QL-HEFT: a novel machine learning scheduling scheme based on cloud computing environment. Neural Comput. Appl. 15, 1–18 (2019)
22. Sharma, M., Garg, R.: An artificial neural network based approach for energy efficient task scheduling in cloud data centers. Sustain. Comput. 26, 100373 (2020)
23. Rjoub, G., Bentahar, J., Abdel Wahab, O., Saleh Bataineh, A.: Deep and reinforcement learning for automated task scheduling in large-scale cloud computing systems. Concurr. Comput. 15, 5919 (2020)
24. Mansouri, N., Zade, B.M.H., Javidi, M.M.: Hybrid task scheduling strategy for cloud computing by modified particle swarm optimization and fuzzy theory. Comput. Ind. Eng. 130, 597–633 (2019)
25. Negi, S., Rauthan, M.M.S., Vaisla, K.S., Panwar, N.: CMODLB: an efficient load balancing approach in cloud computing environment. J. Supercomput. 12, 1–53 (2021)
26. Sulaiman, M., Halim, Z., Lebbah, M., Waqas, M., Tu, S.: An evolutionary computing-based efficient hybrid task scheduling approach for heterogeneous computing environment. J. Grid Comput. 19(1), 1–31 (2021)
27. https://data.mendeley.com/datasets/b7bp6xhrcd/1
28. Shukri, S.E., Al-Sayyed, R., Hudaib, A., Mirjalili, S.: Enhanced multi-verse optimizer for task scheduling in cloud computing environments. Expert Syst. Appl. 15, 114230 (2020). https://doi.org/10.1016/j.eswa.2020.114230
29. Alsaidy, S.A., Abbood, A.D., Sahib, M.A.: Heuristic initialization of PSO task scheduling algorithm in cloud computing. J. King Saud Univ. Comput. Inform. Sci. (2020)
30. Ding, D., Fan, X., Zhao, Y., Kang, K., Yin, Q., Zeng, J.: Q-learning based dynamic task scheduling for energy-efficient cloud computing. Futur. Gener. Comput. Syst. 108, 361–371 (2020)
31. Hoseiny, F., Azizi, S., Shojafar, M., Tafazolli, R.: Joint QoS-aware and cost-efficient task scheduling for fog-cloud resources in a volunteer computing system. ACM Trans. Internet Technol. 21(4), 1–21 (2021)
32. Abualigah, L., Diabat, A.: A novel hybrid antlion optimization algorithm for multi-objective task scheduling problems in cloud computing environments. Clust. Comput. 24(1), 205–223 (2021)
33. Hoseiny, F., Azizi, S., Shojafar, M., Ahmadiazar, F., Tafazolli, R.: PGA: a priority-aware genetic algorithm for task scheduling in heterogeneous fog-cloud computing. In: IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1–6. IEEE (2021)
34. Calzarossa, M.C., Della Vedova, M.L., Massari, L., Nebbione, G., Tessera, D.: Multi-objective optimization of deadline and budget-aware workflow scheduling in uncertain clouds. IEEE Access 9, 89891–89905 (2021)
35. Beloglazov, A., Buyya, R., Lee, Y.C., Zomaya, A.: A taxonomy and survey of energy-efficient data centers and cloud computing systems. In: Advances in Computers, vol. 82, pp. 47–111. Elsevier (2011)

Engineering, Dr. Ambedkar Institute of Technology, Bengaluru, India. His research interests include power optimization in computing systems, embedded systems design, real-time operating systems, and communication systems.

G. V. Jayaramaiah was born in Tumkur, Karnataka, India. He received his Ph.D. degree from the Indian Institute of Technology, Bombay, India in 2008. Since then, in his vast academic experience, he has been contributing to technical education in most of its capacities, with a teaching experience of 30 years. He is currently serving as Professor and Head in the Department of Electrical and Electronics Engineering, Dr. Ambedkar Institute of Technology, Bengaluru, Karnataka, India. His research interests include power electronics, renewable energy and embedded systems. He has published many research papers in his areas of interest. He is a lifetime member of FIE (I), MISTE, and IEEE.

Chandrapal Singh received his M.Tech. degree in Electronics and Communication Engineering from SVIT (VTU), Bangalore, Karnataka, India in 2014. He has published over twelve research papers in journals and in international and national conferences. His research interests include almost all aspects of networks, such as wireless communications, MIMO systems, underwater networks and cloud computing. He is currently working as a Research Analyst in the R&D Section of Xsys Softech, Bengaluru, India.