Abstract—Industry 4.0 has automated the entire manufacturing sector (including technologies and processes) by adopting the Internet of Things and cloud computing. To handle the workflows from industrial cyber-physical systems, more and more data centers have been built across the globe to serve the growing needs of computing and storage. This has led to an enormous increase in energy usage by cloud data centers, which is not only a financial burden but also increases their carbon footprint. The private software-defined wide area network (SDWAN) connects a cloud provider's data centers across the planet. This gives the opportunity to develop new scheduling strategies to manage the cloud provider's workload in a more energy-efficient manner. In this context, this article addresses the problem of scheduling data-driven industrial workflow applications over a set of private SDWAN-connected data centers in an energy-efficient manner while managing the tradeoff with a cloud provider's revenue. Our proposed algorithm aims to minimize the cloud provider's revenue loss and the usage of nonrenewable energy by utilizing real-world electricity prices together with the availability of green energy at different cloud data centers, where the energy consumption consists of running applications over multiple data centers and transferring the data among them through the SDWAN. The evaluation shows that our proposed method can increase the usage of green energy for the execution of industrial workflows by up to 3× with a slight increase in cost when compared to cost-based workflow scheduling methods.

Index Terms—Big data, green energy, industrial clouds, industrial workflow applications, software defined networking.

Manuscript received September 2, 2020; revised December 15, 2020; accepted December 15, 2020. Date of publication December 18, 2020; date of current version May 3, 2021. This work was supported in part by the PACE Project EP/T021985/1, in part by the SUPER Project EP/R033293/1, in part by the National Natural Science Foundation of China under Grant 62072408, and in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LY20F020030. Paper no. TII-20-4197. (Corresponding author: Gagangeet Singh Aujla.)

Zhenyu Wen and Deepak Puthal are with the School of Computing, Newcastle University, NE1 7RU Newcastle upon Tyne, U.K. (e-mail: wenluke427@gmail.com; dputhal88@gmail.com).
Saurabh Garg is with the University of Tasmania, Hobart TAS 146, Australia (e-mail: saurabh.kr.garg@gmail.com).
Gagangeet Singh Aujla is with the Department of Computer Science, Durham University, DH1 3LE Durham, U.K. (e-mail: gagi_aujla82@yahoo.com).
Khaled Alwasel is with the School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, U.K., and also with the College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia (e-mail: kalwasel@gmail.com).
Schahram Dustdar is with the Computer Science (Informatics), TU Wien, 1040 Vienna, Austria (e-mail: dustdar@dsg.tuwien.ac.at).
Albert Y. Zomaya is with the School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia (e-mail: albert.zomaya@sydney.edu.au).
Rajiv Ranjan is with the School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, U.K., and also with the China University of Geosciences, Wuhan 430074, China (e-mail: rranjans@gmail.com).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2020.3045690.
Digital Object Identifier 10.1109/TII.2020.3045690

I. INTRODUCTION

THE INDUSTRY 4.0 revolution helps to gather data in real time and then analyze it at remote clouds to improve industrial processes and detect faulty operations. This helps to realize future operating problems in advance and make the industrial processes more stringent. This transition triggers high-quality production that further improves industrial productivity, economics, and workflows. However, it also makes industrial undertakings more data-driven, which increases data sharing across multiple sites and even across industrial (factory) boundaries. As a result, the dependence on cloud computing will increase through the deployment of machine data and functionality over remote clouds. Eventually, this leads to an increase in the number of industrial workflow applications running at cloud data centers. As more and more industrial applications harness cloud resources, cloud computing providers in general maintain very large infrastructures to satisfy the needs of their customers and industrial workflow applications.

It is quite evident that the amount of energy consumed by the world's data centers—the repositories of billions of gigabytes of information—will increase exponentially over the next decade, putting an enormous strain on energy sources. In 2010, electricity usage in global data centers accounted for about 1.3% of total electricity usage worldwide [1]. Data centers now consume about 3% of the global electricity supply. Clearly, to operate such large infrastructure, a very large amount of electricity is required, depending on the size of the data centers. This also contributes significantly to their high operational cost. According to a report published by the European Union,
1551-3203 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Exeter. Downloaded on May 27,2021 at 04:20:03 UTC from IEEE Xplore. Restrictions apply.
5646 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 8, AUGUST 2021
a decrease in emission volume of 15–30% is required before 2020 to keep the global temperature increase below 2 °C. Thus, high energy consumption and the carbon footprint of cloud data center infrastructures have become key environmental concerns and have immense potential to deal a hefty blow to efforts to contain global warming. The majority of cloud providers (such as Amazon, Apple, and Google) offer a multicloud environment (including industrial clouds) that is geo-distributed across different countries and connected via software-defined networking (SDN) [2]. Some of these data centers are designed to utilize green energy provided by local providers.1 This provides an opportunity for cloud providers to schedule their workloads to the data centers that utilize more renewable energy.

A. Research Question

How to execute an industrial workflow application across multiple data centers via a private SDWAN? To execute an industrial workflow task on the cloud, we need sufficient computing resources provided by the cloud provider, the codebase of the task belonging to the user, and data provided by the user or generated by upstream tasks. To this end, the workflow scheduler has to handle three factors at the same time: resource provisioning, task provisioning, and data provisioning. Wen et al. [3], [4] developed schedulers to run scientific workflows over multiple data centers. However, they do not consider software-defined data centers, which offer more flexibility for designing new green-energy scheduling algorithms. WARM [5] aims at scheduling tasks in an SDN-based cloud data center while maximizing the revenue of the cloud provider. It optimizes the latency of tasks both in the network and in VMs. However, this solution is not suitable for multicloud scenarios.

How to optimize for energy efficiency while avoiding SLA violations? While cloud computing aims to optimize the use of hosted ICT resources, cloud providers do not (yet) have an effective solution for simultaneously optimizing energy consumption and SLAs (e.g., deadline, processing cost), especially for big data-driven scientific and industrial workflow application scheduling in software-defined multicloud environments. One of the major reasons for this state of affairs is that cloud providers operate multiple large data centers distributed across multiple locations. Depending on the location of the data center and the location of the application owner, the scheduling process for cloud applications has to automate cloud data center selection and, in doing so, ensure that SLA (e.g., application hosting cost, application run-time performance) and energy (e.g., total electricity bills, sustainability goals) requirements are met at the same time, which are often conflicting. When selecting ICT resources (e.g., virtual machines, containers, storage space) from multiple data centers, cloud providers must consider a heterogeneous set of criteria and complex dependencies across multiple layers (e.g., application level, data center level), which is impossible to resolve manually.

There is a substantial amount of related work addressing the improvement of the carbon footprint of data centers by managing customer workloads at different levels such as storage, computation, and network [6]–[8]. However, most of these solutions are not directly applicable in the context of scientific or industrial workflow applications, which is the focus of this article. For each application workload and execution profile, a different strategy is needed to minimize energy usage while optimizing SLAs. Scientific and industrial workflows are among the most complex applications, where several tasks have to be executed in a synchronized manner to achieve the required quality of service [9]. Communication between different tasks makes matters worse, as the energy usage of the network also needs to be considered along with other constraints. Giacobbe et al. [10] proposed using multiple data center locations to improve energy cost and also minimize environmental impact. However, these solutions are designed for simple applications that consist of tasks that can run independently. Also, the existing solutions do not take advantage of the SDN network to further improve data provisioning strategies.

In order to minimize energy usage while avoiding SLA violations of workflow applications, we need new system and algorithmic solutions that can consider several factors, including the dependency between different tasks and the data transfer cost in the private SDWAN, in addition to the energy cost and carbon footprint associated with application execution. To this end, we propose an adaptive genetic algorithm-based mechanism to schedule workflow applications considering application users' requirements such as deadline and budget. To minimize the carbon footprint [11], the proposed algorithm selects the schedule that favors the data centers where green energy is utilized. However, as green energy availability varies with time [12], our algorithm also considers resources from different data centers. Moreover, from the user perspective, execution cost and minimum execution time are also important; our algorithm therefore considers the tradeoff between execution cost, usage of green energy, and execution time. In particular, our proposed algorithm minimizes execution cost while selecting solutions with minimum carbon footprint for the overall schedule by using multiple data centers with more green energy usage (Table III highlights the novelty of our work). The contributions of this article are as follows.
1) A new SDN-based workflow broker (SDNWB) to deploy industrial workflow tasks across multiple software-defined data centers while automating the task provisioning, data provisioning, and resource provisioning.
2) An adaptive genetic algorithm (GA) and associated SDNWB for green scheduling of industrial workflow applications.
3) Tradeoff analysis of different factors such as energy cost, green energy availability, and workflow requirements based on real data.
4) Extensive experimental evaluation to study the feasibility of the proposed scheduling algorithm and architecture.

1[Online]. Available: www.google.com/green

II. SYSTEM MODEL

Fig. 1 presents a high-level system model with the components of the SDNWB utilized by public cloud providers for executing an industrial workflow application.
WEN et al.: RUNNING INDUSTRIAL WORKFLOW APPLICATIONS IN A SOFTWARE-DEFINED MULTI-CLOUD ENVIRONMENT 5647
The system S, in this article, consists of a set of software-defined data centers owned by a provider such as Amazon EC2: S = {R_1, R_2, ..., R_k} ∪ D, where R_i = {vm_1, vm_2, ..., vm_k} ∪ d_i, 1 ≤ i ≤ k. In a data center R_i, vm_i is a virtual machine for hosting application services (e.g., a workflow task) and d_i is the cloud-specific data repository (such as S3 in the case of the Amazon cloud).

A public cloud provider utilizes the Workflow Orchestrator, which deploys the industrial workflow application across multiple data centers, and the SDN Controller, which optimizes the data transfer among the data centers while executing a workflow application, such that the application can be executed with minimal execution cost and carbon footprint. In particular, the users submit their industrial workflows with all the executables and information such as execution requirements, task description, and the desired security requirements to our broker. The Workflow Orchestrator is responsible for matching different workflow tasks to different data centers based on their electricity prices and usage of green energy. Based on this planning, the Workflow Orchestrator interacts with each data center to prepare virtual machines to execute the workflow tasks in a defined order. The SDN Controller manages the data transfer between different tasks during execution by configuring the flow tables of the related SDN switches.

A. Cost Model

We assume that each dc_i has three types of VM: small(VM), medium(VM), and large(VM). The prices of these three VMs satisfy 4 ∗ Price(small(VM)) = 2 ∗ Price(medium(VM)) = Price(large(VM)) = 4M. In this article, we assume that the VMs are charged according to how long they are used; therefore, the cost of running small(VM) for four minutes is 4M.

Based on the assumptions abovementioned, the cost of executing v_i over vm_i is

Cost(v_i, vm_i) = (T_l(vm_i) + T_e(v_i, vm_i) + T_t(v_i, vm_i)) ∗ Price(vm_i)    (1)

where T_l(vm_i) is the time required to launch vm_i ∈ {small(VM), medium(VM), large(VM)}, T_e(v_i, vm_i) represents the time required to execute v_i over vm_i, and T_t(v_i, vm_i) is the time required to transfer v_i's input data to vm_i. However, if all the input data are already in the same VM, the transfer time is equal to 0, i.e., T_t(v_i, vm_i) = 0. The models for calculating these times will be detailed in the following section. Based on (1), we can identify the total cost of executing a workflow over a set of VMs that are allocated over different data centers as follows:

T Cost(λ, VM) = Σ_{v_i∈λ, VM_i∈VM} Cost(v_i, VM_i).    (2)

B. Performance Model

As mentioned above, the makespan (or performance) of executing an industrial workflow application over different VMs that are deployed over different data centers includes three parts: the time to launch the VM (T_l), the time to transfer input data to the destination VM (T_t), and the time to execute the service (T_e). In order to model this, we assume that the launching time of each type of VM is the same and equal to T_l. Next, the network throughput (or bandwidth) is tp, 2tp, and 3tp corresponding to small(VM), medium(VM), and large(VM), denoted tp_vmi. However, the data transmission rate is affected not only by the throughput of the deployed VMs, but also by the geographical location of the VMs.

Same data center: A workflow includes two tasks v1 and v2, where the data are generated by v1 and transferred to v2. They are allocated to two different VMs, VM_1 and VM_2, and the VMs are deployed in the same data center. Thus, the throughput between v1 and v2 is tp_{v1→v2} = min(tp_vm1, tp_vm2), and the time required to transfer P units of data from v1 to v2 is

T_t(v1, v2) = P / min(tp_vm1, tp_vm2).

Different data centers: Now suppose the host VMs of v1 and v2 are allocated in different data centers; for instance, vm_1, which hosts v1, is deployed on dc_1, and vm_2, which hosts v2, is deployed on dc_2. The time to transfer P units of data from v1 to v2 is

T_t(v1, v2) = P / min(tp_vm1, tp_vm2) + P ∗ ϕ ∗ H(dc_1, dc_2)

where ϕ is the average latency incurred at each hop and H(dc_1, dc_2) is the number of core network hops between the two data centers.
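As a rough illustration, the cost model in (1)–(2) and the transfer-time model above can be sketched as follows; the price unit M, the throughput values, and the hop/latency numbers are hypothetical placeholders chosen for illustration, not values from the article.

```python
# Sketch of the cost model (1)-(2) and the transfer-time model.
# The VM prices follow the stated 4*small = 2*medium = large ratio;
# M and the throughputs tp, 2tp, 3tp are illustrative assumptions.

M = 1.0  # price of small(VM) per minute (hypothetical unit)

VM_TYPES = {
    "small":  {"price": 1 * M, "throughput": 1.0},  # tp
    "medium": {"price": 2 * M, "throughput": 2.0},  # 2tp
    "large":  {"price": 4 * M, "throughput": 3.0},  # 3tp
}

def transfer_time(data_size, vm_src, vm_dst, hops=0, phi=0.0):
    """T_t(v1, v2) = P / min(tp_vm1, tp_vm2), plus P * phi * H(dc1, dc2)
    when the two VMs sit in different data centers (hops > 0)."""
    tp = min(VM_TYPES[vm_src]["throughput"], VM_TYPES[vm_dst]["throughput"])
    return data_size / tp + data_size * phi * hops

def task_cost(vm_type, t_launch, t_exec, t_transfer):
    """Eq. (1): Cost(v_i, vm_i) = (T_l + T_e + T_t) * Price(vm_i)."""
    return (t_launch + t_exec + t_transfer) * VM_TYPES[vm_type]["price"]

def total_cost(deployment):
    """Eq. (2): T Cost is the sum of per-task costs over the deployment."""
    return sum(task_cost(**task) for task in deployment)
```

For example, moving 6 units of data between a small and a medium VM in the same data center takes 6/min(tp, 2tp) = 6 time units; adding two inter-data-center hops with ϕ = 0.5 adds another 6.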
The execution time of each service depends on the performance of the host VM; in general, a larger VM has better performance. The execution time of v_i hosted on vm_i is represented as T_e(v_i, vm_i). Thus, the makespan of executing λ is

Makespan(V, VM) = Σ_{v_i,v_j∈V⊂λ; vm_i,vm_j∈VM; i≠j} [T_l(v_i, vm_i) + T_t(v_i, v_j) + T_e(v_i, vm_i) + T_t(v_i, vm_i)]    (3)

where V is the set of services that belong to the critical path.

C. Energy Model

There are two key operations in the execution of an industrial workflow application where energy is spent: a) data processing or computation, and b) communication. For computation, in general the power consumption of a server varies as a function of its utilization level. If a server is idle, the power saving mechanism lowers the frequency of the CPU and, thus, only a small proportion (α) of peak power is utilized. If ρ is the peak power consumption and u is the utilization of resources, then the power consumption of a host in the data center is

P_host^comp = α ∗ ρ + (1 − α) ∗ ρ ∗ u.    (4)

A host may run several VMs at a time; thus, this power will be spent by each VM according to its usage of resources. Even though a host may have several resources such as CPU cores, disk, memory, and other elements, we assume that u_vmi indicates the aggregate resources utilized by each VM i hosted on the server. The power consumption of the server (host) is

P_host^comp(VM) = Σ_{vm_i∈VM} (α ∗ ρ + (1 − α) ∗ ρ ∗ u_vmi).    (5)

Let one VM i be transmitting data to another VM j, and let H(i, j) be the number of hops or routers/switches between these VMs. For communication, the power consumption depends on the bandwidth used by a VM in communication and the number of routers through which the data need to be transmitted to reach the destination VM. If B is the total bandwidth available and ξ_router is the power consumed by a router, the power consumption for the communication is

P^comm(vm_i, vm_j) = ξ_router ∗ H(i, j) ∗ (tp_{v_i→v_j} / B).    (6)

Therefore, the total energy consumption can be formalized as

T Energy(λ, VM) = P_host^comp(VM) + Σ_{vm_i∈VM, vm_j∈VM} P^comm(vm_i, vm_j).    (7)

D. Electricity Cost

The electricity cost is caused by data processing and communication. We compute the electricity cost of data processing by multiplying the local electricity price with the energy consumed by the corresponding VM, as shown in (8), where Eprice(vm_i) represents the electricity price of the data center where vm_i is deployed:

E_host^comp(VM) = Σ_{vm_i∈VM} [(α ∗ ρ + (1 − α) ∗ ρ ∗ u_vmi) ∗ Eprice(vm_i)].    (8)

Regarding the electricity cost caused by data exchange, we assume that the electricity price is a constant value Ω for each hop. As a result, the total electricity cost for running a workflow with deployment solution λ is formalized in

T Ele(λ, VM) = E_host^comp(VM) + Σ_{vm_i∈VM, vm_j∈VM} P^comm(vm_i, vm_j) ∗ Ω.    (9)

III. PROPOSED ENERGY-AWARE INDUSTRIAL WORKFLOW SCHEDULING ALGORITHM: GREENGA

In this article, we aim to find an optimized solution that maximizes the proportion of renewable energy and minimizes the real electricity cost under deadline constraints from users. Therefore, this can be considered a dual-objective optimization problem.

Given the complexity of the problem, with multiple objective functions and constraints, it is not possible to find the solution to the scheduling problem in polynomial time. Thus, we adapted a well-known evolutionary algorithm (the genetic algorithm), which is known to find near-optimal solutions for scheduling workflow applications. Previously, the genetic algorithm has been applied to optimize the makespan of workflow applications; however, its applicability and performance have not been tested for optimizing different factors such as revenue, energy consumption, and carbon footprint. In our approach, we first converted this multiobjective problem into a single-objective optimization problem by multiplying the objectives; the resultant problem is formulated as

min (f(λ) ∗ g(λ))
s.t. f(λ) = T Ele(λ, VM)
     g(λ) = (1 − σ(λ)) T Energy(λ, VM)
     Makespan(V, VM) ≤ deadline
     T Cost(λ, VM) ≤ budget    (10)

where f represents the total monetary cost of running a given workflow with deployment solution λ, g indicates the nongreen energy consumption, and σ(λ) is a function that calculates the proportion of renewable energy consumption for deployment solution λ. Finally, deadline and budget are given by users as constraints.
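A minimal sketch of the power, energy, and electricity-cost models (4)–(9) and of the combined single-objective fitness in (10) follows; all parameter values and the `schedule` dictionary layout are assumptions made for illustration, and giving infeasible candidates infinite fitness is one common constraint-handling choice, not necessarily the article's.

```python
# Sketch of the power/energy models (4)-(7), the electricity cost (8)-(9),
# and the fitness f(lambda)*g(lambda) minimized in (10). Parameter values
# and the `schedule` dictionary layout are illustrative assumptions.

def host_power(alpha, rho, utilizations):
    """Eq. (5): sum over hosted VMs of alpha*rho + (1-alpha)*rho*u_vm."""
    return sum(alpha * rho + (1 - alpha) * rho * u for u in utilizations)

def comm_power(xi_router, hops, throughput, bandwidth):
    """Eq. (6): xi_router * H(i, j) * tp / B for one VM-to-VM link."""
    return xi_router * hops * throughput / bandwidth

def total_energy(p_comp, p_comm_links):
    """Eq. (7): computation power plus all communication terms."""
    return p_comp + sum(p_comm_links)

def electricity_cost(vm_power_price, p_comm_links, omega):
    """Eqs. (8)-(9): per-VM power weighted by the local electricity price,
    plus a constant price omega applied to each communication term."""
    e_comp = sum(power * price for power, price in vm_power_price)
    return e_comp + sum(p * omega for p in p_comm_links)

def fitness(schedule, deadline, budget):
    """Eq. (10): minimize f*g subject to deadline and budget. Infeasible
    candidates get infinite fitness so the GA discards them."""
    if schedule["makespan"] > deadline or schedule["cost"] > budget:
        return float("inf")
    f = schedule["electricity_cost"]                           # T Ele
    g = (1 - schedule["green_fraction"]) * schedule["energy"]  # nongreen energy
    return f * g
```

A schedule that is entirely green (green_fraction = 1) drives g, and hence the product, to zero, which is why the GA is pulled toward data centers with more renewable energy while the deadline and budget stay hard constraints.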
Fig. 3. Electricity cost for medium size workflow.
Fig. 6. Electricity cost versus number of generations.
Fig. 8. Energy consumption for CyberShake.
Fig. 9. Energy consumption for Montage.
Fig. 13. Proportion of the usage of renewable energy for the very large size workflow.
Fig. 14. Time saved compared with the given deadline.
The local optimum has a very high probability of causing the termination of the program by meeting the predefined termination condition, as described in Section III-B.

Fig. 10. Energy consumption for LIGO.

4) Performance/Makespan: The deadline is a hard constraint, which means that the execution time of each submitted workflow must be equal to or less than the specified deadline. Fig. 14 shows the time saving of the generated solutions compared with the user-given deadlines, where the Y-axis represents the ratio of the saved time to the given deadline (savedTime/deadline). The X-axis is the type of workflow, where "M EP", "L EP", and "VL EP" are the medium, large, and very large sizes of the "Epigenomics" workflow. The results show that all generated solutions can guarantee the given deadline.
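The Y-axis ratio used in Fig. 14 can be computed directly; the times below are hypothetical.

```python
def time_saving_ratio(makespan, deadline):
    """Ratio plotted on the Y-axis of Fig. 14: savedTime / deadline,
    where savedTime = deadline - makespan; a value >= 0 means the
    hard deadline constraint was met."""
    return (deadline - makespan) / deadline
```

For instance, a workflow finishing in 60 time units against a deadline of 100 saves 40% of the deadline.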
Fig. 15. Energy consumption for transferring medium size workflow data in small scale WAN.
Fig. 16. Energy consumption for transferring large size workflow data in small scale WAN.
Fig. 19. Energy consumption for transferring large size workflow data in large scale WAN.
Fig. 20. Energy consumption for transferring very large size workflow data in large scale WAN.
TABLE III
COMPARISON WITH OTHER RELATED WORK
traces such as the Microsoft Azure Dataset5 and Alibaba Cluster Data.6

5[Online]. Available: https://github.com/Azure/AzurePublicDataset
6[Online]. Available: https://github.com/alibaba/clusterdata

VII. RELATED WORK

A. Cost and Performance-Based Task Scheduling

To improve the performance of running a workflow application, Mao and Humphrey [23] introduced an autoscaling method to allocate workflow tasks to a set of VMs to meet deadline constraints. Malawski et al. [24] considered the monetary cost of running workflows over the cloud. They developed an algorithm to overcome the tradeoff between makespan and financial cost. A new algorithm was proposed in [17] that utilizes the idle time of provisioned resources and the surplus budget to scale up the execution of a workflow application and increase the chance of meeting deadlines. There are also some algorithms [25] considering security for running workflow applications on the cloud.

In the multicloud case, Yuan et al. [20] proposed an effective algorithm to schedule tasks across public and private clouds while minimizing the cost and keeping the delay within a boundary. However, this method is not suitable for scientific or industrial workflow applications. PANDA [26] was developed to schedule workflows across private and public clouds while finding the "best" tradeoff between performance and cost. Fard et al. [27] solved the tradeoff between monetary cost and completion time via a Pareto-optimal-based algorithm. However, none of them considered security and cloud availability changes. Security was considered in [28], which introduced a static method to optimize the deployment of a workflow application on multiple clouds by considering security, makespan, and monetary cost.

B. Green Scheduling

Various existing proposals [21], [22] suggested different solutions with respect to green scheduling in cloud computing. Catena and Tonellotto [29] proposed a predictive energy-saving online scheduling algorithm to reduce the energy consumption of distributed web search engines. Li et al. [30] proposed an energy model for edge and core clouds to estimate the energy consumption based on the number of IoT devices and the desired application QoS. There have been many works [6], [31]–[33] that have proposed techniques to improve cloud data center efficiency in terms of electricity usage and to decrease their carbon footprint. For instance, Aksanli et al. [31] proposed a database scheduling strategy, which predicts green energy availability to reduce the rescheduling of jobs. Goiri et al. [32] also predicted green energy availability to schedule MapReduce jobs. Deng et al. [33] proposed an online algorithm to minimize the operational cost of data centers by using multiple energy resources. Kaushik et al. [34] proposed an energy-saving cloud storage solution by dividing the storage structures into different zones based on power characteristics. Most of the abovementioned works focus on single data centers. Garg et al. [11] proposed a green cloud framework, which utilizes multiple clouds to improve energy consumption. However, the work does not give a mechanism to maximize the usage of green energy. Kiani et al. [35] shared aims similar to ours, i.e., to increase green energy usage and cut the cost of electricity across multiple data centers. They utilize the concept of decomposing the workload into green and brown; however, they focus on a simple workload consisting of individual tasks. Giacobbe et al. [10], [36] introduced approaches for migrating virtual machines among distributed federated clouds where costs [36] and environmental impact (using renewable energy along with the selection of data centers with lower PUE [10]) are taken into account. Yuan et al. [7] proposed an algorithm that focuses on ensuring the deadline of executing tasks on green data centers. These works consider simple computation resources at the level of generic VMs; no further investigation from this perspective is performed.

In summary, to the best of our knowledge, our proposed GA-based scheduling is the first work that focuses on increasing the usage of green energy and minimizing the electricity cost for workflow application execution in multiple cloud data centers. We also consider variable electricity costs across different cloud data centers. Table III provides a comparison between our work and the state of the art.

VIII. CONCLUSION

In recent years, several works attempted to develop mechanisms to efficiently execute industrial workflow applications in software-defined cloud environments with minimal cost. However, over the years, as the usage of cloud increases, concern about its carbon footprint has also become a critical topic of research. In this context, this article proposed an adaptive GA-based industrial workflow scheduling algorithm that utilizes multiple software-defined cloud data center resources not only to improve green energy usage but also to keep the cost of execution
to a minimum. The performance of our algorithm was evaluated [18] M. Malawski, G. Juve, E. Deelman, and J. Nabrzyski, “Algorithms for cost-
using real industrial workflow workload with different sizes and deadline-constrained provisioning for scientific workflow ensembles
in IAAS clouds,” Future Gener. Comput. Syst., vol. 48, pp. 1–18, 2015.
under various configurations of virtual machines. We compared [19] K. Alwasel et al., “IOTSIM-SDWAN: A simulation framework for in-
our algorithm with another GA-base algorithm that just opti- terconnecting distributed datacenters over software-defined wide area
mizes electricity cost. The experimental results clearly show network (SD-WAN),” J. Parallel Distrib. Comput., vol. 143, pp. 17–35,
2020.
that our proposed algorithm favors more green energy usage [20] H. Yuan, J. Bi, W. Tan, M. Zhou, B. H. Li, and J. Li, “TTSA: An effective
with expenditure similar to the base algorithm for large and very scheduling approach for delay bounded tasks in hybrid clouds,” IEEE
large size workflows. For smaller size workflows, with 10–20% Trans. Cybern., vol. 47, no. 11, pp. 3658–3668, Nov. 2017.
[21] G. S. Aujla and N. Kumar, “SDN-based energy management scheme for
increase in electricity cost, our algorithm can generate a schedule sustainability of data centers: An analysis on renewable energy sources
that uses almost 200% times more green energy. and electric vehicles participation,” J. Parallel Distrib. Comput., vol. 117,
In future, we will evaluate our proposed algorithm in real pp. 228–245, 2018.
[22] G. S. Aujla and N. Kumar, “Mensus: An efficient scheme for energy
cloud environments and integrate with workflow engines. We management with sustainability of cloud data centers in edge–cloud envi-
will also work on improving the algorithm by considering ronment,” Future Gener. Comput. Syst., vol. 86, pp. 1279–1300, 2018.
dynamic changes in green energy availability. [23] M. Mao and M. Humphrey, “Auto-scaling to minimize cost and meet ap-
Zhenyu Wen (Member, IEEE) received the M.Sc. and Ph.D. degrees in computer science from Newcastle University, Newcastle upon Tyne, U.K., in 2011 and 2016, respectively.
He is currently a Postdoc Researcher with the School of Computing, Newcastle University, Newcastle upon Tyne, U.K. His current research interests include multi-objective optimisation, crowdsourcing, AI, and cloud computing. For his contributions to the area of scalable data management for the Internet-of-Things, he was awarded the IEEE TCSC Award for Excellence in Scalable Computing (Early Career Researchers) in 2020.
Authorized licensed use limited to: University of Exeter. Downloaded on May 27,2021 at 04:20:03 UTC from IEEE Xplore. Restrictions apply.
5656 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 8, AUGUST 2021
Saurabh Garg received the five-year Integrated Master of Technology degree in mathematics and computing from the Indian Institute of Technology (IIT) Delhi, India, in 2006, and the Ph.D. degree in computer science from the University of Melbourne, Melbourne, Australia, in 2010.
He is one of the few Ph.D. students who completed the degree in less than three years. His doctoral thesis focused on devising novel and market-oriented meta-scheduling mechanisms for distributed systems under conditions of concurrent and conflicting resource demand. He gained about three years of industrial research experience while working at IBM Research Australia and India.

Gagangeet Singh Aujla (Senior Member, IEEE) received the bachelor's and master's degrees in computer science from the Punjab Technical University, Kapurthala, India, in 2003 and 2013, respectively, and the Ph.D. degree in computer science from Thapar University, Patiala, India, in 2018.
He is an Assistant Professor of Computer Science with Durham University, Durham, U.K. Before this, he was a Postdoc Research Associate with Newcastle University, Newcastle upon Tyne, U.K., a Research Associate with Thapar University, India, a Visiting Researcher with the University of Klagenfurt, Klagenfurt, Austria, and held various other academic positions. For his contributions to the area of scalable and sustainable computing, he was awarded the 2018 IEEE TCSC Outstanding Ph.D. Dissertation Award of Excellence.

Khaled Alwasel received the B.S. and M.S. degrees in information technology from Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA, and from Florida International University, Miami, FL, USA, in 2014 and 2015, respectively. He is currently working toward the Ph.D. degree in computer science with the School of Computing Science, Newcastle University, Newcastle upon Tyne, U.K.
His research interests include SDN, big data, IoT, and cloud computing.

Deepak Puthal (Senior Member, IEEE) received the Ph.D. degree in computer science and information systems from the University of Technology Sydney, Ultimo, Australia, in 2017.
He is an Assistant Professor with the School of Computing, Newcastle University, Newcastle upon Tyne, U.K., and an Honorary Fellow with the Faculty of Engineering and Information Technology, University of Technology Sydney, Australia. Before this position, he was a Lecturer with the University of Technology Sydney and a Graduate Research Fellow with Data61, CSIRO, Australia.
Dr. Puthal is the recipient of the 2019 IEEE ComSoc Best Young Researcher Award (for the EMEA Region), the 2018 IEEE Technical Committee on Scalable Computing Award for Excellence in Scalable Computing (Early Career Researcher), and the 2017 IEEE Distinguished Doctoral Dissertation Award.

Schahram Dustdar (Fellow, IEEE) received the M.Sc. and Ph.D. degrees in business informatics (Wirtschaftsinformatik) from the University of Linz, Austria, in 1990 and 1992, respectively.
He is a Full Professor of computer science and Head of the Distributed Systems Group, TU Wien, Austria. From 2004 to 2010, he was also Honorary Professor of Information Systems with the Department of Computing Science, University of Groningen (RuG), The Netherlands. From December 2016 until January 2017, he was a Visiting Professor with the University of Sevilla, Spain, and from January until June 2017 he was a Visiting Professor with UC Berkeley, USA.
Dr. Dustdar is the Editor-in-Chief of Computing (Springer), an Associate Editor for the IEEE TRANSACTIONS ON CLOUD COMPUTING, IEEE TRANSACTIONS ON SERVICES COMPUTING, ACM TRANSACTIONS ON THE WEB, and ACM TRANSACTIONS ON INTERNET TECHNOLOGY, and is on the editorial boards of IEEE INTERNET COMPUTING and IEEE COMPUTER.

Albert Y. Zomaya (Fellow, IEEE) received the B.S. degree in electrical engineering from Kuwait University, Kuwait, in 1987, and the Ph.D. degree in control engineering from Sheffield University, U.K., in 1999.
He is currently the Chair Professor of High Performance Computing and Networking with the School of Computer Science, University of Sydney. He is also the Director of the Centre for Distributed and High Performance Computing, which was established in late 2009. He was an Australian Research Council Professorial Fellow during 2010–2014, held the CISCO Systems Chair Professor of Internetworking during 2002–2007, and was also the Head of School during 2006–2007 in the same school.

Rajiv Ranjan (Senior Member, IEEE) received the Ph.D. degree in computer science and software engineering from the Department of Computer Science and Software Engineering, University of Melbourne, Melbourne, Australia, in 2009.
He is a Full Professor in computing science with Newcastle University, U.K. Before moving to Newcastle University, he was a Julius Fellow (2013–2015), Senior Research Scientist, and Project Leader in the Digital Productivity and Services Flagship of the Commonwealth Scientific and Industrial Research Organisation (CSIRO, the Australian Government's premier research agency). He was a Senior Research Associate (Lecturer level B) with the School of Computer Science and Engineering, University of New South Wales.