
International Journal of Cloud Applications and Computing

Volume 6 • Issue 4 • October-December 2016

A Customer-Oriented Task Scheduling for Heterogeneous Multi-Cloud Environment
Sohan Kumar Pande, Department of CSE, SUIIT, Burla, India
Sanjaya Kumar Panda, Department of CSE, Indian Institute of Technology (ISM), Dhanbad, India & Department of CSE
and IT, VSS University of Technology, Burla, India
Satyabrata Das, Department of CSE and IT, VSS University of Technology, Burla, India

ABSTRACT

Task scheduling is widely studied in various environments such as cluster, grid and cloud computing systems. Moreover, it is NP-Complete, as the optimization criterion is to minimize the overall processing time of all the tasks (i.e., makespan). However, minimizing the makespan does not equate to customer satisfaction. In this paper, the authors propose a customer-oriented task scheduling algorithm for the heterogeneous multi-cloud environment. The basic idea of this algorithm is to assign to each cloud a suitable task that takes minimum execution time. It then balances the makespan by inserting as many tasks as possible into the idle slots of each cloud. As a result, the customers get better services in minimum time. The authors simulate the proposed algorithm in a virtualized environment and compare the simulation results with a well-known algorithm, called cloud min-min scheduling. The results show the superiority of the proposed algorithm in terms of customer satisfaction and surplus customer expectation. The authors validate the results using two statistical techniques, namely the T-test and ANOVA.

Keywords
Cloud Computing, Customer Satisfaction, Makespan, Min-Min, Multi-Cloud, Surplus Customer Expectation,
T-Test, Task Scheduling

1. INTRODUCTION

Cloud computing provides various services such as infrastructure, platform and software as a service
over the Internet (Buyya, Yeo, Venugopal et al., 2009; Durao, Carvalho, Fonseka, & Garcia, 2014).
These services are requested by the customers as and when required. In general, the customer
requests are represented in the form of applications/jobs/tasks (Tsai, Fang, & Chou, 2013; Li et al.,
2012; Panda, & Jana, 2015; Panda, & Jana, 2016). On the contrary, the services are provisioned in
the form of various resources such as network, storage, hardware, software and many more (Tsai,
Fang, & Chou, 2013). In order to provide the services, the customer requests are mapped to the pool of resources (Li et al., 2012). Therefore, efficient mapping of customer requests to the resources (referred to as task scheduling) is a challenging problem, which has been shown to be NP-Complete (Braun et al., 2001; Maheswaran, Ali, Siegel et al., 1999; Mokotoff, 1999).
Task scheduling is the ordering of n customers' tasks onto the m resources or clouds such that the overall processing time (i.e., makespan) is minimized (Mokotoff, 1999). Note that n >> m. Here, the requirements of the customers vary with respect to the number of resources, cost, deadline, etc. On the contrary, the resources vary with respect to processing speed, capacity, bandwidth, service

DOI: 10.4018/IJCAC.2016100101

Copyright © 2016, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


level, etc. Therefore, the performance of a customer task differs from one resource to another. This introduces the problem of resource selection for each task in a heterogeneous environment such as cloud computing systems, in which the primary objective is to minimize the makespan. However, minimization of makespan does not necessarily mean customer satisfaction, as some tasks dominate the execution of other tasks. Therefore, task scheduling must emphasize customer satisfaction. More specifically, it must focus on the minimization of the individual makespan of the tasks rather than the overall makespan of all the tasks.
In this paper, we present the following task scheduling problem. Given a set of n independent
customer tasks and a set of m clouds, the primary objective is to minimize the individual makespan
of all the tasks so that the customer satisfaction is considerably increased. We propose an algorithm
called customer-oriented task scheduling (COTS) for the above scheduling problem.
The paper is organized as follows. Section 2 discusses the related work in task scheduling
algorithms. Section 3 presents the model and problem description. Section 4 proposes a customer-
oriented task scheduling algorithm and analyzes the complexity of the algorithm. Section 5 introduces
two performance metrics followed by simulation results in Section 6. We conclude with some future
insights in Section 7.

2. RELATED WORK

Task scheduling has been a challenging problem since the advent of parallel and distributed computing (Braun et al., 2001; Maheswaran, Ali, Siegel, Hensgen, & Freund, 1999; Ibarra, & Kim, 1977). As the requirements of the customers are rapidly changing, the existing algorithms become infeasible for the new working environment. Thus many researchers have proposed various task scheduling algorithms by considering different sets of customer requirements (Tsai, Fang, & Chou, 2013; Li et al., 2012; Panda, & Jana, 2015; Panda, & Jana, 2016; Braun et al., 2001; Maheswaran, Ali, Siegel, Hensgen, & Freund, 1999; Ibarra, & Kim, 1977; Ergu, Kou, Peng, Shi, & Shi, 2013; Xhafa, Carretero, Barolli, & Durresi, 2007; Xhafa, Barolli, & Durresi, 2007; Armstrong, Hensgen, & Kidd, 1998; Freund et al., 1998). Ibarra and Kim (1977) have proposed a time-bound algorithm for scheduling n tasks on m resources. The algorithm takes O(n log n) time for m = 2 and produces a schedule length of at most (√5 + 1)/2 times the optimal. Freund et al. (1998) referred to algorithm D and algorithm E of (Ibarra, & Kim, 1977) as max-min and min-min respectively. They have also stated explicitly that the time complexity of both algorithms is O(n²m). Maheswaran, Ali, Siegel, Hensgen and Freund (1999) have proposed three heuristics, namely the switching algorithm, k-percent best and sufferage, for heterogeneous computing. Their simulation results show that the choice of heuristic depends on the heterogeneity of the tasks and machines. Braun et al. (2001) have compared eleven static heuristics for heterogeneous computing. The comparisons reveal that the min-min heuristic outperforms all other heuristics in terms of the makespan performance metric. Later, Xhafa, Carretero, Barolli and Durresi (2007) have compared the performance of these heuristics on the benchmark dataset generated by Braun et al. (2001). The results show that min-min performs well in makespan whereas max-min performs better in the resource utilization performance metric.
Li et al. (2012) have recently applied the min-min scheduling algorithm in a heterogeneous multi-cloud environment and called the resulting algorithm cloud min-min scheduling (CMMS). The objective of CMMS is to minimize the makespan of a set of applications. However, they have not considered customer satisfaction in their algorithm. Panda and Jana (2015) have also used the min-min algorithm and developed cloud min-max normalization (CMMN) scheduling. CMMN first normalizes the customer requests using the popular min-max normalization and places the requests into one of two queues based on a threshold value. However, the process of normalization requires O(mn) time. Miriam and Easwarakumar (2010) have proposed


a set pair analysis (SPA) based task scheduling to enhance the performance of a hypercubic P2P grid. This algorithm is also based on the popular min-min algorithm. Note that SPA comprises two different sets with some associated connections. The goal of SPA is to analyze both the uncertainty and certainty of the systems with their similarities and differences.
The above literature clearly shows the importance of the min-min algorithm in various environments such as grid and cloud. However, the algorithm aims to minimize the makespan without any major consideration of customer satisfaction. Therefore, we propose COTS to overcome this limitation of cloud min-min scheduling (CMMS). The algorithm differs from CMMS in the following respects. 1) CMMS maps only one task at each iteration (i.e., 1:1) whereas the proposed algorithm maps m tasks at each iteration (i.e., m:1). 2) CMMS is a two-phase scheduling, namely finding the minimum completion time for each task and then the minimum among them. On the contrary, the proposed COTS assigns to each cloud a task that takes minimum completion time and accommodates the rest of the unassigned tasks in the idle slots if the slots are vacant. 3) CMMS emphasizes the makespan parameter whereas COTS focuses on the customer satisfaction parameter.

3. MODEL AND PROBLEM STATEMENT

3.1. Cloud Model


We assume that a set of datacenters are connected to provide on-demand services to the customers. These datacenters form clouds, which are of varying capabilities for computation-intensive tasks. Without loss of generality, the datacenters are from different cloud service providers. A customer can send a request to any of the datacenters in order to fulfil his/her requirements. However, the request is not necessarily fulfilled by the same datacenter, as the datacenter may be overloaded with earlier-arrived requests and its computational capacity is finite. As a result, one datacenter may transfer a few requests (or tasks) to other available datacenters in order to accommodate the maximum number of requests. It is noteworthy that a datacenter transfers the requests to those datacenters where it achieves minimum completion time. The above model is also used in (Li et al., 2012; Panda, & Jana, 2015; Panda, & Jana, 2016) for scheduling of tasks in a heterogeneous multi-cloud.

3.2. Case Study


Consider a set C of m clouds, a set T of n independent tasks from a set CU of n customers (assuming each customer requests only one task) and a mapping function M: T → C. The primary objective is to map the customers' tasks to the clouds such that the maximum number of customers get service in minimum time. The mapping function M is represented in the form of an expected time to complete (ETC) matrix, as shown in Equation 1.

$$
ETC = \begin{array}{c|cccc}
 & C_1 & C_2 & \cdots & C_m \\ \hline
T_1 & ETC_{11} & ETC_{12} & \cdots & ETC_{1m} \\
T_2 & ETC_{21} & ETC_{22} & \cdots & ETC_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
T_n & ETC_{n1} & ETC_{n2} & \cdots & ETC_{nm}
\end{array} \qquad (1)
$$


Here, ETCij is the expected time to complete the ith task on the jth cloud. A task Ti, 1 ≤ i ≤ n, can only be executed on one of the m clouds. On the other hand, a cloud Cj, 1 ≤ j ≤ m, can execute more than one task in chronological order. Note that the expected execution time of a task varies from cloud to cloud, as all the clouds have different computational power and specifications.
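The ETC model above lends itself to a direct sketch in code. The following Python snippet (an illustrative sketch with our own variable names, not the authors' implementation) stores a small 5 × 3 ETC matrix (the one used for illustration in Section 4.4) and computes the per-cloud ready times and overall makespan for a given task-to-cloud assignment:

```python
# Illustrative sketch of the ETC model: etc[i][j] is the expected
# time to complete task i on cloud j (variable names are ours).
etc = [
    [20, 80, 80],    # T1 on C1, C2, C3
    [30, 100, 120],  # T2
    [10, 50, 110],   # T3
    [50, 60, 130],   # T4
    [40, 70, 140],   # T5
]

def makespan(etc, assignment):
    """assignment[i] = index of the cloud that executes task i.
    Tasks on the same cloud run one after another (chronological order),
    so a cloud's ready time is the sum of its tasks' ETC values."""
    m = len(etc[0])
    ready = [0] * m
    for i, j in enumerate(assignment):
        ready[j] += etc[i][j]
    return max(ready)

# T2, T3, T5 -> C1; T4 -> C2; T1 -> C3 gives ready times 80, 60, 80:
print(makespan(etc, [2, 0, 0, 1, 0]))  # -> 80
```

The helper only evaluates a given mapping; how COTS chooses the mapping is the subject of Section 4.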


4. PROPOSED ALGORITHM

4.1. Methodology
The algorithm first selects m tasks for the m clouds such that each cloud has the earliest finish time for its corresponding task. Then it finds the makespan for the first m tasks and tries to balance the loads of the clouds by inserting as many tasks as possible into the idle slots. This procedure continues until all n tasks are mapped. As a consequence, the tasks complete in minimum time. We simulate the proposed algorithm using a wide variety of tasks and clouds. We also compare the proposed algorithm with a well-known algorithm, called cloud min-min scheduling, using two performance metrics, namely customer satisfaction and surplus customer expectation. To validate the results, we perform two statistical tests, called the T-test and ANOVA. The results show the superiority of the proposed algorithm.
Our major contributions are as follows.

• Development of a customer-oriented task scheduling algorithm for the heterogeneous multi-cloud environment.
• Evaluation of the proposed algorithm using two novel measures, namely customer satisfaction and surplus customer expectation.
• Simulation of the proposed algorithm in a virtualized environment with a large set of tasks and clouds.
• Validation of the simulation results using two statistical techniques.

The proposed customer-oriented task scheduling (COTS) is a two-phase scheduling, the phases being mapping and balancing. In mapping, every cloud finds a suitable task for scheduling such that a task cannot be assigned to more than one cloud. On the other hand, balancing accommodates unassigned tasks in the idle slots generated by mapping. COTS aims to maximize the customer satisfaction by considering the individual processing time of each customer (i.e., the individual makespan).
The details of the two phases are discussed as follows.

4.2. Mapping
Mapping is a two-step process, namely matching and scheduling (Braun et al., 2001; Maheswaran, Ali, Siegel, Hensgen, & Freund, 1999). Matching is the process of finding cloud-task pairs based on the minimum processing time of a task on a given cloud. Scheduling is the process of assigning the tasks to the clouds in chronological order. The process of mapping assigns m tasks at each iteration, where m is the total number of clouds. Therefore, it is said to be an m:1 relationship.
For the pseudo code of the proposed COTS algorithm (Box 1), we use the notations shown in Table 1.
The COTS algorithm uses a global queue, Q, to place the incoming customer tasks. Initially, the makespan is set to zero (Line 2 of Box 1). Then it calls Procedure 1 (MAPPING) to find cloud-task pairs (Line 3), which is shown in Box 2. This procedure finds a suitable task for each cloud and assigns the tasks to the respective clouds. For this, it selects the first unassigned task as the suitable task (Lines 4-7 of Box 2). It then compares the completion time of the other unassigned tasks with the selected task to find the most suitable task for the cloud (Lines 10-17). At last, it assigns the task to the cloud and updates the ready time of the cloud and the makespan (Lines 18-26). This procedure is repeated until m tasks are successfully assigned to the m clouds.

Lemma 4.1: The time complexity of Procedure 1 (MAPPING) is O(mn).

Proof: Let m be the total number of clouds and n be the total number of tasks. To find the first unassigned task, Steps 2-9 require O(n) time. Again, it requires O(n) time to find a suitable task over all the unassigned tasks (Steps 10-17). Steps 18-26 require O(1) time. However, the outer for loop (Step 1) iterates m times, which requires O(m) × O(n) = O(mn) time. Therefore, the time complexity of Procedure 1 is O(mn).


Table 1. Notations and their definitions

Notation  | Definition
n         | Number of tasks
m         | Number of clouds
ETC(i, j) | Expected time to compute task i on cloud j
RT(j)     | Ready time of cloud j
x(i)      | Execution status of the ith task: x(i) = 1 if task i is executed, 0 otherwise
CT(i)     | Completion time of the ith task

Box 1. Pseudo code for COTS

Algorithm: COTS
Input: 1) A set of customer tasks 2) A set of clouds
Output: 1) Customer satisfaction 2) Surplus customer expectation
1. while Q ≠ NULL
2. Set makespan = 0
3. Call MAPPING(ETC, RT, n, m)
4. Call BALANCING(ETC, RT, n, m, makespan)
5. endwhile

4.3. Balancing
Balancing is a process to allocate the remaining unassigned tasks to the idle slots of all the clouds
such that the overall makespan remains intact. The process of balancing has two advantages. 1) It
improves the customer satisfaction. 2) It reduces the number of iterations required to assign the tasks.
The pseudo code for the process of balancing is shown in Box 3. This procedure is called from
the line 4 of Box 1. In other words, it follows Procedure 1. Like Procedure 1, this procedure also finds
the suitable task(s) for each cloud. However, the sum of the execution time of the suitable task (i.e.,
ETC(i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ m) and ready time on the respective cloud (RT(j), 1 ≤ j ≤ m) should not
exceed the overall makespan (Lines 4-18). Mathematically:

ETC(i, j) + RT(j) ≤ makespan (2)

It assigns the tasks to the cloud and updates the ready time of the cloud. This procedure continues
until the unassigned tasks are mapped to m clouds.

Lemma 4.2: The time complexity of Procedure 2 (BALANCING) is O(n²m).

Proof: The innermost for loops (Step 3 and Step 11) take O(n) time. Steps 19 to 24 require constant time. As the inner for loop (Step 2) iterates n times, Steps 2-25 require O(n) × O(n) time. However, the outer for loop (Step 1) iterates m times. Therefore, the time complexity of Procedure 2 is O(m) × O(n) × O(n) = O(n²m).


Box 2. Pseudo code for mapping

Procedure 1: MAPPING(ETC, RT, n, m)


1. for j = 1, 2, 3,…, m
2. for i = 1, 2, 3,…, n
3. if x(i) == 0
4. minimum = ETC(i, j)
5. task_index = i
6. execute = 0
7. break
8. endif
9. endfor
10. for i = 1, 2, 3,…, n
11. if x(i) == 0
12. if minimum > ETC(i, j)
13. minimum = ETC(i, j)
14. task_index = i
15. endif
16. endif
17. endfor
18. if execute == 0
19. x(task_index) = 1
20. RT(j) = RT(j) + ETC(task_index, j)
21. CT(task_index) = RT(j)
22. execute = 1
23. endif
24. if makespan < RT(j)
25. makespan = RT(j)
26. endif
27. endfor
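A Python transcription of Procedure 1 may make the selection rule concrete. This is a sketch under our own naming, not the authors' code; it reproduces the Box 2 logic of letting each cloud, in turn, claim the unassigned task with the minimum ETC value in its column:

```python
def mapping(etc, rt, x, ct, makespan):
    """One round of Procedure 1 (MAPPING): each cloud j claims the
    unassigned task with the smallest ETC(i, j). Returns the updated makespan."""
    n, m = len(etc), len(etc[0])
    for j in range(m):
        # Find the unassigned task with minimum ETC on cloud j (Lines 2-17 of Box 2).
        candidates = [i for i in range(n) if x[i] == 0]
        if not candidates:
            break
        best = min(candidates, key=lambda i: etc[i][j])
        # Assign it and update the cloud's ready time (Lines 18-26 of Box 2).
        x[best] = 1
        rt[j] += etc[best][j]
        ct[best] = rt[j]
        makespan = max(makespan, rt[j])
    return makespan

# Example with the ETC matrix of Table 2 (tasks T1..T5, clouds C1..C3):
etc = [[20, 80, 80], [30, 100, 120], [10, 50, 110], [50, 60, 130], [40, 70, 140]]
rt, x, ct = [0, 0, 0], [0] * 5, [0] * 5
ms = mapping(etc, rt, x, ct, 0)
print(ms, rt)  # T3 -> C1, T4 -> C2, T1 -> C3; makespan 80, ready times [10, 60, 80]
```

Ties break toward the lower task index, matching Box 2's scan order.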

Lemma 4.3: The process of mapping maps (n − α × m) − β tasks to the m clouds at the α-th iteration in the worst case, where α ∈ {0, 1, 2, …, ⌈n/m⌉} and:

$$
\beta = \begin{cases} 0 & \alpha = 0 \\ \displaystyle\sum_{k=1}^{\alpha} \beta_k & \alpha > 0 \end{cases}
$$

Proof: Let α be the iteration number and β be the total number of tasks that are assigned in the process of balancing. The process of mapping assigns m tasks at each iteration. More specifically, after the first iteration of mapping (α = 1), the number of remaining unassigned tasks is (n − m). Therefore, the process of balancing maps (n − m) tasks to the m clouds. Assume that it assigns β1 tasks. As a result, we have ((n − m) − β1) unassigned tasks for the next iteration.

Similarly, the process of mapping assigns m tasks at the second iteration and the number of unassigned tasks is ((n − 2 × m) − β1). We assume that the balancing process assigns β2 tasks. Therefore, we are left with ((n − 2 × m) − β1 − β2) = ((n − 2 × m) − (β1 + β2)) unassigned tasks. In the ⌈n/m⌉-th iteration, the mapping process assigns ((n − ⌊n/m⌋ × m) − (β1 + β2 + … + β⌈n/m⌉−1)) tasks in the worst case


Box 3. Pseudo code for balancing

Procedure 2: BALANCING(ETC, RT, n, m, makespan)


1. for j = 1, 2, 3,…, m
2. for i = 1, 2, 3,…, n
3. for k = 1, 2, 3,…, n
4. if x(k) == 0 && (ETC(k, j) + RT(j)) ≤ makespan
5. minimum = ETC(k, j)
6. task_index = k
7. execute = 0
8. break
9. endif
10. endfor
11. for k = 1, 2, 3,…, n
12. if x(k) == 0 && (ETC(k, j) + RT(j)) ≤ makespan
13. if minimum > ETC(k, j)
14. minimum = ETC(k, j)
15. task_index = k
16. endif
17. endif
18. endfor
19. if execute == 0
20. x(task_index) = 1
21. RT(j) = RT(j) + ETC(task_index, j)
22. CT(task_index) = RT(j)
23. execute = 1
24. endif
25. endfor
26. endfor

by assuming 0 ≤ (β1 + β2 + … + β⌈n/m⌉−1) ≤ m − 1. From the above facts, we claim that the process of mapping maps (n − α × m) − β tasks to the m clouds at the α-th iteration in the worst case.
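Procedure 2 can be rendered the same way. The sketch below (again our own naming, not the authors' code) lets each cloud repeatedly claim the cheapest unassigned task whose completion time still fits under the current makespan, as in Box 3; the demo starts from the state that mapping produces on the Table 2 matrix:

```python
def balancing(etc, rt, x, ct, makespan):
    """Procedure 2 (BALANCING): fill each cloud's idle slot with the cheapest
    unassigned tasks whose completion would not exceed the current makespan."""
    n, m = len(etc), len(etc[0])
    for j in range(m):
        for _ in range(n):  # at most n insertions per cloud (Lines 2-25 of Box 3)
            fits = [k for k in range(n)
                    if x[k] == 0 and etc[k][j] + rt[j] <= makespan]
            if not fits:
                break
            best = min(fits, key=lambda k: etc[k][j])
            x[best] = 1
            rt[j] += etc[best][j]
            ct[best] = rt[j]

# State after the mapping round on the Table 2 matrix (T3->C1, T4->C2, T1->C3):
etc = [[20, 80, 80], [30, 100, 120], [10, 50, 110], [50, 60, 130], [40, 70, 140]]
rt = [10, 60, 80]
x = [1, 0, 1, 1, 0]          # T2 and T5 are still unassigned
ct = [80, 0, 10, 60, 0]
balancing(etc, rt, x, ct, makespan=80)
print(rt, ct)  # C1 absorbs T2 (CT 40) and T5 (CT 80); the makespan stays 80
```

This reproduces the schedule in Table 3 of Section 4.4: cloud C1 ends at 80, C2 at 60, C3 at 80.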

Lemma 4.4: The proposed algorithm requires at most ⌈n/m⌉ iterations, assuming 0 ≤ β < (n/m − ⌊n/m⌋) × m.

Proof: In each iteration, the mapping process assigns m tasks, followed by the balancing process, which assigns βi tasks, 1 ≤ i ≤ (⌈n/m⌉ − 1). In ⌈n/m⌉ iterations, the proposed algorithm assigns up to (⌈n/m⌉ × m) tasks in the mapping process and β = β1 + β2 + … + β⌈n/m⌉ tasks in the balancing process. There are two cases for the value of β.

Case 1: β ≥ (n/m − ⌊n/m⌋) × m

In this case, the proposed algorithm requires at most ⌊n/m⌋ iterations, as (n − β) ≤ ⌊n/m⌋ × m.

Case 2: β < (n/m − ⌊n/m⌋) × m

In this case, the proposed algorithm assigns the remaining (m − β) tasks at the ⌈n/m⌉-th iteration, as ⌊n/m⌋ × m + β < n.

Theorem 4.1: The overall time complexity of COTS is O(kn²m).

Proof: The COTS algorithm invokes Procedure 1 followed by Procedure 2, say, k times, and Procedure 2 dominates the time complexity of Procedure 1. Therefore, the overall time complexity of the proposed algorithm is O(kn²m).

4.4. Illustration
Let us consider an example for the step-by-step illustration of COTS algorithm. We assume that there
are five different tasks (T1 to T5) that are assigned to three different clouds (C1 to C3) and their ETC
matrix is shown in Table 2.
In the mapping process, cloud C1 finds a suitable task T3 and it is assigned to that cloud. Now the
ready time of cloud C1 is 10 and makespan is also 10. Similarly, cloud C2 and cloud C3 assign the task
T4 and task T1 respectively. Note that cloud C2 selects task T4 as because task T3 is already assigned to
cloud C1. The makespan is updated to 80. This completes the process of mapping at the first iteration.
In the balancing process, the remaining tasks T2 and T5 are assigned to cloud C1 as the sum
of the execution time of these tasks and ready time of cloud C1 is less than or equal to the overall
makespan. The scheduling sequence (including completion time of each task) and Gantt chart for
COTS are shown in Table 3 and Table 4 in which ‘*’ denotes the idle time, CT denotes completion
time and RT denotes ready time.
For the sake of easy comparison, we also generate the scheduling sequence and Gantt chart for
CMMS in Table 5 and Table 6 respectively.

Table 2. An ETC matrix

C1 C2 C3
T1 20 80 80
T2 30 100 120
T3 10 50 110
T4 50 60 130
T5 40 70 140

Table 3. Scheduling sequence for COTS

     | C1 | C2 | C3 | CT
T3   | 10 |    |    | 10
T4   |    | 60 |    | 60
T1   |    |    | 80 | 80
T2   | 30 |    |    | 40
T5   | 40 |    |    | 80
RT   | 80 | 60 | 80 |


Table 4. Gantt chart for COTS

Time     | C1 | C2 | C3
0 ~ 10   | T3 | T4 | T1
10 ~ 40  | T2 | T4 | T1
40 ~ 60  | T5 | T4 | T1
60 ~ 80  | T5 | *  | T1

Table 5. Scheduling sequence for CMMS

     | C1  | C2 | C3 | CT
T3   | 10  |    |    | 10
T1   | 20  |    |    | 30
T2   | 30  |    |    | 60
T4   |     | 60 |    | 60
T5   | 40  |    |    | 100
RT   | 100 | 60 | 0  |

Table 6. Gantt chart for CMMS

Time     | C1 | C2 | C3
0 ~ 10   | T3 | T4 | *
10 ~ 30  | T1 | T4 | *
30 ~ 60  | T2 | T4 | *
60 ~ 100 | T5 | *  | *

The comparison of COTS and CMMS in Table 7 clearly shows the effectiveness of the proposed COTS algorithm in terms of customer satisfaction (Sc), surplus customer expectation (Ec) and makespan.

5. PERFORMANCE METRICS

We evaluate the performance of the existing and proposed algorithms using two performance metrics,
namely customer satisfaction and surplus customer expectation. They are briefly discussed as follows.

5.1. Customer Satisfaction


The customer satisfaction (Sc) is the total number of tasks that are completed at the earliest time. Let us assume that there are two algorithms, namely A and B, to process five tasks (i.e., T1, T2, T3, T4 and T5). The finish times of these tasks in algorithm A are 30, 60, 10, 60 and 100, and in algorithm B are 180, 40, 10, 60 and 80. Therefore, algorithm A has Sc = 1, as it takes the earliest time for task T1 only. On the contrary, algorithm B has Sc = 2, as it takes the earliest time for task T2 and task T5. Mathematically, the customer satisfaction of algorithm A (denoted as Sc(A)) is defined as follows:


Table 7. Illustrative comparison of COTS and CMMS

Algorithm Sc Ec Makespan
COTS 2 2 80
CMMS 1 2 100

$$
S_c(A) = \sum_{i=1}^{n} F_i \qquad (3)
$$

where Fi is a Boolean variable defined as follows:

$$
F_i = \begin{cases} 1 & \text{if } FT_A(T_i) < FT_B(T_i) \\ 0 & \text{otherwise} \end{cases}
$$

Here, FT_A(T_i) and FT_B(T_i) denote the finish time of task Ti in algorithm A and algorithm B respectively.

5.2. Surplus Customer Expectation

The surplus customer expectation (Ec) is the total number of tasks that are completed at the same time by both algorithms. In the above example (refer to Section 5.1), algorithm A and algorithm B have Ec = 2, as both algorithms take the same time for task T3 and task T4. The surplus customer expectation of algorithm A (denoted as Ec(A)) is mathematically expressed as follows:

$$
E_c(A) = \sum_{i=1}^{n} F_i \qquad (4)
$$

where Fi is a Boolean variable defined as follows:

$$
F_i = \begin{cases} 1 & \text{if } FT_A(T_i) = FT_B(T_i) \\ 0 & \text{otherwise} \end{cases}
$$

Remark 5.1: It is noteworthy that the surplus customer expectation is the same for both algorithms. In other words, Ec(A) = Ec(B).
Remark 5.2: Sc(A) + Sc(B) + Ec(A) = n, where n is the total number of tasks.
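Both metrics reduce to simple counts over the two finish-time vectors. A sketch with our own helper names, using the example of Section 5.1:

```python
def customer_satisfaction(ft_a, ft_b):
    """S_c(A): number of tasks that algorithm A finishes strictly earlier than B."""
    return sum(1 for a, b in zip(ft_a, ft_b) if a < b)

def surplus_customer_expectation(ft_a, ft_b):
    """E_c: number of tasks both algorithms finish at exactly the same time."""
    return sum(1 for a, b in zip(ft_a, ft_b) if a == b)

ft_a = [30, 60, 10, 60, 100]   # finish times under algorithm A
ft_b = [180, 40, 10, 60, 80]   # finish times under algorithm B

sc_a = customer_satisfaction(ft_a, ft_b)       # 1 (T1 only)
sc_b = customer_satisfaction(ft_b, ft_a)       # 2 (T2 and T5)
ec = surplus_customer_expectation(ft_a, ft_b)  # 2 (T3 and T4)
print(sc_a, sc_b, ec)  # 1 2 2; Remark 5.2: 1 + 2 + 2 = 5 = n
```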

6. SIMULATION RESULTS

We test the proposed algorithm by creating a virtualized environment in MATLAB R2014a version 8.3.0.532. The simulation was conducted on an Intel(R) Core(TM) i5-4210U CPU @ 1.70 GHz processor with 8 GB RAM, running the 64-bit Windows 7 operating system. This environment runs the proposed algorithm on various synthetically generated datasets. The parameters of the datasets are shown in Table 8. We generate the datasets using the Monte Carlo simulation method (Jeon, & Hong, 2015; Cunha, Nasser, Sampaio, Lopes, & Breitman, 2014), which is explained as follows.


Table 8. Parameters and their values

Parameter                 | Value
Number of tasks           | 256, 512, 1024, 2048, 4096
Number of clouds          | 10, 20, 30, 40, 50
Structure of the datasets | number-of-tasks × number-of-clouds
Range of datasets         | [500 ~ 1000] for first 30% clouds; [5000 ~ 10000] for remaining clouds
Instances (ix)            | i1, i2, i3, i4, i5

6.1. Define a Domain of Inputs

We take two inputs, namely the number of tasks and the number of clouds. We select the number of tasks as 256, 512, 1024, 2048 and 4096 and the number of clouds as 10, 20, 30, 40 and 50. Note that we represent the task-cloud pair (or ETC matrix) in the form n × m, where n is the number of tasks and m is the number of clouds.

6.2. Generate Inputs

We generate the datasets using a random function called randi(), which is based on the discrete uniform distribution. The pre-defined structure of this random function is randi([imin, imax], [n, m]), where imin and imax are the minimum and maximum values of the dataset respectively. This function returns a two-dimensional array of size n × m with values in the interval [imin ~ imax]. It is noteworthy that we select the range [500 ~ 1000] for the first 30% of tasks and [5000 ~ 10000] for the rest. In order to get acceptable results, we generate five different instances of each dataset, represented by ix where ix ∈ {i1, i2, i3, i4, i5}.
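For readers without MATLAB, the randi()-based generation described above can be approximated with Python's standard random module. This is a sketch: the function name and seeding are our own, and we follow the text of this section in applying the [500 ~ 1000] range to the first 30% of tasks (i.e., rows):

```python
import random

def generate_etc(n, m, low_frac=0.3, seed=None):
    """Generate an n x m ETC matrix in the spirit of MATLAB's
    randi([imin, imax], [n, m]) discrete uniform draw: the first 30% of
    rows draw from [500, 1000], the rest from [5000, 10000].
    (Function name, split handling and seeding are our assumptions.)"""
    rng = random.Random(seed)
    cut = int(low_frac * n)
    etc = []
    for i in range(n):
        lo, hi = (500, 1000) if i < cut else (5000, 10000)
        etc.append([rng.randint(lo, hi) for _ in range(m)])
    return etc

etc = generate_etc(256, 10, seed=1)
print(len(etc), len(etc[0]))  # 256 x 10, one of the paper's five dataset sizes
```

Note that Table 8 phrases the range split in terms of clouds while this section phrases it in terms of tasks; the sketch follows the latter.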

6.3. Perform Computations


We run the instances of each dataset and measure the performance of CMMS and proposed COTS
algorithms.

6.4. Generate Results

We generate the simulation results of the existing and proposed algorithms using two performance metrics, namely customer satisfaction and surplus customer expectation.
The customer satisfaction of the proposed COTS algorithm is calculated for twenty-five instances of five different datasets and compared with the CMMS algorithm, as shown in Table 9. For better prominence, a graphical comparison of customer satisfaction is shown in Figure 1. The results clearly indicate that the proposed algorithm outperforms CMMS for all twenty-five instances. The rationale behind this is that the proposed algorithm maps m tasks to m clouds at each iteration, where the mapping for each cloud is based on the minimum completion time of the tasks. Moreover, the proposed algorithm assigns the unassigned tasks to the idle slots in order to accommodate other customer tasks, which markedly improves the customer satisfaction.
The surplus customer expectation of the proposed COTS and CMMS task scheduling algorithms is shown in Table 10 and Figure 2. Note that the surplus customer expectation is the same for the proposed COTS and CMMS algorithms.
We validate the results by conducting two statistical tests, called the T-test (Ott, & Longnecker, 2010; Carlberg, 2014) and ANOVA (Muller, & Fetterman, 2002; Basso, Salmaso, Pesarin, & Solari, 2009). In the T-test, we check whether the means of two different populations are equal. Thus we take the initial assumption that both population means are equal, which is called the null hypothesis. On the other

Table 9. Comparison of customer satisfaction for proposed COTS and CMMS algorithm in synthetic datasets

Dataset   | Algorithm | i1        | i2        | i3        | i4        | i5
256 × 10  | COTS      | 1.270e+02 | 1.160e+02 | 1.010e+02 | 1.040e+02 | 1.230e+02
          | CMMS      | 6.800e+01 | 7.900e+01 | 6.400e+01 | 7.000e+01 | 8.100e+01
512 × 20  | COTS      | 2.200e+02 | 1.990e+02 | 2.260e+02 | 2.560e+02 | 2.330e+02
          | CMMS      | 1.820e+02 | 1.360e+02 | 1.210e+02 | 1.580e+02 | 1.840e+02
1024 × 30 | COTS      | 4.270e+02 | 4.320e+02 | 4.720e+02 | 4.710e+02 | 4.450e+02
          | CMMS      | 3.390e+02 | 2.580e+02 | 2.840e+02 | 3.550e+02 | 2.730e+02
2048 × 40 | COTS      | 9.690e+02 | 9.490e+02 | 9.530e+02 | 9.020e+02 | 1.017e+03
          | CMMS      | 4.610e+02 | 4.600e+02 | 6.010e+02 | 5.140e+02 | 5.230e+02
4096 × 50 | COTS      | 1.982e+03 | 1.961e+03 | 2.065e+03 | 2.096e+03 | 1.898e+03
          | CMMS      | 1.410e+03 | 1.203e+03 | 1.256e+03 | 1.187e+03 | 1.332e+03

Figure 1. Graphical Comparison of customer satisfaction for proposed COTS and CMMS using (a) 256 × 10 (b) 512 × 20 (c) 1024
× 30 (d) 2048 × 40 and (e) 4096 × 50 dataset


Table 10. Surplus customer expectation for proposed COTS and CMMS algorithm in synthetic datasets

Dataset i1 i2 i3 i4 i5
256 × 10 6.100e+01 6.100e+01 9.100e+01 8.200e+01 5.200e+01
512 × 20 1.100e+02 1.770e+02 1.650e+02 9.800e+01 9.500e+01
1024 × 30 2.580e+02 3.340e+02 2.680e+02 1.980e+02 3.060e+02
2048 × 40 6.180e+02 6.390e+02 4.940e+02 6.320e+02 5.080e+02
4096 × 50 7.040e+02 9.320e+02 7.750e+02 8.130e+02 8.660e+02

Figure 2. Surplus customer expectation for both COTS and CMMS using (a) 256 × 10 (b) 512 × 20 (c) 1024 × 30 (d) 2048 × 40 and
(e) 4096 × 50 dataset

hand, we take the alternate hypothesis that the two means are not equal. Then we conduct the T-test on the 256 × 10, 512 × 20, 1024 × 30, 2048 × 40 and 4096 × 50 datasets separately, assuming a hypothesized mean difference of zero and an alpha of 0.05. The test results are shown in Table 11, where df stands for degrees of freedom, calculated as the total number of samples minus the number of groups. Here, we reject the null hypothesis for each dataset, as the t statistic value is greater than both the one-tail and two-tail t critical values. As a consequence, we claim that the population means are not equal. The main evidence for this is that both p-values, namely p one-tail and p two-tail, are very small (i.e., less than 0.05).
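The T-test above is the standard pooled-variance two-sample test. As a sanity check, the sketch below (pure Python, with our own helper name) recomputes the t statistic of the 256 × 10 row of Table 11 from the customer satisfaction samples in Table 9:

```python
import math

def students_t(sample_a, sample_b):
    """Pooled two-sample Student's t statistic with df = n1 + n2 - 2,
    assuming equal variances (as in a standard two-sample T-test)."""
    n1, n2 = len(sample_a), len(sample_b)
    m1 = sum(sample_a) / n1
    m2 = sum(sample_b) / n2
    v1 = sum((x - m1) ** 2 for x in sample_a) / (n1 - 1)  # sample variances
    v2 = sum((x - m2) ** 2 for x in sample_b) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

cots = [127, 116, 101, 104, 123]   # Sc of COTS, 256 x 10 dataset (Table 9)
cmms = [68, 79, 64, 70, 81]        # Sc of CMMS, same dataset
t, df = students_t(cots, cmms)
print(round(t, 4), df)  # ~6.89 with df = 8, matching the first row of Table 11
```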


Table 11. T-test results of proposed COTS and CMMS algorithm in synthetic datasets

p one- t critical p two- t critical


Dataset Algorithm Population Mean Variance t statistic df
tail one-tail tail two-tail

COTS 5 7.24e+01 5.33e+01


256 × 10 6.8905 8 0.0001 1.8946 0.0002 2.3646
CMMS 5 1.14e+02 1.31e+02

COTS 5 1.56e+02 7.72e+02


512 × 20 4.5574 8 0.0013 1.8946 0.0026 2.3646
CMMS 5 2.27e+02 4.28e+02

1024 × COTS 5 3.02e+02 1.82e+03


6.9272 8 0.0002 1.9432 0.0004 2.4469
30 CMMS 5 4.49e+02 4.50e+02

2048 × COTS 5 5.12e+02 3.34e+03


14.0419 8 0.0000 1.8946 0.0000 2.3646
40 CMMS 5 9.58e+02 1.71e+03

4096 × COTS 5 1.28e+03 8.68e+03


13.1507 8 0.0000 1.8595 0.0000 2.3060
50 CMMS 5 2.00e+03 6.42e+03

In the analysis of variance (ANOVA), we take the same initial assumption as in the T-test. The ANOVA
results for the synthetic datasets are shown in Table 12. Here, we also reject the null hypothesis, as the F-value
is greater than the F critical value and, equivalently, the p-value is less than 0.05. Therefore, the two
population means are different.
From the above simulation results and statistical analysis, it is clear that the proposed COTS
algorithm performs better than the CMMS algorithm.
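The quantities reported in Table 12 follow the standard one-way ANOVA decomposition of total variation into between-group and within-group sums of squares. A minimal stdlib-only sketch (illustrative only, not the tool used to produce the reported values; the sample data below are hypothetical):

```python
def one_way_anova(*groups):
    """One-way ANOVA: decompose total variation into between-group and
    within-group sums of squares and form the F ratio."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between = len(groups) - 1          # k - 1
    df_within = n_total - len(groups)     # N - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value
```

With two groups of five samples each, the degrees of freedom are 1 and 8, as in Table 12; for two groups, the F-value equals the square of the corresponding t statistic.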

7. CONCLUSION

We have presented a novel task scheduling algorithm called COTS for heterogeneous multi-cloud
environment. The algorithm has been shown to require O(kn2m) time for mapping of n tasks and m

Table 12. ANOVA results of proposed COTS and CMMS algorithm in synthetic datasets

| Dataset | Source of Variation | Sum of Squares | df | Mean Square | F-value | P-value | F critical |
|---|---|---|---|---|---|---|---|
| 256 × 10 | Between Groups | 4.3680e+03 | 1 | 4.3681e+03 | 47.4793 | 0.0001 | 5.3176 |
| 256 × 10 | Within Groups | 7.3600e+02 | 8 | 9.2000e+01 | | | |
| 256 × 10 | Total | 5.1041e+03 | 9 | | | | |
| 512 × 20 | Between Groups | 1.2461e+04 | 1 | 1.2461e+04 | 20.7699 | 0.0018 | 5.3176 |
| 512 × 20 | Within Groups | 4.7996e+03 | 8 | 5.9995e+02 | | | |
| 512 × 20 | Total | 1.7260e+04 | 9 | | | | |
| 1024 × 30 | Between Groups | 5.4464e+04 | 1 | 5.4464e+04 | 47.9862 | 0.0001 | 5.3176 |
| 1024 × 30 | Within Groups | 9.0800e+03 | 8 | 1.1350e+03 | | | |
| 1024 × 30 | Total | 6.3544e+04 | 9 | | | | |
| 2048 × 40 | Between Groups | 4.9774e+05 | 1 | 4.9774e+05 | 197.1740 | 0.0000 | 5.3176 |
| 2048 × 40 | Within Groups | 2.0195e+04 | 8 | 2.5243e+03 | | | |
| 2048 × 40 | Total | 5.1793e+05 | 9 | | | | |
| 4096 × 50 | Between Groups | 1.3061e+06 | 1 | 1.3061e+06 | 172.9406 | 0.0000 | 5.3176 |
| 4096 × 50 | Within Groups | 6.0418e+04 | 8 | 7.5523e+03 | | | |
| 4096 × 50 | Total | 1.3665e+06 | 9 | | | | |


clouds in k iterations. COTS aims to maximize customer satisfaction by considering the individual
makespan of each customer and to maximize the surplus customer expectation. It performs task scheduling
in two phases, namely mapping and balancing. We have presented the results for twenty-five
instances of five different datasets and compared them with the popular CMMS algorithm
in terms of the customer satisfaction and surplus customer expectation metrics. We have also validated
the results by conducting two different statistical tests, namely the T-test and ANOVA. The results and
validations demonstrate the applicability of the proposed algorithm. However, the proposed algorithm does
not consider the execution cost of the clouds, energy consumption or deadlines; we will address
these in future work.
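To make the two phases concrete, the following is a highly simplified sketch: the mapping phase sends each task to the cloud on which its execution time is minimum, and the balancing phase moves tasks off the most loaded cloud into less loaded ones whenever doing so reduces that cloud's load. This is an illustrative approximation under assumed data structures, not the actual COTS algorithm from the paper.

```python
def map_tasks(etc):
    """Mapping phase (sketch): assign each task to the cloud on which
    its execution time is minimum.

    etc[i][j] is the expected execution time of task i on cloud j."""
    m = len(etc[0])
    ready = [0.0] * m                      # current finish time per cloud
    assignment = [[] for _ in range(m)]    # tasks assigned to each cloud
    for task, row in enumerate(etc):
        cloud = min(range(m), key=lambda j: row[j])
        assignment[cloud].append(task)
        ready[cloud] += row[cloud]
    return assignment, ready


def balance(assignment, ready, etc):
    """Balancing phase (sketch): repeatedly move a task off the most
    loaded cloud whenever another cloud can absorb it below that load.
    Every move strictly lowers the sorted load profile, so it terminates."""
    moved = True
    while moved:
        moved = False
        src = max(range(len(ready)), key=lambda j: ready[j])
        for task in list(assignment[src]):
            for dst in range(len(ready)):
                if dst != src and ready[dst] + etc[task][dst] < ready[src]:
                    assignment[src].remove(task)
                    assignment[dst].append(task)
                    ready[src] -= etc[task][src]
                    ready[dst] += etc[task][dst]
                    moved = True
                    break
            if moved:
                break
    return assignment, ready
```

For instance, with four identical tasks costing 4 units on cloud 0 and 5 units on cloud 1, mapping alone piles everything on cloud 0 (makespan 16), while the balancing pass spreads the load and brings the makespan down to 10.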


REFERENCES

Armstrong, R., Hensgen, D., & Kidd, T. (1998). The relative performance of various mapping algorithms is
independent of sizable variances in run-time predictions. Proceedings of the 7th IEEE Heterogeneous Computing
Workshop (pp. 79-87). doi:10.1109/HCW.1998.666547
Basso, D., Salmaso, L., Pesarin, F., & Solari, A. (2009). Permutation Tests for Stochastic Ordering and ANOVA.
Springer.
Braun et al. (2014). Benchmark dataset. Retrieved from https://code.google.com/p/hcsp-chc/source/browse/trunk/AE/ProblemInstances/HCSP/Braun_et_al/u_c_hihi.0?r=93
Braun, T. D., Siegel, H. J., Beck, N., Boloni, L. L., Maheswaran, M., Reuther, A. I., & Freund, R. F. et al. (2001).
A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed
computing systems. Journal of Parallel and Distributed Computing, 61(6), 810–837. doi:10.1006/jpdc.2000.1714
Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009). Cloud computing and emerging IT
platforms: Vision, hype and reality for delivering computing as the 5th utility. Future Generation Computer
Systems, 25(6), 599–616. doi:10.1016/j.future.2008.12.001
Carlberg, C. (2014). Statistical Analysis: Microsoft Excel 2013. Pearson Education.
Cunha, J. A. Jr, Nasser, R., Sampaio, R., Lopes, H., & Breitman, K. (2014). Uncertainty quantification through
the Monte Carlo method in a cloud computing setting. Computer Physics Communications, 185(5), 1355–1363.
doi:10.1016/j.cpc.2014.01.006
Durao, F., Carvalho, J. F. S., Fonseka, A., & Garcia, V. C. (2014). A systematic review on cloud computing.
The Journal of Supercomputing, 68(3), 1321–1346. doi:10.1007/s11227-014-1089-x
Ergu, D., Kou, G., Peng, Y., Shi, Y., & Shi, Y. (2013). The analytic hierarchy process: Task scheduling and
resource allocation in cloud computing environment. Journal of Supercomputing, 64(3), 835–848. doi:10.1007/
s11227-011-0625-1
Freund, R. F., Gherrity, M., Ambrosius, S., Campbell, M., Halderman, M., Hensgen, D., . . . Siegel, H. J. (1998).
Scheduling resources in multi-user, heterogeneous, computing environments with smartnet. Proceedings of the
7th IEEE Heterogeneous Computing Workshop (pp. 184-199). doi:10.1109/HCW.1998.666558
Ibarra, O. H., & Kim, C. E. (1977). Heuristic algorithms for scheduling independent tasks on nonidentical
processors. Journal of the Association for Computing Machinery, 24(2), 280–289. doi:10.1145/322003.322011
Jeon, S., & Hong, B. (in press). Monte Carlo simulation-based traffic speed forecasting using historical big data.
Future Generation Computer Systems.
Li, J., Qiu, M., Ming, Z., Quan, G., Qin, X., & Gu, Z. (2012). Online optimization for scheduling preemptable tasks
on IaaS cloud system. Journal of Parallel Distributed Computing, 72(5), 666–677. doi:10.1016/j.jpdc.2012.02.002
Maheswaran, M., Ali, S., Siegel, H. J., Hensgen, D., & Freund, R. F. (1999). Dynamic mapping of a class of
independent tasks onto heterogeneous computing systems. Journal of Parallel and Distributed Computing,
59(2), 107–131. doi:10.1006/jpdc.1999.1581
Miriam, D. D. H., & Easwarakumar, K. S. (2010). A double min min algorithm for task metascheduler on
hypercubic P2P grid systems. International Journal of Computer Science Issues, 7(5), 8–18.
Mokotoff, E. (1999). Scheduling to minimize the makespan on identical parallel machines: An LP-based
algorithm. Investigacion Operativa, 8, 97–107.
Muller, K. E., & Fetterman, B. A. (2002). Regression and ANOVA: An Integrated Approach Using SAS Software.
SAS Publisher.
Ott, R. L., & Longnecker, M. (2010). An Introduction to Statistical Methods and Data Analysis (6th ed.).
Duxbury Press.
Panda, S. K., & Jana, P. K. (2015). Efficient task scheduling algorithms for heterogeneous multi-cloud
environment. The Journal of Supercomputing, 71(4), 1505–1533. doi:10.1007/s11227-014-1376-6


Panda, S. K., & Jana, P. K. (2015). A multi-objective task scheduling algorithm for heterogeneous multi-cloud
environment. Proceedings of the International Conference on Electronic Design, Computer Networks and
Automated Verification (pp. 82-87). IEEE. doi:10.1109/EDCAV.2015.7060544
Panda, S. K., & Jana, P. K. (2016). Uncertainty based QoS min-min algorithm for heterogeneous multi-cloud
environment. The Arabian Journal for Science and Engineering.
Panda, S. K., & Jana, P. K. (2016). An efficient task consolidation algorithm for cloud computing systems.
Proceedings of the 12th International Conference on Distributed Computing and Internet Technology, LNCS
(Vol. 9581, pp. 61-74). Springer. doi:10.1007/978-3-319-28034-9_8
Panda, S. K., & Jana, P. K. (2015). An efficient resource allocation algorithm for IaaS cloud. Proceedings of
the 11th International Conference on Distributed Computing and Internet Technology, LNCS (Vol. 8956, pp.
351-355). Springer.
Tsai, J., Fang, J., & Chou, J. (2013). Optimized task scheduling and resource allocation on cloud computing
environment using improved differential evolution algorithm. Computers & Operations Research, 40(12),
3045–3055.
Xhafa, F., Barolli, L., & Durresi, A. (2007). Batch mode scheduling in grid systems. International Journal Web
and Grid Services, 3(1), 19–37. doi:10.1504/IJWGS.2007.012635
Xhafa, F., Carretero, J., Barolli, L., & Durresi, A. (2007). Immediate mode scheduling in grid systems.
International Journal Web and Grid Services, 3(2), 219–236. doi:10.1504/IJWGS.2007.014075

Sohan Kumar Pande is currently pursuing an MTech degree at SUIIT, Burla, Odisha, India. He received his
BTech degree from C V Raman College of Engineering, Bhubaneswar, Odisha, India. He is the reviewer of many
international journals and conferences. His research interests include Load Balancing, Cloud Task Scheduling
and Distributed System.

Sanjaya Kumar Panda is working as an Assistant Professor in the Department of CSE and IT at VSSUT, Burla.
He received his MTech degree from NIT, Rourkela, India and BTech degree from VSSUT, Burla, India in CSE.
He is pursuing a PhD degree at ISM, Dhanbad, India. He received two silver medal awards for best graduate
and best post-graduate in CSE. He also received Young Scientist Award, CSI Best Paper Award at International
Conference and CSI Distinguished Speaker Award. He has published more than 35 papers in reputed journals
and conferences. He is the editorial board member of American Journal of Computer Science and Information
Engineering, USA, International Journal of Sensors and Sensor Networks, USA, International Journal of Wireless
Communications and Mobile Computing, USA and American Association for Science and Technology, USA. He is
a member of IEEE, Invited Member of IEEE Communications Society, Invited Member of IEEE Signal Processing
Society, IAENG, ISTE, CSI, IACSIT, UACEE, ACEEE and SDIWC. His current research interests include Ubiquitous
Computing, Cloud Scheduling, Big Data Analytics, Grid Scheduling, Fault Tolerance and Load Balancing. He has
delivered several invited talks and chaired sessions in national and international conferences and workshops. He has
acted as a reviewer for many reputed journals, including IEEE Transactions on Systems, Man, and Cybernetics,
Mathematical Problems in Engineering (Hindawi) and Applied Soft Computing (Elsevier), and for conferences including
INDICON, ICACCI, PDGC, INDIACom, ISSPIT, MobiApps, PECON, WSCAR, ISGT, TCGC, WCI, SPICES and ISBEIA.

Satyabrata Das is working as a Reader in the Department of CSE and IT at VSSUT, Burla. He received a PhD
degree in Information and Communication Technology, an MTech degree in computer science and engineering
(CSE) and a BE degree in CSE. He has published more than 20 papers in reputed journals and conferences. He is
the reviewer of many international journals and conferences. He has delivered several invited talks and chaired
sessions in national and international conferences and workshops. His current research interests include Cloud Scheduling,
Big Data Analytics, Fault Tolerance, Signal Processing, Image Processing and Load Balancing.
