Professional Documents
Culture Documents
Research Article
Artificial Intelligence-Based Framework for Scheduling
Distributed Systems Using a Combination of Neural Networks and
Genetic Algorithms
Received 21 October 2021; Revised 19 January 2022; Accepted 21 January 2022; Published 8 February 2022
Copyright © 2022 Abdolreza Pirhoseinlo et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
High-performance distributed computation has emerged as a new development in which demand for access to resources over the
Internet is being delivered to distributed servers that are dynamically scalable. One of the important research issues that must be
considered in order to achieve efficient performance is timing. The goal is to map tasks to the right resources and optimize one or
more subgoals so that users get the most out of their resources as quickly as possible. Therefore, in this study, in order to find a
suitable solution to the scheduling problem and reduce the total time of task completion, a hybrid approach is proposed by neural
networks and genetic algorithms. In this hybrid algorithm, the predictive power of artificial neural networks is used to improve
scheduling problem task in order to predict the time of tasks completion on resources and the meta-heuristic of genetic algorithm
is used in order to find optimal resources for tasks. Experimental results show that the performance of the proposed method is
improved in Makespan compared to other methods available in the journals and can adhere to the principles of service quality.
The purpose of scheduling different tasks in resources is completion time and overall deadline of each task [9]. Aditi
to increase execution speed, reduce execution time, and et al. have proposed a genetic algorithm-based scheduling
minimize communication delays, communication costs, and program with three conflicting goals. This algorithm pro-
priority problems. In distributed scheduling, the whole task vides an efficient representation of the chromosomes, which
is subdivided into subtasks and assigned to multiple sources, is the complete solution to the problem, and the validity of
so they perform faster than single sources [7, 8]. the chromosomes is reliable even after fitting and mutation
Therefore, in this paper, a multilevel approach is used in [10]. Izadkhah has developed a genetic learning algorithm
order to optimally schedule tasks in distributed systems by a for static scheduling in homogeneous distributed systems by
combination of predictive neural networks and meta-heu- two learning criteria, namely steepest ascent and next ascent
ristic and evolutionary features of the genetic algorithm. In learning, which use penalty and reward concepts in learning
this new approach, by neural network prediction, we first [11]. Akbari et al. have proposed a genetic-based algorithm
estimate the time taken to complete the existing work on as a meta-heuristic to deal with static task scheduling of
computational resources. On the other hand, by considering processors in heterogeneous computing systems. This al-
computational resources in addition to the duration of tasks gorithm is introduced through significant changes in its
on resources, the cost of executing tasks is also significant and genetic functions, and the introduction of new operators
is a concern of the user that addressed the issue. In this article, guarantees population diversity and continuous coverage of
given the cost of executing the tasks, we used the clustering the whole space and improves the performance of the ge-
method in the second level. After this step, in the third level, to netic algorithm [12]. To solve the problem of resource al-
find efficient resources to execute the selected tasks, we use the location in heterogeneous distributed systems, Pan et al.
meta-heuristic of the genetic algorithm to improve population have proposed a distributed resource scheduling based on a
production and combine them and improve the total exe- hybrid genetic algorithm [13].
cution time of the tasks. At this level, the produced chro- About the issue of resource scheduling in a distributed
mosomes are the same line of each source and in each cluster, cloud environment, Casas et al. have proposed a GA-ETI
which includes the previous tasks in the line of each new scheduling plan to address all the challenges in cloud
assigned resource and task, and the cutoff point is to jump to scheduling by providing some efficient solutions for sci-
the same point of entry of the new task into the queue of a entific implementation in the cloud resources [14]. Dasgupta
generator. The chromosomes corresponding to the sources in et al. proposed a novel load balancing method by the genetic
a cluster are compared, and the best combination is selected algorithm. This algorithm is set to balance the load on the
for the chromosomes. The new tasks are then joined to the cloud infrastructure while trying to minimize the length of
queue of the most appropriate combination between the the given tasks [15]. Kashanchi et al. have used the benefits of
resources in the cluster. In this way, the proposed method can evolutionary genetic algorithms and heuristic approaches to
produce an optimal solution for mapping tasks in resources. analyze and model behavioral scheduling. In this method,
The contribution of this paper is summarized as follows: the expected specifications for scheduling are extracted in
the form of linear time logic (LTL) formulas, which are used
(i) Predicting the completion time of tasks in resources
to achieve optimal scheduling of the LTS-labeled transfer
according to the characteristics of resources and
system [16]. Amjad Mahmoud et al. have presented a greedy
tasks using neural networks
genetic algorithm with an appropriate selection of fit and
(ii) Ranking the tasks based on less completion time and jump operators (called AGAs) for the assignment and timing
their execution deadline of real-time tasks that have precedence constraints on
(iii) Determination of primary chromosomes of genetic heterogeneous virtual machines [17].
algorithm based on the order of execution of tasks In [18], a graph-based extreme learning machine method
(iv) Using the multicriteria fit function in accordance (G-ELM) is proposed for imbalanced epileptic EEG signal
with service quality goals in the genetic algorithm recognition. In [19], a method aimed to improve the early
clinical diagnosis rate of atrophic gastritis (AG) and reduce
(v) Mapping tasks to resources based on finding the
the risk of disease deterioration or cancerization has been
optimal solution in a genetic algorithm
proposed. In [20], a group of dummy query sequences, to
The structure of the article is as follows: in Section 2, we cover up the query locations and query attributes of mobile
will discuss the tasks related to the timing of distributed users and thus protect users’ privacy in LBS, has been
systems. Section 3 will detail the proposed method. In constructed. In [21], a location privacy-preserving system
Section 4, the results of the experiments will be stated. Fi- for LBS by constructing “cover-up ranges” to protect the
nally, in Section 5, we will discuss and conclude the article. query ranges associated with a location query sequence has
been proposed. In [22], a dummy-based approach for text
2. Related Work retrieval privacy protection has been proposed.
of neural network prediction that cluster meta-heuristic and measurement and simplicity of calculation, the character-
evolutionary feature of genetic algorithm. In this new ap- istics of the resources in the computing environment to
proach, by the predictive neural network, we first estimate predict the execution time in tasks. These parameters are
the time spent on the available resources. On the other hand, considered as inputs of the artificial neural network, and by
in distributed systems, in addition to the time-consuming determining the weight in the middle layer of the neural
tasks on resources, the cost of executing tasks is also sig- network, the output will be the predicted completion time
nificant and is a concern of the user, and this study addresses for the computing environment resources of the distributed
this case and considers the cost of executing the tasks, and in systems. Quadratic properties of distributed computing
the second level, we have used a cost-based resource clus- environment sources derived from standard CPU-perfor-
tering approach. After this step, in the third level, to find mance-dataset in the standard UCI data repository.
efficient resources to execute the selected tasks, we use the It should be noted, however, that with respect to the
meta-heuristic and evolutionary feature of the genetic al- properties of computational resources, artificial neural
gorithm to improve population production and their networks can somewhat predict the execution time of tasks.
combination, and then, we improve the total execution time This prediction value from the neural network output as a
of the tasks. At this level, the produced chromosomes are in parameter of similarity, together with the cost required for
the same queues of each source as previous tasks, and the each source, contributes to cluster the computing envi-
cutoff point jumps to the same entry point of the new task on ronment resources of the distributed systems. The resources
the one-source queue. The corresponding chromosomes in within the clusters will have approximately similar features,
the sources of a cluster are compared pairwise, and the best and the user will be aware of the completion time (waiting
combination is selected for the chromosomes. In this way, time for their task) and the cost of each resource. The new
the new tasks are joined to the queue of the most appropriate task that comes up is assigned to a resource with the least
combination between the two sources. In this way, the expected completion time. The genetic algorithm is then
proposed method can produce an optimal solution for implemented to optimize resource allocation tasks and re-
mapping tasks on resources. duce work completion time, and a resource queue with the
Based on the investigations among the types of dis- least job completion time is anticipated for the newly
tributed scheduling algorithms, the genetic algorithm has assigned task and a resource queue that assigned for the
been exploited as the main algorithm for discussing the completion time of the predicted task received the second
scheduling tasks of distributed systems in terms of speed of rank as the primary population. The genetic algorithm, by
execution and its ability to search for possible solutions in jumping between these two queues, determines the most
distributed scheduling tasks. Genetic algorithm is one of the appropriate place where a new task can be added. The ar-
most common evolutionary optimization algorithms that chitecture of the proposed method is shown in Figure 1.
search the answer to find optimal solutions to an optimi-
zation problem with incompatible and conflicting goals. But
3.2. Proposed Scheduling Algorithm. The proposed method
as mentioned earlier, the problem of the above algorithm is
combines the strengths of the two genetic algorithms and the
that the initial solution is a random mapping of tasks on
neural network in order to achieve a more optimal algo-
resources. Therefore, the use of prediction tools for some of
rithm. These methods are combined in such a way that they
the required parameters will, to some extent, lead us to better
cover each other’s weaknesses so that they result in an in-
performance. It also deliberately performs scheduling of
tegrated and optimized algorithm. Scheduling operations
resource clusters tailored to the task demanded by the user,
start by the neural network capability to predict the com-
scheduling and distributing loads between the resources of
pletion time of the tasks in terms of the four mentioned
the distributed systems.
parameters in the previous sections. The structural features
of the used neural network are as follows: (a) formation of 12
hidden layers, (b) application of the back-propagation al-
3.1. Prerequisites. The first action to be taken by the
gorithm, and (c) and application of evaluation function. This
scheduler in scheduling the tasks of the distributed systems
neural network is used in CPU-performance-dataset stan-
is to identify the appropriate set of resources that perform
dard data of the standard UCI data repository, using
the tasks. The resources that can be used to execute tasks as
MATLAB simulator based on learned resource attribute
well as gain capabilities are identified based on some of the
values and its suitable outputs in the task scheduling process
features gained through distributed systems intelligence.
to optimize the task mapping process.
Therefore, in order to improve the situation and to reduce
the impact of the above problem in the process of allocating
tasks on resources to execute, we have used neural network 3.2.1. Prediction Completion Time Using Neural Networks.
capability in predicting the execution time of tasks and Due to the fact that in the proposed method, various tasks
aiming to find a suitable solution based on the prediction of are sent, and there are also heterogeneous resources in
runtime as the key to achieve optimal scheduling in dis- distributed systems, so each task and resource will have its
tributed systems. In order to realize the performance of the own characteristics. One of the features of workflow tasks is
neural network are considered these parameters as reasons the number of instructions in each task. In order to perform
of impact on runtime, being independent of each other in a task, each task must execute the instructions in the task text
measurement and control, simplicity of estimation or in order to produce output results based on these
4 Mobile Information Systems
T1 T2 Tn
T1
T2
Tm
instructions. The unit of measurement is the number of This method has been used to predict the completion
instructions based on MI, which indicates the number of time of tasks in resources using neural networks. The input
instructions on a million scale. In contrast, for processing of neural networks includes the characteristics of the
instructions in virtual machines, the power of virtual ma- number of task instructions, the task execution deadline, and
chines is introduced based on MIPS, which is on the scale of the minimum and maximum time required to execute in-
millions of instructions per unit time. structions in the resources. Task completion times in pri-
The second feature that is considered in the proposed oritization sources in order to find eligible tasks for early
method for tasks is the deadline for performing tasks. In the execution are predicted. Hence, the existing tasks are trained
proposed method, a deadline has been set for each of the based on the features mentioned in the neural networks.
tasks, according to which the tasks must be completed by the Hence, the output of neural networks is used as a predictor
time they arrive. The deadline for performing tasks starts for the completion of tasks in resources. In the proposed
from the moment of entering each task. Depending on the neural network, in order to evaluate the tasks, the proposed
length of each task, the deadline for performing tasks is fitness function in the following relation has been used.
naturally different. Therefore, tasks that have a shorter n m n m
deadline should be performed sooner so as not to disrupt the min f � deadline(i, j) + Tlen(i, j),
workflow process. i�1 j�1 i�1 j�1
In the proposed method, resource properties are pre- s.t.,
pared from the standard data set and include the minimum n
(1)
cost required to execute an instruction in the resource, the εj ∗ Tlen (i, j) ≤ deadline(i, j),
maximum cost required to execute an instruction in the i�1
resource, the minimum time required to execute an in- i, j > 0,
struction in the resource, the maximum time required to
execute an instruction in the resource, the maximum length where n represents the total number of tasks, m represents
of the resource queue to store the instruction, the minimum the total number of resources, i represents the task index, j
time required to transfer instruction to the resource, the represents the source, deadline, (i, j) represents the deadline
minimum cost required to transmit an instruction to the for performing the task in resource allocation, Tlen (i, j)
resource, and the energy constant to execute an instruction indicates the number of task instructions i that need to be
in the resource. processed in virtual machine j, and εj represents the average
Mobile Information Systems 5
ample of the chromosomes presented in the proposed wri (i, j) ≤ deadline(i, j),
j�1
method. In this type of encoding, tasks are assigned to virtual
machines. i, j > 0,
In the proposed method, this type of coding is used in
(2)
which the value of each gene represents the virtual task
index. In the proposed method, since the tasks are ranked in where wti is the time required to complete task i in resource j,
the first step, but the distribution of values within the genes wci is the cost required to complete task i in resource j, wri is
is random, the processing power of the virtual machine is the estimated time based on the processing power of source j
also considered as one of the parameters of the fit function of to perform task i, and α is the cost threshold for performing
the genetic algorithm. Therefore, in chromosomes where tasks. In this function, the weight of the completion time and
high-ranking tasks are assigned to powerful resources to be the performing cost of the task (wt and wc) are taken into
processed faster, the value of the fit function will be more account by the user. Time and cost are not of the same scale,
optimal. and they are not in the same range; to solve this problem, we
Crossover and mutation algorithm in the proposed have to normalize them and mapped them in the range of
method: given that in the proposed method, input tasks are zero to one. It should be noted that the weighting coefficients
prioritized, and naturally, there should be a slight change in are specified by the user in introducing the task. Wt and Wc
the fitting and jump operators so that the priority of give more freedom to user in presenting his work to
6 Mobile Information Systems
10-1
0 100 200 300 400 500 600 700 800 900 1000
1000 Epochs
Train Test
Validation Best
Figure 5: Evaluation function accuracy.
predetermined number of clusters (K cluster). The basic idea result of this loop, we may find that the central points of K in
behind this method is to define the k as the central focal each time shift their position until a step is completed
point, one for each cluster. This focal point should be chosen without any change. In other words, the focal point has no
wisely because different situations give different results. further movement. Table 2 shows resources allocated to
Therefore, it can be concluded that the best choice is to keep clusters based on resource characteristics.
them as far apart as possible. In this study, K is equal to 3, at As shown in Table 2, the processors belong to a cluster
which the focal point of the first cluster has the lowest value, that is less distant than the focal point of that cluster.
the focal point of the second cluster has the midst data, and Processors within identical clusters are very similar. If the
the focal point of the third cluster has the highest value. The CPUs in different clusters are more cost-effective and time-
clusters are divided into three clusters: cheap, medium, and consuming, tasks delivered to the distributed system envi-
economically expensive, based on the amount of time and ronment are driven to one of these clusters that are more in
cost of completing their tasks. The next step is to capture line with their preference, depending on user or application
each point in the dataset and connect it to the nearest central concerns about the cost and deadline of the tasks.
point. When no point is waiting, the first stage is over, and
the initial groups are formed. At this point, it is necessary to
calculate the points at K of the new central point as the 4.3. Performance Evaluation of Genetic Algorithm. As
centerline of clusters delivered from the previous step. After mentioned, the genetic algorithm is used to map the task in
obtaining this new K focal point, a new connection must be the appropriate processor to reduce the total execution time
established between the same set of data points and the of a task. Therefore, it is expected that the total waiting time
nearest new focal point. Hence, a loop is generated, and as a and task processing time before applying the genetic
8 Mobile Information Systems
15 0 100% 5 1 83.3%
1 1
44.1% 0.0% 0.0% 62.5% 12.5% 16.7%
Output Class
Output Class
0 19 100% 0 2 100%
2 2
0.0% 55.9% 0.0% 0.0% 25.0% 0.0%
1 2 1 2
Target Class Target Class
Test Confusion Matrix All Confusion Matrix
5 2 71.4% 25 3 89.3%
1 1
62.5% 25.0% 28.6% 50.0% 6.0% 10.7%
Output Class
Output Class
0 1 100% 0 22 100%
2 2
0.0% 12.5% 0.0% 0.0% 44.0% 0.0%
1 2 1 2
Target Class Target Class
Figure 6: Confusion diagram of the proposed model.
algorithm will be different compared to the same amount optimal solution in Gantt’s proposed genetic algorithm, the
after applying the genetic algorithm and allocating tasks on assignment of tasks to resources is shown in Figure 8.
the most appropriate processor to reduce runtime. Due to As shown in Figure 8, the computational tasks are
the evolution of solutions, the genetic algorithm converges assigned to the appropriate virtual machines based on the
to the optimal solution during the iteration steps. Figure 7 proposed method. Then, in Figure 9, the completion time of
shows the convergence of the genetic algorithm towards the the tasks is compared based on the proposed method and the
optimal solution. neural networks without using a genetic algorithm.
As shown in Figure 7, the genetic algorithm achieves the As shown in Figure 9, the time to complete tasks in each
optimal solution after the iteration steps. Based on the resource in the proposed method is less than the neural
Mobile Information Systems 9
8, 59, 64,
3, 81, 4, 63, 6, 57, 10, 53,
Tasks 1, 68 2, 95 5, 92 7, 117 75, 83, 9, 5, 110
116 147 109, 140 56, 73, 93
127
Source S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
20, 51,
Tasks 11, 96 12, 106 13, 94, 142 14, 97 15, 123 16, 79 17, 98 18, 76, 148 19, 118 54, 80,
119, 143
Source S11 S12 S13 S14 S15 S16 S17 S18 S19 S20
21, 124, 24, 85, 25, 65, 26, 85, 27, 89, 28, 90,
Tasks 22, 84, 111 23, 69, 149 29, 99 30, 70
134 133, 144 107, 135 136 137 146, 150
Source S21 S22 S23 S24 S25 S26 S27 S28 S29 S30
Source S31 S32 S33 S34 S35 S36 S37 S38 S39 S40
41, 86, 42, 55, 74, 46, 91, 47, 105, 50, 62,
Tasks 43, 60, 126 44, 114 45, 78 48, 61, 132 49, 87, 115
130 82,104 131 139 145
Source S41 S42 S43 S44 S45 S46 S47 S48 S49 S50
network without genetic algorithm for finding optimal re- use of genetic algorithm is greater than the sum of the same
sources. Therefore, resource selection is not optimal on the time in the proposed method. The graph of the proposed
basis of resource processing power and the provision of method is linearly increasing with a slope lower than the
server services and estimated time to complete tasks in previous graph, with respect to the tasks in the queue and the
resources, and the combination of time and cost in task allocation of tasks to the most appropriate resource using the
allocation can make this approach more optimal. Figure 10 genetic algorithm proportional function. As shown in Fig-
also shows the total execution time of all tasks under all ure 9, the neural network combinatorial method and genetic
sources in the neural network-based method and the pro- algorithm improve the Makespan criterion better than the
posed method. neural-network-based scheduling method. In other words,
As shown in Figure 10, the total execution time of all the total execution time of the assigned tasks in the proposed
tasks under all sources plus the waiting time to release all method is about 24% less than the neural-network-based
resources in the neural-network-based approach without the scheduling method.
10 Mobile Information Systems
0.6
0.4
0.2
0
0 5 10 15 20 25 30 35 40 45 50
Sources
Neural network
Proposed method
Figure 9: Comparison of the Makespan criterion for each source in the proposed method before and after applying the genetic algorithm.
1.2
1
Makespan (sec)
0.8
0.6
0.4
0.2
0
0 5 10 15 20 25 30 35 40 45 50
Sources
Neural network
Proposed method
Figure 10: Comparison of the total execution time of all tasks under all sources in the neural network method and the proposed method.
We also compare it with previous methods to evaluate Therefore, Figure 11 shows the comparison of previous
the validity of the proposed method. As shown in Figure 9, methods with the proposed method in this study based on
the accuracy of the proposed method in distributed systems the Makespan criterion.
is optimized by allocating tasks to resources and combining As shown in Figure 11, the performance of the proposed
the parameters in the form of a proportional function. The method was better than other methods available in the
proposed method can now be compared with previous journals in allocating tasks to resources and reducing the
methods based on the same criteria with equal conditions. time required to perform the whole task. Therefore, the
Mobile Information Systems 11
3
Makespan (sec)
2.5
1.5
0.5
ANN
DRBF
RBFNN
ANN-DSO
ANN-BP
DSO
PSO
GA
Proposed Method
proposed method has a lower value for the Makespan cri- References
terion than previous methods and can adhere to the prin-
ciples of service quality. [1] R. Tyagi and S. K. Gupta, “A survey on scheduling algorithms
for parallel and distributed systems,” in Silicon Photonics &
High Performance ComputingSpringer, Berlin, Germany,
5. Conclusion 2018.
[2] E. Ilavarasan and P. Thambidurai, “Genetic algorithm for task
In this study, a scheduling algorithm combining artificial scheduling on distributed heterogeneous computing system,”
neural networks and genetic algorithms that presented as a Engineering Applications, vol. 3, 2015.
method for scheduling tasks in cloud computation aims to [3] M. Sulaiman, Z. Halim, M. Lebbah, M. Waqas, and S. Tu, “An
improve the task allocation conditions by reducing evolutionary computing-based efficient hybrid task schedul-
Makespan without violating service quality constraints. ing approach for heterogeneous computing environment,”
Predicting the time consumption of tasks in resources is the Journal of Grid Computing, vol. 19, no. 1, pp. 1–31, 2021.
key to obtain optimal scheduling in the cloud. Application- [4] S. Singh and I. Chana, “A survey on resource scheduling in
level schedulers can use prediction to achieve better per- cloud computing: issues and challenges,” Journal of Grid
formance. The genetic algorithm also searches for the al- Computing, vol. 14, no. 2, pp. 217–264, 2016.
location of tasks to optimal resources by combining time [5] M. Kumar, S. C. Sharma, A. Goel, and S. P. Singh, “A
and cost. By predicting the execution time of tasks in the comprehensive survey for scheduling techniques in cloud
cloud environment and using fast resources, task sched- computing,” Journal of Network and Computer Applications,
uling can be optimized. The purpose of this research is to vol. 143, pp. 1–33, 2019.
minimize the total execution time of tasks assigned to each [6] Y. Jiang, “A survey of task allocation and load balancing in
resource. In the proposed method, the parameters of distributed systems,” IEEE Transactions on Parallel and
service quality are considered that have a time constraint. Distributed Systems, vol. 27, no. 2, pp. 585–599, 2015.
An implementation result shows that the total execution [7] A. Asghari, M. K. Sohrabi, and F. Yaghmaee, “Task sched-
time of the assigned tasks in the proposed method is about uling, resource provisioning, and load balancing on scientific
24% less than the neural-network-based scheduling workflows using parallel SARSA reinforcement learning
method. The performance of the proposed method is also agents and genetic algorithm,” The Journal of Supercomputing,
better than other methods available in the journals in al- vol. 77, no. 3, pp. 2800–2828, 2021.
[8] J. C. Loftus, A. A. Perez, and A. Sih, “Task syndromes: linking
locating tasks to resources and reducing the time required
personality and task allocation in social animal groups,”
to complete the whole work. Therefore, the proposed
Behavioral Ecology, vol. 32, no. 1, pp. 1–17, 2021.
method has a lower value for the Makespan criterion than [9] D. Konar, K. Sharma, V. Sarogi, and S. Bhattacharyya, “A
previous methods and can adhere to the principles of multi-objective quantum-inspired genetic algorithm (Mo-
service quality. QIGA) for real-time tasks scheduling in multiprocessor en-
vironment,” Procedia Computer Science, vol. 131, pp. 591–599,
Data Availability 2018.
[10] A. Bose, T. Biswas, and P. Kuila, “A novel genetic algorithm
The data used to support the findings of this study are based scheduling for multi-core systems,” in Smart Innova-
simulated in a random scenario. tions in Communication and Computational Sciences,
pp. 45–54, Springer, Berlin, Germany, 2019.
Conflicts of Interest [11] H. Izadkhah, “Learning based genetic algorithm for task graph
scheduling,” Applied Computational Intelligence and Soft
The authors declare that they have no conflicts of interest. Computing, vol. 2019, Article ID 6543957, 15 pages, 2019.
12 Mobile Information Systems