Future Generation Computer Systems 78 (2018) 191–210
Mostafa Ghobaei-Arani, Sam Jabbehdari, Mohammad Ali Pourmina
Highlights
• We designed a framework for autonomic resource provisioning to cloud services.
• We customized an autonomic resource provisioning approach based on the control MAPE loop.
• We enhanced the performance of the planning phase by using the RL-based agent.
• We conducted a series of experiments under real-world workload traces for different metrics.
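The highlights center on a MAPE control loop whose planning phase is driven by an RL agent. As a minimal illustrative sketch of that control cycle (the thresholds, per-VM capacity, and moving-average predictor below are our own assumptions, not the paper's algorithm):

```python
# Illustrative MAPE (Monitor-Analyze-Plan-Execute) loop for VM scaling.
# Thresholds, capacities, and the predictor are hypothetical choices.

def monitor(history):
    """M: return the most recent workload sample."""
    return history[-1] if history else 0

def analyze(history, window=3):
    """A: predict next-interval demand with a simple moving average."""
    recent = history[-window:]
    return sum(recent) / len(recent) if recent else 0

def plan(predicted_demand, vms, per_vm_capacity=100):
    """P: map predicted utilization to a scaling action."""
    utilization = predicted_demand / (vms * per_vm_capacity)
    if utilization > 0.8:
        return "scale-out"
    if utilization < 0.3 and vms > 1:
        return "scale-in"
    return "no-action"

def execute(action, vms):
    """E: apply the planned action by adding or removing one VM."""
    return vms + 1 if action == "scale-out" else vms - 1 if action == "scale-in" else vms

vms = 2
workload = [120, 150, 210, 260, 240, 90, 60]  # synthetic requests per interval
history = []
for load in workload:            # one MAPE iteration per control interval
    history.append(load)
    monitor(history)
    demand = analyze(history)
    action = plan(demand, vms)
    vms = execute(action, vms)
print(vms)
```

In the paper, the planning step is an RL decision-maker rather than the fixed thresholds sketched here; the fixed-threshold `plan` stands in only to show where that decision plugs into the loop.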
the end users submit requests for utilizing the cloud services offered by a SaaS provider that owns a cloud application; to host its cloud application, it rents resources from an IaaS provider such as Amazon EC2 [8].

One of the unique characteristics of cloud computing is its elasticity, which enables SaaS providers to adapt to workload changes of their cloud services by provisioning and deprovisioning resources automatically, such that at each point in time the available resources match the current demand as closely as possible [9]. SaaS providers can acquire and release resources for running their cloud services on demand, and pay only for the resources that are actually used, based on a pay-per-use model [10,11].

Deciding the right amount of resources for a cloud service during its execution time is not trivial, and it depends on the current workload of the cloud service. As users have irregular access to cloud services offered by a SaaS provider, the cloud services will experience workload fluctuations. These fluctuating workloads may lead to undesirable states referred to as over-provisioning or under-provisioning. The over-provisioning state can occur when more resources than the cloud application demands are provisioned. This is correct from the point of view of service level agreements (SLAs); however, it incurs an unnecessary cost to the user and the SaaS provider. On the other hand, the under-provisioning state can occur when fewer resources than the cloud application demands are provisioned. This problem causes SLA violations, which lead to loss of revenue and users [12,13]. Therefore, an effective elasticity mechanism must be able to estimate the needed resources properly to satisfy a given SLA based on the current workload of a cloud application.

To deal with the resource provisioning problems mentioned above, dynamic resource provisioning is utilized. Dynamic resource provisioning is an effective approach whose fundamental idea is to provision resources based on the workload changes of the cloud application. Its objective is to automate the dynamic provisioning of resources by minimizing the cost of renting resources from an IaaS provider while meeting the SLA of the cloud application. The main objective of the SaaS provider is to maximize its profit during the execution of its cloud application, and this can be achieved by minimizing the payment for using resources from the IaaS provider, as well as the penalty costs caused by SLA violations that have to be paid to users.

In this paper, we propose a hybrid resource provisioning approach for cloud applications based on a combination of the concepts of autonomic computing and reinforcement learning (RL). To achieve autonomic computing, IBM has proposed a reference model for autonomic control loops [14,15], which is called the control MAPE (Monitor, Analysis, Plan, Execute) loop. The control MAPE loop is similar to the general agent model proposed by Russell and Norvig [16], in which an intelligent agent perceives its environment using sensors, and uses these perceptions to determine actions to be executed in the environment. The proposed approach follows the control MAPE loop, which consists of four phases: monitoring (M), analysis (A), planning (P), and execution (E). First, in the monitoring phase, a monitoring component gathers information about the resources and the cloud application state; this information is processed to estimate future resource utilization and demands in the analysis phase. In the planning phase, a suitable resource modification action (e.g., scale in or scale out) is determined, and finally, the modification actions determined in the planning phase are performed in the execution phase. The control MAPE loop is executed regularly and manages the virtual machines (VMs) that are allocated to each cloud service at specific time intervals. We apply RL [17,18] as a decision-maker that uses the predicted results of the analysis phase in order to obtain the optimal action to remove or add VMs in the planning phase. RL is an adaptive self-learning system that improves its performance through repeated interactions with the cloud environment. The main contributions of this research can be summarized as follows:

• We designed a framework for autonomic resource provisioning which is inspired by the cloud layer model; at the PaaS layer, it is responsible for resource provisioning to cloud services based on the control MAPE loop.
• We customized an autonomic resource provisioning approach for cloud services offered by a SaaS provider, which covers all the phases defined within the control MAPE loop.
• We enhanced the performance of the planning phase of the control MAPE loop by using an RL-based method as a decision-maker.
• We conducted a series of experiments to evaluate the performance of the proposed approach under real-world workload traces for different metrics.

The rest of this paper is organized as follows: In Section 2, we survey related work. Section 3 provides the necessary background. In Section 4, we formulate the problem and describe the proposed solution. In Section 5, we present an evaluation and discuss the experimental results, and in Section 6, we conclude the paper and present future work.

2. Related works

Dynamic resource provisioning mechanisms scale resources in/out (i.e., remove or add a VM) through a set of rules, to match the available resources as closely as possible to the current workload. Since the proposed resource provisioning approach is a combination of autonomic computing and reinforcement learning (as a machine learning technique), we focus on dynamic resource provisioning techniques in the following two major categories: (i) resource provisioning based on autonomic computing techniques [19–27], and (ii) resource provisioning based on machine learning techniques [28–37].

2.1. Resource provisioning based on autonomic computing techniques

Huebscher et al. [19] presented a survey on autonomic computing and IBM's MAPE-Knowledge (MAPE-K) reference model. They introduced the motivation and concepts of autonomic computing, its degrees, models, and applications. The authors of [20] proposed a framework for dynamic cloud provisioning of system topologies for common two-tier application scenarios based on the MAPE loop concept, while our framework applies to cloud services offered by SaaS providers. Pop et al. [21] reviewed advanced topics in resource management for ubiquitous cloud computing, and proposed an adaptive approach that maximizes the profit for service providers while minimizing the total cost to customers. Maurer et al. [22] proposed adaptive resource provisioning techniques based on the autonomic control MAPE loop for cloud infrastructure management. These techniques are case-based reasoning and a rule-based approach, while our approach employs time series analysis and machine-learning techniques. The authors of [23] designed an adaptive framework based on the control MAPE loop for optimizing the configuration of scientific applications in three layers, i.e., the application layer, the execution environment layer, and the resource layer. In [24], the LoM2HiS framework is presented, which is used for managing the mappings of low-level resource metrics into high-level SLA parameters. The LoM2HiS framework is embedded into the FoSII [25] infrastructure, which facilitates autonomic SLA management and enforcement. In [26], the authors developed a dynamic resource provisioning and monitoring (DRPM) system. Moreover, they proposed a multi-agent system to manage the cloud provider's resources, where the customers'
Table 1
Survey of works related to resource provisioning techniques.
Ref. Category Objective metrics Policy Scope Method
layer. The role of this layer is to handle the resource provisioning algorithm based on the control MAPE loop. The IaaS layer includes the data centers at which VMs are hosted, and it offers VMs to SaaS providers; moreover, it is responsible for dispatching VMs to run on their physical resources. Also, we consider three entities, namely users, the SaaS provider, and the IaaS provider. The users (i.e., end users) submit requests for the use of cloud services that are offered by a SaaS provider. The SaaS provider is the owner of a cloud application, and to host the cloud application, it utilizes the internal resources of its data centers (in-house hosting) or leases resources from a third-party IaaS provider such as Amazon EC2. The latter case is the subject of this research. Each SaaS provider offers different types of cloud services. The IaaS provider offers virtually unlimited resources in the form of VMs. The cloud services are loosely coupled, and each of them runs on one VM. The model can be extended to run more than one cloud service on a VM, but we will not discuss that scenario in this paper. Furthermore, there are two types of SLA, namely the user SLA and the resource SLA. The user SLA is a contract between the SaaS provider and users, and includes items such as the deadline, budget, penalty rate, and request length. The resource SLA is a contract between the SaaS provider and the IaaS provider, and includes items such as the VM price, VM type, processing speed, service time, and data transfer speed. In this work, we focus on the interaction between users and the SaaS provider (i.e., the user SLA) and on calculating the SLA violation. The SaaS provider has a dual role (i.e., buyer/seller). In other words, the SaaS provider leases resources from the IaaS provider (buyer), and in turn leases cloud services to users (seller). The main objective of SaaS providers is to maximize profit. To do this, they need to minimize the infrastructure cost as well as the penalty cost caused by SLA violations. To achieve this goal, SaaS providers aim to minimize the payment for using resources from the IaaS provider, and also to guarantee the QoS requirements for users.

As shown in Fig. 3, the main components of the resource provisioning framework based on the control MAPE loop are:

A. Monitor
The monitor component is responsible for collecting metrics related to the resources and users. It consists of two subcomponents, namely the resource monitor and the user monitor. The resource monitor component is responsible for collecting information about the metrics of the computational, storage, and network resources (e.g., CPU utilization, memory usage, and network traffic). The user monitor component is responsible for collecting information about the workload submitted by users
Table 2
Notations and definitions.

Notation         Definition
U                Number of users
User_u           The uth user
C                Total number of requests for all users
R_u              The number of requests belonging to the uth user
Req_u^r          The rth request of the uth user
BD_u^r           Budget of the rth request of the uth user
DL_u^r           Deadline of the rth request of the uth user
AT_u^r           The time at which the rth request of the uth user arrives in the system
λ_u^r            Penalty rate of the rth request of the uth user
S_i              The ith cloud service offered by the SaaS provider
Num_i(Δt)        Number of VMs allocated to cloud service S_i at the Δtth interval
U_i(Δt)          The CPU utilization of the VMs allocated to cloud service S_i at the Δtth interval
W_i(Δt)          Number of requests for cloud service S_i at the Δtth interval
N                Number of initiated VMs
I                The total number of cloud services offered by the SaaS provider
L                The number of VM types (e.g., L = 3: small, medium, large)
VM_Price_l       Price of VM type l
initVM_Price_l   The initiation price of a VM of type l
VM_hour_n        The duration of time for which the nth VM is on
TS_n             The time spent setting up VM n
FT_u^r           The finishing time for user request Req_u^r
TD_u^r           The delay time for request Req_u^r
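The per-request items in Table 2 can be mirrored in a small data structure; the field names below follow the table's notation but are otherwise our own choice, not code from the paper.

```python
from dataclasses import dataclass

# One request Req_u^r carries the four user-SLA items of Table 2;
# field names mirror the notation (BD, DL, AT, lambda) but are assumed.
@dataclass
class Request:
    budget: float        # BD_u^r: maximum cost the user will pay
    deadline: float      # DL_u^r: latest acceptable finishing time
    arrival_time: float  # AT_u^r: time the request enters the system
    penalty_rate: float  # lambda_u^r: penalty per unit of SLA violation

req = Request(budget=5.0, deadline=12.0, arrival_time=2.0, penalty_rate=0.1)
print(req.deadline - req.arrival_time)  # the time window the user allows: 10.0
```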
(e.g., request rate, type of requests, size of requests, processed requests, pending requests, and dropped requests). These different sets of monitoring information are collected, aggregated, and stored in a knowledge base for use by the other components.

B. Analyzer
The analyzer component is responsible for processing the information gathered directly from the monitor component. Data obtained by the monitors (average response time, request arrival rate, etc.) are examined by this component to determine whether adaptive actions are required to guarantee the requested QoS level. Moreover, the workload analyzer uses a linear regression model (LRM) to predict future demands in order to deal with fluctuating demands. If any change is needed, it triggers the planner component.

C. Planner
The planner component is the core component of the framework. It determines when and how many VMs should be allocated to the different cloud services in order to find a satisfactory trade-off between the guaranteed SLA and optimized costs. In this paper, we use an RL-based method as a decision-maker that uses the predicted results of the analyzer component to remove or add VMs.

D. Executor
The executor component consists of two subcomponents, i.e., the load balancer and the VM manager. The load balancer receives all of the incoming requests from users and distributes them to a suitable VM according to the load-balancing policies (e.g., round robin, random, etc.). Because the VM manager deals directly with the virtual resources of the IaaS provider, it is responsible for actually executing the actions decided by the planner (e.g., scale in, scale out). Finally, it allocates or de-allocates VMs.

4.2. Problem formulation

In this section, we describe the equations and notation used in the proposed approach. As shown in Table 2, let U be the number of users (User = {User_1, User_2, User_3, ..., User_U}), and let each user u have R_u requests (Req_u = {Req_u^1, Req_u^2, Req_u^3, ..., Req_u^{R_u}}) for the use of cloud services offered by the SaaS provider (i.e., the total number of requests is C = Σ_{u=1}^{U} R_u). Let Req_u^r denote the rth request belonging to the uth user. Every request Req_u^r includes the maximum cost (budget) offered by user u to the SaaS provider for the requested cloud service (BD_u^r), the maximum time (deadline) that the user would prefer to wait for the result (DL_u^r), the time at which the request arrives in the system (AT_u^r), and the penalty rate for the requested cloud service, which depends on the request type (λ_u^r); that is, Req_u^r = {BD_u^r, DL_u^r, AT_u^r, λ_u^r}. Let I be the total number of cloud services offered by the SaaS provider, and let S_i indicate the cloud service with ID i (i.e., S = {S_1, S_2, S_3, ..., S_I}). Let Num_i(Δt) denote the number of VMs needed for executing cloud service S_i at the Δtth interval, U_i(Δt) the CPU utilization of the VMs allocated to cloud service S_i at the Δtth interval, and W_i(Δt) the number of requests for cloud service S_i at the Δtth interval. Let N be the number of initiated VMs (VM = {VM_1, VM_2, VM_3, ..., VM_N}) offered by the IaaS provider to a SaaS provider for executing cloud services. Every IaaS provider offers L types of VM (e.g., L = 3: small, medium, large in Amazon EC2), where each VM type l has a price VM_Price_l. Let Total_Cost be the total cost incurred by the SaaS provider to serve all requested cloud services; as described in Eq. (2), it includes the VM costs and the penalty costs:

Total_Cost = VM_Costs + Penalty_Costs.    (2)

VM_Costs is the total cost for all VMs, and is expressed by Eq. (3):

VM_Costs = Σ_{n=1}^{N} VM_Cost_n.    (3)

For each nth VM, VM_Cost_n depends on the VM price of type l (VM_Price_l, the cost to the SaaS provider of using a VM for customer requests, measured in $/h), the duration of time for which the VM is on (VM_hour_n), the start-up time of the nth VM (TS_n), and the initiation price of a VM of type l (initVM_Price_l, the cost of initiating a VM of type l, measured in $/h), and is expressed by Eq. (4):

VM_Cost_n = (VM_Price_l × VM_hour_n) + (initVM_Price_l × TS_n);  ∀ n ∈ N, l ∈ L.    (4)

Penalty_Costs is the total penalty cost for all user requests, and is expressed by Eq. (5):

Penalty_Costs = Σ_{u=1}^{U} Σ_{r=1}^{R_u} Penalty_Cost(Req_u^r).    (5)
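Eqs. (2)–(4) can be checked with a short numeric sketch. All prices, durations, and the aggregate penalty figure below are invented inputs for illustration; Eq. (5) only sums per-request penalties, so the penalty term is taken as given.

```python
# Numeric sketch of the cost model in Eqs. (2)-(4); inputs are made up.
def vm_cost(vm_price_l, vm_hours, init_price_l, setup_hours):
    # Eq. (4): running cost plus initiation cost of one VM
    return vm_price_l * vm_hours + init_price_l * setup_hours

# Three VMs: (hourly price $/h, hours on, initiation price $/h, setup time h)
vms = [(0.10, 5, 0.02, 0.10), (0.20, 3, 0.04, 0.15), (0.10, 8, 0.02, 0.10)]
vm_costs = sum(vm_cost(*v) for v in vms)     # Eq. (3): sum over all N VMs

penalty_costs = 0.35                         # Eq. (5): summed per-request penalties (given)
total_cost = vm_costs + penalty_costs        # Eq. (2)
print(round(vm_costs, 3), round(total_cost, 3))
```

The SaaS provider's objective in the text follows directly: profit rises as `total_cost` falls, so the planner trades the VM term against the penalty term.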
Table 3
Decision table corresponding to the proposed MDP.

U(t) > U_upper-threshold    U_lower-threshold < U(t) < U_upper-threshold    U(t) < U_lower-threshold

Table 4
Initial Q-values table for the proposed MDP.

Q(State/Action)       Scale-in   No-action   Scale-out
Over-Utilization      0          0           0
Normal-Utilization    0          0           0
Under-Utilization     0          0           0

Table 5
Reward table corresponding to the proposed MDP.

R(State/Action)       Scale-in   No-action   Scale-out

Table 6
Updated Q-values table for the proposed MDP.

Q_updated(State/Action)   Scale-in   No-action   Scale-out
Over-Utilization          0.723      0.217       0.060
Normal-Utilization        0.385      0.205       0.410
Under-Utilization         0.115      0.131       0.754

next time interval is determined by looking up the next state in the updated Q-values table (lines 17–20), as shown in Table 6.
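The evolution from the all-zero Q-table (Table 4) to learned values (Table 6) follows the standard tabular Q-learning backup. The transitions, rewards, and hyperparameters below are illustrative assumptions, not values reported in the paper; the sketch only shows the mechanics of the update and the greedy lookup the planner performs.

```python
# Tabular Q-learning over the paper's 3 utilization states x 3 scaling
# actions, using the standard backup:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
states = ["Over-Utilization", "Normal-Utilization", "Under-Utilization"]
actions = ["Scale-in", "No-action", "Scale-out"]
Q = {s: {a: 0.0 for a in actions} for s in states}   # Table 4: all zeros
alpha, gamma = 0.5, 0.8                              # assumed hyperparameters

def update(s, a, r, s_next):
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# A few illustrative transitions with assumed rewards:
update("Over-Utilization", "Scale-out", 1.0, "Normal-Utilization")
update("Normal-Utilization", "No-action", 0.5, "Normal-Utilization")
update("Under-Utilization", "Scale-in", 1.0, "Normal-Utilization")

# Planner decision: greedy lookup of the learned row for the observed state
best = max(Q["Over-Utilization"], key=Q["Over-Utilization"].get)
print(best)
```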
Table 7
All types of virtual machines.

VM type: Extra large, Large, Medium, Small

Table 8
Different settings for cloud services.

Type service   Edition   VM type   Max user
scaling approach based on workload prediction that uses the linear regression model for dynamic resource provisioning [37]. The second strategy is referred to as ''Cost-aware (ARMA)'', which uses a second-order ARMA filter [34] to predict the workload and adjusts the resources that are allocated to users ahead of time. The third strategy is referred to as ''DRPM'' [26], which is a multi-agent-based strategy for dynamic resource provisioning and monitoring. The DRPM strategy includes a global utility agent for the management of all of the system resources and a set of local utility agents that are assigned to each user. Finally, our approach is referred to as the ''Proposed'' approach. The reason for choosing these strategies for comparison is that, firstly, they follow the control MAPE loop and, secondly, they are proactive, i.e., they attempt to predict the amount of resources needed at any given time, to anticipate unwanted events.

5.2. Performance metrics

We applied the following metrics for a comparison of our approach with the other strategies:

Utilization: The CPU utilization of cloud service S_i at the Δtth interval is defined as the ratio between the allocated Million Instructions Per Second (MIPS) of the VMs serving requests of cloud service S_i and the total MIPS potentially offered by the VMs at the Δtth interval, and is expressed by Eq. (13):

U_{S_i}(Δt) = Allocated_MIPS(Δt) / Total_MIPS(Δt).    (13)

VMs allocated: This metric is defined as the number of VMs allocated to a cloud service S_i at the Δtth interval.

SLA Violation: The percentage SLA violation of cloud service S_i at the Δtth interval over the user requests Req_u^r of cloud service S_i is expressed by Eqs. (14) and (15):

SLAV_{S_i}(Δt) = (1/C) Σ_{c=1}^{C} SLAV(Req_u^r)    (14)

SLAV(Req_u^r) = (FT_u^r − DL_u^r) / (DL_u^r − AT_u^r)  if FT_u^r > DL_u^r;  0 otherwise    (15)

where AT_u^r indicates the time at which the user request arrives in the system, and FT_u^r and DL_u^r are the finishing time and deadline for the user request Req_u^r, respectively.

Total Cost: This metric is defined as the total cost incurred by the SaaS provider to serve all of the requests for cloud service S_i at the Δtth interval, and is expressed by Eq. (16):

Total_Cost_{S_i}(Δt) = VM_Costs_{S_i}(Δt) + Penalty_Costs_{S_i}(Δt).    (16)

Profit: This metric is defined as the profit gained by the SaaS provider for serving all requests for cloud services, and is expressed by Eqs. (17) and (18):

Profit_{S_i}(Δt) = Budget_{S_i}(Δt) − Total_Cost_{S_i}(Δt)    (17)

Budget_{S_i}(Δt) = Σ_{u=1}^{U} Σ_{k=1}^{K} BD_u^k    (18)

where k indexes the requests that are finished and BD_u^k is the budget of the kth request belonging to the uth user at the Δtth time interval.

5.3. Experimental results

We evaluated the performance of the proposed approach under two real-world workload traces for the metrics discussed in the previous subsection. For the sake of accuracy, we performed each experiment 20 times using different cloud services, and we report the average of the results.

5.3.1. Evaluation based on metrics
Fig. 6 shows the CPU utilizations of the four approaches for both the ClarkNet and NASA workloads at each interval. The average CPU utilizations under both workloads for the proposed, cost-aware (LRM), cost-aware (ARMA), and DRPM approaches are 95%, 83%, 78%, and 71%, respectively, as shown in Table 9. From the results, we observe that the proposed, cost-aware (ARMA), and cost-aware (LRM) approaches are able to utilize resources more fully, and that the DRPM wastes more resources in both workloads for most intervals. This is because of the conservative and static allocation of VMs by the DRPM approach. Also, cost-aware (LRM), owing to more accurate prediction compared with cost-aware (ARMA), outperforms it in terms of CPU utilization. In some time intervals, the CPU utilizations exceed 100%; this is because the SaaS provider does not have enough resources to process all the incoming requests for the cloud services at those time intervals, which leads to an under-provisioning state (i.e., an over-utilization state) that causes SLA violations.

Fig. 7 shows the trends of the allocated VMs of the four approaches for both the ClarkNet and NASA workloads at each interval. Because of the static provisioning of VMs by the DRPM, cost-aware (LRM), and cost-aware (ARMA) approaches compared with the proposed approach, there are more idle resources for most intervals, which leads to an under-utilization state (i.e., an over-provisioning state). Although this reduces SLA violations, it incurs an unnecessary cost to the SaaS provider under light workloads, and the CPU utilization is reduced. Also, the DRPM consumes the most resources compared to the other approaches, as shown in Table 10.

Fig. 8 shows the percentage of SLA violations for the four approaches at each interval. As mentioned above, under-provisioning of resources (i.e., the over-utilization state) causes cloud service end-users to experience excessive delays, especially during demand surges. This problem leads to SLA violations and, subsequently, to lower profit and fewer users. Therefore, the DRPM, compared with the cost-aware (LRM) and cost-aware (ARMA) approaches, because it has more resources for most intervals, reduces the SLA violations in both workloads for all of the intervals. In general, compared with the other approaches, the proposed approach leads to a 65% reduction in the number of SLA violations, as shown in Table 11. Although cost-aware (LRM) outperforms cost-aware (ARMA) in terms of CPU utilization, it is worse than cost-aware (ARMA) in terms of SLA violations.

Fig. 9 shows the total cost of the four approaches for both the ClarkNet and NASA workloads at each interval. As mentioned above, the total cost depends on the cost of SLA violations (i.e., the penalty cost, Fig. 10) and the cost of allocated VMs (i.e., the VM cost, Fig. 11). Therefore, the proposed approach outperforms the other approaches by saving about 50% of the cost in both the ClarkNet and NASA workloads, as shown in Table 12. Moreover, as shown in Fig. 8, when the workload is NASA, all approaches have fewer SLA violations compared to the ClarkNet workload, because there are regular and predictable patterns in the NASA workload at certain intervals. Therefore, the total cost generated in the case of the NASA workload is less than that for the ClarkNet workload. Also, the DRPM, compared with the cost-aware (LRM) and cost-aware (ARMA) approaches, because it has fewer SLA violations for most intervals, reduces the total cost in both workloads for most intervals.

Fig. 12 shows the profit of the four approaches at each interval. A SaaS provider can maximize its profit by minimizing the total cost, which depends on the allocated VMs and the number of SLA violations. Because the proposed approach outperforms the other approaches by minimizing the total cost, it therefore outperforms them by saving about 50% of the cost in the case of both the ClarkNet and NASA workloads. Also, cost-aware (LRM), compared with the other approaches, because it uses more resources and has higher SLA violations, reduces the profit of the SaaS provider in both workloads for most intervals, as shown in Table 13.

Based on the above results, we compared the CPU utilization, allocated VMs, SLA violations, total cost, and profit of the proposed and the three other approaches under both the ClarkNet and NASA workloads. We observed that the proposed approach reduces the total cost by up to 50%, the number of SLA violations by up to 60%, and the allocated VMs by up to 16%, and increases the CPU utilization by up to 12% and the profit by up to 11% compared with the other approaches, as shown in Tables 14 and 15.

5.3.2. Overhead of the proposed approach
Although elasticity is one of the main characteristics of the cloud computing environment, allowing resources to be provisioned dynamically according to demand, this dynamic provisioning process takes time. The start-up time represents the length of time between requesting scaling up/down actions and
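The per-interval metrics of Section 5.2, Eqs. (13)–(15), can be computed in a few lines. The request figures below are invented for illustration; the violation term is normalized by the user's time window exactly as in Eq. (15).

```python
# Eq. (13): utilization as allocated over total MIPS;
# Eq. (15): per-request violation, positive only past the deadline.
def utilization(allocated_mips, total_mips):
    return allocated_mips / total_mips

def slav(finish, deadline, arrival):
    # (FT - DL) / (DL - AT) when the request finishes late, else 0
    return (finish - deadline) / (deadline - arrival) if finish > deadline else 0.0

# Invented requests: (finishing time FT, deadline DL, arrival AT)
requests = [(14.0, 12.0, 2.0), (8.0, 12.0, 2.0), (13.0, 10.0, 5.0)]
violations = [slav(*r) for r in requests]
mean_slav = sum(violations) / len(violations)   # Eq. (14): average over C requests
print(round(utilization(240.0, 300.0), 2), round(mean_slav, 2))
```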
Fig. 6. CPU utilization at different intervals for (a) ClarkNet workload, (b) NASA workload.
Table 9
The average CPU utilization for ClarkNet and NASA workloads.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
Table 10
Average allocated VMs for ClarkNet and NASA workloads.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
Fig. 7. Allocated VMs at different intervals for (a) ClarkNet workload, (b) NASA workload.
Table 11
Average percentage of SLA violations for ClarkNet and NASA workloads.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
Table 12
Average total cost for ClarkNet and NASA workloads.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
Fig. 8. Percentage of SLA violation at different intervals for (a) ClarkNet workload, (b) NASA workload.
Table 13
Average profit of SaaS provider for ClarkNet and NASA workloads.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
actual resource provisioning/de-provisioning time [49]. The start-up time includes allocating an IP address, configuring the OS, and booting the OS, and it depends on the data center location, the VM type, and the number of VMs. RightScale [50] recommends a 15-min period for the start-up time, because new machines generally take between 5 and 10 min to become operational. In this paper, we consider a start-up time between 2 and 9 min based on the VM type, as shown in Table 4. Since the goal of this paper is to minimize the total cost,
Fig. 9. Total cost at different intervals for (a) ClarkNet, (b) NASA workloads.
Table 14
Comparison of four approaches for ClarkNet workload.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
Fig. 10. Penalty cost at different intervals for (a) ClarkNet, (b) NASA workloads.
Table 15
Comparison of four approaches for NASA workload.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
the VM initiation cost is considered as the overhead of the proposed algorithm. For each VM, the VM initiation cost (initVM_Cost) depends on the start-up time of the VM (initVM_Time) and the initiation price of a VM of type l (initVM_Price_l, the cost of initiating a VM of type l, measured in $/h), and is expressed by Eq. (19):

initVM_Cost = initVM_Price_l × initVM_Time.    (19)

Fig. 11. VM cost at different intervals for (a) ClarkNet, (b) NASA workloads.

Fig. 13 shows the cumulative scaling cost (i.e., the VM initiation cost) of the four approaches for both the ClarkNet and NASA workloads at each interval. From the results, it was observed that the DRPM approach outperforms the others for both workloads at each interval, as shown in Table 16, whereas the proposed approach fails to minimize the VM initiation cost. This outperformance is because of the static allocation of VMs by the DRPM approach, as well as its lower number of scaling operations (i.e., the number of VMs allocated/deallocated), while the proposed approach, because of its dynamic allocation of VMs according to the workload changes, needs a greater number of scaling operations.

6. Conclusion and future work

In this paper, we investigate the problem of dynamic resource provisioning for cloud services, and we present a hybrid resource provisioning approach based on a combination of autonomic computing and reinforcement learning. In order to implement the proposed approach, we present a resource provisioning framework that supports the control MAPE loop. The proposed approach should dynamically adapt to uncertainties and workload
Fig. 12. Profit of SaaS provider at different intervals for (a) ClarkNet (b) NASA workloads.
Table 16
Average Scaling Cost for ClarkNet and NASA at each interval.
Proposed Cost-aware (LRM) [37] Cost-aware (ARMA) [34] DRPM [26]
spikes, and should deal with the undesirable states of over-provisioning and under-provisioning. Our approach is executed regularly at specified time intervals, and we vary the amount of resources to match the workload of the cloud services. We evaluated the performance of the proposed approach under real-world workload traces for different metrics using the CloudSim toolkit, and compared the results obtained with those of the other approaches. The results show that the proposed approach increases the resource utilization and decreases the total cost, while avoiding SLA violations. In future work, we plan to explore the integration of the proposed approach with admission control strategies, the use of the proposed approach for multi-tier
Fig. 13. The overhead of cumulative scaling costs at different intervals for (a) ClarkNet (b) NASA workloads.
cloud applications and extension of planning phase of the control [6] S. Rodríguez, D.I. Tapia, E. Sanz, C. Zato, F. de la Prieta, O. Gil, Cloud computing
MAPE loop by using of the fuzzy logic. integrated into service-oriented multi-agent architecture, in: Balanced
Automation Systems for Future Manufacturing Networks, Springer, Berlin,
Heidelberg, 2010, pp. 251-–259.
[7] B. Piprani, D. Sheppard, A. Barbir, Comparative analysis of SOA and cloud
References computing architectures using fact based modeling, in: On the Move
to Meaningful Internet Systems: OTM 2013 Workshops, Springer, Berlin,
[1] R. Buyya, C. Vecchiola, S.T. Selvi, Mastering Cloud Computing: Foundations and Heidelberg, 2013, pp. 524–533.
Applications Programming, Newnes, 2013. [8] J. Varia, Architecting applications for the Amazon cloud, in: Cloud Computing:
[2] K. Chandrasekaran, Essentials of Cloud Computing, CRC Press, 2014. Principles and Paradigms, Wiley Press, New York, USA, 2011, pp. 249–274.
[3] S.S. Manvi, G.K. Shyam, Resource management for Infrastructure as a Service [9] N.R. Herbst, S. Kounev, R. Reussner, Elasticity in cloud computing: What it is,
(IaaS) in cloud computing: A survey, J. Netw. Comput. Appl. 41 (2014) 424–440. and what it is not, in: Proceedings of the 10th International Conference on
[4] S. Mustafa, B. Nazir, A. Hayat, S.A. Madani, Resource management in cloud Autonomic Computing, ICAC 2013, San Jose, CA, 2013.
computing: Taxonomy, prospects, and challenges, Comput. Electr. Eng. (2015). [10] E.F. Coutinho, F.R. de Carvalho Sousa, P.A.L. Rego, D.G. Gomes, J.N. de
[5] M.P. Papazoglou, W.J. Van Den Heuvel, Service oriented architectures: Souza, Elasticity in cloud computing: a survey, Ann. Telecommun.—Ann.
approaches, technologies and research issues, VLDB J. 16 (3) (2007) 389–415. Télécommun. (2015) 1–21.
[11] T. Lorido-Botran, J. Miguel-Alonso, J.A. Lozano, A review of auto-scaling techniques for elastic applications in cloud environments, J. Grid Comput. 12 (4) (2014) 559–592.
[12] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia, A view of cloud computing, Commun. ACM 53 (4) (2010) 50–58.
[13] S. Singh, I. Chana, Q-aware: Quality of service based cloud resource provisioning, Comput. Electr. Eng. 47 (2015) 138–160.
[14] B. Jacob, R. Lanyon-Hogg, D.K. Nadgir, A.F. Yassin, A Practical Guide to the IBM Autonomic Computing Toolkit, IBM Redbooks, 2004.
[15] M. Maurer, I. Breskovic, V.C. Emeakaroha, I. Brandic, Revealing the MAPE loop for the autonomic management of Cloud infrastructures, in: 2011 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2011, pp. 147–152.
[16] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2003.
[17] L.P. Kaelbling, M.L. Littman, A.W. Moore, Reinforcement learning: A survey, J. Artificial Intelligence Res. (1996) 237–285.
[18] R. Rajavel, M. Thangarathanam, Adaptive Probabilistic Behavioural Learning System for the effective behavioural decision in cloud trading negotiation market, Future Gener. Comput. Syst. 58 (2016) 29–41.
[19] M.C. Huebscher, J.A. McCann, A survey of autonomic computing—degrees, models, and applications, ACM Comput. Surv. 40 (3) (2008) 7.
[20] T. Ritter, B. Mitschang, C. Mega, Dynamic provisioning of system topologies in the cloud, in: Enterprise Interoperability V, Springer, London, 2012, pp. 391–401.
[21] F. Pop, M. Potop-Butucaru, ARMCO: Advanced topics in resource management for ubiquitous cloud computing: An adaptive approach, Future Gener. Comput. Syst. 54 (2016) 79–81.
[22] M. Maurer, I. Brandic, R. Sakellariou, Adaptive resource configuration for Cloud infrastructure management, Future Gener. Comput. Syst. 29 (2) (2013) 472–487.
[23] M. Koehler, An adaptive framework for utility-based optimization of scientific applications in the cloud, J. Cloud Comput. 3 (1) (2014) 1–12.
[24] V.C. Emeakaroha, I. Brandic, M. Maurer, S. Dustdar, Cloud resource provisioning and SLA enforcement via LoM2HiS framework, Concurr. Comput.: Pract. Exper. 25 (10) (2013) 1462–1481.
[25] Foundation of Self-governing ICT Infrastructures (FoSII). http://www.infosys.tuwien.ac.at/linksites/FOSII/index.htm.
[26] M. Al-Ayyoub, Y. Jararweh, M. Daraghmeh, Q. Althebyan, Multi-agent based dynamic resource provisioning and monitoring for cloud computing systems infrastructure, Cluster Comput. 18 (2) (2015) 919–932.
[27] E. Casalicchio, L. Silvestri, Mechanisms for SLA provisioning in cloud-based service providers, Comput. Netw. 57 (3) (2013) 795–810.
[28] F. Bahrpeyma, H. Haghighi, A. Zakerolhosseini, An adaptive RL based approach for dynamic resource provisioning in Cloud virtualized data centers, Computing (2015) 1–26.
[29] S. Islam, J. Keung, K. Lee, A. Liu, Empirical prediction models for adaptive resource provisioning in the cloud, Future Gener. Comput. Syst. 28 (1) (2012) 155–162.
[30] E. Barrett, E. Howley, J. Duggan, Applying reinforcement learning towards automating resource allocation and application scalability in the cloud, Concurr. Comput.: Pract. Exper. 25 (12) (2013) 1656–1674.
[31] J. Liu, Y. Zhang, Y. Zhou, D. Zhang, H. Liu, Aggressive resource provisioning for ensuring QoS in virtualized environments, IEEE Trans. Cloud Comput. 3 (2) (2015) 119–131.
[32] H.R. Qavami, S. Jamali, M.K. Akbari, B. Javadi, Dynamic resource provisioning in cloud computing: A heuristic Markovian approach, in: Cloud Computing, Springer International Publishing, 2014, pp. 102–111.
[33] S. Muppala, G. Chen, X. Zhou, Multi-tier service differentiation by coordinated learning-based resource provisioning and admission control, J. Parallel Distrib. Comput. 74 (5) (2014) 2351–2364.
[34] N. Roy, A. Dubey, A. Gokhale, Efficient auto-scaling in the cloud using predictive models for workload forecasting, in: 2011 IEEE International Conference on Cloud Computing (CLOUD), IEEE, 2011, pp. 500–507.
[35] S. Misra, P.V. Krishna, K. Kalaiselvan, V. Saritha, M.S. Obaidat, Learning automata-based QoS framework for cloud IaaS, IEEE Trans. Netw. Serv. Manage. 11 (1) (2014) 15–24.
[36] C.Z. Xu, J. Rao, X. Bu, URL: A unified reinforcement learning approach for autonomic cloud management, J. Parallel Distrib. Comput. 72 (2) (2012) 95–105.
[37] J. Yang, C. Liu, Y. Shang, B. Cheng, Z. Mao, C. Liu, L. Niu, J. Chen, A cost-aware auto-scaling approach using the workload prediction in service clouds, Inf. Syst. Front. 16 (1) (2014) 7–18.
[38] J.O. Kephart, D.M. Chess, The vision of autonomic computing, Computer 36 (1) (2003) 41–50.
[39] M. Verma, G.R. Gangadharan, N.C. Narendra, R. Vadlamani, V. Inamdar, L. Ramachandran, R.N. Calheiros, R. Buyya, Dynamic resource demand prediction and allocation in multi-tenant service clouds, Concurr. Comput.: Pract. Exper. (2016).
[40] W. Li, F.C. Delicato, P.F. Pires, A.Y. Zomaya, Energy-efficient task allocation with quality of service provisioning for concurrent applications in multi-functional wireless sensor network systems, Concurr. Comput.: Pract. Exper. 26 (11) (2014) 1869–1888.
[41] K.L.A. Yau, P. Komisarczuk, P.D. Teal, Reinforcement learning for context awareness and intelligence in wireless networks: review, new features and open issues, J. Netw. Comput. Appl. 35 (1) (2012) 253–267.
[42] N. Verstaevel, C. Régis, M.P. Gleizes, F. Robert, Principles and experimentations of self-organizing embedded agents allowing learning from demonstration in ambient robotics, Future Gener. Comput. Syst. (2016).
[43] J. Wu, X. Xu, P. Zhang, C. Liu, A novel multi-agent reinforcement learning approach for job scheduling in Grid computing, Future Gener. Comput. Syst. 27 (5) (2011) 430–439.
[44] J. Kober, J. Peters, Reinforcement learning in robotics: A survey, in: Reinforcement Learning, Springer, Berlin, Heidelberg, 2012, pp. 579–610.
[45] S. Cho, S. Fong, Y.W. Park, K. Cho, Simulation framework of ubiquitous network environments for designing diverse network robots, Future Gener. Comput. Syst. (2016).
[46] R.N. Calheiros, R. Ranjan, A. Beloglazov, C.A. De Rose, R. Buyya, CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Softw. Pract. Exper. 41 (1) (2011) 23–50.
[47] ClarkNet-HTTP: Two weeks of HTTP logs from the ClarkNet WWW server. http://ita.ee.lbl.gov/html/contrib/ClarkNet-HTTP.html (accessed 15.10.14).
[48] NASA-HTTP: Two months of HTTP logs from the KSC-NASA WWW server. http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html (accessed 15.10.14).
[49] M.A.N. Bikas, A. Alourani, M. Grechanik, How elasticity property plays an important role in the cloud: A survey, Adv. Comput. (2016).
[50] RightScale: Set up autoscaling using voting tags. https://support.rightscale.com/ (accessed 01.10.14).

Mostafa Ghobaei-Arani received the B.Sc. degree in Software Engineering from Kashan University, Iran, in 2009, and the M.Sc. degree from Azad University of Tehran, Iran, in 2011. He received the Ph.D. degree in Software Engineering from Islamic Azad University, Science and Research Branch, Tehran, Iran, in 2016. His current research interests are distributed systems, cloud computing, pervasive computing, big data, SDN, and IoT.

Sam Jabbehdari has been working as an associate professor at the Department of Computer Engineering of IAU (Islamic Azad University), North Tehran Branch, Tehran, since 1993. He received his B.Sc. and M.S. degrees in Electrical Engineering (Telecommunication) from Khajeh Nasir Toosi University of Technology and IAU, South Tehran Branch, Tehran, Iran, respectively. He received the Ph.D. degree in Computer Engineering from IAU, Science and Research Branch, Tehran, Iran, in 2005. His current research interests are scheduling, QoS, MANETs, wireless sensor networks, and cloud computing.

Mohammad Ali Pourmina is an associate professor in electrical engineering (telecommunication) at the Science and Research Branch, Islamic Azad University, Tehran, Iran, and was named a top professor of this university in 2016. He received his Ph.D. degree in electrical engineering (telecommunication) from the Science and Research Branch, Islamic Azad University, Tehran, Iran, in 1996 and joined the university the same year. He has been a member of the Iran Telecommunication Research Center since 1992. He has performed research in the areas of packet radio networks and digital signal processing systems since 1991. His current research interests include spread-spectrum systems, cellular mobile communications, indoor wireless communications, DSP processors, and wireless multimedia networks.